How does a computer store a floating point number?

Hello, for my college homework I have been asked to research how a floating point number is stored inside memory. We have learned how a single whole number is stored, like how an integer is stored...

Now I'm not asking you to do my homework; I just want to know if I'm on the right track with what I have found, or if there is anything you could add to help me...

What I have found:

I have come across one website that says decimal point numbers, or floating point numbers, are stored in exponential form. Like 0.0012345 is stored as 0.12345×10⁻². But that doesn't tell me how these numbers are stored in binary form like an integer is. I also found a website that talked about the IEEE 754-1985 standard, which included a sign bit for negative values, then a bias and fraction thing.

Is this the correct information I am finding, or is there something else? These websites don't really explain it that well for a beginner.

If you can help thanks :)
Check this out:
https://en.wikipedia.org/wiki/IEEE_754-1985
There is a pretty good example here.

Let's take another example:
-0.00527 in decimal can also be represented as -1 × 527 × 10⁻³.
Now we have some interesting parts here:

-1 is the sign. It's always either -1 or 1, so this takes 1 bit to represent.
527 is the number itself. We could choose to represent it with a finite number of bits.
-3 is an exponent. We could also choose to represent it with a specified number of bits.

The 10 is an arbitrary base which humans use to make things easy to read. 10⁹ is easier to read than 1000000000. Floating point numbers use base-2. Therefore we could rewrite our example in binary.

Let's take -1010000b.

We have a negative number, so let's set the sign bit to 1.
We have 101b as our number.
We have 4 trailing 0s, which is 100b for an exponent.

If we defined our own convention, we might choose to make an 8-bit number. The first bit is the sign, then three bits for the exponent, then four bits for the number. Packing all of this together, we would have:
1 100 0101b or 11000101b.

IEEE 754-1985 defines how many bits are used for the exponent and how many are used for the number. For single precision:

1 bit for the sign, 8 bits for the exponent, 23 bits for the number.
That's kind of a shaky definition. They cited the correct standard, but maybe you were just too sparse in retelling it? Do you still have the link?

Anyway I'd say to check out section 4.2.2 and let us know if you need anything cleared up: http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-1-manual.pdf

I don't know if such an assignment would require any specific format. Features common to most floating-point formats:

(1) mantissa is stored without the leading '1' bit.
(2) exponent is biased in some way, typically by a power of two minus one (127 for IEEE single precision)
(3) "normalized" form covers most numbers. Numbers very close to zero can have a special "denormal" (subnormal) form, which is a lot like just storing an unsigned integer.
(4) Things like -0.0 are possible
(5) It is possible to store bit patterns that do not represent ordinary numbers. Typically some subset of these is chosen to represent NaN (not a number -- used for invalid or undefined results) as well as +INF and -INF.
(6) Storage method may vary significantly from this.

Floating point numbers are notoriously difficult to convert between binary representation and text (if you're interested, google around for the Dragon4 algorithm, IIRC). But it matters because the safest transport for floating point numbers is textual form.

Hope this helps.
Thanks for the input, all of you really helped. From what you have all said and given me, I should be able to put together some work for college by myself now. Cheers.
Topic archived. No new replies allowed.