Problem converting double to float.


I do not understand why I get 20120312 as the value of v2. Thanks!

    double v1 = 20120313.0;
    float v2 = (float) v1;
float is less precise than double, so you get small rounding errors.
It is off by one, which is not small. I still do not understand how this could happen. I can see the value v1 = 20120313 in the debugger; it's a whole number with no decimals. Why does the float conversion round it down?
Floating point numbers cannot store all values exactly. A float has about 6-7 significant decimal digits of precision, while a double has 15-16. 20120313 has 8 significant digits, which is more than a float can guarantee.
Some numbers can be stored exactly in float such as 0.5 but some numbers cannot. So if I do:

    double v3 = 0.5;        // Can be represented exactly.
    float v4 = (float) v3;

    double v5 = 0.4;        // Cannot be represented exactly.
    float v6 = (float) v5;

The v3 and v4 values are both exactly 0.5; that's what the debugger shows.

On the other hand v5 is 3.99...7 and v6 is 4.0000006.

Back to v1 and v2: I see in the debugger that v1 is exactly 20120313, with no decimals, so I assume it can be represented exactly, and therefore it should be represented exactly in both double and float.

Am I missing something?
A floating point number consists of an exponent and a fraction part: 1.frac * 2^exp
1.19926411*2^24 == 20120313 (decimal fraction rounded)
The fractional part is stored in memory as binary 001100110000001011111001. That is 24 binary digits. double has a 52-bit fractional part, so there is no problem.

The float, on the other hand, is smaller and has a 23-bit fractional part. We can't store a 24-bit fraction in only 23 bits, so the value has to be rounded. At this magnitude a float can only represent every second integer, so 20120313 lies exactly halfway between 20120312 and 20120314, and the default round-to-nearest-even rule picks 1.19926405*2^24 == 20120312. The fractional part here is 001100110000001011111 followed by two zero bits (21 significant bits).
Thank you for the effort doing all the calculations!