double to int conversion

    auto maximum = std::numeric_limits<uint64_t>::max() & ~uint64_t(0xFFF);
    double d = maximum;
    std::cout << (maximum - c);  


I would expect this to have a small integral result, but uint64_t(d) gives zero. What is the rule for this?
No it doesn't. What are you talking about? Your question is senseless. What in the world is c? Where is the uint64_t(d) in your code? What in the world are you talking about? What is the point of this?

BTW, intelligent people post complete programs.
Converting double to int is really simple: it drops the fractional part (truncates toward zero). 3.14 becomes 3 and so does 3.99999999.

This can be a problem if you have integer values stored in (or converted to) doubles and back, as 5 can be converted to 4.999999999999999999999999999999 and then back to 4 (!!!).

So the default truncation is usually not what you want; you usually want to call an extra routine like round, floor, ceil, etc.

5 can be converted to 4.999999999999999999999999999999 and then back to 4

No it can't.
As long as the int doesn't have too many digits to fit into the precision of the double it won't be changed at all.

#include <cstdint>
#include <iostream>
#include <iomanip>

int main() {
    uint64_t u = 0x1FFFFFFFFFFFFF;  // 53 bits (max precision of double)
    double d = u;
    uint64_t u2 = d;
    std::cout << std::hex << u2 << '\n'; // prints 1fffffffffffff

    u = 0x3FFFFFFFFFFFFF; // 54 bits (1 too many)
    d = u;
    u2 = d;
    std::cout << std::hex << u2 << '\n'; // prints 40000000000000
}

Yes, the 5 example only illustrates visually what happens; it clearly does not happen with small values like 5. It does happen when you mix types in math, though. It isn't hard to find an integer x for which y = (int)(sqrt(x)*sqrt(x)) does not give x back, so a check that y == x fails. It happens on something like 50% of the numbers where sqrt(x) is irrational...

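#include <cmath>
#include <iostream>

int main()
{
    int x = 13;
    // sqrt(13) is irrational, so the product can land just below 13
    // and the cast then truncates it to 12.
    int y = (int)(std::sqrt(x) * std::sqrt(x));
    std::cout << x << '\n' << y << '\n';
    return 0;
}
prints 13 and 12...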



it clearly does not happen with small values like 5.

Right, but it isn't really about "small values", is it? It's about fractional values and the difference between decimal and binary notation. E.g., 0.1 in decimal cannot be stored exactly in binary. That's the main point.
I didn't think so :)
The main point is that converting double to int truncates unless you use a function to do it differently, as best as I can see from the question (which is not well worded, so I could be wrong). The rest of it is examples of how using the default truncation is probably a bad idea for various reasons. But sure, the issue always comes back to "some values can't be represented exactly".
I obviously didn't mean the main point of the question (which is indecipherable and therefore has no main point), but the main point about doubles not holding exact values.

You clearly were under the misapprehension that a value like 5 could be turned into 4.99999 or whatever. If you thought it was only about large values you would have mentioned it and it would have had nothing to do with any fractional part.

When I pointed out your mistake you were embarrassed and started lying. Although I am happy to teach you the facts, I am not happy to have you lie about it. :-)

(Don't forget to "report" this post!)
You need to find less aggressive ways of getting your points across, man.
I agree. Beyond the pale. Sorry about that. Won't happen again.
@elossha

My first question is also why you would want to do this?

It looks to me as though you are trying to mask off the exponent to leave the mantissa as an integer. However I am sure that there are extra optimizations on the way a double is stored: One of them is that the mantissa and exponent are adjusted so that first binary digit of the mantissa is a 1, then that 1 is implied.

The thing about representing integers as a double, mentioned by others:

wiki wrote:
Any integer with absolute value less than 2^24 can be exactly represented in the single precision format, and any integer with absolute value less than 2^53 can be exactly represented in the double precision format. Furthermore, a wide range of powers of 2 times such a number can be represented. These properties are sometimes used for purely integer data, to get 53-bit integers on platforms that have double precision floats but only 32-bit integers.

https://en.wikipedia.org/wiki/Floating-point_arithmetic#IEEE_754:_floating_point_in_modern_computers

Also worth mentioning are the decimal FP types; these can represent decimal fractions such as 0.1 exactly. On Linux, one can typically use them with g++ by including "decimal/decimal." These are not yet part of the C++ standard though.

I hope this helps a little :+)
Here is the complete example of what I meant:

https://wandbox.org/permlink/brsDkmftRK2VWlB1

Basically I get that the fraction/mantissa is 53 bits, but was wondering why:

1. The double rounds up, instead of flooring to the nearest value it can accurately represent
2. Converting that double back to an integer gives 0.

It seems the answer is here:

https://www.felixcloutier.com/x86/CVTSI2SD.html

"When conversion is inexact, the value returned is rounded according to the rounding control bits in the MXCSR register."

So by default (and this must be in the standard somewhere), rounding isn't floored but goes to the nearest value. When converting back, I am not sure what happens, but the standard must just say that a failed conversion sets the int64 to zero.
There is this, not sure if it will make any difference:

https://en.cppreference.com/w/cpp/numeric/fenv/feround

There are a bunch of other links in there, maybe std::nearbyint ?

https://en.cppreference.com/w/cpp/numeric/math/nearbyint

And again the decimal FP, can you use those?

https://gcc.gnu.org/onlinedocs/gcc-4.6.2/libstdc++/api/a00460.html