Need good workaround for simple double errors.

closed account (Ezyq4iN6)
I understand accumulating errors in math where numbers can have infinite decimal places (e.g. sin, cos), but for simple numbers that don't produce infinite decimal places, I find myself having to do extra programming to avoid needless decimal errors. Consider the following code fragment:

double someVal = 0.0;
double delta = 0.1;

while (something)  // 'something' stands in for whatever loop condition applies
{
  std::cout << "The value is now: " << someVal << "\n";
  someVal += delta;
}


While the code will at first output the expected series 0.1, 0.2, 0.3, ..., eventually it will start giving numbers like 0.800000007, 1.300000015, etc. Subtraction, especially when moving from positive to negative, seems to make it happen sooner. Right now I have been using the following solution:

int someVal = 0;
int delta = 1;
double tenth = 0.1;

while (something)
{
  std::cout << "The value is now: " << (double)someVal * tenth << "\n";
  someVal += delta;
}


As you can see, it isn't a lot of extra code, but I can't help but feel that this might be confusing to people reading it and might cause trouble later on for certain number combinations. Is there an easier (more readable/intuitive) way?
opisop wrote:
but for simple numbers that don't produce infinite decimal places
That assumption is wrong. As it turns out, 0.1 does have an infinite number of digits when expressed in binary, which is what floating-point numbers use.
So there is no such thing as an exact value for 0.1 in floating-point numbers, only the closest representable floating-point number. Nevertheless, that's a separate issue from accumulation error.
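You can see this directly by printing 0.1 with more digits than the default six (a minimal sketch; the exact digits shown are typical for IEEE 754 doubles):

#include <iostream>
#include <iomanip>

int main()
{
    // show the closest double to 0.1, not the "0.1" we typed
    std::cout << std::setprecision(17) << 0.1 << '\n';
    // typically prints 0.10000000000000001
}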

Honestly I think the second bit of code is fine. But I think the name "delta" is misleading for the integer, since it's not a very small value. Your variable called "tenth" is what should actually be called "delta".

I would prefer something that does the same thing, but looks like this:
    const double delta = 0.1;
    for (int i = -2; i < 5; i++)
    {
        double t = delta * i;
        std::cout << t << '\n';
    }
if your issue is with printing, check out std::setprecision

if your issue is with generating a sequence, you may calculate the number of iterations and use a for-loop
# error prone: rounding error accumulates with every addition
def linspace_accumulate(start, end, n):
    delta = (end - start) / (n - 1)
    v = []
    value = start
    while value <= end:
        v.append(value)
        value += delta
    return v

# may use instead: compute each value directly from its index
def linspace_direct(start, end, n):
    delta = (end - start) / (n - 1)
    return [start + delta*K for K in range(n)]

use the full precision as long as you can and clean it up right before you print it for the user.
there are some tools for that: cout has formatting, and there are floor, ceil, round, and other such routines to help as well. If you work with math a lot, I suggest you write something that does what you need, e.g. a custom double-print routine. The issue here is that everyone has different requirements for this kind of task, so there is no common tool that does it the way everyone wants it done; if there were, it would be slow due to the excess logic needed for all the options people would want.

Personally, I find that printf has a lot of merits for this.
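For instance, a minimal sketch of cleaning up output only at print time:

#include <cstdio>

int main()
{
    double sum = 0.0;
    for (int i = 0; i < 8; ++i)
        sum += 0.1;              // internally sum drifts to 0.7999999999999999
    std::printf("%.1f\n", sum);  // rounded at print time: shows 0.8
}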

remember: this is not an error. It is the nature of representing infinitely many values in finite space, which has consequences. Learn how to deal with it early and embrace it!
closed account (Ezyq4iN6)
@Ganado
Perhaps I should have worded my question differently. I realize that 0.1 has infinitely many digits in binary. My point is that when a number can be calculated to an exact value in decimal form, floats and doubles still end up causing trouble, and I was hoping there was a "proper" way or function to avoid those issues.

I know that adding 0.1 a million times should give me 100,000. However, when I run code that does so in a loop versus multiplying 1,000,000 by 0.1, I get the following results:

Addition: 100000.000001333
Multiply: 100000
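For reference, a minimal sketch that reproduces this comparison (the exact digits will vary by compiler and platform):

#include <iostream>
#include <iomanip>

int main()
{
    double sum = 0.0;
    for (int i = 0; i < 1000000; ++i)
        sum += 0.1;                      // rounding error accumulates each iteration
    std::cout << std::setprecision(15)
              << "Addition: " << sum << '\n'
              << "Multiply: " << 1000000 * 0.1 << '\n';  // only one rounding step
}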

Even though the second result probably has error, it is MUCH smaller. And if we go up to 2 million, the first result will have accumulated a lot more error, whereas the second one will still have about the same insignificant error.

I guess I figured that since programming is used for so much research where numeric precision is paramount, surely there would be built-in functions that already addressed this issue.

And yes, my variable names aren't the best, but just what I picked for the example.


@ne555
Sorry, should have specified that it was more the math itself than the printing of the number. My example was simply to illustrate my current way I deal with mitigating as much error as possible.

@jonnin
I agree, using full precision as long as possible is definitely a good approach. However, in my case I have short decimal values being wrecked by their infinite binary expansions. My point is that if a person calculated this stuff by hand, there would be a precise answer like 10.123. The computer, though, would not be able to achieve this result using floats or doubles without somehow truncating the extra error. I just thought there might be a special function for dealing with such situations, as surely it would be beneficial to many, but as you said, I guess there is not.
64 bit integers are pretty big and stuff.
if you want 3 or 4 decimals on things with a total of 8 or so digits, why not do your own fixed point:
10.123 = 10123
0.001 = 1
etc. use one more decimal place than you need, so you can round up if you want to do that?
you can look at how banking handles money (all in integers) as an example?
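A minimal sketch of that idea (the scale factor and names are just illustrative):

#include <cstdint>
#include <iostream>

// store thousandths in an integer: 10.123 becomes 10123
int64_t to_fixed(double d)
{
    return static_cast<int64_t>(d * 1000.0 + (d < 0 ? -0.5 : 0.5));  // round to nearest
}

void print_fixed(int64_t x)
{
    // note: a full version would handle the sign of values in (-1, 0) separately
    int64_t frac = (x < 0 ? -x : x) % 1000;
    std::cout << x / 1000 << '.'
              << (frac < 100 ? "0" : "") << (frac < 10 ? "0" : "") << frac << '\n';
}

int main()
{
    int64_t total = 0;
    for (int i = 0; i < 1000000; ++i)
        total += 100;        // add exactly 0.100, one million times
    print_fixed(total);      // prints 100000.000, with no drift
}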

there isn't any magic fix for it, no, because, again, while people have similar needs, those needs differ enough that no generic tool can solve them all.
@opisop, what you need first is a bit of research on the IEEE 754 floating point format(s). There are copious articles, posts, blogs and papers on the subject.

To be terse, this is the way it is. Floating-point numbers are not precise; they have a limited-precision representation, and there is no way around it.

A short "intro" into the concept begins with a reference to scientific notation, where a value is "normalized" into a single digit, followed by some useful number of digits past a decimal point, which is then shown, in the format, to be multiplied by a power of ten. Engineering notation is similar, but may choose 3 digits, then a decimal, then more digits up to a required precision.

Even scientists and engineers do not expect precision beyond what is required (say 4 to 6 decimal places in many engineering use cases). They expect that summing thousands or millions of values will generate error.

The floating point format is a base 2 version of scientific notation. Since decimal digits beyond the decimal point don't map precisely to binary digits, the final decimal "digit" ends up incompletely encoded relative to base ten.

The format is known to generate errors, even on common numbers a casual observer would expect to be easily stated. This is simply a side effect of trade offs engineered into the format representation.

There is nothing that can be done to avoid it if floating point formats are used. Absolutely, positively nothing.

Where absolute precision of a sum of ranges of numbers is required, the only possible solution is a version of integer math which is then shifted to form the required decimal precision, as @jonnin offered above.

This is a widely discussed, well known nuisance that most students marvel over, even laugh or scoff at. Many are perplexed because it seems absurd.

That's why a study of the IEEE 754 format (and all of the controversies which arise from that discussion, including the name itself) is informative and useful.

I don't have exact numeric examples in memory, but one trivial example students discover is that a simple integer can't be represented, like 2.0. Again, maybe it's 1.0, or 3.0, I don't recall, but a float (not a double) assigned to one of these integers returns, say, 1.9999xxx (garbage) when assigned 2.0 (or whatever integer it is that demonstrates this). It seems inexplicable and frustrating. Many have thought it was a bug.

I wouldn't go so far as to call it a feature, but it is a consequence of the design.
Niccolo, everything you said is great, except I think you're mistaken on the last part. All small integers that I know of can be represented exactly in IEEE 754.
https://stackoverflow.com/questions/3793838/which-is-the-first-integer-that-an-ieee-754-float-is-incapable-of-representing-e

I especially like your analogy to scientific notation, because that's essentially what's happening here.

The common example I see used a lot where simple arithmetic fails for equality is the following,
where 0.1 + 0.2 != 0.3
#include <iostream>

int main()
{
    double a = 0.1;
    double b = 0.2;
    double c = a + b;
    std::cout << (c == 0.3) << '\n';
}

(Output is 0)
closed account (Ezyq4iN6)
@jonnin

Like the banks, I'll probably stick to integer math, as it seems to produce results more in line with what I need. I guess the reason it seems strange to me is that I'm so used to thinking of 123 as an integer and 1.23 as a double or float, and changing that in these situations feels confusing at first. I guess I'll stick with my workaround.


@Niccolo

I'll take a look at that format.

I'm fine that float and double are limited. I understand that they are the best option in most cases. It was just my own inexperience that had me questioning if there was an alternative for situations like mine. I'm not working with complex math, just very simple math, and was hoping that there existed a better workaround.

For now, I think I'll stick to integer math for all my addition and subtraction, as that will make equality testing easier (no need for epsilon).

I am curious about that integer that can't be represented as a float. I wonder why that is?
you can make a friendly output if it bothers you.
cout << x/1000 << "." << setw(3) << setfill('0') << x%1000; // 123456 prints 123.456 (setw/setfill, from <iomanip>, pad the remainder so 123056 prints 123.056 and not 123.56)
put a ?: thing in there to round it if you want, wrap it in a function and call it good, or assume it's been rounded by the time you want to print it, as your design dictates.
closed account (Ezyq4iN6)
@jonnin

That sounds like a good idea, thanks.
@opisop,

With apologies, I think I'm recalling a 16 bit floating point specific behavior.

The basic reason it happens, though, is the interaction between the boundaries at powers of 2 and the available bits in the mantissa, which creates "bumps" in the representation. These happen at various positions in the different formats.

Of late I've been working a lot with 16 bit floats in GPU code, so I'm a little overfocused on their side effects.
closed account (Ezyq4iN6)
@Niccolo

That's very interesting, and something I never knew could cause a problem. Is there a place that lists these positions? Also, do you need code to avoid using those numbers, or can you just accept the small amount of error it adds?
What Every Computer Scientist Should Know About Floating-Point Arithmetic
https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html

The trick is to design your mathematical operations to calculate values as directly as possible. Any repetitive accumulation, or reliance on a small epsilon, can prove problematic.

Part of the thing to realize is that FP is a discrete sampling over the real numbers, with a piecewise-logarithmic domain. ('Normalized' floats are denser around zero.)

Hence, any FP code that functions like your examples (either version) is incorrect. Don't sum. Instead, compute a final value using an appropriate equation as needed.

Goldberg’s paper addresses all these issues. And none of them are going away any time soon. If you need more precision than standard (processor-supported) FP types can provide, use a bignum. Boost.Multiprecision provides a nice frontend for a number of bignum engines, including floating-point bignums.
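As a sketch of the bignum route, assuming Boost.Multiprecision's decimal floating-point type (which stores 0.1 exactly, since it works in base ten):

#include <boost/multiprecision/cpp_dec_float.hpp>
#include <iostream>

int main()
{
    using boost::multiprecision::cpp_dec_float_50;  // 50 decimal digits
    cpp_dec_float_50 sum = 0;
    for (int i = 0; i < 1000000; ++i)
        sum += cpp_dec_float_50("0.1");  // base-ten: 0.1 is represented exactly
    std::cout << sum << '\n';            // prints 100000 with no drift
}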
@PhysicsIsFun asks about a similar issue in this lengthy thread:
http://www.cplusplus.com/forum/beginner/250976/

If accumulated error is becoming significant, search for numerically stable algorithms that reduce it.

mbozzi wrote:
Why do values on the order of 10^-12 or smaller have any substantial effects on the result of your algorithm? Do you really need your quantities to span fifteen orders of magnitude?

...

Choose an algorithm that is more tolerant of [floating-point error]. If this is possible, that would be the preferred solution, before looking for more precise math.

For example, sums can be made more accurate with William Kahan's algorithm:
https://en.wikipedia.org/wiki/Kahan_summation_algorithm
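A minimal sketch of Kahan (compensated) summation, for reference:

#include <vector>

// carry the low-order bits lost by each addition in a compensation term
double kahan_sum(const std::vector<double>& values)
{
    double sum = 0.0;
    double c = 0.0;                // running compensation for lost digits
    for (double v : values)
    {
        double y = v - c;          // restore what was lost last time
        double t = sum + y;        // low-order bits of y may be lost here
        c = (t - sum) - y;         // recover exactly what was lost
        sum = t;
    }
    return sum;                    // note: flags like -ffast-math can defeat this
}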
closed account (Ezyq4iN6)
@Duthomhas

Thank you for the link. I'm surprised how much information goes into this issue.

Speaking of designing a math operation... how would you design one where you are constantly taking input? For example, moving a character 0.1 pixels a frame depending on what key is pressed. Eventually, I would imagine that the error build-up would cause some issues, albeit minor ones.

Can you elaborate on why my code is wrong and what you would do to fix it? Since I am trying to deal with a large amount of additions from user input, I'm not sure how I could find a final value, as the number is always being changed at the user's will.

I'll look into bignum for possible future use.

@mbozzi

Thanks, I'll take a look over that thread. I really don't know how much of an effect small errors will have in the long run, but I always want to be prepared, because a few people I have had classes with have said that such things can cause undesirable results when programming for real-world situations.

That algorithm looks helpful. I wonder if I should start using that on all of my floats each time my program changes any of them. I have used epsilon values to test for equality within a range, but sometimes even that gets off.
bignum libraries are going to be slower. The CPU/FPU cannot handle them natively, so you are doing everything in pieces from a container, and it just takes longer. They are awesome for what they do, but be aware of the performance hit. It is significant if you have a LOT of numbers.

real world problems require the real world :) What I mean is that it's very important in some fields (like banking, again: no one wants the Superman effect where your bank loses or creates millions of dollars from roundoff problems over time). It is not important at all in other fields where an approximate answer is good enough, like the average age of everyone in India, or something... oh noes, it said 42.7 and it was really 41.973. Do you care? Correcting for the issue almost always incurs a cost (performance, development time, stuff). Leave it be if you don't *really* need the precision.

Respect your peers' experience and opinions, and those of people on the internet. But think for yourself... don't rely on 'someone said so-and-so' for anything until you understand the full picture. A great many professors don't actually have real-world experience, or it was 25+ years ago, etc. One of my professors was always telling us to do things a certain weird way, and it turned out he had last coded on punch cards.
closed account (Ezyq4iN6)
@jonnin

I'll keep that in mind about the bignum library. I may one day need to calculate something precisely. I don't think I have enough numbers right now to cause a slowdown, but then again, with graphics and other things updating every time interval, who knows?

I guess my problem is much closer to the Indian average age in that a slight error probably won't be the end of the world.
opisop wrote:
Speaking of designing a math operation... how would you design one where you are constantly taking input? For example, moving a character 0.1 pixels a frame depending on what key is pressed. Eventually, I would imagine that the error build-up would cause some issues, albeit minor ones.

Sure. Incremental stuff stinks.

Keep track of an anchor point, and compute the current position as a function of how many times the key has been pressed times a fixed step distance.

Get it? Rather than try to add zillions of small quantities together, compute the final value in one step. The FPU gets about the same workout, so you are not costing yourself anything but gaining a more accurate result.
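A minimal sketch of that idea, applied to the 0.1-pixels-per-keypress example (the names are illustrative):

// rather than doing position += 0.1 on every key press, count the
// presses and compute the position from the anchor in one step
struct Mover
{
    double anchor = 0.0;       // starting position
    double step = 0.1;         // distance per key press
    long long presses = 0;     // net key-press count

    void key_pressed() { ++presses; }
    double position() const { return anchor + step * presses; }  // one multiply, one add
};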
opisop wrote:
I have used epsilon values to test for equality within a range, but sometimes even that gets off.

One problem is that rounding error is a relative measure - that is, it typically increases with the magnitude of the value. As a result, attempts to use an absolute epsilon are usually problematic.

To illustrate, double precision floating point numbers in [2^52, 2^53) can only represent integers. And numbers in [2^53, 2^54) can only represent even integers. It doesn't make any sense to test whether values of those magnitudes are within a tenth of each other, for instance: the epsilon needs to be big enough for the values involved.
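To see that granularity directly:

#include <iostream>

int main()
{
    double big = 9007199254740992.0;          // 2^53
    std::cout << (big + 1.0 == big) << '\n';  // prints 1: the +1 is rounded away
}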

Typically, for normal numbers, this means that the epsilon needs to be a relative value expressed in terms of units in the last place, or ULPs. Goldberg's paper explains all this.
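A common form of such a comparison is a relative tolerance, e.g. a sketch like:

#include <cmath>
#include <algorithm>

// the allowed difference scales with the magnitude of the operands
bool almost_equal(double a, double b, double rel_tol = 1e-9)
{
    return std::fabs(a - b) <= rel_tol * std::max(std::fabs(a), std::fabs(b));
}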