### The kind of bugs I deal with

Howdy everybody,

As I've been contributing on this website since 2008 I thought I'd post the kind of bugs I work with.

My work is primarily in high performance multi-platform scientific development. I design and develop population/abundance/biomass models.

With our preference movement method, each cell in the NxM model has a preference for its population moving to cell N'xM'. In this particular model the number of preference calculations was 300 million+, so I decided to cache them whenever the model could determine that all of the supporting values would be static for the iteration.

After caching them I had a minor change in the results coming from the model. So I put in a bunch of checks to compare the real-time values to the cached values, with an acceptable difference of 1e-12. All 300 million+ checks passed with no issues, but the results were still different... strange.

So I put in a straight double == double comparison and got about 1,000 mismatches out of the 300 million+. Not too bad, but the printed values were identical, so I knew the variance was tiny (less than 1e-12). Unfortunately, even with this tiny variance the model output changed quite a bit.

After generating txt debug files, some ~500 MB each that were not comparable, and some around 50 MB each with only ~5-10 values differing between them, I was very confused.

Looking through my calculations line by line, they were perfect. So after sleeping on it and a cup of coffee, I figured it out; some quick code changes to verify, and solved.

The problem:

```cpp
// Bad Code
void buildCache()
{
    vector<vector<double>> runningTotals;
    for (int i = 0; i < world_height; ++i) {
        for (int j = 0; j < world_width; ++j) {
            runningTotal = 0.0;
            for (int k = 0; k < cell_height; ++k) {
                for (int l = 0; l < cell_width; ++l) {
                    cache[i][j][k][l] = cell_value;
                    runningTotal += cell_value;
                }
            }
            runningTotals[i][j] = runningTotal; // This value was off by <1e-12
        }
    }
}

// Good Code
void buildCache()
{
    vector<vector<double>> runningTotals;
    for (int i = world_height - 1; i >= 0; --i) {
        for (int j = world_width - 1; j >= 0; --j) {
            runningTotal = 0.0;
            for (int k = cell_height - 1; k >= 0; --k) {
                for (int l = cell_width - 1; l >= 0; --l) {
                    cache[i][j][k][l] = cell_value;
                    runningTotal += cell_value;
                }
            }
            runningTotals[i][j] = runningTotal; // This value was off by <1e-12
        }
    }
}
```

Now, considering that
- The code works perfectly
- This code is only illustrative, but it does show the problem
- Everything works, so it's not syntax (e.g. I haven't missed an allocation)

Who thinks they know what the cause was? I'll give you guys a chance to figure out what it was before I post the actual cause.

FYI: Using the code I've posted you should be able to replicate the issue.
> FYI: Using the code I've posted you should be able to replicate the issue.

Since the code you posted is trying to access nonexistent elements of runningTotals, that doesn't seem likely.
> - This code is only illustrative, but it does show the problem

Please read my points afterwards. The project is something like 20,000 LOC so posting sections of it is not feasible.
Does cell_value change somewhere in the code you didn't show, or is it constant in the loops?

The only thing I can think of right now is that adding doubles with a large difference in exponent can lead to loss of precision. (If you add a tiny number to a huge one, the tiny number can be smaller than the huge one's epsilon; add a billion of them one at a time and the huge number still won't change, but add the tiny ones together first and their sum can even exceed the huge number.)
@MiiNiPaa you pretty much nailed it. The order in which the doubles are added together in the loops is different, so the result is going to be different.

The best accepted way to handle it is to order all of the values from smallest to largest then sum them that way. Unfortunately for us the overhead of that is too great at this point :)
Could you keep two running totals? One for small values, one for large, then add them together before putting them in the runningTotals array?
How does the order matter? Addition is commutative.
chrisname: A huge floating point value can have an epsilon (distance between two representable values) larger than a tiny value itself. So HUGE + TINY == HUGE.
But if you add two tiny values together first, their sum can exceed the huge value's epsilon. In other words:
(HUGE + TINY) + TINY -> HUGE + TINY == HUGE
HUGE + (TINY + TINY) -> HUGE + SMALL > HUGE

welcome to the floating point hell

Edit: https://en.wikipedia.org/wiki/Floating_point#Accuracy_problems
> While floating-point addition and multiplication are both commutative (a + b = b + a and a × b = b × a), they are not necessarily associative. That is, (a + b) + c is not necessarily equal to a + (b + c).
There is an example of it there.
Oh, I didn't know that. I almost never need to use floats, and when I do, it's with small values without a need of much precision.
EDIT: People posted while I was typing, lol.

@chrisname: larger floating point numbers have bigger 'jumps' between each possible value. Adding small numbers to them will not actually change their value because it is not enough to make the jump. If you add all the small numbers first, they become larger and can therefore make the jump for the bigger numbers.

Floating point math sucks :p
> The order that the doubles are added together in the loops is different. So the result is going to be different.

Unfortunately, that wasn't illustrated by your illustrative code, since all the values were explicitly set to the same value. runningTotal would've been equally wrong with both versions of the code.

 How does the order matter? Addition is commutative.

It has more to do with precision and which digits are the most significant than it does with the qualities of addition.

When you're adding very small values to very large ones, the precision of the types may mean that the very small value has no effect on the larger one. Adding the smaller values first ensures that the cumulative effect of the small values has a chance to impact the larger value.

I would think Lachlan Easton's proposed solution would work reasonably well without the overhead of a sort.

[Edit: Holy ninjas, Batman!]
Addition is associative, but in computer science an add operation may involve additional operations which are not associative, such as range-limiting and rounding. For floating-point numbers, a lot of operations (including addition of a large and a small number) have results which can not be accurately represented so another operation is inserted to get a number which can be represented.

For example, if we have six digits of precision:

A) 100000. + 123.456 - 100000.
= 100123. - 100000. (rounding)
= 123. (subtraction)
= 123.000 (rounding)

B) 100000. - 100000. + 123.456
= 000.000 + 123.456 (subtraction)
= 123.456 (addition)

Internally, most floating-point implementations would treat 100000. + 123.456 as 100000.0 + 000123.x where x is an indication that our right operand is really a value that is greater than 000123.0 but less than 000123.5 (x is typically 3 bits called the round, guard, and sticky bits).

For extra fun, there are several processors out there with a multiply-and-add instruction which gives different results from using separate multiply and add instructions, because the multiply-and-add instruction doesn't perform the intermediate rounding step between the multiply and the add.
What about arbitrary-precision arithmetic libraries like GMP?
If sorting primitive types is too slow, I hardly think arbitrary-precision is acceptable. But I am sure it would be fully precise.
I realise it'd be too slow, but do the same errors apply to it?
If the implementation is actual arbitrary precision, I'd guess no. It would automatically adjust to be as precise as needed. It certainly works that way with Java's built-in arbitrary precision classes (java.math.*)
> The best accepted way to handle it is to order all of the values from smallest to largest then sum them that way.

Ordering the values by magnitude would not solve the problem of non-associativity of floating point operations. Compensated summation can minimize the error. (It works because the IS prohibits optimizers from reordering floating-point expressions based on pure mathematical associativity rules.)

```cpp
#include <iostream>

float naive_sum( const float* arr, int n )
{
    float sum = n ? arr[0] : 0.0 ;
    for( int i = 1 ; i < n ; ++i ) sum += arr[i] ;
    return sum;
}

float kahan_sum( const float* arr, int n )
{
    double sum = n ? arr[0] : 0.0 ;
    double correction = 0.0 ;

    for( int i = 1 ; i < n ; ++i )
    {
        double corrected_val = arr[i] - correction ;
        double new_sum = sum + corrected_val ;
        correction = (new_sum - sum) - corrected_val ;
        sum = new_sum;
    }

    return sum;
}

int main()
{
    const float arr[5] = { -1e10, -11.6, -10.0, +10.0, +1e10 } ;
    std::cout << "naive summation: " << naive_sum( arr, 5 ) << '\n'
              << "compensated summation: " << kahan_sum( arr, 5 ) << '\n' ;
}
```

http://ideone.com/Hm9vqW

Compensated summation is quite a bit more expensive than naive summation; so you might want to use pairwise summation instead (almost as fast as naive summation, but has a higher round-off error than compensated summation).

> If the implementation is actual arbitrary precision, I'd guess no. It would automatically adjust to be as precise as needed.

Yes. But it would probably run out of memory quite early, and even if it didn't, one would grow old waiting for 300 million+ calculations to complete.

Ps. Why is this programming topic in the lounge? Though I see that several programmers have posted (LB, cire, MiiNiPaa), many other programmers would miss it altogether.
It's in the lounge because it's more general than just C++ - many languages have caveats with floating point types and the concept is not restricted to programming (think sig-figs).

Also, I think some exotic bignum libraries keep everything in exact form until they are required to approximate. A good example is the TI-89, which can perform all calculations in exact form and let you choose when to approximate a floating point number.

JLBorges wrote:
```cpp
float naive_sum( const float* arr, int n )
{
    float sum = n ? arr[0] : 0.0 ;
    for( int i = 1 ; i < n ; ++i ) sum += arr[i] ;
    return sum;
}
```
Why do you pre-include the first number? Is it actually more efficient to check and start with the number than to start with 0.0 and add? Now that I think about it, it makes sense that a binary branch would be much faster than adding to 0.0
> The best accepted way to handle it is to order all of the values from smallest to largest then sum them that way. Unfortunately for us the overhead of that is too great at this point :)

*Note: I never work in float, only in double because float doesn't offer enough precision*

My background for this statement is actual scientific implementation in models. By starting with the smallest values and summing upwards, you end up with the lowest possible variance in the summed value. That you have variance at all is widely accepted; you just try to minimise it if it becomes a problem.

The 300 million calculations I gave resulted from only 1 of the processes doing 214 years' worth of work. This model has ~6 processes and needs to do 1 million iterations (1 million x 214). The number of calculations is HUGE.
> Ps. Why is this programming topic in the lounge? Though I see that several programmers have posted (LB, cire, MiiNiPaa), many other programmers would miss it altogether.

It's in the lounge because I'm not looking for help. Many developers know that floating point arithmetic is imprecise, but few ever get to see its effects in real life. 99% of applications developed these days would function perfectly well despite the inaccuracies.

Only a few specific fields actually rely on such high levels of precision (the majority of which are scientific or high frequency trading).