Floating point underflow

I have written my code in C, hope that's ok.

The purpose of this code is to simulate floating point underflow.

My understanding of FLT_MIN is that it is a positive number that has the smallest possible exponent and also the smallest value that still uses all the bits available to represent the mantissa. If I then divide this number by 100, it should shift the digits in the mantissa. Instead it reduces the exponent by 2. I would have thought it impossible to have a float number with an exponent of -40, yet you can see by the result that this is what has happened.

#include <stdio.h>
#include <float.h>

int main(void)
{
	printf("\nFloating-point underflow:\n");
	float x = FLT_MIN;
	float y = x / 100;
	printf("%e %e\n", x, y);
	
	return 0;
}


result:
Floating-point underflow:
1.175494e-38 1.175493e-40
FLT_MIN is (according to http://en.cppreference.com/w/c/types/limits ) the minimum normalized positive value of float.

If you think about the normalized representation for a bit, you'll notice that it's actually impossible to represent zero. That implies there's another representation used for representing zeros and the numbers around 0. Such values are called denormal (or subnormal, per IEEE754) numbers.
https://en.wikipedia.org/wiki/Denormal_number

You will begin to lose precision as the denormal value gets smaller.

Maybe a better starting place for your experiment would be FLT_TRUE_MIN.
The best I can say is that you're risking the loss of precision if underflow occurs. Sometimes the computer is able to deal with underflow in a corrective manner with simple operations (like dividing then multiplying once by the same number), but with added complexity it will eventually fail to get the exact precision.

Underflow is not undefined behavior, by the way. On IEEE 754 hardware the result is well defined (a subnormal number or zero), although some platforms and compiler options flush subnormals to zero instead, so results can differ between machines. Generally, when underflow is hit on a double the value is so small that it doesn't matter even to NASA, but it has to be kept in mind that a number that has been divided and then multiplied back may not pass an equality test, and a similarity test may have to be devised.

(The built-in == on float/double in C and C++ is an exact comparison, so the similarity test you'd have to devise is one that accepts two values as equal when they lie within some tolerance of each other.)


Wikipedia:
Handling of underflow

The occurrence of an underflow may set a ('sticky') status bit, raise an exception, at the hardware level generate an interrupt, or may cause some combination of these effects.

As specified in IEEE 754 the underflow condition is only signaled if there is also a loss of precision. Typically this is determined as the final result being inexact. However if the user is trapping on underflow, this may happen regardless of consideration for loss of precision. The default handling in IEEE 754 for underflow (as well as other exceptions) is to record as a floating point status that underflow has occurred. This is specified for the application programming level, but often also interpreted as how to handle it at the hardware level.



EDIT: I deleted my previous code because it didn't handle the reconstitution correctly (I didn't realize I was dealing with "n factorial"). Here's the updated example:

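#include <float.h>
#include <limits>
#include <iomanip>
#include <iostream>
#include <math.h>	// M_PI is a POSIX extension, not standard C++

using namespace std;

int main(void)
{
	// Printed with the default precision (6 significant digits).
	cout << "Pi = " << M_PI << endl;
	// From here on, print enough digits to tell any two doubles apart.
	cout.precision( (numeric_limits<double>::max_digits10));

	double i = M_PI;	// holds pi / n!
	double divisor = 1.0;	// holds n!
	double count = 1.0;	// holds n
	while(count < 30.0)
	{
		// Multiply back by n! and check whether pi is recovered exactly.
		if(i * divisor == M_PI)
		{
			cout << i << " ) " << i * divisor << " still equals Pi" << endl;
		}
		else
		{
			cout << i << " ) " << i * divisor << " is not Pi" << endl;
		}
		// Move on to the next n: update n! and pi / n!.
		count ++;
		divisor *= count;
		i /= count;
	}
	
	return 0;
}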


Output on my computer:
Pi = 3.14159
3.1415926535897931 ) 3.1415926535897931 still equals Pi
1.5707963267948966 ) 3.1415926535897931 still equals Pi
0.52359877559829882 ) 3.1415926535897931 still equals Pi
0.1308996938995747 ) 3.1415926535897931 still equals Pi
0.026179938779914941 ) 3.1415926535897931 still equals Pi
0.0043633231299858239 ) 3.1415926535897931 still equals Pi
0.0006233318757122606 ) 3.1415926535897936 is not Pi
7.7916484464032575e-05 ) 3.1415926535897936 is not Pi
8.6573871626702861e-06 ) 3.1415926535897936 is not Pi
8.6573871626702859e-07 ) 3.1415926535897936 is not Pi
7.8703519660638957e-08 ) 3.1415926535897931 still equals Pi
6.5586266383865797e-09 ) 3.1415926535897931 still equals Pi
5.0450974141435232e-10 ) 3.1415926535897931 still equals Pi
3.6036410101025166e-11 ) 3.1415926535897931 still equals Pi
2.4024273400683445e-12 ) 3.1415926535897936 is not Pi
1.5015170875427153e-13 ) 3.1415926535897936 is not Pi
8.8324534561336195e-15 ) 3.1415926535897936 is not Pi
4.9069185867408994e-16 ) 3.1415926535897931 still equals Pi
2.5825887298636311e-17 ) 3.1415926535897931 still equals Pi
1.2912943649318155e-18 ) 3.1415926535897931 still equals Pi
6.1490207853895981e-20 ) 3.1415926535897931 still equals Pi
2.7950094479043629e-21 ) 3.1415926535897931 still equals Pi
1.2152214990888533e-22 ) 3.1415926535897931 still equals Pi
5.0634229128702223e-24 ) 3.1415926535897931 still equals Pi
2.0253691651480889e-25 ) 3.1415926535897931 still equals Pi
7.7898814044157269e-27 ) 3.1415926535897931 still equals Pi
2.8851412608947135e-28 ) 3.1415926535897931 still equals Pi
1.0304075931766834e-29 ) 3.1415926535897931 still equals Pi
3.553129631643736e-31 ) 3.1415926535897927 is not Pi


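In this case the loss of precision shows up as an ordinary rounding error. Note that none of these values is anywhere near the underflow range of a double (DBL_MIN is about 2.2e-308).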
hmmm... according to the book I am working on, FLT_MIN is defined as the smallest normal float, which I suppose is the same as saying the minimum normalized positive value of float.

However, my book (C Primer Plus 6th ed by Stephen Prata) also says, and I quote: So dividing the smallest positive normal floating-point value by 2 results in a subnormal value.

This all seems to imply that if I take FLT_MIN and divide by 100, I should get a subnormal value, which Prata defines as a value that has lost the full precision of the type. As far as I can see, that is not what happened.

As far as I can tell, this all means that either my source is wrong or your source is wrong.

p.s. this post is in response to mbozzi
Time to check the horse's mouth, I suppose:

The C standard (late draft) 5.2.4.2.2, paragraph 13 confirms that FLT_MIN is the
minimum normalized positive floating-point number
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf

IEEE754:2008, 2.1.51 defines
subnormal number: In a particular format, a non-zero floating-point number with magnitude less than the magnitude of that format’s smallest normal number. A subnormal number does not use the full precision available to normal numbers of the same format.

(No link, sorry).

Your assumption is that precision isn't lost by the division, but subnormal numbers always have fewer than p significant bits (where p is the precision of the format: 24 bits for binary32, 53 for binary64, and so on).

Wikipedia's wording is a little bit misleading: you're not assuming that a floating-point exception is raised immediately upon loss of precision, are you?
IEEE754 7.5 indicates that the underflow exception is signaled when the implementation decides:
The underflow exception shall be signaled when a tiny non-zero result is detected. ... The implementer shall choose how tininess is detected ...
I can see that I have some research to do. I will return to this thread in due time. In the meantime I tried your suggestion of using FLT_TRUE_MIN, with the following results:

#include <stdio.h>
#include <float.h>

int main(void)
{
	printf("\nFloating-point underflow:\n");
	float x = FLT_TRUE_MIN;
	float y = x / 2;
	printf("%e %e\n", x, y);
	
	return 0;
}


result:
Floating-point underflow:
1.401298e-45 0.000000e+00


so after dividing FLT_TRUE_MIN by 2 I get 0 (zero) which is not what I have in mind when I think of subnormal values

regards
so after dividing FLT_TRUE_MIN by 2 I get 0 (zero) which is not what I have in mind when I think of subnormal values

What result did you expect?