When using g++ on Windows, how may I disable floating point?

-What are the compiler switches on g++ to disable
floating point overflow and underflow on float and double in c++?

-Is there a source code way to disable floating point overflow and underflow on float and double in c++? Is there a way to do this for one class, and every operation and variable called by your code thereafter?

Could someone reply to me specifically, please? What are the solutions
that won't require a non-default library?
I don't know what exactly you mean by 'disable', but this might help to detect it:

http://www.cplusplus.com/reference/limits/numeric_limits/
closed account (E0p9LyTq)
There is the floating point environment header (fenv.h/cfenv).

http://en.cppreference.com/w/cpp/numeric/fenv
double a = 0.1;
double b = 0.1;
double x = a*b;
cout << x << endl;


The result of this should be exactly 0.01; however, due to floating point, it isn't.
In order to enforce accuracy mode, and have these lines of code produce exactly 0.01,
I can either introduce a compiler switch or a # statement at the top of the class.


-If I am using 64 bit g++ compiling on windows, what are the compiler switches
that will set my floating point mathematics to the intuitively expected result,
with floating point underflow and overflow eliminated?

-Is it possible to do the same thing without switches? How can that be achieved
inside the source code?

-Can someone reply to me with exact, specific examples please?
-Is there anyone who checks this forum able to give me an answer in terms of g++ on windows please?

-What are the compiler switches that do this?

-What are the # commands in c++ source code which do this?
I can either introduce a compiler switch or a # statement at the top of the class.

What compiler, switch, and statement?
The compiler is GNU G++. The switch, or # statement or otherwise, I do not yet quite know. What could one do?
Negative powers of ten are not exactly representable by float or double, under any circumstances. There exists no sum of finitely-many powers of two that gives a negative power of ten.

If you need exact representation of values such as 0.01 you should use a decimal floating point type or a different fractional representation, such as rationals.
You can't do it (as has already been pointed out).

You are probably aware that many numbers cannot be written down in base 10 in a finite number of digits, e.g.
1/3 = 0.333... = 0.{3}...
1/27 = 0.037037037... = 0.{037}...
with the curly brackets delimiting the repeating segment. (Irrationals, like sqrt(2) and pi, don't even repeat.)

Similarly, most numbers can not be written down in the computer's preferred base, 2, in a finite number of terms. You can, however, work out their binary representation by repeatedly multiplying by 2 and taking the integer part as the next binary digit. Thus:
Decimal      Binary
0.01
0.02         0.0
0.04         0.00
0.08         0.000
0.16         0.0000
0.32         0.00000
0.64         0.000000
1.28(0.28)   0.0000001
0.56         0.00000010
1.12(0.12)   0.000000101
0.24         0.0000001010
0.48         0.00000010100
0.96         0.000000101000
1.92(0.92)   0.0000001010001
1.84(0.84)   0.00000010100011
1.68(0.68)   0.000000101000111
1.36(0.36)   0.0000001010001111
0.72         0.00000010100011110
1.44(0.44)   0.000000101000111101
0.88         0.0000001010001111010
1.76(0.76)   0.00000010100011110101
1.52(0.52)   0.000000101000111101011
1.04(0.04)   0.0000001010001111010111
0.08         0.00000010100011110101110       should repeat from here
                 ^                   ^                        .
                 |                   |                        .
            start of repeat      next repeat                  .

Somebody please check my maths!
Give or take my probable maths errors, 0.01 in binary is
0.00{00001010001111010111}....
with the bit in curly braces REPEATING.

Therefore, the computer cannot give you an exact answer.


So you could:
(1) Write your own "base 10" class, defining all the relevant algebraic operators;
(2) Write your own "fraction" class, in which {1;10} * {1;10} gives {1;100}
(3) Live with it, and write cout << setprecision(2) when required.

I'd be inclined to do (3).


The compiler is GNU G++. The switch, or # statement or otherwise, I do not yet quite know. What could one do?

Sorry. I thought you were implying that you have a compiler on Windows that is not GCC, and that it does have a "switch or statement" which miraculously generates the non-C++ exact floats.
So, what do these switches do to the example source code?

-mpc32, -mpc64, -mfpmath=sse, or -msse2


double a = 0.1;
double b = 0.1;
double x = a*b;
std::cout << x << std::endl;


-Are there any other compiler switches?

-Are there any source code statements I can use, without any special library?


So, what do these switches do to the example source code?

-mpc32, -mpc64, -mfpmath=sse, or -msse2
They hint the compiler to try to use SSE or SSE2, which may increase the precision of floating point operations.
However, like I said, if you're looking for exact values, it's mathematically impossible with float or double. Just because 0.01 gets printed doesn't mean that the variable holds that exact value.
That much I know. What I mean is 'accurate within the range of the type', i.e. 'accurate between the minimum and maximum limit'.

-Are there any other compiler switches?

-Are there any source code statements I can use, without any special library?
GCC has a manual:
man g++

also:
g++ --help



Are there any source code statements

Depends what you ask.

If you want to store and use exact values in memory, then the answer is:
it's mathematically impossible with float or double.


If you want to output values rounded to some precision:
http://www.cplusplus.com/reference/iomanip/setprecision/
I have been going through the man pages and help pages, with confusion and limited success.

If such is impossible, then exactly what do these switches do to code?

-mpc32, -mpc64, -mfpmath=sse, or -msse2

double a = 0.1;
double b = 0.1;
double x = a*b;
std::cout << x << std::endl;
From man page:
-mpc32 -mpc64 -mpc80
Set 80387 floating-point precision to 32, 64 or 80 bits. When -mpc32 is specified, the significands of results of floating-point operations are rounded to 24 bits (single precision); -mpc64 rounds the significands of results of floating-point operations to 53 bits (double precision) and -mpc80 rounds the significands of results of floating-point operations to 64 bits (extended double precision), which is the default. When this option is used, floating-point operations in higher precisions are not available to the programmer without setting the FPU control word explicitly.

Setting the rounding of floating-point operations to less than the default 80 bits can speed some programs by 2% or more. Note that some mathematical libraries assume that extended-precision (80-bit) floating-point operations are enabled by default; routines in such libraries could suffer significant loss of accuracy, typically through so-called "catastrophic cancellation", when this option is used to set the precision to less than extended precision.

In other words, -mpc32 or -mpc64 reduces accuracy.

-msse2
This switch enables the use of instructions in the SSE2 extended instruction set.

GCC depresses SSEx instructions when -mavx is used. Instead, it generates new AVX instructions or AVX equivalence for all SSEx instructions when needed.

These options enable GCC to use these extended instructions in generated code, even without -mfpmath=sse.


To generate SSE/SSE2 instructions automatically from floating-point code (as opposed to 387 instructions), see -mfpmath=sse.


-mfpmath=unit
Generate floating-point arithmetic for selected unit unit. The choices for unit are:

387
Use the standard 387 floating-point coprocessor present on the majority of chips and emulated otherwise. Code compiled with this option runs almost everywhere. The temporary results are computed in 80-bit precision instead of the precision specified by the type, resulting in slightly different results compared to most of other chips. See -ffloat-store for more detailed description.

This is the default choice for i386 compiler.

sse
Use scalar floating-point instructions present in the SSE instruction set. This instruction set is supported by Pentium III and newer chips, and in the AMD line by Athlon-4, Athlon XP and Athlon MP chips. The earlier version of the SSE instruction set supports only single-precision arithmetic, thus the double and extended-precision arithmetic are still done using 387. A later version, present only in Pentium 4 and AMD x86-64 chips, supports double-precision arithmetic too.

For the i386 compiler, you must use -march=cpu-type, -msse or -msse2 switches to enable SSE extensions and make this option effective. For the x86-64 compiler, these extensions are enabled by default.

The resulting code should be considerably faster in the majority of cases and avoid the numerical instability problems of 387 code, but may break some existing code that expects temporaries to be 80 bits.

This is the default choice for the x86-64 compiler.

sse,387 sse+387 both
Attempt to utilize both instruction sets at once. This effectively doubles the amount of available registers, and on chips with separate execution units for 387 and SSE the execution resources too. Use this option with care, as it is still experimental, because the GCC register allocator does not model separate functional units well, resulting in unstable performance.

https://en.wikipedia.org/wiki/SSE2#Differences_between_x87_FPU_and_SSE2

-mpc* and -mfpmath=sse are thus mutually exclusive; you either compute with the x87 FPU or with SSE*.

That does not really affect
double x = a*b;
Although the a*b does use 80-bit registers in the x87 FPU (vs 64-bit registers in SSE), the temporary value is probably transferred and truncated to double in RAM.
Neither does anything for binary-coded decimal / decimal floating point.

There are libraries for it, though:
https://www.google.com/search?q=gcc+%22decimal+floating+point%22
What will -msse2 do?
I guess the big question is why? A double has 15 or 16 significant figures, which is normally good enough for most people. With numbers in the range of 1 million there are still 9 or 10 decimal places. What is your particular concern over the non-exactness of a double?

If you std::cout the value with the normal 6-digit output, do you get the expected answer?

g++ has a non-standard decimal library that does exact decimals. One just #includes the <decimal/decimal> header.

One can still use doubles, but the usual question is: is my variable close enough to a certain value, i.e. within some precision? This is easy to test with a simple if statement:

#include <cmath>
#include <iostream>

int main()
{
   constexpr double Precision = 1e-6;
   double a{0.1};
   double b{0.1};

   double c = a * b;

   if (std::fabs(c - 0.01) < Precision) {
      std::cout << "Equal within Precision\n";
   }
   else {
      std::cout << "NOT Equal within Precision\n";
   }
}


I don't know what OP's trying to do, but binary floating point is unusable for calculations dealing with money.