double vs long double in calculations

Greetings,

I would like to run my physical simulation with higher precision, so as a first try I intend to change my double variables to long double.
To save computing time, I wondered whether it made sense to store only the absolutely necessary values as long double.

But then a question occurred to me: what if I accidentally miss a double that should have been converted, and the program does a calculation that mixes a double with long doubles?
Will the double be promoted to a long double automatically, or will it be the other way around?


Best,
PhysicsIsFun
If one operand of a binary arithmetic operator is long double,
the other operand is implicitly converted to long double
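
A minimal sketch of that rule in action (the exact digits printed depend on the platform):

#include <iostream>
#include <iomanip>

int main()
{
    double d = 0.1;         // rounded to the nearest double
    long double ld = 0.1L;  // rounded to the nearest long double

    // d is implicitly converted to long double before the addition,
    // so the whole expression has type long double.
    long double sum = d + ld;

    std::cout << std::setprecision(20) << sum << '\n';
}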
Hello PhysicsIsFun,

As per the table below:

type               lowest()               min()                  max()

char                         -128                  -128                   127 and is 1 byte
uchar                           0                     0                   255 and is 1 byte
int                   -2147483648           -2147483648            2147483647 and is 4 bytes
uint                            0                     0            4294967295 and is 4 bytes
long         -9223372036854775808  -9223372036854775808   9223372036854775807 and is 8 bytes
ulong                           0                     0  18446744073709551615 and is 8 bytes
float               -3.402823e+38          1.175494e-38          3.402823e+38 and is 4 bytes
double             -1.797693e+308         2.225074e-308         1.797693e+308 and is 8 bytes
long double        -1.797693e+308         2.225074e-308         1.797693e+308 and is 8 bytes


double and long double are the same size and hold the same values.

On this compiler, the precision of a double or long double is about 15 decimal digits, compared to about 7 digits for a float.

For your other question:
When you mix an "int" and a "double" the "int" will be promoted to the greater before a calculation. So my thinking is that a "double" would be promoted to a 'long double", not that there is any difference in the numbers that they hold. Actually I have never used anything like that, so I am not 100% sure, but the logic seems sound.

Someone else may have a better explanation.

Hope that helps.

Andy
Handy Andy wrote:
double -1.797693e+308 2.225074e-308 1.797693e+308 and is 8 bytes
long double -1.797693e+308 2.225074e-308 1.797693e+308 and is 8 bytes

Here's the thing,
https://en.wikipedia.org/wiki/Long_double

long double must be at least as large as double, but can be larger.



#include <iostream>
#include <limits>
#include <string>

template <typename T>
void print_limits(const std::string& type_str)
{
    using std::cout;
    cout << std::boolalpha;
    cout << "Minimum value for " + type_str + ": " << std::numeric_limits<T>::min() << '\n';
    cout << "Maximum value for " + type_str + ": " << std::numeric_limits<T>::max() << '\n';
    cout << type_str + " is signed: " << std::numeric_limits<T>::is_signed << '\n';
    cout << "Non-sign bits in " + type_str + ": " << std::numeric_limits<T>::digits << '\n';
    cout << type_str + " has infinity: " << std::numeric_limits<T>::has_infinity << '\n';
    cout << '\n';
}

int main()
{
    print_limits<double>("double");
    print_limits<long double>("long double");
}


On my machine (w/ MinGW), this prints

Minimum value for double: 2.22507e-308
Maximum value for double: 1.79769e+308
double is signed: true
Non-sign bits in double: 53
double has infinity: true

Minimum value for long double: 3.3621e-4932
Maximum value for long double: 1.18973e+4932
long double is signed: true
Non-sign bits in long double: 64
long double has infinity: true
Thank you guys!

On my system, long double occupies 128 bits whereas double occupies 64 bits.
If that is true quadruple precision (https://en.wikipedia.org/wiki/Quadruple-precision_floating-point_format), the number of significant digits is at least 33, more than twice that of a typical double. (Note, though, that on x86 a 16-byte long double is usually the 80-bit extended format padded out to 16 bytes, which gives about 18 significant digits.)
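
One way to check what the compiler actually provides (a sketch; digits10 is the number of decimal digits guaranteed to survive a round trip through the type):

#include <iostream>
#include <limits>

int main()
{
    std::cout << "double:      " << std::numeric_limits<double>::digits10
              << " digits, " << sizeof(double) << " bytes\n";
    std::cout << "long double: " << std::numeric_limits<long double>::digits10
              << " digits, " << sizeof(long double) << " bytes\n";
}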


So if I write
double a = 0.6; // cannot be represented perfectly as a binary number
long double b = 0.4;
long double result = a + b;


I assume that the result will be less accurate than if I had also declared variable a as a long double, right?
So when it comes to accuracy, I had better declare all the participating variables as long double from the get-go.
Right, 0.6 will be stored as an inexact double, and then converted to a long double in a+b, but the information has already been lost and can't be regained.

Note that 0.6 as a literal is of type double.
You need to do 0.6L and 0.4L to properly write a long double literal.
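
You can see the lost information directly (a sketch; the comparison is typically false on implementations where long double is wider than double):

#include <iostream>

int main()
{
    long double from_double = 0.6;  // rounded to double first, then widened
    long double exact       = 0.6L; // rounded once, directly to long double

    std::cout << std::boolalpha << (from_double == exact) << '\n';
}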
@Ganado,

long double must be at least as large as double, but can be larger.


I realize that and did read it somewhere in a table. Thank you for the input; I will adjust my program accordingly.

On my Windoze computer using VS2017, the output above is what I received from these lines of code:
// CWIDTH and WIDTH are column-width constants defined earlier in the program.
std::cout << std::left << std::setw(CWIDTH) << "double" << std::right
	<< std::setw(WIDTH) << std::numeric_limits<double>::lowest() << std::right
	<< std::setw(WIDTH) << std::numeric_limits<double>::min()
	<< std::setw(WIDTH) << std::numeric_limits<double>::max() << " and is " << sizeof(double) << " bytes\n";

std::cout << std::left << std::setw(CWIDTH) << "long double" << std::right
	<< std::setw(WIDTH) << std::numeric_limits<long double>::lowest() << std::right
	<< std::setw(WIDTH) << std::numeric_limits<long double>::min()
	<< std::setw(WIDTH) << std::numeric_limits<long double>::max() << " and is " << sizeof(long double) << " bytes\n"; // was sizeof(double), a copy-paste bug (harmless on MSVC, where both are 8 bytes)

I would take that to mean my "limits" header file is different from yours.

I will have to try the program with my MinGW compiler.

Andy
There can be differences in the size of a long double between compilers, and depending on whether the app is compiled as 32-bit or 64-bit.

MS VS has a long double at 8 bytes no matter what, the same as a double (8 bytes).

TDM-GCC has a long double at 12 bytes for 32-bit code, and 16 bytes for 64-bit.
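
If a program depends on long double actually being wider than double, that assumption can be made explicit at compile time (a sketch, not from this thread):

#include <limits>

// Fail compilation if long double offers no extra precision over double
// (as is the case with MSVC, where both are 8 bytes).
static_assert(std::numeric_limits<long double>::digits
                  > std::numeric_limits<double>::digits,
              "long double is no more precise than double on this compiler");

int main() {}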
@Ganado

Do I even have to include the L in 0.4L even though I declared the variable as long double?
#include <iostream>
#include <iomanip>

int main()
{
    const long double a = 0.4 ;
    const long double b = 0.4L ;

    std::cout << std::fixed << std::setprecision(20)
              << "a == " << a << '\n'
              << "b == " << b << '\n' ;
}

http://coliru.stacked-crooked.com/a/9b1aefc153430a59
@PhysicsIsFun
Why do you need enhanced precision for your physical simulation? Did you consider a calculation error similar to the one discussed here: https://math.stackexchange.com/questions/1191072/how-to-calculate-the-errors-of-single-and-double-precision ? Is your simulation based on measured values? How do measurement errors propagate to the final result, compared to the errors introduced by single-precision computing? (I assume your physical simulation is not just acceleration by the force of gravity.)
@JLBorges
thank you!

@MikeStgt
I am simulating a deterministic-chaotic system with hard walls, meaning the smallest rounding errors can lead to drastically different results.


So another question:
If I want to pass an input parameter, a number, to my main() when calling the program in the terminal, do I also have to type the "L" in the terminal?
Like main.exe 0.5L?
And is it legitimate to use the 'stold()' function to turn this input parameter into a long double?
long double a = stold(argv[1]); // argv[0] is the program name, argv[1] the first argument

@PhysicsIsFun
I am simulating a deterministic-chaotic system with hard walls, meaning the smallest rounding errors can lead to drastically different results.
My first thought was Brownian motion -- https://en.wikipedia.org/wiki/Brownian_motion -- but I am not sure whether that is deterministic-chaotic the way a pseudo-random number generator is. If by hard walls you mean 100% elastic and a closed system, then you will observe chaos; computing it with doubled precision, you will observe a drastically different chaos.
(In case you simulate an open system, you could find differing dissipative structures. Did you ever compare Mandelbrot sets computed with single vs. double/enhanced precision?)

Edit: Example of a dissipative structure -- http://ojensen.ruhosting.nl/grad1.gif
> If I want to pass an input parameter, a number, to my main() when calling the program in the terminal,
> do I also have to type the "L" in the terminal?

No.


> is it legitimate to use the 'stold()' function to turn this input parameter into a long double?

Yes.

#include <iostream>
#include <iomanip>
#include <string>

int main( int argc, char* argv[] )
{
    const long double a = 0.4 ;
    const long double b = 0.4L ;

    std::cout << std::fixed << std::setprecision(20)
              << "a == " << a << '\n'
              << "b == " << b << '\n' ;

    if( argc > 1 )
    {
        std::cout << "argv[1] == " << std::quoted( argv[1] ) << '\n' ;

        try
        {
            const long double c = std::stold( argv[1] ) ;
            std::cout << "c == " << c << '\n' ;
        }
        catch( const std::exception& )
        {
            std::cout << "argv[1] is not a floating point number\n" ;
        }
    }
}

http://coliru.stacked-crooked.com/a/5ed122f03f3876fb
@MikeStgt
I really have hard walls. It is not a trivial system. We are observing some strange deviations from the theoretical predictions, and we want to test whether this has to do with a lack of accuracy. Thus, for a first test, I want to see what happens when using long doubles.
Are you a physicist? :)

@JLBorges
Thank you again!
Another question:

long double b = 2.3L;
long double a1 = 2 * b;  // integer literal, converted to long double
long double a2 = 2L * b; // long literal, also converted to long double


There is no difference between the two, since '2' is an integer and can therefore be converted to a floating-point type without loss of accuracy, right?

Is there a rule of good practice, i.e. should one write the 'L' there? Does it help the compiler in some way?
2L is a long.
2.0L (or just 2.L if you prefer) is a long double.

I'm not sure what the recommended practice is, but personally I try to use the correct literal type when working with floating-point numbers, to avoid unnecessary conversions and loss of precision. If you later want to change the multiplication factor to 2.1, there is probably more chance of making a mistake and forgetting to add the L if it was previously written as an integer literal.
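
A sketch of that pitfall (the factor 2.1 is just an example):

#include <iostream>

int main()
{
    long double b = 2.3L;

    long double x = 2 * b;    // fine: 2 converts to long double exactly

    // Change the factor and forget the suffix, and 2.1 is rounded
    // to double before the multiplication ever happens:
    long double y = 2.1 * b;  // should be 2.1L * b

    std::cout << x << ' ' << y << '\n';
}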
@PhysicsIsFun
Are you a physicist?

Almost; I did my diploma in the central research department of a pharmaceutical company, but have made my living with computing. What they do at CERN is far outside my sphere, but I am still interested in "everyday science". When your experiment shows deviations from the theoretical predictions, it may be that the 'closed system' is not as closed as in theory. You could try to compensate with appropriate measures (trace heating, ...), but that is just an idea, a quite theoretical one ;)

I mentioned the Mandelbrot set because there is (was?) a rumour that its shape is the result of deviations at every iteration, the sum of many small errors. You have to magnify very strongly before your system's accuracy has a significant effect, or for "normal" views you have to perturb the iteration results quite strongly to cause abnormalities. The aforementioned discussion, "How to calculate the errors of single and double precision", also shows that with 10^3 iterations there is almost no difference, whereas 10^5 iterations show a distinct effect.

Back to the subject: FORTRAN compilers have an option, autodbl, that promotes all variables to their enhanced-precision counterparts without any change to the source code. If there is a similar option for C++, you could very quickly check the effect of doubling the variables' size.
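
As far as I know there is no standard C++ equivalent of autodbl, but a common workaround (a sketch, not from this thread) is to route the choice of precision through a single type alias:

#include <iostream>

// Change this one line to switch the precision of the whole simulation.
using real = long double;   // or: using real = double;

int main()
{
    real a = 0.6L;  // keep the L suffix so the literal never loses precision first
    real b = 0.4L;
    std::cout << a + b << '\n';
}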
Thanks for the hint, I forgot about 'long'.

So you would write 2.L then. Your point makes sense!
some strange deviations from the theoretical predictions
I am curious whether the enhanced accuracy has a significant effect on the results. If you don't mind, please post the outcome (of this facet only) of your trial.