double vs long double in calculations

I met a chap recently who compiles unoptimized in order to keep the numerical output of his binaries comparable across platforms. Apparently GCC's optimizations do affect numerical results.
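Optimization flags such as -ffast-math or -ffp-contract=fast can, for example, contract a*b + c into a single fused multiply-add, which rounds once instead of twice. A minimal sketch of the size of that kind of difference (my own illustration, not the chap's code; compile with -ffp-contract=off to be sure the first expression stays unfused):

#include <cmath>
#include <cstdio>

int main()
{
    const double a = 1.0 + std::ldexp( 1.0, -30 ) ;    // 1 + 2^-30

    const double unfused = a * a - 1.0 ;                // product rounded to double, then subtracted
    const double fused   = std::fma( a, a, -1.0 ) ;     // multiply and subtract with a single rounding

    std::printf( "unfused: %.20g\nfused:   %.20g\n", unfused, fused ) ;
}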
Hi,

There is the non-standard decimal TR, but it is limited to 64 bits.

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3407.html

https://stackoverflow.com/questions/14096026/c-decimal-data-types#14096071

On my Linux system there is the decimal/decimal.h header file.

PhysicsIsFun wrote:
On my system, long double holds 128 bits whereas double holds 64 bits.
This corresponds to quadruple precision (https://en.wikipedia.org/wiki/Quadruple-precision_floating-point_format). The number of significant digits is thus at least 33, which is more than twice the number of significant digits of a typical double.


But I still wonder why you need all this extra precision. For example, Wikipedia lists the radius of an electron as 2.8179403227(19)e-15 metres; that's 13 sf. With an ordinary double with 15 sf, one has 2 extra digits of precision, or about 1 part in 1e15 of the radius of the electron, i.e. a quadrillionth of the radius of the electron. With a long double of 18 sf, one has 5 extra digits of precision; with your quadruple precision of 33 sf, that's more than twice the number of digits in the original number. And you want more?

It just seems strange that you want to use values so far beyond what any experimental inaccuracies might be. Isn't the normal situation for each value to have an associated error value, and these propagate through the calculations? I am struggling to see how a round-off error at about 1e-48 could affect values at the decimal place of 1e-28.
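For what it's worth, the usual way a tiny round-off error becomes visible at a much coarser decimal place is cancellation: subtract two nearly equal values and the leading digits drop out, leaving the old rounding error as a large fraction of what remains. A small sketch of the mechanism (my own toy numbers, nothing to do with the actual simulation):

#include <cstdio>

int main()
{
    // a double carries about 16 significant decimal digits in total, so after the
    // 9 digits of the integer part only ~7 fractional digits of x survive rounding
    const double x = 100000000.123456789012345 ;
    const double y = 100000000.0 ;

    // the leading digits cancel; the absolute rounding error of x (~1e-8)
    // is now roughly 1e-7 of the result, so about half its digits are noise
    std::printf( "x - y = %.17g   (exact difference: 0.123456789012345)\n", x - y ) ;
}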

Maximum apologies in advance if this is a ridiculous question: I sincerely hope that you know that C/C++ has always been able to write floating-point literals with exponents, as in:

constexpr double re {2.81794032271919e-15 }; // Electron Radius

Again, I apologise for even asking this question, but it's not the first time that someone hasn't understood this.

Edit: Can we see a small part of your code where you define values?
Based on my reading of this thread, it's not about measuring one particular value (like the radius of an electron). It's rather about a chaotic system, where the error adds up over time. A small difference, say 2.81794032271919e-15 vs. 2.81794032271918e-15, can produce a huge difference after 100 more iterations of the system. (You know, the butterfly effect.)

That being said, due to the inherent nature of chaotic systems, the same thing will inevitably happen whether we're dealing with 128-bit or 64-bit floating-point numbers. The simulation will just blow up some number of iterations sooner when we're working with fewer bits. But perhaps just a few more iterations of accuracy is all PhysicsIsFun needs.
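To make that concrete, here is a minimal sketch with the logistic map, a textbook chaotic recurrence (my choice of toy system, not PhysicsIsFun's actual model). The same iteration is run in float and in double from the same seed; the trajectories agree at first, then drift apart, and eventually have nothing in common. More precision would only postpone the divergence:

#include <cstdio>

int main()
{
    // logistic map x -> 4*x*(1-x), fully chaotic for this parameter value
    float  xf = 0.3f ;   // 0.3 is not exactly representable, so the two seeds
    double xd = 0.3 ;    // already differ by about 1e-8: the "small difference"

    for( int n = 1 ; n <= 60 ; ++n )
    {
        xf = 4.0f * xf * ( 1.0f - xf ) ;
        xd = 4.0  * xd * ( 1.0  - xd ) ;
        if( n % 10 == 0 ) std::printf( "n = %2d   float: %.9f   double: %.9f\n", n, xf, xd ) ;
    }
}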

There are probably libraries that let us use even more bits (say a 224-bit or 256-bit floating-point type), but that comes at the cost of much slower calculations.
https://en.wikipedia.org/wiki/Octuple-precision_floating-point_format
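Boost.Multiprecision (used further down in this thread) already covers that: cpp_bin_float can be instantiated for an arbitrary number of decimal digits, so something close to octuple precision is just a typedef away. A rough sketch, assuming ~71 decimal digits as the equivalent of the 237-bit binary256 significand:

#include <iostream>
#include <iomanip>
#include <limits>
#include <boost/multiprecision/cpp_bin_float.hpp>

int main()
{
    namespace mp = boost::multiprecision ;

    // cpp_bin_float is parameterised on the number of decimal digits it keeps
    using oct_float = mp::number< mp::cpp_bin_float<71> > ;

    const oct_float one_over_43 = oct_float(1) / 43 ;
    std::cout << std::setprecision( std::numeric_limits<oct_float>::max_digits10 )
              << one_over_43 << '\n' ;
}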
> On my system, long double holds 128 bits whereas double holds 64 bits.
> The number of significant digits is thus at least 33,
> which is more than twice the number of significant digits of a typical double.

Verify that this is a valid assumption.

#include <iostream>
#include <iomanip>   // std::setprecision
#include <limits>
#include <boost/multiprecision/cpp_bin_float.hpp>

// print the storage size, mantissa width and decimal precision of a floating-point type,
// plus 1/43 with enough digits to round-trip the stored value
template < typename T > void print_limits( const char* name )
{
    using limits = std::numeric_limits<T> ;
    std::cout << name
              << "\n\tsize: " << sizeof(T) 
              << "\n\t#bits in the mantissa: " << limits::digits
              << "\n\t#decimal digits that can be represented: " << limits::digits10
              << "\n\t1/43 == " << std::setprecision(limits::max_digits10) << ( T() + 1 ) / 43 
              << "\n\n" ;
}

#define PRINT_LIMITS(type) print_limits<type>( #type )

int main()
{
    using namespace boost::multiprecision ;

    PRINT_LIMITS(float) ;
    PRINT_LIMITS(double) ;
    PRINT_LIMITS(long double) ;
    PRINT_LIMITS(cpp_bin_float_quad) ;
    PRINT_LIMITS(cpp_bin_float_50) ;
    PRINT_LIMITS(cpp_bin_float_100) ;
}

float
	size: 4
	#bits in the mantissa: 24
	#decimal digits that can be represented: 6
	1/43 == 0.0232558139

double
	size: 8
	#bits in the mantissa: 53
	#decimal digits that can be represented: 15
	1/43 == 0.023255813953488372

long double
	size: 16
	#bits in the mantissa: 64
	#decimal digits that can be represented: 18
	1/43 == 0.0232558139534883720939

cpp_bin_float_quad
	size: 32
	#bits in the mantissa: 113
	#decimal digits that can be represented: 33
	1/43 == 0.0232558139534883720930232558139534869

cpp_bin_float_50
	size: 64
	#bits in the mantissa: 168
	#decimal digits that can be represented: 50
	1/43 == 0.023255813953488372093023255813953488372093023255813975

cpp_bin_float_100
	size: 80
	#bits in the mantissa: 334
	#decimal digits that can be represented: 100
	1/43 == 0.02325581395348837209302325581395348837209302325581395348837209302325581395348837209302325581395348837193

http://coliru.stacked-crooked.com/a/c5c343bd6afd9c51
the butterfly effect
@Ganado
Please have a look at this bifurcation diagram -- https://science.kairo.at/bifurk_diag_ausschnitt.png -- and take into account all the prerequisites for a butterfly to have an effect. IMO the computing precision is of minor importance (or prevails only in a few distinct constellations); other factors of the calculation matter more. For example, the shape of the Mandelbrot set is not the result of "moths" (or bugs?) summing up over many iterations.