Numeric problems

Hi,

Note: The question is basically what I've written at the bottom of this text; everything above "Long story short" is just background information.

I am currently writing a program where I want to be able to switch the base data type (float / double / long double / custom) used for calculations with a simple preprocessor statement such as #define PRECISION 2.

The problem is that I need to use various constants throughout the code, such as 0, 0.5, 1.0, 5.0e-7, and so on. At first I did something like the following:

#if PRECISION == 1 //float
 #define m_half 0.5f
 #define m_one  1.0f
 [...]
#endif

#if PRECISION == 2 //double
 #define m_half 0.5
 #define m_one  1.0
 [...]
#endif

This list of course gets quite long very quickly and surely isn't a nice solution to the problem.

So I decided to try out a new C++11 feature and define a custom suffix, _V, that would do the work for me and convert all the constants at compile time:

template <char... Digits> constexpr
float operator "" _V() {
    return ( intPart<Digits...>() + fracPart<Digits...>() )
           * Power<10, expPart<Digits...>()>::value;
}

So if I write float value = 123.456e-7_V, the intPart function calculates 123.f, fracPart calculates 0.456f and expPart calculates -7.

In theory it should work nicely, but in practice we're working with finite precision, so I get rounding errors. That would not be a big deal if the result were no different from typing float value = 123.456e-7f, but my algorithm introduces additional rounding errors, so the result is less precise than the built-in conversion.

Long story short: What's the best way (or at least a better approach than mine) to convert a decimal number in scientific notation to a floating-point value at compile time using C++11's user-defined literals? How does the compiler convert 123.456e-7f to the internal representation of a floating-point value?
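
One way to cut down the accumulated rounding, sketched here under assumptions: instead of adding a separately rounded integer part and fractional part, collect the digits in a 64-bit integer, track a single decimal exponent, and apply one multiplication or division by a power of ten in long double. All names below (value_type, parse_decimal, scan, scale, ten_to, kMantLimit) are made up for illustration. This is not how a compiler does it; compilers typically convert literals with exact or extended-precision arithmetic so that the result is correctly rounded. The sketch keeps at most 19 significant digits, handles decimal literals only, and the final cast to value_type can round a second time, so it may still differ from the built-in suffixes in the last bit.

```cpp
// Hypothetical target type; switch it to float or long double as needed.
typedef double value_type;

constexpr bool is_digit(char c) { return c >= '0' && c <= '9'; }

// 10^n for n >= 0 by repeated squaring (exact while 10^n fits the mantissa).
constexpr long double sqr(long double x) { return x * x; }
constexpr long double ten_to(int n) {
    return n == 0     ? 1.0L
         : n % 2 != 0 ? 10.0L * ten_to(n - 1)
         :              sqr(ten_to(n / 2));
}

// Unsigned decimal integer, used for the exponent digits.
constexpr int parse_uint(const char* s, int acc) {
    return is_digit(*s) ? parse_uint(s + 1, acc * 10 + (*s - '0')) : acc;
}
constexpr int parse_exponent(const char* s) {
    return *s == '-' ? -parse_uint(s + 1, 0)
         : *s == '+' ?  parse_uint(s + 1, 0)
         :              parse_uint(s, 0);
}

// Apply the decimal exponent with a single multiplication or division.
constexpr long double scale(unsigned long long mant, int exp10) {
    return exp10 >= 0 ? mant * ten_to(exp10) : mant / ten_to(-exp10);
}

// Once the accumulator reaches 10^18, further digits are dropped so that
// the 64-bit integer cannot overflow.
constexpr unsigned long long kMantLimit = 1000000000000000000ULL;

// Walk the literal's characters: mant collects digits, exp10 counts how far
// the decimal point has to move, frac says whether '.' has been seen.
constexpr long double scan(const char* s, unsigned long long mant,
                           int exp10, bool frac) {
    return is_digit(*s)
             ? (mant < kMantLimit
                  ? scan(s + 1,
                         mant * 10 + static_cast<unsigned long long>(*s - '0'),
                         frac ? exp10 - 1 : exp10, frac)
                  : scan(s + 1, mant, frac ? exp10 : exp10 + 1, frac))
         : *s == '.'
             ? scan(s + 1, mant, exp10, true)
         : (*s == 'e' || *s == 'E')
             ? scale(mant, exp10 + parse_exponent(s + 1))
             : scale(mant, exp10);
}

constexpr long double parse_decimal(const char* s) {
    return scan(s, 0, 0, false);
}

// Raw literal operator: receives the literal's spelling without the suffix.
constexpr value_type operator""_V(const char* s) {
    return static_cast<value_type>(parse_decimal(s));
}

static_assert(1.5_V == 1.5, "15 / 10 is exact");
static_assert(1e3_V == 1000.0, "1 * 10^3 is exact");
```

For 123.456e-7_V this computes 123456 / 10^10 with one rounding in long double, rather than rounding 123.f, 0.456f and the power of ten separately. Writing the operator as operator""_V without a space also avoids naming the reserved identifier _V.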

ne555 (10692)

I doubt that float pi = 3.14f; would have a different value than float pi = 3.14;
so I suggest you use long double for writing the constants (and use constants instead of defines)

#if PRECISION == 1
typedef float value_type;
#elif PRECISION == 2
typedef double value_type;
//...
#endif

const value_type
   m_half = 0.5L,
   m_one = 1.0L;

AleaIactaEst (98)

That's the solution I want to avoid, because I do not want to define a new constant m_half, m_one, m_two, etc. for each and every new float I want to use.

What I want to know is how the compiler calculates the mantissa and exponent of numbers given in decimal format. (That's what the compiler is doing, I guess?)

Edit: I guess it can't hurt to give some example output of the program:

double precision:

printf("%10.20lf \n%10.20lf", 1.2345678901234567890123456789_V, 1.2345678901234567890123456789);

prints the following to standard output:

1.23456789012345691248 
1.23456789012345669043

long double precision:

printf("%10.25Lf \n%10.25Lf", 1293.19240214283283829823238e-3_V, 1293.19240214283283829823238e-3L);

prints to stdout:

1.2931924021428328384041018 
1.2931924021428328382956816

Obviously my implementation of operator "" _V() has some numeric flaws, so I'm looking for other ideas for how it could be implemented to give the same result as the built-in suffixes f, none (double), and L.

mik2718 (347)

Just guessing, but wouldn't it be the case that the compiler optimiser will convert a literal to the type of the variable to which it is assigned?

Therefore use long double for all literals and maybe a templated type to be able to represent the differing precisions?
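
A minimal sketch of that suggestion, assuming the PRECISION macro from the first post and a hypothetical value_type typedef: a cooked literal operator taking long double lets the compiler do the decimal parsing, and the suffix only narrows the result. Because the value is rounded to long double first and then to value_type, it can in rare cases differ from the built-in f suffix in the last bit, and it cannot carry more precision than long double, so it does not help types wider than that.

```cpp
#include <type_traits>

#ifndef PRECISION
#define PRECISION 1   // assumed meaning: 1 = float, 2 = double, otherwise long double
#endif

#if PRECISION == 1
typedef float value_type;
#elif PRECISION == 2
typedef double value_type;
#else
typedef long double value_type;
#endif

// Cooked literal operators: the compiler parses the literal as long double
// (or unsigned long long for integer literals) and passes the value in.
constexpr value_type operator""_V(long double x) {
    return static_cast<value_type>(x);
}
constexpr value_type operator""_V(unsigned long long x) {
    return static_cast<value_type>(x);
}

// Constants can now be written inline, without one #define per value.
constexpr value_type half = 0.5_V;
constexpr value_type tiny = 5.0e-7_V;

static_assert(std::is_same<decltype(0.5_V), value_type>::value,
              "the suffix yields the selected base type");
```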

AleaIactaEst (98)

A conversion from a float constant to int, for example, is done at compile time (using g++ with -O2), so I guess you're right. But one of the main reasons why I want to use a custom suffix is that it can greatly simplify writing constant values of custom types.

For example, I have a floating-point template class FLOAT that takes the sizes of the mantissa and exponent (in bits) as template parameters, allowing it to store floating-point values with fixed but arbitrary precision. Now imagine I want to do the following:

const int mantisse = 200;
const int exponent = 19;
FLOAT<mantisse, exponent> pi = 3.1415926535897932384626433832795028841971693993751058209749446;

pi will have poor accuracy relative to 200 mantissa bits, because the literal is a double, which can hold only 52 explicit mantissa bits (64-bit double), and it is rounded to that precision before it is assigned to pi. Using a custom suffix _V, I can avoid this problem and simply write

FLOAT<mantisse, exponent> pi = 3.1415926535897932384626433832795028841971693993751058209749446_V;

without losing precision.
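
The reason a suffix can keep all the digits is that a raw literal operator (one taking const char*) receives the literal's spelling rather than an already rounded double. A small sketch to illustrate: DecimalText and text_length are hypothetical stand-ins, and a real FLOAT class would parse the characters into its own mantissa and exponent instead of just storing them.

```cpp
#include <cstddef>

// Hypothetical stand-in for what a custom type would parse.
struct DecimalText {
    const char* text;     // the literal exactly as written, without the suffix
    std::size_t length;   // number of characters in it
};

constexpr std::size_t text_length(const char* s) {
    return *s == '\0' ? 0 : 1 + text_length(s + 1);
}

// Raw literal operator: called with the characters of the literal.
constexpr DecimalText operator""_V(const char* s) {
    return DecimalText{s, text_length(s)};
}

constexpr DecimalText pi_text =
    3.1415926535897932384626433832795028841971693993751058209749446_V;

// A double round-trips with 17 significant digits; far more arrive here.
static_assert(pi_text.length > 17, "all written digits reach the operator");
```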

Duthomhas (13129)

I'm starting to reconsider the most complex Hello World thread...

AleaIactaEst (98)

what?

AleaIactaEst (98)

The problem is still not solved, and I'm really looking forward to some answers that are more helpful than what Duoas has written.

Basically I just want to know how to mimic the behaviour of the built-in floating point suffixes 1.2f for float, 1.2 for double and 1.2L for long double. There has to be a way :/

mik2718 (347)

Did some googling around this.

It looks to me like you want to make the constructor of your special class into a constexpr. Then the literal can be constructed at compile time.

However, there are severe constraints on what code is allowed in a constexpr so good luck with programming a constructor that conforms.

This website might be interesting.

http://akrzemi1.wordpress.com/2011/05/11/parsing-strings-at-compile-time-part-i/

AleaIactaEst (98)

Well, my (new) idea is to write a constexpr algorithm for each data type (including float, double and long double) that computes the binary representation of the given string (interpreted as a number in scientific notation). Constructing a float, double, long double or custom data type with this binary representation at compile time will then be no problem, because I only need to copy the bits.

Implementing the first part, the conversion from scientific to binary format, is going to be difficult though; the main problem is changing the exponent's base from 10 to 2. For example, if I have a number like 1.e3 = 1*10^3, what is a proper way to compute 1.111101000*2^9 (mantissa written in binary)?
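
For the special case where the decimal value is a small positive integer, the change of base can be sketched with integer arithmetic alone: the binary exponent is the index of the highest set bit, and the mantissa is the remaining bits shifted into place. The helper names floor_log2 and binary32_bits are made up for this example, which only covers integers below 2^24 in the IEEE 754 single-precision layout. Fractions and large exponents need exact big-integer multiplication or division plus round-to-nearest, which is not shown. Note also that turning such a bit pattern back into a float inside a constant expression is not straightforward before C++20's std::bit_cast.

```cpp
#include <cstdint>

// Index of the highest set bit, i.e. floor(log2(n)) for n > 0.
constexpr int floor_log2(std::uint64_t n) {
    return n <= 1 ? 0 : 1 + floor_log2(n >> 1);
}

// IEEE 754 binary32 bit pattern of an integer n with 0 < n < 2^24.
// Such values are exactly representable, so no rounding is needed.
constexpr std::uint32_t binary32_bits(std::uint32_t n) {
    return (static_cast<std::uint32_t>(floor_log2(n) + 127) << 23)  // biased exponent
         | ((n << (23 - floor_log2(n))) & 0x007FFFFFu);             // fraction, implicit 1 dropped
}

// 1000 = 1111101000 in binary = 1.111101000b * 2^9
static_assert(floor_log2(1000) == 9, "binary exponent of 1000");
static_assert(binary32_bits(1000) == 0x447A0000u, "bit pattern of 1000.0f");
```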

And thanks for the link by the way, it helped me get a better understanding of compile-time computations with constexpr.

JLBorges (13770)

> Constructing a float, double, long double or custom data type with this binary representation
> at compile time will then be no problem

Subject to this caveat:

Although in some contexts constant expressions must be evaluated during program translation, others may be evaluated during program execution. Since this International Standard imposes no restrictions on the accuracy of floating-point operations, it is unspecified whether the evaluation of a floating-point expression during translation yields the same result as the evaluation of the same expression (or the same operations on the same values) during program execution.

Example:

bool f() 
{
  char array[ 1 + int(1 + 0.2 - 0.1 - 0.1)] ; // Must be evaluated during translation
  int size = 1 + int( 1 + 0.2 - 0.1 - 0.1 ) ; // May be evaluated at runtime
  return sizeof(array) == size ;
}

It is unspecified whether the value of f() will be true or false.

Footnote: Nonetheless, implementations are encouraged to provide consistent results, irrespective of whether the evaluation was actually performed during translation or during program execution.