puzzling output uint16_t(128.00) returns 127

This really has me puzzled. Easiest to show with code and output.

uint8_t numberOfBits = 8;
  Serial.print("pow(2,numberOfBits-1) = "); Serial.println (pow(2,numberOfBits-1));
  Serial.print("uint8_t(pow(2,numberOfBits-1)) = "); Serial.println (uint8_t(pow(2,numberOfBits-1)));
  Serial.print("uint16_t(pow(2,numberOfBits-1)) = "); Serial.println (uint16_t(pow(2,numberOfBits-1)));
  Serial.print("uint32_t(pow(2,numberOfBits-1)) = "); Serial.println (uint32_t(pow(2,numberOfBits-1)));
  Serial.println();

  for(int i=0 ; i<3 ; i++)
  {
    Serial.print("i = ");Serial.print(i);Serial.print("  pow(2,numberOfBits-1-i) = "); Serial.println (pow(2,numberOfBits-1-i));
    Serial.print("i = ");Serial.print(i);Serial.print("  uint8_t(pow(2,numberOfBits-1-i)) = "); Serial.println (uint8_t(pow(2,numberOfBits-1-i)));
    Serial.print("i = ");Serial.print(i);Serial.print("  uint16_t(pow(2,numberOfBits-1-i)) = "); Serial.println (uint16_t(pow(2,numberOfBits-1-i)));
    Serial.print("i = ");Serial.print(i);Serial.print("  uint32_t(pow(2,numberOfBits-1-i)) = "); Serial.println (uint32_t(pow(2,numberOfBits-1-i))); 
    Serial.println();
  }


and the output

pow(2,numberOfBits-1) = 128.00
uint8_t(pow(2,numberOfBits-1)) = 128 // Good... casting 128.00 to different integers = 128
uint16_t(pow(2,numberOfBits-1)) = 128
uint32_t(pow(2,numberOfBits-1)) = 128

i = 0 pow(2,numberOfBits-1-i) = 128.00
i = 0 uint8_t(pow(2,numberOfBits-1-i)) = 127 //BAD... casting 128.00 to different integers = 127???
i = 0 uint16_t(pow(2,numberOfBits-1-i)) = 127
i = 0 uint32_t(pow(2,numberOfBits-1-i)) = 127

i = 1 pow(2,numberOfBits-1-i) = 64.00
i = 1 uint8_t(pow(2,numberOfBits-1-i)) = 63 // //BAD... casting 64.00 to different integers = 63???
i = 1 uint16_t(pow(2,numberOfBits-1-i)) = 63
i = 1 uint32_t(pow(2,numberOfBits-1-i)) = 63

i = 2 pow(2,numberOfBits-1-i) = 32.00
i = 2 uint8_t(pow(2,numberOfBits-1-i)) = 31 ////BAD... casting 32.00 to different integers = 31???
i = 2 uint16_t(pow(2,numberOfBits-1-i)) = 31
i = 2 uint32_t(pow(2,numberOfBits-1-i)) = 31

This is running on an Arduino Mega 2560.

Any ideas?
Curious, what does Serial.println(sizeof(numberOfBits - 1)); print for you?
<edit: nvm, that isn't important>

My first guess is something is going awry relating to floating-point numbers on your platform. Try rounding before converting to uint{N}_t?


I "translated" your code to run with a normal compiler (GCC on cpp.sh), and the output is pretty uneventful:
// Example program
#include <iostream>
#include <cstdint>
#include <cmath>
using std::uint8_t;
using std::uint16_t;
using std::uint32_t;

template <typename T>
void println(T value)
{
    std::cout << value << '\n';   
}

void println()
{
    std::cout << '\n';   
}

template <typename T>
void print(T value)
{
    std::cout << value;  
}

int main()
{
  uint8_t numberOfBits = 8;
  print("        pow(2,numberOfBits-1) = "); println (pow(2,numberOfBits-1));
  print("uint8_t(pow(2,numberOfBits-1)) = "); println ((int)uint8_t(pow(2,numberOfBits-1)));
  print("uint16_t(pow(2,numberOfBits-1)) = "); println (uint16_t(pow(2,numberOfBits-1)));
  print("uint32_t(pow(2,numberOfBits-1)) = "); println (uint32_t(pow(2,numberOfBits-1)));
  println();

  for(int i=0 ; i<3 ; i++)
  {
    print("i = "); print(i); print("           pow(2,numberOfBits-1-i)  = "); println (pow(2,numberOfBits-1-i));
    print("i = "); print(i); print("  uint8_t (pow(2,numberOfBits-1-i)) = "); println ((int)uint8_t(pow(2,numberOfBits-1-i)));
    print("i = "); print(i); print("  uint16_t(pow(2,numberOfBits-1-i)) = "); println (uint16_t(pow(2,numberOfBits-1-i)));
    print("i = "); print(i); print("  uint32_t(pow(2,numberOfBits-1-i)) = "); println (uint32_t(pow(2,numberOfBits-1-i))); 
    println();
  }
}

        pow(2,numberOfBits-1) = 128
uint8_t(pow(2,numberOfBits-1)) = 128
uint16_t(pow(2,numberOfBits-1)) = 128
uint32_t(pow(2,numberOfBits-1)) = 128

i = 0           pow(2,numberOfBits-1-i)  = 128
i = 0  uint8_t (pow(2,numberOfBits-1-i)) = 128
i = 0  uint16_t(pow(2,numberOfBits-1-i)) = 128
i = 0  uint32_t(pow(2,numberOfBits-1-i)) = 128

i = 1           pow(2,numberOfBits-1-i)  = 64
i = 1  uint8_t (pow(2,numberOfBits-1-i)) = 64
i = 1  uint16_t(pow(2,numberOfBits-1-i)) = 64
i = 1  uint32_t(pow(2,numberOfBits-1-i)) = 64

i = 2           pow(2,numberOfBits-1-i)  = 32
i = 2  uint8_t (pow(2,numberOfBits-1-i)) = 32
i = 2  uint16_t(pow(2,numberOfBits-1-i)) = 32
i = 2  uint32_t(pow(2,numberOfBits-1-i)) = 32

Note: The (int) casting is to avoid it trying to print characters instead of numbers.
What does this print?

uint8_t bits = 8;

Serial.println("uint16_t: ");
Serial.println(uint16_t(128.0));
Serial.println(uint16_t(127.99));

Serial.println("pow: ");
Serial.println(pow(2, bits - 1), 8);

Serial.println("loop:");
for (int i = 0; i < 3; ++i)
  Serial.println(pow(2, bits - 1 - i), 8);

My first guess is something is going awry relating to floating-point numbers on your platform.

Looks like Arduino. Atmel's avr-gcc is a relatively standard GCC port.

1. Try linking libm; and
2. Try linking libm and using single-precision floating point.
Often, if you didn't link libm, soft floating-point emulation is provided by libgcc, which isn't always correct.

If that doesn't work, a next step is to answer these:
3. What compiler (version) are you using?
4. What is the link line?
5. What is a working link to your uC's datasheet?
And then examine some generated code.
This looks like it might be better served by a lookup table, which would both correct the issue and probably run faster, depending on how pow() is implemented. That does mean a little more memory used. The normal C++ pow is a heavyweight function designed for pow(double, double), which is a complex algorithm.
If I had to bet, mbozzi's post is the solution here. Still, relying on floating-point operations for integer math is in general a bad decision, not to mention potentially expensive, as jonnin mentioned. Bit shifting for integer powers of 2 is really quick for CPUs, so I would say you might not even need a lookup table.

Still, I don't think I actually directly said what the issue is, so allow me to do that now:
- Due to imprecision in how pow is being calculated, the result of your pow call may actually be something like 127.999999999 instead of 128.
- When you call Serial.println(double), the default precision of your output stream rounds it to 128 for you, but when you cast 127.999999 to an integer, it truncates to 127.
I certainly agree there are more efficient ways to do this. The thing is I stumbled on something here that I want to really understand and not just patch it. I am trying to learn deeply, not just hack along.

Look at the output and note that the first set of output casts correctly, but adding another integer to the exponent of the pow function causes this error. How does that make any sense? I have run this on two platforms so far with identical results.
Genado, your code example and output may point to an issue in the Arduino math.h library vs the cmath library.
Funny, that's actually what originally confused me but then I forgot about that part.

uint8_t(pow(2,numberOfBits-1)) = 128
i = 0, uint8_t(pow(2,numberOfBits-1-i)) = 127

In this case, it might be that the compiler is resolving it to two different pow calls. In the first one, it sees that you are only computing with constants that are already known at compile time, so it pre-computed the output.

In the second case, it's doing the computation at run time and using floating-point math.

That's my guess, anyway. Would need to see assembly to know for sure (not that I'm anything close to an expert in assembly, but you could at least see if numbers are being baked in or not).
Looked for libm for arduino on PlatformIO without success.
Dutch, here's the output. Clearly a rounding issue.

uint16_t:
128
127
pow:
128.00000000
loop:
127.99995422
63.99997711
31.99998855
It's still not entirely clear what's going on since these two lines are printing different values, which is very strange.

uint8_t bits = 8;
Serial.println(pow(2, bits - 1), 8);       // 128.00000000
int i = 0;
Serial.println(pow(2, bits - 1 - i), 8);   // 127.99995422 

What does the following print? From what I know it shouldn't make a difference, but then again, the above two should be the same (either both exactly correct or off by a little).

int i = 0;
Serial.println(pow(2.0, bits - 1.0 - i), 8);

Do you know the sizes of your variables?
For instance, float, double, and long double may all be 4 bytes (32-bits).
And an int may be only 2 bytes (16-bits).
What is the output of this (just remove any variable types that aren't supported):

short s;
int i;
long l;
long long ll;
size_t st;
Serial.println("s i l ll st");
Serial.print(sizeof s);
Serial.print(sizeof i);
Serial.print(sizeof l);
Serial.print(sizeof ll);
Serial.println(sizeof st);

float f;
double d;
long double ld;
Serial.println("f d ld");
Serial.print(sizeof f);
Serial.print(sizeof d);
Serial.println(sizeof ld);

Anyway, if you just want a solution to the problem you can either add 0.5 or use the round() function. Or, as jonnin mentioned, a table would be faster, if that's important, at the cost of a little memory.
I'm more confident now that GCC is recognizing and replacing the intrinsic pow call at compile-time.
See: https://stackoverflow.com/questions/32902730/unusual-behavior-of-pow-in-c-during-compilation

Shafik Yaghmour wrote:
basically gcc can optimize away calls to certain builtins when using constant expression arguments. Try using -fno-builtin
Maxim Egorushkin wrote:
pow is an intrinsic function. When you specify all its arguments as compile time constants, as in pow(2, 5), the compiler replaces the call with its value.

That only happens for integer arguments. Because when you pass floating point arguments the result of pow depends on the current floating point rounding mode, which is not known at compile time.
Looked for libm for arduino on PlatformIO without success
Just add -lm directly to the link line. Because it is part of the C standard library distribution it is likely already installed as part of your cross compiler.

The soft float implementation in libgcc will consist of weak (overridable) symbols.

These are things OP can check by examining the binaries.

Even if this isn't the cause of the problem, the optimized math routines are likely to be much faster than the generic implementation in libgcc.
So the answer is that the compiler running on your regular computer is pre-calculating the non-looped pows, since they use integer arguments that are known at compile time. The other pows are calculated at run time by software on the Arduino.

Presumably this is what the following will print:

uint8_t bits = 8;
Serial.println(pow(2, bits - 1), 8);         // 128.00000000
int i = 0;
Serial.println(pow(2, bits - 1 - i), 8);     // 128.00000000
Serial.println(pow(2.0, bits - 1.0 - i), 8); // 127.99995422 

Of course, the proper way to calculate powers of two is with the left-shift operator:

uint8_t bits = 8;
for (int i = 0; i < 3; ++i)
  Serial.println(1 << (bits - 1 - i));

Dutch, I stumbled on this while I was writing the code to demonstrate the difference in speed between shifting bits and using the math library. How ‘bout them apples.
Dutch, from your demonstration we now know that inflation has affected everything...including the value of zero. Sigh.
The shift is going to beat the lookup table too (my earlier suggestion). Shifting is one of the fastest circuits; it's just a straight copy with the wires moved, if you think about the CPU guts. I have a love-hate relationship with pow. It's awesome if you need fractional powers. It's terrible if you need integer powers. The C++ guys really need to make fast pow(double, int) and pow(int, int) overloads for it.
@PeteDD, mbozzi is the one who solved the mystery. The compiler gives exact answers for integer constant arguments while the board is (very slowly!) generating approximate answers in software for the other situations.

@jonnin, the board's CPU is 8-bit! It doesn't even have a division instruction, let alone pow. It boasts of having a multiply instruction. No floating-point registers or anything like that. It is RISC, though, with 32 8-bit registers. It's mostly about low power consumption with quite speedy computation for what it's capable of.

Datasheet pdf download:
http://ww1.microchip.com/downloads/en/DeviceDoc/ATmega640-1280-1281-2560-2561-Datasheet-DS40002211A.pdf