Read 32-bit signed vs unsigned integer from file

I have a binary file that contains the number 44100 encoded into 4 bytes.
The file, in binary format, is the following bytes:
68 172 0 0.

The problem was that when I first tried to read it, it came out as -21436 instead of 44100. I figured out that my problem was that I was reading it as a signed 32-bit integer instead of an unsigned one.

So my original problem is solved... but I still don't exactly get why it was a problem. The maximum value for a 32-bit signed integer should be 2,147,483,647, well above 44100.

So why does reading it as a signed integer not give the correct value, while reading it as an unsigned integer does?

I think I'm just overlooking something obvious? I know about two's complement and all that stuff.

#include <cstdint>
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
uint32_t read_u32_le(std::istream& file)
{
    uint32_t value;
    uint8_t  bytes[4];
    
    file.read( (char*)bytes, 4);
    value = bytes[0] | (bytes[1] << 8) | (bytes[2] << 16) | (bytes[3] << 24);
    return value; 
}
int32_t read_s32_le(std::istream& file)
{
    int32_t value;
    int8_t  bytes[4];
    
    file.read( (char*)bytes, 4);
    value = bytes[0] | (bytes[1] << 8) | (bytes[2] << 16) | (bytes[3] << 24);
    return value; 
}
int main()
{
    std::ifstream f;
    
    
    // written in binary format is the following bytes:
    // 68 172 0 0
    
    f.open("test44100", std::ios::binary);
    unsigned char b1;
    unsigned char b2;
    unsigned char b3;
    unsigned char b4;
    f.read((char*)&b1, 1);
    f.read((char*)&b2, 1);
    f.read((char*)&b3, 1);
    f.read((char*)&b4, 1);
    std::cout << (int)b1 << " " << (int)b2 << " " << (int)b3 << " " << (int)b4 << std::endl;
    f.close();
    
    // prints -21436:
    f.open("test44100", std::ios::binary);
    int32_t sampleRate_s = read_s32_le(f);
    std::cout << sampleRate_s << std::endl;
    f.close();
    
    // prints correctly 44100:
    f.open("test44100", std::ios::binary);
    uint32_t sampleRate_u = read_u32_le(f);
    std::cout << sampleRate_u << std::endl;
    f.close();

}
Whether the integer is signed or not shouldn't matter here; signed and unsigned variables are literally exactly the same at the asm level. (I don't think there is a single architecture where this isn't true!)

The only difference is the way the data is interpreted:
An unsigned variable uses the full range of values that an integer of that byte size can represent.

A signed variable trades that maximum for the ability to hold negative numbers. In reality it just sacrifices the upper half of the integer's range and maps it onto the negative numbers.

For a number smaller than ~2 billion, the result should be identical, so I don't know what's going on here.
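
To illustrate what I mean, here's a small standalone sketch (values picked by me, not from your program): a small value like 44100 reads back the same either way, while a bit pattern in the upper half of the unsigned range comes out negative once you reinterpret it as signed.

#include <cstdint>
#include <iostream>

int main()
{
    uint32_t small = 44100u;       // well below 2^31, so signed and unsigned agree
    uint32_t big   = 0xFFFFAC44u;  // upper half of the unsigned range

    // Same bit patterns, reinterpreted as signed (two's-complement wraparound,
    // which is what every mainstream platform does):
    std::cout << small << " -> " << static_cast<int32_t>(small) << '\n';  // 44100 -> 44100
    std::cout << big   << " -> " << static_cast<int32_t>(big)   << '\n';  // 4294945860 -> -21436
}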

I've modified the code to run in ideone (removed file input), and this is what I get when I run it:
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

uint32_t read_u32_le(uint8_t* bytes)
{
    uint32_t value;
    value = bytes[0] | (bytes[1] << 8) | (bytes[2] << 16) | (bytes[3] << 24);
    return value; 
}
int32_t read_s32_le(uint8_t* bytes)
{
    int32_t value;
    value = bytes[0] | (bytes[1] << 8) | (bytes[2] << 16) | (bytes[3] << 24);
    return value; 
}
int main()
{
    // written in binary format is the following bytes:
    // 68 172 0 0
    
    unsigned char b1 = 68;
    unsigned char b2 = 127;
    unsigned char b3 = 0;
    unsigned char b4 = 0;
    
    std::cout << (int)b1 << " " << (int)b2 << " " << (int)b3 << " " << (int)b4 << std::endl;
    
    uint8_t bytes[4] = {b1, b2, b3, b4};
    // prints -21436:
    int32_t sampleRate_s = read_s32_le(bytes);
    std::cout << sampleRate_s << std::endl;
    
    // prints correctly 44100:
    uint32_t sampleRate_u = read_u32_le(bytes);
    std::cout << sampleRate_u << std::endl;

}


68 127 0 0
32580
32580


You can try it here: http://ideone.com/Hg7MNa

As you can see, the last two values are the same. I am dumbfounded as to why it did not work for you.
The problem was that when I first tried to read it, it came out as -21436 instead of 44100. I figured out that my problem was that I was reading it as a signed 32-bit integer instead of an unsigned one.

Actually, to me it looks like you're reading it as a signed 16-bit value to get -21436.
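
(44100 is 0xAC44, and that 16-bit pattern viewed as signed is exactly -21436. A tiny standalone sketch of mine showing it:)

#include <cstdint>
#include <iostream>

int main()
{
    uint16_t u = 44100;                    // bit pattern 0xAC44
    int16_t  s = static_cast<int16_t>(u);  // same bits viewed as signed (two's complement)
    std::cout << u << " " << s << '\n';    // prints: 44100 -21436
}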

Why are you reading in bytes instead of just reading the number?

#include <cstdint>
#include <fstream>
#include <iostream>

int main()
{
    // Read the same bytes as four different integer widths.
    std::ifstream fin("data.bin", std::ios::binary);
    int16_t readValue_16s;
    fin.read(reinterpret_cast<char*>(&readValue_16s), sizeof(readValue_16s));

    fin.seekg(0);
    uint16_t readValue_16u;
    fin.read(reinterpret_cast<char*>(&readValue_16u), sizeof(readValue_16u));

    fin.seekg(0);
    int32_t readValue_32s;
    fin.read(reinterpret_cast<char*>(&readValue_32s), sizeof(readValue_32s));

    fin.seekg(0);
    uint32_t readValue_32u;
    fin.read(reinterpret_cast<char*>(&readValue_32u), sizeof(readValue_32u));

    std::cout << readValue_16s << " " << readValue_16u << " " << readValue_32s << " " << readValue_32u << std::endl;
}



Actually, to me it looks like you're reading it as a signed 16-bit value to get -21436.

Do you have an explanation for why that would be happening? I am not using any 16-bit stuff in my example.

Why are you reading in bytes instead of just reading the number?

That's how one of the tutorials on the site (Disch's, I think?) showed how to read 32-bit little-endian numbers. The WAV format happens to be little endian. If the file were in big-endian format, I don't think that reinterpret_cast trick would work.
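
For a big-endian file I'd just assemble the bytes the other way around, something like this sketch (read_u32_be is a name I made up, same idea as the LE version):

#include <cstdint>
#include <istream>

// Sketch of a big-endian counterpart: the most significant byte comes first in the file.
uint32_t read_u32_be(std::istream& file)
{
    uint8_t bytes[4];
    file.read(reinterpret_cast<char*>(bytes), 4);
    return (uint32_t(bytes[0]) << 24) | (uint32_t(bytes[1]) << 16)
         | (uint32_t(bytes[2]) << 8)  |  uint32_t(bytes[3]);
}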

If I set up the byte array as uint8_t bytes[] = {68, 172, 0, 0}, they both give the correct value, signed or unsigned.

I had a similar problem when reading 16-bit amplitude values (which are supposed to be signed) from a wav file. When I changed the type to uint16_t instead of int16_t it started working correctly, but now everything is unsigned, so I am relying on the way unsigned arithmetic wraps around on overflow. The file parsing and reading/writing is confusing me; I still don't understand what exactly the problem in my post is :/
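
What I think I actually want is to assemble the sample from unsigned bytes and only then treat it as signed, something like this sketch (read_s16_le is just my own name for it); is that a sound way to do it?

#include <cstdint>
#include <istream>

// Sketch: build the 16-bit value from unsigned bytes (little endian),
// then reinterpret the result as a signed sample.
int16_t read_s16_le(std::istream& file)
{
    uint8_t bytes[2];
    file.read(reinterpret_cast<char*>(bytes), 2);
    uint16_t u = uint16_t(bytes[0] | (bytes[1] << 8));
    return static_cast<int16_t>(u);  // two's-complement reinterpretation on typical platforms
}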

Btw, is this the right way to write a 4-byte integer, whether signed or unsigned?
void write_s32(std::ostream& file, int32_t data)
{
    file.write((char*)(&data), 4);
}
I think the problem is here:
int32_t read_s32_le(std::istream& file)
{
    int32_t value;
    int8_t  bytes[4];
    
    file.read( (char*)bytes, 4);
    value = bytes[0] | (bytes[1] << 8) | (bytes[2] << 16) | (bytes[3] << 24);
    return value; 
}


The decimal value 172 is stored in a single byte as the binary value 10101100.
Since the type of bytes is int8_t, that is, a signed type, the '1' in the high-order bit position indicates a negative value: -84 rather than +172.

As far as I'm aware (open to other comments), a left-shift of a negative value is undefined.

Changing line 19 (the int8_t  bytes[4]; declaration) to
    uint8_t  bytes[4];
may fix the problem.
When you treat the bytes as signed quantities, the values are 68 -84 0 0. When you combine these with
value = bytes[0] | (bytes[1] << 8) | (bytes[2] << 16) | (bytes[3] << 24);
you get 68 + (-84)*256 + 0 + 0 = -21436.

The key here is how something like bytes[1] << 8 gets evaluated. bytes[1] is a signed 8-bit integer and 8 is a full-width signed int, so bytes[1] is first promoted to int, which sign-extends it, and then the shift is done on that negative value.
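
You can see the promotion at work with a tiny standalone example (my own sketch, using *256 instead of << 8 to sidestep the negative-shift issue mentioned above):

#include <cstdint>
#include <iostream>

int main()
{
    int8_t  sb = static_cast<int8_t>(172);  // becomes -84 (two's complement)
    uint8_t ub = 172;

    std::cout << int(sb) << " " << int(ub) << '\n';  // -84 172
    std::cout << (68 | (int(sb) * 256)) << '\n';     // -21436: sign-extended high byte
    std::cout << (68 | (int(ub) * 256)) << '\n';     // 44100
}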
That's how one of the tutorials on the site (Disch's, I think?) showed how to read 32-bit little-endian numbers. The WAV format happens to be little endian.

A link to the page would probably be helpful.

What microprocessor does your system use? An Intel processor is little endian, so you shouldn't need to read the individual bytes if you're using an Intel-based system.


Btw, is this the right way to write a 4-byte integer, whether signed or unsigned?
void write_s32(std::ostream& file, int32_t data)
{
    file.write((char*)(&data), 4);
}

No, not really; you should use something like:

void write_s32(std::ostream& file, int32_t data)
{
    file.write(reinterpret_cast<char*>(&data), sizeof(data));
}



BTW, do you have control of the format of that file? If so, I'd recommend that you store the numbers in big-endian format because there are common routines to convert between the host endianness and big-endian. These are used in network programming, where most numbers are transmitted in big-endian format. The routines are htonl(), ntohl(), htons() and ntohs(). For example, htonl() converts a 32-bit number from host format to network (big-endian) format; ntohl() does the opposite.
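
For example (a sketch assuming POSIX <arpa/inet.h>; the function names here are mine):

#include <arpa/inet.h>  // htonl / ntohl (POSIX; on Windows they live in winsock2.h)
#include <cstdint>
#include <istream>
#include <ostream>

// Write a 32-bit value in network (big-endian) order...
void write_u32_net(std::ostream& file, uint32_t value)
{
    uint32_t be = htonl(value);  // host order -> big endian
    file.write(reinterpret_cast<const char*>(&be), sizeof(be));
}

// ...and read it back, converting to whatever the host's endianness is.
uint32_t read_u32_net(std::istream& file)
{
    uint32_t be = 0;
    file.read(reinterpret_cast<char*>(&be), sizeof(be));
    return ntohl(be);  // big endian -> host order
}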
Thank you! You all are right: the problem was the int8_t for the individual bytes as opposed to uint8_t. Everything seems to be working fine now, as far as I can tell.

A link to the page would probably be helpful.

What microprocessor does your system use? An Intel processor is little endian, so you shouldn't need to read the individual bytes if you're using an Intel-based system.

Yeah, my Intel processor is little endian, and I agree, it wouldn't matter. I just originally had it like that because I worked with both BE and LE files (some files from a game called Fallout are in BE format).
Here's one of the links I followed about the file stuff: http://www.cplusplus.com/forum/general/106624/ (I can't find the original link, but it's not important).

And jlb, thanks, I'll use that function from now on instead.

BTW, do you have control of the format of that file?

Nah, it's a wav file. But it's no big deal; the endianness isn't an issue. Thanks for the help, guys.