4-byte Big-Endian file reading

I am trying to understand Disch's tutorial on binary files (http://www.cplusplus.com/articles/DzywvCM9/ ).

Trying to learn based off the above link and this picture: http://img1.wikia.nocookie.net/__cb20130603095305/falloutmods/images/4/49/FRM_Endianess.jpg

Can someone confirm that this is the correct way to read a 4-byte piece of a file encoded in Big-Endian formatting?
uint32_t read_u32(std::istream& f)
{
    uint32_t val;
    char bytes[4];
    f.read(bytes, 4);

    val = bytes[3] | (bytes[2] << 24) | (bytes[1] << 16) | (bytes[0] << 8);
    
    return val;
}


Also, is this the correct way for Big-Endian 16-bit?
uint16_t read_u16(std::istream& f)
{
    uint16_t val;
    char bytes[2];
    f.read(bytes, 2);
    
    val = bytes[1] | (bytes[0] << 8);
    return val;
}

(Compare with Disch's code in the tutorial)

main would look something like
int main()
{
    std::ifstream f("edg5002.frm", std::ios::in | std::ios::binary);

    std::cout << read_u32(f) << std::endl; // Version
    std::cout << read_u16(f) << std::endl; // FPS
}

Edit: Confirmed that my 16-bit code is working using
int main()
{
    std::ofstream f("test_big.txt", std::ios::out | std::ios::binary);
    uint8_t a = 0x3F;
    uint8_t b = 0x2B;
    f << a;
    f << b;
    f.close();
    
    std::ifstream f2("test_big.txt", std::ios::in | std::ios::binary);
    std::cout << read_u16(f2) << std::endl; //16171
}

Still trying stuff for 32-bit. It's hard to tell if I have it backwards or not... the problem might be how it handles 0x00 bytes?
bytes[3] | (bytes[2] << 24) | (bytes[1] << 16) | (bytes[0] << 8);
Nope. The shifts are wrong: the shift amount must grow as the index shrinks. Also, don't use char. Where plain char is signed, a byte with its most significant bit set is sign-extended when it is promoted to int. Use an unsigned type.

val = bytes[1] | (bytes[0] << 8);
This is okay, but, again don't use char.
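
To illustrate the point about char, here is a minimal sketch. The function names are made up for this example, and signed char stands in for a plain char on a platform where char is signed.

```cpp
#include <cstdint>

// Low byte held in a signed type: promotion to int sign-extends it,
// so a value with the top bit set drags 1-bits over the high byte.
std::uint16_t combine_signed_low(unsigned char hi, signed char lo)
{
    return static_cast<std::uint16_t>((hi << 8) | lo);
}

// Both bytes unsigned: promotion to int zero-extends, so the bytes
// land exactly where the shifts put them.
std::uint16_t combine_unsigned(unsigned char hi, unsigned char lo)
{
    return static_cast<std::uint16_t>((hi << 8) | lo);
}
```

With a high byte of 0x12 and a low byte of 0xAB, the signed version gives 0xFFAB while the unsigned one gives 0x12AB. A test using 0x3F and 0x2B would not catch this, because both bytes are below 0x80.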
Thanks for the reply, I will try to apply that. I thought it needed to be backwards because Disch's tutorial was showing little-endian. So I should use "unsigned char" instead of just "char"? Huh, I didn't even know that was a thing until I just looked it up. Edit: Or I could just use uint8_t, as I'm not targeting those 12-bit machines. Thanks again.

Was my first post's 4-byte code implementing little-endian by mistake, or just nonsense?
It was neither 0123 (BE) nor 3210 (LE). It was 2103.
You have an array of 4 bytes: [0], [1], [2], and [3]

In big endian, the first byte ([0]) is the most significant, and the last byte ([3]) is the least significant.

This means that:

[3] = moves to bit positions 0-7
[2] = moves to bit positions 8-15
[1] = moves to bit positions 16-23
[0] = moves to bit positions 24-31

val = bytes[3] | (bytes[2] << 8) | (bytes[1] << 16) | (bytes[0] << 32);

This is close, but not quite right. You are skipping over positions 24-31 and moving [0] to positions 32-39 (which don't exist in a 32-bit var, hence the warning).

Also, to reiterate what helios already said... don't use char. Whether it is signed is implementation-defined, and where it is signed you might get weirdness when shifting into a larger variable type.
Yep I deleted what I had, I saw the mistake right away :p Guess I didn't delete quick enough. Also I already had it like
    uint8_t bytes[2];
    f.read((char*)bytes, 2);
now, I just didn't update the post, my bad. I should be good now, thank you both.