Endianness in C++

I have some example code:

#include <cstdio>

int main()
{
    char arr=0x11;

    printf("c=%c\n", arr);
    return 0;
}


With this example, the compiler will apply implicit type conversions, so it behaves like this:
#include <cstdio>

int main()
{
  char arr = 17;
  printf("c=%c\n", static_cast<int>(arr));
  return 0;
}

I understand that the output is not the same between a little-endian machine and a big-endian machine. Is that correct?
(I don't have a big-endian machine to test on.)
The output is exactly the same whether big endian, little endian, middle endian, mixed endian, or anything you can dream up.

The static_cast is casting the value, not reinterpreting the bytes, so the endianness doesn't matter.

And I think even in C++, a variadic function like printf will have arguments smaller than an int promoted to an int whether or not you cast them. So both of your programs are passing ints to printf, which then prints them as a character.
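
A minimal sketch of those default promotions in action (the variable names are just for illustration):

#include <cstdio>

int main()
{
    char c = 0x11;   // narrower than int
    short s = 5;     // narrower than int
    float f = 2.5f;  // narrower than double
    // All three undergo the default argument promotions before printf
    // sees them: c and s arrive as int, f arrives as double.
    printf("%d %d %f\n", c, s, f);
    return 0;
}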
> I understand that the output is not the same between a little-endian machine and a big-endian machine. Is that correct?
No.

int printf(const char * restrict format, ...);

Arguments which match the ellipsis undergo default promotion rules. So anything narrower than an int or double is promoted to int or double respectively.

Your cast is just doing explicitly what the compiler would do implicitly anyway.
I don't think you're going to see anything different because of a change of architecture.
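
For example, these two calls are equivalent; a quick sketch:

#include <cstdio>

int main()
{
    char arr = 0x11;
    // Both calls pass exactly the same int to printf: the first through
    // the implicit default promotion, the second through an explicit cast.
    printf("c=%c\n", arr);
    printf("c=%c\n", static_cast<int>(arr));
    return 0;
}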
Why is the code below affected by little endian and big endian? (https://www.geeksforgeeks.org/little-and-big-endian-mystery/)
#include <stdio.h> 
int main() 
{ 
    unsigned char arr[2] = {0x01, 0x00}; 
    unsigned short int x = *(unsigned short int *) arr; 
    printf("%d", x); 
    getchar(); 
    return 0; 
} 

When does endianness affect code?
Thanks for your support.
Because that just moves a bunch of bits around with no regard to type.

It's the same as
memcpy(&x, arr, 2);
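
A minimal sketch of that memcpy version (assuming a 2-byte unsigned short); unlike the pointer cast, memcpy also avoids any alignment or strict-aliasing problems:

#include <cstdio>
#include <cstring>

int main()
{
    unsigned char arr[2] = {0x01, 0x00};
    unsigned short x = 0;
    std::memcpy(&x, arr, 2);  // raw byte copy, no value conversion
    printf("%u\n", x);        // prints 1 on little-endian, 256 on big-endian
    return 0;
}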
> The static_cast is casting the value, not reinterpreting the bytes, so the endianness doesn't matter.

As @dutch commented. Then with this:
#include <stdio.h> 
int main() 
{ 
    unsigned char i = 0x11;
    unsigned int x = *(unsigned int *)&i; 
    printf("%d", x); 
    return 0; 
} 

and
#include <stdio.h> 
int main() 
{ 
    unsigned char i = 0x11;
    unsigned int x = i; 
    printf("%d", x); 
    return 0; 
} 

the output is not the same between little endian and big endian. Is that correct?
(I tried it with https://cppinsights.io/.)
> unsigned int x = *(unsigned int *)&i;
vs
> unsigned int x = i;
These two things are absolutely NOT equivalent in any way.

One is "make a pointer, pretend it points to something else and then dereference it".
Assuming it doesn't actually blow up when it tries to read invalid memory locations, look forward to getting garbage values from the missing bytes.
Pointer casting is the "bash the square peg into the round hole" operator.

The other is a proper value preserving assignment made through the language type promotion rules.

Oh, and in case you're wondering
- there's also something called middle endian.
- some machines can be configured to be big-endian or little-endian.
http://c-faq.com/misc/endiantest.html
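
In the spirit of that c-faq test, a minimal runtime check (a sketch; memcpy keeps it well-defined):

#include <cstdio>
#include <cstring>

int main()
{
    unsigned int one = 1;
    unsigned char first = 0;
    std::memcpy(&first, &one, 1);  // look at the lowest-addressed byte
    if (first == 1)
        printf("little-endian\n");
    else
        printf("big-endian (or one of the rarer orderings)\n");
    return 0;
}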
Although I'm still not clear about how endianness affects code, thank you very much @salem c.
Mostly, you don't have to worry about it within the confines of your program.

It's only when you start interacting with the outside world that it can become an issue.

So for example, if you wanted to send a short over the network, you would use this.
https://linux.die.net/man/3/htons
On machines which are big-endian to start with, this function is effectively a NOP.
But for little-endian machines, it does some actual work to rearrange the short into the correct byte order.

Your code calls htons() all the same, and works regardless of the machine it's actually running on. This is how you achieve portability.
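
A minimal sketch, assuming a POSIX system (Windows gets the same functions from <winsock2.h>):

#include <cstdio>
#include <arpa/inet.h>  // POSIX header providing htons()

int main()
{
    unsigned short port = 0x1234;
    unsigned short wire = htons(port);  // host order -> network (big-endian) order
    // On a big-endian host wire == port; on a little-endian host the two
    // bytes are swapped. Either way, the bytes on the wire are the same.
    printf("host: 0x%04x  network: 0x%04x\n", (unsigned)port, (unsigned)wire);
    return 0;
}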

Similarly when dealing with files: file formats containing embedded sizes, lengths, dimensions, etc. pick an endian representation. PNG, for example:
https://en.wikipedia.org/wiki/Portable_Network_Graphics#%22Chunks%22_within_the_file
https://linux.die.net/man/3/endian
Portable code uses portability layers to either eliminate or minimise the disruption when porting the code from one platform to another.
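
One common portability-layer trick is to never reinterpret memory at all and instead assemble values from bytes with shifts; a sketch using a hypothetical helper read_be16():

#include <cstdio>

// Read a 16-bit big-endian value from a byte buffer. This works identically
// on every host because it uses arithmetic, not memory reinterpretation.
unsigned read_be16(const unsigned char *p)
{
    return (unsigned)p[0] << 8 | p[1];
}

int main()
{
    unsigned char buf[2] = {0x01, 0x00};  // big-endian encoding of 256
    printf("%u\n", read_be16(buf));       // 256 on any machine
    return 0;
}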

Thanks @salem c.
I worry that my code may be correct on my machine (e.g. little endian) but not correct on another machine (e.g. big endian). I want my code to be correct on any machine.
Let's start with your original example:
    char arr=0x11;
    printf("c=%c\n", arr);

The compiler converts char arr to an int when passing it to printf. It does this using whatever endianness the current computer uses.

When printf() sees the %c, it knows that the next argument was passed as an int, so it converts it to a char. It does this using whatever endianness the current computer uses.

So it doesn't matter what the endian-ness is. Since the compiler was aware of the endian-ness when it created the program, it does the conversions correctly either way.
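
To make that concrete, a sketch that prints the bytes of the promoted value (assuming a 4-byte int):

#include <cstdio>
#include <cstring>

int main()
{
    char c = 0x11;
    int promoted = c;  // value conversion: 17 on every machine
    unsigned char bytes[sizeof promoted];
    std::memcpy(bytes, &promoted, sizeof promoted);
    // The value is always 17; only the byte layout differs:
    // little-endian prints "11 00 00 00", big-endian prints "00 00 00 11".
    for (unsigned i = 0; i < sizeof promoted; ++i)
        printf("%02x ", bytes[i]);
    printf("\n");
    return 0;
}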

Now the second example:
    unsigned char arr[2] = {0x01, 0x00}; 
    unsigned short int x = *(unsigned short int *) arr;

When you convert arr to an unsigned short int * on the second line, you're telling the compiler "I'm guaranteeing that the memory at arr contains a properly formatted short int." But it doesn't contain that. The first byte contains 0x01 and the second contains 0x00. So when you assign those bytes to x, you get whatever value those two bytes represent. The answer is different depending on the endian-ness of the computer.
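
Worked out both ways, a sketch computing what each byte order makes of those two bytes:

#include <cstdio>

int main()
{
    unsigned char arr[2] = {0x01, 0x00};
    // Little-endian: low byte first.  Big-endian: high byte first.
    unsigned as_little = arr[0] | (unsigned)arr[1] << 8;  // 0x0001 = 1
    unsigned as_big    = (unsigned)arr[0] << 8 | arr[1];  // 0x0100 = 256
    printf("little-endian reads %u, big-endian reads %u\n", as_little, as_big);
    return 0;
}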
> Although I'm still not clear about how endianness affects code...

here is a real world example... I had a stack of 10 or 12 CDs with binary data from a backwards (opposite-endian) machine. It was one of my first real tasks at my first real job :) When you read binary data in bulk in C++, it is done BYTEWISE, but the bytes were backwards on all the integers (except the 1-byte ones). So the guy I was replacing was reading a 'record' and flipping the integers (dozens per record) in CODE with his hand-rolled junk code. It was taking 4 or 5 hours to read each disc. I took that out, used the chip's built-in flip instruction, and read each disc in a few minutes. Fun times.

Examples aside, you see it when you get data from an external source (file, network, etc.) that comes in bytewise (not as text, such as typed into a GUI) in so-called binary format. If you have that in your program, it is at risk. The second place you see it is when you do byte-level access directly on integers via hard-coded offsets in your code, and you move it to the other platform where the bytes are in the wrong places. That does not always matter; if you are treating the ints as a container for bits and the integer's numeric value is not used, it probably works anyway. But if you needed the integer's numeric value, it will break.
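
That "chip's built-in flip instruction" is exposed as a compiler intrinsic; a sketch assuming GCC or Clang (MSVC has _byteswap_ulong, and C++23 adds std::byteswap):

#include <cstdio>
#include <cstdint>

int main()
{
    uint32_t v = 0x11223344;
    // __builtin_bswap32 compiles down to the CPU's single byte-swap instruction.
    uint32_t flipped = __builtin_bswap32(v);
    printf("0x%08x -> 0x%08x\n", v, flipped);  // 0x11223344 -> 0x44332211
    return 0;
}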
By the way, most development systems have fast, easy, mostly-portable ways to handle this stuff, thanks to the internet. IP packet headers use big-endian ordering. This is called "network byte ordering". That must be converted to the "host byte ordering." There are standard functions for this:
htons() // convert short (16 bit) number from host byte ordering to network byte ordering
ntohs() // convert short from network order to host order.
htonl() // convert long (32 bit) number from host to network order
ntohl() // convert long from network to host order. 

On big-endian machines these are no-ops. On other machines they are optimized. The x86 CPU can do it in one instruction.
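
A minimal round-trip sketch, again assuming a POSIX system:

#include <cstdio>
#include <cstdint>
#include <arpa/inet.h>  // POSIX header providing htonl()/ntohl()

int main()
{
    uint32_t host = 0x12345678;
    uint32_t net  = htonl(host);  // host order -> network (big-endian) order
    uint32_t back = ntohl(net);   // network order -> host order
    printf("%s\n", back == host ? "round trip ok" : "mismatch");
    return 0;
}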