Char array element spacing

Correct me if I'm wrong:

The smallest unit that can be allocated is 1 byte, so:
002BF560 - points to some byte
002BF561 - points to the next byte

A char always takes exactly one byte of memory. I've also checked that an int takes up 4 bytes on my machine.

A short piece of code before the final question ;) :

#include <iostream>

int main()
{
	char array[] = { 1, 2, 3, 4, 5, 6, 7 };

	std::cout << "Size of char: " << sizeof(char) << "\n";
	std::cout << "Size of int: " << sizeof(int) << "\n";

	std::cout << std::hex << *(reinterpret_cast<int*>(array)) << std::endl;

	return 0;
}


Why does the last cout line print 4030201, which is the array[] elements in reverse order with zeros in between?

If an int takes up 4 bytes while a char takes 1 byte, shouldn't it print 1234 (since the pointer was cast from 1-byte char to 4-byte int)?

As for the reverse order, I think it has something to do with endianness, but why are zeros added in between?

Thanks in advance.
Good question, simple answer :-)

Try using multiple digits instead, and the answer will reveal itself.

char array[] = { 171, 205, 239, 186, 5, 6, 7 };
or
char array[] = { 0xab, 0xcd, 0xef, 0xba, 5, 6, 7 };
You are displaying the byte values. The order is reversed because of endianness.

http://en.wikipedia.org/wiki/Endianness
Correct me if I'm wrong:

The smallest unit that can be allocated is 1 byte, so:
002BF560 - points to some byte
002BF561 - points to the next byte
Strictly speaking, that is wrong. The representation of pointers in memory and the number of bits in a byte are not fixed by the standard.
If you move away from x86, you will find many strange and bizarre things.
Good question, simple answer :-)

Try using multiple digits instead, and the answer will reveal itself.

char array[] = { 171, 205, 239, 186, 5, 6, 7 };
or
char array[] = { 0xab, 0xcd, 0xef, 0xba, 5, 6, 7 };



So, is it because 1 byte = 8 bits, while a single hex digit is only 4 bits, so an extra hex digit is needed to display a whole char value?

Although I've read that 1 byte isn't always equal to 8 bits.
Yes Oseri, if the "cout <<" had forced decimal output, the OP would never have had this question.

I think in the 21st century, a byte is always 8 bits.

However, a char may or may not be a byte, although it is on most systems.
I think in the 21st century, a byte is always 8 bits.
Sadly, no :(.
On specific embedded hardware a byte can be 32 or even 64 bits. DSP chips often work that way:
http://lists.xiph.org/pipermail/speex-dev/2006-April/004379.html
Because these processors have 16 bit char size, there is some special code
in bits.c to handle the packing.
Yes Oseri, if the "cout <<" had forced decimal output, the OP would never have had this question.


Well, I'm the OP ;)

Thank you all for help.
@MiiNiPaa Just checked out your link, I didn't see any references to 32 or 64 bit bytes. Could you copy and paste please?
Just checked out your link, I didn't see any references to 32 or 64 bit bytes
That is probably because it discusses a chip with a 16-bit byte.
It is hard to find public documentation on such chips, as it is usually proprietary. Here is a wiki link:
http://en.wikipedia.org/wiki/Super_Harvard_Architecture_Single-Chip_Computer
it knows nothing of 8-bit or 16-bit values since each address is used to point to a whole 32-bit word, not just an octet. It is thus neither little-endian nor big-endian, though a compiler may use either convention if it implements 64-bit data and/or some way to pack multiple 8-bit or 16-bit values into a single 32-bit word. Analog Devices chose to avoid the issue by using a 32-bit char in their C compiler.

http://discuss.fogcreek.com/joelonsoftware/?cmd=show&ixPost=75195
I've worked on a system (using a TI DSP processor) where char, short, int and long were ALL 32 bits; the CPU only natively handled 32-bit chunks, nothing shorter.

https://books.google.ru/books?id=70scPIwK7cUC&pg=PA486&lpg=PA486&dq=32+bit+char+dsp&source=bl&ots=4bkmmDGSBQ&sig=v_YUmbNSpg9HVlKSevQLjJpd_kg&hl=ru&sa=X&ei=rNH-VKm-Lqn6ywPD9oGYDA&redir_esc=y#v=onepage&q=32%20bit%20char%20dsp&f=false (the table is self-explanatory)
It still appears to me that you're talking about a 16-bit or 32-bit char, not byte.

Do you explicitly mean char which could be different sizes on different architectures
However, a char may or may not be a byte

or byte which is actually defined as 8-bits in the dictionary?
Do you explicitly mean char which could be different sizes on different architectures
However, a char may or may not be a byte

or byte which is actually defined as 8-bits in the dictionary?
char size and byte can be used interchangeably, as char is defined to have a size of 1 byte:
Standard wrote:
1.7.1 
The fundamental storage unit in the C++ memory model is the byte. A byte is at least large enough to contain any member of the basic execution character set (2.3) and the eight-bit code units of the Unicode UTF-8 encoding form and is composed of a contiguous sequence of bits, the number of which is implementation defined.

5.3.3.1 
The sizeof operator yields the number of bytes in the object representation of its operand.
[...]
sizeof(char), sizeof(signed char) and sizeof(unsigned char) are 1.


byte which is actually defined as 8-bits in the dictionary?
Byte was never defined as containing only 8 bits before the 2008 IEC standard, which is not universally followed. You are mixing it up with octet.
In fact, when the term byte was introduced, it contained only 4 or 6 bits depending on the machine. With the creation of ASCII, the most commonly used byte size was 7 bits. That created the ugliest thing in existence, called KOI7.
http://en.wikipedia.org/wiki/Byte
The byte (/ˈbaɪt/) is a unit of digital information in computing and telecommunications that most commonly consists of eight bits.
[...]
The unit octet was defined to explicitly denote a sequence of 8 bits because of the ambiguity associated at the time with the byte.


I regularly see code which checks CHAR_BIT, and I did see an implementation that defined it as something other than 8. And I saw a platform where the size of every single base type was 1 because of a 64-bit byte.
Thanks for that, but don't misunderstand me. I'm not insisting that a byte is 8 bits. I was trying to understand clearly if you meant byte or char.

I vaguely remember the old-timey bytes and things like 0.5 bits that were tried in the past, but I honestly haven't seen them used since leaving school, so I was curious about which architectures use them, and I didn't want to set off on a wild goose chase if you had meant char, not byte.

Also, blame my dodgy "dictionary" :-)

I had another look at the links you posted before, and if I had gone far enough in the discussion I'd have seen the non 8-bit byte DSP posts.

I've worked on a system (using a TI DSP processor) where char, short, int and long were ALL 32 bits; the CPU only natively handled 32-bit chunks, nothing shorter.


So that would make for 32-bit bytes then...
things like 0.5 bits that were tried in the past, but I honestly haven't seen them used since leaving school
You have probably seen them when learning about information theory. It is roughly like 1.5 persons.
Topic archived. No new replies allowed.