Bits in a byte

I've read that the number of bits in a byte (in C++) is implementation- or system-dependent. What does that mean? Does it refer to the implementation of the C++ compiler, the processor architecture, or something else?

I thought a byte was always 8 bits, but:

"A byte usually means an 8-bit unit of memory. Byte in this sense is the unit of measurement that describes the amount of memory in a computer, with a kilobyte equal to 1,024 bytes and a megabyte equal to 1,024 kilobytes.

However, C++ defines byte differently. The C++ byte consists of at least enough adjacent bits to accommodate the basic character set for the implementation. That is, the number of possible values must equal or exceed the number of distinct characters. In the United States, the basic character sets are usually the ASCII and EBCDIC sets, each of which can be accommodated by 8 bits, so the C++ byte is typically 8 bits on systems using those character sets. However, international programming can require much larger character sets, such as Unicode, so some implementations may use a 16-bit byte or even a 32-bit byte."

I mean, on a particular system, how do I know how many bits are being used in a byte by c++? Or am I thinking about this the wrong way and just confused? Please help.

helios (17504)

However, international programming can require much larger character sets, such as Unicode, so some implementations may use a 16-bit byte or even a 32-bit byte.

What a load of bollocks. They're confusing bytes and characters.

The size of a byte is indeed CPU-dependent, but I don't know of any modern CPU where a byte is not the same as an octet. Maybe some embedded CPUs use a different size, but I don't know for sure.
IIRC, distinguishing between byte and octet is also important in low-level networking.

I don't think there's any way to know what size of byte the CPU uses without knowing the CPU you're writing for, or using system calls (as the OS knows what the CPU it was written for is).

Bazzy (6281)

how do I know how many bits are being used in a byte by c++?

There is a macro called CHAR_BIT, defined in <climits>, which gives the number of bits per byte.

gnobber (11)

Ok that clears it up. Thanks!

Hammurabi (399)

From the C++ Standard:

1.7 The C++ memory model
The fundamental storage unit in the C++ memory model is the byte. A byte is at least large enough to contain any member of the basic execution character set and is composed of a contiguous sequence of bits, the number of which is implementation-defined. The least significant bit is called the low-order bit; the most significant bit is called the high-order bit. The memory available to a C++ program consists of one or more sequences of contiguous bytes. Every byte has a unique address.

kempofighter (1183)

http://en.wikipedia.org/wiki/Byte

Even Wikipedia states that a byte is "most often" 8 bits. I never really thought about it. I have yet to write a program where a byte isn't 8 bits. What confuses me about the statement is the info on Unicode. Yes, there is a Unicode standard, but that doesn't change the fact that a byte is still 8 bits on most computer architectures. A Unicode character is simply 2 bytes. I guess that stems from the fact that ASCII is still the most basic execution character set, so the byte being 8 bits must have to do with that.

Duthomhas (13116)

A Unicode code point needs up to 21 bits, which fits in 3 bytes. This typically translates to 32-bit characters that can hold any Unicode character, or 16-bit characters that can hold only the Basic Multilingual Plane unless pairs of them (surrogates) are used.

However, hardware exists where a 'byte' is anything from 6 to 12 bits (IIRC). The C and C++ standards don't permit fewer than 8 bits per byte, though. Fortunately the era of the 7-bit byte is bygone.

I presume (and this is opinion) that the standard was written to accommodate ASCII and possibly various extended ASCII code pages. Note also that the standard doesn't say whether plain char is signed or unsigned, which leads me to stick with the "ASCII suffices" theory.

Fortunately these days we have things like UTF-8... but in any case, an 8-bit byte representation is a convenient requirement when writing multi-platform C and C++ software. Even if your hardware actually has 9-bit bytes, presenting it as 8-bit to the programmer makes life much easier.
