EXTENDED ASCII - How to 'cout' ?

Hi to everyone!
I'm trying to output, in my console application, characters from the OEM extended ASCII set, which is the one in the photo:
https://www.atlantiswordprocessor.com/en/forum/files/ascii_oem_155.gif

I'm using CodeBlocks17 on Windows 8.1.

When I type the corresponding key combination in the editor I get the correct character; for example, "ALT + 200" writes "╚".
But if I write:
 
  cout << "╚";

The console outputs 3 different characters!

To obtain one of the OEM extended ASCII characters I have to use the corresponding integer instead, like:
 
  cout << char(200);



How can I correctly output these extended ASCII characters by writing them directly in the 'cout' statement using "ALT + numbers"?
Just a guess, but check the character encoding of your environment.

for (int i = 0; i < 256; i++) cout << '\n' << '[' << i << " = " << (char)i << ']';
ASCII proper defines only 128 characters, so that for-loop might help you identify whether your compiler and console support the respective character (with the char data type).

I think it's because your compiler might not support the character.
Since those "extended" ASCII characters have values outside the range of a signed char you will need to print them as either unsigned char or int values.

To obtain one of the OEM extended ASCII characters I have to use the corresponding integer like: cout << char(200);

Actually that result would be undefined behavior on most common systems since the value is larger than can be held in a signed char.

Hmm jib, but that's not a problem on VS2017 (i.e. char being signed or unsigned):

#include <iostream>
using std::cout;

int main() {
	unsigned char foo;
	foo = 129;

	signed char bar;
	bar = 129;

	cout << foo << ' ' << bar << '\n' << (int)foo << ' ' << (int)bar;
}

ü ü
129 -127


But maybe it's only VS2017 that handles it like that; I wouldn't know.

(That was referring to your second paragraph.)

Also, I thought extended ASCII meant characters 128 - 255 (I know that you were referring to the pic though).

edit: But why is the 130th character -127 for signed??
The console outputs 3 different characters!

It's outputting the character in UTF-8, which requires three bytes for box drawing characters.
But apparently the console isn't set to recognize UTF-8.
If I remember correctly, Windows uses UTF-16, so you might try using wcout instead of cout and prefixing your strings with L.

─  ━  │  ┃  ┄  ┅  ┆  ┇    ┈  ┉  ┊  ┋  ┌  ┍  ┎  ┏

┐  ┑  ┒  ┓  └  ┕  ┖  ┗    ┘  ┙  ┚  ┛  ├  ┝  ┞  ┟

┠  ┡  ┢  ┣  ┤  ┥  ┦  ┧    ┨  ┩  ┪  ┫  ┬  ┭  ┮  ┯

┰  ┱  ┲  ┳  ┴  ┵  ┶  ┷    ┸  ┹  ┺  ┻  ┼  ┽  ┾  ┿

╀  ╁  ╂  ╃  ╄  ╅  ╆  ╇    ╈  ╉  ╊  ╋  ╌  ╍  ╎  ╏

═  ║  ╒  ╓  ╔  ╕  ╖  ╗    ╘  ╙  ╚  ╛  ╜  ╝  ╞  ╟

╠  ╡  ╢  ╣  ╤  ╥  ╦  ╧    ╨  ╩  ╪  ╫  ╬  ╭  ╮  ╯

╰  ╱  ╲  ╳  ╴  ╵  ╶  ╷    ╸  ╹  ╺  ╻  ╼  ╽  ╾  ╿

▀  ▁  ▂  ▃  ▄  ▅  ▆  ▇    █  ▉  ▊  ▋  ▌  ▍  ▎  ▏

▐  ░  ▒  ▓  ▔  ▕  ▖  ▗    ▘  ▙  ▚  ▛  ▜  ▝  ▞  ▟


Unicode: http://jrgraphix.net/r/Unicode/
UTF-8 Tool: http://www.ltg.ed.ac.uk/~richard/utf-8.html
dutch are you using linux?

Windows consoles support only a few characters, and even for those you need to set the console to the right mode.
@Grime, yes, I'm on linux. Try this on windows:

#include <iostream>

int main() {
    using std::wcout;

    wcout << L'┌';
    for (int i = 0; i < 18; ++i)
        wcout << L'─';
    wcout << L'┐' << L'\n';

    for (int i = 0; i < 8; ++i) {
        wcout << L'│';
        for (int i = 0; i < 18; ++i)
            wcout << L' ';
        wcout << L'│' << L'\n';
    }

    wcout << L'└';
    for (int i = 0; i < 18; ++i)
        wcout << L'─';
    wcout << L'┘' << L'\n';
}

????????????????????
?                  ?
?                  ?
?                  ?
?                  ?
?                  ?
?                  ?
?                  ?
?                  ?
????????????????????
Press any key to continue . . .


I saved the file as Unicode and got this:
Press any key to continue . . .


edit:
But you can use
_setmode(_fileno(stdout), _O_U8TEXT);
(from <io.h> and <fcntl.h>; it puts stdout into UTF-8 mode)

┌──────────────────┐
│                  │
│                  │
│                  │
│                  │
│                  │
│                  │
│                  │
│                  │
└──────────────────┘


However, the Windows console only supports a few Unicode characters, not nearly as many as Linux.
That's running the program in my last post, the one that uses wcout? I thought those characters would be UTF-16, not UTF-8. Did you compile it as unicode (I think you need to set that in the windows IDE somewhere)? Maybe the main should be wmain?

Anyway, it looks like it basically works, and as long as it supports the box-drawing characters then that's pretty useful.
Hmm jib, but that's not a problem on VS2017 (i.e. char being signed or unsigned)

It is a problem. Just because it appears to work doesn't make it correct. Overflowing a signed integral value produces undefined behavior according to the standard, and with VS2017 a char is signed, meaning that valid values are between -126 and +127.
A while ago, while I was printing the characters of char in a loop, I noticed that the characters were periodic.

i.e
cout << (char)64 << ' ' << char(64 + 256);


So I thought the compiler, when it found a value higher than 255, did something like this:
while (value > 255) value -= 255;

Overflowing a signed integral value


But it shouldn't be possible to overflow in a datatype = integer assignment, right? Other data types don't allow you to store a larger number than they can hold, so why does char? With char you can even assign 1205358923567823409724308574380673698743345 and it won't complain.

So why is char different?

And what exactly is happening when I write signed char val = 255? What happens when there is an integral overflow here?

Assigning any value to both signed and unsigned gives them the same character. Why is that, if it's undefined behavior? And even at a number like 3535353535 this doesn't break.

Confusing.
But it shouldn't be possible to overflow in a datatype = integer assignment, right?

Yes, it's possible.

Look at this small program:
#include <iostream>
#include <iomanip>

using namespace std;

int main()
{
    char test1 = 300;
    char test = 500;
    //cin >> test;

    cout << test << "  " << static_cast<char>(test);

}


And the messages produced by my compiler:

1
2
3
4
5
6
||=== Build: Debug in c++homework (compiler: GCC 8-1) ===|
main.cpp||In function ‘int main()’:|
main.cpp|598|warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘300’ to ‘','’ [-Woverflow]|
main.cpp|599|warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘500’ to ‘'\37777777764'’ [-Woverflow]|
main.cpp|598|warning: unused variable ‘test1’ [-Wunused-variable]|
||=== Build finished: 0 error(s), 3 warning(s) (0 minute(s), 1 second(s)) ===|

Remember that those "magic numbers" are of type int.

Overflow does not really work like that while loop, but the end result is similar.
binary..
1110
add 1:
1111
add 1, overflow:
0000
(really 10000 but the leading 1 is not stored but 'lost') so it just wraps around 'naturally' without the cpu or compiler 'doing' anything.

and when it's signed, this is the magic of 2's complement. 2's complement says that to get a negative number, you take the positive number, NOT (invert) all the bits, and add 1.
so
0011 is 2 + 1, which is 3. negative 3 is then
NOT(0011) = 1100, plus 1, which is 1101

lets add those up:
0011
1101
___0 carry 1
__00 carry 1 again
_000 carry 1 again
0000 carry 1 again (lost, -3 + 3 is zero)

when you put 255 into a signed byte it's just going to assign the bits. then when you evaluate it, you get the value of those bits interpreted as a negative number.


Okay, so if you had a large number like 3643968430968346 which you had to convert to a char,
only the 8 least significant bits are kept (correct me if I'm wrong).

And if the char is unsigned, the 8th most significant bit would represent the sign, which can be overwritten by a value above 128.

Then for signed values, values above 128 shall overflow and give negative numbers. But as we noticed, both signed and unsigned will have the same character when assigned any value.

So for signed values, the first 128 characters are 0 to 127 and next 128 characters are -127 to 1.
I would have thought that the first 128 characters are -127 to 1 and next 128 characters are 0 to 128.

Is that correct? That's confusing.

In that case the binary logic is taking care of overflow perfectly (the last para about how the values of signed are stored), then in what case will there be a problem for overflowing char?
Okay so if you had a large number like 3643968430968346 which you had to take to a char,
Only 8 least significant bits are taken to the char (correct me if I'm wrong).

Possibly wrong. Remember a char can be signed or unsigned depending on implementation. Most of the current common implementations use a signed char and in this case your assumption would be incorrect.

And if the char is unsigned, the 8th most significant bit would represent the sign, which can be overwritten by a value above 128.

Wrong. An unsigned char has no sign bit since an unsigned char can only contain positive values in the range of 0 to 255.


Then for signed values, values above 128 shall overflow and give negative numbers.

Wrong. Overflowing or underflowing a signed integral value produces UB. Which means anything is possible.


So for signed values, the first 128 characters are 0 to 127 and next 128 characters are -127 to 1.
I would have thought that the first 128 characters are -127 to 1 and next 128 characters are 0 to 128.

Is that correct? That's confusing.

Yes you do seem confused. A signed char can hold values in the range -126 to +127. There is really no use in trying to break the positive and negative values into "first" or "second", this is a single range.


In that case the binary logic is taking care of overflow perfectly (the last para about how the values of signed are stored), then in what case will there be a problem for overflowing char?

The problem is that the standard specifically states that overflowing a signed integral type produces undefined behavior. The standard doesn't require a two's complement implementation, or any other particular representation for that matter. This is not the case for unsigned integral types.




+ A char has 8 bits. 1 bit is for sign in the case of signed char right?

+ Now suppose you try to cast the number 128 to signed char.
7 bits can only represent 128 values; if 0 is included, 128 itself cannot be represented by the 7 bits.

So the sign bit is overwritten because the compiler thinks that the sign bit represents 128.
And our signed character looks like this with 8 bits: 1 0 0 0 0 0 0 0

Which is -0??

+ Also, in the case where a number with 9 bits is cast to a signed char, the most significant bit is lost and the next bit (the 8th) becomes the sign. Correct?

I just want to know what happens when you try to cast a large integer to char, though I know it's not useful.
Grime wrote:
Which is -0??

10000000 as a signed byte in 2's complement representation is the value -128.
Like this?

  1 0 0 0 0 0 0 0
NOT:
= 0 1 1 1 1 1 1 1
              + 1
_________________
= 1 1 1 1 1 1 1 1 = -127

because -(1 + 2 + 4 + 8 + 16 + 32 + 64)

But I got -127..

But,

-127 is the first value of signed char. We get -127 if we cast 128 to signed char.

But,

If we cast 128 to unsigned char, unsigned char is able to represent that with its 8 bits.
It will be the 129th value.

But,

128 cast to signed and 128 cast to unsigned both appear to print the same character. How!?


Also,

1 1 1 1 1 1 1 1
is -127 in signed
-(1+ 2 + 4 + 8 + 16 + 32 + 64)

1 2 4 8 16 32 64
0 1 1 1 1 1 1 1
is +127 in signed
+(1+ 2 + 4 + 8 + 16 + 32 + 64)

So including 0 the total values between [-127, 127] inclusive is 255 values. But char should be able to have 256 values. Where have I gone wrong??

11111111 in 2's complement is -1.

You need to read up on 2's complement. It works like a binary odometer. To represent a negative number, start at all zeroes and roll it backwards that many clicks. So 00000000 minus 1 is 11111111 which represents -1, then you get 11111110 for -2, and eventually 10000000, which is -128. Note that this means the magnitude of the most negative value is one more than that of the most positive value! 128 negative values, 127 positive values and 0 add up to 256 values.

@JLBorges, query:
Does the C++ standard require 2's complement representation? I ask because of this:
(6.7.1) 1. .... The range of representable values for a signed integer type is -2^(N-1) to 2^(N-1)-1 (inclusive), where N is called the width of the type.

http://eel.is/c++draft/basic.fundamental#1
jlb wrote:
A signed char can hold values in the range -126 to +127.

An (8-bit) signed char can hold integer values in the range [-128, 127].
http://eel.is/c++draft/basic.fundamental#1

This falls out of the twos-complement representation, which is required as of C++20.
http://eel.is/c++draft/basic.fundamental#3

Where have I gone wrong??

Your mental representation allows for a signed representation of zero: 0b1000'0000 and 0b0000'0000 would both be +/- 0.

Two's complement has some nice properties. Not only does it eliminate signed zeros, which are often undesirable in integer math, but it also has clean properties at the hardware level.

Edit:
@dutch, I didn't see your post. Yes, twos complement is required.