Printing multi-byte characters

The UTF-8 code units of the character 我 are: E6 88 91.

Suppose that I have these hexadecimal values stored in an array:
unsigned char bytesArray[3];
bytesArray[0] = 0xE6;
bytesArray[1] = 0x88;
bytesArray[2] = 0x91;


Given this representation of a multi-byte character, I would like to print the corresponding glyph (我) on the console.
How would you suggest I do that?

Thank you for helping.
Sadly, this is OS-dependent. On *nix terminals, I believe output is UTF-8 by default. On Windows, I believe you have to change a setting, though I forget exactly how.
This has become a frequently asked question recently.

On Linux, and other systems that support UTF-8 at the console driver level, just print it:

#include <iostream>

int main()
{
    char bytesArray[3] = {'\xE6', '\x88', '\x91'};
    std::cout.write(bytesArray, sizeof bytesArray);
}
demo: http://ideone.com/BdRpfZ

On stricter systems, you'd have to set a locale to choose the correct encoding (after all, why default to UTF-8? It could just as well have been GB18030). Locale names are OS-dependent. I'm using the POSIX-style locale name for US English below, but any UTF-8 locale would work the same way.

#include <iostream>
#include <locale>

int main()
{
    char bytesArray[3] = {'\xE6', '\x88', '\x91'};
    std::locale::global(std::locale("en_US.utf8"));
    std::cout.imbue(std::locale());
    std::cout.write(bytesArray, sizeof bytesArray);
}

That is how C++ is supposed to work with Unicode. (So is C, for that matter: printf() and scanf() deal in multibyte sequences.)


Now, on systems that did not bother implementing Unicode for their console output, you have to convert from UTF-8 to a wide string, and then output it using the standard wide-character functionality (which also requires a locale to be set).

You can do it the C++11 way:

#include <iostream>
#include <locale>
#include <codecvt>
#include <string>

int main()
{
    char bytesArray[] = {'\xE6', '\x88', '\x91'};
    std::wstring_convert<std::codecvt_utf8<wchar_t>, wchar_t> conv;
    std::wstring wide = conv.from_bytes(bytesArray,
                                        bytesArray + sizeof bytesArray);

    std::locale::global(std::locale("en_US.utf8"));
    std::wcout.imbue(std::locale());
    std::wcout << wide << '\n';
}
(tested with clang++ on Linux)

Or the C way:
#include <iostream>
#include <locale>
#include <cwchar>

int main()
{
    std::locale::global(std::locale("en_US.utf8"));
    std::wcout.imbue(std::locale());

    char bytesArray[] = {'\xE6', '\x88', '\x91'};

    std::mbstate_t state = std::mbstate_t();
    const char* end = bytesArray + sizeof bytesArray;
    const char* ptr = bytesArray;
    int len;
    wchar_t wc;
    while( (len = std::mbrtowc(&wc, ptr, end-ptr, &state)) > 0)
    {
        std::wcout << wc;
        ptr += len;
    }
}

(tested with gcc on Linux)

On Windows, you can use either the C++11 or the C way, but you also have to enable wide-character output on the console using its own non-portable method:

#include <iostream>
#include <codecvt>
#include <string>
#include <cstdio>   // _fileno, stdout
#include <fcntl.h>  // _O_WTEXT
#include <io.h>     // _setmode

int main()
{
    char bytesArray[] = {'\xE6', '\x88', '\x91'};
    std::wstring_convert<std::codecvt_utf8<wchar_t>, wchar_t> conv;
    std::wstring wide = conv.from_bytes(bytesArray,
                                        bytesArray + sizeof bytesArray);

    _setmode(_fileno(stdout), _O_WTEXT);  // put stdout into wide (Unicode) text mode
    std::wcout << wide << '\n';
}
(Tested with Visual Studio 2012, but I remember this working with 2010 as well.) Note that the default console fonts on most installations of Windows do not include those characters. Either install a font that does, or just write your output to a file, which you can then open with Notepad (but in that case you'll need more than just one Chinese character for its autodetection to realize it's dealing with Unicode).
Thank you very much for this answer.