Displaying Unicode characters on Windows, Mac and Linux?

If I call setlocale (see below) in the main function, reading user input, storing it and displaying it again works fine, even for characters like "ů".

The problem is that when I manipulate the string, e.g. with the substr function or in a loop, those characters are displayed as something else. So I wrote a function to replace them.

As this page http://zuga.net/articles/unicode/character/016F/ explains, in C++ "ů" is written "\u016F", and so on. This works with the Visual Studio/Windows compiler, but with the Linux compiler I get a warning and a bunch of numbers instead of "ů" in the output.

How do I solve this for both platforms?

  // At the top of main()
  setlocale(LC_ALL, "");

  // My function, works in Windows
  return '\u016F';
'\u016F' is pretty much the only way you can write it that is not portable C++ (technically, it is treated like a multi-character literal such as '1234', with an implementation-defined value).

Your choices are between strings: "\u016F", u8"\u016F", u"\u016F", U"\u016F", L"\u016F" and single code units u'\u016F', U'\u016F', L'\u016F'

Personally, I prefer UTF-8 strings everywhere, which, on non-Windows, are simply "ů" (or "\u016F", same thing), no setlocale needed: https://wandbox.org/permlink/rpQghvgTNYNtWqrK

If your Windows compiler is not too old, you can pass /utf-8 to the compiler, save your source as UTF-8, and do the same thing (except that you'll also need to call SetConsoleOutputCP(CP_UTF8)).
So basically, since the Linux compiler reports a "multi-character character constant", the literal needs double quotation marks. In the Windows compiler 'ů' actually works.

But, I shouldn't need SetConsoleOutputCP(CP_UTF8) since I've used setlocale in the main function?

Any good source why this happens when a string is manipulated and not before?
"In the Windows compiler 'ů' actually works."

Define "works". It doesn't issue a warning when compiling, but when I print 'ů' in VS2017, I get 50607 as output.
Same when I use online MSVC here: https://rextester.com/HEXJTF42374

You could always try
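#include <iostream>

int main() {
    // Each character type takes its own literal prefix.
    char     x1  = 88;      // code 88 is 'X' in ASCII
    wchar_t  i9  {L'A'};
    char16_t k10 {u'a'};
    char32_t L11 {U'A'};
    // char32_t can hold any code point, e.g. U'\U0000222B'

    // Print the numeric code unit values. Streaming char16_t or char32_t
    // straight to std::cout is not portable, so convert to an integer first.
    std::cout << x1 << ' '
              << static_cast<unsigned long>(i9) << ' '
              << static_cast<unsigned long>(k10) << ' '
              << static_cast<unsigned long>(L11) << '\n';
}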
When I instead use setlocale(LC_ALL, "utf-8"), some characters are stored correctly in variables, but the same characters written in predefined strings are displayed incorrectly.

In the Windows compiler using regex with ASCII hex works, but in the Linux compiler neither works with regex. Very confusing for a newbie brain.