UTF-32 to UTF-8 range_error

Demonstration
// Example program
// Prints a UTF-8 string to stdout
// Fails miserably on non-BMP characters (throws std::range_error)!

#include <ciso646>
#include <iostream>
#include <codecvt>
#include <locale>
#include <string>

void print( const char32_t* s )
{
  std::u32string u = s;
  std::cout << std::wstring_convert <std::codecvt_utf8 <char32_t>, char32_t> ().to_bytes( u ) << std::endl;
}

int main()
{
  std::cout << "trial 1: " << std::flush;
  print( U"\U0002A6A5" );  // Chinese character found online (CJK Extension B, so actually non-BMP)
  
  std::cout << "trial 2: " << std::flush;
  print( U"\U001F1800" );  // non-BMP left arrow "<--"
}

What am I doing wrong?
Standard wrote:
2.14.5/15
Within char32_t and char16_t literals, any universal-character-names shall be within the range 0x0 to 0x10FFFF.
Clang actually catches this at compile time:
http://coliru.stacked-crooked.com/a/1dd342a1d0d37158
Er, I figured it out: I can't type.

U+1F1800 is not a valid Unicode code point (the maximum is U+10FFFF)... there's an extra 1 in there.
(It should have been U+1F800.)
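
For anyone who lands here with the same symptom, here is a minimal sketch of checking a code point before converting it. The names is_scalar_value and encode_utf8 are made up for illustration, and the encoder is hand-rolled rather than the std::wstring_convert call from my program (which is deprecated as of C++17), so treat it as one possible approach and not the library's behavior.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: true if cp is a Unicode scalar value,
// i.e. no greater than U+10FFFF and not a UTF-16 surrogate.
constexpr bool is_scalar_value( char32_t cp )
{
  return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

// Hypothetical helper: encode one code point as UTF-8.
// Throws std::range_error for anything that is not a scalar value.
std::string encode_utf8( char32_t cp )
{
  if (!is_scalar_value( cp ))
    throw std::range_error( "not a Unicode scalar value" );

  std::string out;
  if (cp < 0x80)            // 1 byte: 0xxxxxxx
  {
    out += static_cast <char> ( cp );
  }
  else if (cp < 0x800)      // 2 bytes: 110xxxxx 10xxxxxx
  {
    out += static_cast <char> ( 0xC0 | (cp >> 6) );
    out += static_cast <char> ( 0x80 | (cp & 0x3F) );
  }
  else if (cp < 0x10000)    // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
  {
    out += static_cast <char> ( 0xE0 | (cp >> 12) );
    out += static_cast <char> ( 0x80 | ((cp >> 6) & 0x3F) );
    out += static_cast <char> ( 0x80 | (cp & 0x3F) );
  }
  else                      // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
  {
    out += static_cast <char> ( 0xF0 | (cp >> 18) );
    out += static_cast <char> ( 0x80 | ((cp >> 12) & 0x3F) );
    out += static_cast <char> ( 0x80 | ((cp >> 6) & 0x3F) );
    out += static_cast <char> ( 0x80 | (cp & 0x3F) );
  }
  return out;
}
```

With this, encode_utf8( U'\U0001F800' ) yields the four bytes F0 9F A0 80, while encode_utf8( char32_t( 0x1F1800 ) ) throws std::range_error. The cast is needed in the second call because a compiler following the rule quoted above should reject \U001F1800 as a literal.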

Foo.

[edit] Amazing how often I can check to see if someone has responded before I do and find it is not the case... even if it is.

Thanks MiiNiPaa!