std::string to std::wstring with extended characters (unicode)

Hi, I have an app that must handle some extended characters, such as those in the Latin-1 Supplement and, possibly, Latin Extended-A blocks of Unicode (see http://en.wikipedia.org/wiki/List_of_Unicode_characters, for example), so my compiler is set to use the Unicode character set.

I get a std::string that includes such characters (e.g. "ú", "é", and others), and I need to output a wchar_t*.

Can wchar_t* even handle these characters?



This is how I usually convert:

std::string input = "Bancé";

std::wstring wCmd(input.begin(), input.end()); // (1)

WCHAR* wCCmd = const_cast<WCHAR*>(wCmd.c_str()); // WCHAR is a typedef for wchar_t

SQLWCHAR* output = wCCmd; // SQLWCHAR is a typedef for wchar_t



I am using Windows typedefs, but WCHAR* and SQLWCHAR* are both just wchar_t*, i.e. what I want as output.

This conversion usually works, but for a string such as "Bancé" above, at step (1) (i.e. the conversion from std::string to std::wstring), the extended character "é" becomes "←" (i.e. "Bancé" becomes "Banc←").

What can I do to use extended Unicode characters (at least Latin-1 Supplement, and possibly, Latin Extended-A) in my std::wstring and the types that follow it?

(I guess it comes down to converting a char that supports these characters to a wchar_t that supports these characters, but, in the end, I am using std::string as input, so I kept it that way in code).

Does it depend on my compiler settings, or something else entirely?




Thanks for any help!!! :)

C :)

Try putting an L before your string literal to make it a wide string literal, and store it in a std::wstring:
std::wstring input = L"Bancé";
On Windows (in Visual Studio specifically, assuming the usual Western code page, Windows-1252), when you write "Bancé", you're actually writing "Banc\xe9" (on sane systems, such as Linux, you actually get "Banc\xc3\xa9", i.e. UTF-8, but that's another story).

When you use wstring's range constructor, it performs a static_cast on each char in your string to form a wchar_t. Since char is signed in Visual Studio, '\xe9' is a negative value and gets sign-extended, so it is converted into L'\xffe9', which you see as a box. What you're looking for is L'\x00e9', which you can get if you use a real multibyte-to-wide conversion. There are many in C++; here's one that works in this case (it relies on the default locale's ctype::widen()):

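#include <cstdio>    // stdout (used with _fileno)
#include <iostream>
#include <string>
#include <sstream>
#include <fcntl.h>   // _O_WTEXT (MSVC-specific)
#include <io.h>      // _setmode (MSVC-specific)

int main()
{
    std::string input = "Bancé";

    // Inserting a narrow C string into a wide stream widens each char
    // through the stream locale's ctype facet, giving L'\x00e9' here.
    std::wostringstream conv;
    conv << input.c_str();
    std::wstring wCmd(conv.str());

    _setmode(_fileno(stdout), _O_WTEXT); // MSVC's special needs
    std::wcout << wCmd << '\n';
}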
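If you know the input bytes really are Latin-1 (or the 0xA0-0xFF part of Windows-1252, which matches it), another option that does not depend on the locale is to cast through unsigned char yourself, so nothing gets sign-extended. This is only a sketch: widen_latin1 is a made-up helper name, not a standard function, and it does not cover Latin Extended-A, since those characters have no Latin-1 byte and need a real code page or UTF-8 conversion.

```cpp
#include <string>

// Hypothetical helper (not a standard function): widens a string whose bytes
// are assumed to be ISO-8859-1 (Latin-1), where byte value == code point.
std::wstring widen_latin1(const std::string& s)
{
    std::wstring out;
    out.reserve(s.size());
    for (char c : s) {
        // Go through unsigned char first so 0xE9 stays 0x00E9 instead of
        // being sign-extended to 0xFFE9 on platforms where char is signed.
        out.push_back(static_cast<wchar_t>(static_cast<unsigned char>(c)));
    }
    return out;
}
```

In the original code this would replace step (1), e.g. std::wstring wCmd = widen_latin1(input);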
Awesome, I'll have to use that more often! Thanks for your help!! :)
Thanks For Use.