Encoding for German umlauts

Hello,

I'm having a problem with encoding and don't really know where to begin, so I'm going to describe it in detail:

I wrote a small program with a GUI that has multiple textboxes. The text that I enter in the textboxes replaces certain pattern words in .txt files (the text is read from a source file and written to a destination file). It works fine until I enter German umlauts (öäü). After the patterns are replaced, the umlauts are not displayed correctly in the .txt files.

What I have tried so far:

- Saved both the source and the destination file in UTF-8 encoding
- Tried different text editors
- Tried the wcstombs function
- Tried the codecvt::do_out function (I have to admit, the function is too difficult for me to understand). I'm not sure whether I have to convert from UTF-16 to UTF-8, and that is my main problem.

I don't know from which encoding to which I have to convert, or at which stage the program changes the encoding.

This is how the pattern is replaced:

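while (getline(fin, temp))
{
    pos1 = 0;
    // str1 is the pattern string
    while ((pos1 = temp.find(str1, pos1)) != string::npos)
    {
        // marshal_as converts the textbox's System::String to a narrow std::string
        const std::string replacement = context.marshal_as<std::string>(mainField->Text);
        temp.replace(pos1, str1.size(), replacement);
        pos1 += replacement.size(); // continue searching after the inserted text
    }
}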


This is what the entered string looks like in the file when I enter "ASDäöü": http://picload.org/image/oicaldg/unbenannt.png


I really hope that you can help me with this.

Thank you in advance!
I forgot to mention that I'm using Visual Studio 2012.
Does no one have an idea? Could you at least recommend where I can find people who might know this? I haven't found any really active German forums about C++.
Hello fellow German.

It's really hard to do any Unicode stuff with the C++ stdlib and Windows GUI controls.
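AFAIK Windows uses UTF-16 internally (wchar_t is 16 bits wide there), so GUI controls hand you UTF-16 text, while narrow char strings are interpreted in the current ANSI code page (e.g. Windows-1252) and not as UTF-8. On top of that, files may or may not start with a BOM. So the defaults are basically useless if you want UTF-8 files.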

I'm afraid you have to tinker with the characters on a very low level. Using a plain std::string without knowing which encoding its bytes are in is probably a bad idea. Maybe you could build your own string class that is able to distinguish between UTF-8, UTF-16, UTF-8/16 with BOM, wide char, and good old ASCII.
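
To give you an idea of what "low level" means here, this is a minimal sketch of a UTF-16 to UTF-8 conversion in portable C++11. It assumes you already have the textbox text as UTF-16 code units in a std::u16string (getting it out of System::String is not shown). The name utf16_to_utf8 is my own invention, not a library function. On Visual Studio 2012 you would probably use std::wstring instead, since wchar_t holds UTF-16 code units on Windows.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of any library): encodes UTF-16 code units
// as UTF-8 bytes. Lone surrogates are replaced by U+FFFD.
std::string utf16_to_utf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        char32_t cp = in[i];

        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()
            && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF)
        {
            // Valid surrogate pair: combine both units into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        }
        else if (cp >= 0xD800 && cp <= 0xDFFF)
        {
            cp = 0xFFFD; // lone surrogate, cannot be encoded
        }

        if (cp < 0x80)
        {
            out += static_cast<char>(cp);                        // 1 byte (ASCII)
        }
        else if (cp < 0x800)
        {
            out += static_cast<char>(0xC0 | (cp >> 6));          // 2 bytes (umlauts land here)
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else if (cp < 0x10000)
        {
            out += static_cast<char>(0xE0 | (cp >> 12));         // 3 bytes
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else
        {
            out += static_cast<char>(0xF0 | (cp >> 18));         // 4 bytes
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

The resulting std::string is just a container of UTF-8 bytes. find and replace keep working on it as long as the pattern and the file contents are UTF-8 as well.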

And be aware of the locale settings of your IDE, the current OS, the target OS, the library, and the objects you are using. It's a pain in the *ss to deal with 'deutschen Umlauten' (German umlauts), or with Unicode characters in general.

AND: I strongly recommend learning how Unicode works (the English Wikipedia article is pretty good).
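
If you are not sure at which stage the encoding goes wrong, it helps to look at the raw bytes instead of trusting an editor. Here is a small sketch of a hex dump for a std::string (hex_dump is a made-up name, not a library function). For "ä" you should see C3 A4 if the string is UTF-8, but a single E4 if it is in Windows-1252. My guess is that your marshal_as<std::string> call gives you the latter, but check it yourself.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the bytes of a string as space-separated
// upper-case hex pairs, e.g. "41 C3 A4".
std::string hex_dump(const std::string& bytes)
{
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (std::size_t i = 0; i < bytes.size(); ++i)
    {
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        const unsigned char b = static_cast<unsigned char>(bytes[i]);
        if (i != 0)
            out += ' ';
        out += digits[b >> 4];   // high nibble
        out += digits[b & 0x0F]; // low nibble
    }
    return out;
}
```

Call it on the replacement string right after the conversion and on each line right before you write it to the file, then compare the two.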
plexus,

Thanks a lot for this information, even though it's kind of disappointing that it's this difficult to get such a small piece of functionality.

And thanks for the article!
This one is a good starting point (as the article title suggests):
http://kunststube.net/encoding/