Unicode?

I'm still studying C++. I heard about Unicode (which will, hopefully, let me use characters other than ASCII characters). What do you suggest? Is there a library that I can use in C++ code? Somewhere I saw "ICU", but the given website is down.
Avoid unicode. Of course it's good for a cross-language application and such, but Unicode is very hard to deal with, and the standard library of C/C++ doesn't help you very much there. C++ has the character type wchar_t, but it's not really suitable for Unicode characters because the size of wchar_t is not clearly defined (16 bits on Windows, typically 32 bits elsewhere, so you can't depend on it), whereas Unicode uses 32 bit characters. Also, Unicode is very problematic to deal with (e.g. the 'ä' of the German language can be written with one or two characters), so it's really an advanced topic you should usually avoid unless you are writing a browser or text editor or anything else that absolutely NEEDS to deal with Unicode characters.
whereas Unicode uses 32 bit characters

I'm not particularly familiar with the details of Unicode encodings, but I'm sure there are different sizes, and even variable sizes within a single encoding. For instance, UTF-8 stores each ASCII character in a single byte, making it backwards compatible with ASCII, but it may use up to 4 bytes for other characters.
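
To make the variable size concrete, here is a minimal sketch of UTF-8 encoding for a single code point. The function name encode_utf8 is made up for illustration and is not part of the standard library; it assumes a C++11 compiler (for char32_t) and returns an empty string for values that are not valid code points.

```cpp
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8.
// Returns an empty string for surrogates and values above U+10FFFF.
std::string encode_utf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {
        // 1 byte: plain ASCII, unchanged.
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        // Surrogate code points are not valid on their own.
        if (cp >= 0xD800 && cp <= 0xDFFF) return out;
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

With this sketch, 'A' (U+0041) takes 1 byte, 'ä' (U+00E4) takes 2, the euro sign (U+20AC) takes 3, and code points above U+FFFF take 4.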
Certainly, but Unicode characters are still addressed using 32 bit numbers. Just the encoding is slightly different. It doesn't make dealing with Unicode characters the least bit easier, rather this forces you to differentiate between ASCII, 2, 3 and 4 byte characters.

EDIT: Just to clarify, my little digression about Unicode size was just to explain that wchar_t is compiler specific.
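
A quick way to see what a given compiler does is to check the width of wchar_t directly. This is only a sketch: the two function names are made up for illustration, and it assumes a C++11 compiler. Note that the largest Unicode code point is U+10FFFF, so anything at least 21 bits wide, such as C++11's char32_t, can hold every code point.

```cpp
#include <climits>
#include <cstddef>
#include <limits>

// Width of wchar_t in bits on the compiler that builds this code.
constexpr std::size_t wchar_bits() {
    return sizeof(wchar_t) * CHAR_BIT;
}

// True if one wchar_t can store the largest code point, U+10FFFF.
constexpr bool wchar_holds_any_code_point() {
    return static_cast<unsigned long>(std::numeric_limits<wchar_t>::max())
           >= 0x10FFFFul;
}

// char32_t (C++11) is at least 32 bits wide on every conforming compiler.
static_assert(sizeof(char32_t) * CHAR_BIT >= 32,
              "char32_t is at least 32 bits");
```

On typical Linux toolchains wchar_t is 32 bits and the second function returns true. On Windows it is 16 bits and the function returns false, which is why wchar_t text there uses UTF-16 surrogate pairs for code points above U+FFFF.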
Avoid unicode.


ACK! No no. You should prefer Unicode! The alternatives to it are a complete and utter mess.

Code pages? Give me a break.

It doesn't make dealing with Unicode characters the least bit easier,


You can deal with UTF-8 the same way you deal with a normal ASCII string.

Read: EXACTLY THE SAME

The only difference is that the "length" of the string won't match the "size" of the string (i.e. 4 bytes might only be 3 on-screen characters). But for coding purposes this doesn't matter, as you can just use the size everywhere.

EDIT 2: Actually I suppose if you're manually modifying individual characters it would be more complicated with UTF-8, so the above isn't really true. /EDIT 2
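
For what it's worth, the size/length difference can be sketched like this. count_code_points is a made-up helper, not a standard function, and it assumes the string is already valid UTF-8. It counts code points, which is still not quite the same as on-screen characters when combining characters are involved.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: count code points in a string assumed to be
// valid UTF-8. Every byte that is not a continuation byte (10xxxxxx)
// starts a new code point.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t n = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}
```

For the string "äbc" stored as UTF-8 (bytes C3 A4 62 63), size() is 4 while count_code_points gives 3. Copying, concatenating and comparing whole strings only ever needs size().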

EDIT:

I agree wchar_t is a bit absurd. But wchar_t doesn't necessarily mean Unicode (which adds to the absurdity of wchar_t).
I said, avoid unicode unless you absolutely have to deal with multilanguage characters. And thanks to some characters being present multiple times in Unicode, as well as some characters existing as distinct entities as well as being representable by a sequence of combining characters, stuff like the equivalency of two strings is hard to check.
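
A small sketch of the equivalence problem, using the 'ä' example from earlier in the thread. The variable and function names are made up for illustration, and the bytes are written as explicit UTF-8 escapes so the result does not depend on the source file encoding.

```cpp
#include <string>

// 'ä' as one precomposed code point, U+00E4, encoded as UTF-8.
const std::string precomposed = "\xC3\xA4";

// 'ä' as 'a' followed by U+0308 COMBINING DIAERESIS, encoded as UTF-8.
const std::string decomposed = "a\xCC\x88";

// Plain byte-wise comparison, which is all std::string offers.
bool same_bytes(const std::string& a, const std::string& b) {
    return a == b;
}
```

Both strings display as the same character, yet same_bytes(precomposed, decomposed) is false. Treating them as equal requires normalizing both sides first, which the standard library does not do; that is the kind of job a library such as ICU is meant for.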
What I understood from your short discussion is that I should avoid Unicode for the time being, so as not to complicate things. But is there a simpler alternative that lets me use at least some non-English Latin characters? [At this point I'm not interested in dealing with characters of languages such as Chinese, Japanese, Arabic, Indian, Russian etc.]
ANSI, if you have to. Though I would still recommend that you stick to good old ASCII.

Side note: What I said about Unicode applies only to Unicode processing, or self-defined output. If you just want to output Unicode to a device that will deal with it on its own, or just pass the characters around, you can use UTF-8 without problems.