Character sets for Visual C++

Hello everyone, I have just started studying Visual C++ using Ivor Horton's Beginning Visual C++ 2012. One of the first lessons in the book states that for console programs I need to change the character set from 'Unicode' to 'Not Set', otherwise the programs won't compile. Before this I didn't even know the setting existed, and the console programs I had made all ran without problems. Can someone explain what the character sets are? The book doesn't go into any detail on the matter.
As long as you stick to ANSI/roman characters you should be fine.
Explaining character sets is not easy, but http://en.wikipedia.org/wiki/Unicode may be a good start.
Well then I guess I will do as the book says until I know enough to make my own decisions.
Can I interject here? Why would it not compile if it was Unicode? I thought the point of Unicode was to hold multiple languages. Also, instead of switching it off, can't you use std::locale?
It's possible that the book uses WinAPI functions, which come in two varieties: the Unicode and the ANSI versions. In a Unicode build, the following would not compile:
MessageBox(NULL, "Some Error Message", "Error", MB_OK | MB_ICONERROR);
Because it is expecting wide strings. You would instead have to use the following:
MessageBox(NULL, L"Some Error Message", L"Error", MB_OK | MB_ICONERROR);

Of course, this is just me guessing.
closed account (2AoiNwbp)
The author states that by default, the project options will be set to use Unicode libraries. This makes use of a non-standard name for the main function in the program.
That is because these projects use MS-specific headers which provide the "_tmain" function name: _tmain becomes "main" for standard C++ programs using null-terminated strings of ASCII characters, or "wmain" if you decide to work with Unicode. Those names are #defines in tchar.h (as far as I remember), which is MS-specific.
In a Unicode build, the following would not compile:
MessageBox(NULL, "Some Error Message", "Error", MB_OK | MB_ICONERROR);
Because it is expecting wide strings. You would instead have to use the following:
MessageBox(NULL, L"Some Error Message", L"Error", MB_OK | MB_ICONERROR);


Grah! Both of these are wrong.

There are 3 forms of WinAPI functions.

The 'TCHAR' version (MessageBox)
The wchar_t version (MessageBoxW)
The char version (MessageBoxA)

The below calls are correct and will compile 100% of the time:
// char strings, char function:
MessageBoxA(NULL, "Foo", "Bar", MB_OK );

// wchar_t strings, wchar_t function:
MessageBoxW(NULL, L"Foo", L"Bar", MB_OK );

// TCHAR strings, TCHAR function:
MessageBox(NULL, TEXT("Foo"), TEXT("Bar"), MB_OK );
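
To show why the un-suffixed name behaves differently from build to build, here is a minimal sketch that does not use windows.h at all. Every name in it (DEMO_UNICODE, demo_tchar, DEMO_TEXT, ShowMessageA, ShowMessageW, ShowMessage) is a hypothetical stand-in for UNICODE, TCHAR, TEXT, MessageBoxA, MessageBoxW and MessageBox; the real headers are more involved, so treat it as an illustration of the idea only.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the UNICODE build switch.
// Comment this line out to simulate a narrow ("Not Set") build.
#define DEMO_UNICODE

// Stand-ins for MessageBoxA / MessageBoxW: each accepts exactly one character type
// and reports which version was reached.
inline std::string ShowMessageA(const char* text)    { assert(text != nullptr); return "A"; }
inline std::string ShowMessageW(const wchar_t* text) { assert(text != nullptr); return "W"; }

#ifdef DEMO_UNICODE
typedef wchar_t demo_tchar;            // the "TCHAR" type is wchar_t in this build
#define DEMO_TEXT(s) L##s              // the "TEXT" macro adds the L prefix
#define ShowMessage ShowMessageW       // the un-suffixed name is just an alias
#else
typedef char demo_tchar;               // the "TCHAR" type is char in this build
#define DEMO_TEXT(s) s                 // the "TEXT" macro leaves the literal alone
#define ShowMessage ShowMessageA
#endif

// Makes the three matched calls and records which version each one reached.
inline std::string which_versions() {
    std::string reached;
    reached += ShowMessageA("Foo");             // char string, char function
    reached += ShowMessageW(L"Foo");            // wchar_t string, wchar_t function
    reached += ShowMessage(DEMO_TEXT("Foo"));   // "TCHAR" string, "TCHAR" function
    return reached;
}
```

With DEMO_UNICODE defined, which_versions() returns "AWW". In that same build, ShowMessage("Foo") would be rejected by the compiler, because the alias points at the wchar_t version while the literal is a char string.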


The below calls are incorrect (even though they might sometimes compile):
// char strings, but TCHAR function
MessageBox(NULL, "Foo", "Bar", MB_OK);

// wchar_t strings, but TCHAR function
MessageBox(NULL, L"Foo", L"Bar", MB_OK);




among the first lessons in the book it states that for console programs I need to change the Character sets from 'Unicode' to 'not set' otherwise the programs won't compile.


You only need to do that if you are calling the wrong kind of function. See my info above.

If you are using char strings... just put an 'A' after WinAPI function/struct names to get the char version.


EDIT: FWIW, TCHARs are stupid and I always avoid them. I typically use the 'W' version of WinAPI functions/structs exclusively, with the occasional exception.

Don't expect people to fiddle with their compiler settings in order to get poorly written code to compile. Instead, write your code correctly.
closed account (2AoiNwbp)
There's a macro called TEXT() (it is defined in the Windows headers; tchar.h offers the equivalent _T()) that you can use with string literals, so the general version of MessageBox would compile if you try:
 
MessageBox(NULL, TEXT("Foo"), TEXT("Bar"), MB_OK);

regardless of whether you are using Unicode or not.

regards,
Alejandro
Yes, TEXT() takes a string literal and makes it a TCHAR string. Much like how the 'L' prefix takes a string literal and makes it a wchar_t string.

"foo" <- char
L"foo"  <- wchar_t
TEXT("foo") <- TCHAR



However this is just a literal qualifier and does not convert strings that are not literals. For example:

std::string foobar = "Some text from the user";

MessageBox( NULL, foobar.c_str(), TEXT("foo"), MB_OK ); // <- wrong!  foobar is a char
   // string.  Need to use 'MessageBoxA'  (or convert to a TCHAR string)- cannot use
   //  TEXT() to convert foobar because foobar is not a literal. 
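
The reason a TEXT-style macro cannot convert a variable is that it works on the source token, not on the string data. The sketch below assumes a Unicode build and uses a hypothetical DEMO_TEXT macro and is_wide helper (neither is part of WinAPI) to show what the macro does to a literal.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for TEXT() in a Unicode build:
// the preprocessor pastes the L prefix onto the literal token itself.
#define DEMO_TEXT(s) L##s

// Reports whether a string literal (an array of characters) is made of wchar_t.
template <typename T, std::size_t N>
constexpr bool is_wide(const T (&)[N]) {
    return std::is_same<T, wchar_t>::value;
}

// Checks the element type of each kind of literal.
inline void check_literal_types() {
    assert(!is_wide("foo"));             // plain literal: char
    assert(is_wide(L"foo"));             // L prefix: wchar_t
    assert(is_wide(DEMO_TEXT("foo")));   // the macro expands to L"foo"
}

// DEMO_TEXT(foobar) would expand to the identifier Lfoobar, not to a converted
// copy of foobar, so it does not compile unless a variable with that name exists.
```

Because the pasting happens before the compiler ever sees a value, a runtime string such as foobar has to go to the matching 'A' function or be converted with actual code.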




The premise of this is very simple. The only thing that really makes it confusing is the weak typing employed by C.

There are 3 distinct character types: char, wchar_t, TCHAR. There are 3 distinct versions of all WinAPI functions and structs (at least those which take strings) -- one version for each character type.

Call whichever version of the WinAPI function that matches whatever character type you have.
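
One way to make the "match the function to the character type" rule hard to get wrong is to let overload resolution pick the version. This is only a sketch: DemoBoxA, DemoBoxW and demo_box are hypothetical names standing in for MessageBoxA, MessageBoxW and a wrapper you might write yourself.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for MessageBoxA / MessageBoxW.
// Each returns a marker character so the caller can see which one ran.
inline char DemoBoxA(const char* text)    { assert(text != nullptr); return 'A'; }
inline char DemoBoxW(const wchar_t* text) { assert(text != nullptr); return 'W'; }

// Overloaded wrapper: the compiler selects the version from the string type,
// so a char string can never reach the wchar_t function or vice versa.
inline char demo_box(const std::string& text)  { return DemoBoxA(text.c_str()); }
inline char demo_box(const std::wstring& text) { return DemoBoxW(text.c_str()); }
```

For example, demo_box(std::string("Some text from the user")) reaches the 'A' stand-in and demo_box(L"Foo") reaches the 'W' stand-in, whatever the project's character set option is.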
closed account (2AoiNwbp)
Yes, actually, I think the only thing TEXT() does is put an L before a string literal when UNICODE (or _UNICODE, for the tchar.h macros) is defined.
But all these macros (even MessageBox) are intended to write one version of code with both capabilities, Unicode and ASCII.
If you already know that you are not going to use unicode, just leave all that MS stuff, and stick to char*
make it simple!
But all these macros (even MessageBox) are intended to write one version of code with both capabilities, Unicode and ASCII.


TCHARs are a legacy holdover for backwards compatibility with the Win9x days, when Windows did not internally use UTF16 for all text.

Nowadays, wide character output is the 'norm'... and the narrow (ANSI) output gets widened automatically by Windows.


The only advantage to using TCHARs is that they have the ability to toggle between wide and narrow characters as a build setting. So, if you wanted, you could compile 2 different versions of your program: Unicode and ANSI... by just flicking a switch.

The problem is that's dumb, because if you're going through all the work to use TCHARs properly, there is no reason that you'd want to do an ANSI build. So you might as well just use wide chars outright to keep it simple.

If you already know that you are not going to use unicode, just leave all that MS stuff, and stick to char*


I agree.

But you can't leave all the MS stuff because WinAPI is MS stuff.

If you're using chars, then use the char version of the WinAPI function. IE: Use MessageBoxA instead of MessageBox.

make it simple!


It doesn't get much simpler than putting an 'A' after the function name.
closed account (2AoiNwbp)
But you can't leave all the MS stuff because WinAPI is MS stuff.

But you leave all the MS stuff, by using MessageBoxA... even though it is still winapi.
As if you were programming for another API. You can stick to ANSI and still use winapi, can't you?
I'm not sure I understand the question.

MessageBoxA is part of WinAPI
WinAPI is Microsoft's API for Windows

Using WinAPI = Using MS
closed account (2AoiNwbp)
I think we must clarify something here.

By default, the project options will be set to use Unicode libraries. This makes use of a non-standard name for the main function in the program. In order to use standard native C++ in your console programs, you need to switch off the use of Unicode libraries.
... that's Ivor Horton's Beginning Visual C++ 2010

I mean, all that stuff. You don't need those Unicode libraries and macros like _tmain or _tWinMain or TEXT(), or anything else specific to MS, to write standard ANSI C++ programs (which Horton calls native C++ to distinguish them from these MS specifics and C++/CLI).

Recall that this has to do with writing standard C++ console programs in Visual Studio, it is not about writing Windows applications. He starts with standard C++ and builds up to programming Windows specifics. (but this is the first part of the book, just to introduce ANSI C++)

On the other hand, what I said, and is more a question than a confirmation:
But you leave all the MS stuff, by using MessageBoxA... even though it is still winapi.
As if you were programming for another API. You can stick to ANSI and still use winapi, can't you?

If you access a Windows API (say, networking) from a console application, you can just stick to char or wchar_t and call the A- or W-suffixed versions of the WinAPI functions, writing standard C++ without using the TCHAR and TEXT() macros, etc. Am I right?
@Disch
Sorry I'm a bit late in, but I'll just clarify - I meant the book might be doing that, I never meant to suggest that they should be used.

@aabuezo
I think that this is for compatibility reasons: older versions of Windows didn't support Unicode (I'm pretty sure that 9x were ANSI). Basically, these are there in case you want to support these older operating systems, and also so that legacy code will continue to compile. AFAIK, TCHAR is also a system for getting around this problem: legacy code will be properly wide or narrow based on the target OS.
closed account (2AoiNwbp)
Thank you for your answer NT3, although I edited my post without seeing it.
My question was, if
Nowadays, wide character output is the 'norm'... and the narrow (ANSI) output gets widened automatically by Windows.

then what is the point of continuing to use MessageBoxA?
As I mentioned, it is really for one main reason: Backwards compatibility.
aabuezo wrote:
I mean, all that stuff. You don't need those Unicode libraries and macros like _tmain or _tWinMain or TEXT(), or anything else specific to MS, to write standard ANSI C++ programs (which Horton calls native C++ to distinguish them from these MS specifics and C++/CLI).


This is all correct. Visual Studio is perfectly capable of writing programs that only use standard libs and do not use anything MS specific.

I think we agree, I'm just getting a little confused by your terminology.

If you access a Windows API (say, networking) from a console application, you can just stick to char or wchar_t and call the A- or W-suffixed versions of the WinAPI functions, writing standard C++ without using the TCHAR and TEXT() macros, etc. Am I right?


Yes. In fact I recommend it, since chars/wchar_ts are much, much easier to work with and are less error prone. I avoid TCHARs like the plague.

Although... another terminology quirk here. The 'TEXT' macro is not a C++ standard but it is certainly a WinAPI standard. And if you're already using WinAPI for its networking functions... then you're already going outside the C++ standard, so it doesn't hurt to use additional things from WinAPI (like the TEXT macro).

It's not like VS is the only compiler to support the TEXT macro -- any compiler that can build against the Windows headers supports it. It's not even a compiler feature, it's part of a library.

NT3 wrote:
Sorry I'm a bit late in, but I'll just clarify - I meant the book might be doing that, I never meant to suggest that they should be used.


Fair enough. I just see this problem come up on these forums all the time. It's a very simple problem, but tutorials/books either seem to avoid it, or don't seem to understand it and teach the wrong thing, so I get a little frustrated when I see it now.

then what is the point of continuing to use MessageBoxA?


NT3 mentioned backwards compatibility. But it's also for convenience. If you have string data (like from a file or something) that is not UTF-16, and you know it only contains ASCII characters, it's easier to output it to an 'A' function than to manually widen it to UTF-16 and output it with a 'W' function.

Calling the 'A' function has the same effect in that case, and Windows simply does the widening for you so you don't have to do it yourself.
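
For comparison, this is roughly what "manually widening" looks like for data that is known to be pure ASCII. It is a portable sketch with a hypothetical helper name (widen_ascii), and it assumes every byte is below 0x80; text in any other encoding needs a real conversion (on Windows, MultiByteToWideChar) instead of a per-character copy.

```cpp
#include <cassert>
#include <string>

// Widens a string that is assumed to contain only 7-bit ASCII characters.
// Each ASCII code has the same numeric value as a wide character,
// so a per-character copy is enough under that assumption.
inline std::wstring widen_ascii(const std::string& narrow) {
    std::wstring wide;
    wide.reserve(narrow.size());
    for (unsigned char c : narrow) {
        assert(c < 0x80 && "input must be pure ASCII");
        wide.push_back(static_cast<wchar_t>(c));
    }
    return wide;
}
```

The result of widen_ascii(foobar) could then be passed to a 'W' function through c_str(). Calling the 'A' function directly skips this step, which is the convenience described above.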



Though there is the question of how long the 'A' functions will continue to be supported. I have heard from different people that they have been deprecated, although I have not seen anything on any official source which supports that, so it might just be a rumor.

Frankly, I think there is too much code out there using it for MS to ever deprecate the narrow API. It's more likely that they'll deprecate WinAPI entirely and replace it with some other API.
I am sorry guys, I didn't even know this conversation was still going on ( I didn't get any emails). Thanks for all the replies though I have to admit a lot of this is over my head. I am only on chapter 4. I am sure as I go through the book I will come across all the functions you have listed which will make all your answers more meaningful to me.

I have been doing the exercises in the book and I haven't changed the settings, it is still set to Unicode. I didn't however try to compile any of the examples. So far I have been able to compile all the programs I wrote without running into any problems, aside from typos of course.

I asked this question because I was concerned about any immediate problems I might run into but from your answers it seems that I don't have to worry about this stuff until I start using the windows API which I believe begins at chapter 10. I will revisit this once I get there in hopes of better understanding this matter. Thanks for all the effort you guys put into this and sorry again for my delayed reply.