wstring null termination

Are wstrings null-terminated? I ask because string/wstring.size() doesn't take the null character into account, and I know strings are null-terminated. I'm basically trying to find the size in bytes of a wstring at run time, for which right now I'm only doing this (myString is a wstring):


sizeInBytes = sizeof(wchar_t) * myString.size();

This doesn't take the null character into account, and sizeof(myString) gives the value 28 for a 17-character wstring. I'm not sure how that can be, when sizeof(wchar_t)==2 and 17x2=34 (greater than 28).
Any help is greatly appreciated.
You are assuming that sizeof(myString) can account for dynamically allocated memory, which it can't, and the string's characters are stored in dynamically allocated memory. The 28 bytes sizeof() is telling you about is just the size of the myString object itself, which is the same no matter how long the text is. The actual size of the string in bytes requires the calculation you are performing, with the addition of the null character, because wstrings are null-terminated as well (c_str() always returns a null-terminated array):

sizeInBytes = sizeof(wchar_t) * (myString.size() + 1);
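
As a rough illustration of the difference between the two numbers, here is a minimal sketch. The helper names objectBytes and payloadBytes are made up for this example, and the exact value of sizeof(std::wstring) depends on the compiler and standard library.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: size of the wstring object itself.
// This is a compile-time constant and never depends on the text stored.
std::size_t objectBytes(const std::wstring& s) {
    return sizeof(s);
}

// Hypothetical helper: bytes occupied by the characters plus the
// terminating null, using the same formula as in the post above.
std::size_t payloadBytes(const std::wstring& s) {
    return sizeof(wchar_t) * (s.size() + 1);
}
```

For a 17-character string on a platform where sizeof(wchar_t) == 2, payloadBytes gives 36, while objectBytes gives the same value for every string regardless of its length.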
Instead of wchar_t, use unsigned short. wchar_t is implementation-specific: its size can change between platforms, so code that relies on it may not be portable.

Wide strings are typically implemented using unsigned short (UTF-16) or possibly unsigned long (UTF-32), and putting a null character at the end is not a good option. Although 0 would not be the value of any ordinary character, the string length should be passed along with the array.

Check your wstring implementation. UTF-8 and UTF-16 are variable-length encodings, so a single code point may take more than one code unit.
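
To see what variable length means in practice, here is a small sketch. It uses std::u16string (C++11) rather than wstring so that the code unit is 16 bits on every platform; that substitution and the helper name countCodePoints are assumptions made for the example.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts code points in a UTF-16 string by
// skipping the second (low) half of each surrogate pair.
// Assumes the input is well-formed UTF-16.
std::size_t countCodePoints(const std::u16string& s) {
    std::size_t count = 0;
    for (char16_t unit : s) {
        const bool isLowSurrogate = (unit >= 0xDC00 && unit <= 0xDFFF);
        if (!isLowSurrogate) {
            ++count;
        }
    }
    return count;
}
```

For u"\U0001F600", a single code point outside the Basic Multilingual Plane, size() reports 2 code units while countCodePoints reports 1. So size() multiplied by the unit size gives bytes, not the number of characters.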

see here for more details:

http://icu-project.org/docs/papers/codepages_and_unicode.html
writetonsharma said:
"putting a null character at the end is not a good option"

From what you're saying, it sounds like you have the option of adding a null character. Are you suggesting that wstrings do not automatically have null characters appended?
The null character is just a zero: the value 0x00, not the digit '0' (ASCII 48). So there is no way to know whether a 0 marks the end of the string or is an integer value 0 that belongs to the data.

No raw string buffer is automatically terminated with a null character; we have to add it ourselves. An uninitialized buffer holds whatever was there previously: garbage, nulls, or any other values.
So you're saying if I define
 
std::wstring myString = L"data";

then it will only be == {L'd', L'a', L't', L'a'}, and != {L'd', L'a', L't', L'a', 0 or NULL}?
wstring is a class and gives you all of these facilities automatically. How it handles the end of the string internally is implementation-specific.
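
Whatever the internal representation, the visible behaviour can be checked directly. A minimal sketch, assuming a C++11 or later compiler, where c_str() is required to return a null-terminated array; the helper names elementPastEnd and isExactlyData are invented for this example.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the element one past the last character,
// read through c_str(). Since C++11 this is guaranteed to be L'\0'.
wchar_t elementPastEnd(const std::wstring& s) {
    return s.c_str()[s.size()];
}

// Hypothetical helper: true when the string holds exactly the four
// characters d, a, t, a. The terminator is not counted by size().
bool isExactlyData(const std::wstring& s) {
    const wchar_t expected[] = {L'd', L'a', L't', L'a'};
    return s.size() == 4 && s.compare(0, 4, expected, 4) == 0;
}
```

For std::wstring myString = L"data"; both views hold at once: size() is 4 and the contents compare equal to the four characters, yet the array returned by c_str() has a 0 in the fifth position.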

But now let's say I do this:

unsigned short str[] = {2354, 2325, 2327, 0}; //A string with some Devanagari characters

I have appended a zero at the end to mark the end of the string. So this is a string with a null terminator, but we cannot tell whether the string really ends here. Maybe 0 has a meaning or a value in the Devanagari character set. So having a length that says the string has 3 characters is important.
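
For what it's worth, std::wstring does keep a length alongside the characters, so a 0 inside the data does not end the string. A minimal sketch of the difference between the stored length and scanning for a terminator; the helper names makeEmbeddedNull and lengthByTerminator are made up for the example.

```cpp
#include <cstddef>
#include <cwchar>
#include <string>

// Hypothetical helper: builds a 3-character string whose middle
// character is 0, by passing the length explicitly.
std::wstring makeEmbeddedNull() {
    const wchar_t raw[] = {L'a', L'\0', L'b'};
    return std::wstring(raw, 3);
}

// Hypothetical helper: length as seen by C-style code that scans
// for the first 0 instead of using the stored length.
std::size_t lengthByTerminator(const std::wstring& s) {
    return std::wcslen(s.c_str());
}
```

Here makeEmbeddedNull().size() is 3, but lengthByTerminator reports 1 because the scan stops at the embedded 0. Code that relies only on the terminator loses the rest of the data, which is the case for carrying the length explicitly.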