Character array to wide string?

I am still working on my project, which will be reading some old data from some old DOS files. The data stored there is, naturally, char*. Once I read in my character array, how do I assign it to a wstring, since my application is UNICODE?

Here is my current solution:
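const wchar_t* Class::Function(char *pName)
{
  // I verify the pointer and such first, then do the below.
  // Each char is widened to one wchar_t (fine for plain ASCII).
  this->_Name.assign(pName, pName + strlen(pName));
  return this->_Name.c_str(); // c_str() returns a const pointer
}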

Am I on the right track here?
The root questions are:

1) How is the source data encoded? (UTF-8 seems unlikely if these are old DOS files... but I guess it depends on HOW old.)
2) Do you care about preserving anything beyond the ASCII set?
3) How is the dest data to be encoded? (I would assume UTF-16.) Given that you said "UNICODE", I'm assuming this is on Windows and you want UTF-16.


WinAPI provides a function (MultiByteToWideChar) which can convert pretty much any codepage, as well as UTF-8, to UTF-16.
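
For anyone following along, a rough sketch of the MultiByteToWideChar route might look like the below. The helper name codePageToWide and the choice of code page 437 (the original IBM PC / DOS US code page) are assumptions for illustration only; the actual encoding of the files is not known from this thread. The non-Windows branch is just a placeholder so the snippet builds elsewhere, and it is only right for 7-bit ASCII.

```cpp
#include <cassert>
#include <climits>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper (not from the original post): converts bytes in the
// given Windows code page to a wide string. 437 is an assumed default.
std::wstring codePageToWide(const std::string& src, unsigned codePage = 437)
{
    if (src.empty())
        return std::wstring();
    // The WinAPI call takes the byte count as an int.
    assert(src.size() <= static_cast<std::string::size_type>(INT_MAX));
#ifdef _WIN32
    const int srcLen = static_cast<int>(src.size());
    // First call: ask how many wchar_t units the result needs.
    const int needed = MultiByteToWideChar(codePage, 0, src.data(), srcLen, nullptr, 0);
    if (needed <= 0)
        return std::wstring();
    std::wstring dst(static_cast<std::wstring::size_type>(needed), L'\0');
    // Second call: do the actual conversion into the buffer.
    const int written = MultiByteToWideChar(codePage, 0, src.data(), srcLen, &dst[0], needed);
    dst.resize(written > 0 ? static_cast<std::wstring::size_type>(written) : 0);
    return dst;
#else
    // Placeholder for non-Windows builds: widen each byte as-is.
    // This is only correct for 7-bit ASCII input.
    (void)codePage;
    std::wstring dst;
    for (unsigned char c : src)
        dst.push_back(static_cast<wchar_t>(c));
    return dst;
#endif
}
```

Passing an explicit length (rather than -1) means the result is not null-terminated by the API, which suits filling a std::wstring.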


Or... if you don't care and all you care about is the basic ASCII set... it's a straight 1:1 copy (just converting a 1 byte character to a 2+ byte character):

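// Copies a null-terminated char string into dst, one wchar_t per char.
// dst must have room for strlen(src) + 1 wide characters.
void copyToWide(wchar_t* dst, const char* src)
{
    while(*src)
    {
        *dst = *src;   // straight widening; only meaningful for ASCII
        ++dst;
        ++src;
    }
    *dst = 0;          // null-terminate the destination
}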


EDIT: your use of assign is actually simpler than my approach. Hah. That will work just fine if you don't care about anything beyond normal ASCII.
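
To spell out the assign approach as a standalone function, here is a minimal sketch. The name widenAscii is made up for this example. The one change from the code in the question is going through unsigned char, which is an extra precaution I'm assuming you might want: where plain char is signed, a stray byte of 0x80 or above would otherwise be sign-extended when widened.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper: widens a null-terminated char string into a
// std::wstring, one wchar_t per byte. Only meaningful for ASCII data.
std::wstring widenAscii(const char* src)
{
    assert(src != nullptr); // stand-in for "I verify the pointer first"

    // Viewing the bytes as unsigned keeps values >= 0x80 from turning
    // negative on platforms where plain char is signed.
    const unsigned char* first = reinterpret_cast<const unsigned char*>(src);
    const unsigned char* last = first + std::strlen(src);

    std::wstring dst;
    dst.assign(first, last); // iterator-range assign, as in the question
    return dst;
}
```

Returning the std::wstring by value also sidesteps handing out a pointer into a member that a later call could invalidate.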
Yes, I am on Windows using wchar_t, which is 16-bit, but I will also be using this on Linux, where wchar_t is 32-bit. The good news? I am only READING the data on both systems, so there will be no new writing at this point. If I decide to write data and share it, I will write it as 16-bit for Windows and easily read that into the Linux equivalent.

Now, it is old DOS, not UTF-8. I am assuming old ASCII only, and I do not plan on converting it to any other languages at this point. That would be a project in itself, so the ASCII set is fine with me. I can jump any conversion hurdles down the road.

Finally, it looks as though my call to the "assign" method is correct? I am asking because this will be a library, and as such I will not be able to test it until the program which uses this library has also reached a certain stage in development.
Finally, it looks as though my call to the "assign" method is correct?


Yes.

Though I would question your use of wchar_t in this library, as chars are typically much easier to work with. Case in point... you just mentioned that wchar_t is different sizes on different platforms... which makes it more difficult to write portable code.
Yes, but everything is UNICODE now. Plus it will allow me to target other languages and such in the future if I need to. Just because it is easy doesn't mean it is right, after all. I honestly don't know anybody not coding UNICODE apps anymore, and I enjoy learning as I go with the new UNICODE stuff.
UTF-8 is Unicode... and can be contained in a normal char array. Or a normal std::string.

In fact... IIRC, the APIs for filesystem I/O on Linux platforms all accept UTF-8 strings (const char*).
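
A quick sketch of what "UTF-8 in a normal std::string" means in practice. The helper names here are made up for illustration, and the counting function assumes the input is already valid UTF-8. The string just holds bytes, so size() reports bytes rather than characters.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: counts code points in a (valid) UTF-8 string by
// skipping continuation bytes, which always look like 10xxxxxx.
std::size_t utf8CodePoints(const std::string& s)
{
    std::size_t count = 0;
    for (unsigned char c : s)
        if ((c & 0xC0) != 0x80)
            ++count;
    return count;
}

// Usage: "café" with the é written as its two UTF-8 bytes (C3 A9).
void utf8Example()
{
    const std::string name = "caf\xC3\xA9";
    assert(name.size() == 5);            // five bytes stored
    assert(utf8CodePoints(name) == 4);   // four characters represented
}
```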
I know you can use UTF8 with a char array, but I have never come across a UTF8 file that wasn't using some other form of storage. Most of the time it is wchar_t. This particular file came out around the same time Windows 95 was released, so no worries about UTF-8 there. If it were a Linux system, maybe. Linux always seems to be ahead of the game.
The data stored there is naturally, char*. Once I read in my character array, how do I assign this to a wstring


Why read into a char array in the first place? Open the file as std::wifstream, and you can getline or operator>> straight into a std::wstring.
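
Something along these lines, as a minimal sketch for a plain text file. The helper name readFirstLineWide is invented for the example, and the byte-to-wchar_t conversion is whatever the stream's locale does, so with the default locale I would only count on it for plain ASCII.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical helper: reads the first line of a text file straight into
// a std::wstring, with no intermediate char array.
std::wstring readFirstLineWide(const char* path)
{
    assert(path != nullptr);
    std::wifstream in(path);   // wide stream; converts bytes via its locale
    std::wstring line;
    std::getline(in, line);    // leaves line empty if the file did not open
    return line;
}
```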
I know you can use UTF8 with a char array, but I have never come across a UTF8 file that wasn't using some other form of storage.


Different strokes I guess. I've found that UTF-8 is more common pretty much everywhere. Especially in places that had to be 'upgraded' while still maintaining backward compatibility (switching from ANSI to UTF-8 is much easier than switching to UTF-16).

The POSIX API is one example... none of it, from what I've seen, uses UTF-16. Another is any kind of binary file with embedded comments (zip, png, etc). In fact, the only place I can think of where I've seen UTF-16 in widespread use is WinAPI.

But whatever... it's your code and you can do what you want. =) Don't let me bully you.
I just wanted to answer Cubbi. I have to read the file into a char array due to the file being binary. It contains 3D object data as well as ANSI names for said objects.