question about wstring indexes

I'm trying to feed a wstring into a function and in that function I am wanting to check what each character is in that wstring. But when I run the program I am getting an out of range error. Stepping into the program I'm seeing that the value of wstring is what I want it to be but I'm also seeing that the size of that wstring is 0. Is there something that I'm missing?

I've got this function to work using wchar_t* but I'd like to use wstring (or preferably wstring&) if I can.
Do you have code that exhibits that behavior? There is nothing special about wstrings:
#include <iostream>
#include <string>
#include <locale>

void func(const std::wstring& str)
{
    for(size_t n = 0; n < str.size(); ++n)
        std::wcout << str[n];
    std::wcout << '\n';
}

int main()
{
    std::locale::global(std::locale(""));
    std::wcout.imbue(std::locale());
    func(L"空手道");
}

online demo: http://ideone.com/sO9O4n
I have a function that takes in a wstring and directs it to the appropriate parser. I'm working on a parser that changes text from one orthography to another. I've tried wstring, wstring& and const wstring& variants here.

void Parser::MainParser(const wstring& wstrInput, int str_len, int iSource)
{
	switch (iSource)
	{
	case PHONETIC: // Phonetic
		Phonetic(wstrInput, str_len);
		break;

	case SOURCE1: // Source1 orthography
		Source1(wstrInput, str_len);
		break;

	case SOURCE2: // Source2 orthography
		Source2(wstrInput, str_len);
		break;

	case SOURCE3: // Source3 orthography
		Source3(wstrInput, str_len);
		break;

	case SOURCE4: // Source4 orthography
		Source4(wstrInput, str_len);
		break;
	}
}


The source is selected based on the selection from a combo box. Then all of the above functions start like this:

vector<wstring> Parser::Phonetic(const wstring& wstrInput, int str_len)
{
	// Set the size of the vector array to 1 (m_iCount)
	m_iCount = 1;
	m_Results.resize(m_iCount);
	
	// Set the first item in the vector array to an empty string
	m_Results[0] = L"";

	// Loop through the string and convert as necessary
	for (int i = 0; i < str_len; i++)
	{
		// Cycle through each character
		// Remove dashes
		if (wstrInput[i] == '-')
		{
			AppendString(L"");
		}


I run into trouble as soon as I hit the if (wstrInput[i] == '-') line. Since the program is saying that the size is 0, it is giving an out of range error when I try to access an element from that string. I'm not sure where the problem here is. Especially since the debugger shows the wstrInput to have the value that was submitted via the textbox.
What's the meaning of str_len? The length of wstrInput is wstrInput.length().
Also it doesn't hold chars, so a more appropriate comparison would be == L'-'
That is one of my functions that I hadn't converted to == L'-'. But the one I am testing with is set up like that.

I tried using .length() and .size() but since the size is shown as 0, it just skips over the for loop.

You said the wstring doesn't hold chars. Another person on here suggested I go with wstring& here which is what I am trying to get figured out. But if a wstring doesn't hold chars, I'm not sure how I can use them. Will I just need to go back to wchar_t*?
it holds wchar_t. in any case, if .length() returns zero, the string is actually empty. How was it supposedly populated?
It is populated via a GetWindowTextW call, which feeds the text into a wstring variable. That variable is then passed to my MainParser. Every time I check wstrInput, either in the MainParser or in one of the SOURCE functions, the value is always what I typed into the edit box on the form. So if I type "test" into the form and then insert a break at either of the above functions, wstrInput will have a value of "test", a size of 0, and a range (I think it is called) of 7...not sure what that is. I have no idea how it can have a value of "test" yet have a size of 0.
If string::length is returning 0 then there's nothing in it, so it is correct that the loop doesn't run. It does sound like something is going wrong with your string population code, and that the str_len value is getting out of step with the string's actual length.

You must eliminate the str_len parameters from your functions, e.g. use

Phonetic(const wstring& wstrInput)

rather than

Phonetic(const wstring& wstrInput, int str_len)

as it's dangerous -- you cannot say that a string has length str_len when in fact it has some other length.

string and all the standard containers (vector, etc) know their own size (or length, in string's case.) You should not try to keep track of the size/length separately, as it introduces the possibility that the length gets out of step with the actual size/length, as you have found out!
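
To illustrate, here is a minimal sketch of a function that relies only on the string's own size. RemoveDashes is a made-up helper, not one of the Parser functions from this thread, and it assumes that "remove dashes" simply means dropping every L'-' from the input.

```cpp
#include <string>

// Hypothetical helper (not from the thread's Parser class): returns a copy
// of input with every dash removed. Note there is no separate length
// parameter; the loop bound comes from the string itself.
std::wstring RemoveDashes(const std::wstring& input)
{
    std::wstring result;
    result.reserve(input.size());

    for (std::wstring::size_type i = 0; i < input.size(); ++i)
    {
        // Compare against a wide character literal, since input[i] is a wchar_t.
        if (input[i] != L'-')
            result += input[i];
    }
    return result;
}
```

Because the bound is input.size(), an empty string just skips the loop instead of triggering an out of range error.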

As an interim measure, so you don't have to chase the changes all the way through your code in one go, you could:

1. just ignore the parameter (do a search and replace: ", int str_len" -> ", int /*str_len*/" -- or similar, where I'm assuming your naming is consistent.) This will break all functions which are erroneously using the str_len parameter so you can fix them.

2. add local str_len variables where required. e.g. change

vector<wstring> Parser::Phonetic(const wstring& wstrInput, int str_len)
{


to

vector<wstring> Parser::Phonetic(const wstring& wstrInput, int /*str_len*/)
{
    const int str_len = wstrInput.length();


So you don't have to alter the functions either (save for the new variable.)

Then you can alter the function signatures and tidy up the function code in a more leisurely fashion.

But if a wstring doesn't hold chars,

It does hold characters, or chars in the looser sense, but not chars in the sense of the C++ char data type.

In the Windows world, a wstring holds 16-bit wchar_t characters, rather than the 8-bit char characters which string uses. (wchar_t can be other sizes, e.g. x86 Linux uses a 32-bit wchar_t)
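
If you want to check this on your own compiler, something like the following should do. It is only a sketch; WideCharBits is a name I made up for illustration.

```cpp
#include <climits>
#include <string>
#include <type_traits>

// The element type of each string class is fixed by the standard.
static_assert(std::is_same<std::wstring::value_type, wchar_t>::value,
              "wstring holds wchar_t elements");
static_assert(std::is_same<std::string::value_type, char>::value,
              "string holds char elements");

// Hypothetical helper: width of wchar_t, in bits, on the platform this is
// compiled for (typically 16 with Visual C++, 32 with GCC on Linux).
int WideCharBits()
{
    return static_cast<int>(sizeof(wchar_t) * CHAR_BIT);
}
```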

Andy
Yea, I saw where I could use the size() or length() of my wstring in place of my str_len, but since I can't get it to work right, I've left it in there for now. As far as where I'm getting my lines crossed, I have an idea, but I'm not 100% sure what's going on. So here is where my wstring is used, how it is initialized, etc.

Here is the variable initialization.
wstring input = L"";

Here is where I get the text from the edit box. I think the problem may lie here but I'm not sure.
GetWindowTextW(hInput, (LPWSTR)input.c_str(), 256);

At this point, "input" has a value of what I put in the edit box (IE I usually put in "test"). It has a size of 0 and a capacity of 7. So here it is already having the problem of having no size. Is it because this is some kind of pointer? Or something to do with c_str()? I seem to remember someone saying that this makes a read-only copy or something to that effect. I may be way off base here.

Now it is fed into my parser object. I've already removed str_len at this point in anticipation of getting this fixed :).
g_Parser.MainParser(input, iSource);

Here is the above function:
void Parser::MainParser(const wstring& wstrInput, int iSource)

Then from there it is fed into:
Source4(wstrInput);

And here is the above function:
vector<wstring> Parser::Source4(const wstring& wstrInput)

I've already shifted everything to the "const wstring&". But like I said before, with it telling me that it has a size of 0, it just skips over the for loop which is:
for (int i = 0; i < wstrInput.size(); i++)

Like I said, I see the size is 0 way back at the GetWindowTextW line so that may be where my problem lies. If so, what would be a better way to go about it? In another thread, you used:

vector<wchar_t> buf(str_len + 1);

And used &buf[0] as a way to get text from a listbox selection and used that to append it to another wstring. I'm not sure exactly what is going on with &buf[0] as far as using that wchar_t vector though. But would using something like this be a better way?
Why are you using a wstring as a buffer here, rather than a vector<wchar_t> as you used in this post???
http://www.cplusplus.com/forum/windows/105554/#msg570330

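wstring input; // wstrings do not need to be set to L"" as their
               // constructor does that for you

vector<wchar_t> buffer(256); // also automatically zeroed

// the buffer is really 256 wchar_t's long, so it is safe to say so
GetWindowTextW(hInput, &buffer[0], static_cast<int>(buffer.size()));

input = &buffer[0]; // copies up to the null terminator, so size() is correct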


I'm not sure exactly what is going on with &buf[0] as far as using that wchar_t vector though. But would using something like this be a better way?

If you mean, would it be better to use a wchar_t vector -- the answer is yes!

Andy

PS This line

GetWindowTextW(hInput, (LPWSTR)input.c_str(), 256);

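is telling GetWindowTextW that input is 256 wchar_t's long when it's actually an empty string. The text gets written into the string's storage behind its back, so the debugger can show "test" there, but the string itself never finds out and its size stays 0.

The fact you need to cast away the const (string.c_str() is for reading values only) should also have been a hint.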
@andywestken

Can't he use wstring the way you are using the vector?
wstring wstr;
wstr.resize(256);
GetWindowTextW(hInput, &wstr[0], wstr.size());



I'm not certain if std::string guarantees contiguous memory, but it looks like it should work. http://ideone.com/DIw4Vf
std::string is guaranteed to provide contiguous storage in C++11, but not in C++03.

Even then, the problem with std::string / std::wstring is that they know their own length and therefore don't always pay attention to where the null terminator is located. For example, see where the quotes are around the string returned by test_string (code is included below)

#1 test_string

[GetWindowTextW called with nMaxCount = 256]

GetWindowTextW returned : "My Test Window Class Name


                                           "

#2 test_string

[GetWindowTextW called with nMaxCount = 256]

GetWindowTextW returned : "My Test Window Class Name"
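
The same effect can be reproduced without any Windows call. A minimal sketch, assuming a C++11 compiler (so the string's storage is contiguous); MakePadded and VisibleLength are made-up names used only for this illustration:

```cpp
#include <cstddef>
#include <cwchar>
#include <string>

// Hypothetical helper: builds a string the way a C-style API would fill it,
// i.e. pre-sized to 8 elements with only the first three overwritten.
std::wstring MakePadded()
{
    std::wstring s(8, L'\0');        // size() is 8 from the start
    std::wmemcpy(&s[0], L"abc", 3);  // write "abc" over the first 3 elements
    return s;                        // size() is still 8, padded with L'\0'
}

// Hypothetical helper: the length a C function would see, which stops at
// the first null terminator rather than at size().
std::size_t VisibleLength(const std::wstring& s)
{
    return std::wcslen(s.c_str());
}
```

The padded string does not compare equal to L"abc", because the comparison uses size() and so counts the trailing null characters.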


To fit the contained text to the string, you need to use

wstring wstr;
wstr.resize(256);
GetWindowTextW(hInput, &wstr[0], wstr.size());
wstr.resize(wcslen(&wstr[0])); // fit the string to the text  


or, as GetWindowTextW returns the number of chars / wchar_t's copied into the buffer (if successful), you can use its return value:

wstring wstr;
wstr.resize(256);
int ret = GetWindowTextW(hInput, &wstr[0], wstr.length());
wstr.resize(ret); // fit the string to the text  


(shrink_to_fit is available in C++11, i.e. GCC, but didn't do the job: it only asks the string to reduce its capacity and leaves size() unchanged.)
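
The resize-then-trim pattern can be wrapped up in one function. This is only a sketch: FakeGetText is a hypothetical stand-in for GetWindowTextW (so it compiles without windows.h), ReadText is a made-up name, and the 256 buffer size is the same assumption used earlier in the thread.

```cpp
#include <cwchar>
#include <string>

// Hypothetical stand-in for a C-style API such as GetWindowTextW: copies
// text into a caller-supplied buffer and returns the number of characters
// written (not counting the terminator), or 0 if the buffer is too small.
int FakeGetText(wchar_t* out, int maxCount)
{
    const wchar_t text[] = L"test";
    const int len = static_cast<int>(std::wcslen(text));
    if (maxCount < len + 1)
        return 0;
    std::wmemcpy(out, text, len + 1); // include the null terminator
    return len;
}

// Hypothetical helper: fills a pre-sized wstring, then shrinks it to the
// number of characters the API reports having copied.
std::wstring ReadText()
{
    std::wstring result(256, L'\0');
    const int copied = FakeGetText(&result[0], static_cast<int>(result.size()));
    result.resize(copied); // fit the string to the text
    return result;
}
```

With the real API you would pass the window handle as well. The point is that the string is resized before the call, so the buffer really is as long as you claim, and again afterwards, so size() matches the text.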

Andy

#include <cwchar>   // wcslen, wcsncpy
#include <iostream>
#include <string>
#include <vector>
using namespace std;

// pretend function
int GetWindowTextW(void* /*hWnd*/, wchar_t* lpString, int nMaxCount);

void test_string();
void test_vector();

int main()
{
	test_string();
	test_vector();

	return 0;
}

void test_string()
{
	wcout << L"test_string" << endl;
	wcout << endl;
	wstring wstr;
	wstr.resize(256);
	// or :
	//wstring wstr(256, L'\0');
	GetWindowTextW(0, &wstr[0], wstr.length());
	wcout << L"\"" << wstr << L"\"";
	wcout << endl;
}

void test_vector()
{
	wcout << L"test_string" << endl;
	wcout << endl;
	wstring wstr;
	vector<wchar_t> buffer(256);
	GetWindowTextW(0, &buffer[0], buffer.size());
	wstr = &buffer[0];
	wcout << L"\"" << wstr << L"\"";
	wcout << endl;
}

// pretend function
int GetWindowTextW(void* /*hWnd*/, wchar_t* lpString, int nMaxCount)
{
	wcout << L"[GetWindowTextW called with nMaxCount = "
              << nMaxCount << L"]" << endl;
	wchar_t testData[]= L"My Test Window Class Name";
	const int len = wcslen(testData);
	if(nMaxCount < (len + 1))
		return 0;
	wcsncpy(lpString, testData, nMaxCount);
	return len;
}
Ok, I got it to work. Thank you! Using c_str() wasn't much of a hint for me since there is still so much I don't know about what is going on "under the hood". To a certain extent (too much in my opinion but I'm still learning!) I "do stuff that makes the red lines go away". I'm not sure what exactly is happening sometimes when I do that but instances like this are helping me slowly learn what is going on. And I've taken up reading more of the technical stuff (IE "under the hood" stuff) since all of the "hello world" tutorials I've followed don't explain what is going on but more or less just say "type this out and compile".

But based on what you've been saying, it looks like using buffers is something I'm going to have to get used to. It is just that with all the casting, going from one type to another, etc., it didn't occur to me that I'd have to use a buffer; I assumed some sort of conversion would be available to me (which is what I ignorantly thought c_str() was doing).

When I was talking about I wasn't sure what &buf[0] was doing, I meant the & as well as the [0]. Is that somehow dereferencing what is at the first element?

Now I'm off to convert the rest of my functions :).
Is that somehow dereferencing what is at the first element?

operator& is the address-of operator -- so &buf[0] is getting the address of the first element.

dereferencing is the other way round.

vector<int> buf(256);

int* ptr = &buf[0]; // set ptr to the address of the first elem of buf

int val = *ptr; // dereference ptr to get the value it points to 
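
It works in the other direction too, which is what a function like GetWindowTextW relies on: it is handed the address and writes through it. A small sketch (WriteThroughPointer and SameAddress are made-up names, and data() assumes C++11):

```cpp
#include <vector>

// Hypothetical helper: writes a value through a pointer to the first
// element, the way a C-style API writes into a caller-supplied buffer.
int WriteThroughPointer(std::vector<int>& buf, int value)
{
    int* ptr = &buf[0]; // address of the first element
    *ptr = value;       // dereference to store into that element
    return buf[0];      // the vector sees the new value
}

// Hypothetical helper: &buf[0] and buf.data() name the same address
// for a non-empty vector.
bool SameAddress(std::vector<int>& buf)
{
    return &buf[0] == buf.data();
}
```

Both helpers assume the vector is not empty; buf[0] on an empty vector is not valid.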


Andy
@andywestken
Thank you for the detailed explanation.