Separating words within a string

I need to separate the words in a sentence (a string) and return the individual words. Here is the code I have so far: it receives a line from main and separates it into individual words. How do I store the words and return them to the main program?

Snt sepWrds(char *sentence)
{
	Snt infunc;
	//char *init;
	int i=0;
	char *next;
	string sent = sentence;
	
	do {
		string::size_type buf = sent.find(" ");
		//string::size_type prd = sent.find(".");
		string frst = sent.substr(0, buf);
		string nxt = sent.substr(buf + 1);
		cout << frst << endl;
		sent = nxt;
		
		//next = new char [nxt.size()+1];
		//strcpy(next, nxt.c_str());
		//if(!strcmp(next, ".")) {break;}
		//if(!sent.find(".")
		
		}while(sent.find("."));
		//strcpy (init, frst.c_str());
		//strcpy (next, nxt.c_str());
		
	return infunc;
}


Any suggestions?

P.S.

This code requires every word in the sentence to be separated by a space, so I would have to enter:

i hit the ball .

with a space between the last word and the period. How would I modify my code so that the loop recognizes the period even when there is no space between the last word and the period?
find() returns string::npos when it cannot find the string.

You should first find the ".", which marks the limit of your search.

When find() does find the string, it returns the offset of the match. Within your loop, each search must start after the last found offset:
sent.find(" ", last_found_offset, period_offset);

Your while loop would look like:
string::size_type period_offset = sent.find(".");
string::size_type last_found_offset = 0;
while(last_found_offset != string::npos)
{
...
}
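
To make the offset idea concrete, here is a minimal sketch of searching from an offset and stopping at string::npos. The helper name space_offsets is made up for illustration and is not from the posts above. It collects the offset of every space that appears before the first period (or before the end of the string when there is no period), using the two-argument form of find().

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not from the thread): offsets of every space that
// occurs before the first '.' in `sent`.
std::vector<std::string::size_type> space_offsets(const std::string& sent)
{
    std::vector<std::string::size_type> offsets;

    // find() returns string::npos when nothing is found, so fall back to
    // the end of the string in that case.
    std::string::size_type limit = sent.find('.');
    if (limit == std::string::npos)
        limit = sent.size();

    std::string::size_type pos = sent.find(' ');   // search starts at offset 0
    while (pos != std::string::npos && pos < limit) {
        offsets.push_back(pos);
        pos = sent.find(' ', pos + 1);              // resume just past the last hit
    }
    return offsets;
}
```

Each word then lies between two consecutive offsets, which is what substr(start, count) needs.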


The loop doesn't reach period_offset. Here is what I have: the program keeps looping and outputs the word before the period over and over again. How do I get the program to recognize the period and output the word before it?

Snt sepWrds(char *sentence)
{
	Snt infunc;
	
	string sent = sentence;
	string::size_type period_offset = sent.find(".");
	string::size_type last_found_offset = 0;
	while(last_found_offset != string::npos) {
		
		string::size_type buf = sent.find(" ");
		//string::size_type prd = sent.find(".");
		string frst = sent.substr(0, buf);
		string nxt = sent.substr(buf + 1);
		cout << frst << endl;
		sent = nxt;
		//next = new char [sent.size()+1];
		//strcpy(next, sent.c_str());
		//next = strchr(sent, '.');
		//cout << next << endl;
		
		//checknext = sent.substr(buf+1);	
		//cout << checknext << endl;
		//if(checknext == "."){break;}
		sent.find(" ", last_found_offset, period_offset);
				
		//if(!strcmp(next, ".")) {break;}
		//if(!sent.find(".")
		//check = sent.substr(0, prd);
		//init = new char [check.size()+1];
		//strcpy(init, check.c_str());
		//char &nextt[2] = ".";
	}
		//strcpy (init, frst.c_str());
		//strcpy (next, nxt.c_str());
		
	return infunc;
}
closed account (DSLq5Di1)
inline void split_string(const string& str, const string& delim, vector<string>& output)
{
    size_t start = 0, found = str.find_first_of(delim);

    while (found != string::npos)
    {
        if (found > start)
            output.push_back( str.substr(start, found - start) );

        start = ++found;
        found = str.find_first_of(delim, found);
    }
    if (start < str.size())
        output.push_back( str.substr(start) );
}

vector<string> tokens; // sub-strings stored here
split_string("Quick, brown.. fox! jumps lazy dog?", " .,!?", tokens);
I like sloppy9's, no doubt. But tokenizing a string is still something I find simpler to do in plain old C with the strtok_s() function (might be Microsoft-specific). You work on a duplicate of the original string because the function modifies it, but after that it is just peachy: it replaces the delimiters with null characters and returns char pointers, making it super easy to build std::string objects from them without the substring index math.
You can use strtok with const_cast if you want to run it directly on the string, or just copy the contents of the string into a char[] and use strtok on that. Copying is the safe choice: strtok writes into its input, and modifying the buffer returned by c_str() is undefined behavior.
You have to change last_found_offset, like so:
Snt sepWrds(char *sentence)
{
	Snt infunc;
	
	string sent = sentence;
	string::size_type period_offset = sent.find("."); // Find the period for the end
	if(string::npos == period_offset) // if there's no period
	  period_offset = sent.size(); // go until the end of the string
	string::size_type last_found_offset = 0;
	while(last_found_offset != string::npos) {
		
		string::size_type start_offset = last_found_offset;
		last_found_offset = sent.find(" ", last_found_offset, period_offset); // finds the space after the last found
		string frst;
		if(last_found_offset != string::npos) // do this only when a space was found
		{
		   frst = sent.substr(start_offset, last_found_offset - start_offset); // the 2nd parameter is a count, not an offset
		  ++last_found_offset; // we need to go beyond the space
		}
		else // if there're no spaces any more
		   frst = sent.substr(start_offset); // get the rest of the string

		cout << frst << endl;
	}
		
	return infunc;
}
Not tested, so check it out.

Another option would be a stringstream.
OK coder777, the first method returns the entire sentence. I added "cout << (int)last_found_offset << endl;" (marked below) to check the value of the offset, and it prints -1. Here is the code:

Snt sepWrds(char *sentence)
{
	Snt infunc;
	
	string sent = sentence;
	string::size_type period_offset = sent.find("."); // Find the period for the end
	if(string::npos == period_offset) // if there's no period
		period_offset = sent.size(); // go until the end of the string
	string::size_type last_found_offset = 0;
	while(last_found_offset != string::npos) {
		//string::size_type buf = sent.find(" ");
		string::size_type start_offset = last_found_offset;
		last_found_offset = sent.find(" ", last_found_offset, period_offset); // finds the space after the last found
		cout << (int)last_found_offset << endl; /////////////////////////////////////////added code...
		string frst;
		
		if(last_found_offset != string::npos) // do this only when a space was found
		{
			frst = sent.substr(start_offset, last_found_offset - start_offset); // the 2nd parameter is a count, not an offset
			++last_found_offset; // we need to go beyond the space
		}
		else // if there're no spaces any more
			frst = sent.substr(start_offset); // get the rest of the string
		
		cout << frst << endl;
	}
	
	return infunc;
}


Do I need to call sent.find(" ") before the last_found_offset assignment?
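
For what it's worth, the -1 printed here is string::npos converted to int (on typical implementations), which means "not found". In the three-argument overload find(const char* s, size_type pos, size_type n), the last argument is the number of characters of s to search for, not an upper bound on the search range. Passing period_offset there asks for more characters than the one-character literal " " provides, which is not a valid request. A small sketch of how the overloads behave, with a function name and sample strings invented purely for illustration:

```cpp
#include <string>

// Illustrative only: returns true when every observation below holds.
bool demo_find_overloads()
{
    const std::string sent = "i hit the ball.";

    // Two arguments: search for " " starting at offset 2.
    // The next space is the one after "hit", at offset 5.
    bool from_offset = sent.find(" ", 2) == 5;

    // Three arguments: the last one says how many characters of the search
    // text to use. Only "th" (the first 2 characters of "that") is searched
    // for, so this matches the "th" of "the" at offset 6.
    bool prefix_only = sent.find("that", 0, 2) == 6;

    // A failed search yields string::npos.
    bool not_found = sent.find("zebra") == std::string::npos;

    return from_offset && prefix_only && not_found;
}
```

So to limit the search to the part before the period, the string itself has to be shortened (for example with substr), or each returned offset has to be compared against period_offset.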
Thanks ahura24, string tokens are much easier! I found a simple solution. Here it is. Thanks all for the help!

Snt sepWrds(char *sentence)
{
	Snt infunc;
	char*pch;
	char *str;
	string sent = sentence;
	str = new char [sent.size()+1];
	strcpy(str, sent.c_str());
	pch = strtok (str," ,.-");
	while (pch != NULL)
	{
		
		printf ("%s\n",pch);
		pch = strtok (NULL, " ,.-");
	}
	
	delete[] str; // free the temporary copy that strtok modified
	
	return infunc;
}


A question though: can I store the printed values in a separate, ordered container so another function can use them? Would I use a vector?
Yeah, strtok_s() is really nice (strtok() is not thread safe and can only tokenize one string at a time).

Save your tokens in a vector<std::string>.
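
As a rough sketch of that suggestion: the Snt type is not shown anywhere in the thread, so this version returns a std::vector<std::string> instead, and the name split_words is hypothetical. It keeps the same strtok delimiters as the code above but holds the writable copy in a std::vector<char>, so nothing has to be freed by hand.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Sketch only: split `sentence` on the delimiters used above and return the
// words in their original order. The name split_words is an assumption.
std::vector<std::string> split_words(const std::string& sentence)
{
    // strtok writes '\0' into its input, so work on a private, writable copy.
    std::vector<char> buffer(sentence.begin(), sentence.end());
    buffer.push_back('\0');

    std::vector<std::string> words;
    for (char* tok = std::strtok(buffer.data(), " ,.-");
         tok != nullptr;
         tok = std::strtok(nullptr, " ,.-")) {
        words.push_back(tok);   // copies the token into a std::string
    }
    return words;               // buffer is released automatically
}
```

The caller can then index the result (words[0], words[1], ...) or pass it on to another function. The one-string-at-a-time limitation of strtok() mentioned above still applies.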
I was wrong about the parameters of find(). Here's the working version:
	string::size_type period_offset = sent.find("."); // find the period that ends the sentence
	string text; // named text because sentence is already the function parameter
	if(period_offset != string::npos) // if there's a period
		text = sent.substr(0, period_offset); // keep only the part before it
	else
		text = sent; // take the whole string
	string::size_type last_found_offset = 0; // start at 0
	while(last_found_offset != string::npos) {
		string::size_type start_offset = last_found_offset;
		last_found_offset = text.find(" ", start_offset); // finds the next space at or after start_offset
		string frst;
		
		if(last_found_offset != string::npos) // do this only when a space was found
		{
			frst = text.substr(start_offset, last_found_offset - start_offset); // the 2nd parameter is a count, not an offset
			++last_found_offset; // we need to go beyond the space
		}
		else // if there are no more spaces
			frst = text.substr(start_offset); // get the rest of the string
		
		cout << frst << endl;
	}


And this is the version with stringstream:
#include <sstream>

...

  stringstream ss_sent(sent);
  string s;
  getline(ss_sent, s, '.'); // read everything up to the first period

  stringstream ss_sentence(s);
  string frst;
  while(ss_sentence >> frst) // stops at end of input, so trailing spaces never print an empty word
  {
    cout << frst << endl;
  }
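
To tie this back to the original question of returning the words to main, here is one possible way to wrap the stringstream approach in a function. The name words_before_period and the choice of std::vector<std::string> as the return type are assumptions for illustration, since the Snt type is never shown.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Sketch only: return the whitespace-separated words that appear before the
// first '.' in `sent` (or all words when there is no period).
std::vector<std::string> words_before_period(const std::string& sent)
{
    std::istringstream whole(sent);
    std::string clause;
    std::getline(whole, clause, '.');   // everything up to the first '.'

    std::istringstream in(clause);
    std::vector<std::string> words;
    std::string word;
    while (in >> word)                  // operator>> skips runs of whitespace
        words.push_back(word);
    return words;
}
```

Because operator>> skips whitespace, "ball." and "ball ." give the same result, which covers the case from the P.S. in the first post.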