Getting the words (strings) of a sentence

Hi guys,

I am trying to write code that reads all the words in a text file, grouped by sentence. This means I have to detect the beginning and end of every individual sentence in the file. I really don't know where to start right now. Can anybody give me a hint? I'm not asking for it to be done for me.
I'm actually trying to create a summarizer in C++.
And how would someone help you with that if you don't show what your text file looks like, or what code you are using?
I'm not really sure what you want, but this might be one way to solve it. Here you get all the sentences as elements in a vector. I have not tried it, but it should work. :)
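#include<fstream>
#include<string>
#include<vector>

using namespace std;

int main(int argc, char* argv[])
{
  // Expect exactly one argument: the path of the text file.
  if(argc != 2)
    return -1;

  ifstream ifs(argv[1], ifstream::in);
  vector<string> vec;   // one element per sentence
  string str, sentence;

  if(!ifs)
    return -2;

  // operator>> reads one whitespace-delimited word at a time.
  while(ifs >> str)
  {
    sentence = sentence + " " + str;

    // A word ending in '.' is treated as the end of the sentence.
    if(str.at(str.length() - 1) == '.')
    {
      vec.push_back(sentence);
      sentence = "";
    }
  }
}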
Have you seen?

Tokenization
http://en.wikipedia.org/wiki/Tokenization

Sentence boundary disambiguation
http://en.wikipedia.org/wiki/Sentence_boundary_disambiguation

or

Doing Things with Words, Part One: Tokenization
http://www.attivio.com/blog/57-unified-information-access/257-doing-things-with-words-part-one-tokenization.html

Doing Things with Words, Part Two: Sentence Boundary Detection
http://www.attivio.com/blog/57-unified-information-access/263-doing-things-with-words-part-two-sentence-boundary-detection.html

Andy

PS: Feeding in the examples from the last of the above articles, e.g. (the text is unwrapped in the actual file):

I saw a squirrel. Attivio is on Walnut St. in Newton. Bob got a
doctorate from M.I.T. I said, "Attivio is in Newton." I never
drink... wine. But I thought he was...


the output of loonielou's illustrative code is:

sentence #1:  I saw a squirrel.
sentence #2:  Attivio is on Walnut St.
sentence #3:  in Newton.
sentence #4:  Bob got a doctorate from M.I.T.
sentence #5:  I said, "Attivio is in Newton." I never drink...
sentence #6:  wine.
sentence #7:  But I thought he was...

so there's a good bit of tweaking required if you want your summarizer to handle text robustly!

To print this, the following code was added to the end of main(), and <iostream> was included:

  const size_t cnt = vec.size();
  for(size_t idx = 0; cnt > idx; ++idx)
  {
    cout << "sentence #" << (idx + 1) << ": " << vec[idx] << endl;
  }
Thanks guys, I managed to get my code working with the code below.
But my problem now is getting the text read from a file.
#include <string>
#include <vector>
#include <iostream>
#include <sstream>


using namespace std;

vector<string> split(const string &s, char delim);

int main(){
	string text = "It is a long established fact that a reader will be distracted by the readable content of a page when looking at its layout. The point of using Lorem Ipsum is that it has a more-or-less normal distribution of letters, as opposed to using 'Content here, content here', making it look like readable English. Many desktop publishing packages and web page editors now use Lorem Ipsum as their default model text, and a search for 'lorem ipsum' will uncover many web sites still in their infancy. Various versions have evolved over the years, sometimes by accident, sometimes on purpose (injected humour and the like).";

	int wale; 
	vector<string> paragraphs = split(text, '.');
	for (unsigned i = 0; i < paragraphs.size(); i++)
	{
		std::cout << ' ' << paragraphs.at(i);
		std::cout << '\n'; std::cout << '\n';
		std::cout << '\n';
		std::cout << '\n';
	}

	cout << "";
	cout << " olawale";
	cin >> wale ;
	return 0;
	
}

vector<string> split(const string &s, char delim)
{
	vector<string> elems;	

	stringstream ss(s);
	string item;
	while (getline(ss, item, delim)) {
		elems.push_back(item);
	}
	return elems;

	
}
After some head-scratching I was able to load the text from a file. Now I am stuck trying to search the text, sentence by sentence, for a particular word (i.e. a topic name) and display the sentences that contain that word. Below is where I am on this problem. Can someone give me some hints on what to do?

#include <string>
#include <vector>
#include <iostream>
#include <sstream>
#include <fstream>


using namespace std;

vector<string> split(const string &s, char delim);

int main(){
	string filepath ;
	string back2str;
	string topic;string line;
	cout << " enter file Dest"<< endl;
	cin >> filepath;
	cout << " enter topic name " << endl;
	cin >> topic;
	string tempf;
	
	ifstream file (filepath); // declaring file that contains the data inputs
	while( std::getline( file, line) )
	{	
		back2str += line;  // transfers all the contents of the text into a whole string
	}
	int wale;
	vector<string> paragraphs = split(back2str, '.');
	for (unsigned i = 0; i < paragraphs.size(); i++)
	{
		size_t pos = 0 ;
		while((pos = tempf.find(topic, pos))!= string::npos)
		{
			std::cout << ' ' << paragraphs.at(i);
			std::cout << '\n'; std::cout << '\n';
			pos+=topic.size();
		}
		
		
	}

	cout << "";
	cout << " olawale";
	cin >> wale ;
	return 0;
	
}

vector<string> split(const string &s, char delim)
{
	vector<string> elems;	

	stringstream ss(s);
	string item;
	while (getline(ss, item, delim)) {
		elems.push_back(item);
	}
	return elems;

	
}
