Reading words from a file

I'm trying to extract words from a text file and put them into a set. I want to treat spaces as well as periods as delimiters. My code works fine for spaces, but it isn't handling the periods at the end of some words. Can someone help me out?

void Dict::get_words(string file)
{
  ifstream file(file);

  int end, beginning = 0;
  string word;
  string line;

  ifstream file("file.txt");

  if(file.is_open())
  {
    while(! file.eof())
    {
      getline(file, line);
      end = line.find(" ");
      word = line.substr(beginning, end);
      words.insert(word);
      beginning = end + 1;
    }
  }
  file.close();
  
} //get_words 
How are even the spaces being handled correctly? I think you can try something like the following. Disclaimer: I'm a newbie myself, and this code is just an attempt to solve your problem and learn a bit in the process.

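while (getline(file, line, '.')) {   // read up to the next period; the delimiter must be a char
  string::size_type beginning = 0;
  while (beginning < line.length()) {
    string::size_type end = line.find(' ', beginning);  // search from the current position
    if (end == string::npos)
      end = line.length();           // last word in this chunk
    if (end > beginning)             // skip empty pieces caused by repeated spaces
      words.insert(line.substr(beginning, end - beginning));  // second argument is a length
    beginning = end + 1;
  }
}  // note: newlines inside a chunk are not treated as separators here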
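
The same hand-written loop can split on spaces and periods in one pass with std::string::find_first_of. This is only a sketch: split_words is a made-up helper name that does not appear in the original post, and it assumes that the space and the period are the only separators that matter.

```cpp
#include <set>
#include <string>

// Hypothetical helper: collects the words of one line,
// treating both ' ' and '.' as separators.
std::set<std::string> split_words(const std::string& line)
{
    std::set<std::string> words;
    std::string::size_type beginning = 0;
    while (beginning < line.length())
    {
        // position of the next separator at or after 'beginning'
        std::string::size_type end = line.find_first_of(" .", beginning);
        if (end == std::string::npos)
            end = line.length();          // no separator left: take the rest
        if (end > beginning)              // skip empty pieces such as the one in ". "
            words.insert(line.substr(beginning, end - beginning));
        beginning = end + 1;
    }
    return words;
}
```

Calling split_words("the cat. the dog.") should give a set holding "cat", "dog" and "the". Inside get_words, each line read with getline could be passed through such a helper and the result merged into words.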
There are several ways.

My personal favorite is to redefine "whitespace" to include periods (C++ streams are that tunable):

#include <iostream>
#include <fstream>
#include <locale>
#include <sstream>
#include <set>
#include <vector>
#include <string>

struct period_ws: std::ctype<char> {
    static const mask* make_table()
    {   
        static std::vector<mask> v(classic_table(), classic_table() + table_size);
        v['.'] |=  space;  // period will be classified as whitespace
        return &v[0];
    }
    period_ws(std::size_t refs = 0) : ctype(make_table(), false, refs) {}
};

int main()
{
    std::ifstream f("test.txt");
    f.imbue(std::locale(f.getloc(), new period_ws()));

    std::set<std::string> words;
    std::string word;
    while(f >> word) // just forget "file.eof()" exists
       words.insert(word);

    for(auto& s: words)
        std::cout << "'" << s << '\'' << '\n';
}
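
On the "just forget file.eof() exists" comment: eof() only becomes true after a read has already hit the end of the input, so a loop that tests it before reading tends to run its body one extra time with a failed read. The sketch below illustrates this with two made-up counting functions (the names are assumptions for illustration only); it works on any std::istream, so a std::istringstream can stand in for the file.

```cpp
#include <istream>
#include <sstream>   // for the std::istringstream used in the usage note
#include <string>

// Counts loop iterations when eof() is tested before each read.
int count_with_eof_test(std::istream& in)
{
    int n = 0;
    std::string line;
    while (!in.eof())
    {
        std::getline(in, line);   // may fail, but the body still runs
        ++n;
    }
    return n;
}

// Counts loop iterations when the read itself is the loop condition.
int count_with_read_test(std::istream& in)
{
    int n = 0;
    std::string line;
    while (std::getline(in, line))   // false as soon as a read fails
        ++n;
    return n;
}
```

For a stream built from "one\ntwo\n", the first function returns 3 and the second returns 2: the eof() version processes one extra, failed read. Testing the read itself also ends the loop on errors other than end of file.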


You could also use Boost.Tokenizer, which by default splits on whitespace and punctuation, so periods are dropped:

#include <iostream>
#include <fstream>
#include <set>
#include <string>
#include <boost/tokenizer.hpp>
int main()
{
    std::ifstream f("test.txt");
    std::istreambuf_iterator<char> beg(f), end;
    std::string total(beg, end);
    boost::tokenizer<> tok(total);
    std::set<std::string> words(tok.begin(), tok.end());

    for(auto& s: words)
        std::cout << "'" << s << '\'' << '\n';
}
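
If Boost is not available, something similar can be sketched with std::regex from the standard library. This is a different technique from the tokenizer above, not a drop-in equivalent: the helper name words_by_regex and the pattern are assumptions chosen for illustration, and the pattern only excludes whitespace and periods, whereas the tokenizer's default drops all punctuation.

```cpp
#include <regex>
#include <set>
#include <string>

// Hypothetical helper: every maximal run of characters that are
// neither whitespace nor '.' counts as one word.
std::set<std::string> words_by_regex(const std::string& text)
{
    static const std::regex word_re("[^\\s.]+");
    std::set<std::string> words;
    // iterate over all non-overlapping matches of the pattern
    for (std::sregex_iterator it(text.begin(), text.end(), word_re), last;
         it != last; ++it)
        words.insert(it->str());
    return words;
}
```

The std::string total built from the file in the example above could be passed straight to this function.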


Or, you could follow your approach: read line by line... Again, several ways, here's a simple one:

#include <iostream>
#include <fstream>
#include <set>
#include <string>
#include <sstream>
#include <algorithm>
int main()
{
    std::ifstream f("test.txt");
    std::set<std::string> words;

    std::string line;
    while(getline(f, line))
    {   
        std::replace(line.begin(), line.end(), '.', ' '); // replace periods with spaces
        std::istringstream buf(line); // then use standard parse
        std::string word;
        while(buf >> word)
            words.insert(word);
    }

    for(auto& s: words)
        std::cout << "'" << s << '\'' << '\n';
}
I'm using this code to extract words from a file. The program works, but it only gets words from the first line. I can't get it to grab words from every line until EOF, and I can't figure out why.

#include <iostream>
#include <fstream>
#include <string>
#include <set>
#include <algorithm>
#include <cctype>
#include <sstream>

using namespace std;

int main()
{
   string temp, sentence;
   stringstream iss;
   set <string> sentences;
   ifstream file("thisfile.txt");

  if(file.is_open())
  {
    while(!file.eof())
    {
      while(getline(file, temp, '.'))
      {
        iss << temp;
        while(getline(iss, sentence, ' '))
        {
          sentences.insert(sentence);
        }
      }
    }
  }

  file.close(); //close file

  for (set<string>::const_iterator it = sentences.begin();
    it != sentences.end(); it++)
    {
      cout << *it << endl;
    }

  return 0;
} // get_sentences
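
A likely cause, assuming the symptom is that only the text before the first period ends up in the set: the inner getline loop reads iss to its end, which leaves the stream in a failed state, and every later iss << temp then does nothing. Calling iss.clear() after the inner loop is one way out; building a fresh stream for each chunk is another. The sketch below takes the second route. The function name read_words is made up for illustration, and it accepts any std::istream, so an ifstream can be passed in directly.

```cpp
#include <istream>
#include <set>
#include <sstream>
#include <string>

// Hypothetical helper: splits the input at periods, then splits each
// piece at whitespace, and collects the resulting words.
std::set<std::string> read_words(std::istream& in)
{
    std::set<std::string> words;
    std::string chunk, word;
    while (std::getline(in, chunk, '.'))   // one chunk per period
    {
        std::istringstream iss(chunk);     // new stream, so no stale eof/fail state
        while (iss >> word)                // >> skips spaces and newlines
            words.insert(word);
    }
    return words;
}
```

Since the read is the loop condition, the outer while(!file.eof()) is not needed, and using >> instead of getline(iss, sentence, ' ') keeps newlines and empty strings out of the set.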