Reading words from a txt file into a string vector

I need to implement a function that reads in a large text file, counts how many times each word occurs, and prints only the words that occur more often than the “threshold” parameter, together with their occurrence count, one word-count pair per line. The higher the threshold parameter, the fewer words will meet it and be printed. Threshold 0 will print all the words. We are not worried about punctuation and such things here: white space is the only separator, so >> can be used without doing anything more. However, letter case should be ignored. E.g. the fragment “This IS TRUE. THIS is not.” has these words: “this” x 2, “is” x 2, “true.” and “not.”. I must implement this function without using a map. I must use only one vector of strings for this function and nothing else. I must use iterators to work with the vector and not array index notation. To turn a word into all lowercase I can use the STL transform:
transform(word.begin(), word.end(), word.begin(), ::tolower);

/*
 * Print a word and the number of times it occurs only if it occurs
 * more often than the given threshold
 */
void printIf(string word, int occurrences, int threshold) {
	if (occurrences > threshold) {
		cout << word << " - " << occurrences << endl;
	}
}

void printCommonWords1(string& filename, int threshold) {
	// your code here

	// you MUST use ONLY ONE VECTOR OF STRING (NO STRUCTS) to store the data, and NO MAP OR ANYTHING ELSE.

	// you MUST use ITERATORS to access the data in the vector and NOT [index] NOTATION

	// to print any values call the function above

	vector<string> myWords;
	vector<string>::iterator it;
	while (cin >> filename)          //no idea if this is correct
		myWords.push_back(filename);
     //should I use transform with iterators?
     //how do i get the count for each word?
	for (it = myWords.begin(); it != myWords.end(); it++) {
		printIf(*it, count, threshold);
	}


}
The first step is to read all the words from the file, convert them to lowercase, and store them in the vector.
Forget everything else for the moment until you have done that.

You may want to read this tutorial on files first:
http://www.cplusplus.com/doc/tutorial/files/
OK, so here is how I edited it:
void printCommonWords1(string& filename, int threshold) {

	string word;
	int* wordCountArray;
	vector<string> myWords;
	vector<string>::iterator it;
	ifstream myFile(filename);
	if (!myFile)
		cout << "Could not open file" << endl;

	while (myFile >> word)
		myWords.push_back(word);

	transform(myWords.begin(), myWords.end(), myWords.begin(), ::tolower);

	for (it = myWords.begin(); it != myWords.end(); it++) {

	}
	
	myFile.close();
}

But now I don't know how to keep the count for each word in the vector. Should I use a separate wordCountArray? I need to be able to pass the count for each word to printIf.
I think the myWords vector must contain different words. The way I did it stores every single word in the vector even if it is a word that has been stored previously. I am confused as hell.
> Should I use a separate wordCountArray?

I thought you were not allowed to:
> // you MUST use ONLY ONE VECTOR OF STRING (NO STRUCTS) to store the data, and NO MAP OR ANYTHING ELSE.


One way I see is to count the word before you print it.
You could create a function int WordCount that counts the number of occurrences of a word in the vector.
Then, for each word in the vector, you get the count, and if the count is greater than the threshold you print it.
#include <iostream>
#include <string>
#include <cctype>
#include <vector>
#include <fstream>
#include <iterator>
#include <algorithm>
#include <iomanip>

std::string to_lower( std::string str )
{
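    // cast to unsigned char: std::tolower is undefined for negative char values
    for( char& c : str ) c = static_cast<char>( std::tolower( static_cast<unsigned char>(c) ) ) ;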
    return str ;
}

std::vector<std::string> get_words( std::string file_name )
{
    std::vector<std::string> result ;

    std::ifstream file(file_name) ;
    std::string word ;
    while( file >> word ) result.push_back( to_lower(word) ) ;

    return result ;
}

void print_common_words( std::string file_name, int threshold )
{
    std::vector<std::string> words = get_words(file_name) ;

    auto iter = std::begin(words) ;
    const auto end = std::end(words) ;
    std::sort( iter, end ) ; // sort the vector to make repetitions appear next to each other

    while( iter != end )
    {
        // http://en.cppreference.com/w/cpp/algorithm/upper_bound
        const auto next_word = std::upper_bound( iter, end, *iter ) ;

        // http://en.cppreference.com/w/cpp/iterator/distance
        const auto frequency = std::distance( iter, next_word ) ; // frequency of this word

        if( frequency > threshold ) std::cout << std::quoted(*iter) << " x " << frequency << '\n' ;

        iter = next_word ; // move to the next word
    }
}
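If the upper_bound / distance step is the confusing part, this small sketch isolates it. The helper runLength is a made-up name for illustration only. It assumes the range is sorted and not empty.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

typedef std::vector<std::string>::const_iterator WordIter;

// Hypothetical helper: in a SORTED, non-empty range [first, last), return how
// many consecutive elements starting at `first` are equal to *first.
std::ptrdiff_t runLength(WordIter first, WordIter last) {
    // upper_bound finds the first element that is greater than *first
    // (or returns `last` if there is none). Because the range is sorted,
    // everything between `first` and that position is equal to *first.
    WordIter next = std::upper_bound(first, last, *first);

    // distance counts the elements between the two iterators.
    return std::distance(first, next);
}
```

So for the sorted words "is", "is", "not.", "this", "this", "true.", calling runLength at the start gives 2, and the loop then jumps ahead by that many elements to reach "not.".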
Thank you, but that is way beyond my understanding.
My problem now is counting each word in the myWords vector and passing the word and its occurrence count to printIf.
Perhaps this is easier to understand:

void print_common_words( std::string file_name, int threshold )
{
    std::vector<std::string> words = get_words(file_name) ;

    std::sort( words.begin(), words.end() ) ; // sort the vector to make repetitions appear next to each other

    auto iter_current_word = words.begin() ;
    while( iter_current_word != words.end() )
    {
        const std::string& this_word = *iter_current_word ;

        // the vector is sorted; locate the next (different) word in the vector
        auto iter_next_word = iter_current_word ;
        while( iter_next_word != words.end() && *iter_next_word == this_word ) ++iter_next_word ;

        const auto occurrences = iter_next_word - iter_current_word ; // occurrences of this word

        printIf( this_word, occurrences, threshold ) ;

        iter_current_word = iter_next_word ; // move to the next word
    }
}
A bit easier. So, what is get_words? I guess I can use this instead:
ifstream myFile(filename);
std::vector<std::string> words;
if (!myFile)
     cout << "Could not open file" << endl;
while (myFile >> word)
     words.push_back(word);


Also, what can I use instead of "auto"?
Could you explain the while loops?
I came up with this one, but it does not print anything.
 if (!myFile)
		cout << "Couldn't open the file" << endl;

	while (myFile >> word)     //get words one by one (ignoring white spaces) and push them into the string vector
	{
		transform(word.begin(), word.end(), word.begin(), ::tolower);
		words.push_back(word);
	}

	std::sort(words.begin(), words.end());     // sort the vector to make repetitions appear next to each other

	it = words.begin();
	while ( it != words.end() ) {
		currentWord = it;
		count = 0;
		while (it == (it + 1)) {
			count++;
			it++;
		}
		count++;
		it++;
		printIf(*currentWord, count, threshold);
	}
	myFile.close();
> what can I use instead of "auto"?

Legacy C++:
#include <iostream>
#include <string>
#include <cctype>
#include <vector>
#include <fstream>
#include <algorithm>

std::string to_lower( std::string str ) {

    for( std::size_t i = 0 ; i < str.size() ; ++i ) {
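            // cast to unsigned char: std::tolower is undefined for negative char values
            str[i] = static_cast<char>( std::tolower( static_cast<unsigned char>( str[i] ) ) ) ;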
    }

    return str ;
}

std::vector<std::string> get_words( const char* file_name ) {

    std::vector<std::string> result ;

    std::ifstream file(file_name) ;
    std::string word ;
    while( file >> word ) result.push_back( to_lower(word) ) ;

    return result ;
}

void printIf( std::string word, int occurrences, int threshold ) {

	if( occurrences > threshold ) {

		std::cout << word << " - " << occurrences << '\n' ;
	}
}

void print_common_words( const char* file_name, int threshold ) {

    std::vector<std::string> words = get_words(file_name) ;

    std::sort( words.begin(), words.end() ) ;

    std::vector<std::string>::iterator iter = words.begin() ;
    while( iter != words.end() ) {

        const std::string& this_word = *iter ;

        int occurrences = 0 ;
        while( iter != words.end() && *iter == this_word ) {

                ++iter ;
                ++occurrences ;
        }

        printIf( this_word, occurrences, threshold ) ;
    }
}
OK, I found my mistake: I forgot the * in the inner while loop.