Filtering Words using ifstream/ofstream

I'm a little confused by an assignment that I've been given. I want to show the count of each individual word in the text1.txt file.

The banned.txt file gives me an array of banned words. The idea is to compare the two files and find the banned words in the text1.txt file.

I am also required to count how many times each individual word from the array is in the text1.txt file but I cannot figure it out. Any help is greatly appreciated.

#include <iostream>
#include <string>
#include <fstream>
using namespace std;





int main()
{
	//Step 1 read in 
	ifstream infile("banned.txt"); //The list of search words is in "search1.txt".
	string words[8]; //array to store the list of search words (words we need to find)
	int wordCount = 0;
	
	
	if (!infile) //checks we can open the file - validation example
	{
		cout << "ERROR: ";
		cout << "Can't open search1.txt\n";
	}
	//before we close this file we want to read the contents into an array, we can do this with a for loop
	for (int i = 0; i < 8; ++i)
	{
		infile >> words[i]; //by using the operator >> it reads from the variable "infile" and stores it into the array called words.
	}
	//to check we have stored the words we are searching for correctly inside the array, we can then use a cout to output them
	for (int i = 0; i < 8; ++i)
	{
		cout << words[i] << " "; //output each member of the array for debugging purposes.
	}
	cout << endl;
	infile.close(); //we have now finished with this file and all the information from the file has been stored into the array called words
	//the next step is to read in the line of characters in that we will use to search for specific words that are stored inside the array called words
	string text; //here we have declared a string called text which we can store our line of text in.                                              
	infile.open("text1.txt");
	getline(infile, text);
	while (!infile.eof())
	{
		cout << text;
		cout << endl;
		getline(infile, text);
	}
	infile.close();
	//Step 2
	//Do a comparison between the words from "search1.txt" and the words from "text1.txt". 
	//You need to find the occurrence of each word from “search1.txt” in the string of characters from “text1.txt”. 
	//Output the word, whether it has been found and, if found, the index of its location in the array,
		
	
	for (int i = 0; i < 8; ++i)
	{
		int position = text.find(words[i]); //uses the inbuilt function called find()
		

		cout << "\"" << words[i] << "\", ";
		if (position != string::npos)
		{
			
			cout << "Found, location " << position << endl;
			

			
			
			
			
		}
		else
		{
			cout << "Not Found" << endl;
		}

	}


	
	

	system("pause");
}
Each "banned word" has two attributes: the string and a count.
* The string is the word itself
* The count is how many times that word has been found from text.

You could have two arrays:
string words[8];
int counts[8] {}; // all initialized to be zero 

When you find word words[k], you can increment counts[k] by one.


This is where struct is handy; only one array is required:
struct Word {
  string w;
  int count {}; // initialized to 0
};

  Word words[8];

When you find word words[k].w, you can increment words[k].count by one.
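As a rough illustration of the struct approach, the sketch below counts substring occurrences with std::string::find. The function name countOccurrences is made up for this example, and it takes a std::vector<Word> instead of the fixed Word words[8] array so that it works for any number of banned words; neither comes from the assignment.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One banned word together with the number of times it was seen.
struct Word {
    std::string w;
    int count {}; // initialized to 0
};

// Hypothetical helper: count every occurrence (overlapping ones included)
// of each words[k].w inside text and add it to words[k].count.
// This is a case-sensitive substring match.
void countOccurrences(std::vector<Word>& words, const std::string& text)
{
    for (Word& entry : words) {
        if (entry.w.empty()) continue; // an empty string "matches" everywhere
        std::size_t pos = text.find(entry.w);
        while (pos != std::string::npos) {
            ++entry.count;
            pos = text.find(entry.w, pos + 1); // resume just past this hit
        }
    }
}
```

Because this is a substring search, a banned word such as "the" would also be counted inside "other"; whether that is wanted depends on the assignment.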
Sorry if I seem stupid, where are you getting the [k] from?
It's just meant as "any valid index", like i.

Btw,
	infile.open("text1.txt");
	getline(infile, text);
	while (!infile.eof())
	{
		cout << text;
		cout << endl;
		getline(infile, text);
	}

can be re-written in a simpler way:
	infile.open("text1.txt");
	while (getline(infile, text))
	{
		cout << text << endl;
	}
show the count of each individual word in the text1.txt file.


Unless you've been told not to, use a std::map.

#include <iostream>
#include <string>
#include <fstream>
#include <map>

int main()
{
	std::ifstream infile("banned.txt");

	if (!infile)
		return (std::cout << "Cannot open banned file\n"), 1;

	std::map<std::string, size_t> words;

	for (std::string word; infile >> word; words.emplace(std::move(word), 0));

	infile.close();
	infile.open("text1.txt");

	if (!infile)
		return (std::cout << "Cannot open word file\n"), 1;

	for (std::string word; infile >> word; )
		if (const auto itr {words.find(word)}; itr != words.end())
			++itr->second;

	for (const auto& w : words)
		if (w.second)
			std::cout << w.first << "  " << w.second << '\n';

	std::cout << '\n';
}


banned.txt


for the


text1.txt


for the bad words for are bad the for good


which displays:


for 3
the 2

I am wanting to use that map, but I can't seem to get it to work? Any help? I have redone my code also.

#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <algorithm>
#include <map>
using namespace std;








vector<string> readFile(const string& filename)
{
    vector<string> result;
    ifstream infile(filename);
    if (!infile)
    {
        cout << "ERROR reading " << filename << '\n';
    }
    else
    {
        for (string s; infile >> s; ) result.push_back(s);
    }
    return result;
}


int main()
{
    map<string, unsigned int> wordCounter;
    string word;
    vector<string> banned = readFile("banned.txt");
    vector<string> text = readFile("text1.txt");
    for (const string& b : banned)
    {
        auto f = find(text.begin(), text.end(), b);
        if (f == text.end()) cout << b << " is not found\n";
        else                   cout << b << " is found at word " << f - text.begin() + 1 << '\n';
    }
      
}


int wordCounter()
{
    ifstream infile("banned.txt");

    if (!infile)
    {
        return (cout << "cannot open banned file\n"), 1;
    }

    map<string, size_t> words;

    for (string word; infile >> word; words.emplace(move(word), 0));

    infile.close();
    infile.open("text1.txt");

    if (!infile)
    {
        return (cout << "cannot open file\n"), 1;
    }

    for (string word; infile >> word; )
        if (const auto itr{ words.find(word) }; itr != words.end())
            ++itr->second;

    for (const auto& w : words)
        if (w.second)
            std::cout << w.first << "  " << w.second << '\n';
          

   
        
}


What does "not work" mean? Be specific.
for (string word; infile >> word; )
        if (const auto itr{ words.find(word) }; itr != words.end())
            ++itr->second;


has errors. I have tried to fix them myself but haven't been able to.

The compiler expects a ')' in the if statement and also says that itr must have a bool type.
You need to compile as C++17. An if statement with an initializer, i.e. if (init; condition), is a C++17 feature.
Thank you! You are a life saver, thank you to all of you guys for helping me.
Time for some extra credit!

Suppose the word "you" is banned. Now consider this text in the input file:
"Where are you?" said the man.
"You who?" I replied.
"You've got to be kidding!  You know who I am!" he said.

As written, your code won't find a single occurrence. To fix this, you can normalize the words. That means you convert all forms of the word to the same string. For example:
- Remove all leading and trailing punctuation.
- Convert letters to upper (or lower) case.

This still won't catch "you've", but it's a pretty good start. To make it work with your code, add this:
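// Normalize a word: trim leading and trailing non-letters (punctuation, digits),
// then convert upper case to lower. Needs #include <cctype>.
string normalize(string word)
{
    // trim punctuation from end
    while (word.size() && !isalpha(static_cast<unsigned char>(word.back()))) {
        word.pop_back();
    }

    // trim leading punctuation
    size_t pos = 0;
    while (pos < word.size() && !isalpha(static_cast<unsigned char>(word[pos]))) {
        ++pos;
    }

    if (pos) {
        word.erase(0, pos);
    }
    // the casts avoid undefined behaviour for characters with negative values
    for (auto &ch : word) {
        ch = static_cast<char>(tolower(static_cast<unsigned char>(ch)));
    }
    return word;
}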


And in readFile(), change
result.push_back(s);
to
result.push_back(normalize(s));
This is all really helpful and is much appreciated!

Next I want to filter text1.txt by comparing every word with the list in banned.txt. Right now it only counts exact matches, i.e. if dog is in banned.txt, it counts dog in text1.txt. However, I also want it to count words that contain a banned word, such as doggerel, replace the banned word with asterisks, and write the result to a filtered copy of text1.txt using ofstream.

I can't seem to find any resources online that really explain what I'm trying to do. Any help? Thanks.
As one way (case sensitive), consider:

#include <string>
#include <iostream>
#include <fstream>
#include <set>

int main()
{
	std::ifstream fwrds("text1.txt");
	std::ifstream fban("banned.txt");

	if (!fwrds || !fban)
		return (std::cout << "Cannot open files\n"), 1;

	std::set<std::string> banned;

	for (std::string wrdban; fban >> wrdban; banned.insert(wrdban));
	fwrds.seekg(0, std::ios::end);

	std::string text(fwrds.tellg(), 0);

	fwrds.seekg(0);
	fwrds.read(text.data(), text.size());
	fwrds.close();

	for (const auto& ban : banned)
		for (size_t pos = 0, srt = 0; (srt = text.find(ban, pos)) != std::string::npos; pos = srt + 1)
			text.replace(srt, ban.size(), std::string(ban.size(), '*'));

	std::ofstream ofs("text1.txt");

	ofs.write(text.data(), text.size());
}

One approach:
FOR each word in text1
  IF is_bad( word )
  THEN write * out
  ELSE write word out

The function is_bad() returns true if the word should be banned; otherwise it returns false.

A variant:
FOR each word in text1
  write censor(word) out

The function censor() returns * if the word should be banned; otherwise it returns the word unchanged.

The point is that the main program does something for the whole text and uses functions to do the work. A function like is_bad() or censor() deals with just one word.

Within those functions you can choose a comparison method that you like.
word == ban // case and punctuation sensitive, whole word match
normalize(word) == ban // whole word match
normalize(word).find( ban ) != string::npos // substring match



seeplus uses a different approach above: all of text1 is read into memory and modified there before the entire text is written out.
As case insensitive:

#include <string>
#include <iostream>
#include <fstream>
#include <set>
#include <cctype>

std::string tolower(const std::string& txt)
{
	std::string low;

	low.reserve(txt.size());

	for (const auto& ch : txt)
		low += static_cast<char> (std::tolower(static_cast<unsigned char>(ch)));

	return low;
}

int main()
{
	std::ifstream fwrds("text1.txt");
	std::ifstream fban("banned.txt");

	if (!fwrds || !fban)
		return (std::cout << "Cannot open files\n"), 1;

	std::set<std::string> banned;

	for (std::string wrdban; fban >> wrdban; banned.insert(tolower(wrdban)));
	fwrds.seekg(0, std::ios::end);

	std::string text(fwrds.tellg(), 0);

	fwrds.seekg(0);
	fwrds.read(text.data(), text.size());
	fwrds.close();

	const auto aslow {tolower(text)};

	for (const auto& ban : banned)
		for (size_t pos = 0, srt = 0; (srt = aslow.find(ban, pos)) != std::string::npos; pos = srt + 1)
			text.replace(srt, ban.size(), std::string(ban.size(), '*'));

	std::ofstream ofs("text1.txt");

	ofs.write(text.data(), text.size());
}
