String search program

Hey all,
I'm working on this homework assignment and I'm really having trouble. I'm supposed to count the number of words (at least two characters long and containing at least one letter), the number of unique words, and the number of times each unique word appears in the Programming Execution Environment (PEE). I'm also supposed to read a search string from the user and output how many times it appears in the PEE and the lines where it appears. I have some of it working, but I'm really struggling with counting how many times each word appears. I know my code is really bad right now, but that's why I'm here. Any help is really appreciated!

#include <iostream>
#include <cstring>
#include <string>
#include <cctype>
#include <algorithm>
#include <vector>
#include <set>

using namespace std;

//PEE string
string envstr("");

bool checkChar(unsigned c)
{
	return (ispunct(c) || isspace(c) || isblank(c) || isdigit(c) || c == '\n');
}

void searchWord(unsigned c, size_t length)
{
	multiset<string> words;
	vector<string> vwrds; //this was something i was trying out
	string tempword;
	while (!checkChar(envstr[c]) && c < length)
	{
		tempword = tempword + envstr[c]; //problem here
		c++;
	}

	tempword = tempword + " ";
	vwrds.push_back(tempword); 

	words.insert(tempword); //this is just a bunch of random letters

	tempword.clear();
	//for (multiset<string>::const_iterator i(words.begin()), end(words.end()); i != end; i++)
		//cout << *i;
}

bool checkIfWord(char c)
{
	bool valid = false;
	int i;

		for (i = c; i > c - 2; i--)
		{
			if (!checkChar(envstr[i]))
				valid = true;
		}

		if (valid)
			searchWord(i, envstr.length());

	return valid;
}

int main()
{
	//this code given by my instructor
	extern char **environ; // needed to access your execution environment

	int k = 0;
	size_t wordCount = 0;
	while (environ[k] != NULL)
	{
		cout << environ[k] << endl;		
		string str(environ[k]);
		envstr = envstr + str;
		k++;
	}

	//iterator to count words
	wordCount = count_if(envstr.begin(), envstr.end(), checkIfWord);

	cout << "\nThe PEE contains " << wordCount << " words. \n";

	//transform environment string to lowercase
	transform(envstr.begin(), envstr.end(), envstr.begin(), tolower);

	string input;

	do
	{
		cout << "Enter your search item: \n";
		cin >> input;
		//string can only be forty characters
		if (input.length() > 40 || input == "\n")
		{
			cout << "That search query is too long. \n";
			continue;
		}

		//change the search string to lowercase, like the envstr
		transform(input.begin(), input.end(), input.begin(), tolower);

		int j = 0;
		int searchCount = 0;
		vector<size_t> positions;
		size_t pos = envstr.find(input, 0);

		//search for that string
		while (pos != string::npos)
		{
			positions.push_back(pos);
			pos = envstr.find(input, pos + 1);
			searchCount++;
		}

		cout << "\nThat phrase occurs a total of " << searchCount << " times.\n";
		cout << "It occurs in the following lines: \n";
		//output where that string occurs
		for (vector<size_t>::iterator it = positions.begin(); it != positions.end(); ++it)
		{
			for (int i = *it; i < envstr.length() - 1 && checkChar(envstr[i]); i++)
			{
				cout << envstr[i];
			}

			cout << endl;
		}

		positions.clear();

	} while (input != "END");

	cin.get();
	return 0;
}
struggling with counting how many times each word appears
Look into std::map. It holds pairs of values: a unique key and a value that can be accessed by that key.
So if you have std::map<std::string, int> word_count;, each time you have a word you want to count in the variable word, you can just do word_count[word] += 1;. If word didn't appear before, operator[] will insert it with the value 0, and after the increment it will be 1, which is exactly what we want.
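A minimal sketch of that idea, assuming the words have already been extracted into a vector (the function name `countWords` is hypothetical):

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: counts how often each word occurs in the list.
std::map<std::string, int> countWords(const std::vector<std::string>& words)
{
    std::map<std::string, int> word_count;
    for (const std::string& word : words)
        word_count[word] += 1; // operator[] inserts a new key with value 0, then we increment it
    return word_count;
}
```

With this, `word_count.size()` is the number of unique words, and iterating over the map visits them in alphabetical order together with their counts.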

I know my code is really bad right now, but that's why I'm here. Any help is really appreciated!
If you show which code was provided by your instructor and which you absolutely must use, we might optimize the rest of the code. Also tell us whether your compiler supports C++11; some of its features would be useful here.
//this code given by my instructor
	extern char **environ; // needed to access your execution environment

	int k = 0;
	size_t wordCount = 0;
	while (environ[k] != NULL)
	{
		cout << environ[k] << endl;		
		string str(environ[k]);
		envstr = envstr + str;
		k++;
	}


That's the part I have to use. I'm trying to change the code to use the map, but I'm still figuring it out because I've never used it before. My functions to separate the words are wrong, but if I can get them right I think I could get the map to work too. I'm using Visual Studio 2013, which I'm pretty sure supports C++11.
There are numerous problems with the given code snippet:
a) There will be no delimiters between different lines of the environment:
ALLUSERSPROFILE=C:\ProgramDataAPPDATA=E:\Users\MiiNiPaa
when it should be:
ALLUSERSPROFILE=C:\ProgramData
APPDATA=E:\Users\MiiNiPaa
Or at least:
ALLUSERSPROFILE=C:\ProgramData APPDATA=E:\Users\MiiNiPaa

That way it is hard to distinguish words.
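One way to keep the entries apart is to append a newline after each one while concatenating. A sketch, assuming a NULL-terminated `char**` array such as `environ` is passed in (`joinEnvironment` is a hypothetical name):

```cpp
#include <string>

// Hypothetical helper: concatenates a NULL-terminated array of C strings
// (such as environ), appending '\n' after every entry so entries stay separated.
std::string joinEnvironment(char** env)
{
    std::string result;
    for (int k = 0; env[k] != nullptr; ++k) {
        result += env[k];
        result += '\n'; // delimiter between environment entries
    }
    return result;
}
```

After `extern char **environ;`, the call would be `std::string envstr = joinEnvironment(environ);`.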

b) I do not see what counts as a word. Is a path like C:\ProgramData one word, or is ProgramData one word? Or maybe the whole line ALLUSERSPROFILE=C:\ProgramData is one word, since there are no spaces?
From the assignment:
A word is any string of at least two characters, i.e., letters, and/or digits, and/or underscores “_”. To be a word, the string must have at least one letter. Thus x86 is a word while 86 and C: are not words. Any other characters including <newline> and whitespace should be interpreted as delimiter separators between words and non-words (or words). For example, the string “SystemRoot C:\Users” consists of 2 words, where SystemRoot is a word and Users is a word. C:\Users is not a word because the characters “:” and “\” are not legal word characters.

How do I split it so there are delimiters between the words? I want to try to split the string, but how do I do that if there are multiple delimiters, like :, =, \, etc.?
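Rather than listing every delimiter, it may be simpler to invert the test: the assignment defines which characters may appear in a word, so every other character can be treated as a delimiter. A sketch under that reading of the assignment (`isWordChar`, `isWord` and `splitWords` are hypothetical names):

```cpp
#include <cctype>
#include <string>
#include <vector>

// A word character is a letter, a digit or an underscore.
// The cast avoids undefined behaviour for negative char values.
bool isWordChar(char c)
{
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// A token is a word if it has at least two characters and at least one letter.
bool isWord(const std::string& token)
{
    if (token.size() < 2)
        return false;
    for (char c : token)
        if (std::isalpha(static_cast<unsigned char>(c)))
            return true;
    return false;
}

// Every non-word character (':', '=', '\\', whitespace, '\n', ...) ends the
// current token; only tokens that qualify as words are returned.
std::vector<std::string> splitWords(const std::string& text)
{
    std::vector<std::string> words;
    std::string token;
    for (char c : text) {
        if (isWordChar(c)) {
            token += c;            // still inside a token
        } else {
            if (isWord(token))     // a delimiter ends the current token
                words.push_back(token);
            token.clear();
        }
    }
    if (isWord(token))             // the text may end without a delimiter
        words.push_back(token);
    return words;
}
```

For “SystemRoot C:\Users” this yields SystemRoot and Users, while the token C is dropped because it is only one character long.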
There is Boost, you can use regular expressions, or you can do it the direct way:
#include <algorithm>
#include <cctype>
#include <iomanip>
#include <iostream>
#include <map>
#include <string>
#include <queue>


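inline bool isWordLetter(char c)
{
    // Cast to unsigned char: passing a negative char value to isalnum is undefined behaviour
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}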


inline void skipws(std::queue<char>& q)
{
    while (!( q.empty() || isWordLetter(q.front()) ))
        q.pop();
}


/* Extracts valid word.
 * Returns word or empty string if no word found till the end of queue
 */
std::string extractWord(std::queue<char>& q)
{
    std::string word;
    bool again = false;
    do {
        again = false;
        skipws(q);
        /* Extract all consequent word character */
        while( !q.empty() && isWordLetter(q.front()) ) {
            word += q.front();
            q.pop();
        }
        /* Check if word extracted is actually word */
        if (word.size() < 2 ||
            std::count_if(word.begin(), word.end(),
                          [](unsigned char ch) { return std::isalpha(ch) != 0; }) == 0) {
            word.clear();
            if ( !q.empty() )
                again = true;
        }
    } while (again); //Repeat until we find word or exhaust our queue
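    // Lambda casts to unsigned char (a negative char is undefined behaviour for tolower)
    std::transform(word.begin(), word.end(), word.begin(),
                   [](unsigned char ch) { return static_cast<char>(std::tolower(ch)); });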
    return word;
}


std::string getEnvString()
{
    extern char **environ; // needed to access your execution environment
    std::string envstr;
    int k = -1;
    while (environ[++k] != NULL) {
        envstr += environ[k];
        envstr += '\n';
    }
    return envstr;
}


int main()
{
    std::string envstr = getEnvString();
    std::cout << envstr << std::endl;

    std::queue<char> line( {envstr.begin(), envstr.end()} );
    size_t wordCount = 0;
    std::map<std::string, unsigned> entries;
    std::string word;
    while((word = extractWord(line)) != "" ) {
        ++wordCount;
        ++entries[word];
    }
    std::cout << "There are " << wordCount << " words in envstring\n" <<
                 "Unique words and their frequency:\n";
    for(auto& p: entries) {
        std::cout << std::setw(30) << p.first << ' ' << p.second << '\n';
    }
}
http://ideone.com/DKLmP1

I reordered the code a little because I like it that way.
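Since regular expressions were mentioned as an alternative, here is a sketch of the same counting with C++11 `<regex>` instead of a character queue. `countEnvWords` is a hypothetical name, and the character class `[A-Za-z0-9_]` assumes ASCII word characters:

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <regex>
#include <string>

// Each regex match is a maximal run of letters, digits and underscores.
// A run is kept only if it has at least two characters and at least one
// letter; kept words are lower-cased and counted in the map.
std::map<std::string, unsigned> countEnvWords(const std::string& text)
{
    static const std::regex token("[A-Za-z0-9_]+");
    std::map<std::string, unsigned> entries;
    for (std::sregex_iterator it(text.begin(), text.end(), token), end; it != end; ++it) {
        std::string word = it->str();
        bool hasLetter = std::any_of(word.begin(), word.end(),
                                     [](unsigned char c) { return std::isalpha(c) != 0; });
        if (word.size() < 2 || !hasLetter)
            continue;
        std::transform(word.begin(), word.end(), word.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        ++entries[word];
    }
    return entries;
}
```

The total word count is the sum of the mapped values, and `entries.size()` is the number of unique words.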
Thanks, that really helped. I've got it working for the most part. The only thing left is the searching. I search through and store the positions in a vector, so then how do I output them? I need to output the whole line where the string occurs. Here's what I have for that so far:

string input;
	while (!input.compare("END") && !input.compare("\n"))
	{
		cout << "Enter your search item: \n";
		cin >> input;
		//string can only be forty characters
		if (input.length() > 40 || input == "\n")
		{
			cout << "That search query is too long. \n";
			continue;
		}

		//change the search string to lowercase, like the envstr
		transform(input.begin(), input.end(), input.begin(), tolower);

		int j = 0;
		int searchCount = 0;
		vector<size_t> positions;
		size_t pos = envstr.find(input, 0);

		//search for that string
		while (pos != string::npos)
		{
			positions.push_back(pos);
			pos = envstr.find(input, pos + 1);
			searchCount++;
		}

		cout << "\nThat phrase occurs a total of " << searchCount << " times.\n";
		cout << "It occurs in the following lines: \n";
		//output where that string occurs

		for (vector<size_t>::iterator it = positions.begin(); it != positions.end(); ++it)
		{
			unsigned i = *it;
			while (envstr[i] != '\n' && i < envstr.size() - 1)
			{
				cout << envstr[i];
				i++;
			}

			cout << endl;
		}

		positions.clear();
	}


But this isn't right. I think it outputs more lines than it should.
The problem is probably that if the word appears in a line multiple times, that line will be output multiple times. I suggest checking one line at a time and making sure each line is pushed only once: http://ideone.com/qKJo1X
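A sketch of the line-by-line approach (this is not the code behind the link; `linesContaining` and `countOccurrences` are hypothetical names). It assumes `envstr` separates entries with '\n', as `getEnvString` above does, and that both strings are already lower-cased. It also prints whole lines, whereas the posted loop starts printing at the match position. Separately, `input.compare("END")` returns 0 when the strings are equal, so `!input.compare("END")` is true only when `input` is "END"; because `input` starts out empty, the posted `while` loop never executes.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns every line of envstr that contains query, each line at most once.
std::vector<std::string> linesContaining(const std::string& envstr, const std::string& query)
{
    std::vector<std::string> lines;
    std::string::size_type start = 0;
    while (start < envstr.size()) {
        std::string::size_type end = envstr.find('\n', start);
        if (end == std::string::npos)
            end = envstr.size();                // last line without a trailing '\n'
        std::string line = envstr.substr(start, end - start);
        if (line.find(query) != std::string::npos)
            lines.push_back(line);              // pushed once, however often query occurs in it
        start = end + 1;
    }
    return lines;
}

// Counts every occurrence of query, including several in the same line.
std::size_t countOccurrences(const std::string& envstr, const std::string& query)
{
    if (query.empty())
        return 0;
    std::size_t count = 0;
    for (std::string::size_type pos = envstr.find(query); pos != std::string::npos;
         pos = envstr.find(query, pos + 1))
        ++count;
    return count;
}
```

The total from `countOccurrences` can then be reported as "occurs a total of N times", followed by the lines from `linesContaining`.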