Comparing Strings - Simple Spell Checker

Hello everyone. First and foremost, thank you for taking the time to view my problem. I am in my sophomore year of college, and I really, really love programming. Despite not being able to figure this out, I thoroughly enjoy every moment of the chase. I am not looking for someone to do my homework; this will be my job someday so I want to understand!

The homework is to compare words in a line read from a file. The first word of each line is the correct word, and the words that follow may or may not contain some kind of error. One part of the assignment is to create an enum for the return value. This is not part of my problem, and I actually have the function set to "void" while I figure out the larger problem; for now it just prints the return value with cout. The program also has to count how many times each error occurs, but I have that taken care of as well. The error can be:

(2) Substitution - one char was substituted for another
(3) Transposition - two chars were switched with one another
(4) Deletion - a char was deleted from the string
(5) Insertion - an extra char was inserted in the string
(6) Misspelling - there is more than one type of error contained
(1) Correct - the word is correct

Here is the problem:

I have it almost figured out, but I cannot get it to return a value of 6 (misspelling, two or more types of error). If the original word is "COMPUTER" and the test word is "KOMPUTERS", it returns error(5), Insertion. This is because "KOMPUTERS" is longer than "COMPUTER". The problem is that it also contains a substitution in the first letter. I am lost, because I do not know how to say "If it is THIS plus THIS, then return error(6), Misspelling." If someone could help me figure out this one problem, I will be able to take the rest of the assignment from there.

AGAIN, thank you so much for your time and help. Let me know if there is any further information you need from me. Take care!

#include <iostream>
#include <string>
#include <fstream>
#include <vector>
using namespace std;

vector<string> GlobalList;

enum error {CORRECT = 1, SUBSTITUTION, TRANSPOSITION, 
				DELETION, INSERTION, ERROR};


void getString(ifstream& fin, ofstream& fout)
{
	string line = "";
	getline(fin, line, '\n');
	fout << line;
}

void processString(ifstream&fin, ofstream& fout, 
					int& cor, int& sub, int& tran,
					int& del, int& ins, int& miss)
{
	
	string word = "";
	while (!fin.eof())
	{
		fin >> word;
		::GlobalList.push_back(word);
		fout << GlobalList.size() << word << " " << endl;
	}
	for (int i = 1; i < GlobalList.size(); i++)
	{
		// The compared word size is smaller (CHAR DELETION)
		if (GlobalList.at(0).size() > GlobalList.at(i).size()) 
		{						
			//return(error(4));
			del++;
			cout << "The return value is 4" << endl;
		}
		// The compared word size is larger (CHAR INSERTION)
		else if (GlobalList.at(0).size() < GlobalList.at(i).size())
		{
			//return(error(5));
			ins++;
			cout << "The return value is 5" << endl;
		}
		// The compared word is the exact same (CORRECT)
		else if (GlobalList.at(0).compare(GlobalList.at(i)) == 0)
		{
			//return(error(1));
			cor++;
			cout << "The return value is 1" << endl;
		}
		else
		{
			int y1 = 1;
			int y2 = 1;
			for (int j = 0; j < GlobalList.at(0).size(); j++)
			{
			char x1 = GlobalList.at(0).at(j);
			char x2 = GlobalList.at(i).at(j);
				y1 *= x1;
				y2 *= x2;
			}
			// The sum of the compared word's chars is equal to original word, 
			// but not necessarilly same order (TRANSPOSITION)
			if (y1 == y2)
			{
				//return(error(3));
				tran++;
				cout << "The return value is 3" << endl;
			}
			// The sum of the compared word's chars is not equal to original word,
			// meaning a different value char was inserted (SUBSTITUTION)}
			else if (y1 != y2)
			{
				//return(error(2));
				sub++;
				cout << "The return value is 2" << endl;
			}
			// The word has two or more types of errors (MISSPELL)
			else
			{
				//return(error(6));
				miss++;
				cout << "The return value is 6" << endl;
			}
		}
		 
	}

}


void main()
{
	int sub = 0, tran = 0, del = 0,
		ins = 0, miss = 0, cor = 0;
	{
		ifstream fin("input.txt");
		ofstream fout("tempOutput.txt");
		getString(fin, fout);
	}
	{
		ifstream fin("tempOutput.txt");
		ofstream fout("output.txt");
		processString(fin, fout, cor, sub, tran, del, ins, miss);
	}
	
	system("pause");

}


P.S. Here is my input file:
COMPUTER COMPUDER COPMUTER COMPTER COMPUTXER KOMPUTERS COMPUTER
BRITNEY BRITTANY BRITNE BRITNYE BTITINEY BRITNEYS
KOURNIKOVA KOURNIKOVO OURNIKOVA OKURNIKOVA SKOURNIKOVA COURNIKOVA KOURNAKOVA
NAME SALE SALES NALE NAIL NALES NAMES NAMSE NAM NMAE

Just using the first line, comparing "COMPUTER" with the other 6 words, all return the correct value except the one in my problem above ("KOMPUTERS"). The return values are 2, 3, 4, 5, 5, 1. As I said, those are all correct except that the second '5' should actually be a '6' (misspelling).
Why the global variable (GlobalList)?

What is the purpose of the getString() function? You don't appear to be using the file it creates.

Why is your processString() function named processString? Wouldn't processFile() be a more appropriate name?

Since every line contains multiple spellings, wouldn't it be easier to read one line at a time and process that line using a stringstream? With your current method, how do you know when you are finished with the first line?

Surely by now someone must have told you about the two correct forms of main()?
int main()
int main(int argc, char **argv)


You should notice that void is not a correct return type for this function.

Thank you for the response, jlb.

Keep in mind that the program is not complete yet, so some things may appear to be named incorrectly or left unfinished. Right now I'm just trying to figure out a way to detect whether a word has more than one type of error. Also keep in mind that until a couple of months ago I had never programmed in my life, so my methods may be outdated (or wrong =P).

To answer your questions:

I used a global variable to store the words (I look at it like a word jar) so I can reference the words contained in that storage space from any function. (The scope of the words is the whole program).

I felt getString() needed to be implemented because I need to read the input file line by line. If I read the whole file at once, I would be comparing "COMPUTER" to "BRITNEY", which would be way off. It was the only way I knew how to say "The first word in this line is the correct word," otherwise it would be comparing the first word to the entire rest of the file. Get it? This answers your question of how I know I'm done with the line. Each line is extracted to a temporary file and that temporary file is read in, containing that single line. I'm probably missing something simple here, but that's my logic. These individual lines are stored in a temporary file, where they are processed by processString(), which stores each word in a global vector array (the "word jar"). (I chose a vector because there is an unknown number of words following the correct word).

I called it processString() because it is processing the string that getString() got. It is comparing the first word of the line with the rest in that line.

I am not sure how to process the line using stringstream. We did not learn it in class. As a matter of fact, we did not learn vectors either, but I found no other way to go about this problem. Like I said, I could be going about this completely "bass-ackwards" :/

I have tried telling my teacher in class about how void main() is wrong and also how system("pause") is wrong, but he says don't worry about it for the means of this class.

Thank you so much.


I used a global variable to store the words (I look at it like a word jar) so I can reference the words contained in that storage space from any function. (The scope of the words is the whole program).


This is not a good reason to use a global variable. Learn to pass the variable properly to and from the functions that need it.
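As a rough sketch of what passing the variable looks like (the function names `addWord` and `countLonger` are invented for this example and are not taken from the code above): the caller owns the vector, one function modifies it through a reference, and another only reads it through a const reference.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Modifies the caller's vector, so it takes a non-const reference.
void addWord(std::vector<std::string>& words, const std::string& word)
{
    words.push_back(word);
}

// Only reads the vector, so a const reference is enough (no copy, no global).
// Counts how many words after the first are longer than the first word.
std::size_t countLonger(const std::vector<std::string>& words)
{
    std::size_t n = 0;
    for (std::size_t i = 1; i < words.size(); ++i)
        if (words[i].size() > words[0].size())
            ++n;
    return n;
}
```

The vector itself would be declared in main() (or whichever function reads the line) and handed to each function that needs it.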


I called it processString() because it is processing the string that getString() got. It is comparing the first word of the line with the rest in that line.

But wouldn't it be better to actually process a string instead of reading another file again and again? Pass the string to your processString() function instead of the file streams. Remember that reading a file is orders of magnitude slower than memory operations.

I am not sure how to process the line using stringstream.

One of the good things about C++ is that a stream is a stream: whether it is a file stream or a string stream, the basic processing is the same. So for your purposes:

   ifstream fin("YourFileName");  // Create a file stream.
   string line;
   vector<string> words;
   string word;

   getline(fin, line);   // Get one line of text.
   istringstream sin(line);  // Create a stringstream.

   while(sin >> word)  // Process the line placing each word into the vector.
      words.push_back(word);

   // Now do your comparisons.
 


Notice the similarities between the string stream and a file stream.
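To make the line-by-line idea concrete, here is a small sketch that takes any `std::istream` (so it works with an `ifstream` or an `istringstream` alike) and returns one vector of words per line. The name `readLines` is just a placeholder for this example.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Split every line of the input into its words.
// result[k][0] is the correct word of line k; the rest are the words to check.
std::vector<std::vector<std::string>> readLines(std::istream& in)
{
    std::vector<std::vector<std::string>> result;
    std::string line;
    while (std::getline(in, line))       // loop ends when no more lines can be read
    {
        std::istringstream sin(line);
        std::vector<std::string> words;
        std::string word;
        while (sin >> word)              // same extraction syntax as a file stream
            words.push_back(word);
        if (!words.empty())              // skip blank lines
            result.push_back(words);
    }
    return result;
}
```

Because the loop condition is the `getline` call itself, there is no need to test `eof()` separately, and the end of each line is known without a temporary file.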

I have tried telling my teacher in class about how void main() is wrong and also how system("pause") is wrong, but he says don't worry about it for the means of this class.

Okay, then don't worry about it, just use one of the proper forms of main().


Thank you very much, jlb. Your knowledge is greatly appreciated. I am going to implement this new information and let you know how it turns out.

On another note, while this information should make my program (and myself) more efficient and less garbled, I still do not know a way to prove that two error conditions exist simultaneously within the word (return value error(6)). I will perhaps look at ways stringstream can be used to compare words until someone smarter than I thinks of something else. I feel tapped out. I thought about it for a good 10 hours yesterday! ?:/
There is no reason you can't compare the two strings for all the different items. Instead of using if/else try just if statements.

Ah, so you are saying perhaps make a counter variable that is incremented (++) whenever a condition is true. For example, if the size is too long and there is also a substitution, the value would be 2. If the final value of the counter is greater than 1, it means more than one condition exists and the result will be error(6).

Now that I think about it more, I feel to accomplish this, I'd have to find the first error, correct it, and then resend it through the check for the second error. This is getting complicated.
Why do you need to correct any of the errors? Just check for each type of error, using the method you described in paragraph 1 above.
I would have to correct it because my method for telling a transposition from a substitution is to multiply the int values of all the chars in the word together, then compare that product with the product for the correct word. If the two values are the same, it means the letters were just switched (the product is the same). If a word is too long, such as "KOMPUTERS" (one char too long), I would have to remove the letter that was inserted before multiplying the chars together to check for transposition vs substitution.
If the word lengths don't match, just use the length of the smaller string when multiplying the values.

The character insertion could have been in any part of the word. There could be an extra character inserted in the beginning, middle, or end. Taking the length of the smaller string will just remove the last letter(s) of that string and compare. The letter removed may have not been the inserted letter, defeating my method of checking for substitution vs transposition.
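That shift is easy to see with a quick experiment. The helper below (its name `positionalMismatches` is made up for this illustration) counts how many positions differ over the length of the shorter word. One letter inserted at the front makes every later position disagree.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Count positions where the two words differ, looking only at the
// first min(a.size(), b.size()) characters.
std::size_t positionalMismatches(const std::string& a, const std::string& b)
{
    const std::size_t n = std::min(a.size(), b.size());
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i)
        if (a[i] != b[i])
            ++count;
    return count;
}
```

For "KOURNIKOVA" against "SKOURNIKOVA" this gives 10 mismatches even though only one character was inserted, while "COMPUTER" against "COMPUTXER" gives 2 because the insertion is near the end.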
Well I doubt that your method is actually working correctly.

What happens when the word is "worker" and you multiply the letter values 119 * 111 * 114 * 107 * 101 * 114? This is a fairly large number (1.85517462035e+12), larger than can be held in a 32-bit signed integer, and since signed integer overflow is undefined behaviour in C++ you should probably rethink your method.
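To check that arithmetic, the sketch below computes the product in a 64-bit unsigned type and compares it with the largest value an `int` can hold. The function names `charProduct` and `overflowsInt` are invented for the example, and the letter values assume an ASCII character set. It also shows a second problem with the approach: any rearrangement of the same letters gives the same product, not only a swap of two neighbouring letters.

```cpp
#include <limits>
#include <string>

// Product of the character codes, computed in (at least) 64 bits.
// Unsigned arithmetic wraps on overflow instead of being undefined,
// but a wrapped value is still useless for comparing words.
unsigned long long charProduct(const std::string& word)
{
    unsigned long long product = 1;
    for (unsigned char c : word)
        product *= c;
    return product;
}

// True when the product is too large for a plain int.
// Only meaningful for words short enough that the 64-bit product
// itself has not wrapped around.
bool overflowsInt(const std::string& word)
{
    return charProduct(word) >
           static_cast<unsigned long long>(std::numeric_limits<int>::max());
}
```

With these, `charProduct("worker")` is 1855174620348, which `overflowsInt` reports as too big for an `int`, and "COMPUTER" and "RETUPMOC" produce identical products even though one is the other spelled backwards.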


This is true. I could static cast the int values to double. Even if I do static cast, this still leaves me at the same situation I was at. It seems I may have to find a way of comparing char by char.
I may have to find a way of comparing char by char.

That's probably a better way of proceeding.
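As one possible way to compare char by char when both words have the same length, the sketch below collects the positions where the letters differ and decides from that list. The enum and function names are hypothetical, chosen to match the numbering in the first post; this is an illustration under that assumption, not the assignment's required design.

```cpp
#include <cstddef>
#include <string>
#include <vector>

enum Result { CORRECT = 1, SUBSTITUTION, TRANSPOSITION,
              DELETION, INSERTION, MISSPELLING };

// Classify two words of EQUAL length.
// Precondition (assumed by this sketch): cw.size() == bw.size().
Result classifySameLength(const std::string& cw, const std::string& bw)
{
    std::vector<std::size_t> diff;           // positions that disagree
    for (std::size_t i = 0; i < cw.size(); ++i)
        if (cw[i] != bw[i])
            diff.push_back(i);

    if (diff.empty())
        return CORRECT;
    if (diff.size() == 1)
        return SUBSTITUTION;                 // exactly one letter differs
    if (diff.size() == 2 && diff[1] == diff[0] + 1 &&
        cw[diff[0]] == bw[diff[1]] && cw[diff[1]] == bw[diff[0]])
        return TRANSPOSITION;                // two neighbours swapped
    return MISSPELLING;                      // anything else
}
```

No multiplication is involved, so there is nothing to overflow, and words of different lengths can be routed to separate deletion and insertion checks before this function is called.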
Well, I am still stumped. Nonetheless, I have made the modifications you suggested. It is not due until the end of the week and I have class tomorrow so I will let you know what happens. In the meantime, here is my new code. Maybe someone can think of something now that I have taken the individual chars and put them in vectors to compare one by one.

Thanks again!

#include <iostream>
#include <string>
#include <fstream>
#include <sstream>
#include <vector>
using namespace std;


enum error {
	CORRECT = 1, SUBSTITUTION, TRANSPOSITION,
	DELETION, INSERTION, ERROR
};


void processString(ifstream&fin, ofstream& fout,
	int& cor, int& sub, int& tran,
	int& del, int& ins, int& miss)
{
	string line = "";
	string word = "";
	vector<string> words;
	getline(fin, line);
	istringstream sin(line);

	while (sin >> word)
	{
		words.push_back(word);

		fout << words.size() << word << " " << endl;
	}
	for (int i = 1; i < words.size(); i++)
	{
		int errorCount = 0;

		// The compared word size is smaller (CHAR DELETION)
		if (words.at(0).size() > words.at(i).size())
		{
			//return(error(4));
			del++;
			errorCount++;
			errorCount *= 4;
			
			
		}
		// The compared word size is larger (CHAR INSERTION)
		if (words.at(0).size() < words.at(i).size())
		{
			//return(error(5));
			ins++;
			errorCount++;
			errorCount*= 5;
			
		}

		
		{
			// Turn words(strings) into letters(chars) to compare letter by letter
			vector<char> correctLetters;
			vector<char> userLetters;
			double y1 = 0.0, y2 = 0.0;
			for (int j = 0; j < words.at(0).size() && j < words.at(i).size(); j++)
			{
				correctLetters.push_back(words.at(0).at(j));
				userLetters.push_back(words.at(i).at(j));
				if (correctLetters.size() == 1)
				{
					y1 = 1.0;
					y2 = 1.0;
				}
				if (correctLetters.at(j) != NULL)
					y1 *= double(correctLetters.at(j));
				else;
				if (userLetters.at(j) != NULL)
					y2 *= double(userLetters.at(j));
				else;
			}
			// The sum of the compared word's chars is equal to original word, 
			// but not necessarilly same order (TRANSPOSITION)
			if (y1 == y2)
			{
				//return(error(3));
				tran++;
				errorCount++;
				errorCount *= 3;

			}
			// The sum of the compared word's chars is not equal to original word,
			// meaning a different value char was inserted (SUBSTITUTION)}
			if (y1 != y2 && (words.at(0).size() == words.at(i).size()))
			{
				//return(error(2));
				sub++;
				errorCount++;
				errorCount *= 2;
			}
			// The compared word is the exact same (CORRECT)
			if (words.at(0).compare(words.at(i)) == 0)
			{
				//return(error(1));
				cor++;
				cout << "The return value of " << words.at(i) << " is 1" << endl;
			}
			else
			{
					if (errorCount == 2)
					cout << "The return value of " << words.at(i) << " is 2" << endl;
				else if (errorCount == 3)
					cout << "The return value of " << words.at(i) << " is 3" << endl;
				else if (errorCount == 4)
					cout << "The return value of " << words.at(i) << " is 4" << endl;
				else if (errorCount == 5)
					cout << "The return value of " << words.at(i) << " is 5" << endl;
				else
					cout << "The return value of " << words.at(i) << " is 6" << endl;
			}
		}
	}
}


int main()
{
	int sub = 0, tran = 0, del = 0;
	int	ins = 0, miss = 0, cor = 0;
	ifstream fin("input.txt");
	ofstream fout("output.txt");

	processString(fin, fout, cor, sub, tran, del, ins, miss);

	system("pause");
	return(0);
}
I have been reading this thread. I have some questions. I may or may not have any advice.

Why do you need the output file? (For example, is it a requirement of the exercise?)

For the input file that you supply in the first post, what should the output file look like?

To where does the error value need to be supplied?

Since you named the function processString(), why not just pass the string into it? Let the calling function do the getline(fin, line) and then pass just line to this function.

I think (from reading and testing) that your current version processes only one line of the file. Do you intend (or does the assignment require) that you process all lines of the file?

You said:
Maybe someone can think of something now that I have taken the individual chars and put them in vectors to compare one by one.

Why do you think that they need to be in vectors?
After countless, countless, countless hours of testing and thinking, I will post the completely, 100% working (to my knowledge) code. I do not think it can get much simpler than this.
I eliminated a lot of garbage as well.
pheininger, I certainly did not need to use vectors after all to compare. =)
I will also post the input and output. The output is exactly what it is supposed to look like.
If there is a way you can think to make this more simple, I'm all ears.
Thanks to those who helped!!!

INPUT FILE:
COMPUTER COMPUDER COPMUTER COMPTER COMPUTXER KOMPUTERS COMPUTER
BRITNEY BRITTANY BRITNE BRITNYE BTITINEY BRITNEYS
KOURNIKOVA KOURNIKOVO OURNIKOVA OKURNIKOVA SKOURNIKOVA COURNIKOVA KOURNAKOVA
NAME SALE SALES NALE NAIL NALES NAMES NAMSE NAM NMAE


OUTPUT FILE:

*****Starting a new line*****
Correct word is COMPUTER

User word is COMPUDER
The user word has one character substituted

User word is COPMUTER
The user word contains a transposition

User word is COMPTER
The user word has one character deleted

User word is COMPUTXER
The user word has one character inserted

User word is KOMPUTERS
The user word is too bad to be a misspelling

User word is COMPUTER
The user word is correct

*****Starting a new line*****
Correct word is BRITNEY

User word is BRITTANY
The user word is too bad to be a misspelling

User word is BRITNE
The user word has one character deleted

User word is BRITNYE
The user word contains a transposition

User word is BTITINEY
The user word is too bad to be a misspelling

User word is BRITNEYS
The user word has one character inserted

*****Starting a new line*****
Correct word is KOURNIKOVA

User word is KOURNIKOVO
The user word has one character substituted

User word is OURNIKOVA
The user word has one character deleted

User word is OKURNIKOVA
The user word contains a transposition

User word is SKOURNIKOVA
The user word has one character inserted

User word is COURNIKOVA
The user word has one character substituted

User word is KOURNAKOVA
The user word has one character substituted

*****Starting a new line*****
Correct word is NAME

User word is SALE
The user word is too bad to be a misspelling

User word is SALES
The user word is too bad to be a misspelling

User word is NALE
The user word has one character substituted

User word is NAIL
The user word is too bad to be a misspelling

User word is NALES
The user word is too bad to be a misspelling

User word is NAMES
The user word has one character inserted

User word is NAMSE
The user word has one character inserted

User word is NAM
The user word has one character deleted

User word is NMAE
The user word contains a transposition

There was 1 correct word
There were 5 words with a insertion
There were 4 words with a deletion
There were 5 words with a substitution
There were 4 words with a transposition
There were 7 words that were way off


CODE:
#include <iostream>
#include <string>
#include <fstream>
using namespace std;

enum errorType {
	CORRECT = 1, SUBSTITUTION, TRANSPOSITION,
	DELETION, INSERTION, ERROR
};

int errorLocation(string cw, string bw)
{  // loop through the words and find the first char that is different (i)
	int i = 0;
	for (i = 0; i < cw.size(); i++)
		if (cw.at(i) != bw.at(i))
			break;
	return i;
}

errorType checkSubAndTran(string cw, string bw)
{
	int i = 0;
	i = errorLocation(cw, bw);

	if (i == cw.size() - 1)
		return SUBSTITUTION;
	if (cw.substr(i + 1) == bw.substr(i + 1))
		return SUBSTITUTION;
	// Two neighbouring chars are swapped AND everything after them matches.
	// (i + 1 is in range here: the i == cw.size() - 1 case returned above.)
	else if (cw.at(i) == bw.at(i + 1) && cw.at(i + 1) == bw.at(i) && cw.substr(i + 2) == bw.substr(i + 2))
		return TRANSPOSITION;
	else
		return ERROR;
}

errorType checkDeletion(string cw, string bw)
{
	int i = 0;
	i = errorLocation(bw, cw);  // bw is shorter

	if (i == bw.size())
		return DELETION;
	if (cw.substr(i + 1) == bw.substr(i))
		return DELETION;
	else
		return ERROR;
}

errorType checkInsertion(string cw, string bw)
{
	int i = 0;
	i = errorLocation(cw, bw);  // cw is shorter

	if (i == cw.size() && bw.substr(0, (i - 1)) == cw.substr(0, (i - 1)))
		return INSERTION;
	if (bw.substr(i + 1) == cw.substr(i))
		return INSERTION;
	else
		return ERROR;
}

void convert(errorType error, int& sub, int& tran, int& del, int& ins, int& miss, int& correct, ofstream& fout)
{
	switch (error)
	{
	case CORRECT:
		fout << "is correct" << '\n' << endl;
		correct++; break;
	case SUBSTITUTION:
		fout << "has one character substituted" << '\n' << endl;
		sub++; break;
	case TRANSPOSITION:
		fout << "contains a transposition" << '\n' << endl;
		tran++; break;
	case DELETION:
		fout << "has one character deleted" << '\n' << endl;
		del++;  break;
	case INSERTION:
		fout << "has one character inserted" << '\n' << endl;
		ins++; break;
	default:
		fout << "is too bad to be a misspelling" << '\n' << endl;
		miss++; break;
	}
}

errorType returnCorrect()
{
	return(CORRECT);
}

void displayResult(int sub, int tran, int del, int ins, int miss, int correct, errorType error, ofstream& fout)
{
	fout << "There";
	if (correct > 1)
		fout << " were " << correct << " correct words" << endl;
	else
		fout << " was  " << correct << " correct word" << endl;
	fout << "There were " << ins << " words with a insertion" << endl;
	fout << "There were " << del << " words with a deletion" << endl;
	fout << "There were " << sub << " words with a substitution" << endl;
	fout << "There were " << tran << " words with a transposition" << endl;
	fout << "There were " << miss << " words that were way off" << endl;
}

int main()
{
	int sub = 0, tran = 0, del = 0;
	int	ins = 0, miss = 0, cor = 0;
	errorType error = {};
	string cw = "", bw = "";
	char tempChar = ' ';
	ifstream fin("input.txt");
	ofstream fout("output.txt");

	fin >> cw;
	while (!fin.eof())
	{
		fout << "*****Starting a new line*****" << endl;
		fout << "Correct word is " << cw << '\n' << endl;
		fin.get(tempChar);

		while (tempChar != '\n' && !fin.eof())
		{
			fin >> bw;
			fout << "User word is " << bw << endl;
			fout << "The user word ";

			if (cw == bw)
				convert(returnCorrect(), sub, tran, del, ins, miss, cor, fout);
			else if (cw.size() == bw.size())
				convert(checkSubAndTran(cw, bw), sub, tran, del, ins, miss, cor, fout);
			else if (cw.size() == bw.size() + 1)
				convert(checkDeletion(cw, bw), sub, tran, del, ins, miss, cor, fout);
			else if ((cw.size() + 1) == bw.size())
				convert(checkInsertion(cw, bw), sub, tran, del, ins, miss, cor, fout);
			else
				fout << "has beaten the program (back to the drawing board)" << '\n' << endl;
			fin.get(tempChar);
		}
		fin >> cw;
	}
	displayResult(sub, tran, del, ins, miss, cor, error, fout);

	return(0);
}
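
For anyone looking for a shorter variant: one alternative (not the approach in the code above) is to strip the letters the two words share at the front and at the back, then look at what is left in the middle. The sketch below assumes the same six categories as the assignment; `classify` and the enum spelling are names picked for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

enum Result { CORRECT = 1, SUBSTITUTION, TRANSPOSITION,
              DELETION, INSERTION, MISSPELLING };

// cw is the correct word, bw is the word being checked.
Result classify(const std::string& cw, const std::string& bw)
{
    const std::size_t shortest = std::min(cw.size(), bw.size());

    // Length of the common prefix.
    std::size_t pre = 0;
    while (pre < shortest && cw[pre] == bw[pre])
        ++pre;

    // Length of the common suffix, not allowed to overlap the prefix.
    std::size_t suf = 0;
    while (suf < shortest - pre &&
           cw[cw.size() - 1 - suf] == bw[bw.size() - 1 - suf])
        ++suf;

    // What remains in the middle of each word.
    const std::string a = cw.substr(pre, cw.size() - pre - suf);
    const std::string b = bw.substr(pre, bw.size() - pre - suf);

    if (a.empty() && b.empty())           return CORRECT;
    if (a.size() == 1 && b.size() == 1)   return SUBSTITUTION;
    if (a.size() == 2 && b.size() == 2 &&
        a[0] == b[1] && a[1] == b[0])     return TRANSPOSITION;
    if (a.size() == 1 && b.empty())       return DELETION;
    if (a.empty() && b.size() == 1)       return INSERTION;
    return MISSPELLING;                   // more than one thing went wrong
}
```

With "COMPUTER" and "KOMPUTERS" nothing matches at either end, so the middles are the whole words and the result falls through to MISSPELLING, which is the case the original question was about.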