Spell Checker

Over the last few days, I've been trying to implement a spell checker in c++ that checks if the inputted word is in the dictionary file on my computer. However, as I'm nearing completion of the project, I keep running into more and more errors, some of which I do not understand. My program is complete, albeit, with some minor bad practices (I hard coded a few things just to get the program up and running. I would like some help with the last function. For some reason, everything works well until I output spelling suggestions. I type in "mses" (attempting to spell mess) and it returns some strange characters. It does definitely recognize that there are two permutations that match words in the dictionary (however they are both mess due to the algorithm implemented), and I don't understand why? I will work on removing the duplicates after.


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
// Spell Checker.cpp : Defines the entry point for the console application.
//

#include "stdafx.h"
#include <stdio.h>
#include <string>
#include <stdlib.h>
#include <fstream>
#include <iostream>

using namespace std;

#pragma warning(disable : 4996)

int k;
char output[100][100];
//char **output;

int factorial(int n)
{
	if (n<=1){
		return 1;
	}
	else
		return n*factorial(n-1);
}

/*
void print(char *v, const int n){
	//sprintf(output[k++],"%s",v);
	//strcpy(output[k],v);
	//k++;
}
*/

void permute(char *v, const int start, const int n)
{  
  if (start == n-1) {
    //print(v,n);
	  sprintf(output[k++],"%s",v);
	  
  }
  else {
    for (int i = start; i < n; i++) {
      int tmp = v[i];
      
      v[i] = v[start];
      v[start] = tmp;
      permute(v, start+1, n);
      v[start] = v[i];
      v[i] = tmp;

    }
  }
}

void dictionary(int m)
{
	ifstream infile("dictionary.txt");
	string *list= new string[267751];
	//char list[267751][5];
	//string list[267751];
	if (infile.is_open()){
		for (int i=0; i <267751; i++)
		{
			//infile>>list[i];
			getline(infile, list[i]);
			//cout<<list[i]<<endl;
		}
	}
	
	infile.close();
	for (int i=0; i<267751; i++)
	{
		for (int j=0; j<m; j++)
		{
			
		if(output[j]==list[i])
		{
		printf("Spelling suggestions: %s \n", list[i]);
		}
		}
	}
	
}	
	




int main()
{
	//read in word
	char word[100];
	 printf ("Enter a word: ");	 
	 scanf("%s",word);
	 int n=strlen(word);
	 int i,m=factorial(n);

	 for(int j=0; j<n; j++)
	 {
		word[j]=toupper(word[j]);
	 }
	//generate permutations of the word
	//char** output=(char**)malloc(m*sizeof(char*));
	//for (i=0; i<m; i++){
		//output[i]=(char*)malloc((n+2)*sizeof(char));
	//}
	k=0;
	permute(word,0,n);

	for (i=0; i<m; i++){
		printf("%d %s\n",i,output[i]);
	}
	
	
	//for each word in the permutation list, search it in the dictionary. if match, store in an output array
	dictionary(m);
	//give names for output
	system("pause");
}
You can't use printf to print a std::string.

You can, however, use printf on the C string the std::string returns when you use its c_str() method

printf("Spelling suggestions: %s \n", list[i].c_str()); // cf. your line 80

But why the C style output, rather than C++ ???

cout << "Spelling suggestions: " << list[i] << "\n";

Andy
Last edited on
Ah I just tried cout a little while ago and it worked! Thanks for the heads up! I tried to use a character array as you can see but my program would start giving me like stack overflow problems? Also, I would like the permutations not to show to the user, but when I don't print them, the program doesn't work..
Also any help with cleaning up the code and some explaining would be greatly appreciated. I'm trying to develop good habits and a good understanding of the language.
First few thngs you should do, aside from a general tidy up, are:

a. replace the new string with vector<string>, which will allow you to get rid of the "special" value 267751 (if you use push_back to add to your vector)

b. elimnate the globals (k and output)

c. switch to C++ style i/o

Andy

PS rather than using the #pragma to disable warning C4996, you should be able to just add this before your includes

#define _CRT_SECURE_NO_WARNINGS
Last edited on
Use a proper distance measure; for instance Levenshtein distance
http://en.wikipedia.org/wiki/Levenshtein_distance

This uses a fair amount of C++11 features and may not compile cleanly with a Microsoft compiler. But you should be able to get the general idea:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
#include <string>
#include <vector>
#include <numeric>
#include <algorithm>
#include <iostream>

std::size_t levenshtein_distance( std::string first, std::string second )
{
    if( second.size() < first.size() ) std::swap(first,second) ;
    std::vector<std::size_t> current( second.size()+1 ), previous(current) ;
    std::iota( previous.begin(), previous.end(), 0 ) ;

    for( std::size_t i = 0 ; i < first.size() ; ++i )
    {
        current[0] = i+1 ;
        for( std::size_t j = 0 ; j < second.size() ; ++j )
        {
            current[j+1] = std::min( std::min( current[j], previous[j+1]) + 1,
                                     previous[j] + ( first[i] != second[j] ) ) ;
        }
        current.swap(previous);
    }
    return previous.back() ;
}

std::vector<std::string> nearest_words( const std::string& word, const std::vector<std::string>& dict )
{
    constexpr std::size_t MAX_DISTANCE = 5 ;
    std::vector<std::string> nearest[MAX_DISTANCE+1] ;
    for( const std::string match : dict )
    {
        std::size_t dist = levenshtein_distance( word, match ) ;
        if( dist <= MAX_DISTANCE ) nearest[dist].push_back(match) ;
    }

    for( const auto& seq : nearest ) if( !seq.empty() ) return std::move(seq) ;
    return {} ;
}

int main()
{
    const std::vector<std::string>& dict { "mess", "less", "plus", "pious", "messes", "misses", "lessen", "listen" } ;
    const std::string test[] = { "mises", "missse", "mesten", "pius" } ;
    for( const std::string& word : test )
    {
        std::cout << word << " => " ;
        for( const auto& nearest : nearest_words( word, dict ) ) std::cout << nearest << ' ' ;
        std::cout << '\n' ;
    }
}

http://ideone.com/tGHyow
I've been trying to use dynamic memory for the arrays but I can't seem to figure it out?
I replaced the new string with vector<string>, but I realized it takes substantially more time to produce suggestions... I would rather the search be as quick as possible?
Topic archived. No new replies allowed.