How to read a txt file and store it into an array without numbers and punctuation?

I'm working on a program that reads a text file and analyzes the percentage of each specific word. I've finished all of the functions, but I'm having some trouble removing numbers and punctuation. The text file looks like this:
1 guy is writing his shitty code, he got 6 functions ready but stuck at "reading into the array". he is so stupid.

Here is my main:
ifstream fin;
string fileName;
cout << "Please input the filename:" << endl;
cin >> fileName;
fin.open(fileName);
string theWord;
string buffWord;

int pos = 0;
string wordList[2000];

while (!fin.eof()) {
    fin >> buffWord;
    int beginPos = 0;
    int length = buffWord.length();
    bool isLetter=true;
    while (beginPos<length)
    {
        if (isalpha(buffWord[beginPos])==false) {
            isLetter = false;
        }
        beginPos++;
    }
    if (isLetter == true) {
        theWord = buffWord;
        wordList[pos] = theWord;
        pos++;
    }
}
cout << wordList[13] << endl;  //this part is for testing
cout << wordList[14];


Ideally, the array should contain {"guy","is","writing",.....}
Thanks a lot, guys.
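For reference, the desired result can be sketched as a small helper. This is a minimal sketch, not the poster's code: `split_into_words` is a hypothetical name, and it swaps the fixed `wordList[2000]` array for a `std::vector` so there is no hard cap. It reads with `while (in >> token)` rather than `while (!fin.eof())`, because `eof()` only becomes true after a read has already failed, so the eof-style loop can end up processing the last word twice.

```cpp
#include <cctype>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: read whitespace-separated tokens from any stream, strip
// every non-letter from each token, and keep only tokens that still have letters.
std::vector<std::string> split_into_words(std::istream& in) {
    std::vector<std::string> words;
    std::string token;
    while (in >> token) {                 // stops as soon as a read fails (e.g. at end of file)
        std::string word;
        for (char c : token) {
            // the unsigned char cast keeps std::isalpha well-defined for non-ASCII bytes
            if (std::isalpha(static_cast<unsigned char>(c))) word += c;
        }
        if (!word.empty()) words.push_back(word);   // tokens like "1" or "6" are dropped
    }
    return words;
}

// Convenience overload for text that is already in memory.
std::vector<std::string> split_into_words(const std::string& text) {
    std::istringstream in(text);
    return split_into_words(in);
}
```

With a file, the stream overload can be called directly, e.g. `std::ifstream fin{fileName}; auto words = split_into_words(fin);`, and `words.size()` then plays the role of `pos`.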


I've edited it to this, and now it no longer pops up an error window, but it just goes to "Not Responding":


	ifstream fin;
	string fileName;
	cout << "Please input the filename:" << endl;
	cin >> fileName;
	fin.open(fileName);
	string theWord;
	string buffWord;

	int pos = 0;
	string wordList[2000];

	while (!fin.eof()) {
		fin >> buffWord;
		int beginPos = 0;
		int length = buffWord.length();
		bool isLetter = true;
		int i = 0, j = 0;
		while (i < length) {
			if (isalpha(buffWord[i])) {
				buffWord[i] = buffWord[i];
			}
			else {
				buffWord.erase(i, 1);
			}
			theWord = buffWord;
		}
		wordList[pos] = theWord;
		pos++;
	}
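The edited loop hangs because neither `i` nor `length` ever changes inside `while (i < length)`: when `buffWord[i]` is a letter nothing advances, and after an `erase` the loop still compares against the old `length`. So for any non-empty word the condition stays true forever. A minimal sketch of the same in-place erase approach with the index handled correctly (`strip_in_place` is a hypothetical name):

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: remove every non-letter from word, in place.
void strip_in_place(std::string& word) {
    std::string::size_type i = 0;
    while (i < word.size()) {                 // size() is re-read each pass, since erase shrinks the string
        if (std::isalpha(static_cast<unsigned char>(word[i])))
            ++i;                              // keep this letter and move to the next character
        else
            word.erase(i, 1);                 // drop it; the next character slides into index i
    }
}
```

Each `erase` shifts the rest of the string, so this is quadratic in the worst case; building a new string, as `remove_non_alpha` does in the reply, avoids that.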
#include <iostream>
#include <fstream>
#include <string>
#include <cctype>

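std::string remove_non_alpha( const std::string& str ) {

    std::string result ;
    // cast to unsigned char: passing a negative char (e.g. a byte of a non-ASCII letter) to std::isalpha is undefined behaviour
    for( char c : str ) if( std::isalpha( static_cast<unsigned char>(c) ) ) result += c ;
    return result ;
}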

int main() {

    std::string fileName;
    std::cout << "Please input the filename: " ;
    std::cin >> fileName ;

    if( std::ifstream fin{fileName} ) { // if the file was opened for input

        const std::size_t MAX_WORDS = 2'000 ;
        std::string wordList[MAX_WORDS] ;

        std::size_t num_words = 0 ; // actual number of valid words read

        // read up to a maximum of MAX_WORDS
        std::string word ;
        while( num_words < MAX_WORDS && fin >> word ) {

            word = remove_non_alpha(word) ; // remove punct, digits etc
            if( !word.empty() ) wordList[num_words++] = word ;
        }

        // do something with the words that were read. for instance, print them out:
        std::cout << "\nvalid words read from file '" << fileName << "'\n--------------------------------\n" ; 
        for( std::size_t i = 0 ; i < num_words ; ++i )
            std::cout << i << ". " << wordList[i] << '\n' ;
    }

    else {

        std::cerr << "error opening file '" << fileName << "'\n" ;
        return 1 ;
    }
} 

http://coliru.stacked-crooked.com/a/e1215243e0f17145
Thanks a lot, it works, but only partially.
It works on a text file I created myself, but not on my original .txt file (which contains some Unicode letters, Greek I think?).
Here's the link to my weird file:
https://drive.google.com/open?id=1LhdACxKU-BCa8B8tiv8iPc3cywyvcGV6
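One likely reason, assuming the file is UTF-8 encoded: a Greek letter such as α is stored as two bytes (0xCE 0xB1), and in the default "C" locale `std::isalpha` does not treat either byte as a letter, so those characters are stripped. (Passing such a byte as a plain, possibly negative, `char` to `std::isalpha` is also undefined behaviour.) A rough workaround sketch under that UTF-8 assumption is to keep every byte at or above 0x80 along with ASCII letters. `keep_letters_utf8` is a hypothetical name, and the sketch will also keep non-ASCII punctuation such as curly quotes.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper, assuming UTF-8 input: keep ASCII letters plus every
// byte >= 0x80 (the bytes that make up multi-byte UTF-8 characters).
std::string keep_letters_utf8(const std::string& str) {
    std::string result;
    for (char c : str) {
        const unsigned char u = static_cast<unsigned char>(c);
        if (u >= 0x80 || std::isalpha(u)) result += c;   // non-ASCII bytes are kept without further checks
    }
    return result;
}
```

A complete solution would decode the UTF-8 and classify whole code points with a Unicode-aware library; this sketch only stops the multi-byte characters from being thrown away.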
It also appears to have a mixture of line endings: some lines have Internet-standard (Windows, CRLF) line endings and others have Unix-native (LF) line endings.

The simplest solution would be to select and copy the text from Google Drive (in the browser) and then use a text editor to save it as a plain text file.