Determining the number of characters, words, and paragraphs in a file

Hi all,

I would like to write a program to count the number of characters, words, and paragraphs (or newlines) in a file. The characters and the newlines are counting correctly, but I am stuck on how to count the words. I would like to record each word in an array according to its size (i.e. the number of letters determines where it is stored in the array), then sum up the array and display the output.

Thanks in advance for the help.

// This program reads the characters, words, and paragraphs
//in a file, the user enters the file and the output is displayed with
//the number of words, characters, and paragraphs 

#include <iostream>
#include <iomanip>
#include <fstream>
#include <string>

using namespace std;

//Function Prototypes
void openTheFile(ifstream& fileToBeOpened, string& nameOfFile);
void countCharacters(ifstream &fileToBeOpened, int &numOfChars, int &numOfNewLines);
void testOpenFile(ifstream& fileToBeOpened);
void countWords(ifstream& fileToBeOpened, int wordLengths);


int main()
{
  //variable declarations
  string nameOfFile, word;
  ifstream fileToBeOpened;
  int numOfChars = 0, numOfNewLines = 0, wordLengths[] = {}, size = 11;

  //functions within the main
  openTheFile(fileToBeOpened, nameOfFile);
  testOpenFile(fileToBeOpened);
  countCharacters(fileToBeOpened, numOfChars, numOfNewLines);
  countWords(fileToBeOpened, wordLengths[size]);
  return 0;


}

//opens the file the user enters
void openTheFile(ifstream& fileToBeOpened, string& nameOfFile)
{
  cout<<"What is the name of the file? ";
  cin>>nameOfFile;
  fileToBeOpened.open(nameOfFile.c_str());
}

//test to see if the program can open the file, if not the program exits
void testOpenFile(ifstream& fileToBeOpened)
{
  if(!fileToBeOpened)
    cout<<"Error opening the file!";
    //exit(1) the exit causes program to terminate and not display output 
}

//counts the number of characters and newlines
void countCharacters(ifstream &fileToBeOpened, int &numOfChars, int &numOfNewLines)
{
  char character;
  while(character != EOF)
    {
      character = fileToBeOpened.get();
      numOfChars++;
      if(character == '\n')
	{
	  numOfNewLines++;
	}
    }
  //cout <<numOfChars <<" " <<numOfNewLines;  (testing my function)
}

//counts the number of words in the file
void countWords(ifstream& fileToBeOpened, int wordLengths) 
{  
  string word;
  int wordCount = 0;
  fileToBeOpened>>word;
  while(!fileToBeOpened.eof());
    {
      fileToBeOpened>>word;
      wordCount++;      
    }
    cout << wordCount;
}

//sums the array where the word lengths are stored
/*void sumTheArray (int wordLengths, int size)
{
  

}

//rewinds the file to the beginning
void rewindFile(ifstream& fileToBeOpened, string& nameOfFile);
{
  fileToBeOpened.close();
  fileToBeOpened.clear();
  fileToBeOpened.open(nameOfFile);
}
*/

//Just need to write a function to display the output. 
You are looking to "Tokenize" your file. This is easily done with std::vector<std::string>. Check this out:
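// Needs <fstream>, <string>, and <vector>, plus using namespace std as in your program.
vector< string > tokenize( ifstream& file )
{
    vector< string > tokens;
    string word;

    // Test the extraction itself: looping on file.good() would push an
    // extra entry whenever the final read fails (e.g. a trailing newline).
    while ( file >> word )
        tokens.push_back( word );

    return tokens;
}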


With this we can do something like the following:
ifstream file( "input.txt" );

vector<string> words = tokenize( file );

cout << "There are " << words.size() << " words in the file" << endl;

cout << "The words are: " << endl;
for (size_t i = 0; i < words.size(); ++i)
    cout << words[i] << ' ';

Is there another way without using vector? I haven't learned them yet in the class I'm taking at university.

Thanks
It's very easy if we just want to count the words:
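int countWords(ifstream& fileToBeOpened)
{
  string word;
  int wordCount = 0;
  // Count only reads that succeed; checking good() before the read
  // would count one word too many when the last read fails.
  while ( fileToBeOpened >> word )
  {
    wordCount++;
  }
  return wordCount;
}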



If you also wanted to store the words, I suppose you could do something like this:
string words[2000] = {""}; // fills all elements with blank string
int wordIndex = 0;
// advance the index only after a successful read
while( wordIndex < 2000 && fileToBeOpened >> words[ wordIndex ] )
    ++wordIndex;
I don't see how storing the words is beneficial unless you want to use the words to determine the number of characters, rather than making another pass through the file. Of course, you don't need any storage beyond that required for a single word in that case. Just keep a running sum of the word lengths as you read them.

I would take a different approach:

#include <iostream>
#include <iomanip>
#include <sstream>
#include <cctype>

std::istringstream in(
    "Hi all,\n"
    "\n"
    "I would like to write a program to count the number of characters, "
    "words, and paragraphs(or newlines) in a file.I have the characters "
    "counting correctly and the newlines counting correctly but I am "
    "stuck on how to count the words, I would like to store the words in "
    "an array to hold the size of the word(i.e.the number of letters "
    "influences where its stored in the array) then I need to sum up the "
    "array and display the output.\n"
    "\n"
    "Thanks in advance for the help."
);

int main()
{
    std::size_t characters = 0;
    std::size_t words = 0;
    std::size_t paragraphs = 0;

    bool in_word = false;
    bool in_paragraph = false;

    char token;
    while (in.get(token))
    {
        if (std::isspace(static_cast<unsigned char>(token)))
        {                               // whitespace
            in_word = false;

            if (token == '\n')
                in_paragraph = false;
        }
        else
        {                               // non-whitespace
            ++characters;

            if (!in_paragraph)
            {
                in_paragraph = true;
                ++paragraphs;
            }

            if (!in_word)
            {
                in_word = true;
                ++words;
            }
        }
    }
    
    std::cout << "Characters:" << std::setw(5) << characters << '\n';
    std::cout << "Words:     " << std::setw(5) << words      << '\n';
    std::cout << "Paragraphs:" << std::setw(5) << paragraphs << '\n';
}


http://ideone.com/dySIin

[Edit: Whether whitespace/newlines should be considered in the character count is something that should be specified by the problem. Here, I assume it should not be, but it should be easy enough to make an adjustment in the code if it doesn't reflect the problem's requirements.]