Word Count

Hi, I have to create a simple program that computes the number of words, bytes, and lines in a string. I can compute the bytes without trouble, but I'm having a hard time computing the lines and the number of words. Any help would be greatly appreciated!

#include<iostream>
using namespace std;

int main( )
{
char ch[30];
int line=1, words=0;
string bytes;

getline(cin,bytes);
cin.getline(ch,30);

for(int i = 0; ch [i] != '\0'; i++){
if ( ch [i] == ' '){
words++;
}
else if (ch [i] == '\n'){
line++;
}
}
cout<< '\n' << line << '\t' << words+1 << '\t' << bytes.size()<< endl;

return 0;
}

Thomas1965 (4571)

Is there a reason why you use these old C-style char arrays?
Your task would be much easier using a string and a stringstream.
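
A minimal sketch of what that could look like, assuming a "word" is simply any run of non-whitespace characters. The name count_words is made up for illustration and is not part of the original program:

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: counts whitespace-separated words by letting
// operator>> do the splitting.
std::size_t count_words(const std::string& text)
{
    std::istringstream in(text);
    std::string word;
    std::size_t n = 0;
    while (in >> word) ++n;   // operator>> skips any run of whitespace
    return n;
}
```

Because operator>> skips leading and repeated whitespace, there is no need for the "words+1" adjustment, and an empty or all-blank string yields 0.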

dhayden (5799)

I wouldn't use a char array or a string. After all, what if the input file is a single 100GB line? Your program will probably run out of memory.
A byte is obviously any char that you read, but what exactly defines a line? Reading \n? How about \r? What about \r\n (the newline encoding on Windows)?

What about a word? What exactly defines a word? A non-space character? But that would include punctuation. The transition from a non-letter to a letter? That's much better, but then what about contractions? Is "don't" one word or two? (hint: for this assignment, you can probably ignore that distinction).

Once you're clear on what makes a char, a word and a line, you'll probably find that all you need is the current character and the previous one.
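
As a rough sketch of that idea, here is one way it might look when reading one character at a time from a stream, so nothing larger than two chars is ever held in memory. The definitions are assumptions chosen for simplicity: a line ends at '\n' (a final unterminated line also counts), and a word starts wherever a non-whitespace char follows whitespace. The names Counts and count_stream are invented for this example:

```cpp
#include <cctype>
#include <cstddef>
#include <istream>
#include <sstream>   // only needed to try this on in-memory text

// Hypothetical result type for this sketch.
struct Counts { std::size_t bytes = 0, words = 0, lines = 0; };

Counts count_stream(std::istream& in)
{
    Counts c;
    char prev = '\n';   // pretend the input is preceded by whitespace
    char ch;
    while (in.get(ch)) {
        ++c.bytes;
        if (ch == '\n') ++c.lines;

        // Cast to unsigned char: std::isspace is undefined for negative values.
        const bool prev_ws = std::isspace(static_cast<unsigned char>(prev)) != 0;
        const bool cur_ws  = std::isspace(static_cast<unsigned char>(ch)) != 0;
        if (prev_ws && !cur_ws) ++c.words;   // whitespace -> non-whitespace

        prev = ch;
    }
    // Assumption: a last line without a trailing '\n' still counts as a line.
    if (c.bytes > 0 && prev != '\n') ++c.lines;
    return c;
}
```

Passing std::cin counts whatever is piped into the program; wrapping a string in a std::istringstream is a convenient way to experiment with it.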

privatewabbit1 (5)

I'm still really confused :/
If I take away the char array, my whole program stops working.

JLBorges (13770)

> compute the number of words, bytes, and lines that are represented in a string.

#include <iostream>
#include <string>
#include <iomanip>
#include <cctype> // std::isspace

int main()
{
    const std::string str = "first line\n  the\tsecond    line\n   a third  line   " ;
    std::cout << std::quoted(str) << "\n\n" ;

    ////////////////// number of bytes //////////////////////////
    const std::size_t nbytes = str.size() ;


    ////////////////// number of lines //////////////////////////
    std::size_t nlines = 0 ;

    // count the number of new line characters in the string
    // range based loop: http://www.stroustrup.com/C++11FAQ.html#for
    // for each char 'c' in string 'str'
    for( char c : str ) if( c == '\n' ) ++nlines ;

    // if the last character is not a new line, add one more (for the last line)
    if( !str.empty() && str.back() != '\n' ) ++nlines ;


    ////////////////// number of words //////////////////////////
    std::size_t nwords = 0 ;

    bool last_char_was_ws = true ;
    for( char c : str )
    {
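        // check if this char is white space (cast to unsigned char first:
        // std::isspace has undefined behaviour for negative char values)
        const bool this_char_is_ws = std::isspace( static_cast<unsigned char>(c) ) ;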

        // if the last char was white space, and this char is not white space,
        // a new word is starting; increment the count of words
        if( last_char_was_ws && !this_char_is_ws ) ++nwords ;

        // this char becomes the last char for the next iteration
        last_char_was_ws = this_char_is_ws ;
    }


    /////////////////////////////////////////////////////////////////
    std::cout << "#bytes: " << nbytes << '\n'
              << "#words: " << nwords << '\n'
              << "#lines: " << nlines << '\n' ;
}

http://coliru.stacked-crooked.com/a/a223a4241a41a43c

In the above snippet, we iterate through the characters in the string twice:
once to count the number of lines and a second time to count the number of words.

Would it be possible to count both simultaneously with just one pass through the string?
Would this attempted optimisation be worth the extra code complexity?

laedus (5)

Hello,

Here is a snippet with a very useful utility.

Best regards.

#include <string>
#include <vector>

using std::string;
using std::vector;

// Split str into tokens separated by any of the characters in delimiters.
void Tokenize (const string& str, vector<string>& tokens,
               const string& delimiters)
{
    tokens.clear ();

    // Skip delimiters at the beginning.
    string::size_type lastPos = str.find_first_not_of (delimiters, 0);
    // Find the first delimiter after the token start.
    string::size_type pos     = str.find_first_of (delimiters, lastPos);

    while (string::npos != pos || string::npos != lastPos) {
        // Found a token, add it to the vector.
        tokens.push_back (str.substr (lastPos, pos - lastPos));
        // Skip delimiters.  Note the "not_of".
        lastPos = str.find_first_not_of (delimiters, pos);
        // Find the next delimiter (the end of the next token).
        pos = str.find_first_of (delimiters, lastPos);
    }
}
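
For the word count in this thread, the tokens themselves are not needed, only how many there are. Below is a sketch of a counting variant built on the same find_first_not_of / find_first_of scanning. The name count_tokens and the default delimiter set are assumptions for illustration, not part of the utility above:

```cpp
#include <cstddef>
#include <string>

// Hypothetical counting variant of the Tokenize idea: tokens are counted
// rather than copied into a vector.
std::size_t count_tokens(const std::string& str,
                         const std::string& delimiters = " \t\r\n")
{
    std::size_t count = 0;
    // Start of the first token (npos if the string holds only delimiters).
    std::string::size_type start = str.find_first_not_of(delimiters, 0);
    while (start != std::string::npos) {
        ++count;
        // End of the current token: the next delimiter, or npos at the end.
        const std::string::size_type end = str.find_first_of(delimiters, start);
        // Start of the next token; searching from npos simply yields npos.
        start = str.find_first_not_of(delimiters, end);
    }
    return count;
}
```

With the default delimiters this treats every run of spaces, tabs, and newlines as a single separator, so leading or trailing blanks do not inflate the count.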
