Counting Repeating Words

Hello,

My instructor is asking us to write a program that counts the number of repeating words in a string. The string must be user input, and the algorithm must be basic (no advanced methods, not even vectors). She told us that this program is very simple and possible with a nested for-loop. No matter how many times I try to write this program, I keep failing. This assignment is due tomorrow. Thank you for the help!
I will assume that "words" means English (or <insert human language here>) words, as opposed to some other definition.

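Doing it as a nested for loop is significantly less efficient than a simple histogram (a table of counts with one entry per distinct word). With n words, the nested loops make about n × n comparisons, while a histogram needs only one pass over the words. But since you are not allowed to use one, let's consider an example: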
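For comparison only, here is a minimal sketch of the histogram approach, assuming the sentence has already been cleaned as described next. It uses std::map, which the assignment forbids, so it is not a solution to the exercise; the name word_histogram is hypothetical. Each std::map lookup takes logarithmic time, so this is one pass with a logarithmic-time lookup per word.

```cpp
#include <map>
#include <sstream>
#include <string>

// For comparison only (NOT allowed by the assignment): one pass over the
// words, bumping a counter per distinct word.
// Hypothetical helper; assumes `cleaned` is already lower-case with
// punctuation replaced by spaces.
std::map<std::string, int> word_histogram(const std::string& cleaned)
{
    std::map<std::string, int> counts;
    std::istringstream in(cleaned);
    std::string word;
    while (in >> word)   // operator>> splits on whitespace
        ++counts[word];  // a missing key starts at 0
    return counts;
}
```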

    "The quick brown fox jumps over the lazy dog."

The first thing to do to the sentence is to eliminate punctuation and case differences. Transform it: convert everything to lower case (or upper case), and replace every character that is not a letter or digit with a space.

    "the quick brown fox jumps over the lazy dog "
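A minimal sketch of that transformation, assuming single-byte (ASCII) text. The name normalize is hypothetical; std::isalnum and std::tolower come from <cctype>.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: lower-cases letters and turns every character that
// is not a letter or digit into a space, e.g.
// "The quick brown fox jumps over the lazy dog."
//   -> "the quick brown fox jumps over the lazy dog "
std::string normalize(std::string text)
{
    for (std::string::size_type i = 0; i < text.size(); ++i)
    {
        // Cast to unsigned char: passing a negative char to <cctype> is undefined.
        unsigned char c = static_cast<unsigned char>(text[i]);
        if (std::isalnum(c))
            text[i] = static_cast<char>(std::tolower(c));
        else
            text[i] = ' ';
    }
    return text;
}
```

One consequence of this rule: an apostrophe counts as punctuation, so "don't" becomes the two words "don" and "t". If that matters for your assignment, you would need to keep apostrophes as well.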

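Now you need an outer loop that goes through every word in the sentence.

The inner loop should also go through every word in the sentence, starting from the beginning. That way it can count how many times the outer loop's word occurs, and also detect whether that word already appeared earlier, so you can avoid printing duplicates: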

the -found twice
quick -found once
brown -found once
fox -found once
jumps -found once
over -found once
the -ALREADY FOUND (do not output)
lazy -found once
dog -found once

So your output might be:
2 the
1 quick
1 brown
1 fox
1 jumps
1 over
1 lazy
1 dog
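Here is a sketch of the nested loop itself, assuming the words have already been split into a plain array (no vectors) of std::string. The name repeat_report is hypothetical. It builds the report as a string so you can print it with std::cout.

```cpp
#include <string>

// Hypothetical helper: one "count word" line per distinct word.
// words: a plain array (no vectors) of n already-split, lower-case words.
std::string repeat_report(const std::string words[], int n)
{
    std::string report;
    for (int i = 0; i < n; ++i)          // outer loop: the word being counted
    {
        int count = 0;
        bool seen_before = false;
        for (int j = 0; j < n; ++j)      // inner loop: always from the beginning
        {
            if (words[j] == words[i])
            {
                ++count;
                if (j < i)
                    seen_before = true;  // an earlier copy was already reported
            }
        }
        if (!seen_before)                // "ALREADY FOUND (do not output)"
            report += std::to_string(count) + " " + words[i] + "\n";
    }
    return report;
}
```

For the example sentence, the array would hold the nine words the, quick, brown, fox, jumps, over, the, lazy, dog, and printing repeat_report(words, 9) should give one line per distinct word, in order of first appearance.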

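Now, your code is complicated by the fact that you must split the string into WORDS. This can be done by keeping two indices for each loop, word_start and word_end. At the beginning of each loop, word_start must scan forward to the first non-space character. Then word_end must scan forward from word_start to the first non-alphanumeric character (after the transformation above, that is the next space or the end of the string). The word is the characters from word_start up to, but not including, word_end, and the search for the next word starts at word_end.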
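A sketch of the two scans as small functions, assuming the string has already been cleaned so it contains only lower-case letters, digits and spaces. The names skip_spaces and find_word_end are hypothetical.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: index of the first non-space character at or after
// pos, or text.size() if there is none (this becomes word_start).
std::string::size_type skip_spaces(const std::string& text, std::string::size_type pos)
{
    while (pos < text.size() && text[pos] == ' ')
        ++pos;
    return pos;
}

// Hypothetical helper: index of the first non-alphanumeric character at or
// after pos, or text.size(), i.e. one past the end of the word (word_end).
std::string::size_type find_word_end(const std::string& text, std::string::size_type pos)
{
    while (pos < text.size() && std::isalnum(static_cast<unsigned char>(text[pos])))
        ++pos;
    return pos;
}
```

Each word is then text.substr(word_start, word_end - word_start). On uncleaned text, find_word_end could return word_start itself (for example at a comma), and a loop built on it would stop advancing; doing the transformation first avoids that.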

This is actually a pretty easy thing to do. I recommend using functions for the scanning part, but if that is forbidden, you can easily inline the code at the head of each loop.
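If helper functions are not allowed, the scans can be inlined at the head of each loop. The following is a minimal sketch, assuming the input has already been transformed as described above; count_repeated_words is a hypothetical name, and it returns the report as a string. Because the cleaned text contains only letters, digits and spaces, the first space after a word is also its first non-alphanumeric character, so the code scans for a space.

```cpp
#include <cstddef>
#include <string>

// Hypothetical function: counts repeated words using only a string, indices
// and two nested loops. Assumes `text` is cleaned (lower-case letters/digits
// separated by spaces). The word scanning is inlined at the head of each loop.
std::string count_repeated_words(const std::string& text)
{
    std::string report;
    const std::size_t n = text.size();

    std::size_t outer_end = 0;
    for (std::size_t outer_start = 0; outer_start < n; outer_start = outer_end)
    {
        while (outer_start < n && text[outer_start] == ' ')  // word_start
            ++outer_start;
        if (outer_start == n)
            break;                                           // no more words
        outer_end = outer_start;
        while (outer_end < n && text[outer_end] != ' ')      // word_end
            ++outer_end;
        const std::string word = text.substr(outer_start, outer_end - outer_start);

        int count = 0;
        bool seen_before = false;
        std::size_t inner_end = 0;
        for (std::size_t inner_start = 0; inner_start < n; inner_start = inner_end)
        {
            while (inner_start < n && text[inner_start] == ' ')
                ++inner_start;
            if (inner_start == n)
                break;
            inner_end = inner_start;
            while (inner_end < n && text[inner_end] != ' ')
                ++inner_end;
            // compare() returns 0 when the substring equals `word`.
            if (text.compare(inner_start, inner_end - inner_start, word) == 0)
            {
                ++count;
                if (inner_start < outer_start)
                    seen_before = true;  // an earlier copy was already reported
            }
        }
        if (!seen_before)
            report += std::to_string(count) + " " + word + "\n";
    }
    return report;
}
```

To meet the user-input requirement, main could read a line with std::getline(std::cin, line), apply the lower-case/space transformation, and then print the result of count_repeated_words(line) with std::cout.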

Hope this helps.