Frequency in array of strings

Jan 20, 2011 at 2:24am
I have read a bunch of words from a file into an array of strings, and now I am trying to determine the frequency of each word in the array.

For example, if the input file was:
 
hello bye bye hello blue hello


I now have an array of strings that is:
A[0] = hello
A[1] = bye
A[2] = bye
A[3] = hello
A[4] = blue
A[5] = hello


Basically, what I am trying to do is produce this output:
Frequency of words
#1. "hello" with 3 occurrences
#2. "bye" with 2 occurrences
#3. "blue" with 1 occurrences 


I'm not asking for the code to do this. I basically want to know whether it is even possible, or whether the way I stored the words is a bad fit for this task.

Any help is appreciated.
Jan 20, 2011 at 2:45am
Hint: map<string, int>
Jan 20, 2011 at 2:47am
I have this:

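#include <iostream>
#include <string>

using namespace std;

int main()
{
  // Test data: "hello" appears three times, "bye" and "blue" once each
  string words[5] = { "hello", "bye", "hello", "blue", "hello" };
  string t1;
  int num[5] = { 0, 0, 0, 0, 0 };   // num[j] = number of times words[j] occurs

  // For each word, count how many entries in the whole array match it
  for(int j = 0; j < 5; j++)
  {
    t1 = words[j];
    for(int i = 0; i < 5; i++)
      if(t1 == words[i])
        num[j]++;
  }

  // Print each entry with its count; a repeated word is printed once per copy
  for(int i = 0; i < 5; i++)
    cout << words[i] << " " << num[i] << endl;

  return 0;
}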


and it produces
hello 3
bye 1
hello 3
blue 1
hello 3


That is along the lines of what I am looking for, but not quite: I need to output the three words that occurred most often, and how many times each one occurred.
Jan 20, 2011 at 2:49am
Pan, I didn't mention this, but I am not allowed to use container classes for this project.
Jan 20, 2011 at 2:27pm
What is it you are supposed to learn from this project?
Jan 20, 2011 at 5:14pm
To get experience with text-processing techniques. Like I said, though, I'm not looking for the code, just a little assistance.
Jan 20, 2011 at 7:36pm
One thing you can do is to break out of the counting loop as soon as you have found the word and counted it. Then just output all words and counts where the count is > 0.
Jan 20, 2011 at 7:59pm
Can you use pointers and new? You could make a dynamic array that adds a new element to its end every time it encounters a new word.

Or you could write your own simple container class that holds a string and an int, then add a new one every time you come across a word that isn't already stored.
Jan 20, 2011 at 9:34pm
Normally I wouldn't write out the code, but I wrote some test code to check whether this approach with pointers and dynamically allocated (new[]) arrays was workable.

It's messier than what I'd normally write.
#snip
Last edited on Jan 20, 2011 at 11:56pm
Jan 20, 2011 at 11:48pm
Thanks for doing that, wolfgang; however, I got it working before seeing this post (of course that would happen). I had done basically the same thing you did here: splitting it into two separate arrays, checking whether each word was already stored, and either adding it as a new word or incrementing its counter if it was a repeat.

But thanks for your time. Yours is a little better than mine too lol.

Edit: spelling

Last edited on Jan 20, 2011 at 11:50pm
Jan 20, 2011 at 11:53pm
That's a ton of ownership passing you're doing there, wolfgang.

=x
Jan 20, 2011 at 11:56pm
Disch wrote:
That's a ton of ownership passing you're doing there, wolfgang.

=x


Hmm? You mean giving out the code? I will probably cut it out. I'm not too bothered; I was just making sure I could get it working with pointers without it blowing up (especially in case the OP wanted to use that method and then asked me how to do it; I wouldn't have known how to answer otherwise).

Code being snipped now.
Jan 21, 2011 at 1:12am
I think he meant stuff like "return (some pointer pointing to new'd data)".
Jan 21, 2011 at 1:26am
Everything I did was passed by reference. I passed in an int* by reference and simply assigned it the address of a new allocation after deleting the old one. It probably isn't the best way, and it comes with a gratuitous amount of overhead in the long run.
Jan 21, 2011 at 1:31am
Yeah I didn't have a problem with the fact that you posted code.

Like firedraco suggested, I meant you're handing off responsibility for a dynamically allocated buffer to a portion of the program that didn't allocate it. That's what I meant by "passing ownership". Doing that makes code really hard to manage and maintain, and leads to memory leaks that are next to impossible to track down.
Jan 21, 2011 at 1:48am
Ahh I see. Can you offer some ideas on how to make the allocations correctly?
Jan 21, 2011 at 4:12am
Avoiding dynamic allocation is the best way. :-)
Jan 21, 2011 at 4:47am
What PanGalactic said.

In places where it's necessary, objectify it. Dynamically allocated memory should have a clear owner -- and that owner should be responsible for all cleanup.

This is particularly easy with STL, as it offers such containers already (like std::vector).
Jan 21, 2011 at 5:16am
So mainly you recommend that if I need dynamic memory, it should be handled inside objects. Okay, that's nice and simple enough. I was going to write a class for the code I made, but as I said, I was making it quick and dirty (REALLY dirty). I was more interested in whether the idea actually worked than in how it was implemented.

Whenever I make something for my college course, I always use objects.
Jan 21, 2011 at 6:09am
I prefer malloc to new[]. Why? Because of realloc. When making a dynamic array, new[] forces you to copy the array every time you resize it. That's crazy. Until they add renew[] or something like it, I'll use only new, never new[] (and I believe you should do the same).
Last edited on Jan 21, 2011 at 6:12am