Frequency of characters in array

Hello, I'm reading a text file and I'm trying to count the frequency of each letter. Assuming the whole document is uppercase, this is what I have so far. In the end, I want to know how many instances of each letter there are, so that I can divide by the total to get the percentage of each letter. I'm just stuck on keeping a running total of each letter. I know there are other ways to do it, but I am trying to use an array to keep track. Would I have to have a variable for each array element, or is there a simpler way?

#include "stdafx.h"
#include "fstream"
#include <iostream>
using namespace std;


int _tmain(int argc, _TCHAR* argv[])
{
	int total = 1;
	char array [26] = {'A','B','C','D','E','F','G','H','I',
		'J','K','L','M','N','O','P','Q','R','S','T','U','V','W',
		'X','Y','Z'};
	ifstream infile;
	infile.open("p4in.txt");
	while(!infile.eof())
	{
		char c = infile.get();
		total++;
	}
	return 0;
}

cheshirecat (88)

So is there a way to do this? Is there a way to use an array to track frequency? I've read dozens of ways to do it otherwise, and I know several, but if you have an array, can that be used, in any way, to track the amounts of letters in a document?

AbstractionAnon (6954)

If you're going to track each letter, then you need an array.
Here's the simplest way I know:

 
int cnt[26];
...
memset(cnt,0,26*sizeof(int));
...
cnt[c-'A']++;

cheshirecat (88)

Do I not already have an array? What are the ...? What is memset? What is c-'A'?

mukomo (27)

What cheshirecat meant was that you make a new array of the same size, made of integers, whose elements get incremented whenever the letter at the same index appears.

cheshirecat (88)

I'm cheshirecat - the original poster, and I don't know how to do that.

cheshirecat (88)

I get that int cnt[26] is an array. But I don't know what is happening with memset, cnt, or those 3 dots.

AbstractionAnon (6954)

memset initializes the cnt array to 0. It is declared in <cstring>.
http://www.cplusplus.com/reference/cstring/memset/

The three dots were simply an indication that other statements in your program go there. I assumed you could figure out where each line should go, i.e. int cnt[26]; goes after line 12 in your program (the end of the char array declaration), memset goes before line 15 (the while loop), and cnt[c-'A']++; goes after line 18 (total++;).

c-'A' is an expression that converts c to an array index. If c contains 'A', then c-'A' is zero. If c contains 'B', then c-'A' is 1, etc.

jlillie89 (403)

Hi cheshirecat,
I read this post quickly because I saw your title. I did something very similar in my class, where I had to go through a file and count the frequency of each letter.
Anyway, I don't know if this will help, but here is the function I had for my class. Maybe you can modify it?

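//Function to fill frequency array
//freqcount must have 'alphabet' (26) elements, all set to 0 before the first call
void fillFrequency(string str1, int freqcount[], const int alphabet)
{
	for(string::size_type i = 0; i < str1.size(); i++)
	{
		str1[i] = toupper(str1[i]);
		if(str1[i] >= 'A' && str1[i] < 'A' + alphabet)	//skip spaces, digits, punctuation
			++(freqcount[str1[i] - 'A']);
	}
}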

The str1[i] - 'A' works because of the ASCII table. If I remember right, it goes like this. The array has size 26, one element for each letter of the alphabet. The function first converts the letter to uppercase (str1[i] = toupper(str1[i]);), which your input already is. The values (look up an ASCII table) are then between 65 and 90, starting with A = 65 and ending with Z = 90. Take the letter 'A' as an example: its value is 65, and 65 - 65 is 0, so the count for A is stored in position 0. For 'B' it would be 66 - 65, so position 1 holds the count for B. This is what this part is doing:

++(freqcount[str1[i] - 'A']);
The ++ increments the frequency. So if an 'A' goes through, the VALUE at position 0 becomes 1. If another 'A' goes through, the value at position 0 becomes 2.

2 | | | |.....
That's what the array will contain, so this function really does fill the array and give you the count of each letter. This was hard for me to come up with, so my lab aide did a tutorial. I thought this way was cool, and it gets rid of all the bullsh*t. It's pretty simple once you figure it out. I hope my explanation was OK; I didn't read back over it to proof it :)
Hope it helps, dude :)

Raezzor (304)

The array of ints he posted is sometimes called a "bucket." Each array element corresponds to the elements of the char array. As you check the input versus the char array you increment the int array for the corresponding letter. This "counts" how many times each letter occurs in your input.

Instead of memset you could initialize the int array like this too:

int cnt[26] {0};     // C++11 and later
int cnt[26] = {0};
int cnt[26] {};      // C++11 and later
int cnt[26] = {};    // also valid in C++: an empty initializer list zero-initializes every element

cheshirecat (88)

Well, thanks jlillie. I guess I just don't understand how strings work whatsoever, so I'll have to take a look at that.

So, I don't need it in a separate function - we're assuming it's all caps so I don't know toUpper either.

So, I would then have this -

for(string::size_type i = 0; i < str1.size(); i++)
{
	++(freqcount[str1[i] - 'A']);
}

But why does it say 'A' there? I mean, what if it's a B or a C? Is that only going to track A? It also looks like we're using two arrays here. Is that correct?

cheshirecat (88)

Also, AbstractionAnon, if you don't use memset, it won't start at 0?

jlillie89 (403)

OK, dude.
So, there is only one array, and it is called freqcount[].
str1 is a string, not an array, and [] is just used as an index. The thing to remember is that [] is used for both strings and arrays. So when you see str1[i] and you know that str1 is a string, it behaves as I have tried to show in this example.

Compile this.

#include <cstdlib>   // system
#include <iostream>
#include <string>
using namespace std;

int main()
{
	string s = "boat";

	// This is just an example without using the loop. [] is just for the index.
	// You will see when you run it.
	cout << s[0] << endl;
	cout << s[1] << endl;
	cout << s[2] << endl;
	cout << s[3] << endl;
	cout << endl;

	// The same thing with a loop: i takes the values 0, 1, 2, 3.
	for(int i = 0; i < 4; i++)
	{
		cout << s[i] << endl;
	}

	system("pause");   // Windows only: keeps the console window open
	return 0;
}

Read my post above and go to the table to see the values for the letters. You will see that if str1[i] were 'B', which is 66, then ++(freqcount[str1[i] - 'A']); becomes ++(freqcount[66-65]);. Remember 'A' is 65. So it is really just saying ++(freqcount[1]);.
The ++ just increments freqcount at position 1 (or whatever the subtraction tells us). So if the subtraction gave the value 4, position 4 is incremented. This keeps track of the frequency of any uppercase letter. Remember the size of the array is 26, one element for each letter of the alphabet. Now you have an array filled with the frequency of each letter. Hopefully this makes more sense after I tried to explain [] for strings. Mess with the code and you will see. Good luck! Don't worry, I'm stressed about my class also. We will make it, dude!

cheshirecat (88)

Sorry jlillie, I appreciate the effort, but I don't understand most of it. I ask specific questions about certain parts so that I can understand the whole, but you are giving me everything in its entirety and not answering my questions specifically. It's like trying to explain what a paragraph means in Spanish when I'm asking you about specific words. It doesn't help me to know what the entire paragraph means. I need to know what the specific words mean so that I can create my own paragraphs. Or in simpler terms, I don't get this crap, so let's keep it simple.

What mukomo was saying was that I have two arrays -

so something like:

char keyboard[26] = {'A','B','C','D','E','F','G','H','I',
'J','K','L','M','N','O','P','Q','R','S','T','U','V','W',
'X','Y','Z'};
int frequency[26];

So, if I get a 'B', how is frequency going to know to add something to number 1 (because frequency starts at 0 for A)?

jlillie89 (403)

You have to read what I wrote. I'm telling you, I can't explain it any better. I'm sorry. Did you google the ASCII table? I will try to answer the last question you asked. It is really what I was trying to explain above.
The array that you are filling in my example is just an array of frequencies:
3 | 6 | 1 | 9 | ..... => So position 0 represents 'A', position 1 represents 'B', and so forth.

Look, I know I have explained this in much the same way already, but the specific thing you are looking for is here. I swear it is all here.
++(freqcount[str1[i] - 'A']); It will know to add something to position 1 because of the subtraction and the ++. 'A' is just the value 65. So if you got 'B', it would be 66-65, which is 1. So it goes to position 1, and then the ++ increments position 1. It does this for every letter you get. Please go to the ASCII table and look at the values of the uppercase letters. Remember that your array size is 26. Take any of the values for uppercase and walk through the code.
Take a pen and paper. Evaluate the inside of ++( ) first: pick any uppercase letter from the table, substitute its value for str1[i], then evaluate the code.

Please, please go to the ASCII table, substitute a value for str1[i], and evaluate it, like I suggested. You will see. I can't explain it any better. Really, my response answers the specific question you asked; it is just that we are not seeing i to i... (get it?).
It's OK though, it happens. Please just evaluate it. Use the ASCII table! This array holds only the counts; remember, there is no 'A' inside of it. Position 0 just represents 'A', and position 1 represents 'B'.
Evaluate it! Good luck.

Chervil (7320)

I've skimmed through this thread but not studied it in detail. Sorry if I missed something. My version:

#include <iostream>
#include <fstream>
#include <iomanip>
#include <cctype>      // isalpha, toupper

using namespace std;

int main()
{
    int total[26] = {0};

    ifstream infile("input.txt");
    if (!infile)
    {
        cout << "Error opening input file" << endl;
        return 0;
    }

    char c;
    while (infile.get(c))         // read characters one at a time
    {
        if (isalpha(c))           // check it is a-z or A-Z
        {
            c = toupper(c);       // make it always A-Z

                                  // char A-Z has ascii code  65 to 90
                                  // Subtract 'A' to get
            int index = c - 'A';  // index in range 0 to 25;

            total[index]++;       // increment corresponding total
        }
    }


    for (int i=0; i<26; i++)      // Print the results
    {
        cout << "  " << char(i+'A') << " occurs "
             << setw(5) << total[i] << " times" << endl;
    }

    return 0;
}

Input:

Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

Output:

  A occurs    29 times
  B occurs     3 times
  C occurs    16 times
  D occurs    19 times
  E occurs    38 times
  F occurs     3 times
  G occurs     3 times
  H occurs     1 times
  I occurs    43 times
  J occurs     0 times
  K occurs     0 times
  L occurs    22 times
  M occurs    17 times
  N occurs    24 times
  O occurs    29 times
  P occurs    11 times
  Q occurs     5 times
  R occurs    22 times
  S occurs    18 times
  T occurs    32 times
  U occurs    29 times
  V occurs     3 times
  W occurs     0 times
  X occurs     3 times
  Y occurs     0 times
  Z occurs     0 times

In my file, there are various nonalphabetic characters, such as spaces and punctuation, hence the use of isalpha(). Similarly, there are both lower and upper case letters, hence toupper() is used.

cheshirecat (88)

Well, you asked me to compile this, but it doesn't work.

#include <cstdlib>   // system
#include <iostream>
#include <string>
using namespace std;

int main()
{
	string s = "boat";

	// This is just an example without using the loop. [] is just for the index.
	// You will see when you run it.
	cout << s[0] << endl;
	cout << s[1] << endl;
	cout << s[2] << endl;
	cout << s[3] << endl;
	cout << endl;

	// The same thing with a loop: i takes the values 0, 1, 2, 3.
	for(int i = 0; i < 4; i++)
	{
		cout << s[i] << endl;
	}

	system("pause");   // Windows only: keeps the console window open
	return 0;
}

cheshirecat (88)

I also understand that 'A' has value of zero. 'B' has value of 1. I just don't get how you get that information from the A to the frequency array. No code actually works when I run it, so I can't mess around with it to see the actual steps that the program is taking behind the scenes. I would just copy some code from the net and mess around with it, but all the stuff I've found - 100% of it - for tracking frequency of information read from a file, doesn't use arrays. Everyone keeps going back to strings, but I never get a working piece of code that I can integrate into my program, just the idea. So, if I did understand strings, which I don't, I could use one in my program to get the arrays to sync up somehow. I am continuing to pursue an answer, don't get me wrong, but at this point it looks like I'm not going to find it in a forum.

Chervil (7320)

No. 'A' has the value 65, 'B' has the value 66 and so on.
http://www.asciitable.com/
That's why, in order to find the correct value for int index, I have to adjust it, effectively subtracting 65 from the code of each particular character.

"No code actually works when I run it"

My code is a working example. If it doesn't work for you, please let me know what goes wrong.

You say you don't understand strings. At its heart, a string is just an array of characters. What in particular don't you understand?

It's disappointing to read "I am continuing to pursue an answer, don't get me wrong, but at this point it looks like I'm not going to find it in a forum", after I went to the trouble of posting sample code, and spent time adding comments to explain the important parts.

jlillie89 (403)

Dude, I don't know what to tell you. I just copied and pasted the code that you said didn't work, as is, into my Visual Studio, and it works just fine. That example is supposed to help you see what [] is doing.

"I just don't get how you get that information from the A to the frequency array."

Take freqcount[12]. This is position 12, right? GO TO THE ASCII TABLE. A HAS A VALUE OF 65 (UPPERCASE). 66 (B) - 65 (A) is 1.
So for 'B' the code now says freqcount[1]. This is position 1, just like position 12 a few sentences ago. The ++ on the outside of ++(freqcount[1]) is literally saying INCREMENT POSITION 1 BY ONE COUNT.
If B came through once, POSITION 1 in the array freqcount will have a value of 1. If it came again, it will have a value of 2.

You need to look at the ASCII table. PLEASE LOOK AT ASCII.
Good luck, dude.
