Counting the occurrences of different letters within a file

Hi guys, I'm trying to make a program that can read a file, check each 'character', record the frequency of each character, then print the character that is most frequent. I'm also aiming to make this program apply to any character, not just the letters displayed in the text file.

The text file is as follows:

yy
aaa
b

So far I've managed to parse the file and store each sequence of characters into different elements of a vector named list.

Right now I'm trying to access each element of the vector, count the occurrences of each character and compare them. However, once I compile and run the program, I get the error message "string subscript out of range".

I believe the issue lies somewhere within my last 'for' loop, because if I change the loop to "(int i = 0; i < size - 1; i++)" the error message stops. However, as expected, the console then only prints "ya", completely ignoring b.

Can anyone explain the reason behind this? I'm still fairly new to programming so I apologise if this seems a bit silly.

Any tips on how to write this program would also be appreciated, thanks :)

	std::vector<std::string> list;
	std::ifstream ifile;
	ifile.open("test.txt");
	std::string charsequence;
	char spaces[3]; //array for position of spaces
	int position = 0;
	if (ifile.fail())
	{
		std::cout << "Error opening file" << std::endl;
		exit(1);
	}
	std::string line;
	while (getline(ifile, line))
	{
		for (int i = 0; i < line.length(); i++)
		{
			if (line[i] == ' ')
			{
				spaces[position] = i;
				position++;
			}
		}

		charsequence = line.substr(0, spaces[position]);
		list.push_back(line);
	}

	int size = list.size();

	for (int i = 0; i < size; i++)
	{
		std::string seq = list[i];
		char letter = seq[i];
		std::cout << letter;

	}
	
What is your program like? Can you provide us with expected output?

I would rather read the file char by char.
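#include <cstdlib>
#include <fstream>
#include <iostream>

int main()
{
  std::ifstream src("filename");

  if (!src)
  {
    std::cerr << "Error opening file\n";
    return 1; // main returns int, so a bare "return;" does not compile
  }
  char ch = 0;
  while (src.get(ch)) // get() reads every character, including whitespace
  {
    // use ch
  }
  std::system("pause"); // Windows-only; remove on other platforms
  return 0;
}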


To record the frequency of each character, I would use a std::map<char, size_t>
Yes, the expected output of my current code should be: yab.
Currently, I'm trying to loop through the elements of the vector named 'list', which consists of the following strings:
list[0] = yy,
list[1] = aaa,
list[2] = b.

Then I want to store the repeated char into the variable 'letter', then print it.
Is this what you wanted?

for (int i = 0; i < size; i++)
{
	std::string seq = list[i];
	char letter = seq[0];
	std::cout << letter;
}


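Line 33 (char letter = seq[i];): seq[0] accesses the first character, which is always safe as long as the string is not empty. seq[i] is unsafe because i is the index into the vector, not a position within the string: when i is 2, seq is "b", which has only one character, so seq[2] is out of range. That is also why stopping at size - 1 printed "ya".
A std::map would be a nice way to record a {char, count} table.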
http://www.cplusplus.com/reference/map/map/find/
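Going back to the subscript problem, here is a small sketch of why seq[i] fails on the third string while seq[0] does not. The helper names index_in_range and first_chars are invented for this illustration; std::string::at is used because it throws std::out_of_range on a bad index, whereas operator[] does no checking outside debug builds.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: true if 'index' is a valid position inside 'seq'.
bool index_in_range(const std::string& seq, std::size_t index)
{
    try
    {
        (void)seq.at(index);   // at() throws std::out_of_range on a bad index
        return true;
    }
    catch (const std::out_of_range&)
    {
        return false;
    }
}

// Hypothetical helper: collect the first character of every non-empty string.
std::string first_chars(const std::vector<std::string>& list)
{
    std::string result;
    for (const std::string& seq : list)
    {
        if (!seq.empty())      // guard so that seq[0] refers to a real character
            result += seq[0];
    }
    return result;
}
```

With the vector {"yy", "aaa", "b"}, index_in_range(list[2], 2) is false because "b" only has position 0, while first_chars(list) gives "yab".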
#include <iostream>
#include <limits>
#include <fstream>
#include <cctype>
#include <algorithm>

int main()
{
    constexpr std::size_t N = std::numeric_limits< unsigned char >::max() + 1 ;
    int counts[N] {} ; // letter frequency counts; initialise to all zeroes

    //@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

    std::ifstream file( __FILE__ /*"myfile.txt"*/ ) ; // this file
    file >> std::noskipws ; // read all characters, including white-space

    //@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

    char c ;
    while( file >> c ) { const unsigned char u = c ; ++counts[u] ; }

    //@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

    const auto most_frequent = std::max_element( counts, counts+N ) - counts ;
    std::cout << "(one of the) most frequently occurring byte(s) - \nvalue: " << int(most_frequent)
              << "  frequency: " << counts[most_frequent] ;
    if( std::isprint(most_frequent) ) std::cout << "  printable char: '" << char(most_frequent) << "'\n" ;
    else std::cout << "  it is not a printable character\n" ;

    //@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
}

http://coliru.stacked-crooked.com/a/5e188b823589ba56
#include <cstddef>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <vector>

int main()
{
    std::vector<int> list(256); // one counter per possible byte value (0..255)
    
    std::ifstream ifile;
    char ch;
    int highest_frequency = 0;
    std::size_t index_of_most_frequent = 0;
    
    ifile.open("????.txt");
    
    if (ifile.fail())
    {
        std::cout << "Error opening file" << std::endl;
        std::exit(1);
    }
    
    // operator>> skips whitespace; use ifile.get(ch) to count spaces and newlines too
    while (ifile >> ch)
    {
        // go through unsigned char so a negative char never becomes a negative index
        list[static_cast<unsigned char>(ch)]++;
    }
    
    // LIST OUT RESULTS OF COUNTING
    for(std::size_t i = 0; i < list.size(); i++)
    {
        std::cout << i << ' ' << static_cast<char>(i) << ' ' << list[i] << '\n';
        if(list[i] > highest_frequency)
        {
            highest_frequency = list[i];
            index_of_most_frequent = i;
        }
    }
    
    std::cout
    << "Most frequent character " << static_cast<char>(index_of_most_frequent)
    << " frequency = " << highest_frequency << '\n';
    
    return 0;
}
Do you know what a bucket sort is?

int counter[256] = {0}; // one counter per byte value (ASCII). Unicode is much more troublesome.

char c;
while (file.get(c)) // file: an already opened std::ifstream; iterate over every char in the file
{
    counter[static_cast<unsigned char>(c)]++;
}

Done. Want to know how many times the letter X appeared?
std::cout << counter[static_cast<unsigned char>('X')];


Edit: looks like someone beat me to it.
http://stackoverflow.com/questions/9317248/writing-bucket-sort-in-c

Google "c++ bucket sort" if that one is no use.
Thanks for the advice guys :) Particularly Mantorr22, that fixed the issue I was encountering.