Program that calculates the frequencies of characters read from the standard input

After almost 2 years without touching any C or C++, now I need to do it again, finally :) even though I've forgotten a lot things.

I've created a program that calculates the frequencies of characters read from the standard input. This program should at the end output (from the most frequent to the least one) all characters followed by their respective frequencies.

My idea was to create an array first where the frequencies of the characters will be put.

After that I calculate how many different characters have been read (n). I create a struct (Char) which will contain both the character and its corresponding frequency, and then I allocate a buffer for n Char. Then I use qsort to sort the buffer of Char. And finally I print the characters followed by their frequencies.


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
#include<stdio.h> // getchar, EOF
#include<stdlib.h> // qsort, calloc
#include<ctype.h> // isprint


typedef struct{
    int freq;
    char val;
} Char; // alias


int comp_char(const void* a, const void* b){
    return (((Char*)b)->freq) - (((Char*)a)->freq);
}

void read_stdin(int* f){
    int c;
    while((c = getchar()) != EOF) ++f[c];
}

int main(){
    
    int f[256] = {0};
    
    read_stdin(f);
    
    // Calculating the number of characters read from the standard input.
    int n = 0;
    for(short i=0; i<255; ++i)
        if(f[i] != 0) ++n;
    
    // Allocating a buffer of size n for Char objects.
    Char* chars = calloc(n, sizeof(Char));
    
    // Setting the val and freq fields of Chars
    for(short i=0, j=0; i<255; ++i){
        if(f[i] != 0) {
            chars[j].freq = f[i];
            chars[j++].val = (char)i;
        }
    }
    
    // Sort Char elements by their freq field.
    qsort(chars, n, sizeof(Char), comp_char);
    
    // Print Char's val followed by the respective Char's freq.
    for(short i=0; i<n; ++i){
        if(isprint(chars[i].val))
            printf("'%c' %i\n", chars[i].val, chars[i].freq);
        else
            printf("%3o %i\n", chars[i].val, chars[i].freq);
    }
    
    // Freeing memory allocated for chars.
    free(chars);

    return 0;
}


My first question is:

1. Is the program correct?

2. Are there parts where the risk of undefined behaviour is greater than 0?

3. Is it correct to cast i to a char using (char)i? In this case, I'm assuming that the characters in the file will only be ASCII ones. But what if they weren't? How can I change my program to deal with this cases?

4. In general, can I improve my program? If yes, where?

Note:

1. This is actually a C program

2. I would like that both the file can be passed to the program as ./name_of_this_file < file_to_count.txt or simply ./name_of_this_file and typing words, and after pressing control-d to reach EOF.

3. The program should be as efficient as possible.

Thanks a lot for all the help!
Last edited on
1) It appears reasonable.

2) Nothing jumps out at me as undefined behavior.

3) Yes. You're dealing with all possible values from 0-255, so no worries. Your isprint() at line 48 takes care of non-printable values.

4) I would tend to eliminate array f and allocate 256 occurrences of Char up front. Zero entries can be ignored at print time.



3) Yes. You're dealing with all possible values from 0-255, so no worries. Your isprint() at line 48 takes care of non-printable values.

Actually, I've just realized that in the loops I go up to 254, but I should anyhow go up to 255, i.e. I should for example change i<255 to i<256.


4) I would tend to eliminate array f and allocate 256 occurrences of Char up front. Zero entries can be ignored at print time.


But then I would need to initialise freq to 0 or -1 before using the struct (to then sort the buffer), right? Do you still think it's better to do as you're saying?
But then I would need to initialise freq to 0 or -1 before using the struct

True. It's probably a toss-up. You do get rid of the loops at 27-30 and 35-41 though.

If I can use one array instead of two, that tends to be my preference.
Ok, thanks. At the end I decide to go with your way ;)
Topic archived. No new replies allowed.