Baby Names Project

For a school project I have to read a file and print the most popular baby names for boys and girls, along with how often each name was used that year. The number of names to print comes from the command line, and the printing has to be done in a function.

7 Most Popular Boys and Girls Names
Girls Frequency Boys Frequency
Emma 20799 Noah 19144
Olivia 19674 Liam 18342
(this is an example of the output that I need to get)

I have gotten this far but don't know what I need to do to make it work.

#include <iostream>
#include <fstream>
#include <string>
#include <stdlib.h>

using namespace std;

const int totalNames=10000;

void printTopNames (int&,int& , string&, string& ,const int ,const int, int);

int main(int argc,char *argv[]){
    string name;
    string gender;
    int freq;
    ifstream fin;
    int girlArrayF[totalNames],boyArrayF[totalNames];
    string girlArrayN[totalNames],boyArrayN[totalNames];
    
    if (argc < 3){
        cout << "Wrong number of arguments" << endl << "Correct usage: filename #namesToPrint" << endl;
        return -1;
    }
    
    int numNames = atoi(argv[2]);
    
    fin.open(argv[1]);
    if (fin.fail()){
        cout << argv[1] << " not opened";
        return 1;
    }
    for (int i=0;i<numNames;i++) {
        while (fin >> name >> gender >> freq){
            if (gender == "F") {
                while (fin>>name){
                    for (int i=0;i<numNames;i++){
                        girlArrayN[i]=name;
                    }
                    break;
                }
                while (fin>>freq){
                    for (int i=0;i<numNames;i++){
                        girlArrayF[i]=freq;
                    }
                    break;
                }
            }
            else {
                while (fin>>name){
                    for (int i=0;i<numNames;i++){
                        boyArrayN[i]=name;
                    }
                    break;
                }
                while (fin>>freq){
                    for (int i=0;i<numNames;i++){
                        boyArrayF[i]=freq;
                    }
                    break;
                }
            }
            break;
        }
    }
    cout << argv[2] << " Most Popular Boys and Girls Names" << endl << endl;
    std::cout.width(11); std::cout << std::left << "Girls";
    std::cout.width(5); std::cout << std::right << "Frequency";
    cout << "    ";
    std::cout.width(11); std::cout << std::left << "Boys";
    std::cout.width(4); std::cout << std::right << "Frequency";

        printTopNames(girlArrayF[numNames] , boyArrayF[numNames] , girlArrayN[numNames] , boyArrayN[numNames] , totalNames , totalNames , numNames);  
    
    fin.close();
    return 0;
}

void printTopNames (int&,int&, string&, string&,const int ,const int , int){
    
    ifstream fin;
    string name,gender;
    string girlArrayN[totalNames],boyArrayN[totalNames];
    int freq,numNames;
    int girlArrayF[totalNames],boyArrayF[totalNames];
    
    while (fin>>name>>gender>>freq){
                
                for (int i=0;i<numNames;i++){
                    cout << girlArrayN[i] << girlArrayF[i] << boyArrayN[i] << boyArrayF[i] << endl;
                }
        } 
}
What is the format of the input file? I suspect that the loops at lines 35, 41, 49 & 55 (the inner while (fin>>name) and while (fin>>freq) loops) should not be loops at all.

If you're given the frequencies in the input then you should create a struct that contains the frequency and the name. Add a > operator that compares by frequency:
struct NameAndFreq {
    string name;
    unsigned frequency;
    bool operator >(const NameAndFreq &right) const;   // sort by frequency
};


Create two vectors, one for the boys and one for the girls:
vector<NameAndFreq> boys, girls;
Read the input file and put the data into these two vectors.

Then sort the vectors using the > operator. This will put the most frequent names first.

Then to print the N most frequent names, you just print the first N records in each vector.
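
The struct-and-vector approach described above could be sketched roughly as follows. This is only an illustration: topN is a hypothetical helper name that does not come from the assignment, and the operator body is one possible implementation of the declaration shown earlier.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

struct NameAndFreq {
    std::string name;
    unsigned frequency;
    // Compare by frequency only, so that sorting with std::greater
    // puts the most frequent names first.
    bool operator>(const NameAndFreq& right) const {
        return frequency > right.frequency;
    }
};

// Hypothetical helper: returns the n most frequent records, most frequent first.
// The vector is taken by value so the caller's data is left untouched.
std::vector<NameAndFreq> topN(std::vector<NameAndFreq> records, std::size_t n) {
    std::sort(records.begin(), records.end(), std::greater<NameAndFreq>());
    // Sanity check: after a descending sort the first record is at least
    // as frequent as the last one.
    assert(records.empty() ||
           records.front().frequency >= records.back().frequency);
    if (n < records.size()) {
        records.resize(n);  // keep only the first n records
    }
    return records;
}
```

Calling topN(girls, 7) and topN(boys, 7) would then give the two lists to print side by side.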

I suspect that you won't know some of what I've mentioned (vectors? Operators? Sorting?). If so, just ask questions.
The file is a text file listing each record as <name> <gender (M or F)> <frequency>, for example: Emma F 20799.

The problem though with what you said is that I am only allowed to use what we have been taught so far, so this has to be done using the void function that I have (I still need to work on it) and arrays.
Have you learned how to sort an array?
Have you learned about structures?
Are you required to use that exact definition of printTopNames() or can define what the parameters are?
If you're required to use the parameters given then what does each parameter mean?
Yes, we have learned both how to sort an array and structures.

As for the parameters, I have to use all seven of them. This is exactly what it says to have: "the four arrays (pass the girl and boy name arrays and their corresponding frequency arrays by-reference) as well as the number girls and number of boys along with the number of top names to print"

That's all I have about the variables

Is this what you mean by what each parameter means?
boyArrayN and girlArrayN are for the names
boyArrayF and girlArrayF are for the frequency
totalNames is the max the array can go to (I didn't understand what the instructions ask for)
numNames is the input from the command line for the number of names

And sorry for all this. We got this over spring break and can't get any help for it

Thanks. Based on the description, the function should be something like:
printTopNames(string boyNames[], unsigned boyFreq[], unsigned numBoys,
              string girlNames[], unsigned girlFreq[], unsigned numGirls,
              unsigned numNamesToPrint);

Your code should sort the arrays based on frequency and then print the first numNamesToPrint.

To sort, write a separate function that sorts one pair of arrays:
sortNamesAndFreq(string names[], unsigned freq[], unsigned size);

Then inside printTopNames you can sort boys and girls with:
sortNamesAndFreq(boyNames, boyFreq, numBoys);
sortNamesAndFreq(girlNames, girlFreq, numGirls);
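
One possible body for sortNamesAndFreq, using the signature suggested above, is a simple insertion sort that keeps the two parallel arrays in step. This is a sketch under the assumption that a hand-written sort is what the course expects; any other sort that moves names[i] and freq[i] together would do.

```cpp
#include <cassert>
#include <string>

// Sorts the parallel arrays in descending order of frequency.
// names[i] and freq[i] always move together, so each name keeps its count.
void sortNamesAndFreq(std::string names[], unsigned freq[], unsigned size) {
    assert(size == 0 || (names != 0 && freq != 0));
    for (unsigned i = 1; i < size; ++i) {
        // Take element i out and shift smaller-frequency entries right
        // until its place in the already sorted prefix is found.
        std::string currentName = names[i];
        unsigned currentFreq = freq[i];
        unsigned j = i;
        while (j > 0 && freq[j - 1] < currentFreq) {
            names[j] = names[j - 1];
            freq[j] = freq[j - 1];
            --j;
        }
        names[j] = currentName;
        freq[j] = currentFreq;
    }
}
```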
This is probably an exercise to make you understand the concept of a partial sort. The idea is that if we have 1932 items, and we want to print the 10 items with the highest frequency, we do not need to sort the entire array.

The standard library has an efficient implementation of partial_sort
http://en.cppreference.com/w/cpp/algorithm/partial_sort
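
If the standard library is allowed, one way to use std::partial_sort with parallel arrays is to sort an array of positions rather than the data itself. The sketch below is written for pre-C++11 compilers (hence the functor instead of a lambda); ByFreqDesc and topPositions are hypothetical names chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Orders array positions by descending frequency.
struct ByFreqDesc {
    const unsigned* freq;
    explicit ByFreqDesc(const unsigned* f) : freq(f) {}
    bool operator()(std::size_t a, std::size_t b) const {
        return freq[a] > freq[b];
    }
};

// Returns the positions of the `count` highest frequencies, highest first.
// The freq array itself is not modified.
std::vector<std::size_t> topPositions(const unsigned freq[], std::size_t size,
                                      std::size_t count) {
    assert(count <= size);
    std::vector<std::size_t> order(size);
    for (std::size_t i = 0; i < size; ++i) {
        order[i] = i;
    }
    // Only the first `count` positions end up sorted; the rest are left
    // in an unspecified order, which is what makes this cheaper than a full sort.
    std::partial_sort(order.begin(), order.begin() + count, order.end(),
                      ByFreqDesc(freq));
    order.resize(count);
    return order;
}
```

The returned positions can then index both the name array and the frequency array when printing.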

If you are not allowed to use it, implement one of your own. For instance, a partial selection sort would look something like this (caveat: untested).
Note: validation is elided for brevity eg. there is no check that numNamesToPrint <= numBoys etc.

// return the position of the highest frequency item in the array
std::size_t pos_highest_freq( unsigned int freq[], std::size_t num )
{
    std::size_t pos = 0 ;
    // compare against the frequency at pos, not against the index pos itself
    for( std::size_t i = 1 ; i < num ; ++i ) if( freq[i] > freq[pos] ) pos = i ;
    return pos ;
}

void partial_sel_sort( std::string names[], unsigned int freq[], std::size_t num_total, std::size_t num_to_sort )
{
    if( num_to_sort > 0 )
    {
        // bring the highest frequency item to the front
        const auto pos_highest = pos_highest_freq( freq, num_total ) ;
        using std::swap ;
        swap( names[0], names[pos_highest] ) ;
        swap( freq[0], freq[pos_highest] ) ;

        // partial sort the rest of the num_total-1 items in the arrays
        // to get the remaining num_to_sort-1 highest elements
        partial_sel_sort( names+1, freq+1, num_total-1, num_to_sort-1 ) ;
    }
}

void printTopNames( std::string boyNames[], unsigned int boyFreq[], std::size_t numBoys,
                    std::string girlNames[], unsigned int girlFreq[], std::size_t numGirls,
                    std::size_t numNamesToPrint )
{

    partial_sel_sort( boyNames, boyFreq, numBoys, numNamesToPrint ) ;
    partial_sel_sort( girlNames, girlFreq, numGirls, numNamesToPrint ) ;

    // print the first numNamesToPrint items from the arrays
}
Thank you both for the help.

The only other question I have is that when I try using 'const auto' I get an error that says "auto changes meaning in C++11; please remove it", and I'm not sure what I need to change.

The other question is if there is a better way to set the values for my arrays for the names and the frequencies. (not that they even work at the moment anyway)
> I get an error that says "auto changes meaning in C++11; please remove it"

The use of auto on line 14 (the declaration of pos_highest) is a C++11 feature.

Change it to:
// const auto pos_highest = pos_highest_freq( freq, num_total ) ;
const std::size_t pos_highest = pos_highest_freq( freq, num_total ) ;

and it should compile in legacy C++.


> The other question is if there is a better way to set the values for my arrays for the names and the frequencies.

Not sure what you mean by this.
Aren't you able to read in the names and frequencies from the input file?
Alright, thanks.

This should hopefully be the final problem that I have but whenever I try to call this function
//prototype
void printTopNames(std::string&,unsigned int&,std::size_t,std::string&,unsigned int&,std::size_t,std::size_t);

//definition
void printTopNames(std::string& boyArrayN[],unsigned int& boyArrayF[],std::size_t numBoys,std::string& girlArrayN[], unsigned int& girlArrayF[],
std::size_t numGirls,std::size_t numNames){

I get an error of "declaration of 'boyArrayN' as array of references" (the four arrays are supposed to be passed by reference).
This is why it's a good idea to have a function prototype that is identical to the function implementation.

In this case your prototype says you will be passing a single string, by reference, into the function for parameter 1 (the others have similar problems). The definition, on the other hand, declares an array of references (std::string& boyArrayN[]), which C++ does not allow, and in any case the two must match. An array parameter is already passed as a pointer to its first element, so the function can modify the caller's elements without any &; that is presumably what the assignment means by passing the arrays by reference. So your function prototype and definition should look more like:

void printTopNames(std::string boyNames[], unsigned int boyNameFrequencies[], std::size_t numBoys,
                   std::string girlNames[], unsigned int girlNameFrequencies[], std::size_t numGirls,
                   std::size_t numNames)


Don't forget to "wrap" those long lines and use meaningful variable names.
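
On the by-reference point, a small sketch may help show that a plain array parameter already lets the function change the caller's data. doubleAll and resetAll are hypothetical names used only for this illustration; the second function shows what a genuine reference to an array looks like, which needs the size fixed in the type and is rarely what an assignment like this wants.

```cpp
#include <cassert>
#include <cstddef>

// values[] is really a pointer to the caller's first element,
// so writes through it are visible to the caller.
void doubleAll(unsigned int values[], std::size_t count) {
    assert(count == 0 || values != 0);
    for (std::size_t i = 0; i < count; ++i) {
        values[i] *= 2;
    }
}

// A true reference to an array: note the parentheses and the fixed size.
// Writing "unsigned int& values[3]" instead would declare an array of
// references, which is the error reported by the compiler.
void resetAll(unsigned int (&values)[3]) {
    for (std::size_t i = 0; i < 3; ++i) {
        values[i] = 0;
    }
}
```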


> The other question is if there is a better way to set the values for my arrays for the names and the frequencies. (not that they even work at the moment anyway)

Try modifying your program so that it simply reads the input file into the 4 arrays and then prints the arrays.
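
A minimal sketch of that reading step, assuming the "<name> <gender> <frequency>" format described earlier in the thread. readNames and its capacity parameter are hypothetical names; taking a std::istream& means the same function works with the std::ifstream opened in main() or with a std::istringstream for a quick test.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>  // std::istringstream, handy for trying this without a file
#include <string>

// Reads records of the form "<name> <gender> <frequency>" into four
// parallel arrays and reports how many boys and girls were stored.
// Records beyond `capacity` for either gender are skipped.
void readNames(std::istream& in,
               std::string boyNames[], unsigned int boyFreq[], std::size_t& numBoys,
               std::string girlNames[], unsigned int girlFreq[], std::size_t& numGirls,
               std::size_t capacity) {
    assert(boyNames != 0 && boyFreq != 0 && girlNames != 0 && girlFreq != 0);
    numBoys = 0;
    numGirls = 0;
    std::string name, gender;
    unsigned int freq;
    // One read per record: a single loop, with no nested reads.
    while (in >> name >> gender >> freq) {
        if (gender == "M") {
            if (numBoys < capacity) {
                boyNames[numBoys] = name;
                boyFreq[numBoys] = freq;
                ++numBoys;
            }
        } else if (gender == "F") {
            if (numGirls < capacity) {
                girlNames[numGirls] = name;
                girlFreq[numGirls] = freq;
                ++numGirls;
            }
        }
    }
}
```

Once this prints back exactly what is in the file, the sorting and formatting can be added on top of it.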
Okay, it has been a couple days but I think that I am close to what I need, I just don't know what I need to do to fix it.

With this code, I get the names and frequencies of the girls correct but in the reverse order than I need. Though with the boys, it skips over all of the boys' names that I need.

(Needed output)

7 Most Popular Boys and Girls Names

Girls Frequency Boys Frequency
Emma 20799 Noah 19144
Olivia 19674 Liam 18342
Sophia 18490 Mason 17092
Isabella 16950 Jacob 16712
Ava 15586 William 16687
Mia 13442 Ethan 15619
Emily 12562 Michael 15323

#include <iostream>
#include <fstream>
#include <string>
#include <stdlib.h>

using namespace std;

const int totalNames=10000;

void sortNamesB(std::string,unsigned int,std::size_t);
void sortNamesG(std::string,unsigned int,std::size_t);
void printTopNames(std::string boyName[], unsigned int boyFreq[], std::size_t numBoys,
                   std::string girlName[], unsigned int girlFreq[], std::size_t numGirls, 
                   std::size_t numNames);
std::size_t highestFreqB(unsigned int,std::size_t);
std::size_t highestFreqG(unsigned int,std::size_t);

int main(int argc,char *argv[]){
    std::string name,gender;
    std::size_t freq;
    ifstream fin;
    unsigned int girlFreq[totalNames],boyFreq[totalNames];
    std::string girlName[totalNames],boyName[totalNames];
    std::size_t numBoys = 0,numGirls = 0;
    
    if (argc < 3){
        cout << "Wrong number of arguments" << endl << "Correct usage: filename #namesToPrint" << endl;
        return -1;
    }
    
    std::size_t numNames = atoi(argv[2]);
    numBoys = numNames;
    numGirls = numNames;
    
    fin.open(argv[1]);
    if (fin.fail()){
        cout << argv[1] << " not opened";
        return 1;
    }
    do{
        while(numBoys < totalNames && numGirls < totalNames && fin >> name >> gender >> freq){
            if(gender == "M"){
                boyName[numBoys] = name;
                boyFreq[numBoys] = freq;
                numBoys++;
                }
            else{
                girlName[numGirls] = name;
                girlFreq[numGirls] = freq;
                numGirls++;
                }
        }    
    } while(numGirls<numNames && numBoys<numNames);
    
    cout << argv[2] << " Most Popular Boys and Girls Names" << endl << endl;
    std::cout.width(11); std::cout << std::left << "Girls";
    std::cout.width(5); std::cout << std::right << "Frequency";
    cout << "    ";
    std::cout.width(11); std::cout << std::left << "Boys";
    std::cout.width(4); std::cout << std::right << "Frequency" << endl;

    printTopNames(boyName,boyFreq,numBoys,girlName,girlFreq,numGirls,numNames);

    fin.close();
    return 0;
}

std::size_t highestFreqB (unsigned int boyFreq[],std::size_t numBoys){
    std::size_t pos = 0;
    for(std::size_t i=0;i<numBoys;i++){
        if(boyFreq[i]>pos){
            pos=i;
        }
    }
    return pos;
}

std::size_t highestFreqG (unsigned int girlFreq[],std::size_t numGirls){
    std::size_t pos = 0;
    for(std::size_t i=0;i<numGirls;i++){
        if(girlFreq[i]>pos){
            pos=i;
        }
    }
    return pos;
}

void sortNamesB(std::string boyName[], unsigned int boyFreq[], std::size_t numBoys){
    if (numBoys > 0){
        std::size_t posHighestB = highestFreqB(boyFreq,numBoys);
        using std::swap;  
        swap(boyName[0],boyName[posHighestB]);
        swap(boyFreq[0],boyFreq[posHighestB]);

        sortNamesB(boyName+1,boyFreq+1,numBoys-1);
    }
}

void sortNamesG(std::string girlName[], unsigned int girlFreq[], std::size_t numGirls){
    if (numGirls > 0){
        std::size_t posHighestG = highestFreqG(girlFreq,numGirls);
        using std::swap;
        swap(girlName[0],girlName[posHighestG]);
        swap(girlFreq[0],girlFreq[posHighestG]);
    
        sortNamesG(girlName+1,girlFreq+1,numGirls-1);
    }
}
void printTopNames(std::string boyName[totalNames], unsigned int boyFreq[totalNames], std::size_t numBoys,
                   std::string girlName[totalNames], unsigned int girlFreq[totalNames], std::size_t numGirls, 
                   std::size_t numNames){
    sortNamesB(boyName,boyFreq,numBoys);
    sortNamesG(girlName,girlFreq,numGirls);
    for (std::size_t i=0;i<numNames;i++){
        std::cout.width(11); std::cout << std::left << girlName[i];
        std::cout.width(5); std::cout << std::right << girlFreq[i];
        std::cout << "    ";
        std::cout.width(11); std::cout << std::left << boyName[i];
        std::cout.width(4); std::cout << std::right << boyFreq[i] << endl;
    }
}


This is the output that I get


7 Most Popular Boys and Girls Names 
Girls Frequency Boys Frequency 
Emily 12562      Joseph 11995 
Mia 13442         Lucas 12078 
Ava 15586         David 12078 
Isabella 16950  Jackson 12121
Sophia 18490    Matthew 12809 
Olivia 19674      Jayden 12878 
Emma 20799     Aiden 13296
if(boyFreq[i]>pos){
should be
if(boyFreq[i]>boyFreq[pos]){


if(girlFreq[i]>pos){
should be
if(girlFreq[i]>girlFreq[pos]){



Just for comment:
- Functions sortNamesB() and sortNamesG() do exactly the same thing (with different arguments): you don't need both;
- Similarly functions highestFreqG() and highestFreqB().

- Strictly, YOU DON'T NEED TO SORT ANYTHING - you just need to find the maximum numNames times (NOT numNames + number of boys etc. as you are doing at present). (EDIT - just noticed this is the partial selection sort as done by JLBorges)

- Considering lines 31-33, you appear to start reading into the arrays at position numNames. Now, if you were intending just to find the maximum numNames times and swap into those positions that would be OK (except that you are currently finding the maximum many more times than that, doing a complete sort). The problem is that the first numNames elements of boyFreq[] have never actually been set, and your compiler (and mine) is presumably letting you get away with it by initialising them to 0, whereas I thought (but could be wrong) that the contents of such uninitialised local arrays are undefined.
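
The "find the maximum numNames times" idea from the comments above can also be written as a plain loop instead of recursion. This is a sketch only; moveTopToFront is a hypothetical name, and it assumes the arrays were filled starting at index 0.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Moves the `count` highest-frequency entries to the front of the
// parallel arrays, highest first. Entries after the first `count`
// are left in no particular order.
void moveTopToFront(std::string names[], unsigned int freq[],
                    std::size_t size, std::size_t count) {
    assert(count <= size);
    for (std::size_t i = 0; i < count; ++i) {
        // Find the largest frequency in the not-yet-placed part [i, size).
        std::size_t best = i;
        for (std::size_t j = i + 1; j < size; ++j) {
            if (freq[j] > freq[best]) {
                best = j;
            }
        }
        // Swap both arrays together so names stay paired with counts.
        std::swap(names[i], names[best]);
        std::swap(freq[i], freq[best]);
    }
}
```

With 7 names to print out of a few thousand, this makes 7 passes over the data rather than one pass per element.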
Remove lines 32 & 33 (numBoys = numNames; and numGirls = numNames;). The number of boys and girls at that point is zero, which is what you initialize those variables to at line 24.

Remove lines 40 and 53 (the do { and its closing } while(...);). You just need the inner loop at line 41.

You don't need sortNamesB() and sortNamesG(). Nor do you need highestFreqB and highestFreqG. You only need one sortNames function and one highestFreq function. Review how function parameters work.
std::size_t highestFreq(unsigned int freq[],std::size_t num){
    std::size_t pos = 0;
    for(std::size_t i=0;i<num;i++){
        if(freq[i]>freq[pos]){    // using lastchance's fix
            pos=i;
        }
    }
    return pos;
}

void sortNames(std::string names[], unsigned int freq[], std::size_t num){
    if (num > 0){
        std::size_t posHighestB = highestFreq(freq,num);
        using std::swap;  
        swap(names[0],names[posHighestB]);
        swap(freq[0],freq[posHighestB]);

        sortNames(names+1,freq+1,num-1);
    }
}
