Finding out how many times a string shows up in a file.

I am trying to write a function that finds the string that appears most often in a file. It's an exercise from Tony Gaddis' Starting Out with C++. There is a file you can download that lists the baseball teams that won the World Series each year. It doesn't list the years, just the team names on separate lines. Here is the file:

Boston Americans
New York Giants
Chicago White Sox
Chicago Cubs
Chicago Cubs
Pittsburgh Pirates
Philadelphia Athletics
Philadelphia Athletics
Boston Red Sox
Philadelphia Athletics
Boston Braves
Boston Red Sox
Boston Red Sox
Chicago White Sox
Boston Red Sox
Cincinnati Reds
Cleveland Indians
New York Giants
New York Giants
New York Yankees
Washington Senators
Pittsburgh Pirates
St. Louis Cardinals
New York Yankees
New York Yankees
Philadelphia Athletics
Philadelphia Athletics
St. Louis Cardinals
New York Yankees
New York Giants
St. Louis Cardinals
Detroit Tigers
New York Yankees
New York Yankees
New York Yankees
New York Yankees
Cincinnati Reds
New York Yankees
St. Louis Cardinals
New York Yankees
St. Louis Cardinals
Detroit Tigers
St. Louis Cardinals
New York Yankees
Cleveland Indians
New York Yankees
New York Yankees
New York Yankees
New York Yankees
New York Yankees
New York Giants
Brooklyn Dodgers
New York Yankees
Milwaukee Braves
New York Yankees
Los Angeles Dodgers
Pittsburgh Pirates
New York Yankees
New York Yankees
Los Angeles Dodgers
St. Louis Cardinals
Los Angeles Dodgers
Baltimore Orioles
St. Louis Cardinals
Detroit Tigers
New York Mets
Baltimore Orioles
Pittsburgh Pirates
Oakland Athletics
Oakland Athletics
Oakland Athletics
Cincinnati Reds
Cincinnati Reds
New York Yankees
New York Yankees
Pittsburgh Pirates
Philadelphia Phillies
Los Angeles Dodgers
St. Louis Cardinals
Baltimore Orioles
Detroit Tigers
Kansas City Royals
New York Mets
Minnesota Twins
Los Angeles Dodgers
Oakland Athletics
Cincinnati Reds
Minnesota Twins
Toronto Blue Jays
Toronto Blue Jays
Atlanta Braves
New York Yankees
Florida Marlins
New York Yankees
New York Yankees
New York Yankees
Arizona Diamondbacks
Anaheim Angels
Florida Marlins
Boston Red Sox
Chicago White Sox
St. Louis Cardinals
Boston Red Sox
Philadelphia Phillies
New York Yankees
San Francisco Giants
St. Louis Cardinals
San Francisco Giants


I have to sort through these somehow to get the string that appears the most. I'm assuming you make two vectors, and store the file in one, and have the second vector store the highest appearing string. I don't know how to go about doing that, though.

I would really appreciate the help!
If you don't have any constraints on what you can use, the easiest way is to use a map<string, unsigned int>.
I actually haven't used maps before. The course I'm in hasn't gotten there yet.
This is what I wrote so far. I wrote it while I was thinking about what I wanted, so the comments basically state what I thought at the moment of writing. I don't know if it's right, because I can't get it to run. The error says the function call has 'no match for call to' and then states something about vectors and strings. What I want to know is if the body of the code is somewhat correct.

Sorry, I'm writing this in kind of a hurry. If I'm not making sense, I can elaborate when I have time to come back and look at my code more.

void mostWins(vector<string> vec1, vector<string> vec2, string name){

    //This line opens whichever file you choose to open.
    ifstream inFile(name.c_str());
    string names;
    int counter = 0;

    //If the file is open, proceed with the below
    if(inFile.is_open()){
        while(getline(inFile, names)){
            vec1.push_back(names);
            }
    }
    else{
        cout << "\nSomething went wrong!" << endl;
    }
    inFile.close();

    //This for loop is getting the amount of times
    //the team name was found in the file.
    //The team name was entered by a user.
    for(size_t i = 0; i < vec1.size(); i++){

        //This line is for the input in the main function.
        if(names.compare(vec1[i]) == 0){
            counter++;
        }
    }

    //This line is outputting the amount of times a certain team won
    cout << "\nThe " << names << " have won " << counter << " times!\n";

    //This for loop is going to move all the data
    //from the first vector into the second vector.
    for(size_t i = 0; i < vec1.size(); i++){
        move(vec1.begin(), it, std::back_inserter(vec2)); ;
    }
}


Here are the steps the function takes, as I understand them:
Step 1: Open a file
Step 2: If file is open, write the contents of it into a vector
Step 3: Close the file
Step 4: Look through the first vector
Step 5: Check how many times a string shows up
Step 6: Output the number of wins and team name
Step 7: Place the data from the first vector into the second.

Step 7 actually doesn't make sense now that I wrote it out. It's just copying all the data that was in the first vector, without transferring the data I wanted. Sorry, I'm still new and this is confusing me right now.

Thanks, in advance!
Why don't you create a struct like this:
struct WordInfo
{
   string word;
   size_t count;
};
and store all of them in a vector<WordInfo>

To count the words, you read the file line by line and check whether the word is already in the vector. If it is, you increase the count of that WordInfo variable; otherwise you add a new WordInfo variable with count 1.

I quickly hacked together a little example:
#include <cstdlib>   // EXIT_SUCCESS, EXIT_FAILURE, system
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

using namespace std;

struct WordInfo
{
  string Name;
  size_t Count;

  WordInfo (string name, size_t count)
  {
    Name = name;
    Count = count;
  }
  string ToString ()
  {
    return Name + '\t' + to_string (Count);
  }
};

bool CreateData ();
bool ReadData (vector<WordInfo> &words);
bool DisplayData (vector<WordInfo> &words);
int FindWord (vector<WordInfo> &words, string name);

int main (void)
{
  vector<WordInfo> words;

  if (!CreateData ())
  {
    cerr << "Error creating data.txt\n";
    return EXIT_FAILURE;
  }
  if (!ReadData (words))
  {
    cerr << "Error reading data.txt\n";
    return EXIT_FAILURE;
  }

  DisplayData (words);

  system ("pause");
  return EXIT_SUCCESS;
}

bool CreateData ()
{
  static string names[] = 
  {
    "Yana", "Anna", "Lisa", "Valerie", "Kim", "Anna", 
    "Yana", "Cathy", "Ellie", "Valerie", "Jelena", "Denise"
  };

  ofstream dest ("Data.txt");
  if (!dest)
    return false;
  
  for (string name : names)
  {
    dest << name << '\n';
  }
  return true;
}

bool ReadData (vector<WordInfo> &words)
{
  ifstream src ("Data.txt");
  if (!src)
    return false;

  string line;
  while (getline (src, line))
  {
    int idx = FindWord (words, line);
    if (idx != -1)
    {
      words[idx].Count++;
    }
    else
    {
      WordInfo wi (line, 1);
      words.push_back (wi);
    }
  }
  return true;
}

bool DisplayData (vector<WordInfo> &words)
{
  for (WordInfo wi : words)
  {
    cout << wi.ToString () << '\n';
  }
  return true;
}

int FindWord (vector<WordInfo> &words, string name)
{
  for (size_t i = 0; i < words.size(); i++)
  {
    if (words[i].Name == name)
      return static_cast<int>(i);
  }
  return -1;
}


Maybe you can adapt it to your needs.
WARNING: Not properly tested.
Oh dang! I have not gotten to data structures either. Sorry! I am barely learning about reading from and writing to files.

I am looking through your code to see if I understand anything yet, though. I am definitely trying to see if I can use any of this in my code and adapt it, like you said.

I see you using WordInfo as a data type of sorts? So you're creating a vector of WordInfo.
That struct WordInfo returns the name with a tab that has count converted to strings. I still don't really get it, lol. I'm sorry. I just don't understand structures right now. I really do appreciate you putting something like this together. I am going to try and look at your code later to see if I could understand some of it to use later on in my code.

My professor wants us to use vectors to get the team with the most wins. I think he put two vectors in the function argument.
Yes, a struct is a data type:
http://www.cplusplus.com/doc/tutorial/structures/

Here is an example with two vectors:
#include <cstdlib>   // EXIT_SUCCESS, EXIT_FAILURE, exit, system
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

using namespace std;


bool CreateData ();
bool ReadData (vector<string> &words, vector<int> &wordCount);
void DisplayData (vector<string> &words, vector<int> &wordCount);
int FindWord (vector<string> &words, const string& line);
void Error (const string msg);

const string FILE_NAME = "Data.txt";

int main (void)
{
  vector<string> words;
  vector<int> wordCount;

  if (!CreateData ())
    Error ("\a\nCouldn't create Data.txt");

  if (!ReadData (words, wordCount))
    Error ("\a\nCouldn't read Data.txt");

  DisplayData (words, wordCount);

  system ("pause");
  return EXIT_SUCCESS;
}

bool CreateData ()
{
  static string names[] = 
  {
    "Yana", "Anna", "Lisa", "Valerie", "Kim", "Anna", 
    "Yana", "Cathy", "Ellie", "Valerie", "Jelena", "Denise"
  };

  ofstream dest (FILE_NAME.c_str());
  if (!dest)
    return false;
  
  for (string name : names)
  {
    dest << name << '\n';
  }
  return true;
}

bool ReadData (vector<string> &words, vector<int> &wordCount)
{
  ifstream src (FILE_NAME.c_str ());
  if (!src)
    return false;

  string line;
  while (getline (src, line))
  {
    int idx = FindWord (words, line);
    if (idx != -1) // word in the vector already
    {
      wordCount[idx]++;
    }
    else // need to create a new entry for the word and the count
    {
      words.push_back (line);
      wordCount.push_back (1);
    }
  }
  return true;
}

void DisplayData (vector<string> &words, vector<int> &wordCount)
{
  if (words.size () != wordCount.size ())
    Error ("Size of words and wordCount are not equal");

  for (size_t i = 0; i < words.size(); i++)
  {
    cout << words[i] << '\t' << wordCount[i] << '\n';
  }
}

int FindWord (vector<string> &words, const string& name)
{
  for (size_t i = 0; i < words.size(); i++)
  {
    if (words[i] == name) // found
      return static_cast<int>(i); // return index of name in words
  }
  return -1; // not found
}

void Error (const string msg)
{
  cerr << msg << '\n'; // print the message that was passed in
  system ("pause");
  exit (EXIT_FAILURE);
}
move(vec1.begin(), it, std::back_inserter(vec2));

Does that "move" imply that you can use algorithms with the vector?
// needs <algorithm>, <iostream>, <iterator>, <string> and <vector>
std::vector<std::string> winners;
// fill the list of winners

std::sort( std::begin(winners), std::end(winners) );

std::vector<std::string> teams( winners.size() );
auto tend = std::unique_copy( std::begin(winners), std::end(winners), std::begin(teams) );
teams.resize( std::distance( std::begin(teams), tend) );

for ( const auto & team : teams ) {
  std::cout << team << " won " << std::count( std::begin(winners), std::end(winners), team ) << '\n';
}
> My professor wants us to use vectors to get the team with the most wins.
> I think he put two vectors in the function argument.

We don't really need a second vector. We can do this in O(N log N) time and without any additional container; the function only sorts its own copy of the list:

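#include <algorithm> // std::sort, std::upper_bound
#include <iterator>  // std::begin, std::end
#include <string>
#include <vector>

// there may be more than one team with the most number of wins (multi-modal)
// this returns one of those teams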
std::string a_team_with_most_wins( std::vector<std::string> team_list )
{
    if( !team_list.empty() )
    {
        std::sort( std::begin(team_list), std::end(team_list) );

        std::size_t num_most_wins = 0 ;
        std::string team_with_most_wins ;

        auto iter = std::begin(team_list) ;
        while( iter != std::end(team_list) )
        {
            // http://en.cppreference.com/w/cpp/algorithm/upper_bound
            auto next = std::upper_bound( iter, std::end(team_list), *iter ) ;
            const std::size_t cnt = next - iter ;

            if( cnt > num_most_wins )
            {
                num_most_wins = cnt ;
                team_with_most_wins = *iter ;
            }

            iter = next ;
        }

        return team_with_most_wins ;
    }

    else return {} ; // return an empty string
}
@Thomas1965 Thanks, man! That actually helped a lot. I talked to my professor some more and looked at your code! It makes a whole lot more sense, and is way simpler than I thought it would be. Your ReadData() is what helped me out quite a bit. That's what I was trying to understand. I didn't know how to get a single string out, then get how many times it showed up in the file. So, there would be just as many elements in the words vector as there are in the wordCount vector. I finally got it. I'll post my final code once I actually write it all down.

I really appreciate you taking the time out to write that code for me. Thanks, man!

@keskiverto I used that move algorithm because I looked up online how to store elements from one vector into another. I thought it would help, except I put it in without understanding what it meant. I haven't gotten to learning about algorithms just yet. I do like looking at all this new code, though. It excites me to think I'm gonna be learning all this soon! Thanks, man! I appreciate it!

@JLBorges Are you using pointers in your code? I actually haven't gotten to those yet. I think I understand most of your code, though. I'm just not used to seeing so many "std::"'s, because I've always used the namespace std. Your function looks extremely simple, though. Which I only hope to be able to do later on. I really appreciate you helping out! Thanks, man!

Thanks, you guys! I'll put up my finished code once I've written it all out, in case someone ever wants to come back to look at some possible solutions for this type of problem!
> Are you using pointers in your code?

Using iterators.

The same code, without using iterators:

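#include <algorithm> // std::sort
#include <iterator>  // std::begin, std::end
#include <string>
#include <vector>

std::string a_team_with_most_wins( std::vector<std::string> team_list )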
{
    const std::size_t N = team_list.size() ;

    if( N != 0 )
    {
        // sort the sequence
        std::sort( std::begin(team_list), std::end(team_list) );

        std::size_t num_most_wins = 0 ;
        std::string team_with_most_wins ;

        std::size_t pos = 0 ;
        while( pos < N )
        {
            /////////// upper_bound (using position) /////////
            // find the position of the next element that is greater than team_list[pos].
            std::size_t next = pos ;
            while( next < N && team_list[pos] == team_list[next] ) ++next ;
            ///////////////////////////////////////////////////

            const std::size_t cnt = next - pos ; // cnt of the number of occurrences of team_list[pos]

            if( cnt > num_most_wins )
            {
                num_most_wins = cnt ;
                team_with_most_wins = team_list[pos] ;
            }

            pos = next ;
        }

        return team_with_most_wins ;
    }

    else return {} ; // return an empty string
}