fstream and .csv files

Hello everyone, I am having quite a problem with a program I am coding. I need this program to read through a .csv file and count how many rows there are, and also find the maximum value in a specific column. I was able to get the row count, but I cannot find a way to get the maximum value down a column. For example, let's say this is the .csv:

123, A, 4,
432, B, 2,
321, C, 5

How can I get the program to read down the third column and find that 5 is the maximum value? Any help would be appreciated.

#include <iostream>
#include <fstream>
#include <string>
using namespace std;

int main()
{
	string input_file_name;
	
	cout << "Give name of input file : ";
	cin >> input_file_name;

	ifstream input_stream(input_file_name.c_str());
	
	string buffer;
	int rows = 0;

	while(!input_stream.eof())
	{
		getline(input_stream,buffer,'\n');
		if(!input_stream.eof())
		rows = rows + 1;
	}
	cout << "Counted " << rows << " rows." << endl;
	
	input_stream.close();

	return 0;
}
You need to write a function that splits your "string buffer" variable on commas to get the third-column value, then store that value in, say, a vector<int>. Once you have finished reading the file, the vector<int> will hold every third-column value.

Then use the C++ STL max_element function to get the maximum value from the vector<int>.


	#include <vector>
	#include <algorithm>

	vector<int> vec;

	while(!input_stream.eof())
	{
		getline(input_stream,buffer,'\n');

		// a new function to split buffer
		vec.push_back(<third column value>);

		if(!input_stream.eof())
		rows = rows + 1;
	}
	cout << "Counted " << rows << " rows." << endl;
	
	input_stream.close();

	cout << "The largest element is " << *max_element(vec.begin(),vec.end()) << "\n";
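
To make the placeholder above concrete, here is a minimal sketch of the split-and-collect step. The helper names third_column and third_column_values are made up for illustration, and the code assumes every non-empty line has at least three comma-separated fields with a whole number in the third.

```cpp
#include <algorithm>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the third comma-separated field of a line as an int.
// Assumes the line has at least three fields and the third one is a whole number.
int third_column(const std::string& line)
{
    std::istringstream fields(line);
    std::string field;
    for (int i = 0; i < 3; ++i)
        std::getline(fields, field, ',');   // after three reads, field holds column 3

    int value = 0;
    std::istringstream(field) >> value;     // >> skips the leading space in " 4"
    return value;
}

// Hypothetical helper: collects the third-column value of every non-empty line.
std::vector<int> third_column_values(std::istream& in)
{
    std::vector<int> values;
    std::string line;
    while (std::getline(in, line))          // the loop body only runs after a successful read
    {
        if (!line.empty())
            values.push_back(third_column(line));
    }
    return values;
}
```

With the values collected, *std::max_element(values.begin(), values.end()) gives the largest one. Check that the vector is not empty first, because dereferencing the result for an empty vector is undefined.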
Thank you for the quick reply. I am having a bit of trouble understanding since I am a beginner. Is there a way to split the file based on the commas and then store it as an array? I ask because there is another step I need the program to do after finding the maximum value in the third column, and I am not sure if this vector method will work. I need it to output the maximum value in the third column (in this case, it is 5) and then output the number in the first column of the same row (in this case, it is 321).

Are vectors the way to go for this type of program? In the end, I want the program to output (using the example above), "Counted 3 rows, the largest element is 5, and the code is 321." Once again, any further help would be appreciated.

EDIT:
I have read a bit about vectors and it seems like they are the way to go, but I am not sure how to implement them in the program. Would pointers be useful in this case?
vector is part of the C++ STL, which aims to shorten the time we spend building data structures. If you are a C programmer, you may be used to building your own dynamically growing array and so on. Learning the C++ STL will be worth your while, and because it is part of the C++ standard it works across C++ compilers.

Yes, you need to write a function to split on commas. Search the internet and you will find ready-made solutions for this.


"Counted 3 rows, the largest element is 5, and the code is 321."


Since you also need to display the first-column code alongside the third-column value, you need to amend the code in this manner:


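class data {
public :
  int m_code;   // first column
  int m_value;  // third column
  data(int code, int value) { m_code = code; m_value = value; }
};

// max_element expects a "less than" comparison, so compare with <
class CompareValue {
  public:  
  bool operator()(const data& o1, const data& o2) const {    
      return o1.m_value < o2.m_value;	
  }
};

vector<data> vec;
...
vec.push_back(data(<first col>,<third col>));
...
// max_element returns an iterator to the record with the largest m_value
vector<data>::iterator largest = max_element(vec.begin(),vec.end(),CompareValue());
cout << "The largest element is " << largest->m_value << ", and the code is " << largest->m_code << "\n";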

Don't loop against EOF.

Simple CSV:
http://www.cplusplus.com/forum/general/17771/#msg89650

Good luck!
Your while loop can be simplified to:
	while(getline(input_stream,buffer,'\n'))
	{
		rows = rows + 1;
	}

By putting the getline() function into the while condition you ensure that the while block is only executed if the read was successful. It will terminate on EOF.
Alright, I have researched as much as I could about this and read through the posts and the links provided, but I am still having trouble understanding. I've even watched YouTube videos to see if I could get another angle, but this scenario seems to be a tricky one. I come from a background in Python, which is what I learned last semester in college. This is my first semester learning C++ and it is definitely a pain.

I see that I need to use the getline function to get each row from the .csv file and store it into a string. Then, if I'm not mistaken, I must use stringstream to pull out the commas. So will I have a bunch of strings (depending on how many rows there are in the .csv file) that I need to use stringstream on? And if so, will those strings be values separated by spaces?

P.S. I might sound like a total beginner, but please bear with me. Once I have that "ah-ha!" moment, I absorb things quickly.
Anyone have any ideas? I know there are some examples posted above, and I could probably just copy them over, but I would rather learn why it's coded the way it is instead of just copying it.
Once you have a whole line in buffer, putting it into a std::istringstream lets you read its contents as an input stream.
std::istringstream iss(buffer);

// temps to store the column data
int col_1; 
std::string col_2;
int col_3;

char comma; // temp to read in the comma separator

iss >> col_1; // read first column
iss >> comma; // read the first comma

std::getline(iss, col_2, ','); // read second column AND the second comma

iss >> col_3; // read the third column. 


I used std::getline() to read the second column because it looks like a string, and if the string contained spaces, >> would split it into separate reads. However, judging by your data it could be just a single char, in which case you could make col_2 a char and read it like the other values.
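
Wrapped up as a function, the same steps might look like the sketch below. parse_row is a hypothetical name, and the code assumes each line has the shape "number, text, number" from the example data.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: splits one "number, text, number" line into three variables.
// Returns false if either number could not be read or the first comma is missing.
bool parse_row(const std::string& line, int& col_1, std::string& col_2, int& col_3)
{
    std::istringstream iss(line);
    char comma = 0;

    iss >> col_1 >> comma;            // first number, then the comma after it
    iss >> std::ws;                   // skip the blank before the second column
    std::getline(iss, col_2, ',');    // second column, consuming its comma
    iss >> col_3;                     // third number; a trailing comma is simply left unread

    return !iss.fail() && comma == ',';
}
```

Calling parse_row("123, A, 4,", a, b, c) leaves 123 in a, "A" in b and 4 in c. The std::ws step is an addition to the snippet above: without it, col_2 would keep the space that follows the first comma.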

For your example:
123, A, 4
432, B, 2
321, C, 5
(Terminating commas removed)

It seems likely that you want to have a special class to handle each record (or line) of data. For example (assuming your data represents a role-playing type character):

struct player_t
  {
  unsigned hit_points;
  char     character_class;
  int      spells_learned;
  };

Next you need an overloaded input operator to read a single record, or, for these examples, a "player_t":

std::istream& operator >> ( std::istream& ins, player_t& player )
  {
  // First, get the entire line from the input stream.
  std::string raw_csv_data;
  std::getline( ins, raw_csv_data );

  // Now we'll separate out the pieces of the line
  // For each item, we'll read the proper datum,
  // then skip over the comma separating it from the next.
  std::istringstream iss( raw_csv_data );
  try
    {
    iss >> player.hit_points;
    iss >> std::ws;
    if (iss.get() != ',') throw 0;

    iss >> player.character_class;
    iss >> std::ws;
    if (iss.get() != ',') throw 0;

    iss >> player.spells_learned;

    // At this point, the input line should be exhausted. If the
    // line is in any other state, that's an error we need to propagate
    // back to the caller.
    if (!iss.eof())
      {
      ins.setstate( iss.rdstate() );  // rdstate() returns the stream's current state flags
      }
    }
  catch (int)
    {
    // If we get here, something went wrong trying to skip commas...
    ins.setstate( std::ios::failbit );
    }

  // That's it for this record. As always, return the caller's input stream.
  return ins;
  }

That was, fortunately, the most difficult part. The rest is easy. To read an entire file of records, simply make yourself a type and an overloaded operator to do it:

typedef std::vector <player_t> player_list_t;
// That's a list of players, of course.
std::istream& operator >> ( std::istream& ins, player_list_t& player_list )
  {
  player_t player;

  while (ins >> player)
    {
    player_list.push_back( player );
    }

  return ins;
  }

Load your file normally:

  player_list_t all_players;

  std::ifstream f( input_file_name.c_str() );
  f >> all_players;
  if (!f.eof()) fooey();  // (always check: the read loop ends by failing, so stopping before end-of-file means something went wrong)
  f.close();

  std::cout << "There are " << all_players.size() << " player characters in the file.\n";

The final part of your question, finding the maximum value, is a little involved, but straightforward. You need predicate functions -- those that return true or false given arguments -- that compare two records ("player_t"s, or whatever you've got) on the specific column.

For example, a predicate that can sort on the second column (the player's character class) is simple enough. The predicate should behave the same as the < comparison operator.

bool player_character_class_less_than( const player_t& lhs, const player_t& rhs )
  {
  return lhs.character_class < rhs.character_class;
  }

With this predicate, you can use an <algorithm> to either sort all the players by their character class or find the lowest or highest character class:

  unsigned index = std::distance(
    all_players.begin(),
    std::max_element(
      all_players.begin(),
      all_players.end(),
      player_character_class_less_than
      )
    );

  std::cout
    << "The player with the highest character class is number "
    << (index + 1)  // Computers start counting at zero, but humans start at one.
    << ".\n";

What that did is essentially find the character with 'C' (given the example data), convert it (via the distance() algorithm, found by #including <iterator>) to an index into the player list, and display it to the user in a human-friendly way.
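
Since the original question was about the third column, here is a hedged sketch of the same idea applied to it. The struct repeats the one above; spells_learned_less_than and index_of_most_spells are names made up for this example, and the function assumes the list is not empty.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

struct player_t
{
    unsigned hit_points;
    char     character_class;
    int      spells_learned;
};

// Predicate for the third column: behaves like < on spells_learned.
bool spells_learned_less_than(const player_t& lhs, const player_t& rhs)
{
    return lhs.spells_learned < rhs.spells_learned;
}

// Hypothetical helper: zero-based index of the record with the most spells.
// Assumes players is not empty.
std::size_t index_of_most_spells(const std::vector<player_t>& players)
{
    // distance(first, last): the starting iterator goes first,
    // the iterator returned by max_element goes second.
    return static_cast<std::size_t>(std::distance(
        players.begin(),
        std::max_element(players.begin(), players.end(), spells_learned_less_than)));
}
```

With the three example rows loaded, the index comes back as 2, and players[2].hit_points is the first-column value from that same row (321 in the example data).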

Hope this helps. Remember, I just typed all this in (I have not compiled it to test against stupid errors). I may have mis-typed something. Good luck!
Thank you both for the help. Thank you especially for the detailed explanations. I read through it a couple times and still had some brain farts, but I will study it more in depth. Hopefully I won't have any questions since I know it must be frustrating to help out a beginner, but if I do, I hope you don't mind if I post them up here. Thanks again!
Ok, I've stayed in during this holiday weekend (Labor Day) just to finish this project. I've studied all the replies to this thread and I've researched all over the internet for a solution, but to no avail. I am becoming extremely discouraged at this point. All the replies seem to use different methods, all of which I am having trouble implementing in my program. Although I have learned a lot from you guys so far, my brain is starting to hurt trying to figure this out. This is a last resort for me, but I am going to post the link to the .csv file and an explanation of what I want this program to do. If anyone is generous enough to help me out here, I will be forever indebted to you. I feel that this is the only way I will learn to solve this specific problem. Know that you will be helping a beginning programmer find his way through this crazy maze known as the C++ language.

The link to the .csv file is available at the National Weather Service website:
http://www.spc.noaa.gov/wcm/data/2009_torn.csv

It is a file containing tornadoes in the US during 2009. It includes a whole bunch of data, but the two columns that I am most interested in are the magnitude of the tornado (11th column) and the state (8th column) in which it occurred.

I want the program to output "Counted 1182 tornadoes, the strongest tornado of magnitude 4 occurred in TX."

I have not learned classes or structures yet, so using those would actually set me back in learning the basics. Ideally, I would want to use iostream, sstream, fstream, and string to solve this problem. I am not sure if vectors are the same as multidimensional arrays, but if they are not the same, I would rather not use vectors yet. Is this problem solvable without classes, structures and vectors? Once again, any help would be very much appreciated.

This is what I have so far, which is almost identical to my original post, but with a bit more efficiency.

#include <iostream>
#include <sstream>
#include <fstream>
#include <string>
using namespace std;

int main ()
{	
	string input_file_name;
	cout << "Give name of input file : ";
	cin >> input_file_name;

	ifstream input_stream(input_file_name.c_str());
	
	string buffer;
	int rows = 0;

	while (getline(input_stream, buffer, '\n'))
		rows++;
		

	cout << "Counted " << rows << " tornadoes," << endl;
	
	input_stream.close();

	return 0;
}
Well you are reading each line into buffer. Now you need to convert buffer into an input stream and read up to the 8th and then the 11th column:
		std::string state;

		// convert buffer to input
		istringstream iss(buffer);

		int i; // column counter

		std::string entry; // read in each entry

		// read in up to 8th column (keeping the 8th)
		for(i = 0; i < 8; ++i)
		{
			std::getline(iss, entry, ',');
		}

		// keep a note in case it's the one with max magnitude
		state = entry;


That is how you pick out your data. Continue on reading up to the 11th column and then you have your magnitude data to compare against the highest that you have recorded so far.
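
If it helps, the "read up to column n" loop can be wrapped in a small function. This is only a sketch: nth_field is a made-up name, columns are counted from 1, and it assumes no field contains a comma inside quotes.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: returns the n-th comma-separated field, counting from 1.
// Returns an empty string if the line has fewer than n fields.
// Assumes no field contains a quoted comma.
std::string nth_field(const std::string& line, int n)
{
    std::istringstream iss(line);
    std::string entry;
    for (int i = 0; i < n; ++i)
    {
        if (!std::getline(iss, entry, ','))   // ran out of fields before reaching n
            return "";
    }
    return entry;   // the last field read is the n-th one
}
```

nth_field(buffer, 8) would then give the state and nth_field(buffer, 11) the magnitude, at the cost of scanning the line twice. Continuing with the same istringstream, as described above, avoids the second scan.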
Ah, thank you for helping. I understand what you are showing me. I have tried implementing it into my program to test it out, but "entry" doesn't seem to have a value.

#include <iostream>
#include <sstream>
#include <fstream>
#include <string>
using namespace std;

int main ()
{	
	string input_file_name;
	cout << "Give name of input file : ";
	cin >> input_file_name;

	ifstream input_stream(input_file_name.c_str());
	
	string buffer;
	int rows = 0;

	while (getline(input_stream, buffer, '\n'))
		rows++;
		
	string state;
	string entry;
	int i;

	istringstream input_string_stream(buffer);

	
	for(i=0;i<8;++i)
		getline(input_string_stream,entry,'.');

	state=entry;

	cout<<state<<endl;

	cout<<entry<<endl;

	cout << "Counted " << rows << " tornadoes," << endl;
	
	input_stream.close();

	return 0;
}


I have outputted "state" and "entry" to the console to check what value they hold, but they come out blank.

Give name of input file: 2009_torn.csv


Counted 1182 tornadoes,


Any ideas what I may be doing wrong? I have also tried to implement the above code into the while loop, but without success.
This line is wrong:
 
getline(input_string_stream,entry,'.');

It should have a comma ',' not a period '.'
 
getline(input_string_stream,entry,',');


Also, lines 21-31 (everything from "string state;" down to "state=entry;") need to be inside the while loop that reads each line from the CSV file. After all, you need to do that for every line, don't you?

At the moment you read the whole file simply counting the lines and then try to process only the final line read.
Thank you so much Galik...I am understanding the stringstream and getline method much more now. After I picked out the data from the 8th column, it seems that I have to start another for loop using i=3 to get the 11th column. Does the getline method start at the last comma that I ended off at? I have been able to output the state and the magnitude so I know that it is working. Here is what I have so far:

#include <iostream>
#include <sstream>
#include <fstream>
#include <string>
using namespace std;

int main ()
{	
	string input_file_name;
	cout << "Give name of input file : ";
	cin >> input_file_name;

	ifstream input_stream(input_file_name.c_str());
	
	string buffer;
	int rows = 0;

	while (getline(input_stream, buffer, '\n'))
{
		rows++;
		
		string state;
		string magnitude;
		string entry1;
		string entry2;
		
		int i;

		istringstream input_string_stream(buffer);

	
		for(i=0;i<8;++i)
		{	
			getline(input_string_stream,entry1,',');
		}

		for(i=0;i<3;++i)
		{
			getline(input_string_stream,entry2,',');
		}

		state = entry1;
		magnitude = entry2;

	cout<< state <<" "<< magnitude << endl;
}
	cout << "Counted " << rows << " tornadoes," << endl;
	
	input_stream.close();

	return 0;
}


Now, should I store the state and magnitude values in a multidimensional array, or would it be better to store each column in its own array? I am thinking that I can store each in an array, then find the max magnitude value and somehow find which element (mag_array[n]) of the array it was contained in. After finding the element, I can use that index in state_array[?] to find the corresponding state. Will this work? Thanks again Galik, your help is very much appreciated.
With the for loops I would simply not reinitialise i to 0 in the second loop. That way it's easier to select for column 11:
		for(i=0;i<8;++i)
		{	
			getline(input_string_stream,entry1,',');
		}

		for(;i<11;++i) // note, first for() element is empty, i keeps its previous value
		{
			getline(input_string_stream,entry2,',');
		}
I don't think you need to store anything in an array. You just need to keep track of the highest value. If you read a magnitude value that is higher than the previous maximum, then record the new value in place of the previous one and keep a note of which state it occurred in at that point too.

Before doing that you will have to convert your magnitude string into an int value. You can do that like this:
std::string entry = "1234"; // example number as a string
int mag; // place to store the integer after conversion

std::istringstream(entry) >> mag; // convert the entry to an istream and read it into an int 

Now mag contains an integer. You can compare it against the previous highest value:
 
if(mag > max_mag) { max_mag = mag; }


That way max_mag always contains the highest magnitude found so far.
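
Putting the pieces together, here is one way the running maximum and its state could be tracked in the same loop, using only string and sstream. Treat it as a sketch: strongest_tornado is a hypothetical function, the column positions (8 and 11) are the ones given earlier in the thread, and lines whose 11th column is not a number are skipped for the comparison but still counted as rows.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: counts the rows and remembers the largest magnitude
// together with the state from the same row.
// Assumes column 8 is the state and column 11 is the magnitude.
void strongest_tornado(std::istream& in, int& rows, int& max_mag, std::string& max_state)
{
    rows = 0;
    max_mag = -1;          // lower than any expected magnitude
    max_state.clear();

    std::string buffer;
    while (std::getline(in, buffer))
    {
        ++rows;

        std::istringstream iss(buffer);
        std::string entry;
        int i;

        for (i = 0; i < 8; ++i)             // read up to and including column 8
            std::getline(iss, entry, ',');
        std::string state = entry;

        for (; i < 11; ++i)                 // carry on to column 11
            std::getline(iss, entry, ',');

        int mag = 0;
        if (!(std::istringstream(entry) >> mag))
            continue;                       // column 11 was not a number

        if (mag > max_mag)                  // new maximum: remember both values together
        {
            max_mag = mag;
            max_state = state;
        }
    }
}
```

Because the state is copied at the moment a new maximum is found, there is no need to remember the row number or to go back through the file. With the strict > comparison, ties keep the first row that reached the maximum.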
Galik, thank you so much for the help. I have read and absorbed all you have taught me so far. The program is working for the most part. I have made it look a little bit nicer, I think. Here is the code:

#include <iostream>
#include <sstream>
#include <fstream>
#include <string>
using namespace std;

int main ()
{	
	string input_file_name;
	cout << "Give name of input file : ";
	cin >> input_file_name;

	ifstream input_stream(input_file_name.c_str());
	
	string buffer;
	int rows = 0;
	string state;
	string magnitude;
	string entry1;
	string entry2;
	int int_magnitude;
	int max_magnitude = 0;

	while (getline(input_stream, buffer, '\n'))
	{
		rows++;
		
		istringstream input_string_stream(buffer);

		int i;
	
		for(i=0;i<8;++i)
		{	
			getline(input_string_stream,entry1,',');
		}

		for(;i<11;++i)
		{			
			getline(input_string_stream,entry2,',');
			istringstream(entry2) >> int_magnitude;
		}

		if (int_magnitude > max_magnitude)
		{
			max_magnitude = int_magnitude;
		}		
//		state = entry1;
//		magnitude = entry2;
//
//		cout<< state <<" "<< magnitude <<" "<< max_magnitude << endl;
	}


	cout << "Counted " << rows << " tornadoes," << endl;
	cout << "the strongest tornado of magnitude " << max_magnitude
		<< " occurred in" << endl;
	
	input_stream.close();

	return 0;
}


Give the name of the input file : 2009_torn.csv
Counted 1182 tornadoes,
the strongest tornado of magnitude 4 occurred in


Everything has worked so far, but I am stuck again. How will I find which row the max_magnitude was found in? I think I will need that to find which state the max_magnitude value occurred in. Do I need to add a row count into both of the for() loops? If both columns had a row count, it seems it can work. Maybe if I can have it count how many rows until the max_magnitude value was found, I could use the same count to find the state it occurred in.
The only problem is, how would I have the count stop once it reaches the max_value? Is this idea even plausible in this situation?