Please help with finding size of array from CSV file (unknown # of rows and columns)

Hello,

I have a CSV file with a large array of data and would like to find the number of rows and columns that the data has.

In my current code the dimensions max_columns and max_rows are already known, but I need to handle the general case where they are unknown. Thank you in advance to anyone who can help me with this.

#include <vector>
#include <iostream> 
#include <fstream>
#include <string>
#include <sstream>
using namespace std;

const int max_columns = 41;
const int max_rows = 24;	// Excludes header row

bool parse_header_row(const string & line, vector<string> &labels)
{
	stringstream ss(line);
	for (int col = 0; col < max_columns; ++col)
	{
		string lbl;
		if (!getline(ss, lbl, ';'))
			return false;  // some error 
		labels.push_back(lbl);
	}
	return true;
}

bool parse_data_row(const string & line, double data[max_columns])
{
	stringstream iss(line);
	string temp;

	for (int col = 0; col < max_columns; ++col)
	{  // get one column
		if (!getline(iss, temp, ';'))
			return false;  // some error 
		stringstream convertor(temp);
		convertor >> data[col];
	}
	return true;
}

int main()
{
	vector<string> labels;
	double data[max_rows][max_columns];
	ifstream file("C:\\file.csv");
	string line;

	if (!getline(file, line))
	{   // handle error 
		return 0;
	}
	if (!parse_header_row(line, labels))
	{	// handle error 
		return 0;
	}
	// Now handle the data rows
	for (int row = 0; row < max_rows; ++row)
	{
		if (!getline(file, line))
		{	// handle the error
			return 0;
		}
		if (!parse_data_row(line, data[row]))
		{	// handle the error 
			return 0;
		}
	}

        cin.get();
	return 0;

}
The classic way to deal with this is a vector of vectors.
 
  vector<vector<double>> data;


In practice, I would create a type for a row (other than the header row which we've dealt with).
typedef vector<double>  ROW;
vector<ROW> vdata;

Now, you can create ROWs as needed.
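  ROW row;  // not to be confused with your index variable named row
  stringstream iss(line);  // line is the current line of the file
  string cell;
  while (getline(iss, cell, ';'))  // one cell per pass, until end of line
  {   stringstream convertor(cell);
      double val;
      convertor >> val;
      row.push_back (val);
  }
  //  Now add the row to the vector of rows
  vdata.push_back (row);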



Can your 'data' variable be made into a vector?
The classic way to deal with this is a vector of vectors.


Thanks Abstraction, but I think I am too inexperienced in programming to fully understand the logic. I tried to simply copy/paste this info and run it, but received some errors. I assume I would have to make changes to this code before inserting it.

As an end result, I would like to display the number of rows and columns with cout, just in case that changes anything.
I tried to simply copy/paste this info and run it, but received some errors

I just gave you a high-level outline. Since you seemed okay with the vector<string> we used for the header row, I assumed you could extend the idea.

This snippet was just to give you the idea:

vector<vector<double>> data;
A vector<double> is an arbitrary-length array of doubles. This represents a "row". You can push as many doubles onto the vector as you have columns in the row.

typedef vector<double> ROW;
Here we created a new data type called ROW. A ROW is simply an array of doubles, as I just explained. Vectors of vectors can be confusing to think about, but a vector of ROWs is simply an arbitrary-length array of ROW objects.

vector<ROW> vdata;
Here we create a vector of ROWs called vdata. As we read lines from the file, we create a temporary ROW object and push each column (double) from the line onto the temporary row, then at the end of line we push the temp row onto the vdata vector.
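
To make that read loop concrete, here is a minimal sketch rather than a finished implementation. The helper name read_rows is made up for illustration. It assumes cells are separated by ';' and that every cell holds a number, and it takes any istream so you can try it on a string before pointing it at a file.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>
using namespace std;

typedef vector<double> ROW;

// Hypothetical helper: reads every remaining line of 'in' into a vector of ROWs.
// Assumes cells are separated by ';' and each cell is numeric.
vector<ROW> read_rows(istream & in)
{
	vector<ROW> vdata;
	string line;
	while (getline(in, line))
	{
		stringstream iss(line);
		string cell;
		ROW row;                        // temporary row for this line
		while (getline(iss, cell, ';'))
		{
			stringstream convertor(cell);
			double val = 0.0;
			convertor >> val;           // stays 0.0 if the cell is not a number
			row.push_back(val);         // one column of this row
		}
		vdata.push_back(row);           // end of line: keep the finished row
	}
	return vdata;
}
```

If you read the header line with its own getline first, passing the same ifstream to read_rows afterwards picks up only the data rows.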

Finding the number of rows is simply:
 
  num_rows = vdata.size();


Finding the number of columns depends on whether the rows are all the same length, or if they can vary in length. If the rows all have the same number of columns, then you can get the number of columns from the labels vector:
 
  num_cols = labels.size();

If the rows have different number of columns, then you need to iterate through vdata, finding the row with the largest size.
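
For the varying-length case, the loop could look something like this. It is only a sketch, and the function name max_columns_in is a placeholder of my own, not anything from the code above.

```cpp
#include <cstddef>
#include <vector>
using namespace std;

typedef vector<double> ROW;

// Hypothetical helper: returns the size of the longest row,
// or 0 if there are no rows at all.
size_t max_columns_in(const vector<ROW> & vdata)
{
	size_t widest = 0;
	for (size_t i = 0; i < vdata.size(); ++i)
	{
		if (vdata[i].size() > widest)
			widest = vdata[i].size();   // found a longer row
	}
	return widest;
}
```

Comparing this result with labels.size() is also a cheap way to spot rows that have more columns than the header.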







Thanks, Abstraction, for explaining what this all means piece by piece. It definitely helps clear some things up, but I'm still not quite sure how to use them all together and write everything in the correct format in the code I posted above. Would I need to refer back to the vector<string> we used for the header row to create a working program? I'm sure this has all been explained clearly enough by now, but I'm still struggling to grasp the complete picture.
Would I need to refer back to the vector<string> we used for the header row to create a working program?

If you want to use the header labels, then yes.

I'm still not quite sure how to use them all together

This should help. Minor modifications to previous code. Note that the fixed dimensions of the data array are gone.
#include <vector>
#include <iostream> 
#include <fstream>
#include <string>
#include <sstream>
using namespace std;

typedef vector<double> ROW;

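bool parse_header_row(const string & line, vector<string> &labels)
{	stringstream ss(line);
	string lbl;

	// Read ';'-separated labels until the line is exhausted.
	// Looping on getline itself avoids the eof() test, which
	// reported an error for a line ending in ';'.
	while (getline(ss, lbl, ';'))
		labels.push_back(lbl);
	return !labels.empty();  // an empty header line is an error
}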

bool parse_data_row(const string & line, vector<ROW> & vdata)
{	stringstream iss(line);
	string cell;
	ROW row;

	// Read ';'-separated cells until the line is exhausted
	while (getline(iss, cell, ';'))
	{	stringstream convertor(cell);
		double val;
		if (!(convertor >> val))	// Input double from cell
			return false;		// cell is empty or not a number
		row.push_back (val);	// Add value to row
	}
	if (row.empty())
		return false;		// blank line
	vdata.push_back (row);		// Add row to vector of rows
	return true;  // end of line
}

int main()
{   vector<string>	labels;
    vector<ROW>    	vdata;
    ifstream file("C:\\file.csv");
    string line;

	if (!getline(file, line))
	{   // handle error 
		return 0;
	}
	if (!parse_header_row(line, labels))
	{	// handle error 
		return 0;
	}
	// Now handle the data rows
	while (getline(file, line)) 
	{	if (!parse_data_row(line, vdata))
		{	// handle the error 
			return 0;
		}
	}

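    // Report the dimensions; the column count assumes every row matches the header
    cout << "Rows: " << vdata.size() << ", columns: " << labels.size() << endl;
    cin.get();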
    return 0;
} 
Thank you once again, Abstraction, you are a life saver. I'm going to study this code for a while to try to understand it completely.

My next task is to pick out certain elements from the array, in this case the last two values from each row. Feel free to post any helpful hints :)
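
As a hint for that next step: since each ROW is a vector, the last two values can be copied out with an iterator range. This is just a sketch under my own assumptions. The name last_two is hypothetical, and returning short rows unchanged is one possible choice, not the only sensible one.

```cpp
#include <vector>
using namespace std;

typedef vector<double> ROW;

// Hypothetical helper: copies the last two values of 'row' into a new ROW.
// Rows with fewer than two values are returned unchanged.
ROW last_two(const ROW & row)
{
	if (row.size() < 2)
		return row;
	return ROW(row.end() - 2, row.end());   // range constructor copies [end-2, end)
}
```

Calling last_two(vdata[i]) for each i from 0 to vdata.size() - 1 would then give the pair for every row.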