Reading a CSV file...

Hey guys, I've been trying to figure this out for days now with no luck. I am creating a program that needs to read each line of a CSV file into a vector of strings. Each of these vectors is then put into an outer vector so they are all kept in one place. Right now I am struggling to get the input into the right strings. The CSV file will look something like this:

"000","01","01","01",Continuityr1", "20/200/2000ohm (Auto-RANGE)","0.30ohm","this is a good test"
"002","01","01","01",Continuityr1", "Insulation 500V L/N", "20/200/2000Mohm (Auto-RANGE)","9.69Mohm","bla bla bla"

I will add code to get rid of the quotation marks once I can get each one individually inside the vector.

The code I have right now is basic, and I have tried multiple different things. But because there is no comma at the end of the last field on a line, the first field of the next line gets added to the same string. I've tried figuring out how to split them, but I'm really unsure of what I'm doing; as I said, this has had me stuck for a few days now!
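A small sketch that reproduces the problem described above, assuming input is read with std::getline and ',' as the delimiter (the function name read_comma_tokens is made up for illustration). When a delimiter is given, std::getline stops only at that character, so the newline becomes part of a field and the last field of one line is glued to the first field of the next.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads comma-delimited tokens the same way the
// loop below does. std::getline with ',' as the delimiter only stops at
// commas, so '\n' is treated as an ordinary character inside a token.
std::vector<std::string> read_comma_tokens(std::istream& in)
{
    std::vector<std::string> tokens;
    std::string token;
    while (std::getline(in, token, ','))
        tokens.push_back(token);
    return tokens;
}
```

For example, reading "a,b\nc,d" this way gives three tokens: "a", "b\nc" and "d".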

while (std::getline(csvFile, currentLine, ','))
{
	int quotationMarks = 0;

	// If currentLine is abnormally large
	if (currentLine.size() > 6)
	{
		for (int a = 0; a < currentLine.size(); a++)
		{
			if (currentLine[a] == '"')
			{
				quotationMarks++;
			}
		}

		if (quotationMarks > 2)
		{
			// Get the rest of the line
			std::getline(csvFile, currentLine, '\n');

			// Pushback full vector onto the main vector
			vectorOfInputVectors.push_back(tempVector);
			tempVector.clear();
		}
	}

	// Pushback word onto temp vector
	tempVector.push_back(currentLine);
}


With this the output is pretty messed up...
http://imgur.com/a/pkgIg

I have set it to print each string on its own line, then leave a blank line before moving on to the next vector. I have no clue where to go from here... Anyone's help would be appreciated massively.
The first thing I would recommend is that you use a stringstream to parse each line; then you won't have the problem of the input wrapping to the next line. Also, instead of a vector of strings you may want to consider a vector of structs, where the structure has a variable for every field of the line.

...
std::string strip_quotes(std::string field);
...
    // Process one complete line at a time.
    while (std::getline(csvFile, currentLine))
    {
        // Use a stringstream to parse that one line.
        std::stringstream sin(currentLine);

        // Now you can get one field at a time.
        std::string field;
        while (std::getline(sin, field, ','))
            tempVector.push_back(strip_quotes(field));
    }


The strip_quotes() function would strip the quotes from the string.
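One possible implementation of the strip_quotes() helper declared above, offered as a sketch rather than jlb's actual code. It checks each end of the field separately, so a field with a missing opening quote (like Continuityr1" in the sample data) still loses its closing quote. It does not trim spaces, so a field with a space before its opening quote keeps that quote.

```cpp
#include <string>

// Sketch of strip_quotes(): removes a leading '"' and a trailing '"'
// if present. Each end is checked independently, and the emptiness
// checks make it safe on "" and on a lone quote character.
std::string strip_quotes(std::string field)
{
    if (!field.empty() && field.front() == '"')
        field.erase(0, 1);
    if (!field.empty() && field.back() == '"')
        field.pop_back();
    return field;
}
```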

Hello DylanMorganx,

The first part of your problem resides in the data file. In a "CSV" file, commas separate each field, and there is no need for any spaces after a comma. You have three extra spaces in your file that will cause a problem. I believe the next concept of a "CSV" file is that each line has the same number of fields. Your file has eight fields in the first line and nine fields in the second line. They should be the same, e.g., line one should have an extra comma before "20/200/2000ohm (Auto-RANGE)". Even though it represents an empty field, it makes the program easier to write because it is consistent.
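If the file itself cannot be changed, the stray spaces can be removed in the program instead. A minimal sketch of a hypothetical trim() helper that strips leading and trailing whitespace from one field before the quotes are removed; also stripping '\r' is an assumption, in case the machine writes Windows line endings.

```cpp
#include <string>

// Hypothetical helper: removes leading and trailing spaces, tabs and
// carriage returns from one field, so ` "Insulation 500V L/N"` becomes
// `"Insulation 500V L/N"`.
std::string trim(const std::string& field)
{
    const char* whitespace = " \t\r";
    const auto first = field.find_first_not_of(whitespace);
    if (first == std::string::npos)
        return "";  // field was empty or all whitespace
    const auto last = field.find_last_not_of(whitespace);
    return field.substr(first, last - first + 1);
}
```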

jlb has a good option for dealing with the quotes. As an alternative to his function std::string strip_quotes(std::string field); you might consider this:

#include <iostream>
#include <sstream>
#include <string>

while (std::getline(inFile, line))
{
	std::istringstream ss(line);

	for (size_t lp = 0; lp < 9; lp++)
	{
		if (lp < 8)  // <--- For the first eight fields.
		{
			std::getline(ss, currentLine, ',');
			if (currentLine.size() >= 2)  // <--- Only strip a field long enough to have both quotes.
			{
				currentLine.erase(0, 1);
				currentLine.pop_back();
			}
			tempVector.push_back(currentLine);
		}
		else
		{
			std::getline(ss, currentLine);  // <--- Reads the last field and deals with the \n.
			if (currentLine.size() >= 2)  // <--- Guard here too; pop_back() on an empty string is undefined.
			{
				currentLine.erase(0, 1);
				currentLine.pop_back();
			}
			tempVector.push_back(currentLine);
		}
		std::cout << "\n " << currentLine << std::endl;  // <--- Used for testing.
	}
}

This code could be shortened; it is what I came up with quickly. It assumes the data file has the same number of fields on every line and no extra spaces.

Hope that helps,

Andy
std::getline(ss, currentLine); // <--- Reads the last field and deals with the \n.

The primary reason for reading an entire line into a string then using a stringstream to parse that line is to eliminate the need to handle the last item differently. And normally you would keep each line separate, in this case probably in a vector<vector<string>>.
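A sketch of that approach, storing each line in its own inner vector of a std::vector<std::vector<std::string>>. The function name read_rows is hypothetical, and quote stripping is left out to keep the example short. Since every line gets its own stringstream, the last field needs no special case, and lines with eight or nine fields are both handled.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: reads the whole stream into one vector of
// fields per line. Lines may have different numbers of fields.
std::vector<std::vector<std::string>> read_rows(std::istream& in)
{
    std::vector<std::vector<std::string>> rows;
    std::string line;
    while (std::getline(in, line))           // one complete line at a time
    {
        std::istringstream sin(line);        // parse only this line
        std::vector<std::string> fields;
        std::string field;
        while (std::getline(sin, field, ','))
            fields.push_back(field);
        rows.push_back(std::move(fields));   // keep each line separate
    }
    return rows;
}
```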

Also remember that a line can end with a comma; if it does, that usually means the last field is empty.
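A loop of std::getline(sin, field, ',') silently drops that trailing empty field: for "a,b," it yields only "a" and "b", because the final read extracts nothing and fails. A sketch of a hypothetical split_keep_empty() helper, using std::string::find, keeps it so that every comma marks a field boundary.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits one line on commas and keeps empty
// fields, including a trailing one: "a,b," gives {"a", "b", ""}.
std::vector<std::string> split_keep_empty(const std::string& line)
{
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    while (true)
    {
        const auto comma = line.find(',', start);
        if (comma == std::string::npos)
        {
            fields.push_back(line.substr(start));  // last field, possibly empty
            break;
        }
        fields.push_back(line.substr(start, comma - start));
        start = comma + 1;
    }
    return fields;
}
```

Note that with this version an empty line gives one empty field rather than none.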

The "input" file also seems to be missing a couple of quotation marks. And note that in "normal" CSV files, quotation marks are "normally" used to denote strings, especially when the strings could contain the delimiter.
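A sketch of a quote-aware splitter (the name split_csv_line is hypothetical), following the common convention described in RFC 4180: commas inside double quotes are part of the data, and "" inside a quoted field stands for one quote character. It assumes the quoting is well-formed. On the sample lines in the question, the missing opening quote before Continuityr1 would flip the in-quotes state and merge later fields, so this would need adjusting for that machine's output. Quoted fields that span several lines are not handled.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits one CSV line, treating commas inside
// double quotes as data. A doubled quote ("") inside a quoted field
// becomes a single '"'; the enclosing quotes themselves are dropped.
std::vector<std::string> split_csv_line(const std::string& line)
{
    std::vector<std::string> fields;
    std::string field;
    bool in_quotes = false;
    for (std::string::size_type i = 0; i < line.size(); ++i)
    {
        const char c = line[i];
        if (c == '"')
        {
            if (in_quotes && i + 1 < line.size() && line[i + 1] == '"')
            {
                field += '"';            // escaped quote ("")
                ++i;                     // skip the second quote
            }
            else
            {
                in_quotes = !in_quotes;  // opening or closing quote
            }
        }
        else if (c == ',' && !in_quotes)
        {
            fields.push_back(field);     // comma outside quotes ends a field
            field.clear();
        }
        else
        {
            field += c;
        }
    }
    fields.push_back(field);  // last field (empty if the line ends with a comma)
    return fields;
}
```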

Hello @jlb

Thanks for the input.

And note in "normal" CSV files quotation marks are "normally" used to denote strings, especially when the strings could contain the delimiter.

This is the first time I have seen quotes used in a "CSV" file, and I have not had to deal with them before.

I am still a bit new to using string streams, and I see I have some learning to do. My code seemed right at the time; I will do some more work and testing.

Andy
Hey guys, sorry, I forgot to mention this: the CSV file is the output of a machine, so I can't change the file itself. I'll have to figure out how to work around the problems within it. I am writing this program for a friend, so they can easily convert the CSV file from the machine into a sorted table.

The first thing I would recommend is that you use a stringstream to parse each line; then you won't have the problem of the input wrapping to the next line. Also, instead of a vector of strings you may want to consider a vector of structs, where the structure has a variable for every field of the line.


I added the code you have and it works perfectly. I understand how it works now as well, so thank you for the help :) As for the struct, I don't think it is necessary here, as I just need to sort through the information given. I may end up changing to a std::list later on in the development.

The first part of your problem resides in the data file. In a "CSV" file, commas separate each field, and there is no need for any spaces after a comma. You have three extra spaces in your file that will cause a problem. I believe the next concept of a "CSV" file is that each line has the same number of fields. Your file has eight fields in the first line and nine fields in the second line. They should be the same, e.g., line one should have an extra comma before "20/200/2000ohm (Auto-RANGE)". Even though it represents an empty field, it makes the program easier to write because it is consistent.


As for the CSV file I can't do anything about that sadly. Thank you for the suggested code for the quotation marks, I think I have a rough idea on how I want it to work anyway :)
I may end up changing to a std::list later on in the development.

Unless profiling your code shows a real need for something else, I would stick with a std::vector. It is usually the best container for most jobs.
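Sorting is one place where the choice matters: std::sort and std::stable_sort need random-access iterators, which std::vector provides and std::list does not (std::list has its own sort() member function instead). A minimal sketch of sorting the rows of a std::vector<std::vector<std::string>> by one column, using a hypothetical sort_by_column() helper. It compares fields as plain strings, which is an assumption; readings such as "9.69Mohm" would not be ordered numerically.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

using Row = std::vector<std::string>;

// Hypothetical helper: sorts rows by the text in one column.
// Rows too short to have that column sort first; std::stable_sort
// keeps rows with equal keys in their original file order.
void sort_by_column(std::vector<Row>& rows, std::size_t column)
{
    std::stable_sort(rows.begin(), rows.end(),
        [column](const Row& a, const Row& b)
        {
            const bool a_has = column < a.size();
            const bool b_has = column < b.size();
            if (a_has != b_has)
                return !a_has;   // a missing column sorts first
            if (!a_has)
                return false;    // both missing: treat as equal
            return a[column] < b[column];
        });
}
```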
