Reading and storing .csv values using data streams

Hi everyone,

I'm trying to write a program that opens and reads a .csv file using data streams and stores the words in a string array. The thing is that my .csv file has the words comma-separated (organised into lists), and the program must read each line one by one. The user must also be able to select the file that the program reads. This is what I've got so far, but I'm not sure how to go on from here.

 1  #include <iostream>
 2  #include <iomanip>
 3  #include <fstream>
 4
 5  using namespace std;
 6
 7  int main(
 8  {
 9      int k = 0;
10      // Open Stream
11      ifstream myFile("hello.csv");
12      if(!myFile){
13          cout << "Error opening file" << endl;
14          return 0;
15      }
16      // transfer data
17      string x[2000] = 0;
18      while (!myFile.eof()){
19      myFile >> x[k];
20      cout << x[k] << endl;
21      }
22
23      //close stream
24      myFile.close();
25      return 0;
26  }


Please, any help would be really appreciated. Thanks.
CSV files are just text, with commas between cell values.

I recommend using getline but overriding the delimiter to a comma; see if that is what you need.
I am kind of barbaric with text files and tend to read them into one giant byte buffer and crawl through that myself. The only reason is that it works for everything with minor changes, so my core code stays the same across various throwaway file fixers and the like. You can also use getline and find all the commas to bust it up line by line.
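Something like this, roughly; an untested sketch that assumes the whole file fits in memory and keeps the hard-coded file name:

#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>

int main()
{
    // Read the whole file into one buffer in a single shot.
    std::ifstream in("hello.csv", std::ios::binary);
    std::string buffer((std::istreambuf_iterator<char>(in)),
                        std::istreambuf_iterator<char>());

    // Crawl through the buffer, splitting cells on commas and newlines.
    std::vector<std::string> cells;
    std::string cell;

    for (char c : buffer)
    {
        if (c == ',' || c == '\n')      // cell boundary
        {
            cells.push_back(cell);
            cell.clear();
        }
        else if (c != '\r')             // skip Windows line endings
            cell += c;
    }

    if (!cell.empty())
        cells.push_back(cell);          // last cell if the file lacks a final newline

    for (const auto& s : cells)
        std::cout << s << '\n';
}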

while(!eof()) type loops are troublesome. Be careful, or use another approach (google it for details).

I recommend using a command line argument for your file name, so you can drag and drop a file onto your program. But you can also do cin >> name ... myFile(name) instead of a hard-coded file name.
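Something along these lines, a rough sketch (not tested):

#include <fstream>
#include <iostream>
#include <string>

int main(int argc, char* argv[])
{
    std::string name;

    if (argc > 1)
        name = argv[1];      // name passed on the command line (or via drag and drop)
    else
    {
        std::cout << "File name: ";
        std::cin >> name;    // fall back to asking the user
    }

    std::ifstream myFile(name);

    if (!myFile)
    {
        std::cout << "Error opening " << name << std::endl;
        return 1;
    }

    // ... read from myFile as before ...
}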
Hello jefazo92,

It helps if you compile the program before you post it. You would have found errors that you could have corrected or asked about.

Your program uses an input file. Post the input file, or at least a fair sample, so everyone knows what to work with and exactly how it is laid out.

You are missing the closing parenthesis of "main". And on line 17, what is "string"? You never include the <string> header, and you cannot initialise an array of strings with "= 0".

As jonnin is saying, while (!myFile.eof()) does not work the way you are thinking. Line 18 enters the loop because "eof" is not set yet. Line 19 reads the last available entry and line 20 processes it. Going back to the loop condition, "eof" is still not set, so you enter the loop again. This time line 19 tries to read past "eof" and cannot, so "eof" is set on the stream, but line 20 still processes the last information left in the array. Only then, after you have processed the last read twice, does the while condition fail.

One way to use this could be
myFile >> x[k];

while (!myFile.eof())
{
    std::cout << x[k++] << std::endl;

    myFile >> x[k];
}

With the read at the end of the loop and the first read before it, the while condition will fail at the correct time.
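Another way, which avoids having the read in two places, is to use the read itself as the loop condition, so a failed read at end of file ends the loop at the right time. A sketch:

while (myFile >> x[k])   // the read fails at end of file, ending the loop
{
    std::cout << x[k++] << std::endl;
}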

Not knowing what the input file looks like, there is another potential problem that the formatted input (>>) is likely to cause: it splits on whitespace rather than commas, so a line like "one,two,three" would land in a single array element, while a field containing a space would be split across two.

The following code should give you an idea for some improvements.
#include <iostream>
#include <iomanip>
#include <string>

#include <fstream>

int main() // <--- Missing closing ).
{
	int k = 0;

	// Open Stream
	std::ifstream myFile("hello.csv");

	if (!myFile)
	{
		std::cout << "Error opening file" << std::endl;
		return 0;
	}

	// transfer data
	std::string x[2000];

	while (std::getline(myFile, x[k], ','))
	{
		std::cout << x[k++] << std::endl;
	}

	//close stream
	myFile.close();

	return 0;
}


One last thing I noticed is that "k" is defined and set to zero, but its value never changes. So everything that you might read goes into element x[0], leaving 1999 unused elements. Seems a waste right now.

Also a few well placed blank lines really help to make it easier to read the code.

Hope that helps,

Andy
I am kind of barbaric with text files and tend to read them into one giant byte buffer and crawl through that myself.


Question about this: How large of a buffer do you allocate for this? I, too, like to put everything into one buffer that I can split, parse and format to my liking.

In C, I usually allocate the buffer based on the size of the file.
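In C++ the same idea might look something like this (just a sketch, with no error checking):

#include <cstddef>
#include <fstream>
#include <string>

std::string slurp(const char* name)
{
    std::ifstream in(name, std::ios::binary);

    in.seekg(0, std::ios::end);                                // find the file size...
    std::size_t size = static_cast<std::size_t>(in.tellg());

    std::string buf(size, '\0');                               // ...and size the buffer to match
    in.seekg(0, std::ios::beg);
    in.read(&buf[0], size);                                    // one read for the whole file

    return buf;
}

Then you can split, parse and format buf however you like.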
Thank you for your answers. I managed to do it using getline and stringstream, following jonnin's and Andy's suggestions.
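For anyone who finds this later, the core of what I ended up with looks roughly like this:

#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

int main()
{
    std::ifstream myFile("hello.csv");   // or take the name from the user / command line
    std::vector<std::string> words;
    std::string line;

    while (std::getline(myFile, line))          // read one line at a time
    {
        std::stringstream ss(line);
        std::string cell;

        while (std::getline(ss, cell, ','))     // then split the line on commas
            words.push_back(cell);
    }

    for (const auto& w : words)
        std::cout << w << std::endl;
}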

Question about this: How large of a buffer do you allocate for this? I, too, like to put everything into one buffer that I can split, parse and format to my liking.

How much RAM have you got?
I *routinely* handle 4-6 GB XML files with this technique on my fairly crappy work laptop. I don't have anything bigger, so I haven't tried. It works so long as you have that much RAM available in a solid block to borrow. I think one of our servers has over a TB of RAM …
Ok, the OP has signed off that this is done, and the thread is officially hijacked ;)

We're talking about large memory buffers for reading text files, and I'm in... but... there's a catch.

It's quite a waste.

There's a better way, @jonnin & @talos, and it's really fast.

I map the file to memory.

Now, there's a way to do that on Windows, and it's nothing like the way it's done on *Nix, but if you use Boost's interprocess library (or, alternatively, their iostreams library), they offer portable memory mapping that works.

Basically, the file is "opened" (not read) and mapped. What you get back is a pointer. If you map the entire file, you can rip through the whole thing via that pointer as if it had already been read into RAM (it's backed by the virtual memory system of the OS).

What is NOT happening is reading data and copying it into a buffer you allocated; that copy is the double work happening behind the scenes when you read into your own buffer (usually in fairly large blocks).
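With Boost's interprocess library the read side looks roughly like this (a sketch with error handling omitted; the file name is just an example):

#include <boost/interprocess/file_mapping.hpp>
#include <boost/interprocess/mapped_region.hpp>
#include <cstddef>
#include <iostream>

int main()
{
    namespace bip = boost::interprocess;

    bip::file_mapping  file("hello.csv", bip::read_only);   // "open", don't read
    bip::mapped_region region(file, bip::read_only);        // map the whole file

    const char* data = static_cast<const char*>(region.get_address());
    std::size_t size = region.get_size();

    // 'data' now walks the whole file as if it were already in RAM;
    // the OS pages it in on demand. For example, count the commas:
    std::size_t commas = 0;
    for (std::size_t i = 0; i < size; ++i)
        if (data[i] == ',')
            ++commas;

    std::cout << commas << " commas\n";
}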

Benchmark this technique and you'll never, ever, even consider allocating a buffer to read the file in again.

I seriously recommend it if you're already going that way.

Works for writing, too, with a catch: you have to configure the output file size manually up front (and that's a winding tale, depending on your requirements, not worth the text here).
