CSV file reading

Hello guys, I am a novice at C++, but I have this project for school. I am trying to read a CSV file in C++ (Visual Studio), store each value in a string, and output it to a file. I am using commas as the delimiter, but when it reaches a quoted value such as "2,156,624,900", it doesn't store it properly: it reads up to the first comma and stops, so only "2 is stored. How could the code be edited to store 2,156,624,900 instead? Or is there a way I could skip the first line and store the rest of the values, using the quotation marks as the delimiter for that value? If you have any questions, feel free to ask. The following is the code I am using.


#include <iostream>
#include <string>
#include <fstream>
#include <cstdio>   // std::getchar

int main()

{
std::string temporary[12];std::string country;std::string year;std::string sex;std::string age;std::string suicidesAmt;std::string population;
std::string suicidesPer; std::string countryY; std::string HDI; std::string gdpY; std::string gdpC; std::string generation; std::string temp[12];
int row = 1;




std::ifstream master;

master.open("master.csv");
int i = 1;
if (master.is_open())
{
std::cout << "The file is open thus reading and storing." << std::endl;

while (row < 27822)
{

switch (i % 12)
{


case 1:
{
std::getline(master, temporary[0], ',');
country = country + temporary[0] + '\n';
i++;
break;

}
case 2:
{
std::getline(master, temporary[1], ',');
year = year + temporary[1] + '\n';
i++;
break;

}
case 3:
{
std::getline(master, temporary[2], ',');
sex = sex + temporary[2] + '\n';
i++;
break;

}
case 4:
{
std::getline(master, temporary[3], ',');
age = age + temporary[3] + '\n';
i++;
break;

}
case 5:
{
std::getline(master, temporary[4], ',');
suicidesAmt = suicidesAmt + temporary[4] + '\n';
i++;
break;

}
case 6:
{
std::getline(master, temporary[5], ',');
population = population + temporary[5] + '\n';
i++;
break;

}
case 7:
{
std::getline(master, temporary[6], ',');
suicidesPer = suicidesPer + temporary[6] + '\n';
i++;
break;
}
case 8:
{
std::getline(master, temporary[7], ',');
countryY = countryY + temporary[7] + '\n';
i++;
break;

}
case 9:
{
std::getline(master, temporary[8], ',');
HDI = HDI + temporary[8] + '\n';
i++;
break;

}
case 10:
{
std::getline(master, temporary[9], ',');
gdpY = gdpY + temporary[9] + '\n';
i++;
break;

}
case 11:
{
std::getline(master, temporary[10], ',');
gdpC = gdpC + temporary[10] + '\n';
i++;
break;

}
case 0:
{
std::getline(master, temporary[11], '\n');
generation = generation + temporary[11] + '\n';
i++;
row++;
break;

}

}


}



std::ofstream write_con;std::ofstream write_yea;std::ofstream write_sex;std::ofstream write_age;std::ofstream write_samt;
std::ofstream write_pop;std::ofstream write_sper;std::ofstream write_conty;std::ofstream write_hdi;std::ofstream write_gdpy;
std::ofstream write_gdpc;std::ofstream write_gen;

write_con.open("country.txt");
write_yea.open("year.txt");
write_sex.open("sex.txt");
write_age.open("age.txt");
write_samt.open("suicides_no.txt");
write_pop.open("population.txt");
write_sper.open("suicides/100k pop.txt");
write_conty.open("country-year.txt");
write_hdi.open("HDI for year.txt");
write_gdpy.open("gdp_for_year($).txt");
write_gdpc.open("gdp_per_capita($).txt");
write_gen.open("generation.txt");

std::cout << "Writing to files:\n";

write_con << country << std::endl;
write_yea << year << std::endl;
write_sex << sex << std::endl;
write_age << age << std::endl;
write_samt << suicidesAmt << std::endl;
write_pop << population << std::endl;
write_sper << suicidesPer << std::endl;
write_conty << countryY << std::endl;
write_hdi << HDI << std::endl;
write_gdpy << gdpY << std::endl;
write_gdpc << gdpC << std::endl;
write_gen << generation << std::endl;


std::cout << "Finished writing values to files\n";

write_con.close();write_yea.close();write_sex.close();write_age.close();write_samt.close();write_pop.close();
write_sper.close();write_conty.close();write_hdi.close();write_gdpy.close();write_gdpc.close();write_gen.close();

}



std::getchar();
std::getchar();
return 0;
}
Here's an idea. The CSV part may be a little over-complicated (?), and it's not perfect, but I think it'll do what you want. I only tested it with three fields and output files. You can add the rest.

#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <cctype>

const int MaxRow = 27822;

std::ifstream& read(std::ifstream& fin, std::string& field, char delim)
{
    field.clear();
    char ch;
    
    // skip whitespace
    while (fin.get(ch) && ch != delim && std::isspace(static_cast<unsigned char>(ch)))
        ;

    if (!fin || ch == delim)  // empty field
        ;

    else if (ch == '"') // field starts with double-quotes
    {
        // read up to next double-quote that isn't preceded by a backslash
        while (fin.get(ch) && ch != '"')
        {
            if (ch == '\\' && !fin.get(ch)) // a backslash escapes the next character
                break; // eof
            field.push_back(ch);
        }
        // read up to and including the delim
        while (fin.get(ch) && ch != delim)
            ;
    }

    else
    {
        // read up to the delimiter
        getline(fin, field, delim);
        // insert initial char at front
        field.insert(0, 1, ch);
        // remove whitespace from the end
        while (field.size() && std::isspace(static_cast<unsigned char>(field.back())))
            field.pop_back();
    }

    return fin;
}

int main()
{
    std::ifstream master("master.csv");
    if (!master)
    {
        std::cerr << "Cannot open master.csv\n";
        return 1;
    }

    std::vector<std::string> filenames
    {
        "country.txt",
        "year.txt",
        "sex.txt"
    };

    std::vector<std::ofstream> fout;
    for (const auto& name: filenames)
        fout.push_back(std::ofstream(name));

    for (int row = 1; row < MaxRow; ++row)
    {
        std::string field;
        for (size_t i = 0; i < fout.size() - 1; ++i)
        {
            if (!read(master, field, ',')) break;
            fout[i] << field << '\n';
        }
        if (!read(master, field, '\n')) break;
        fout.back() << field << '\n';
    }
}

Sorry, this is not ready-to-use C++; it is an excerpt from a pipe stage I once used on VM/CMS. At least the comments could give you an idea of what you should bear in mind when using CSV seriously:
if ph = '' then ph = 'NixBix'         /* default place holder for "" */
'CALLPIPE (sep % end § name CSVCLN.REXX) *:',   /* ------- in ------ */
  '% change /""""/'ph'/',             /* single inch-sign            */
  '% change /'oc'"""/'oc'"'ph'/',     /* leading inch-sign           */
  '% change 1.3 /"""/"'ph'/',         /* same in fst column          */
  '% change /'oc'""/'oc'/',           /* empty cells                 */
  '% change 1.2 /""//',               /* same in fst column          */
  '% change /""/'ph'/',               /* example: "zizu = 8"" etc."  */
  '% strip trailing' oc,          /* no empty cells at end of record */
  '% xlate 1-* 40 00',                /* Mask the blanks.            */
  '% tokenize /"/ x01',               /* Tokenize and delimit.       */
  '%q:outside /"/ /"/',               /* branch quoted values        */
  '% xlate 1-*' oc nc,                /* replace old by new sep-char */
  '%f:faninany',                      /* collect all parts of record */
  '% deblock linend 01 terminate',    /* Re-form original records.   */
  '% change /'ph'/"/',                /* place holder to inch-sign   */
  '% xlate 1-* 00 40',                /* Unmask the blanks.          */
  '%*:',                              /* ----------- out ----------- */
  '§q:',                              /* from OUTSIDE                */
  '% nfind "' !!,                     /* Get rid of the quotes.      */
  '%f:'                               /* to FANINANY                 */
failure:; error: exit (RC * (RC ^= 12 & RC ^= 8))   /* RC = 0 if EOF */

Being a beginner in C++ myself, I am not yet in a position to quickly redo this on the PC without piping.
BTW, CSV listings are used for long-term archiving of data. Even if old tools no longer work on future platforms, it is quite simple to program new interpreters for them. Now prove it! ;)
RFC 4180 (linked below) says that a double quote inside a double-quoted field should be escaped by preceding it with another double quote. In the program I posted above, I escaped it with a backslash instead.

https://tools.ietf.org/html/rfc4180

At any rate, if I wanted to parse a CSV file seriously I would use a library (boost must have something).
The above excerpt from my pipe stage is now 20 years old. I was not aware of the RFC you mention; I was driven by the need to get useful data out of the tables I got from subcontractors and internal collaborators. By useful I mean useful for me (and the tools I used on the mainframe).
Parsing a CSV is a job for a finite state machine. Using an FSM makes the code really easy to write.
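
To give an idea of what that could look like, here is a minimal sketch with three states. The function name split_csv_line and the state names are my own invention. It assumes one record per line (no line breaks inside quoted fields) and doubled quotes as the escape.

```cpp
#include <string>
#include <vector>

// Hypothetical sketch: split one CSV record into fields with a small
// finite state machine. Commas inside double quotes are kept as data.
std::vector<std::string> split_csv_line(const std::string& line)
{
    enum class State { Unquoted, Quoted, QuoteInQuoted };
    State state = State::Unquoted;
    std::vector<std::string> fields(1);    // start with one empty field

    for (char ch : line)
    {
        switch (state)
        {
        case State::Unquoted:
            if (ch == ',')      fields.emplace_back();        // next field begins
            else if (ch == '"') state = State::Quoted;        // opening quote
            else                fields.back().push_back(ch);
            break;

        case State::Quoted:
            if (ch == '"')      state = State::QuoteInQuoted; // closing or escaped?
            else                fields.back().push_back(ch);  // commas are data here
            break;

        case State::QuoteInQuoted:                            // just saw '"' inside quotes
            if (ch == '"')      { fields.back().push_back('"'); state = State::Quoted; }
            else if (ch == ',') { fields.emplace_back();        state = State::Unquoted; }
            else                { fields.back().push_back(ch);  state = State::Unquoted; }
            break;
        }
    }
    return fields;
}
```

With this, a value like "2,156,624,900" comes back as a single field without the surrounding quotes. Each line read with std::getline can be passed through it and the fields written out by index.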

What's with the funky switch statement? Just read each token in the line one after the next.
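
Something along these lines, as a sketch only. read_columns and column_count are names I made up. It does a plain comma split, so it does not fix the quoted-field problem by itself, and an empty last field on a line is simply skipped.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical sketch: columns[c] collects every value seen in column c.
// A loop over the column index replaces the switch on i % 12.
std::vector<std::vector<std::string>> read_columns(std::istream& in, std::size_t column_count)
{
    std::vector<std::vector<std::string>> columns(column_count);
    std::string line;
    while (std::getline(in, line))              // one record per line
    {
        std::istringstream record(line);
        std::string token;
        for (std::size_t c = 0; c < column_count && std::getline(record, token, ','); ++c)
            columns[c].push_back(token);        // plain comma split, no quote handling
    }
    return columns;
}
```

Because the outer loop stops when std::getline fails, there is no need to hard-code the number of rows. The original program would pass its std::ifstream and 12 as the column count.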