Storing data from 2 columns in a .csv

Originally posted in beginners with no luck - I have a file that can range from 100 rows to 10000+ that is comma delimited with 8 columns.
The first 32 rows (also comma delimited) will always be ignored (geographical header information).
I will be wanting the data from column2 and column3.

For this I believe I would need two for loops, as such:

for(i=0;i<2;++i)
{
    getline("do something here");
}

and
for (i=0;i<3;++i)
{
    getline("do something here");
}


Also would using a vector or array with dynamic storage be the better way to tackle this problem?

Any help would be appreciated, totally lost on where to start after accessing the file.
So read the first 32 lines with getline and just don't do anything with them.

string trash;
for(int I = 0; I < 32; I++)
{
     getline(input, trash);
}



Then use getline with ',' as the delimiter to get the individual columns: getline(input, column, ','); where column is a string.

If you don't know how big the file is going to be, then I would use a vector.
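A minimal sketch of both steps, in case it helps to see them together. The helper names skipLines and splitFields are made up for illustration, and it assumes the fields never contain quoted commas:

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Discard the first `count` lines of `in` (hypothetical helper name).
// Returns false if the stream ran out before `count` lines were read.
bool skipLines(std::istream& in, std::size_t count)
{
    std::string trash;
    for (std::size_t i = 0; i < count; ++i)
    {
        if (!std::getline(in, trash))
            return false;
    }
    return true;
}

// Split one comma-delimited line into its fields (hypothetical helper name).
// Assumption: no quoted fields, so every ',' is a separator.
std::vector<std::string> splitFields(const std::string& line)
{
    std::vector<std::string> fields;
    std::istringstream ss(line);
    std::string field;
    while (std::getline(ss, field, ','))
        fields.push_back(field);
    return fields;
}
```

With an ifstream called input you would call skipLines(input, 32) once, then getline each remaining line and pass it to splitFields. The vector grows as needed, so the number of rows does not have to be known in advance.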
See:
http://www.cplusplus.com/reference/string/string/getline/

This may give you an idea:
fstream fs;
...
string line;
for(int i=0;getline(fs, line);++i)
{
  if(i < 32)
    ;
  else
  {
    std::stringstream ss(line);
    string column;
    for(int j=1;getline(ss, column, ',');++j)
    {
      if(1 == j)
        ...
      else if(2 == j)
        ...
    }
  }
}

See also:
http://www.cplusplus.com/reference/sstream/stringstream/?kw=stringstream
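To make the counting loop above concrete, here is a small sketch that returns one column by its 1-based position. The function name columnAt is hypothetical, and it assumes plain comma-separated values with no quoting:

```cpp
#include <sstream>
#include <string>

// Return the column at 1-based position `wanted` from a comma-delimited
// line, or an empty string if the line has fewer columns than that.
// (columnAt is an illustrative name, not part of any library.)
std::string columnAt(const std::string& line, int wanted)
{
    std::istringstream ss(line);
    std::string column;
    // j counts columns as they are read, starting at 1
    for (int j = 1; std::getline(ss, column, ','); ++j)
    {
        if (j == wanted)
            return column;
    }
    return std::string();
}
```

Calling columnAt(line, 2) and columnAt(line, 3) on each data line would give the two values of interest as strings. It re-parses the line for every call, which is fine for a first attempt but wasteful if you need many columns.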
Thanks Yanson and coder, I'll mess around with what y'all suggested and take a look at those links, and will post my progress.
Yeah, so I'm totally lost and drawing a blank. This is my current code, with the addition you mentioned for skipping the first 32 lines. I know coder pasted what should be used to continue the code, but I have no clue where to go from here.

#include <iostream>
#include <fstream>
#include <string>

using namespace std;

int main()
{
    string filename;
    string soundvelocity_2;
    string depth_3;
    string line;
    
    cout << "Enter the SoundVelocity file you want to format: ";
    getline( cin, filename );

    filename += ".csv";
    string path = "C:\\Program Files\\DataLog Express\\";
    string fullname = path + filename;
    ifstream myfile(fullname.c_str());

    for(i=0;getline(fs, line);++i)
{
  if(i < 32)
    ;
  else
  {
    std::stringstream ss(line);
    string column;
    for(j=1;getline(ss, column, ',');++j)
    {
      if(1 == j)
       //do not know what to do
      else if(2 == j)
        //do not know what to do    
     }
  }
} 
    return 0;

}
Ok, since you want the data from column2 and column3:
      if(2 == j)
       //do not know what to do
      else if(3 == j)
        //do not know what to do 
Yes: column contains the respective value. Since you didn't say what you want to do with this data, I can't tell you what to do at this point.

You may cout << "column " << j << ": " << column << endl; for the first try

Oh and: fs -> myfile
Sorry, to start with I would like to take the data from those two columns and write it to another .csv file in the order column3, column2.

I will be doing more with this data once it's extracted, pulling out only the values I need, which will be buried somewhere down the columns. I am just trying to take this program a step at a time and learn and comprehend exactly what it is doing.

Thanks for your patience.

edit:could you show me how to store it in a vector so I can process it even more in the future?

I'll also work on rewording and going more in depth with what I am trying to do so you can have a better idea.
coder,

As stated the first 32 lines will be ignored
if(i < 32)
    ;

Next I would like to take the information from column2 and column3 and store it in a vector, since it would be beneficial for dynamic allocation.

Column2 data will have (x) rows before it starts reading a value:
19/08/2013 10:39:47.000,0,0.009,29.621,-0.002,0.014,-4.227,1508.28

Once column2 starts reading a value I would like to start storing data for column2 and column3 at that row++:
19/08/2013 10:51:32.000,1547.122,1.543,29.552,59.068,35.812,22.495,1545.548

Then I want to stop storing once column3 reaches its maximum value:
19/08/2013 10:58:23.000,1502.544,223.176,12.228,41.002,35.662,28.057,1502.078

Then I want to output the data that was stored to a .csv file as column3, column2.

I hope this helps clarify, and I apologize for not posting this earlier.

You have string "line" and that line contains values separated by commas?
You want the second and third value?

If the line would contain "foo,bar,gaz,goo", you could find the first comma. Then you could find the first comma that occurs after the first comma. Then you could find the first comma that occurs after the second comma.

You want to write out whatever is between the second and third comma, then an explicit comma, and last whatever is between the first and second comma.

std::string has members find() and substr().
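A rough sketch of that find()/substr() idea follows. The function name thirdThenSecond is made up, and returning an empty string for lines with fewer than three fields is just one possible choice:

```cpp
#include <string>

// Build "third,second" from a comma-delimited line using only
// std::string::find and std::string::substr.
// Returns an empty string if the line has fewer than three fields.
// (thirdThenSecond is an illustrative name.)
std::string thirdThenSecond(const std::string& line)
{
    const std::string::size_type c1 = line.find(',');
    if (c1 == std::string::npos)
        return std::string();

    const std::string::size_type c2 = line.find(',', c1 + 1);
    if (c2 == std::string::npos)
        return std::string();

    // The third comma may be missing if the third field is the last one.
    const std::string::size_type c3 = line.find(',', c2 + 1);

    const std::string second = line.substr(c1 + 1, c2 - c1 - 1);
    const std::string third = (c3 == std::string::npos)
        ? line.substr(c2 + 1)
        : line.substr(c2 + 1, c3 - c2 - 1);

    return third + "," + second;
}
```

For the line "foo,bar,gaz,goo" this gives "gaz,bar", which is the second and third values swapped as described above.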
vectors are pretty straightforward:
http://www.cplusplus.com/reference/vector/vector/

In your case:
vector<string> column2;
vector<string> column3;
...

      if(2 == j)
       column2.push_back(column);
      else if(3 == j)
        column3.push_back(column);

...


To write the stored columns to a file you may do something like this (here fs is an output stream and i is the row index):
fs << column3[i] << "," << column2[i] << endl;
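Put together, the two vectors and the output line could look roughly like this. The function names storeColumns and writeSwapped are hypothetical, and skipping short lines is an assumption made here so that both vectors always stay the same length:

```cpp
#include <algorithm>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Append columns 2 and 3 of `line` to the two vectors.
// Lines with fewer than three columns are skipped (an assumption of
// this sketch) so that column2 and column3 keep matching sizes.
void storeColumns(const std::string& line,
                  std::vector<std::string>& column2,
                  std::vector<std::string>& column3)
{
    std::istringstream ss(line);
    std::string first, second, third;
    if (std::getline(ss, first, ',') &&
        std::getline(ss, second, ',') &&
        std::getline(ss, third, ','))
    {
        column2.push_back(second);
        column3.push_back(third);
    }
}

// Write the stored rows as "column3,column2", one row per line.
void writeSwapped(std::ostream& out,
                  const std::vector<std::string>& column2,
                  const std::vector<std::string>& column3)
{
    const std::size_t rows = std::min(column2.size(), column3.size());
    for (std::size_t i = 0; i < rows; ++i)
        out << column3[i] << ',' << column2[i] << '\n';
}
```

You would call storeColumns inside the loop that reads each line, and writeSwapped once afterwards with an ofstream opened on the output file.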
with using the fs->myfile you had stated earlier where exactly does that go? I was getting a "fs was not declared".
offshoreworker wrote:
with using the fs->myfile you had stated earlier where exactly does that go? I was getting a "fs was not declared".
What I meant was: rename fs as myfile

In my examples I'm just using generic names. Rename them to suit your needs.
Maybe this just isn't for me, still lost. Thanks though for trying to help coder.
I was looking at this thread, I'm still trying to understand the data in the input file. Though I've seen a couple of example lines, I'm not clear on the meaning of this:
Column2 data will have (x) rows before it starts reading a value
and also I don't quite understand this:
Then I want to stop storing once column3 reaches its maximum value:

What is (x)? And don't you need to read the entire file to determine the maximum?

Perhaps it would be possible for you to make a sample of the data file available, maybe upload it to some file-sharing service such as dropbox or wherever you prefer.
Good questions from Chervil.

While waiting for a real answer, let's assume:
1. col2 == 0, until it becomes interesting
2. col3 increases monotonically, while interesting.
Therefore:
skip the first 32 lines (getline 32 times, discarding the result)

string line
do
  getline(line)
  extract col2 and col3 from line
while col2 is 0

// Now "line" contains the first input line, where col2 is not 0.
append pair( col2 and col3 ) to vector of pairs

max = col3
while ( getline(line) )
  extract col2 and col3 from line
  if ( col3 < max ) break
  else
    max = col3
    append pair( col2 and col3 ) to vector of pairs

Note "append to vector" could be "write to file".
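The pseudocode above might translate to C++ roughly as follows. This is only a sketch under the same two assumptions (col2 is 0 until the data becomes interesting, and col3 never decreases while it is). The names Sample, parseRow and collectUntilDrop are invented for the example, and malformed rows are simply skipped:

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<double, double> Sample; // (col2, col3)

// Extract columns 2 and 3 of a comma-delimited line as numbers.
// Returns false if the line is too short or a value is not numeric.
bool parseRow(const std::string& line, double& col2, double& col3)
{
    std::istringstream ss(line);
    std::string skip, s2, s3;
    if (!std::getline(ss, skip, ',') ||
        !std::getline(ss, s2, ',') ||
        !std::getline(ss, s3, ','))
        return false;
    std::istringstream n2(s2), n3(s3);
    return (n2 >> col2) && (n3 >> col3);
}

// Skip `headerLines` lines, ignore rows while col2 is 0, then keep rows
// until col3 drops below the largest value seen so far.
std::vector<Sample> collectUntilDrop(std::istream& in, std::size_t headerLines)
{
    std::vector<Sample> samples;
    std::string line;
    for (std::size_t i = 0; i < headerLines; ++i)
        std::getline(in, line);

    double col2 = 0.0, col3 = 0.0, max = 0.0;
    bool started = false;
    while (std::getline(in, line))
    {
        if (!parseRow(line, col2, col3))
            continue;                 // skip malformed rows
        if (!started)
        {
            if (col2 == 0.0)
                continue;             // still in the uninteresting part
            started = true;
            max = col3;
        }
        else if (col3 < max)
            break;                    // col3 started to fall: stop
        else
            max = col3;
        samples.push_back(Sample(col2, col3));
    }
    return samples;
}
```

As noted, the push_back could just as well be a direct write to the output file, since under these assumptions nothing needs to be revisited later.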
Chervil,

You can see the full data set:
http://www.fileconvoy.com/dfl.php?id=g03ca77f1080855f2999393789d93d53a3920d6369

In regards to "Column2 data will have (x) rows before it starts reading a value":

Column2 starts on line 33. It contains a value of 0 until line 738.

Column3 reaches its maximum value at line 1143.

Therefore I am wanting to extract the data from line 738-1143 and output column3, column2.
Just change the outer for loop in your program (line 22) to this:
for(int i=0;getline(myfile, line);++i)
OK, col3 is not monotonic. You have to read all the rest and keep it in memory rather than writing it out directly.
max = col3; count = 1;
while ( getline(line) )
  extract col2 and col3 from line
  append pair( col2 and col3 ) to vector of pairs
  if ( max < col3 ) max =  col3, count = vector.size()

print count elements from vector
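In C++ that could look something like the sketch below. It assumes the stream is already positioned at the first interesting row; the names Sample, parseRow and keepThroughMax are made up for illustration. With the strict < comparison, a tie keeps the first row that holds the maximum:

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<double, double> Sample; // (col2, col3)

// Extract columns 2 and 3 of a comma-delimited line as numbers.
// Returns false if the line is too short or a value is not numeric.
bool parseRow(const std::string& line, double& col2, double& col3)
{
    std::istringstream ss(line);
    std::string skip, s2, s3;
    if (!std::getline(ss, skip, ',') ||
        !std::getline(ss, s2, ',') ||
        !std::getline(ss, s3, ','))
        return false;
    std::istringstream n2(s2), n3(s3);
    return (n2 >> col2) && (n3 >> col3);
}

// Read every remaining row, remember how many rows had been stored when
// col3 reached its largest value, and return only those rows.
std::vector<Sample> keepThroughMax(std::istream& in)
{
    std::vector<Sample> samples;
    std::size_t count = 0;   // number of rows up to and including the max
    double max = 0.0;
    double col2 = 0.0, col3 = 0.0;
    std::string line;
    while (std::getline(in, line))
    {
        if (!parseRow(line, col2, col3))
            continue;        // skip malformed rows
        samples.push_back(Sample(col2, col3));
        if (samples.size() == 1 || max < col3)
        {
            max = col3;
            count = samples.size();
        }
    }
    samples.resize(count);   // drop everything after the maximum
    return samples;
}
```

The whole remainder of the file is held in memory, which should be no problem for files in the range of 10000 rows mentioned at the start of the thread.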
My version. It may be flawed, just consider it a first attempt.
#include <iostream>
#include <fstream>
#include <iomanip>
#include <cmath>
#include <string>
#include <sstream>
#include <vector>

using namespace std;

struct values {
    double num2;
    double num3;
};

double toNumber(const std::string & str);

int main()
{

    ifstream fin("SV Cast_Test_Data.csv");

    values data;
    vector<values> dataVec;


    string line;
    for (int i=0; i<32;i++)
        getline(fin, line);
        

    string col2,col3, dummy;
    double num2, num3;

    const char comma = ',';
    const char tab  = '\t';

    bool ignore_zero = true;

    while (getline(fin, line))
    {
        istringstream ss(line);
        getline(ss, dummy, comma);
        getline(ss, col2, comma);
        getline(ss, col3, comma);
        num2 = toNumber(col2);
        num3 = toNumber(col3);
        if (ignore_zero)
        {
            if (num2 != 0.0)
                ignore_zero = false;
        }

        if (!ignore_zero)
        {
            data.num2 = num2;
            data.num3 = num3;
            dataVec.push_back(data);
        }
    }

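    // nothing to write if no row had a non-zero column 2
    if (dataVec.empty())
    {
        cerr << "No data rows found" << endl;
        return 1;
    }

    // find col3 max (>= keeps the last row that holds the maximum)
    double max3 = dataVec[0].num3;
    unsigned int index = 0;
    for (unsigned int i=1; i<dataVec.size(); i++)
    {
        if (dataVec[i].num3 >= max3)
        {
            max3 = dataVec[i].num3;
            index = i;
        }
    }

    // create output file
    ofstream fout("output.txt");

    // write every stored row up to and including the maximum, as column3,column2
    for (unsigned int i=0; i<= index; i++)
    {
        fout << dataVec[i].num3 << comma << dataVec[i].num2 << endl;
    }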

    return 0;
}

double toNumber(const std::string & str)
{
    istringstream ss(str);
    double num = 0;
    ss >> num;
    return num;
}
Chervil,

That does exactly what I have been trying to wrap my mind around for the past 2 days or so. I greatly appreciate it. You have no idea how much time this will save me in the future. Look at it this way: this file only went down to a depth of 223.309 meters, when we do jobs all the way to 4572 meters. This method will be way faster than manually going through and extracting the data I need.

If you couldn't tell, I am a beginner. I typed every bit of code and was able to understand and follow what you were doing with the data as I typed it.

Thanks again. I will continue trying to learn and build upon this.