Read data from CSV files into array

For those familiar with R, I basically want to mirror SOME of the dataframe functionality in C++. In a nutshell:

1. I want to be able to read data from a CSV file and store it in a 2D array.
2. The number of columns or rows isn't determined beforehand.
3. (Visualizing the array as a table) I want to be able to add/edit/remove columns and rows from the 2D array.

I would be very grateful for any guidance in this regard. I have tried searching for a library that does so, but no luck.
Not familiar with R, but a vector of vectors should give you what you want:
#include <vector> 
using namespace std;

template <typename T> 
class Row : public vector<T> 
{};

template <typename T> 
class Matrix : public vector<Row<T>>
{};

int main ()
{
    Row<int>    row;
    int         col = 0;
    Matrix<int> matrix;    // Use whatever type is suitable for your data

    //  Populate row(s)
    row.push_back (col);
    //  Populate the matrix with rows
    matrix.push_back (row);
}
If you are working in Windows, you can use an Excel widget (it's a sort of Excel lite) in your program if you want full spreadsheet capability. I don't recall what you need to get it inserted into the program, but it's part of the MS tools somewhere.

Here is something for you to play with.

#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

std::string& trim_right(
  std::string&       s,
  const std::string& delimiters = " \f\n\r\t\v" )
{
  return s.erase( s.find_last_not_of( delimiters ) + 1 );
}

std::string& trim_left(
  std::string&       s,
  const std::string& delimiters = " \f\n\r\t\v" )
{
  return s.erase( 0, s.find_first_not_of( delimiters ) );
}

std::string& trim(
  std::string&       s,
  const std::string& delimiters = " \f\n\r\t\v" )
{
  return trim_left( trim_right( s, delimiters ), delimiters );
}

std::vector <std::vector <std::string> >
read_csv( const std::string& filename, char delimiter = ',' )
{
  std::vector <std::vector <std::string> > result;
  
  std::ifstream f( filename );
  if (!f) throw std::runtime_error( "read_csv(): Failure to open file " + filename );
  
  std::string s;
  while (getline( f, s ))
  {
    if (trim( s ).empty()) continue;
    std::vector <std::string> row;
    std::istringstream ss( s );
    while (getline( ss, s, delimiter ))
      row.emplace_back( trim( s ) );
    result.emplace_back( row );
  }
  
  return result;
}

#include <algorithm>
#include <iostream>

int main( int argc, char** argv )
{
  if (argc != 2)
  {
    std::cerr << "usage:\n  " << argv[0] << " CSVFILE\n\n"
      "Reads a comma-separated CSV file containing NO quotes.\n";
    return 1;
  }
  try
  {
    // Read the CSV file
    auto table = read_csv( argv[1] );
    
    // Print number of records
    std::cout << table.size() << " records\n";
    
    // Print max number of fields per record (skipped for an empty table,
    // where max_element() returns end() and must not be dereferenced)
    if (!table.empty())
      std::cout << std::max_element( table.begin(), table.end(), 
        []( const std::vector <std::string> & a, const std::vector <std::string> & b )
        {
          return a.size() < b.size();
        } 
      )->size() << " fields per record\n";
    
    // Print the first 5 records (or as many as available)
    auto N = std::min( (std::size_t)5, table.size() );
    
    std::cout << "First " << N << " records:\n";
    for (std::size_t n = 0; n < N; n++)
    {
      const char* sep = "";   // no separator before the first field
      std::cout << "  " << n << " : ";
      for (const auto& s : table[n])
      {
        std::cout << sep << s;
        sep = ", ";
      }
      std::cout << "\n";
    }    
    
    // Print the last 5 records
    std::cout << "Last " << N << " records:\n";
    for (std::size_t n = 0; n < N; n++)
    {
      const char* sep = "";   // (incrementing a bool is not valid in C++17)
      std::cout << "  " << (table.size() - N + n) << " : ";
      for (const auto& s : table[table.size() - N + n])
      {
        std::cout << sep << s;
        sep = ", ";
      }
      std::cout << "\n";
    }
  }
  catch (const std::exception& e)
  {
    std::cerr << e.what() << "\n";
    return 1;
  }
}

Note: this code is pretty simple. It assumes that there are NO COMMAS in fields (and that fields are NOT QUOTED).

If you have to deal with quoted data, that makes life a whole lot more complicated. Let me know if you do.
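
To give a rough idea of what quoted data involves, here is a minimal sketch of a line splitter that honours double-quoted fields. The name split_csv_line() is made up for illustration, and the rules it follows are assumptions rather than anything stated in this thread: the quote character is '"', a doubled quote inside a quoted field means one literal quote, and a record never spans more than one line.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Split one CSV line into fields, honouring double-quoted fields.
// Assumptions: a doubled quote ("") inside a quoted field is one literal
// quote, and a record never continues onto the next line.
std::vector<std::string> split_csv_line( const std::string& line, char delimiter = ',' )
{
  std::vector<std::string> fields;
  std::string field;
  bool in_quotes = false;

  for (std::size_t i = 0; i < line.size(); i++)
  {
    char c = line[i];
    if (in_quotes)
    {
      if (c == '"')
      {
        if (i + 1 < line.size() && line[i + 1] == '"')
        {
          field += '"';           // escaped quote inside a quoted field
          i++;
        }
        else in_quotes = false;   // closing quote
      }
      else field += c;            // delimiters are ordinary text here
    }
    else if (c == '"') in_quotes = true;
    else if (c == delimiter)
    {
      fields.push_back( field );
      field.clear();
    }
    else field += c;
  }
  fields.push_back( field );      // last field (possibly empty)
  return fields;
}
```

It could stand in for the inner getline( ss, s, delimiter ) loop in read_csv(). Unlike that loop it keeps a trailing empty field, and it does no trimming.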

Hope this helps.
Thanks!

@Duthomhas: I'm getting errors when I compile the above code on my local machine (it runs fine on cpp.sh). Here's the error log for your reference:

https://pastebin.com/07hnQg8L

What went wrong here? Is my compiler version incompatible with the code? It seems to be a problem with the auto keyword, emplace_back and lambda expressions.

EDIT: Never mind. I changed the compiler to C++11 and it works now.
#include <iostream>
#include <iomanip>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
using namespace std;

using vec = vector<string>;
using matrix = vector<vec>;

//======================================================================

matrix readCSV( string filename )
{
   char separator = ',';
   matrix result;
   string row, item;

   ifstream in( filename );
   while( getline( in, row ) )
   {
      vec R;
      stringstream ss( row );
      while ( getline ( ss, item, separator ) ) R.push_back( item );
      result.push_back( R );
   }
   in.close();
   return result;
}

//======================================================================

void printMatrix( const matrix &M )
{
   for ( const vec &row : M )    // iterate by reference to avoid copying each row
   {
      for ( const string &s : row ) cout << setw( 12 ) << left << s << " ";    // need a variable format really
      cout << '\n';
   }
}

//======================================================================

void deleteRow( matrix &M, size_t row )    // unsigned index: no signed/unsigned comparison
{
   if ( row < M.size() ) M.erase( M.begin() + row );
}

//======================================================================

void deleteCol( matrix &M, size_t col )
{
   for ( vec &row : M )
   {
      if ( col < row.size() ) row.erase( row.begin() + col );    // short rows are left alone
   }
}

//======================================================================

void edit( matrix &M, size_t i, size_t j, const string &value )
{
   if ( i < M.size() && j < M[i].size() ) M[i][j] = value;
}

//======================================================================

int main()
{
   matrix pets = readCSV( "pets.csv" );
   printMatrix( pets );

   cout << "\n\n";

   deleteRow( pets, 3 );
   deleteCol( pets, 3 );
   edit( pets, 1, 2, "12" );
   printMatrix( pets );
}

//====================================================================== 


pets.csv
Animal,Name,Age,Food,Owner,Notes
Dog,Fido,6,Chewies,R. Smith,Barks
Cat,Fluffy,8,Whiskers,M. Jones,Miaows
Hamster,Giles,2,Scratchies,A. Green 
Snake,Hissie,3,Mice,Bob


Animal       Name         Age          Food         Owner        Notes        
Dog          Fido         6            Chewies      R. Smith     Barks        
Cat          Fluffy       8            Whiskers     M. Jones     Miaows       
Hamster      Giles        2            Scratchies   A. Green     
Snake        Hissie       3            Mice         Bob          


Animal       Name         Age          Owner        Notes        
Dog          Fido         12           R. Smith     Barks        
Cat          Fluffy       8            M. Jones     Miaows       
Snake        Hissie       3            Bob
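
The functions above cover deleting and editing. Adding rows and columns, which the original question also asks about, could be sketched along the same lines. The names insertRow() and insertCol() and the padding behaviour are my own assumptions, not part of the code above.

```cpp
#include <cstddef>
#include <string>
#include <vector>

using vec = std::vector<std::string>;
using matrix = std::vector<vec>;

// Insert a new row before position `row` (row == M.size() appends).
// Out-of-range positions are ignored, like in deleteRow().
void insertRow( matrix &M, std::size_t row, const vec &values )
{
   if ( row <= M.size() ) M.insert( M.begin() + row, values );
}

// Insert a new column before position `col`. values[i] goes into row i;
// rows beyond values.size() get an empty string. Rows shorter than `col`
// are first padded with empty strings so the new cell lands in column `col`.
void insertCol( matrix &M, std::size_t col, const vec &values )
{
   for ( std::size_t i = 0; i < M.size(); i++ )
   {
      vec &row = M[i];
      if ( row.size() < col ) row.resize( col );
      row.insert( row.begin() + col, i < values.size() ? values[i] : std::string() );
   }
}
```

For example, insertRow( pets, pets.size(), { "Parrot", "Polly", "4" } ) would append a record, and insertCol( pets, 0, { "Id", "1", "2" } ) would put a new first column in front of every row.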

@lastchance: Thanks! One more thing though: your code treats the matrix as a vector of string vectors. Wouldn't the age column also consist of string entries?

Suppose that the file I want to read has columns of different data types. In your example above, 5 columns are string and 1 is int. The logical thing to do would be to treat columns as vectors. So we have 5 string vectors and 1 int vector, and then we'd want to form a vector of those vectors.

Is there any way to do that? Do I use a variant datatype?
Sorry to not respond before now -- yes, my code requires C++11 at minimum. (You can basically assume that is true of any C++ you find online these days.)

You'll notice lastchance's CSV reader code is almost identical to mine. (Mine minds a few P's and Q's his doesn't.)

The trick with a CSV file is you cannot glean any information about the type of a thing. Everything is a string.

However, you are correct — the boost::variant datatype is probably your best answer. Use some typedefs to help:

...
#include <boost/variant.hpp>

typedef boost::variant <std::string,int> field_type;
typedef std::vector <field_type> record_type;
typedef std::vector <record_type> table_type;

table_type 
read_csv( ... )
{
  ...
}

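The neat thing about boost.variant is that a field can hold any one of the listed types, and it can be constructed from anything convertible to one of them. It does not turn a stored string into an int for you, though: you read the value back with boost::get<T>() or a visitor, and any string-to-number conversion is still your job.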
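
As a rough illustration of how values are read back out of a variant field, here is a sketch that uses std::variant, the C++17 standard-library counterpart, instead of boost::variant so that it has no external dependency. That swap, the C++17 requirement, and the helper names to_text() and sum_ints() are assumptions of mine. With Boost, boost::get<T>() plays the role that std::get<T>() and std::get_if<T>() play here.

```cpp
#include <string>
#include <variant>
#include <vector>

using field_type  = std::variant<std::string, int>;
using record_type = std::vector<field_type>;

// Render a field as text regardless of which alternative it holds.
std::string to_text( const field_type& f )
{
  if (std::holds_alternative<int>( f ))
    return std::to_string( std::get<int>( f ) );
  return std::get<std::string>( f );
}

// Add up every int field in a record, skipping the string fields.
int sum_ints( const record_type& r )
{
  int total = 0;
  for (const auto& f : r)
    if (const int* p = std::get_if<int>( &f ))   // nullptr if f holds a string
      total += *p;
  return total;
}
```

The variant only remembers which type was stored. Deciding that "6" should be stored as an int rather than a string still has to happen when the file is read.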
@sphyrch, the CSV file only knows about character content, so you lose nothing by reading into strings. You can easily convert to ints or doubles later. If you want column vectors then you will either have to rely on rows being equal length or pad them until they are.

You can't rely on one column having a common type - even the age column has a string header. And how would you make your program intelligent enough to know whether 12 is an int, double, ... or even a string?
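
The padding idea mentioned above could look something like this. It is a sketch only: the names table_t, pad_rows() and to_columns() are invented for illustration, and empty strings are assumed to be an acceptable filler for missing cells.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

using table_t = std::vector<std::vector<std::string>>;

// Pad every row with empty strings up to the length of the longest row.
void pad_rows( table_t& table )
{
  std::size_t width = 0;
  for (const auto& row : table) width = std::max( width, row.size() );
  for (auto& row : table) row.resize( width );
}

// Turn a row-major table into a vector of columns.
// Intended for rectangular tables, so call pad_rows() first; cells beyond
// the width of the first row are ignored.
table_t to_columns( const table_t& table )
{
  table_t columns;
  if (table.empty()) return columns;
  columns.resize( table[0].size() );
  for (const auto& row : table)
    for (std::size_t c = 0; c < row.size() && c < columns.size(); c++)
      columns[c].push_back( row[c] );
  return columns;
}
```

After pad_rows( table ), to_columns( table )[2] would be the whole third column, header included, still as strings.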
@lastchance: For now I'm trying to understand this one step at a time. So I can just clean the CSV file in the first place and remove the header. Right now I'm just assuming that each row has a uniform data type.
What lastchance is saying is that a CSV file does not tell you what kind of data is in the file — all CSV data is a string.

If you want it to be something else, you have to already know what kind of data it is.

For example, using the pets.csv data above, the “Age” column represents integer data, but all other columns represent string data. There is really no way for the computer to "know" this; you must know it beforehand and attempt the conversion from string to integer for the appropriate data.
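
Attempting that conversion might look like the following sketch. It uses std::stoi from C++11; the function name parse_int() and the choice to reject trailing characters are assumptions made for the example.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Try to interpret the whole of `text` as an int.
// Returns true and stores the value in `out` on success; returns false
// (leaving `out` untouched) if the text is not a number, has trailing
// characters, or does not fit in an int.
bool parse_int( const std::string& text, int& out )
{
  try
  {
    std::size_t used = 0;
    int value = std::stoi( text, &used );
    if (used != text.size()) return false;   // e.g. "6 years"
    out = value;
    return true;
  }
  catch (const std::invalid_argument&) { return false; }   // e.g. "Age"
  catch (const std::out_of_range&)     { return false; }   // too large for an int
}
```

Calling it on the third field of each record would succeed for "6" and "8" and fail for the header "Age", which is one way to tell the header row apart from the data.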
closed account (48T7M4Gy)
<tuple>'s are another way to go.

Read the first line of the CSV file to determine how many columns there are in each tuple and then read the whole file as <tuple>'s

Once that is done, the column and/or row manipulation follows.
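
One caveat is that the number and types of a std::tuple's elements are fixed at compile time, so the sketch below assumes the column layout is known in advance rather than discovered from the first line. It reads only the first three columns of the pets.csv example; the names pet_record and parse_pet() are made up for illustration.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>
#include <tuple>

// One record of the pets.csv example, with the column types fixed at
// compile time: Animal, Name, Age.
using pet_record = std::tuple<std::string, std::string, int>;

// Parse "Animal,Name,Age[,...]" into a pet_record; extra fields are ignored.
// Throws std::runtime_error if a field is missing or Age is not a number.
pet_record parse_pet( const std::string& line )
{
  std::istringstream ss( line );
  std::string animal, name, age;
  if (!std::getline( ss, animal, ',' ) ||
      !std::getline( ss, name,   ',' ) ||
      !std::getline( ss, age,    ',' ))
    throw std::runtime_error( "parse_pet(): too few fields in: " + line );
  try
  {
    return std::make_tuple( animal, name, std::stoi( age ) );
  }
  catch (const std::logic_error&)   // std::invalid_argument or std::out_of_range
  {
    throw std::runtime_error( "parse_pet(): bad age in: " + line );
  }
}
```

The header line would be rejected with "bad age", so it needs to be skipped or handled separately before the data rows are parsed.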