efficient std vector I/O (.mat format)

Hello People,

I have written a pair of templates to write std::vectors to, and read them from, Octave/Matlab-compatible (.mat) files.
This is the code:

// Example program
#include <iostream>
#include <vector>
#include <algorithm>
#include <stdlib.h>
#include <time.h>
#include <cassert>
#include <numeric>
#include <fstream>
#include <iterator>

using namespace std;

// Save vector as matrix type in Octave (.mat) file.
template <typename T>
void	save_vector_as_matrix( const std::string& name, const std::vector<T>& matrix, ofstream &FILE )
{
	std::size_t	columns;
	columns = matrix.size();
	std::string	nameLine = "# name: ", typeLine = "# type: matrix";
	std::string	rowsLine = "# rows: 1", columnsLine = "# columns: ";
	nameLine += name;
	FILE << nameLine << endl;
	FILE << typeLine << endl;
	FILE << rowsLine << endl;
	FILE << columnsLine << columns << endl;

	for ( std::size_t column = 0; column < columns; column++ )
		FILE << " " << matrix[column];

	FILE << "\n\n" << endl;
} // end template save_vector_as_matrix


// Load matrix type from Octave (.mat) file into vector.
template <typename T>
void	load_matrix_to_vector( std::vector<T>& arr, ifstream &FILE )
{
	std::size_t	columns;
	std::string	str, STR;

	if ( !std::getline(FILE, str) )
	{
		std::cout << "\nOctaveInterface.h inconsistency:" << endl;
		std::cout << "In function load_matrix_to_vector:" << endl;
		std::cout << "Cannot read line from file." << endl;
		exit( EXIT_FAILURE );
	}
	STR = "# type: matrix";
	if ( str.compare(STR) == 0 )
	{
		if ( !std::getline(FILE, str) )
		{
			std::cout << "\nOctaveInterface.h inconsistency:" << endl;
			std::cout << "In function load_matrix_to_vector:" << endl;
			std::cout << "Cannot read line from file." << endl;
			exit( EXIT_FAILURE );
		}
		STR = "# rows: 1";
		if ( str.compare(STR) != 0 )
		{
			std::cout << "\nOctaveInterface.h inconsistency:" << endl;
			std::cout << "In function load_matrix_to_vector:" << endl;
			std::cout << "matrix type does not fulfill function conditions." << endl;
			std::cout << "It must be: " << STR << endl;
			std::cout << "Yet, it is: " << str << endl;
			exit( EXIT_FAILURE );
		}

		str.resize(11);
		FILE.read(&str[0],11);
		STR = "# columns: ";
		if ( str.compare(STR) == 0 )
		{
			FILE >> columns;
		}
		else
		{
			std::cout << "\nOctaveInterface.h inconsistency:" << endl;
			std::cout << "In function load_matrix_to_vector:" << endl;
			std::cout << "File corrupted." << endl;
			exit( EXIT_FAILURE );
		}
		std::getline(FILE, str);

		arr.resize(columns);
		for ( std::size_t column = 0; column < columns; column++ )
			FILE >> arr[column];
	}
	else
	{
		std::cout << "\nOctaveInterface.h inconsistency:" << endl;
		std::cout << "In function load_matrix_to_vector:" << endl;
		std::cout << "arr must be of type matrix." << endl;
		exit( EXIT_FAILURE );
	}
} // end template load_matrix_to_vector


int	main()
{
	std::vector<std::size_t>	a = {0,1,2,3,4,5,6,7,8,9};

	std::cout << "\na:\n";
	for(const auto& s : a)
		std::cout << s << " ";
	std::cout << "\n";

	// open a file in write mode.
	ofstream outfile;
	outfile.open("./a.mat", ios::out | ios::trunc);

	// file preamble.
	outfile << "# This is a file created by a test function" << endl;
	outfile << "# Author: Dematties Dario Jesus." << endl;

	outfile << "\n\n" << endl;
	
	// save vector a
	save_vector_as_matrix("a", a, outfile);

	// close the opened file.
	outfile.close();

	std::cout << "\nvector a saved\n";
	a.clear();	
	assert(a.size() == 0);

	bool	check_a = false;

	std::string	str;
	std::string	STR;

	// open a file in read mode.
	ifstream infile;
	infile.open("./a.mat", ios::in | std::ifstream::binary);

	while ( std::getline(infile, str) ) {

		STR = "# name: a";
		if ( str.compare(STR) == 0 ) {
			load_matrix_to_vector(a, infile);
			check_a = true;
		}

	}
	// close the opened file.
	infile.close();

	assert(check_a == true);
	
	std::cout << "\na:\n";
	for(const auto& s : a)
		std::cout << s << " ";
	std::cout << "\n";

	return 0;
}


The problem is that it is extremely inefficient, because it performs a separate I/O operation for every vector element. You start to notice this with very large vectors. I coded it this way because, to keep Octave/Matlab file format compatibility, I have to insert a space character between every pair of vector elements.

The question is: how can I make std::vector I/O efficient while keeping Octave/Matlab file format compatibility?


To speed up writing, you can save the data to a stringstream first and then write the stringstream's contents to the file as a single string in one go.
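A minimal sketch of that idea, assuming the same text layout as the code in the first post. The names format_vector_as_matrix and save_vector_buffered are made up for illustration and are not part of the original code.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Format the whole Octave text record in memory first.
// (Hypothetical helper; the layout mirrors the original save template.)
template <typename T>
std::string format_vector_as_matrix(const std::string& name, const std::vector<T>& v)
{
    std::ostringstream buf;
    buf << "# name: " << name << '\n'
        << "# type: matrix\n"
        << "# rows: 1\n"
        << "# columns: " << v.size() << '\n';
    for (const T& x : v)
        buf << ' ' << x;   // one leading space per element, as in the original
    buf << "\n\n\n";
    return buf.str();
}

// Hand the finished record to the output stream with a single write call.
template <typename T>
std::ostream& save_vector_buffered(const std::string& name, const std::vector<T>& v,
                                   std::ostream& out)
{
    const std::string record = format_vector_as_matrix(name, v);
    return out.write(record.data(), static_cast<std::streamsize>(record.size()));
}
```

Whether this is actually faster than writing straight to the file stream depends on the implementation, so it is worth timing both. The one thing it does guarantee is that the exact size in bytes of the record is known before anything touches the file.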
Matlab specifically can write C-like output with some of its commands, producing high-performance binary files. Is it possible to change the .m code to write the data this way and avoid the clunky .mat file format? The .mat format has to support Matlab's horrid 'anything can be of any data type at any time' capability, and that makes it overly complicated.

I understand this may not be desired for your project, but the results are quite good if you are allowed to do it.
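For illustration, the C++ side of such a raw binary exchange could look roughly like the sketch below. The layout (a 64-bit element count followed by the raw element bytes, in the machine's native byte order) is an assumption made for this example and not an established format; the Octave/Matlab side would have to read it with the matching sizes, presumably via something like fread. All function names here are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <type_traits>
#include <vector>

// Write the element count, then the elements as one contiguous block of bytes.
template <typename T>
std::ostream& save_vector_raw(const std::vector<T>& v, std::ostream& out)
{
    static_assert(std::is_trivially_copyable<T>::value, "raw dump needs a trivially copyable type");
    const std::uint64_t count = v.size();
    out.write(reinterpret_cast<const char*>(&count), sizeof count);
    if (!v.empty())
        out.write(reinterpret_cast<const char*>(v.data()),
                  static_cast<std::streamsize>(v.size() * sizeof(T)));
    return out;
}

// Read back what save_vector_raw wrote; on failure the vector is left empty.
template <typename T>
std::istream& load_vector_raw(std::vector<T>& v, std::istream& in)
{
    static_assert(std::is_trivially_copyable<T>::value, "raw load needs a trivially copyable type");
    v.clear();
    std::uint64_t count = 0;
    if (!in.read(reinterpret_cast<char*>(&count), sizeof count))
        return in;
    v.resize(static_cast<std::size_t>(count));   // sketch: the count is trusted as-is
    if (count != 0)
        in.read(reinterpret_cast<char*>(v.data()),
                static_cast<std::streamsize>(count * sizeof(T)));
    if (!in)
        v.clear();
    return in;
}

// Small self-check: write to an in-memory binary stream and read it back.
template <typename T>
bool round_trips(const std::vector<T>& v)
{
    std::stringstream ss(std::ios::in | std::ios::out | std::ios::binary);
    save_vector_raw(v, ss);
    std::vector<T> back;
    load_vector_raw(back, ss);
    return ss && back == v;
}
```

With a real file, the stream should be opened with std::ios::binary. Note that a file written this way is not portable between machines with different endianness or type sizes.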
> The problem is that it is extremely inefficient. You start to notice this with very large vectors.

How much time does it take?
For a vector of half a million integers, this takes about 150 milliseconds (GNU), 336 milliseconds (LLVM)

#include <iostream>
#include <vector>
#include <string>
#include <algorithm>
#include <iterator>
#include <numeric>
#include <fstream>

namespace
{
    const std::string name_tag = "# name: " ;
    const std::string type_tag = "# type: matrix" ;
    const std::string rows_tag = "# rows: 1" ;
}

// Save vector as matrix type to an Octave (.mat) stream.
template <typename T>
std::ostream& save_vector_as_matrix( const std::string& name, const std::vector<T>& matrix, std::ofstream& stm )
{
    stm << name_tag << name << '\n' << type_tag << '\n' << rows_tag << '\n'
        << "# columns: " << matrix.size() << '\n' ;

    std::copy( matrix.begin(), matrix.end(), std::ostream_iterator<T>( stm, " " ) ) ;
    return stm << "\n\n\n" ;
}

template <typename T>
std::istream& load_matrix_to_vector( std::vector<T>& matrix, const std::string& name, std::istream& stm )
{
    std::string str ;
    if ( !( std::getline( stm, str ) && str == ( name_tag + name ) ) ) goto failure ;
    if( !( std::getline( stm, str ) && str == type_tag ) ) goto failure ;
    if ( !( std::getline( stm, str ) && str == rows_tag  ) ) goto failure ;

    char ch;
    std::size_t expected_size ;
    if( !( stm >> ch && ch == '#' && stm >> str && str == "columns:" && stm >> expected_size ) )
        goto failure ;

    matrix.clear() ;
    using iterator = std::istream_iterator<T> ;
    std::copy( iterator(stm), iterator(), std::back_inserter(matrix) ) ;
    if( matrix.size() != expected_size ) goto failure ;

    return stm ;

    failure:
        stm.setstate( std::ios::failbit ) ;
        matrix.clear() ;
        return stm ;
}

template <typename T>
std::vector<T> load_matrix( const std::string& name, std::istream& stm )
{
    std::vector<T> matrix ;
    if( !load_matrix_to_vector( matrix, name, stm ).eof() ) std::cerr << "input failure!\n" ;
    return matrix ;
}

int main()
{
    std::vector<int> saved_vec( 500'000 ) ;// vector of half a million elements
    std::iota( saved_vec.begin(), saved_vec.end(), 0 ) ;
    const std::string vec_name = "this is a test vector of integers" ;
    const std::string file_name = "test_vector.mat" ;

    {
        std::ofstream file(file_name) ;
        if( save_vector_as_matrix( vec_name, saved_vec, file ) ) std::cout << "saved to file\n" ;
    }

    {
        std::ifstream file(file_name) ;
        const auto loaded_vec = load_matrix<int>( vec_name, file ) ;
        if( loaded_vec == saved_vec ) std::cout << "read from file successfully\n" ;
    }
} 

http://coliru.stacked-crooked.com/a/d29fcc9ee341b61f

C++ file streams are fully buffered; writing it first to a string stream would typically make it slower (double buffering tends to be slow).
Wow, what great code, JLBorges!

> buffering tends to be slow


You are right, but it will be necessary for me in the future, because I am migrating this code to MPI. I'll need to have all the data in a buffer before writing it to a file, in order to know the limits of the writing operation among different processes (Thanks Thomas1965).
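To make the "limits of the writing operation" concrete, here is a rough single-process sketch with no MPI calls in it. It assumes each process has already formatted its part of the file into a string; the start offset of each process is then the sum of the sizes of all lower-ranked buffers. The names write_offsets and assemble are hypothetical. In a real MPI program that prefix sum would presumably come from a collective such as MPI_Exscan, and each rank would write at its own offset with something like MPI_File_write_at.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Given the byte size of each process's formatted buffer (in rank order),
// return the file offset at which each process would start writing.
inline std::vector<std::size_t> write_offsets(const std::vector<std::size_t>& sizes)
{
    std::vector<std::size_t> offsets(sizes.size());
    std::size_t running = 0;
    for (std::size_t rank = 0; rank < sizes.size(); ++rank) {
        offsets[rank] = running;   // exclusive prefix sum
        running += sizes[rank];
    }
    return offsets;
}

// Simulate the independent writers: every chunk is copied to its own offset
// in a pre-sized "file", without looking at what the other chunks contain.
inline std::string assemble(const std::vector<std::string>& chunks)
{
    std::vector<std::size_t> sizes;
    std::size_t total = 0;
    for (const std::string& c : chunks) {
        sizes.push_back(c.size());
        total += c.size();
    }
    const std::vector<std::size_t> offsets = write_offsets(sizes);
    assert(offsets.size() == chunks.size());

    std::string file(total, '\0');
    for (std::size_t rank = 0; rank < chunks.size(); ++rank)
        file.replace(offsets[rank], chunks[rank].size(), chunks[rank]);
    return file;
}
```

This is the reason the data has to sit in a buffer first: the offsets depend only on the buffer sizes, so they cannot be computed until every process has finished formatting.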

> How much time does it take?


I have other templates that write vectors of vectors and multidimensional vectors to Octave arrays. That is a lot of file I/O, and with the code I posted it could take forever, believe me.
By contrast, the code you posted is not only clean but also efficient.

> I understand this may not be desired for your project, but the results are quite good if you are allowed to do it.


It is absolutely necessary for my project to have the data available in Octave (.mat) format for later testing, visualization, and manipulation.

Thank you very much!
This forum is amazing!