Fast retrieval/reading of data

I have a 1024 by 1024 dataset, and I've written a piece of code that tokenizes it line by line and stores the values element by element, which is extremely slow.

The dataset is available in .txt and .dat format. Could you please teach, guide or advise me on how to import the data in a faster manner? Thanks a million in advance.
Can you show how you read in the values? Example code would be nice.
The dataset is available in .txt and .dat format.

What do the .txt file contents look like? A sample would be useful.
How does the .dat file differ? Are the contents stored differently?
Sorry, went to sleep last night.

The dataset is available in .txt and .dat format.


Haha, sorry, never mind that. I just checked the files and they turned out to be the same; I didn't know that before.

#include <cstdlib>
#include <iostream>
#include <fstream>
#include <string>
#include <sstream>
#include <algorithm>
#include <iterator>
#include <vector>
using namespace std;

int main(){

	string line;

	const int n = 1024;

	// static storage: a 4 MB local array can overflow the default stack
	static float pinvL[1024][1024];
	int coun = 0;
	int coun2;

	vector<string> tokens;
	
	cout << "Loading pinvL\n";
	ifstream pinvLText ("pinvL.txt");
	if (pinvLText.is_open())
	{
		// test getline itself so a failed read is never processed
		while ( coun != n && getline (pinvLText,line) )
		{
			// split the line into whitespace-separated tokens
			istringstream iss(line);
			copy(istream_iterator<string>(iss),
				istream_iterator<string>(),
				back_inserter(tokens));

			// convert each token and print it as it is stored
			for ( coun2 = 0; coun2 < n; coun2++)
			{
				pinvL[coun][coun2] = atof(tokens.at(coun2).c_str());
				cout << "[" << coun << "][" << coun2 << "] : " << pinvL[coun][coun2] << "\n";
			}

			tokens.clear();
			coun++;
		}
		pinvLText.close();
	}
	else cout << "Unable to open file\n";
	cout << "Finished loading pinvL\n";
	
}


like this


The content of the text/dat file looks like this,

say 3 by 3

-0.123 0.124 4.12
-0.45 0.454 2.12
-0.12 0.5421 1.12

Values in a line are delimited by spaces,
and lines are delimited by newlines,

but the real file is 1024 by 1024.


Thanks
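#include <iostream>
#include <fstream>

int main()
{
    enum { N = 1024 } ;
    // static storage: N*N floats (4 MB) may not fit on the default stack
    static float pinvL[N][N] = { {0} } ;

    std::ifstream pinvLText("pinvL.txt");
    int cnt_read = 0 ; // number of values read successfully so far
    for( int row = 0 ; row < N ; ++row )
    {
        for( int col = 0 ; col < N ; ++col )
        {
            // operator>> skips spaces and newlines, so no tokenizing is needed
            if( pinvLText >> pinvL[row][col] ) ++cnt_read ;
            else goto read_error ;
        }
    }

    // do something with the data in pinvL

    return 0 ;

read_error:
    std::cerr << "error in reading data after " << cnt_read << " values\n" ;
    return 1 ;
}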
Thanks a lot for the efficient code, JLBorges, and thanks all! If anyone else is facing this problem, remember not to print each element as it is read; leaving that out saves a lot of time. =D Have a nice day.
One more comment. If you had full control over the format of the data file, it could be more efficient to use a binary format rather than ordinary text.
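A rough sketch of the binary idea, assuming you control both the program that writes the file and the one that reads it. save_binary and load_binary are hypothetical names, and the file here is just the raw bytes of the floats with no header, so it is only readable on a machine with the same float layout and byte order.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: writes the floats as raw bytes (no header, no text).
bool save_binary(const std::string& path, const std::vector<float>& data)
{
    std::ofstream out(path.c_str(), std::ios::binary);
    if (!out) return false;
    out.write(reinterpret_cast<const char*>(data.data()),
              static_cast<std::streamsize>(data.size() * sizeof(float)));
    return out.good();
}

// Hypothetical helper: reads exactly `count` floats back in one call.
// Returns false if the file cannot be opened or holds fewer bytes than expected.
bool load_binary(const std::string& path, std::size_t count,
                 std::vector<float>& data)
{
    std::ifstream in(path.c_str(), std::ios::binary);
    if (!in) return false;
    data.assign(count, 0.0f);
    const std::streamsize wanted =
        static_cast<std::streamsize>(count * sizeof(float));
    in.read(reinterpret_cast<char*>(data.data()), wanted);
    return in.gcount() == wanted;       // gcount() is the number of bytes actually read
}
```

The gain comes from skipping the text-to-float conversion entirely: the whole 1024 by 1024 block would be a single read of 1024 * 1024 floats. The cost is that the file is no longer human-readable or portable across different architectures.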