Unsolved problem, don't let it sink: Read huge text files into memory efficiently?

#include <sstream>
#include <string>
#include <vector>
using namespace std;
#define TIMES 1000000
int main(){
	char const *data = "100.111 12.34 5.23e3 21234 200 0.0001 ";
	string s=data;
	s.resize(s.length()*TIMES+1);
	int i=TIMES-1;
	while(i--)s+=data;
	istringstream iss(s);
	vector <float> v(TIMES*6);
	for(vector<float>::iterator it=v.begin();it != v.end();it++)
		iss >> *it;
}


First of all, change the resize to a reserve. You are changing the size of the string to data plus about (1000000 * s.length()) default-inserted characters, which are null bytes. Then you are appending data to the end of this already massive string about 1000000 more times, which will cause reallocations. Bottom line: you are not testing the same data at all.

Using += is fine as long as you already have the space reserved, which you do not have when using resize. If you insist on resize, you would have to use replace to overwrite the characters already in the string, which is unnecessary.
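In other words, something along these lines (a rough sketch, assuming the same TIMES and data as the test above):

#include <sstream>
#include <string>
#include <vector>
using namespace std;

#define TIMES 1000000

int main(){
	char const *data = "100.111 12.34 5.23e3 21234 200 0.0001 ";
	string s(data);
	s.reserve(s.length()*TIMES);    // one allocation up front, no null filler
	for(int i = 1; i < TIMES; ++i)
		s += data;                  // appends into the reserved storage, no reallocation
	istringstream iss(s);
	vector<float> v(TIMES*6);
	for(vector<float>::iterator it = v.begin(); it != v.end(); ++it)
		iss >> *it;
}

With reserve, the capacity is allocated once and the length stays equal to the real data, so the parsing loop sees the intended text.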
For your test.cpp with that commented out:
./a.out  0.14s user 0.07s system 99% cpu 0.205 total

For test.c with the equivalent commented out:
./a.out  0.09s user 0.04s system 89% cpu 0.142 total

Wow, you're right! Ha, it's very fast, but not as fast as strcpy!
@jimc sohguanh already did that. I agree, but I assumed that + had the same behavior as strcat. Now that it's working on the right data, the STL code has abysmal performance:
./a.out  6.15s user 0.18s system 98% cpu 6.419 total

@jimc sohguanh already did that. I agree, but I assumed that + had the same behavior as strcat.


Sohguanh switched to using resize+replace, not reserve with "+=". Replace can have overhead if you are not very careful; it is not limited to replacing with a string of the same size, and with a large string, mismatched sizes will kill performance. "+=" should be as fast as or faster than strcat, since "+=" can be inlined. Also, the time shell command is not a very accurate measure; it is much better to insert timing into the code, as Duoas does.
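For example, a minimal sketch of timing one section in-code with clock() (this is not Duoas's exact code, just the general idea):

#include <cstdio>
#include <ctime>

int main(){
	std::clock_t start = std::clock();

	// put only the section being measured here, e.g. the iss >> *it loop;
	// the loop below is just a stand-in workload
	volatile double sink = 0;
	for(long i = 0; i < 10000000L; ++i) sink += i * 0.5;

	std::clock_t stop = std::clock();
	std::printf("measured section: %.3f s of CPU time\n",
	            double(stop - start) / CLOCKS_PER_SEC);
}

clock() measures CPU time for the process, which is usually steadier than wall-clock time reported by the shell.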

As to which is faster, C or C++, I would say try running STL sort or search vs. C qsort and bsearch. The C versions have to call the comparison function through a pointer at run time; the C++ versions can often resolve and inline the comparison at compile time. C++ isn't necessarily faster or slower; it depends on what you are doing. Programmers often do things in C++ that accidentally perform poorly, like unnecessarily constructing a string in an inner loop. A string has to know its length, so constructing one does a strlen (which isn't always necessary) on top of the memory management. You can't do that by accident in C.
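To illustrate the point, a rough sketch of that comparison (the data here is made up just for the example):

#include <algorithm>
#include <cstdlib>
#include <vector>

// C-style comparison: qsort calls this through a function pointer for every pair
int cmp_float(const void *a, const void *b){
	float x = *static_cast<const float*>(a);
	float y = *static_cast<const float*>(b);
	return (x > y) - (x < y);
}

// C++-style comparator: a function object that std::sort can inline
struct LessFloat {
	bool operator()(float a, float b) const { return a < b; }
};

int main(){
	std::vector<float> a(1000000);
	for(std::size_t i = 0; i < a.size(); ++i)
		a[i] = float((i * 2654435761u) % 1000);
	std::vector<float> b(a);

	std::qsort(&a[0], a.size(), sizeof(float), cmp_float);  // indirect call per comparison
	std::sort(b.begin(), b.end(), LessFloat());             // comparison typically inlined
}

Timing the two sorts on the same data is a more meaningful C-vs-C++ comparison than the string benchmarks above, because the work per element is identical apart from how the comparison is dispatched.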
Just to confirm my victory, I combined the libc string-making code with the STL reading code:
#include <iostream>
#include <string>
#include <sstream>
#include <vector>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
using namespace std;
#define TIMES 1000000
int main(){
	char const data[]="100.111 12.34 5.23e3 21234 200 0.0001 ";
	int i=0;
	int len=strlen(data);
	// build the input with the libc calls: one malloc, then TIMES strcpy's
	char *s=(char*)malloc(len*TIMES+1);
	while(i<TIMES){
		strcpy(s+len*i,data);
		i++;
	}
	// parse with the STL reading code
	istringstream iss(s);
	vector <float> v(TIMES*6);
	for(vector<float>::iterator it=v.begin();it != v.end();it++)
		iss >> *it;
}

this gives a time of:
./a.out  6.25s user 0.20s system 98% cpu 6.559 total

The C reading code takes approx. half a second.
Iterators and vectors are slow. I don't know how to make them fast, but I know that arrays and pointers are always fast, so I use them when I want speed. Or maybe it's istringstream, but I already found that it is only about 2x slower than strtof.

EDIT: I just realized: use a script to translate the file into a single long line with one space between the numbers, and you can use my C approach, which handles 1 million numbers in 0.5 seconds! This thread is solved.


I'd expect iterators to be a bit quicker as you're always saving yourself the unnecessary column lookup.

I would expect iterators to definitely be faster for multidimensional vectors, but probably about the same for single dimension vectors.

I'd expect vector<bool> to be slower as the bits are packed into a byte.


While that may have been true in the past, my tests have been showing almost the same speed. I am still working to determine whether it is a truly valid test, though. It is only recently that processors got fast, low-latency shift hardware: they can shift any number of bits in a single clock. Also, since setting up the mask for bit addressing is independent of the memory access of the right word, it can be done in parallel on any modern superscalar, out-of-order processor.

Counting instructions to determine performance stopped working a long time ago. These days you pretty much have to run it and time it with representative data; there are too many cache effects, superscalar effects, and out-of-order effects. I need to boot up one of my really old machines to see how the behavior has changed.

Also, vector<bool> is 8 times smaller than the same number of bits in an array of bool, since the array has to use at least a char for each bit. Being 8 times smaller makes it more easily cacheable. I did get slightly better speed using an iterator vs. subscripting with vector<bool>, but not much better. This probably has to do with subscripting creating a new proxy reference each time, while an iterator mostly just shifts the mask inside a pre-existing iterator.

Without optimization, the array is much faster, but at -O3 with g++ 4.0.1 I am getting about the same speed. -O3 makes the array a lot faster too, but vector obviously benefits a lot more, since without inlining its subscripting operations are function calls. I don't know what some of the posters are talking about regarding subscripting: vector subscripting is supposed to be unchecked access; the at() function is used for checked access. I suppose some IDEs may add checking in debug mode, but that had better go away when optimization is turned on. I don't use an IDE, just emacs and g++.
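For anyone who wants to try it, a quick sketch of the kind of test I ran (the size is arbitrary; results will vary by compiler and optimisation level):

#include <cstddef>
#include <cstdio>
#include <ctime>
#include <vector>

int main(){
	const std::size_t N = 50000000;      // arbitrary size for the test
	std::vector<bool> packed(N);         // bit-packed
	std::vector<char> plain(N);          // one byte per flag, like an array of bool
	for(std::size_t i = 0; i < N; ++i){
		packed[i] = (i % 3 == 0);
		plain[i]  = (i % 3 == 0);
	}

	std::clock_t t0 = std::clock();
	long hits1 = 0;
	for(std::vector<bool>::iterator it = packed.begin(); it != packed.end(); ++it)
		if(*it) ++hits1;                 // mask-and-shift under the hood

	std::clock_t t1 = std::clock();
	long hits2 = 0;
	for(std::size_t i = 0; i < N; ++i)
		if(plain[i]) ++hits2;            // plain byte load

	std::clock_t t2 = std::clock();
	std::printf("vector<bool>: %ld hits in %.3fs, byte array: %ld hits in %.3fs\n",
	            hits1, double(t1-t0)/CLOCKS_PER_SEC,
	            hits2, double(t2-t1)/CLOCKS_PER_SEC);
}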


I can't see how it depends on OS, filesystem or physical media.


Some OSs/filesystems may read the entire file into a disk cache right from the start, which means that copying it into another string is a somewhat pointless copy; it is already in memory. It does avoid a lot of smaller read system calls, though the expense of a read() call varies from system to system. Modern processors and OSs can execute a context switch from user to system space very quickly, so I doubt that copying 900MB around in memory is worth it in most cases, especially if you are just making a linear pass through the data rather than seeking around.

Other systems may just read part of the file and then go back and read more on demand; essentially it gets mapped into pages, and pages are read in on demand. Getting hit with seek latency is fine on an SSD, not too bad on a hard disk, and horrible on a CD-ROM. Running over NFS or another network mount can be painfully slow compared to local disk. As far as files go, a couple of hundred MB isn't really that large. It is a waste to use a text file, though; a binary file would be significantly faster.
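To show what the binary-file point means in practice: write the floats once as raw bytes, and the whole read becomes a single bulk fread with no parsing (the file name here is made up for the example):

#include <cstdio>
#include <vector>

int main(){
	const std::size_t N = 6000000;
	std::vector<float> out(N, 1.5f);

	// one-off conversion step: dump the raw float array
	std::FILE *f = std::fopen("numbers.bin", "wb");
	if(!f) return 1;
	std::fwrite(&out[0], sizeof(float), out.size(), f);
	std::fclose(f);

	// reading it back is one bulk read and zero text-to-float conversions
	std::vector<float> in(N);
	f = std::fopen("numbers.bin", "rb");
	if(!f) return 1;
	std::size_t got = std::fread(&in[0], sizeof(float), in.size(), f);
	std::fclose(f);
	std::printf("read %lu floats\n", (unsigned long)got);
}

With 6,000,000 floats that is about 24MB of raw data versus roughly 38MB of text, and none of the strtof/istringstream conversion cost.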

The C reading code takes approx. half a second.
Iterators and vectors are slow. I don't know how to make them fast, but I know that arrays and pointers are always fast, so I use them when I want speed. Or maybe it's istringstream, but I already found that it is only about 2x slower than strtof.


A vector iterator is generally just a pointer, so there should be very little difference vs. a dynamically allocated array in C, unless you are compiling with debug/non-optimized settings. A static, compile-time-sized array is a different story.

I would not be surprised if an istringstream is slower at conversion than strtof or strtol. I have no problem with using these functions in C++ when appropriate; they just don't play well with C++ strings.
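A small sketch of that mix, keeping the C++ string and vector but letting strtod do the conversion through c_str():

#include <cstdlib>
#include <string>
#include <vector>

int main(){
	std::string s = "100.111 12.34 5.23e3 21234 200 0.0001 ";
	std::vector<float> v;
	v.reserve(6);

	const char *p = s.c_str();   // strtod can work on the string's own buffer
	char *end;
	for(;;){
		double d = std::strtod(p, &end);
		if(end == p) break;      // no conversion performed: end of the numbers
		v.push_back(float(d));
		p = end;
	}
	// v now holds the six parsed values
}

This keeps the memory management in the string and vector, but pays strtod's conversion cost instead of istringstream's.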
I just realized that passing s+1 to strtof to skip whitespace is unnecessary. strtof skips all leading whitespace!
Jimc, I agree with what you've said.
EDIT: OK, I may have been too hasty. My C reading code only reads 1,000,000 floats, whereas the C++ code reads 6,000,000. After rectifying this error, we get the following:
C reading code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define TIMES 1000000
int main(){
	char const data[]="100.111 12.34 5.23e3 21234 200 0.0001 ";
	int i=0;
	int len=strlen(data);
	char *s=malloc(len*TIMES+1);
	while(i<TIMES){
		strcpy(s+len*i,data);
		i++;
	}
	float *v=malloc(TIMES*6*sizeof(float));
	i=TIMES*6;
	char *t=s;
	/* parse all TIMES*6 numbers with strtof; v is filled in reverse index
	   order while t walks forward, which doesn't affect the timing */
	while(i--)v[i]=strtof(t,&t);
}

./a.out  2.30s user 0.11s system 99% cpu 2.427 total

C++ reading code, using the C string maker to ensure the same input:
#include <iostream>
#include <string>
#include <sstream>
#include <vector>
using namespace std;
#define TIMES 1000000
int main(){
	char const *data = "100.111 12.34 5.23e3 21234 200 0.0001 ";
	string s2(data);
	string s(s2);
	s.resize(s.length()*TIMES+1);
	int i=0;
	// fill the resized string in place with replace, as sohguanh suggested
	while(i<TIMES){
		s.replace(i*s2.length(),s2.length(),s2);
		i++;
	}
	istringstream iss(s);
	vector <float> v(TIMES*6);
	for(vector<float>::iterator it=v.begin();it != v.end();it++)
		iss >> *it;
}

./a.out  6.18s user 0.15s system 99% cpu 6.379 total

But at least it's not as bad as Perl,
@v = map $_+0.0, split / /,("100.111 12.34 5.23e3 21234 200 0.0001 " x 1000000);
which still hasn't finished after 2 minutes.
       istringstream iss(s);
       vector <float> v(TIMES*6);
       for(vector<float>::iterator it=v.begin();it != v.end();it++)
          iss >> *it; // with this line commented out, the whole program runs very fast


For once I would like to agree with rocketboy9000: C++ istringstream is quite SLOW in comparison to its C counterpart for the string-to-float conversion.

Please note that the C++ vector and string classes still turn in respectable timings. For such cases, I would think that to get the most performance we can resort to C strtof for the string-to-float conversion, while still using the C++ container and string classes.

So it is not ALL Standard C++ classes that slow things down. It only affects certain Standard C++ classes, and in this situation istringstream.

Edit:
        
        vector <float> v(TIMES*6);
        copy(s.begin(),s.end(),back_inserter(v));


If we don't have the requirement to do a string-to-float operation, the Standard C++ code above turns in competitive timing versus the C code.


Duoas lines 13 to 80: clock() ?

//s.resize(s.length()*TIMES+1);
s.reserve(s.length()*TIMES); //why the extra character? 


If we don't have the requirement to do a string-to-float operation
The problem is that it never was used. You were just making the conversion, but never checked whether it went wrong or did anything about it.
I guess I will read it directly, and perform an error check after every row.

rocketboy9000: with gcc, put std::ios_base::sync_with_stdio(false); at the beginning of the code to improve C++ iostreams.
http://www.codeforces.com/blog/entry/925 (not working right now)
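For reference, that suggestion looks like this when reading from std::cin (it only affects the standard streams, so it would not change a stringstream-based test):

#include <iostream>
#include <vector>

int main(){
	std::ios_base::sync_with_stdio(false);   // stop synchronising with C stdio
	std::cin.tie(0);                         // optional: untie cin from cout

	std::vector<float> v;
	float x;
	while(std::cin >> x) v.push_back(x);
	std::cout << v.size() << " values read\n";
}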
None of those sources you listed have actual statistics, so I tried making some.
The tests aren't equivalent.

1. The building of the source input strings isn't the same.
2. The C++ code has extra work in the read loop. It uses post-increment, which is more expensive than the more appropriate pre-increment, and it has an iterator compare.
The C code has a simple integer check.
3. The C code is optimised for the data. It uses the fact that the floats are separated by a single space and uses that magic number to increment to the exact position of the next number. The C++ code has to search through whitespace.
4. The C program does not release memory allocated as part of the test, whereas the C++ one does.

I suppose we could fix this code to make it fair, but why bother.

The test is pointless as a comparison between the performance of the two libraries and only proves that you need to understand each language to use it well.
kbw, all those things have been addressed in the later posts. 1. I now use the *same* code for that. 2. That should be equivalent on a reasonably good compiler. 3. that was actually unnecessary as strtof will do that for me. 4. that is done as part of the process end (on linux, anyway.)
Please read all the posts (or at least skim them) before criticizing me. I have admitted to making a few errors, but the new code *proves*, indisputably, that something is slow about using a vector and istream rather than a pointer and strtof.
EDIT: And I don't claim to know what the cause is, but the fact remains. ne555, the link you posted should be irrelevant, as I am using stringstream, not iostream. If that does affect it, that would be a sad day for the STL.
Meh. Here are the facts. Accept them. Use them to write better code. Or don't. But the facts will be the same regardless.
EDIT 2: And if you don't believe that sohguanh's code has the same meaning as my C code, here is a direct comparison using the same code to initialize the string:
#include <iostream>
#include <string>
#include <sstream>
#include <vector>
#include <stdlib.h>
#include <string.h>
using namespace std;
#define TIMES 1000000
int main(){
        char const data[]="100.111 12.34 5.23e3 21234 200 0.0001 ";
	int i=0;
	int len=strlen(data);
	char *s=(char*)malloc(len*TIMES+1);
	while(i<TIMES){
		strcpy(s+len*i,data);
		i++;
	}
	istringstream iss(s);
	vector <float> v(TIMES*6);
	for(vector<float>::iterator it=v.begin();it != v.end();it++)
		iss >> *it;
}

./a.out  5.95s user 0.15s system 99% cpu 6.110 total
2. That should be equivalent on a reasonably good compiler.
I don't think so.

3. that was actually unnecessary as strtof will do that for me.
strtof() doesn't appear in The Single UNIX ® Specification, nor is it available on Windows (Microsoft's C compilers). You probably should be using strtod().

4. that is done as part of the process end (on linux, anyway.)
Leaving the cleanup to the OS is improper. And more importantly, because you don't call free, it's not timed.

Please read all the posts (or at least skim them) before criticizing me.
I did skim the posts. My comments are meant to be helpful.

Since the tests are meant to compare the performance of istringstream and strtod(), there's no reason why the source strings couldn't be constructed by the same code, with the C++ code initialising a string from the final buffer. A series of memcpy calls would be a better way to initialise the buffer, as all the lengths are fixed.
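Roughly like this, as a sketch (memcpy copies a block of known length, so there is no per-copy strlen as with strcpy):

#include <cstring>
#include <string>

#define TIMES 1000000

int main(){
	const char data[] = "100.111 12.34 5.23e3 21234 200 0.0001 ";
	const std::size_t len = sizeof(data) - 1;     // fixed chunk length

	char *buf = new char[len*TIMES + 1];
	for(int i = 0; i < TIMES; ++i)
		std::memcpy(buf + std::size_t(i)*len, data, len);  // copy a known-size block
	buf[len*TIMES] = '\0';

	std::string s(buf, len*TIMES);   // the C++ test can build its string from the same buffer
	delete [] buf;
	// ... feed s to istringstream exactly as before ...
}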

In the solutions that Duoas and I suggested, we didn't use istringstream; we read directly from the ifstream. I thought istringstream was the cause of the slowness to begin with.
Hmm...
I don't think so.

I think so. I think while(i++<1000){ ... is the same exact code as while(i<1000&&++i){ ...
strtof() doesn't appear in The Single UNIX ® Specification, nor is it available on Windows (Microsoft's C compilers). You probably should be using strtod().

True. But that will probably be even faster.
Leaving the cleanup to the OS is improper. And more importantly, because you don't call free, it's not timed.

I used the *command* time for the timing. The entire process dispatch and end is timed.
Since the tests are meant to compare the performance of istringstream and strtod(), there's no reason why the source strings couldn't be constructed by the same code, with the C++ code initialising a string from the final buffer. A series of memcpy calls would be a better way to initialise the buffer, as all the lengths are fixed.

Look at my edit above. The same code is used to construct the string.
@rocketboy

How old is your system? It ran in ~8 seconds here.

time perl -le '$s="100.111 12.34 5.23e3 21234 200 0.0001 " x 1000000;@x=split /\s/,$s;@v=map{$_+=0.5}@x;print scalar(@x);print join("\n",@x[0 .. 10])'
6000000
100.611
12.84
5230.5
21234.5
200.5
0.5001
100.611
12.84
5230.5
21234.5
200.5

real 0m7.872s
user 0m6.422s
sys 0m1.322s
while(i++<1000){ ... is the same exact code as while(i<1000&&++i)
That wasn't the loop I had in mind. I was comparing:
for(vector<float>::iterator it=v.begin();it != v.end();it++)
with:
while(i--)
They're hardly the same.

The entire process dispatch and end is timed.
Yes it is, but you never call free, so that isn't timed. The default heap works by making allocations from the OS, then suballocating from that pool. When the program ends, the whole pool is reclaimed by the OS, not released by your app. It doesn't go through the pool releasing each block back to itself before that final release. My point is, the C++ app is doing that release that your C app isn't doing, so the timed comparison isn't fair.
@sohguanh

Thanks for showing copy - I was wondering if that might work.

Given this case where the header is in the first 5 rows, can we use getline to read those, then use copy to pull in the rest?
rocketboy9000 wrote:
I should get MASSIVE performance increases by doing C stuff instead of C++ stuff.
Yes, the link is irrelevant for your test.
I just thought that you would like it, as a way of reading input efficiently (iostream beats cstdio).
http://www.codeforces.com/blog/entry/925 (now it is working)
Edit: And I argue that you should be reading directly to the float variables, instead of performing conversions
Just for fun, a naive translation of the Perl oneliner to Scala:
 
Iterator.fill(1000000)("100.111 12.34 5.23e3 21234 200 0.0001 ").mkString.split(" ").map(_.toFloat)


This single line executes in about 3.5 s.
On the same computer, rocketboy9000's C code takes 2.11 s.
Strange, but the C++ code takes forever to finish and I couldn't measure it (and I double-checked that I copy-pasted it correctly)...

So, if the C++ version requires over 5 seconds, it is... extremely slow. :D (just kidding - really they are all quite fast, there is no point in doing such stupid microbenchmarks).

Back to the topic: based on intuition, I expect the original problem can be solved reasonably in about 30-60 seconds. You cannot get below roughly 15 seconds because of typical disk performance (~60 MB/s).

BTW:

I should get MASSIVE performance increases by doing C stuff instead of C++ stuff


I get MASSIVE performance increases by doing C++ stuff instead of C stuff. That is simply because in a more abstracted language I can code more efficient algorithms that I would not have the time or courage to implement in pure C (it would be too complex).
How old is your system?

I dunno, it's an old hand-me-down from my dad, a Dell Latitude x300. So *googles...* 2003.
My point is, the C++ app is doing that release that your C app isn't doing, so the timed comparison isn't fair.

So you want to do it with the new operator and no delete? Ok, try it. Or I could use alloca instead of malloc in the C, if you prefer.
And I argue that you should be reading directly to the float variables, instead of performing conversions.

Well, yeah, but that's just basically mmap, which is already written for you.
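For what it's worth, a rough Linux/POSIX sketch of that idea, assuming the data has already been converted to a binary file of raw floats (the file name is just an example):

#include <cstdio>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(){
	int fd = open("numbers.bin", O_RDONLY);
	if(fd < 0) return 1;

	struct stat st;
	if(fstat(fd, &st) != 0){ close(fd); return 1; }

	// map the file; the kernel pages it in on demand, no read()/copy loop
	void *p = mmap(0, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	if(p == MAP_FAILED){ close(fd); return 1; }

	const float *v = static_cast<const float*>(p);  // the mapping *is* the float array
	std::size_t n = st.st_size / sizeof(float);
	std::printf("first of %lu floats: %f\n", (unsigned long)n, n ? v[0] : 0.0f);

	munmap(p, st.st_size);
	close(fd);
}

This only makes sense once the data is binary; mapping the text file would still leave all the parsing to do.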
I get MASSIVE performance increases by doing C++ stuff instead of C stuff.

That's programmer performance, which is highest I would argue in Perl, but Perl has slow runtime performance. For complex problems where you also need speed, you can't beat C++. Don't get me wrong. I LIKE C++. But sometimes one library is better than another for certain problems. Your Scala code is pretty cool, doing everything with method chaining.
That's programmer performance, which is highest I would argue in Perl, but Perl has slow runtime performance. For complex problems where you also need speed, you can't beat C++. Don't get me wrong. I LIKE C++. But sometimes one library is better than another for certain problems. Your Scala code is pretty cool, doing everything with method chaining.


This simple exercise has shown that, in a sense, C does not massively outperform C++. As I tested out, an inappropriate use of a C++ class led to the skewed results. If we avoid that class and replace it with another, we get comparable results, don't we?

So the lesson we learnt is that we must not be too quick to condemn languages. Sometimes we need to check whether we are writing equivalent code and, as in this case, making proper use of the classes and their member functions.

Perl is interpreted, and it is understandable if it runs a bit slower than C and C++. But when you have a business requirement that needs some complex text pattern matching, trust me, you will embrace Perl over C and C++ despite that minor performance difference. I have been there and I know.

Conclusion: one must use different languages for different purposes. I also do Java development, but it seems to take a back seat in this forum since one Java-supporting forumer has "vanished" :P