Streaming data: NTFS vs ext4

I have data files that I use to stream data into vectors for analysis. When using Windows (NTFS), it takes roughly 10x as long to stream the file as it does on Linux. I'm wondering if it's possible to speed up the process in Windows. The data files I have are fairly large (200MB - 2GB), and using Windows in this case is preferred due to applications that cannot run natively in Linux.

Here is some sample code that I use to stream the data. Any help/advice/tips would be highly appreciated.

void createData (ifstream &dataFile, double tick, double tickA, string name) //Imports data into program.
    {
        unsigned short int month, day, year, hour, min, seconds;
        double open, high, low, close;
        unsigned int volume;
        char delim;

        if (dataFile.is_open())
        {
            while (dataFile >> month >> delim >> day >> delim >> year >> hour >> delim >> min >> delim >> seconds >> delim >> delim >> delim >> open >> delim >> high >>
                   delim >> low >> delim >> close >> delim >> volume)
            {
                matrix.push_back({month, day, year, hour, min, seconds, open, high, low, close, volume}); 
            }
        }
    }
Are you dual-booting Windows and Linux on the same physical machine, with respective OS partitions on the same physical hard disk?

If not, there are MANY variables outside what you've posted to consider.
- What actual versions of Windows and Linux?
- What machines - processor, memory?
- What mass storage devices? An SSD will just knock a spinning disk right out of the park in performance terms, but even two HDDs can have widely different performance characteristics.
- What compilers did you use, and what optimisation levels are in use?

> it takes roughly 10x as long to stream the file as it does on Linux
Is that an actual measurement, or just some subjective feeling that Windows takes a bit longer?
Also, 10x what?
0.1s vs 1s is barely noticeable.

What do you do after you've processed the data?
More specifically, how long does that processing take?
For example, if we're comparing 0.1s vs 1s to read in the data, and then a further minute to finish processing all that data, then maybe you should be looking at the final minute, not the first second.
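
For example, a minimal sketch (the file name and the processing step are placeholders) of timing the read and the subsequent processing separately with <chrono>:

#include <chrono>
#include <fstream>
#include <iostream>
#include <string>
using namespace std;
using namespace std::chrono;

int main()
{
    ifstream fin("data.csv");                 // placeholder file name

    auto t0 = steady_clock::now();
    string line;
    while (getline(fin, line)) { }            // read only
    auto t1 = steady_clock::now();

    // ... the real parsing / analysis would go here ...
    auto t2 = steady_clock::now();

    cout << "read:    " << duration_cast<duration<double>>(t1 - t0).count() << " s\n"
         << "process: " << duration_cast<duration<double>>(t2 - t1).count() << " s\n";
}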

Another question is, even if you make the Windows version have equivalent performance, are you going to change your behaviour?
If the whole thing takes a few seconds, and you're only doing this once a day (or less), so what?
But if it's a case of seconds vs minutes, and you're doing this 20 times a day - that's a whole new problem.

Does the size of the data file matter?
For example, if they're evenly matched up to say 10MB and then Windows falls off a cliff, then that might be worth knowing.



Here are a couple of examples to test.
void createData (ifstream &dataFile, double tick, double tickA, string name) //Imports data into program.
    {
        unsigned short int month, day, year, hour, min, seconds;
        double open, high, low, close;
        unsigned int volume;
        char delim;

        if (dataFile.is_open())
        {
            string line;
            int linecount = 0;
            // Minimum time it takes to just read the file
            while ( getline(dataFile,line) ) {
                ++linecount;
            }
            cout << linecount << endl;
        }
    }

void createData (ifstream &dataFile, double tick, double tickA, string name) //Imports data into program.
    {
        unsigned short int month, day, year, hour, min, seconds;
        double open, high, low, close;
        unsigned int volume;
        char delim;

        if (dataFile.is_open())
        {
            int linecount = 0;
            // delta-time it takes to parse the input
            while (dataFile >> month >> delim >> day >> delim >> year >> hour >> delim >> min >> delim >> seconds >> delim >> delim >> delim >> open >> delim >> high >>
                   delim >> low >> delim >> close >> delim >> volume)
            {
                ++linecount;
            }
            cout << linecount << endl;
        }
    }

At what point do you see the large difference between the two systems?
- just reading the file
- the delta when using >> to extract all the fields
- the delta when using matrix.push_back

If it's the last step, consider http://www.cplusplus.com/reference/vector/vector/reserve/
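
For illustration, a sketch of reserving capacity up front in the loading function; the 60-byte average record length used to estimate the count is an assumption, so tune it to your data:

#include <cstddef>
#include <fstream>
#include <vector>
using namespace std;

struct Record {
    unsigned short month, day, year, hour, min, seconds;
    double open, high, low, close;
    unsigned int volume;
};

void loadWithReserve(ifstream &dataFile, vector<Record> &matrix)
{
    // Estimate the record count from the file size, then reserve once so
    // push_back never has to reallocate and copy mid-load.
    dataFile.seekg(0, ios::end);
    const size_t fileSize = static_cast<size_t>(dataFile.tellg());
    dataFile.seekg(0, ios::beg);
    matrix.reserve(fileSize / 60 + 1);        // assumed ~60 bytes per line

    unsigned short month, day, year, hour, min, seconds;
    double open, high, low, close;
    unsigned int volume;
    char delim;

    while (dataFile >> month >> delim >> day >> delim >> year >> hour >> delim >> min >> delim
                    >> seconds >> delim >> delim >> delim >> open >> delim >> high >> delim
                    >> low >> delim >> close >> delim >> volume)
    {
        matrix.push_back({month, day, year, hour, min, seconds, open, high, low, close, volume});
    }
}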


I am dual booting on the same machine, with two of the exact same SSDs. The machine has an i7-6700 with 32GB of RAM. I'm not even concerned about analyzing the data; that speed is similar. The concern is just with streaming. On Ubuntu, the time to stream a 200MB file into the program was 4.16 seconds compared to 62.22 seconds on Windows 10. The larger files only increase that gap from what I've measured. Data of this size or larger is run around 50 times/day. I've run these tests numerous times on different machines and the results are similar.

I've had this discussion on different forums and the consensus was that ext4 just ran quicker than NTFS. That's what I've gathered. I was referred to a site (https://www.phoronix.com/forums/forum/software/general-linux-open-source/1468-performance-of-filesystems-compared-includes-reiser4-and-ext4?1765-PERFORMANCE-OF-FILESYSTEMS-COMPARED-(includes-Reiser4-and-Ext4)_=), and apparently ext4 is faster.

I've never been able to get an answer on whether it's possible to speed up the process, but I've tried everything. Even being able to get Windows 10 to read from an ext4 drive would work wonders.
> Even being able to get Windows 10 to read from an ext4 drive would work wonders.

Ext2Fsd https://en.wikipedia.org/wiki/Ext2Fsd
Thanks for the link. I've tried that before. It recognizes that there's an ext partition, but it's unable to access it for some reason. I feel like I'm doing something wrong, but maybe I can figure it out and get something working.
> On Ubuntu, the time to stream a 200MB file into the program was 4.16 seconds compared to 62.22 seconds on Windows 10.
> The larger files only increase that gap from what I've measured.
So what about the intermediate steps I talked about?

You see, I also have a Windows 10 running on an SSD with NTFS.
? dir 500
22/05/2019  07:29       524,288,000 500

? type foo.cmd
@echo off
echo %TIME%
copy /b 500 another_500
echo %TIME%

? foo.cmd
 7:36:56.00
        1 file(s) copied.
 7:36:59.48

The raw speed of the file system is not the issue. I can copy a half-GB file in a little over three seconds, and I rather suspect that your system could achieve something similar.

So the extra minute is being burnt somewhere else, and that somewhere else is likely some step within the code you have.

You need to start getting some quantitative profile data to make an informed decision as to how to proceed. Simply assuming "oh, it must be the file system" and attempting to glue in a windows driver to read ext4 isn't going to solve the problem.

some more numbers... I have a rotating disk with windows 10 on a cheap laptop:

dumb C++ write of 1/2 GB to disk with ofs::write: 4.21 seconds (would memory-mapped I/O etc. go even faster?). Note this is not directly comparable to the copies below, because it did not have to read the data first; it already has the data in hand, which is presumably the state your code is in.
windows copy: just over 6 seconds (using the console timer as above)
xcopy: just over 6 seconds, about the same
cygwin's cp: ~10 seconds

Looks like you may want to write your own copy program, or write the data to disk in binary block format in C++. Regardless, if this rotating disk can do it in 4.2 seconds, that SSD really should move it right along.
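
For reference, a minimal sketch of that kind of raw binary block write with ofstream::write (buffer size and file name are arbitrary; the data is assumed to already be in memory):

#include <cstddef>
#include <fstream>
#include <vector>
using namespace std;

int main()
{
    // Write ~512 MB to disk in one binary block, as in the ofs::write timing above.
    const size_t bytes = 512u * 1024 * 1024;
    vector<char> block(bytes, 'x');           // data already in hand, per the post above

    ofstream ofs("big.bin", ios::binary);
    ofs.write(block.data(), static_cast<streamsize>(block.size()));
}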

if you have a HIGH MEMORY system, you COULD use a ram disk if the data is volatile (that is, you write it, something reads it, you delete it?) and never touch the disk at all.

It has also been shown that light compression or hardware compression is faster than writing big data to disk for many applications. Can you compress and write less data faster than you can write the full file, and is that useful? You could also test Windows' folder compression on the target folder; try it both on and off.
> On Ubuntu, the time to stream a 200MB file into the program was 4.16 seconds compared to 62.22 seconds on Windows 10.
200MB in 62.22 seconds is 3.2MB/s. Mass storage hasn't been that slow since the early 90's if my memory serves. There's definitely something else going on here.

One simple thing to try: use a larger stream buffer.
OK, test results time.
#include <iostream>
#include <fstream>
#include <iomanip>
#include <string>
#include <vector>
#include <chrono>
using namespace std;

struct Trades {
  unsigned short int month, day, year, hour, min, seconds;
  double open, high, low, close;
  unsigned int volume;
  string name;
  string date;
  string time;
  string operation;
};

void createData1 (ifstream &dataFile, vector<Trades> &matrix)
{
  if (dataFile.is_open())
  {
    string line;
    int linecount = 0;
    // Minimum time it takes to just read the file
    while ( getline(dataFile,line) ) {
      ++linecount;
    }
    cout << linecount << endl;
  }
}

void createData2 (ifstream &dataFile, vector<Trades> &matrix)
{
  unsigned short int month, day, year, hour, min, seconds;
  double open, high, low, close;
  unsigned int volume;
  char delim;

  if (dataFile.is_open())
  {
    int linecount = 0;
    // delta-time it takes to parse the input
    while (dataFile >> month >> delim >> day >> delim >> year >> hour >> delim >> min >> delim >> seconds >>
            delim >> delim >> delim >> open >> delim >> high >>
            delim >> low >> delim >> close >> delim >> volume)
    {
      ++linecount;
    }
    cout << linecount << endl;
  }
}

void createData3 (ifstream &dataFile, vector<Trades> &matrix)
{
  unsigned short int month, day, year, hour, min, seconds;
  double open, high, low, close;
  unsigned int volume;
  char delim;

  if (dataFile.is_open())
  {
    while (dataFile >> month >> delim >> day >> delim >> year >> hour >> delim >> min >> delim >> seconds >>
            delim >> delim >> delim >> open >> delim >> high >>
            delim >> low >> delim >> close >> delim >> volume)
    {
      matrix.push_back({month, day, year, hour, min, seconds, open, high, low, close, volume});
    }
    cout << matrix.size() << endl;
  }
}

int main()
{
  using namespace std::chrono;
  {
    vector<Trades> matrix;
    ifstream fin("5m_records.csv");
    auto t0 = steady_clock::now();
    createData1(fin,matrix);
    auto t1 = steady_clock::now();
    auto d = duration_cast<duration<double>>(t1-t0);
    cout << d.count() << " seconds" << endl;
  }
  {
    vector<Trades> matrix;
    ifstream fin("5m_records.csv");
    auto t0 = steady_clock::now();
    createData2(fin,matrix);
    auto t1 = steady_clock::now();
    auto d = duration_cast<duration<double>>(t1-t0);
    cout << d.count() << " seconds" << endl;
  }
  {
    vector<Trades> matrix;
    ifstream fin("5m_records.csv");
    auto t0 = steady_clock::now();
    createData3(fin,matrix);
    auto t1 = steady_clock::now();
    auto d = duration_cast<duration<double>>(t1-t0);
    cout << d.count() << " seconds" << endl;
  }
}


Data generated with
 
perl -e 'print "5,21,2019 17,30,21,,,1000.0,1100.0,900.0,1000.0,1000000\n" x (1024*1024*5);' > 5m_records.csv


Ubuntu/G++/Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz/16GB Memory

$ g++ -std=c++11 bar.cpp
$ ./a.out 
5242880
0.265614 seconds
5242880
6.4865 seconds
5242880
7.76307 seconds
$ g++ -O2 -std=c++11 bar.cpp
$ ./a.out 
5242880
0.197165 seconds
5242880
6.24965 seconds
5242880
7.24447 seconds


Windows/MSVC/Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz/8GB Memory

$ cl /EHsc bar.cpp
Microsoft (R) C/C++ Optimizing Compiler Version 19.00.24234.1 for x86
Copyright (C) Microsoft Corporation.  All rights reserved.

bar.cpp
Microsoft (R) Incremental Linker Version 14.00.24234.1
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:bar.exe
bar.obj

$ bar.exe
5242880
38.5177 seconds
5242880
91.8235 seconds
5242880
107.3821 seconds

$ cl /EHsc /O2 bar.cpp
Microsoft (R) C/C++ Optimizing Compiler Version 19.00.24234.1 for x86
Copyright (C) Microsoft Corporation.  All rights reserved.

bar.cpp
Microsoft (R) Incremental Linker Version 14.00.24234.1
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:bar.exe
bar.obj

$ bar.exe
5242880
3.56811 seconds
5242880
41.9356 seconds
5242880
43.8617 seconds


There seems to be something going on in the Microsoft stream handling.
The delta-t from parsing the input to also pushing it into the vector is fairly minimal.

Optimisation helps, but is still nowhere near the Ubuntu tests.
So with this, am I to assume that there's something I can do with the system/code? Or is Linux the option I should take to stream this data?
@dhayden

What would be a larger stream buffer that I could use?
Is it the OS or the compiler? What happens if you do that with a different compiler, but on Windows? I can't run perl on this system :(
> What would be a larger stream buffer that I could use?

You can do it like this, but it made very little difference for me. I tried with 1MB and 10MB buffers:
char buf[1024*1024];
...
  ifstream fin;
  fin.rdbuf()->pubsetbuf(buf, sizeof(buf));  // set the buffer before opening the stream
  fin.open("5m_records.csv");


I think the bottom line is that the C++ extraction operators are just slow. Look at the definition in the reference section and you'll find that they have to do a surprising amount of stuff.

Using fscanf() on my system ran 3x faster. Here is a modified version of salem c's test program:
#include <iostream>
#include <fstream>
#include <iomanip>
#include <string>
#include <vector>
#include <chrono>
#include <cstdio>
using namespace std;

char buf[1024*1024*10];

struct Trades {
  unsigned short int month, day, year, hour, min, seconds;
  double open, high, low, close;
  unsigned int volume;
  string name;
  string date;
  string time;
  string operation;
};

void createData1 (ifstream &dataFile, vector<Trades> &matrix)
{
  if (dataFile.is_open())
  {
    string line;
    int linecount = 0;
    // Minimum time it takes to just read the file
    while ( getline(dataFile,line) ) {
      ++linecount;
    }
    cout << linecount << endl;
  }
}

void createDataC()
{
  unsigned short int month, day, year, hour, min, seconds;
  double open, high, low, close;
  unsigned int volume;
  int linecount = 0;
  FILE *fp = fopen("5m_records.csv", "r");
  while (fscanf(fp, "%hu%*c%hu%*c%hu%*c%hu%*c%hu%*c%hu%*c%*c%*c%lf%*c%lf%*c%lf%*c%lf%*c%u",
	       &month, &day, &year, &hour, &min, &seconds,
		&open, &high, &low, &close,
		&volume) == 11) {
      ++linecount;
  }
  fclose(fp);
  cout << linecount << '\n';
}

void createData2 (ifstream &dataFile, vector<Trades> &matrix)
{
  unsigned short int month, day, year, hour, min, seconds;
  double open, high, low, close;
  unsigned int volume;
  char delim;

  if (dataFile.is_open())
  {
    int linecount = 0;
    // delta-time it takes to parse the input
    while (dataFile >> month >> delim >> day >> delim >> year >> hour >> delim >> min >> delim >> seconds >>
            delim >> delim >> delim >> open >> delim >> high >>
            delim >> low >> delim >> close >> delim >> volume)
    {
      ++linecount;
    }
    cout << linecount << endl;
  }
}

void createData3 (ifstream &dataFile, vector<Trades> &matrix)
{
  unsigned short int month, day, year, hour, min, seconds;
  double open, high, low, close;
  unsigned int volume;
  char delim;

  if (dataFile.is_open())
  {
    while (dataFile >> month >> delim >> day >> delim >> year >> hour >> delim >> min >> delim >> seconds >>
            delim >> delim >> delim >> open >> delim >> high >>
            delim >> low >> delim >> close >> delim >> volume)
    {
      matrix.push_back({month, day, year, hour, min, seconds, open, high, low, close, volume});
    }
    cout << matrix.size() << endl;
  }
}

int main()
{
  using namespace std::chrono;
  ifstream fin;
  fin.rdbuf()->pubsetbuf(buf, sizeof(buf));
  fin.open("5m_records.csv");
  {
    vector<Trades> matrix;
    fin.seekg(0);
    auto t0 = steady_clock::now();
    createData1(fin,matrix);
    auto t1 = steady_clock::now();
    auto d = duration_cast<duration<double>>(t1-t0);
    cout << d.count() << " seconds" << endl;
  }
  {
    vector<Trades> matrix;
    cout << "Reading with C library\n";
    auto t0 = steady_clock::now();
    createDataC();
    auto t1 = steady_clock::now();
    auto d = duration_cast<duration<double>>(t1-t0);
    cout << d.count() << " seconds" << endl;
  }
  {
    vector<Trades> matrix;
    auto t0 = steady_clock::now();
    fin.clear();
    fin.seekg(0);
    createData2(fin,matrix);
    auto t1 = steady_clock::now();
    auto d = duration_cast<duration<double>>(t1-t0);
    cout << d.count() << " seconds" << endl;
  }
  {
    vector<Trades> matrix;
    fin.clear();
    fin.seekg(0);
    auto t0 = steady_clock::now();
    createData3(fin,matrix);
    auto t1 = steady_clock::now();
    auto d = duration_cast<duration<double>>(t1-t0);
    cout << d.count() << " seconds" << endl;
  }
}

5242880
0.371693 seconds
Reading with C library
5242880
5.38509 seconds
5242880
18.435 seconds
5242880
18.5083 seconds

@lumbeezl, Sorry I didn't notice this earlier.

File I/O is one of Windows' Achilles' heels. @dhayden is correct on this point in the previous post.

Put simply, the fastest I/O on disk in Windows is to memory map the file. There is no faster way on Windows. Google gives the approach, and I think boost has a library you might want to look into.

There are vast differences in implementation between ext4 (and ext3 and ext2) and NTFS that make these file systems generally better, faster, safer, and more efficient, but that's not really what's killing performance. It's Windows itself, and the way libraries use Windows for file I/O (not really related to the format of the filesystem).

When you map the file, the file appears as a huge block of RAM. Your file I/O, at that point, looks like sequencing through a block of RAM. The OS satisfies the need for more data through page faults, which sounds like it should be slow (requesting a pointer on memory not backed by data read thus far, generating a fault the OS handles by filling the data from disk, then allowing the faulted instruction to resume). It is, however, fast. It is fundamental to the OS. You may never have realized it, but it is fundamental to the execution of a program from disk.

When software begins executing (and I oversimplify for brevity here), in essence the program begins to execute before it's even loaded from disk. This sounds counter intuitive, maybe even impossible. However, the attempt to execute an instruction from memory that has not been backed by data read from disk generates a page fault, which is satisfied by the virtual memory system (as it does for memory mapped files), which then fills a page with executable code from storage, and the program is allowed to continue. Eventually (as in microseconds) execution will again hit a location in memory that was not yet loaded from disk, issuing another page fault, causing another vm reaction to fill that page, and execution continues. Doing this means that the code you never actually execute is not loaded into RAM (on page boundaries), making it space efficient. Overall, this is faster than waiting for code to load in RAM (especially code that may never be executed), and THEN starting to execute.

You have touched on the main point, though. Increasing the buffer size tends to improve the efficiency of the underlying methods used for 'standard' file I/O in C and C++, but memory mapped files are what you really want. It can reach performance levels on par with Linux, but Linux (and UNIX) are just generally better at this.
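
To make that concrete, here is a minimal sketch using Boost.Iostreams (one library that wraps the platform-specific mapping calls); it maps the test file from the earlier posts and simply counts newlines by walking the mapped region, with error handling omitted:

#include <boost/iostreams/device/mapped_file.hpp>
#include <cstddef>
#include <iostream>

int main()
{
    // Map the whole file; its contents then appear as one block of memory,
    // paged in on demand by the OS as the pointer walks forward.
    boost::iostreams::mapped_file_source src("5m_records.csv");

    const char* p   = src.data();
    const char* end = p + src.size();

    std::size_t lines = 0;
    for (; p != end; ++p)
        if (*p == '\n')
            ++lines;

    std::cout << lines << " lines, " << src.size() << " bytes\n";
}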
I don't think memory mapping the file will help much.
The problem isn't moving data off disk, as shown by the raw copy command results.

I also added @dhayden's code to mine, to compare fgets() with getline() and sscanf with the extractors.

$ cl /EHsc  bar.cpp
$ bar.exe
C fgets, 5242880 records, 2.00438 seconds
C fgets+sscanf, 5242880 records, 13.4581 seconds
C++ getline, 5242880 records, 40.5704 seconds
C++ extractors, 5242880 records, 92.5081 seconds

$ cl /EHsc /O2 bar.cpp
$ bar.exe
C fgets, 5242880 records, 2.00834 seconds
C fgets+sscanf, 5242880 records, 13.5875 seconds
C++ getline, 5242880 records, 3.56614 seconds
C++ extractors, 5242880 records, 41.97 seconds

Getting the file into the lower levels of the code isn't the problem; fgets() can munch through millions of records a second. The elephant in the room would seem to be the stream extractors.

Not sure what's happening with the optimised C++ getline though. It seems a lot quicker than it was yesterday - weird.
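
(Not something benchmarked in this thread, but given those numbers one option worth sketching is to keep the fast fgets path and hand-roll the field conversions with strtoul/strtod, skipping both the extractors and the scanf format machinery. A sketch, assuming the same comma/space-delimited record layout as the generated test data:)

#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <vector>

struct Rec {
    unsigned month, day, year, hour, min, sec;
    double open, high, low, close;
    unsigned long volume;
};

int main()
{
    std::FILE* fp = std::fopen("5m_records.csv", "r");
    if (!fp) return 1;

    std::vector<Rec> rows;
    char line[256];

    while (std::fgets(line, sizeof line, fp))
    {
        char* p = line;
        Rec r;
        // Fields are separated by single characters (',' or ' '), so step
        // over one delimiter after each conversion; ",,," needs three.
        r.month = std::strtoul(p, &p, 10); ++p;
        r.day   = std::strtoul(p, &p, 10); ++p;
        r.year  = std::strtoul(p, &p, 10); ++p;
        r.hour  = std::strtoul(p, &p, 10); ++p;
        r.min   = std::strtoul(p, &p, 10); ++p;
        r.sec   = std::strtoul(p, &p, 10); p += 3;   // skip the two empty fields
        r.open  = std::strtod(p, &p); ++p;
        r.high  = std::strtod(p, &p); ++p;
        r.low   = std::strtod(p, &p); ++p;
        r.close = std::strtod(p, &p); ++p;
        r.volume = std::strtoul(p, &p, 10);
        rows.push_back(r);
    }
    std::fclose(fp);
    std::cout << rows.size() << " records\n";
}
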
@salem c

The problem isn't moving data off disk, as shown by the raw copy command results.



Think about that for a second in comparison to what I wrote, and realize that the operating system's copy command is using the OS's base method: memory-mapped operation.

As I pointed out, nearly everything you do with files on Windows is ultimately implemented as memory mapped file operations. File buffering, like that of fwrite and fread for example, is something implemented on top of that, as are the other interfaces mentioned (from fgets to getline). Stream operators are implemented, as you see, particularly inefficiently here.

What I'm trying to say is we're agreeing on the basis more than your post indicates - the file system isn't the problem, and Windows' ability to move data onto and off the disk isn't the problem, nor is NTFS vs ext4 (though there are other issues with NTFS that make it very slow, as in large drives with small allocation blocks without much disk cache configured).

The real problem is the layer between what the OS does under the hood to move data (which works well enough) and the C or C++ code written to interface with it.

If you haven't tried memory-mapped file operations, you're missing out. Ultimately it ends up fairly easy, can be written portably if a library is used to abstract the specifics of mapping (the concept is nearly identical on Linux and Windows), and it eliminates all the 'middleware' one may have issues with, like fgets/getline/etc. When you trade those for methods of handling buffers, file I/O can be very straightforward. Testing the results would show it works fast.
Maybe so, but the elephant in the room is the god-awful slowness of parsing the data once it's in memory.

Shaving a few µs off the time by memory mapping the file directly ain't gonna change that.