Fast way to read content

Hello,
I am trying to make a graphing program that reads a file and draws a graph from it.

How can code read a long text file and store it in RAM so that it can be accessed quickly by date and time?
Or should the code read the long file line by line and graph straight from disk?

It would be best if the data could be accessed from the middle, not from beginning to end every time.

I previously made a program that collects data like this to a file. Some files have multiple values per second and more than 100,000 lines.

2014.07.30  17:31:10	30.2
2014.07.30  17:46:10	30.1
2014.07.30  18:01:10	29.4
2014.07.30  18:16:10	29.7
2014.07.30  18:31:10	28.7
2014.07.30  18:46:10	28.3
2014.07.30  19:01:10	27.5
2014.07.30  19:16:10	27.5
2014.07.30  19:31:10	27
2014.07.30  19:46:10	26.8
2014.07.30  20:01:10	26.2
2014.07.30  20:16:10	25.9



What storage options should I consider so that the correct time can be found quickly?
How are you graphing this data?

Are you using some sort of graphing package?
Or are you creating a simple text histogram?
Is the data always in timestamp order on the disk?

If the data on disk is in timestamp order, I see no reason to read it all into memory. I would implement a read loop with a simple timestamp filter, i.e. read a record; if the timestamp passes the filter, either create the histogram row for the timestamp or pass the timestamp and value to the graphing package. If the timestamp doesn't pass the filter, discard the record.

If the data on disk is not in timestamp order, then you probably want to read the data in and sort it before plotting it.

If the data is in a text file, there isn't really any "find it fast" solution. You're going to be constrained by the time it takes to read the entire file.

edit: If you're really concerned about the time it takes to locate a starting point in the file, you could do a crude bisecting search of the file, i.e. get the eof position and seek to eof/2. Scan for the next '\n'. Read the next row. Continue bisecting the file until you find a row that is within <interval> minutes of your desired starting time.
Well, scanning element by element from the start every time would be slow. What you could do is first get the number of lines in the whole file and set the index to the line in the middle. From there, check whether the value you are looking for is higher or lower, and depending on which, go halfway towards the front or the end. Do that once more, halving the remaining range again (or more times, depending on the speed you need), then parse through to find the required date. That way you aren't searching through the entire file, but you also aren't writing an overly complex algorithm just to get close.
It might be more complex than I imagine at this point, as I am a beginner in C++, but I hope to draw the data as lines and bars using OpenGL. So far I have done some tests aligning bars.

Can getline read a line from a stringstream into a string, or should I convert the stringstream to a string and then read the line?

1
2
3
4
5
6
7
8
9
10
11
12
  
stringstream buffer;
string currentline;

       ifstream rmyfile ("C:\\WWW\\test.txt");  // backslashes must be doubled in a string literal
       if (rmyfile.is_open())
       {
          buffer << rmyfile.rdbuf();
         rmyfile.close();
       }
 currentline = getline( buffer, 1 ); // how to read stringstream line to string, if it is possible this way?
If someone knows how to read lines by line number from a stringstream, please tell me. Or should the file be read into a string array?
You didn't answer this question from my previous post:
abstractionanon wrote:
Is the data always in timestamp order on the disk?


Line 8 works, but it is worth knowing what it does. rdbuf() returns a pointer to the file's streambuf. The << operator taking a streambuf pointer will:
Retrieve as many characters as possible from the input sequence controlled by the stream buffer object pointed by sb (if any) and inserts them into the stream, until either the input sequence is exhausted or the function fails to insert into the stream.
http://www.cplusplus.com/reference/ostream/ostream/operator%3C%3C/

For an ifstream the input sequence is the whole file (the streambuf refills itself from disk as needed), so this one statement copies the entire file into the stringstream.

I see no reason to read the entire file into a stringstream. That is what your ifstream is for. The ifstream will continue to return data until the entire file has been read doing physical reads from the disk as needed.

q139 wrote:
Can getline read line from stringstream to string

Yes, however, your line 11 is bogus. There are two variants of getline.
1) istream::getline which reads into a character array and returns an istream reference.
http://www.cplusplus.com/reference/istream/istream/getline/
2) getline (string) which reads from an istream into a string and also returns an istream reference.
Neither returns a string. What you want is this:
 
  getline (buffer, currentline);


Since the file consists of variable length text records, there is no easy way to position into the file by line number. Since it does not appear that you have a line number in the record on disk, the bisecting search approach I outlined above won't find a row by line number (it can only locate a row by its timestamp). Now if you were to change the file format to a fixed record size, that would somewhat simplify the bisecting search I outlined by eliminating the need to scan for the start of the next record.

I would not worry about the time it takes to read 100,000 records. That's less than a 3MB file given the ~26 byte record format you've shown above.

I would simply read records from the file as follows:
 
  string date, time;
  double data;
  while (getline(myfile, currentline))
  {
      stringstream ss(currentline);       // wrap the line so it can be parsed with >>
      if (!(ss >> date >> time >> data))
          continue;                       // skip blank or malformed lines
      if (filter(date, time))             // filter() is your own timestamp test
      {
          // Create datapoint on graph
      }
  }

Thank you for taking time to answer properly.

I was hoping for code that reads data based on line number.
It appears to be more complex than I imagined, and Google has not helped yet.
I will use this code for further drawing tests, but I think it would be easier to align the time axis if all lines could be accessed by number and the last line number is known.
Then implementing zoom and redraw would be easier.

Thanks again for all the help. Finishing this idea properly may take me a very long time.
Since you now mention that you want to support zooming, I'll change my answer regarding not storing the data in memory. I had initially assumed read once and display "on-the-fly" was all you needed to do since you hadn't said otherwise.

To be able to redraw the graph from the data at different scales and possibly for different time windows, I would certainly want the data in memory.

You have a couple of choices:
1) The simplest is a std::vector of structs. The struct would contain the timestamp and the data. You can refer to a specific record by its record number (its index in the vector). This would entail searching the entire vector for the records you want to display (unless you know the specific record numbers).
2) A std::map<timestamp, double>. This would allow efficient ordered access based on the desired starting timestamp. This would be my choice.

If you're drawing graphs with timestamps along the x axis, zooming the x axis is not trivial. You need to adjust the spacing and the units of the X axis labels. i.e. One label per day? One label every hour? Every 10 minutes? etc. How many of each fit on the width of the graph?
You're going to want to store your timestamp in a form that can be easily manipulated to determine day, hour, minute, etc.
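
One possible form, sketched here as an assumption rather than a recommendation from the thread, is a single count of seconds since 1970-01-01, which makes differences, day boundaries and hour boundaries simple integer arithmetic. The helper names `daysFromCivil` and `toSeconds` are made up; the day count uses a commonly used closed-form formula for the Gregorian calendar, and time zones and daylight saving are ignored entirely.

```cpp
#include <cstdio>
#include <string>

// Days between 1970-01-01 and the given Gregorian calendar date.
long long daysFromCivil(int y, int m, int d)
{
    y -= (m <= 2);  // treat January and February as part of the previous year
    const long long era = (y >= 0 ? y : y - 399) / 400;
    const int yoe = static_cast<int>(y - era * 400);        // year of era, 0..399
    const int mp  = (m > 2) ? m - 3 : m + 9;                // March = 0
    const int doy = (153 * mp + 2) / 5 + d - 1;             // day of year, 0..365
    const int doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;  // day of era
    return era * 146097 + doe - 719468;
}

// Parses "YYYY.MM.DD" and "HH:MM:SS" into seconds since 1970-01-01 00:00:00.
// Returns false if either field does not match the expected layout.
bool toSeconds(const std::string& date, const std::string& time, long long& out)
{
    int y, mo, d, h, mi, s;
    if (std::sscanf(date.c_str(), "%d.%d.%d", &y, &mo, &d) != 3) return false;
    if (std::sscanf(time.c_str(), "%d:%d:%d", &h, &mi, &s) != 3) return false;
    out = daysFromCivil(y, mo, d) * 86400LL + h * 3600 + mi * 60 + s;
    return true;
}
```

With this form, `t / 86400` is the day number, `(t % 86400) / 3600` is the hour of the day, and two timestamps can be subtracted directly.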

Thanks a lot, maps are very useful. Now all lines can be mapped by number and the content should be accessible very quickly.

I think all timestamps for the bottom axis can be calculated from the start time, the end time and the resolution, if the chart data is drawn in linear steps.

Maps have given me a lot of hope to go further with my current knowledge.
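
A rough sketch of that calculation, assuming timestamps are held as non-negative seconds and the axis is linear. The function names and the list of candidate label intervals are made up for this example; `maxLabels` is how many labels fit across the chart width and is assumed to be positive.

```cpp
#include <cstddef>
#include <vector>

// Picks the smallest "round" interval that yields at most maxLabels labels.
long long pickLabelStep(long long spanSeconds, int maxLabels)
{
    static const long long steps[] = {60, 300, 600, 900, 1800,
                                      3600, 7200, 21600, 43200, 86400};
    for (std::size_t i = 0; i < sizeof(steps) / sizeof(steps[0]); ++i)
        if (spanSeconds / steps[i] <= maxLabels)
            return steps[i];
    // Very long spans: fall back to a whole number of days.
    return 86400 * ((spanSeconds / 86400) / maxLabels + 1);
}

// Label times: the multiples of 'step' that fall inside [start, end].
std::vector<long long> labelTimes(long long start, long long end, long long step)
{
    std::vector<long long> out;
    const long long first = ((start + step - 1) / step) * step;  // round up
    for (long long t = first; t <= end; t += step)
        out.push_back(t);
    return out;
}

// Maps a time to an x coordinate on a chart 'width' units wide (end > start).
double timeToX(long long t, long long start, long long end, double width)
{
    return (t - start) * width / static_cast<double>(end - start);
}
```

Zooming then amounts to changing start and end, recomputing the step, and redrawing the labels and data points with timeToX.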