How to skip lines in file without using a loop?

Hi
A really quick one: I have a data file and I want to start reading from line n. The thing is, it's a big file and using a loop might be slow. So is there a way to do this:

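#include <fstream>
#include <iostream>
#include <string>

int main()
{
    std::ifstream inFile("file.dat");
    std::string line;
    int n = 100;
    // skip the first n lines by reading and discarding them
    for (int i = 0; i < n; i++)
    {
        std::getline(inFile, line);
    }

    std::cin.get();
    return 0;
}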


Only without using a loop: is there something like a function that jumps straight to the wanted line?

Thanks! :)
In theory it is possible using seekg(), see:

http://www.cplusplus.com/reference/istream/istream/seekg/

That would require, though, that all lines have a known length.
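A minimal sketch of the seekg() idea, assuming every line was written padded to the same width. `LineWidth` (which counts the '\n') and `readFixedLine` are hypothetical names chosen for illustration, not library functions:

```cpp
#include <fstream>
#include <string>

// Assumption: every line in the file is exactly LineWidth bytes, including
// its '\n' (e.g. padded with spaces when the file was written).
constexpr std::streamoff LineWidth = 16;

// Jumps straight to zero-based line n and reads it.
// Returns false if that line does not exist.
// Binary mode so the byte offset matches what is actually on disk.
bool readFixedLine(const std::string& path, std::streamoff n, std::string& out)
{
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;
    in.seekg(n * LineWidth);                 // offset = line index * line width
    return static_cast<bool>(std::getline(in, out));
}
```

With that layout, line n always starts at byte n * LineWidth, so none of the lines before it need to be read.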
How big is this file, really? Is it more than half your RAM? Are you in control of this file, or is it provided to you? If you are in control, the above ^^^ is the way to go: pad the lines out to a fixed width and jump straight to the one you want. It makes the file larger on disk, but it lets you process it more efficiently. You can use OS folder compression if disk space is an issue.

If you are not in control of the file (it is provided), then you should look at memory-mapped files. With those, a while(x++ < linenum) getline(..) loop should not be 'that' slow (it's slower than jumping around, of course).

If you need to start at line N over and over for many different values of N, you should set up a structure in your program that lets you jump to the nth line instantly, like a vector of strings or a vector of char* offsets into the file (a pointer to the start of each line).
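A rough sketch of such a structure. Memory-mapping needs OS-specific calls, so this swaps in reading the whole file into a std::string once; it stores offsets (indices) rather than raw char* pointers so they stay valid if the string is moved. `LineIndex` is a hypothetical name:

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: the whole file in memory plus the offset where each
// line starts, so any line can be reached without rescanning.
struct LineIndex {
    std::string text;                 // entire file contents
    std::vector<std::size_t> starts;  // starts[i] = offset of line i within text

    explicit LineIndex(const std::string& path) {
        std::ifstream in(path, std::ios::binary);
        text.assign(std::istreambuf_iterator<char>(in),
                    std::istreambuf_iterator<char>());
        starts.push_back(0);                       // line 0 starts at the beginning
        for (std::size_t i = 0; i + 1 < text.size(); ++i)
            if (text[i] == '\n')
                starts.push_back(i + 1);           // a new line starts after each '\n'
    }

    // Returns line n (zero-based) without its '\n', or "" if n is out of range.
    std::string line(std::size_t n) const {
        if (n >= starts.size()) return {};
        std::size_t b = starts[n];
        std::size_t e = text.find('\n', b);
        if (e == std::string::npos) e = text.size();
        return text.substr(b, e - b);
    }
};
```

This trades memory for speed, so it only makes sense when the file comfortably fits in RAM.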
Something like this might be fast enough.

#include <iostream>
#include <fstream>
#include <limits>

const auto StreamMax = std::numeric_limits<std::streamsize>::max();

const int LinesToSkip = 100;

int main()
{
    std::ios_base::sync_with_stdio( false ); // syncing with C's stdio can slow things down; turn it off

    std::ifstream in( "longfile.txt" );

    for( int i = 0; i < LinesToSkip; ++i )
        in.ignore( StreamMax, '\n' );

    // ...
}

Hi @coder777, @jonnin and @dutch!

I don't have control over my file (it's given). The file contains the data about the map in the game I'm making, specifically the assets in the map, so it can be very small or very big. I'm writing a function that loads the relevant assets from the map file, and I don't want to scroll through the whole file just to load an asset at the end of it.
I might use the loop method, but I just want to check if there is a better way.

Thanks @dutch, I'll try your way if I can't find a non-loop one.

Thanks again for the help! :D
A "non-loop" version is impossible unless your lines are all exactly the same length. If they aren't the same length then you are left counting newlines.
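If the lines really must stay variable-length, the newline counting can at least be done in big blocks instead of one getline() per line. A sketch, assuming the file is opened in binary mode; `offsetOfLine` and the 64 KiB block size are illustrative choices, and it is still a loop, just a potentially cheaper one:

```cpp
#include <algorithm>
#include <fstream>
#include <vector>

// Returns the byte offset just past the n-th newline (where zero-based line n
// starts), or -1 if the file has fewer than n newlines. Open `in` in binary
// mode so offsets are plain byte counts. The stream's error flags are cleared
// before returning so the caller can seekg() to the result.
std::streamoff offsetOfLine(std::istream& in, long long n)
{
    in.clear();
    in.seekg(0);
    if (n <= 0) return 0;
    std::vector<char> buf(1 << 16);           // 64 KiB blocks (arbitrary size)
    std::streamoff base = 0;                  // file offset of buf[0]
    long long seen = 0;                       // newlines counted so far
    while (in.read(buf.data(), static_cast<std::streamsize>(buf.size())) || in.gcount() > 0) {
        const char* begin = buf.data();
        const char* end = begin + in.gcount();
        for (const char* p = std::find(begin, end, '\n'); p != end; p = std::find(p + 1, end, '\n'))
            if (++seen == n) {
                in.clear();
                return base + (p - begin) + 1;  // line n starts after this '\n'
            }
        base += in.gcount();
    }
    in.clear();
    return -1;
}
```

It is worth timing this against the ignore() version above on a real map file rather than assuming it wins.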

Test the above and let me know if it's not fast enough.

BTW, the sync_with_stdio call in my example only helps with the standard streams, so it shouldn't make a difference here (and is therefore not needed).
If the file is relatively unchanging, then it may be worth reading the whole file once and recording the tellg() position (https://www.cplusplus.com/reference/istream/istream/tellg/) of the start of each line.

To jump to the Nth line later, you just look up the previously recorded tellg() position and pass it to seekg().

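vector<streampos> lines;                // lines[i] = position where line i starts
ifstream inf("file");
lines.push_back(inf.tellg());           // line 0 starts at the beginning
string s;
while( getline(inf,s) ) lines.push_back(inf.tellg());  // start of the next line (the last entry is the end, or -1 if there is no final '\n')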
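A fuller sketch of the same idea, assuming binary mode: take tellg() before each getline() so only real line starts are stored, then seekg() to a stored position later. Taking the position before the read means a -1 from tellg() at end of file is never stored. `buildLineIndex` and `readLine` are hypothetical names:

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// One pass over the file: remember where each line starts.
std::vector<std::streampos> buildLineIndex(std::istream& in)
{
    std::vector<std::streampos> starts;
    std::string s;
    for (std::streampos pos = in.tellg(); std::getline(in, s); pos = in.tellg())
        starts.push_back(pos);          // pos was taken just before this line was read
    in.clear();                         // getline() failed at EOF; reset the state
    return starts;
}

// Later: jump straight to line n (zero-based) using the stored position.
// In binary mode, "\r\n" files leave a '\r' at the end of each line; strip it.
bool readLine(std::istream& in, const std::vector<std::streampos>& starts,
              std::size_t n, std::string& out)
{
    if (n >= starts.size()) return false;
    in.clear();
    in.seekg(starts[n]);
    if (!std::getline(in, out)) return false;
    if (!out.empty() && out.back() == '\r') out.pop_back();
    return true;
}
```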
> I don't want to scroll through the whole file just to load an asset at the end of the file.

What does that mean? You can find the end position instantly. If you want to append to the file, you don't even need to find it yourself: open it for appending and C++ will set the position to the end for you. It's reading something in the middle that takes either a brute-force search or a pre-known offset or the like. Anything in the middle of the file has to be located the hard way.
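For the end-of-file case, a sketch with hypothetical helper names: seekg(0, std::ios::end) followed by tellg() gives the size without reading anything, and std::ios::app makes every write go to the current end of the file:

```cpp
#include <fstream>
#include <string>

// Size of the file in bytes, found by jumping to the end; -1 if it can't be opened.
std::streamoff fileSize(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    if (!in) return -1;
    in.seekg(0, std::ios::end);
    return in.tellg();
}

// With std::ios::app every write lands at the end; no need to locate it first.
bool appendLine(const std::string& path, const std::string& text)
{
    std::ofstream out(path, std::ios::app | std::ios::binary);
    out << text << '\n';
    return static_cast<bool>(out);
}
```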

If you were looking for specific words instead of specific lines, there are tricks for that too. I guess what we are saying is that the more you know about your file in detail, the more tweaks you can sneak into the code. The less you know, the more it will look like iteration.
Hey guys! Thanks for replying!

I don't want to store the whole file in my program; sometimes it would be too big to be practical in terms of loading time (without even considering converting the strings to actual data), so I will stick with just reading from the file.

I think that at the end of the day I'll use the loop method; it's just easier. And if it takes 500 more milliseconds, that isn't that bad.

Thanks for the support
> I don't have control over my file (it's given),
> the file contains the data about the map in the game I'm making,
¿so it's your game but you download the maps?

> it contains the data about the assets in the map
> I don't want to scroll through the whole file just to load an asset at the end of the file.
¿do you mean that you only care about a small portion of the map file?
¿or that you load things on-demand? ¿can't just process the whole file?


it's hard to give a good answer if you don't provide any info.
You need to create a map of the file, recording the position of the information you want.

Then, you only need to map out the exact location of things when the file is modified, which should hopefully only be once when you first create the mapping.

Thereafter, load the mapping, then access the file.

(Remember, in C++, open the file in binary mode in order to locate and seek to the correct location. This means that when you get a line you should strip any stray '\r' characters from the end of the line you read.)
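A small helper along those lines, assuming the file is opened with std::ios::binary; `getlineNoCR` is a hypothetical name. It behaves like std::getline() but drops one trailing '\r', so tellg()/seekg() positions stay exact byte offsets while the text comes out clean:

```cpp
#include <fstream>
#include <istream>
#include <string>

// Like std::getline(), but removes a trailing '\r' left by "\r\n" line endings
// (binary mode does no newline translation, so the '\r' is still there).
std::istream& getlineNoCR(std::istream& in, std::string& line)
{
    if (std::getline(in, line) && !line.empty() && line.back() == '\r')
        line.pop_back();
    return in;
}
```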

Hope this helps.
@Odglog,
What exactly are you doing?
Why do you need to seek to this position in the file?
Does the position change?
Does it change in a way that can be calculated from the changes (short of reading the file over again from the beginning)?
I offer...
#include<iostream>
#include<iomanip>
#include<vector>
#include<string>
#include<fstream>
using namespace std;

int main()
{
  vector<streampos> assets;
  ifstream inf("foo.txt");
  streampos here = inf.tellg();
  string s;
  while( getline(inf,s) ) {
    if( s == "Asset" )
      assets.push_back(here);
    here = inf.tellg();
  }
  cout << "There are " << assets.size() << " assets" << endl;

  cout << "Which one to read?";
  int which;
  if( !(cin >> which) || which < 0 || which >= (int)assets.size() ) {
    cerr << "No such asset" << endl;
    return 1;
  }

  inf.clear();
  inf.seekg(assets[which]);
  while( getline(inf,s) ) {
    cout << s << endl;
    if( s == "End" ) break;
  }

  return 0;
}


Asset
blah
h
blahvvvvvvvvvv
blah
End
Asset
h
ah
lah
blah
End
Asset
blah
End
Asset
blahbblah
End
Asset
blah
h
h
blah
End


$ ./a.out 
There are 5 assets
Which one to read?2
Asset
blah
End
$ ./a.out 
There are 5 assets
Which one to read?4
Asset
blah
h
h
blah
End


If it really is a monstrously long file, then you could even just do the indexing once and then save the assets vector to another file (say "index.txt").
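A sketch of saving and reloading such an index, assuming the offsets came from a file opened in binary mode and are kept as std::streamoff (a std::streampos converts to it). The one-offset-per-line text format and the function names are assumptions:

```cpp
#include <fstream>
#include <string>
#include <vector>

// Write one decimal byte offset per line, e.g. to "index.txt".
bool saveIndex(const std::string& path, const std::vector<std::streamoff>& offsets)
{
    std::ofstream out(path);
    for (std::streamoff off : offsets)
        out << off << '\n';
    return static_cast<bool>(out);
}

// Read the offsets back; later runs can seekg() to them without rescanning.
std::vector<std::streamoff> loadIndex(const std::string& path)
{
    std::vector<std::streamoff> offsets;
    std::ifstream in(path);
    long long off;
    while (in >> off)
        offsets.push_back(off);
    return offsets;
}
```

The index has to be rebuilt whenever the big file changes; stale offsets would point into the wrong place.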
Ok, sorry for not providing enough information.

>so it's your game but you download the maps?

Yes. The game itself takes the map file and turns it into actual gameplay. The map files aren't provided by the game; they will be made in another application I'm making (the level builder).

>do you mean that you only care about a small portion of the map file?

Yes. My maps might end up being big, so I divide each map into chunks; that way I can load only the relevant part of the map (the area around the player, for example). Every time I want to create a chunk in my file, I just add "#Chunk x y", and the program loads all the assets from there until the next chunk.
I also have an array that contains all the positions of the chunks in the file.

Whenever I create a map, I get all the chunk positions and initialize the positions array.

>Does the position change?
Yes, the position I want to jump to depends on which chunk I want to load. If I want to load chunk (0,1), I go to index [1*MapLen+0] of the positions array and get the position (in lines) of chunk (0,1) in the map file.


> I don't want to scroll through the whole file just to load an asset at the end of the file.
Ok, I think I didn't explain myself well. What I mean is that when I want to load a chunk at line 1000 of a 1533-line file, scrolling through the first 1000 lines to get there seems quite inefficient.

I hope that helps, if you need more information tell me.
> The map files aren't provided by the game,
> they will be made in another application I make
if you make the map builder, you do have control over the map file format, so you can make all the chunks the same size

> I also have an array that contains all the positions of the chunks in the file.
so it should be as simple as inFile.seekg(index(x,y));
just store the jump in bytes, instead of in lines
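A sketch of storing byte offsets for the "#Chunk x y" headers, assuming the map file is opened in binary mode and chunks are indexed as y * width + x like the positions array in this thread. `indexChunks` and the -1 "no chunk" marker are hypothetical. One detail: tellg() returns -1 whenever the stream is in a failed state (for example once a read has reached end of file), so the position is taken before each getline() and the state is cleared at the end:

```cpp
#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// offsets[y * width + x] = byte offset of the "#Chunk x y" line, or -1 if absent.
std::vector<std::streamoff> indexChunks(std::istream& map, int width, int height)
{
    std::vector<std::streamoff> offsets(static_cast<std::size_t>(width) * height, -1);
    std::string line;
    // pos is taken before each getline(); a -1 from tellg() at end of file is
    // never stored because the getline() that follows it fails.
    for (std::streampos pos = map.tellg(); std::getline(map, line); pos = map.tellg()) {
        if (line.rfind("#Chunk", 0) != 0) continue;      // not a chunk header
        std::istringstream fields(line.substr(6));       // the " x y" part
        int x, y;
        if (fields >> x >> y && x >= 0 && x < width && y >= 0 && y < height)
            offsets[static_cast<std::size_t>(y) * width + x] = pos;
    }
    map.clear();   // clear the EOF state so later seekg() calls work
    return offsets;
}
```

Loading chunk (x, y) is then `map.seekg(offsets[y * width + x])` followed by ordinary getline() calls until the next "#Chunk" line.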
Sorry for being away for a long time.
Even if I control the line length and use seekg(), I don't know how to store the locations of the "#Chunk" lines in bytes. This is how I store them currently:
	std::vector <int> v;
	int INUM = 0, x, y;// INUM is the number of the line, x and y are the in-world position of the chunk

	while (getline(MapFile, line))
	{
		INUM++;
		if (line.find("#Chunk") != std::string::npos)
		{

			v= extractIntegerWords(line);// Function that returns an int vector out of strings
			x = v[0];
			y = v[1];
			ChunksIDs[y * Lx + x] = INUM;
			std::cout << x << ' ' << y << '\n' << ChunksIDs[y * Lx + x] << '\n';
		}
	}



I tried using tellg(), but it returned -1, so that doesn't really help. Any suggestions?

Thanks!
You don't need to store anything for the fixed-width 'line' technique.

If every line is 100 bytes, line 5 (counting from 0) starts at byte 500, so you use seekg(500): (0-99, 100-199, 200-299, 300-399, 400-499, 500-599 < this is the chunk you want). This is critical: end-of-line characters are bytes too. You are in binary file mode now, not text, so you have to account for them, and an end of line can be 2 bytes or 1, depending on the OS and how the file was created. It would be better not to put end-of-line characters in this file at all, and to move through it as a binary file instead.

How to do it?
If you have "hello world" in a string, you need to pad after the d in "world" out to the size you want. So you need the length of your string; then write the difference to the file as padding. All zero bytes will work.
How do you do that? Make an array/vector of bytes that represents an empty record (let's call them records, not lines, from now on!). So that would be char mt[100] = {0};
Now, after writing hello world (say it is in the string hw), you can write the padding: file.write(mt, 100-hw.length()) (if I did that right; check the math, and check the file in a hex editor as well).

Note that we are talking about 1-byte characters at the moment. If you need to use something else here, you need the sizes in bytes when you call write().

Make a main program. Open a file in binary mode and try to do what I just said; put 10 or so records in it. Then reopen it for reading in binary mode and get the first, last, and a couple of random middle records using seekg(). Get that working, understand it, and apply it to your real programs once you 'get' it. It's not that hard, but it's going to take 30 minutes of hands-on work to get a feel for it the first time.
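One way the exercise might look, as a sketch: 100-byte records with no line endings, zero-padded as described, then read back out of order with seekg(). `RecordSize`, `writeRecords`, and `readRecord` are hypothetical names, and it assumes 1-byte characters:

```cpp
#include <algorithm>
#include <fstream>
#include <string>
#include <vector>

constexpr std::streamsize RecordSize = 100;   // every record is exactly 100 bytes

// Writes each string as one record: the text, then zero bytes up to RecordSize.
bool writeRecords(const std::string& path, const std::vector<std::string>& items)
{
    std::ofstream out(path, std::ios::binary);
    const char mt[RecordSize] = {};                          // one "empty record" of zero bytes
    for (const std::string& s : items) {
        const std::streamsize len = static_cast<std::streamsize>(s.size());
        if (len > RecordSize) return false;                  // text too long for one record
        out.write(s.data(), len);                            // the text itself...
        out.write(mt, RecordSize - len);                     // ...then the zero padding
    }
    return static_cast<bool>(out);
}

// Reads record n (zero-based) by jumping straight to byte n * RecordSize.
bool readRecord(std::istream& in, std::streamoff n, std::string& out)
{
    char buf[RecordSize];
    in.clear();
    in.seekg(n * RecordSize);
    if (!in.read(buf, RecordSize)) return false;             // past the last record
    out.assign(buf, std::find(buf, buf + RecordSize, '\0')); // drop the padding
    return true;
}
```

For 10 records the file should be exactly 1000 bytes, with record 9 starting at byte 900.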
Ok, I'll use seekg(), thanks so much everyone for the help! :)