Sorting a "big" file

Hi,

I have a very big Excel sheet (which I will convert to .txt later by copy-paste :D ) listing the name of each movie, its male lead, its female lead, and the year of release.

Now I want my program to display the list according to the user's choice (for example, if they wish to see all the movies released in 2014, they should get exactly that), but the problem is that the file is too big and I have no idea what to do or how.

If the list were small I would have used a simple struct/class and saved the details... but the file is too big to fill in a struct/class manually.

Any ideas?
Are you sure it's too big? Text files are small and memories are large. If you have 10k of data on each movie (and that's a lot) and you have details on 10,000 movies, that's still only 100MB.

But perhaps a better question is "why not use a database?"
OP wrote:
(which I will be converting into .txt mode later on by copy-paste :D )

*Facepalm* File -> Save As: Near the bottom of the save dialog box, under the text field where you fill in the file name will be a drop down box. Select either .tab or .csv for a plain text format.

+1 dhayden.

@ OP: The file is large now because Excel is an archival file format, enjoy the irony on that one. Once you have the data as plain text it will be much smaller.
When I wrote "big Excel sheet" I meant it has a lot of records (i.e. 1000 movies); I didn't mean its size in MB... sorry for the confusion.

The problem is that I can't create a class for such a big file manually (by big I mean 1000 movies, 1000 actors, 1000 actresses and 1000 years), as I have a text file which goes somewhat like this:

Turtwig	55	68	64	45	55	31	318	Grass	
Grotle	75	89	85	55	65	36	405	Grass	
Torterra	95	109	105	75	85	56	525	Grass	Ground
Chimchar	44	58	44	58	44	61	309	Fire	
Monferno	64	78	52	78	52	81	405	Fire	Fighting
Infernape	76	104	71	104	71	108	534	Fire	Fighting
Piplup	53	51	53	61	56	40	314	Water	
Prinplup	64	66	68	81	76	50	405	Water	
Empoleon	84	86	88	111	101	60	530	Water	Steel
Starly	40	55	30	30	30	60	245	Normal	Flying
Staravia	55	75	50	40	40	80	340	Normal	Flying
Staraptor	85	120	70	50	50	100	475	Normal	Flying
Bidoof	59	45	40	35	40	31	250	Normal	
Bibarel	79	85	60	60	71	55	410	Normal	Water
Kricketot	37	25	41	25	41	25	194	Bug	
Kricketune	77	85	51	55	51	65	384	Bug	
Shinx	45	65	34	40	34	45	263	Electric	
Luxio	60	85	49	60	49	60	363	Electric	
Luxray	80	120	79	95	79	70	523	Electric	
Budew	40	30	35	50	70	55	280	Grass	Poison
Roserade	60	70	55	125	105	90	505	Grass	Poison


PS: I know these are Pokemon and not a movie database, but I hope you get what I am trying to say.

So I have to make a program so that I can sort these details.

EDIT: I am not using any database as it's a school project based entirely on C++ :(
The problem is that I can't create a class for such a big file manually (by big I mean 1000 movies, 1000 actors, 1000 actresses and 1000 years), as I have a text file which goes somewhat like this:


OK, I'm lost. That's not one giant class, OP, that's four separate classes and one more to collate them. Even if it was one giant class, why would the size be a problem?
All I have learnt in school is to save data in class form and then access it by using objects.

I manually assign the class and objects, and I know there should be another approach, but I don't know it now...

here's what I would do

#include <string>

class movie
{
    int year;
    std::string actor;
    std::string actress;
    std::string title;
public:

    //ctor and all that stuff
};


After this I would manually assign the details (which, as you can guess, would take years). So what is the alternative method?

PS
Computergeek01 wrote:
OK, I'm lost

So am I :D
In that case, see my post above: http://www.cplusplus.com/forum/general/174995/#msg866196 and go with the '.csv' format. This delimits the columns with commas. Then check this out: http://www.cplusplus.com/doc/tutorial/files/
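
To give an idea of what reading such a file could look like, here is a minimal sketch. It assumes each line of the exported file has exactly four comma-separated columns in the order title, actor, actress, year, and that no title contains a comma (real CSV exports quote such fields, which this sketch does not handle). The names Movie, parseMovie, loadMovies and releasedIn are made up for illustration.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record with the four columns described in the first post.
struct Movie {
    std::string title;
    std::string actor;
    std::string actress;
    int year = 0;
};

// Parse one line of the assumed form: title,actor,actress,year
// Returns false if a field is missing or the year is not a number.
bool parseMovie(const std::string& line, Movie& m) {
    std::istringstream row(line);
    std::string yearText;
    if (!std::getline(row, m.title, ',')) return false;
    if (!std::getline(row, m.actor, ',')) return false;
    if (!std::getline(row, m.actress, ',')) return false;
    if (!std::getline(row, yearText)) return false;
    std::istringstream yearStream(yearText);
    return static_cast<bool>(yearStream >> m.year);
}

// Read every well-formed line from a stream; malformed lines are skipped.
std::vector<Movie> loadMovies(std::istream& in) {
    std::vector<Movie> movies;
    std::string line;
    while (std::getline(in, line)) {
        Movie m;
        if (parseMovie(line, m)) movies.push_back(m);
    }
    return movies;
}

// Keep only the movies released in the requested year.
std::vector<Movie> releasedIn(const std::vector<Movie>& movies, int year) {
    std::vector<Movie> result;
    for (const Movie& m : movies) {
        if (m.year == year) result.push_back(m);
    }
    return result;
}
```

With a std::ifstream opened on the exported file (from the <fstream> header covered in the tutorial linked above), loadMovies(fin) would fill the vector, and releasedIn(movies, 2014) would give the list for one year. Nobody has to type the records in by hand.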
But how do I sort the data?

I am pretty sure that we have to use the .txt format and not .csv.
Always remember that file extensions are arbitrary. They are just the part of the file name that Microsoft chose to use in order to decide what program a file should be opened with. The extension '.csv' is a plain text format; if you right click on the file resulting from my post above and select "Open With" -> "Notepad" you can see what I mean. You can even right click and rename the file to change its extension if that makes you feel better; it won't make a functional difference.

As for sorting the data, you would just use a loop. That should have been one of the first things you covered in class.
As for sorting the data, you would just use a loop.
I suggest using std::sort(). The wrong sorting algorithm will make it all run too slow.
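
As a rough illustration of what that could look like, here is a small sketch. The Entry struct and the function name are hypothetical and only carry two columns; the point is just that std::sort takes a comparison function that decides the order.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical record: only two of the columns, to keep the sketch short.
struct Entry {
    std::string name;
    int total;
};

// Sort in place, highest total first; ties are broken alphabetically by name.
void sortByTotalDescending(std::vector<Entry>& entries) {
    std::sort(entries.begin(), entries.end(),
              [](const Entry& a, const Entry& b) {
                  if (a.total != b.total) return a.total > b.total;
                  return a.name < b.name;
              });
}
```

Sorting by a different column only means changing the comparison inside the lambda.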

OP, this is really not that much data. Make your class and then write the code to populate it from the csv or (probably better) tab-delimited file that you create from the excel file. If possible, post a dozen rows or so of the file so we can see exactly what the data looks like.

Text is small. Code is small. Data is small. What's big is media (pictures, music and movies). You're dealing with a small amount of data.

Here are the first 50 rows (Pokemon example; I just scrapped the movie idea and switched to Pokemon :D )

Nat	Pokemon	HP	Atk	Def	SpA	SpD	Spe	Total	Type I
1	Bulbasaur	45	49	49	65	65	45	318	Grass
2	Ivysaur	60	62	63	80	80	60	405	Grass
3	Venusaur	80	82	83	100	100	80	525	Grass
4	Charmander	39	52	43	60	50	65	309	Fire
5	Charmeleon	58	64	58	80	65	80	405	Fire
6	Charizard	78	84	78	109	85	100	534	Fire
7	Squirtle	44	48	65	60	54	43	314	Water
8	Wartortle	59	63	80	65	80	58	405	Water
9	Blastoise	79	83	100	85	105	78	530	Water
10	Caterpie	45	30	35	20	20	45	195	Bug
11	Metapod	50	20	55	25	25	30	205	Bug
12	Butterfree	60	45	50	80	80	70	385	Bug
13	Weedle	40	35	30	20	20	50	195	Bug
14	Kakuna	45	25	50	25	25	35	205	Bug
15	Beedrill	65	80	40	40	80	75	380	Bug
16	Pidgey	40	45	40	35	35	56	251	Normal
17	Pidgeotto	63	60	55	50	50	71	349	Normal
18	Pidgeot	83	80	75	70	70	91	469	Normal
19	Rattata	30	56	35	25	35	72	253	Normal
20	Raticate	55	81	60	50	70	97	413	Normal
21	Spearow	40	60	30	31	31	70	262	Normal
22	Fearow	65	90	65	61	61	100	442	Normal
23	Ekans	30	60	44	40	54	55	283	Poison
24	Arbok	60	85	69	65	79	80	438	Poison
25	Pikachu	35	55	30	50	40	90	300	Electric
26	Raichu	60	90	55	90	80	100	475	Electric
27	Sandshrew	50	75	85	20	30	40	300	Ground
28	Sandslash	75	100	110	45	55	65	450	Ground
29	Nidoran ♀	55	47	52	40	40	41	275	Poison
30	Nidorina	70	62	67	55	55	56	365	Poison
31	Nidoqueen	90	82	87	75	85	76	495	Poison
32	Nidoran ♂	46	57	40	40	40	50	273	Poison
33	Nidorino	61	72	57	55	55	65	365	Poison
34	Nidoking	81	92	77	85	75	85	495	Poison
35	Clefairy	70	45	48	60	65	35	323	Normal
36	Clefable	95	70	73	85	90	60	473	Normal
37	Vulpix	38	41	40	50	65	65	299	Fire
38	Ninetales	73	76	75	81	100	100	505	Fire
39	Jigglypuff	115	45	20	45	25	20	270	Normal
40	Wigglytuff	140	70	45	75	50	45	425	Normal
41	Zubat	40	45	35	30	40	55	245	Poison
42	Golbat	75	80	70	65	75	90	455	Poison
43	Oddish	45	50	55	75	65	30	320	Grass
44	Gloom	60	65	70	85	75	40	395	Grass
45	Vileplume	75	80	85	100	90	50	480	Grass
46	Paras	35	70	55	45	55	25	285	Bug
47	Parasect	60	95	80	60	80	30	405	Bug
48	Venonat	60	55	50	40	55	45	305	Bug
49	Venomoth	70	65	60	90	75	90	450	Bug
50	Diglett	10	55	25	35	45	95	265	Ground
Okay, make a class and write code to read a member from a stream. If you think you might need to write the file then create code to write to a stream also.

Write code to read members from a file and populate a vector.

You can sort the vector using any key you like. As long as you use a fast sorting algorithm (like std::sort()) you should be able to sort a million or more of these very quickly (like less than 10 seconds).
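
Here is one possible sketch of those three pieces, assuming the file looks like the 50 rows posted above: a header line, then ten tab-separated columns per row. The Pokemon struct, readAll and sortByName are illustrative names, and only four of the columns are kept. Splitting on tabs rather than on whitespace matters because a name like "Nidoran ♀" contains a space.

```cpp
#include <algorithm>
#include <exception>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record covering a few of the columns in the sample.
struct Pokemon {
    int number = 0;
    std::string name;
    int total = 0;
    std::string type;
};

// Read one row: Nat, Pokemon, HP, Atk, Def, SpA, SpD, Spe, Total, Type I.
std::istream& operator>>(std::istream& in, Pokemon& p) {
    std::string line;
    if (!std::getline(in, line)) return in;      // end of input or read error

    std::istringstream row(line);
    std::string field;
    std::vector<std::string> fields;
    while (std::getline(row, field, '\t')) fields.push_back(field);

    if (fields.size() < 10) {                    // malformed row: report failure
        in.setstate(std::ios_base::failbit);
        return in;
    }
    try {
        p.number = std::stoi(fields[0]);
        p.name = fields[1];
        p.total = std::stoi(fields[8]);
        p.type = fields[9];
    } catch (const std::exception&) {            // a numeric column held text
        in.setstate(std::ios_base::failbit);
    }
    return in;
}

// Skip the header line, then read rows until the stream fails.
std::vector<Pokemon> readAll(std::istream& in) {
    std::string header;
    std::getline(in, header);
    std::vector<Pokemon> all;
    Pokemon p;
    while (in >> p) all.push_back(p);
    return all;
}

// Sort by one possible key; any other member works the same way.
void sortByName(std::vector<Pokemon>& all) {
    std::sort(all.begin(), all.end(),
              [](const Pokemon& a, const Pokemon& b) { return a.name < b.name; });
}
```

Note that this version stops at the first malformed row instead of skipping it; whether that is the right behaviour depends on how clean the exported file is.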
dhayden wrote:
write code to read a member from a stream


That's exactly the problem.
I can't find a way to read a member from a stream...

I tried the .csv method but it's not working...

I can't find a delimiter so that I can read the details...
closed account (48T7M4Gy)
Sounds like you need to
1. read each line from your txt file
2. tokenise it ( ie break it down to the various data parts )
3. use each token to construct a new object
4. put the objects in a container: map, vector, array, whatever
5. process the array

As already indicated above one line is one object (instance) of the class. 1000 lines = 1000 objects, not a big deal.

Actually, if you manage the serialization (writing to and reading from the file) yourself, you can do even better than simple tokenizing. Either way, it's not a hugely difficult proposition.

https://isocpp.org/wiki/faq/serialization
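
The five steps above could be sketched roughly like this. It assumes rows shaped like the first sample in this thread (name, six stats, the total, a first type and an optional second type, separated by tabs) and that the numeric columns really contain numbers. StatRow, splitOnTabs, loadRows and withType are made-up names.

```cpp
#include <algorithm>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record: Name, six stats, Total, Type 1, optional Type 2.
struct StatRow {
    std::string name;
    int total = 0;
    std::string type1;
    std::string type2;   // stays empty when the row has only one type
};

// Step 2: break one line into its tab-separated tokens.
std::vector<std::string> splitOnTabs(const std::string& line) {
    std::vector<std::string> tokens;
    std::istringstream row(line);
    std::string token;
    while (std::getline(row, token, '\t')) tokens.push_back(token);
    return tokens;
}

// Steps 1, 3 and 4: read every line, build an object from its tokens, store it.
std::vector<StatRow> loadRows(std::istream& in) {
    std::vector<StatRow> rows;
    std::string line;
    while (std::getline(in, line)) {
        const std::vector<std::string> t = splitOnTabs(line);
        if (t.size() < 9) continue;              // skip blank or short lines
        StatRow r;
        r.name = t[0];
        r.total = std::stoi(t[7]);               // eighth column is the stat total
        r.type1 = t[8];
        if (t.size() > 9) r.type2 = t[9];
        rows.push_back(r);
    }
    return rows;
}

// Step 5: one way to process the container, keeping rows of a requested type.
std::vector<StatRow> withType(const std::vector<StatRow>& rows, const std::string& type) {
    std::vector<StatRow> out;
    std::copy_if(rows.begin(), rows.end(), std::back_inserter(out),
                 [&type](const StatRow& r) { return r.type1 == type || r.type2 == type; });
    return out;
}
```

If the file was saved on Windows and is read elsewhere, the last token of each line may carry a trailing '\r' that would need trimming; the sketch does not handle that.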
So here's what I tried

#include <string>

class pokemon_data
{
    int number;
    std::string name;
    std::string type;
public:
    void getdata();   // a member function needs a return type
};


and the data filler function (the function which will get data)

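#include <fstream>

void pokemon_data::getdata()
{
    char line[80];

    std::ifstream fin("pok.txt");
    int count = 0;   // number of lines read so far

    // 1. reading each line from text file (as said by kemort)
    // The read itself is the loop condition: eof() only becomes true
    // after a read has already failed, so testing it first runs the
    // body once too often.
    while (fin.getline(line, sizeof line))
    {
        //not pretty much sure what to do now but will try my best
        // line now holds one whole row of the excel table; the columns
        // inside it are separated by tabs and still have to be split
        ++count;

        //pretty much sure that I am lost now...
    }
}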


What to do next? any ideas?
closed account (48T7M4Gy)
Now what you do is:
1. check that it is working OK by printing out each line to make sure the file is being read
2. tokenise each line, look that up in the online help here and use the sample as a guide

http://www.cplusplus.com/reference/cstring/strtok/?kw=strtok
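
A minimal sketch of tokenising one line with strtok, assuming the columns are separated by tabs. The function name tokenise is just for illustration. One thing to be aware of: strtok treats a run of consecutive delimiters as a single one, so an empty column (for example a missing second type in the middle of a row) silently disappears instead of producing an empty token.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Split a tab-separated line with std::strtok. The line is copied first
// because strtok overwrites each delimiter it finds with '\0'.
std::vector<std::string> tokenise(const std::string& line) {
    std::vector<char> buffer(line.begin(), line.end());
    buffer.push_back('\0');                      // strtok needs a C string

    std::vector<std::string> tokens;
    for (char* tok = std::strtok(buffer.data(), "\t");
         tok != nullptr;
         tok = std::strtok(nullptr, "\t")) {     // nullptr continues the same buffer
        tokens.push_back(tok);
    }
    return tokens;
}
```

Each token can then be converted (for example with std::stoi for the numeric columns) and stored in the class members. strtok also keeps hidden state between calls, so it should not be used on two strings at the same time.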
got it... thanks... :D