C++: Reading and Sorting Binary Files

With the new code as currently written, would the output follow the 19-byte name format that seems to be the problem? There's a missing byte, which you would maybe have to pad in the new output file to make it 20?

If you leave the output at 19 bytes will there be a problem?
All you know for sure is that the file you were given has 19 bytes for the name. But how you store the data inside the program, and how you create the output is a matter for you to decide.

If you leave the output at 19 bytes will there be a problem?
You may lose marks for producing incorrect output (that could apply whichever choice you make).
#include "stdafx.h"
#include <iostream>
#include <fstream>
#include <string>
#include <Windows.h>
using namespace std;


int main() 
{

	string filename[MAX_PATH + 1];

	int n = 0;
	const int nameLength = 19;
    const int maxRecords = 100;

    int scores[maxRecords];
    char names[maxRecords][nameLength];

    int count = 0;

	cout << "Enter binary data directory and file name: ";
	cin.getline(filename, MAX_PATH);
// Open file for binary read-write access.
	    ifstream fin(filename.c_str(), ios::binary);

	if (!fin) {
	cout << "Could not open " << filename << endl;
	system("PAUSE");
	return -1;


// Read data from the file.
fin.read(names[count], sizeof(names[0]))
         && fin.read((char *) &scores[count], sizeof(scores[0]));
{
        count++;
        fin.ignore(); // skip the padding byte
	}
// Display the data and close.
for (int i=0; i<count; i++)
    {
        cout << "Name: " << names[i] << " Score: " << scores[i] << endl;
    }
fin.close();
system("PAUSE");
return 0;
}


I'm trying to alter it, but it won't compile and I have no idea why. I tried to use your file input method, but with a way for the user to specify the file path of the binary file.
Hi, looks like a lot of work got done while I was asleep.
Anyway, in this code you are opening the file in binary mode, but you didn't also specify that it should be opened for input.
The line that opens the file should be:
ifstream fin(filename.c_str(), ios::binary | ios::in );

About the length of the record: I noticed that the records are separated by 4 null bytes, except for the last record, which is followed by only 3 null bytes.
If you add one more null, or delete the 3 nulls, you will have a file length divisible by 4.

The name is 19 bytes and the grade is 1 byte, which means the whole record (name + grade) is 20 bytes long. Maybe I'm wrong on this, but the professor definitely didn't make the question clear enough.
@Rechard3
Your input is very welcome, so though I may disagree, please don't be offended.
ifstream is an input stream, just as ofstream is an output stream, so ios::in is already implied and does not need to be specified. At any rate, the code I posted was tested on two different compilers.

When it comes to the binary representation of an integer, there are a couple of points to consider, both related to the platform on which the code is executed. One is that the number of bytes may vary. I tend to assume 4 bytes for an int because I use a 32-bit processor, but this value may differ on, say, a 16-bit or 64-bit platform. The other consideration is the order of the individual bytes, the endianness. On Intel processors, which are in common use, the lowest-order byte is placed first, known as little-endian. The alternative, big-endian, is to place the highest-order byte first.
http://en.wikipedia.org/wiki/Endianness

Mostly we don't need to worry about this, if the same machine is used for both reading and writing the file. But when passing files across systems, for example digital photographs, the issue does need to be considered.

Consider this code:
    ofstream fout ("binary_integer.dat", ios::binary);
    int n = 0x12345678; // decimal 305419896
    fout.write((char *) &n, sizeof(n));

On my machine, the file contains the following hex values: 78 56 34 12
Similarly, n = 0xff11; becomes 11 FF 00 00
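
To make the byte-order point concrete, here is a minimal sketch that inspects the bytes of a 32-bit value in memory and rebuilds a value from bytes known to be little-endian. The helper names (bytesOf, isLittleEndian, fromLittleEndian) are made up for this illustration and are not from the thread.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Copy the bytes of a 32-bit value in the order they are stored in memory.
std::vector<unsigned char> bytesOf(std::uint32_t value)
{
    std::vector<unsigned char> bytes(sizeof(value));
    std::memcpy(bytes.data(), &value, sizeof(value));
    return bytes;
}

// True when the lowest-order byte is stored first (little-endian).
bool isLittleEndian()
{
    return bytesOf(1u)[0] == 1;
}

// Rebuild a value from four bytes that are known to be in little-endian
// order. This works the same way whatever machine runs the code.
std::uint32_t fromLittleEndian(const std::vector<unsigned char>& bytes)
{
    assert(bytes.size() == 4);
    return  static_cast<std::uint32_t>(bytes[0])
         | (static_cast<std::uint32_t>(bytes[1]) << 8)
         | (static_cast<std::uint32_t>(bytes[2]) << 16)
         | (static_cast<std::uint32_t>(bytes[3]) << 24);
}
```

On a little-endian machine, bytesOf(0x12345678) should give 78 56 34 12, matching the file contents quoted above. A reader that must work across platforms would decode with something like fromLittleEndian rather than reading straight into an int.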

Anyway, my point is, I don't think the file in the original question has a one-byte integer followed by three or four bytes of padding. Rather, my interpretation is a four-byte integer followed by one or zero bytes of padding.
@animus You are mixing up c-strings and c++ std::string.

This is an array of strings, but you only need one string:
string filename[MAX_PATH + 1];
Change it to:
string filename;

This is the syntax for a c-string:
cin.getline(filename, MAX_PATH);
When using std::string, change it to:
getline(cin, filename);
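
As a side-by-side sketch of the two forms, the functions below read one line either into a char buffer or into a std::string. The function names and the buffer size of 260 are assumptions chosen for the example; any istream (cin, a file, a string stream) can be passed in.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// C-string form: the member function getline needs a char buffer
// and the size of that buffer.
std::string readPathCString(std::istream& in)
{
    char buffer[260];                       // hypothetical fixed limit
    in.getline(buffer, sizeof(buffer));
    return std::string(buffer);
}

// std::string form: the free function std::getline grows the
// string as needed, so no size has to be given.
std::string readPathString(std::istream& in)
{
    std::string path;
    std::getline(in, path);
    return path;
}
```

Calling readPathString(cin) corresponds to getline(cin, filename) above. The c-string version only makes sense when the destination really is a char array.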

There is also a closing brace } missing at the end of the if statement that begins:
if (!fin) {


One more thing, you lost the while loop around the read statements.

change this:
// Read data from the file.
fin.read(names[count], sizeof(names[0]))
         && fin.read((char *) &scores[count], sizeof(scores[0]));

to this:
// Read data from the file.
	while (
fin.read(names[count], sizeof(names[0]))
         && fin.read((char *) &scores[count], sizeof(scores[0]))
         )
Thanks for the help, man I can't tell you how much you've helped out... But I'm now stuck on the next steps of the average function as well as the sorting.

Sorry for this mess of a code:


#include "stdafx.h"
#include <iostream>
#include <fstream>
#include <string>
#include <Windows.h>
#include <iomanip>
using namespace std;


int average (int n);

int main() 
{

	string filename;

	int i,j;
	int n = 0;
	int sum = 0;
	int size = 4;
	const int nameLength = 19;
    const int maxRecords = 100;
	int index[100];


    int scores[maxRecords];
    char names[maxRecords][nameLength];

    int count = 0;

	cout << "Enter binary data directory and file name: ";
	getline (cin, filename);
// Open file for binary read-write access.
	    ifstream fin(filename.c_str(), ios::binary);

	if (!fin) {
	cout << "Could not open " << filename << endl;
	system("PAUSE");
	return -1;
	}

// Read data from the file.
	while (
fin.read(names[count], sizeof(names[0]))
         && fin.read((char *) &scores[count], sizeof(scores[0]))
	)
	{

        count++;
        fin.ignore(); // skip the padding byte
	}

	
// Display the data and close.
	 cout << "Your file's data unsorted: " << endl;
    cout << endl;
        cout << setw(10) << "Name" << setw(20) <<  "Test Score" << endl;

        for (i=0;i<n;i++){
                cout << setw(10) << names[i] << setw(20) << scores[i] << endl;
        }

                for (i=0;i<n;i++)
        {
                index[i]=i;
        }

        for (i=0;i<n;i++)
        {

                for (j=i+1;j<n;j++)
                {
                        int temp;
                        if (scores[index[i]] > scores[index[j]])
                        {
                                temp = index[i];
                                index[i] = index[j];
                                index[j] = temp;
                        }
                }
        }

		cout << "The average of the test scores in your file is:  " << average (sum);

sum=sum+scores[i];


fin.close();
system("PAUSE");
return 0;
}

int average (int sum, int size)
{ 
	return sum/size;
}


I can't compile this, help.
I keep getting this error when I try to compile and I have no idea why.

fatal error LNK1120: 1 unresolved externals

edit: I fixed it, but now the output just shows zero and none of the data.

The output looks like this

C:\Users\Jacky\Desktop\project6.dat
Your file's data unsorted:

Name Test Score
The average of the test scores in your file is: 0Press any key to continue . . .

#include "stdafx.h"
#include <iostream>
#include <fstream>
#include <string>
#include <Windows.h>
#include <iomanip>
using namespace std;


int average (int n,int size);

int main() 
{

	
	string filename;
	int i,j;
	int n = 0;
	int sum = 0;
	int size = 4;
	const int nameLength = 19;
    const int maxRecords = 100;
	int index[100];
	int scores[maxRecords];
    char names[maxRecords][nameLength];
	int count = 0;


	cout << "Enter binary data directory and file name: ";
	getline (cin, filename);
// Open file for binary read-write access.
	    ifstream fin(filename.c_str(), ios::binary);

	if (!fin) {
	cout << "Could not open " << filename << endl;
	system("PAUSE");
	return -1;
	}

// Read data from the file.
	while (
fin.read(names[count], sizeof(names[0]))
         && fin.read((char *) &scores[count], sizeof(scores[0]))
	)
	{

        count++;
        fin.ignore(); // skip the padding byte
	}

	
// Display the data and close.
	 cout << "Your file's data unsorted: " << endl;
    cout << endl;
        cout << setw(10) << "Name" << setw(20) <<  "Test Score" << endl;

        for (i=0;i<n;i++){
                cout << setw(10) << names[i] << setw(20) << scores[i] << endl;
        }

                for (i=0;i<n;i++)
        {
                index[i]=i;
        }

        for (i=0;i<n;i++)
        {

                for (j=i+1;j<n;j++)
                {
                        int temp;
                        if (scores[index[i]] > scores[index[j]])
                        {
                                temp = index[i];
                                index[i] = index[j];
                                index[j] = temp;
                        }
                }
        }

		cout << "The average of the test scores in your file is:  " << average (sum,size);

sum=sum+scores[i];


fin.close();
system("PAUSE");
return 0;
}

int average (int sum, int size)
{ 
	return sum/size;
}
Sorry for the triple post. Anyways here's the new code:

#include "stdafx.h"
#include <iostream>
#include <fstream>
#include <string>
#include <iomanip>
using namespace std;


int average (int sum,int size);



int main() 
{

	
	string filename;
	int i,j;
	int n = 0;
	int sum = 0;
	int size = 4;
	const int nameLength = 19;
    const int maxRecords = 100;
	int index[100];
	int scores[maxRecords];
    char names[maxRecords][nameLength];
	int count = 0;


	cout << "Enter binary data directory and file name: ";
	getline (cin, filename);
// Open file for binary read-write access.
	    ifstream fin(filename.c_str(), ios::binary);

	if (!fin) {
	cout << "Could not open " << filename << endl;
	system("PAUSE");
	return -1;
	}

// Read data from the file.
	while (
fin.read(names[count], sizeof(names[0]))
         && fin.read((char *) &scores[count], sizeof(scores[0]))
	)
	{

        count++;
        fin.ignore(); // skip the padding byte
	}

	
// Display the data and close.
	 cout << "Your file's data unsorted: " << endl;
    cout << endl;
        cout << setw(10) << "Name" << setw(20) <<  "Test Score" << endl;

        for (i=0;i<5;i++){
                cout << setw(10) << names[i] << setw(20) << scores[i] << endl;
        }

                for (i=0;i<5;i++)
        {
                index[i]=i;
        }

        for (i=0;i<5;i++)
        {

                for (j=i+1;j<5;j++)
                {
                        int temp;
                        if (scores[index[i]] > scores[index[j]])
                        {
                                temp = index[i];
                                index[i] = index[j];
                                index[j] = temp;
                        }
                }
        }

		sum=sum+scores[i];
		cout << "The average of the test scores in your file is:  " << average (sum,size) << endl;

		
	
		cout << "Your sorted binary file looks like this:  " << endl;
		cout << setw(10) << "Name" << setw(20) <<  "Test Score" << endl;
		for (i=0;i<5;i++)
        {
                cout << setw(10) << names[index[i]] << setw(20) << scores[index[i]] << endl;
        }


		
		ofstream fout ("project6-edited.dat", ios::binary);
        fout.write ((char*)names[index[i]], sizeof(20) );
		fout.write ((char*)scores[index[i]], sizeof(int) );
		
		
		fin.close();
system("PAUSE");
return 0;
}

int average (int sum, int size)
{ 
	return sum/size;
}





Only problem now is that the output looks like this:

Enter binary data directory and file name: c:\users\jacky\desktop/project6.dat
Your file's data unsorted:

Name Test Score
A.Smith 89
T. Phillip 95
S. Long 76
J. White 100
╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠Y
-858993460
The average of the test scores in your file is: -214748365
Your sorted binary file looks like this:
Name Test Score
╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠Y
-858993460
S. Long 76
A.Smith 89
T. Phillip 95
J. White 100
Press any key to continue . . .

There's actually a ton more of those weird symbols but I couldn't post them and my average isn't working either.
@Chervil:
Hey man, I can't find any offence in honest advice. Thank you very much, you actually cleared up some important concepts about the little-endian thing.

so in a few words: no offence taken.
Hi, I guess we're in different timezones, hence the delayed reply.

There are several cases where you aren't making use of the information which is already there.

A minor point, take a look at these two lines:
    const int maxRecords = 100;
    int index[100];

Notice the value 100 is stated on both lines. Now what happens if you need to change the 100 to a different number? You need to remember to change it in both places.

It's a better design to do this:
    const int maxRecords = 100;
    int index[maxRecords];

Now, if you change the value 100 to some other number, all of the arrays will adjust to the new size and keep in step with one another.

Later on, there is another, more important example of using a "magic number" typed in multiple places throughout the code.
Take a look at this code:
    for (i=0;i<5;i++)
    {
        cout << setw(10) << names[i] << setw(20) << scores[i] << endl;
    }

    for (i=0;i<5;i++)
    {
        index[i]=i;
    }

    for (i=0;i<5;i++)
    {
        for (j=i+1;j<5;j++)
        {


See how the value '5' occurs in multiple different places? What happens if you need to change it? You would need to change the code in multiple places - always a risky procedure, for two reasons:
• You might miss an occurrence and forget to change it.
• The value '5' might be in the code for some other reason, for example it might specify the width of a column, or the number of decimal places etc. Thus there is a risk of accidentally changing some unrelated figure.
The solution - use a named constant or variable and type its name in each place where it is required.

Now as it happens, 5 is the wrong figure, and that explains why the output contains all sorts of strange characters: the loops run past the records that were actually read and print uninitialised array elements. It also happens that the required value is already held in a variable named count. This variable was used to count how many records were read from the input file, hence the name. So replace the figure 5 with the variable count in each case:
    for (i=0; i<count; i++)
    {
etc. etc.
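
For illustration, here is one way the index sort could look once every loop is bounded by count. Note that this sketch swaps in std::sort with a comparison lambda in place of the hand-written nested loops from the thread. The function name sortedIndex is made up for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Return the record positions ordered by ascending score.
// Only the first `count` entries of `scores` hold real data, so
// `count` is the only bound used anywhere in this function.
std::vector<int> sortedIndex(const int scores[], int count)
{
    assert(count >= 0);
    std::vector<int> index(count);
    for (int i = 0; i < count; i++)
        index[i] = i;                       // start with the file order

    // Compare the scores, but move only the index values.
    std::sort(index.begin(), index.end(),
              [scores](int a, int b) { return scores[a] < scores[b]; });
    return index;
}
```

With the four records from the sample output (89, 95, 76, 100), the result would be the positions 2, 0, 1, 3. The names and scores arrays themselves are left untouched, just as with the index array in the original code.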

Now, the calculation of the average. There are two problems here.
This line is standing alone
 
sum = sum + scores[i];

It should be inside a loop, in order to accumulate the total for all the records. Also, since it isn't inside the loop, what is the value of i? It is whatever value caused the previous for-loop to terminate. In other words, it is the index of an array element just past the last record.

Secondly, the function average is called like this: average (sum,size)
What is size? It is initialised with the value 4. What happens if the file contains a different number of records? If only we had a variable somewhere which contained a count of the number of records which were actually read from the file... see where this is going? Get rid of size and replace it with count.
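
Both points can be sketched as two small functions: one accumulates the total inside a loop bounded by count, the other divides by count. Returning a double and asserting that count is positive are choices made for this example. The code in the thread returns an int, which would truncate any fractional part.

```cpp
#include <cassert>

// Total of the first `count` scores, accumulated inside a loop.
int sumScores(const int scores[], int count)
{
    int sum = 0;
    for (int i = 0; i < count; i++)
        sum += scores[i];
    return sum;
}

// Average of the scores. The cast keeps the fractional part;
// the assert guards against dividing by zero for an empty file.
double average(int sum, int count)
{
    assert(count > 0);
    return static_cast<double>(sum) / count;
}
```

For the sample data (89, 95, 76, 100) the sum is 360 and the average is 90.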

And lastly we come to the output file. There are a couple of issues.
    ofstream fout ("project6-edited.dat", ios::binary);
    fout.write ((char*)names[index[i]], sizeof(20) );
    fout.write ((char*)scores[index[i]], sizeof(int) );


One is this: sizeof(20). Here 20 is an integer literal, so the expression is the same as sizeof(int), which is not what is required. What you need here is the length of the char strings, which you could find using either sizeof(names[0]) or directly, as it is specified in the constant nameLength.
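
A small sketch that checks this at compile time, using the same array declaration as the program. The function nameBytes is a made-up helper for the example.

```cpp
#include <cassert>
#include <cstddef>

const int nameLength = 19;
const int maxRecords = 100;
char names[maxRecords][nameLength];

// sizeof on a literal gives the size of the literal's type (int here),
// not the number that was typed.
static_assert(sizeof(20) == sizeof(int), "sizeof(20) is the size of an int");

// sizeof on one row of the array gives the row length in bytes.
static_assert(sizeof(names[0]) == 19, "each name occupies nameLength bytes");

// Number of bytes to pass to write() for one name.
std::size_t nameBytes()
{
    return sizeof(names[0]);
}
```

The same reasoning applies to sizeof(nameLength): nameLength is an int, so that expression is also sizeof(int) and not 19.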

This line needs to pass the address of the variable to be output:
fout.write ((char*)scores[index[i]], sizeof(int) );
// change it to:
fout.write ((char*) &scores[index[i]], sizeof(int) ); // notice & operator 

And of course the write() statements need to be inside a loop, in order to output all of the records.


Hey, man! Thanks a lot, it's really helped a ton. Anyways, there's just one last problem: the newly written binary file hardly contains anything.

#include "stdafx.h"
#include <iostream>
#include <fstream>
#include <string>
#include <iomanip>
using namespace std;


int average (int sum,int size);



int main() 
{

	
	string filename;
	int i,j;
	int n = 0;
	int sum = 0;
	int size = 4;
	const int nameLength = 19;
    const int maxRecords = 100;
	int index[maxRecords];
	int scores[maxRecords];
    char names[maxRecords][nameLength];
	int count = 0;


	cout << "Enter binary data directory and file name: ";
	getline (cin, filename);
// Open file for binary read-write access.
	    ifstream fin(filename.c_str(), ios::binary);

	if (!fin) {
	cout << "Could not open " << filename << endl;
	system("PAUSE");
	return -1;
	}

// Read data from the file.
	while (
fin.read(names[count], sizeof(names[0]))
         && fin.read((char *) &scores[count], sizeof(scores[0]))
	)
	{

        count++;
        fin.ignore(); // skip the padding byte
	}

	
// Display the data and close.
	 cout << "Your file's data unsorted: " << endl;
    cout << endl;
        cout << setw(10) << "Name" << setw(20) <<  "Test Score" << endl;

        for (i=0;i<count;i++){
                cout << setw(10) << names[i] << setw(20) << scores[i] << endl;
        }

                for (i=0;i<count;i++)
        {
                index[i]=i;
        }

        for (i=0;i<count;i++)
        {

                for (j=i+1;j<count;j++)
                {
                        int temp;
                        if (scores[index[i]] > scores[index[j]])
                        {
                                temp = index[i];
                                index[i] = index[j];
                                index[j] = temp;
                        }
                }
        }

		for (i=0;i<count;i++)
		{
		sum=sum+scores[i];
		}
		
		cout << "The average of the test scores in your file is:  " << average (sum,count) << endl << endl;

		
	
		cout << "Your sorted binary file looks like this:  " << endl;
		cout << setw(10) << "Name" << setw(20) <<  "Test Score" << endl;
		for (i=0;i<count;i++)
        {
                cout << setw(10) << names[index[i]] << setw(20) << scores[index[i]] << endl;
        }


		for (i=0;i<count;i++)
		{
		ofstream fout ("project6-edited.dat", ios::binary);
        fout.write ((char*) &scores[index[i]], sizeof(int) );
		fout.write ((char*) &names[index[i]], sizeof(nameLength) );
		}
		
		fin.close();
system("PAUSE");
return 0;
}

int average (int sum, int count)
{ 
	return sum/count;
}


I'm guessing I wrote the write statements wrong.
The line
ofstream fout ("project6-edited.dat", ios::binary);
should not be inside the loop. As it is, the file is opened and closed on each pass through the loop, and each time it is opened the previous contents are discarded.
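
To see the effect in isolation, here is a minimal sketch with two made-up functions: one opens the output file inside the loop, the other opens it once before the loop. Each writes one letter per pass. The helper readAll and all of the names here are assumptions for the example.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Opening inside the loop: every pass reopens the file, which
// discards whatever the previous pass wrote.
void writeReopening(const std::string& path, int count)
{
    for (int i = 0; i < count; i++)
    {
        std::ofstream fout(path.c_str(), std::ios::binary);
        fout.put(static_cast<char>('A' + i));
    }
}

// Opening once before the loop: every pass appends to the same stream.
void writeOnce(const std::string& path, int count)
{
    std::ofstream fout(path.c_str(), std::ios::binary);
    for (int i = 0; i < count; i++)
        fout.put(static_cast<char>('A' + i));
}

// Read a whole file back into a string, byte by byte.
std::string readAll(const std::string& path)
{
    std::ifstream fin(path.c_str(), std::ios::binary);
    std::string data;
    char c;
    while (fin.get(c))
        data += c;
    return data;
}
```

With count set to 3, the file written by writeReopening should contain only the last letter, while the file written by writeOnce should contain all three.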
Thanks, I changed it so the code is like this

ofstream fout ("project6-edited.dat", ios::binary);
for (i=0;i<count;i++)
{
    fout.write ((char*) &scores[index[i]], sizeof(int) );
    fout.write ((char*) &names[index[i]], sizeof(nameLength) );
}



But the new binary file just has this: L S. LY A.Sm_ T. Pd J. W

Do I need to make another loop separate for the scores and names index array?
That's a step forward, but there are several errors in your write statements.
http://www.cplusplus.com/reference/ostream/ostream/write/
The function write() takes two parameters.
s: Pointer to an array of at least n characters.
n: Number of characters to insert.

If you want to output a string, like this:
char name[10] = "Pineapple";
then the code could be either
 
    fout.write(name, 10);

or
 
    fout.write(name, sizeof(name));


If you want to output an integer, because it isn't already a pointer to an array of characters, you have to do two things:
• use the & operator to get the address of the object
• That gives a pointer to an int, which needs to be cast into a pointer to an array of characters, using either (char *) or reinterpret_cast <char *>
The second version is preferred for c++ code.
    int n = 1234;
    fout.write((char *) &n, sizeof(n));
    fout.write(reinterpret_cast <char *>( &n), sizeof(n));


... sorry this is a long explanation. I don't think it's useful to just cut and paste the code, I think it's necessary to understand the background to why it is the way it is, in order that you can then try and write the code for yourself.


The main point here is that the errors are actually in the output of the name, which is the easier of the two cases. The integer is handled ok.
Hmm, ok. I'm completely stumped but I guess I'll try and somehow figure it out.
I thought you should at least consider what it is you are trying to achieve. I rather suspect that you didn't fully understand the part where the data was read in from the file either. But I accept that it's a steep learning curve and a lot to take in.

It should look something like this:
    for (int i=0; i<count; i++)
    {
        fout.write (names[index[i]],  nameLength );
        fout.write ((char*) &scores[index[i]], sizeof(int) );
    }


You might also want to add this line in order to make your file resemble the original input:
fout.put(0); // padding

Ideally, the program should be able to read as input and make sense of its own output, in my opinion.

You should be able to recognise the similarities between the output statements and this code which was used to read in the data:
    while ( fin.read(names[count], sizeof(names[0]))
         && fin.read((char *) &scores[count], sizeof(scores[0])) )
    {
        count++;
        fin.ignore(); // skip the padding byte
    }


Some of the differences are purely cosmetic, there is more than one way to do things. But naturally the controlling logic for reading is different to that for writing.
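
To show the symmetry between reading and writing, here is a sketch of a round trip through any pair of streams. It assumes the layout discussed in this thread: a 19-byte name, an int score, then one padding byte. The function names writeRecords and readRecords are made up for the example, and the check against maxRecords is an extra guard that the thread's loop does not have.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>

const int nameLength = 19;
const int maxRecords = 100;

// Write `count` records: name, score, then one padding byte.
void writeRecords(std::ostream& out, char names[][nameLength],
                  const int scores[], int count)
{
    assert(count <= maxRecords);
    for (int i = 0; i < count; i++)
    {
        out.write(names[i], nameLength);
        out.write(reinterpret_cast<const char*>(&scores[i]), sizeof(scores[i]));
        out.put(0);   // padding byte, so the output resembles the input
    }
}

// Read records until the input runs out or the arrays are full.
// Returns the number of complete records read.
int readRecords(std::istream& in, char names[][nameLength], int scores[])
{
    int count = 0;
    while (count < maxRecords
           && in.read(names[count], nameLength)
           && in.read(reinterpret_cast<char*>(&scores[count]), sizeof(scores[count])))
    {
        count++;
        in.ignore();  // skip the padding byte
    }
    return count;
}
```

Because both functions take stream references, the same code can be pointed at an ifstream and ofstream opened with ios::binary, or at a std::stringstream for a quick check that the program can read back its own output.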
Okay thanks it works, but how come the char* as well as sizeof is not needed? I figured that the names would be a string. Is sizeof not needed because it's not a length or anything, but actually putting bytes directly?

Yeah, sorry, the class I'm learning from really doesn't do all that much teaching. It uses the book C++ Without Fear, and although Chapter 8 is the File Storage chapter, it didn't really go into much detail about reading and writing binary files.
I explained that previously:
Chervil wrote:
One is this: sizeof(20). Here 20 is an integer literal, so the expression is the same as sizeof(int), which is not what is required. What you need here is the length of the char strings, which you could find using either sizeof(names[0]) or directly, as it is specified in the constant nameLength.

http://www.cplusplus.com/forum/beginner/103593/2/#msg559143

(char *) is not required because the function write() requires a pointer to an array of characters as the first parameter. That's why it wasn't needed in the "Pineapple" example above and isn't needed for input or output of the name. (When an array is passed as a parameter to a function, it is treated as a pointer)
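
A tiny sketch of that array-to-pointer conversion, using the "Pineapple" example. The function name lengthViaPointer is made up for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// The parameter is a pointer to char. A char array passed as the
// argument is converted to a pointer to its first element, so no
// cast is needed at the call site.
std::size_t lengthViaPointer(const char* text)
{
    assert(text != nullptr);
    // Note: sizeof(text) in here would be the size of a pointer,
    // not the size of the caller's array.
    return std::strlen(text);
}
```

Given char name[10] = "Pineapple";, the call lengthViaPointer(name) compiles without any cast and returns 9, while sizeof(name) at the point of declaration is still 10. That is why the size has to be passed to write() separately.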
Ah, OK. I understand now. Thanks a lot. You've done a lot more for my understanding than my professor. He emphasized the idea that programmers work alone, and the class had a pretty barebones structure, with the assignments requiring massive amounts of secondary methods. Man, I have a feeling I'm going to bomb my final next week.