C++: Reading and Sorting Binary Files

I've been scratching my head and putting this homework off for a couple of days, but now that I've hunkered down to try to do it I'm coming up empty. There are 4 things I need to do.

1) Read a binary file and place that data into arrays

2) Sort the list according to the test scores from lowest to highest

3) Average the scores and output it

4) Create a new binary file with the sorted data

This is what the binary data file SHOULD look like as an unsorted text file:

A. Smith 89

T. Phillip 95

S. Long 76

But the .dat (the binary file) looks something like A.Smith ÌÌÌÌÌÌÌÌÌÌÌY T. Phillip ÌÌÌÌÌÌÌÌ_ S. Long ip ÌÌÌÌÌÌÌÌL J. White p ÌÌÌÌÌÌÌÌd

I can probably do the sort, since I think I know how to use parallel arrays and index sorting, but reading the binary file and placing that data into an array is confusing as hell to me, as my book doesn't really explain it very well.

So far this is my preliminary code which doesn't really do much:

#include "stdafx.h"
#include <cstdlib>
#include <iostream>
#include <fstream>
#include <Windows.h>
using namespace std;

int get_int(int default_value);
int average(int x, int y, int z);

int main()
{
    char filename[MAX_PATH + 1];
    int n = 0;
    char name[3];
    int grade[3];
    int recsize = sizeof(name) + sizeof(int);

    cout << "Enter directory and file name of the binary file you want to open: ";
    cin.getline(filename, MAX_PATH);

    // Open file for binary read.
    fstream fbin(filename, ios::binary | ios::in);
    if (!fbin) {
        cout << "Could not open " << filename << endl;
        system("PAUSE");
        return -1;
    }
}
Where did the binary file come from originally, for example was it created in another program, or supplied by someone else?

At any rate, additional information is required in order to interpret the data. Ideally the specification supplied with the file tells you the length of the string that holds the name. It should also say how the numeric data is stored: is it an int, a float, or something else? I can make guesses at these, but that isn't the proper approach; the information should really be known.

Failing that, you could open the .dat file using a hex editor in order to examine the data in hexadecimal mode.


It was a binary file provided by my professor. And sorry, the record structure is name (20 bytes) and grade (int).
OK, I can see that this is an assignment, but I'm going to help you a bit here.
The record is made of a 20-byte name and an integer grade.
Let's take a look at the first record:
A.Smith ÌÌÌÌÌÌÌÌÌÌÌY

The names are shorter than 20 bytes, which is why each one is padded with Ì characters until it reaches 20 bytes.
In this record, "A.Smith" has 6 letters, one dot and one whitespace, so it should be padded with 12 Ìs.
After the name comes the integer grade. If you look at the file with a hex editor you will see that the letter Y is actually the value 0x59, which is 89 in decimal, the grade of A. Smith. This integer doesn't appear to be stored in a fixed length, so you can use the standard extraction methods to extract it.
After the integer there is a whitespace that separates the records.
One thing, though: in the records you provided, the name fields are all 19 bytes long, not 20.
Are you sure that the name field should be 20 bytes, and that those records are right?
Yeah, that's what the assignment said: "(the record structure: name (20 bytes), grade (integer))".
Just a quick comment, I was going to say more, but I'm short of time right now.
It looked to me as though the name was char[19] and the grade was a 2-byte integer, usually defined as short.

There are two ways to find out for sure, one is to write the code using whichever values actually work, a bit of trial-and-error.

My preferred approach would be to examine the file in hexadecimal, for Windows I can recommend the free Hexplorer.
http://sourceforge.net/projects/hexplorer/
Yeah Animus, you definitely should have a hex editor. Not just for this assignment; every programmer should have one.
If you can, open the file and post exactly what the hex editor displays (I mean copy-paste).
That might uncover some more information about the records file.
I downloaded HxD, and opened the binary file with it and this is what it shows.


41 2E 53 6D 69 74 68 00 CC CC CC CC CC CC CC CC CC CC CC 59 00 00 00 00 54 2E 20 50 68 69 6C 6C 69 70 00 CC CC CC CC CC CC CC CC 5F 00 00 00 00 53 2E 20 4C 6F 6E 67 00 69 70 00 CC CC CC CC CC CC CC CC 4C 00 00 00 00 4A 2E 20 57 68 69 74 65 00 70 00 CC CC CC CC CC CC CC CC 64 00 00 00
That looks a bit odd.
What I'm seeing there is:
19-bytes name
4-bytes  integer
1-byte   padding
except for the last record, which doesn't have any padding after it.
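
To make that layout concrete, here is a minimal sketch that decodes one record from raw bytes held in memory. The names `Decoded`, `decodeRecord`, `kNameLength` and `kScoreLength` are made up for illustration, and the 19-byte name followed by a 4-byte little-endian integer is an assumption read off the hex dump, not something the assignment sheet confirms.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Assumed layout, taken from the hex dump: a 19-byte name field, then a
// 4-byte little-endian integer. These sizes are guesses, not a spec.
const std::size_t kNameLength = 19;
const std::size_t kScoreLength = 4;

struct Decoded {
    std::string name;
    std::int32_t score;
};

// Decode one record from a buffer holding at least 23 bytes.
Decoded decodeRecord(const unsigned char* bytes)
{
    Decoded out;

    // The name stops at the first 0x00; the bytes after it are filler.
    std::size_t len = 0;
    while (len < kNameLength && bytes[len] != 0)
        ++len;
    out.name.assign(reinterpret_cast<const char*>(bytes), len);

    // Assemble the integer byte by byte so the result does not depend on
    // the byte order of the machine running this code.
    std::uint32_t raw = 0;
    for (std::size_t i = 0; i < kScoreLength; ++i)
        raw |= static_cast<std::uint32_t>(bytes[kNameLength + i]) << (8 * i);
    out.score = static_cast<std::int32_t>(raw);
    return out;
}
```

Fed the first 23 bytes of the dump (41 2E 53 ... 59 00 00 00), this gives the name A.Smith and the score 89, which is 0x59 in decimal.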

Here's a quick program which reads the file according to the hex data posted above:
#include <iostream>
#include <fstream>
#include <string>

using namespace std;

int main()
{
    string filename = "data_edited.dat";
    ifstream fin(filename.c_str(), ios::binary);
    if (!fin)
    {
        cout << "Could not open " << filename << endl;
        return 1;
    }

    const int nameLength = 19;
    const int maxRecords = 100;

    int scores[maxRecords];
    char names[maxRecords][nameLength];

    int count = 0;

    // read individual items, stopping at end of file or when the arrays are full
    while ( count < maxRecords
         && fin.read(names[count], sizeof(names[0]))
         && fin.read((char *) &scores[count], sizeof(scores[0])) )
    {
        count++;
        fin.ignore(); // skip the padding byte
    }

    // each name in this file contains a terminating zero, so it prints as a C string
    for (int i=0; i<count; i++)
    {
        cout << "Name: " << names[i] << " Score: " << scores[i] << endl;
    }
}

Output:
Name: A.Smith Score: 89
Name: T. Phillip Score: 95
Name: S. Long Score: 76
Name: J. White Score: 100
Here's an alternative version, which reads the entire record in one go - but in order to make this work I had to edit the dat file and insert an extra padding byte at the very beginning of the file.
#include <iostream>
#include <fstream>
#include <string>

using namespace std;

const int nameLength = 19;
const int maxRecords = 100;

// one byte of filler, then the name, then the score
struct record {
    char filler;
    char name[nameLength];
    int score;
};

int main()
{
    string filename = "data_editedB.dat";
    ifstream fin(filename.c_str(), ios::binary);

    record rows[maxRecords];

    int count = 0;

    // read entire record, stopping at end of file or when the array is full
    while ( count < maxRecords
         && fin.read((char *) &rows[count], sizeof(record)))
        count++;

    for (int i=0; i<count; i++)
        cout << "Name: " << rows[i].name << " Score: " << rows[i].score << endl;
}
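
Reading a whole struct with one `read` only works if the compiler lays the struct out in memory exactly as the bytes sit in the file. The sketch below shows one way that assumption could be checked at compile time. It assumes a C++11 compiler and a platform where `int` is 4 bytes and aligned to a 4-byte boundary, which is common on desktop compilers but not guaranteed by the standard.

```cpp
#include <cstddef>

const int nameLength = 19;

struct record {
    char filler;
    char name[nameLength];
    int score;
};

// 1 filler byte + 19 name bytes = 20, so score can start at byte 20 without
// the compiler inserting any hidden padding in front of it.
static_assert(offsetof(record, name) == 1, "name should follow the filler byte");
static_assert(offsetof(record, score) == 20, "score should start at byte 20");
static_assert(sizeof(record) == 24, "record should occupy 24 bytes");
```

If one of these checks fails to compile, the field-by-field reads of the first program are the safer route, because they do not depend on how the compiler pads the struct.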



Thanks, but now I'm confused because the assignment wants me to have that data placed into arrays and then sorted, and averaged. This program reads the data and just outputs it to you right? How would you alter this so it reads and then does what I mentioned? I tried looking it up and came across something like putting that data into a buffer but I couldn't make sense of it.
Both programs posted above read the data into arrays.

The output is purely for diagnostic purposes, to verify that it is working. I suggest you read through the code slowly - some of it is obvious, but there are one or two tricky bits.
    int scores[maxRecords];
    char names[maxRecords][nameLength];

Those are the arrays. scores is just a single-dimension array. names is a 2D array as each name is itself an array of characters.
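
Once the data is in those arrays, sorting and averaging can work on them directly. Here is a rough sketch of an index sort plus an average, assuming a C++11 compiler. The function names `sortedOrder` and `averageScore` are invented for this example and are not part of the programs above.

```cpp
#include <algorithm>
#include <vector>

// Return the positions 0..count-1 ordered so that scores[order[0]] is the
// lowest score. The scores array itself is not modified, so names[] and
// scores[] stay parallel.
std::vector<int> sortedOrder(const int scores[], int count)
{
    if (count < 0)
        count = 0;
    std::vector<int> order(count);
    for (int i = 0; i < count; ++i)
        order[i] = i;
    // stable_sort keeps records with equal scores in their original order
    std::stable_sort(order.begin(), order.end(),
                     [scores](int a, int b) { return scores[a] < scores[b]; });
    return order;
}

// Arithmetic mean of the first count scores; 0.0 if there are none.
double averageScore(const int scores[], int count)
{
    if (count <= 0)
        return 0.0;
    long total = 0;
    for (int i = 0; i < count; ++i)
        total += scores[i];
    return static_cast<double>(total) / count;
}
```

Printing `names[order[i]]` and `scores[order[i]]` for `i` from 0 to `count - 1` then lists the records from lowest to highest score. For the four scores in the sample output (89, 95, 76, 100) the order comes out as 2, 0, 1, 3 and the average is 90.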
One more question. The .dat file which I used in version 1 of my code was 95 bytes long. It contains four records. I would expect a binary file to utilise a fixed number of bytes for each record, therefore the file length should be divisible by 4.

So Animus would it be possible (if you would be so kind) to double check the length of the .dat file you have. If indeed it is not evenly divisible by 4 (or whatever is the number of records) I would suggest you query this with your professor and see if this can be explained.
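
For anyone who would rather check this in code than in a file manager, a small sketch follows. `fileSize` and `isWholeNumberOfRecords` are hypothetical helper names, and the record size passed in is whatever layout is being tested.

```cpp
#include <fstream>
#include <string>

// Size of a file in bytes, or -1 if it cannot be opened.
long long fileSize(const std::string& path)
{
    // ios::ate positions the stream at the end, so tellg() reports the size
    std::ifstream fin(path.c_str(), std::ios::binary | std::ios::ate);
    if (!fin)
        return -1;
    std::streamoff size = fin.tellg();
    return static_cast<long long>(size);
}

// True when size is a positive whole multiple of recordSize.
bool isWholeNumberOfRecords(long long size, long long recordSize)
{
    return size > 0 && recordSize > 0 && size % recordSize == 0;
}
```

With the 95-byte file and a 24-byte record (19-byte name, 4-byte integer, 1 padding byte) the check fails, because 95 = 3 × 24 + 23; the file is one byte short of four full records.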

I would ask him, but the assignment is due on Tuesday. Anyway, should I upload the file so you can download it? Because all the assignment sheet says is "(the record structure: name (20 bytes), grade (integer))".
Thanks, yes, if you could upload the file I'd be interested to take a look.
Thanks for that. It confirms what you previously posted (the hex data).
I'd still go with what I suggested in the previous post here: http://www.cplusplus.com/forum/beginner/103593/#msg558359

But it leaves me feeling uneasy, as I can't match the file with a string length of 20, no matter how I look at it.

My main concern is that part of your project requires you to output a file of your own. When you reach that stage the question will arise, should you do the job properly according to the specification, or try to create a file with a similar botched format to the one you were given.



Will a new file in the botched format produce weird output, or will it just not have the specified 20 bytes for the name?
Not sure I understand. The new file will depend on what code you choose to write. So it's a design decision that needs to be made as to whether to follow the precedent set by the file you were given, or to follow the specifications.

In the real world problems like this can arise, and the decision made could depend upon the circumstances. If the supplied file is currently used by some other programs, you would tend to be pragmatic and follow the same format. But you'd still request clarification.

On the other hand, if a discrepancy is uncovered during program development, then it points to someone having made a mistake somewhere, and you would tend to favour the written specification, but would also query it with the other developers, to make sure you all end up working to the same spec.

I understand that there probably isn't time to get this fully resolved before the project is due. Perhaps you could discuss this with other students on the same course.
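
Whichever way that decision goes, the writing code can keep the name width as a parameter so it is easy to switch. This is only a sketch: `writeRecord` is a made-up helper, the field is filled with zero bytes instead of the 0xCC filler seen in the supplied file, and the integer is written in the byte order of the machine running the program.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>

// Write one record as a fixed-width name field followed by a 4-byte integer.
// nameWidth = 20 follows the assignment sheet; 19 would be closer to the
// supplied file, which also has an extra byte between records.
bool writeRecord(std::ostream& out, const std::string& name,
                 std::int32_t score, std::size_t nameWidth)
{
    if (nameWidth == 0)
        return false;

    // Zero-filled field; copy at most nameWidth - 1 characters so the name
    // always ends with a terminating zero.
    std::string field(nameWidth, '\0');
    std::size_t n = name.size() < nameWidth ? name.size() : nameWidth - 1;
    field.replace(0, n, name, 0, n);

    out.write(field.data(), static_cast<std::streamsize>(field.size()));
    out.write(reinterpret_cast<const char*>(&score), sizeof(score));
    return !out.fail();
}
```

To produce the sorted file, open a `std::ofstream` with `std::ios::binary` and call `writeRecord` once per record, in sorted order. With a width of 20, every record is 24 bytes, so a four-record file would be 96 bytes long.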