binary files

Years ago I wrote a program that created binary files. Now I can't read the file.
What I want to do is read this file and rewrite it as a text file, but for some reason it won't give me the correct results.


How do I attach the "rawmat'l.stk" file?


Here is a streamlined version of the console code:

#include "stdafx.h"

#include <cstdio>
#include <iostream>
#include <conio.h>

struct materials
{
    char A[7];
    char B[27];
    float C;
    float D;
    float E;
    int F;
    float G;
    float H;
    int I;
    float J;
    float K;
    float L;
    float M;
    float N;
    float O;
    float P;
};

const static int end = 6;

int _tmain()
{
    struct materials mat;
    FILE *in;

    if((in = fopen("c:/x_rm2/rawmat'l.stk","rb"))!= NULL)
    {
        for(int j=0; j<=end; j++)
        {
            // stop at end of file or on a read error
            if (fread(&mat,sizeof(mat),1,in) != 1)
                break;
            printf("%6s %27s %6.2f %6.2f %6.2f \n",mat.A,mat.B,mat.C,mat.D,mat.P);
        }
        getch();
        fclose(in);
        return(0);
    }
    return(-1);
}
Years ago I wrote a program that created binary files. Now I can't read the file.

A number of things could have changed between then and now. For example:
• the size of fundamental types such as int may have changed
• the endianness (byte order) of the hardware may be different
• the packing options for the struct may differ

As a first attempt at diagnosing the nature of the problem, find the exact length in bytes of the file. It should be an exact multiple of the size in bytes of the struct.

Depending on whether or not there are padding bytes, I get a size for struct materials of either 90 or 92 bytes. That assumes a float and an int each need 4 bytes: 7 + 27 + 14 × 4 = 90.

As for the padding, these two members:
char A[7];
char B[27];
together make a total of 34 bytes. The compiler may try to align the float or int on a boundary that is a multiple of 4 bytes, so there may or may not be an extra two bytes of padding inserted before the start of float C.


How do I attach the "rawmat'l.stk" file?

You could upload it to some external file-sharing service, and post the link here.

Or, as a partial alternative (ok for small files), you could run it through some code to convert it to hexadecimal values as readable text and post the output here. A quick example:
#include <iostream>
#include <fstream>
#include <cstdio>

using namespace std;

// Format one byte as two uppercase hexadecimal digits.
const char* hex(unsigned char ch)
{
    static char hexbuf[4] = {0};
    sprintf(hexbuf, "%02X", ch);
    return hexbuf;
}

int main()
{
    // read the binary file and output as text
    const char * file_name   = "c:/x_rm2/rawmat'l.stk";
    const char * output_name = "c:/x_rm2/hexadecimal.txt";

    ifstream fin(file_name, ios::binary);
    ofstream fout(output_name);
    if (!fin || !fout)
    {
        cerr << "could not open files\n";
        return 1;
    }

    char ch;

    // 16 bytes per line, each followed by a space
    for (int i=0; fin.get(ch); i++)
    {
        if (i>0 && i%16 == 0) fout << '\n';
        fout << hex(ch) << ' ';
    }
    fout << '\n';
}

Sample output:
41 70 70 6C 65 73 00 50 65 61 72 20 2E 2E 2E 2E 
2E 2E 2E 2E 2E 2E 2E 2E 2E 2E 2E 2E 20 54 72 65 
65 00 00 00 9A 99 99 3F 9A 99 59 40 33 33 B3 40 
15 03 00 00 D0 0F 49 40 4D F8 2D 40 B0 01 00 00 
0E 2D 6A 40 0A D7 12 42 66 A6 B7 43 00 D0 65 45 
AD FA BC 3E 48 E1 13 42 2F DD 6C 40 


The above program reads the file and creates an output file named "hexadecimal.txt". The output may be quite long; if you post the first dozen or so lines, that should be more than enough.

The original binary file can be reconstructed like so:
    std::ifstream fin("hexadecimal.txt");
    std::ofstream fout("convert.bin", std::ios::binary);
    for (int n; fin >> std::hex >> n; ) fout.put(n);

31 37 31 38 37 30 00 4E 55 20 59 45 4C 20 4F 58
20 43 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 66 66 78 42 00 00 00 00 33 33 73 41 CD CC
5C 42 00 00 00 00 33 33 97 41 01 00 98 41 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 29 5C
8F 3F 00 00 00 00 00 00 00 00 0A 36 E6 41 3B 43
0D 41 00 00 00 00 00 00 00 00 00 00 00 00 36 2D
32 36 30 31 00 4F 52 47 41 4E 49 43 20 59 45 4C
4C 4F 57 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 FC 41 00 00 00 00

These should be the lines for the first two saved items. The first 18 or so bytes hold the correct saved information, 171870 and NU YEL OX C. The problem begins after that.
In case anyone wants to inspect the original, the following Perl one-liner reconstructs the binary file from the hex dump above:
perl -ne 's/([0-9a-f][0-9a-f])/print chr hex $1/ieg'
Are you sure that is two complete items? Could you give a sample of about three times the length? Judging by where the null-terminated char strings start, I think one item is 110 bytes long. Could you also recheck the contents of your struct?

Looks roughly like:
7 chars
27 chars
4 bytes
4 bytes (blank)
8 bytes
4 bytes (blank)
8 bytes
16 bytes (blank)
4 bytes
8 bytes (blank)
8 bytes
12 bytes (blank)
Then starts repeating.
Hmm, I extracted these sorts of values from the file, but I've no idea whether they make sense:

171870  NU YEL OX C
62.1    0       15.2    1113377997      18.9    19      0
0       0       1.12    0       0       28.7764 8.82891

6-2601  ORGANIC YELLOW
31.5    0       0       0       0       0       0
0       0       0       0       0       0       0


My thought process was as follows. Yesterday I came up with an expected length of 90 bytes for the struct with no padding. Today, it seems there is no padding after the end of the two strings, but each record is 110 bytes long, not 90. My first attempt just put a 20-byte filler on the end. That was wrong, because there is content in there; the filler can't be more than 12 bytes. So where do the other 8 bytes go? Perhaps each int requires 8 bytes instead of 4.

That gives this structure:
#pragma pack(1)
struct materials
{
    char A[7];
    char B[27];
    float C;
    float D;
    float E;
    long long int F;
    float G;
    float H;
    long long int I;
    float J;
    float K;
    float L;
    float M;
    float N;
    float O;
    float P;
    char filler[12];
};
#pragma pack() 


long long int should occupy 8 bytes. A better solution is to use the fixed-width type names from the header <cstdint> (such as int64_t), which should not vary between compilers.

#pragma pack(1) tells the compiler to align members on one-byte boundaries, so no padding is inserted; #pragma pack() goes back to the original setting.
Here is an attempt to read the file into a vector and then print it out. It only handles whole records; I had to add some trailing zero bytes to the data file to complete the second record.

#include <iostream>
#include <fstream>
#include <vector>

using namespace std;

#pragma pack(1)
struct materials
{
    char A[7];
    char B[27];
    float C;
    float D;
    float E;
    long long int F;
    float G;
    float H;
    long long int I;
    float J;
    float K;
    float L;
    float M;
    float N;
    float O;
    float P;
    char filler[12];
};
#pragma pack()

std::ostream & operator << (std::ostream & os, const materials& m)
{
    os << m.A << '\t'
       << m.B << '\n'
       << m.C << '\t'
       << m.D << '\t'
       << m.E << '\t'
       << m.F << '\t'
       << m.G << '\t'
       << m.H << '\t'
       << m.I << '\n'
       << m.J << '\t'
       << m.K << '\t'
       << m.L << '\t'
       << m.M << '\t'
       << m.N << '\t'
       << m.O << '\t'
       << m.P << '\n';

    return os;
}

int main()
{
    vector <materials> mat;

    ifstream fin("c:/x_rm2/rawmat'l.stk", ios::binary);

    if (!fin)
    {
        cout << "could not open file\n";
        return 1;
    }

    for (materials m; fin.read(reinterpret_cast<char*>(&m), sizeof(materials)); )
        mat.push_back(m);

    for (const auto & m : mat)
        cout << m << "\n";

}
Another approach: treat every value as either a 4-byte float or a 4-byte integer, without knowing which, then try both and see which makes the most sense.

#include <iostream>
#include <fstream>
#include <vector>
#include <iomanip>
#include <cstdint>

using namespace std;

union float_or_int 
{
    int32_t n;
    float   f;    
};

#pragma pack(1)
struct Material 
{
    char A[7];        
    char B[27];
    float_or_int num[19];
};
#pragma pack()

std::ostream & operator << (std::ostream & os, const Material& m)
{
    os << m.A << '\n'
       << m.B << '\n';
    for (const auto q : m.num)
        os << setw(16) << q.f << setw(16) << q.n << '\n';        
       
    return os;
}

int main()
{    
    ifstream fin("c:/x_rm2/rawmat'l.stk", ios::binary);

    vector <Material> mat;

    for (Material m; fin.read(reinterpret_cast<char*>(&m), sizeof(Material)); )
        mat.push_back(m);
                
    for (const auto & m : mat)
        cout << m << "\n";
    
}

Output:
171870
NU YEL OX C
            62.1      1115186790
               0               0
            15.2      1098068787
            55.2      1113377997
               0               0
            18.9      1100428083
              19      1100480513
               0               0
               0               0
               0               0
               0               0
            1.12      1066359849
               0               0
               0               0
         28.7764      1105606154
         8.82891      1091388219
               0               0
               0               0
               0               0

6-2601
ORGANIC YELLOW
            31.5      1107034112
               0               0

From this, the floating-point values look more plausible than the integers.

Of course, where the value is zero, there is no way to decide which is more suitable.
Thanks for all the responses. I will take some time and work through all the ideas provided. I think I will remember it better that way.
You're welcome. Afterwards I realized that I used some C++11 features which may be unfamiliar to you; if you need an explanation of anything, just ask. Also, <vector> isn't really necessary in the above code. I used it to read the entire file into an array (vector), but that step could be skipped: just read in each record and print it out (or write it to a different file).