Write class to bin

How do you actually write the whole class object into a binary file?

The way I do it is to create separate local variables to hold the values of the class's member variables, and then write those local variables to the file. But can you actually just write the whole object into the binary file?
But can you actually just write the whole object into the binary file?

That depends. If the class is POD (plain old data), then yes. If the class contains member variables that are complex classes such as std::string, std::vector, std::list, etc., then no, not directly.
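A minimal sketch of the POD case, assuming a hypothetical Point struct made only of an int and a float. std::is_trivially_copyable (from <type_traits>, C++11) makes the compiler reject the raw-bytes approach if a member like std::string is ever added. It round-trips through a std::stringstream instead of a file to stay self-contained; the bytes are only meaningful to a program built with the same compiler and platform.

```cpp
#include <sstream>
#include <type_traits>

// Hypothetical POD-style class: only built-in members, no pointers.
struct Point
{
    int   x;
    float y;
};

// Refuse to compile if Point ever gains a member like std::string.
static_assert(std::is_trivially_copyable<Point>::value,
              "Point must be trivially copyable to be written as raw bytes");

// Write the object's bytes exactly as they sit in memory.
void writePoint(std::ostream& os, const Point& p)
{
    os.write(reinterpret_cast<const char*>(&p), sizeof(p));
}

// Read the same number of bytes back; false on a short read or EOF.
bool readPoint(std::istream& is, Point& p)
{
    return static_cast<bool>(is.read(reinterpret_cast<char*>(&p), sizeof(p)));
}
```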


I made a little mock program to try to figure it out. This is what I currently have:

When I run it, it outputs a bunch of random numbers, which I'm assuming are just junk.

#include <iostream>
#include <fstream>

using namespace std;

class Dude
{
private:
	int age;
	float height;
	string name;
public:
	Dude(int a, float h, string n);
	int getAge();
	float getHeight();
	string getName();
};
Dude::Dude(int a, float h, string n)
{
	this->age = a;
	this->height = h;
	this->name = n;
}
int Dude::getAge()
{
	return age;
}
float Dude::getHeight()
{
	return height;
}
string Dude::getName()
{
	return name;
}

int main()
{
	//create objs
	Dude John(20, 155, "John");
	Dude Alex(30, 140, "Alex");
	Dude Don(40, 120, "Don");

	//write to bin
	ofstream f("ClassFile.bin", ios::out | ios::app | ios::binary);
	f.write((char*)&John, sizeof(John));
	f.write((char*)&Alex, sizeof(Alex));
	f.write((char*)&Don, sizeof(Don));
	f.close();

	//read back from bin
	int toRestore1 = 0;
	float toRestore2 = 0;
	string toRestore3 = "";
	ifstream read("ClassFile.bin", ios::in | ios::binary);
	while (read.is_open())
	{
		read.read((char*)&toRestore1, sizeof(toRestore1));
		cout << toRestore1 << endl;
		if (read.eof())
		{
			read.close();
		}
	}

	system("PAUSE");
	return 0;
}


See my post above. You cannot write an object directly to a binary file if it contains a complex class such as std::string (line 11, string name;).

The reason is that std::string does its own memory management. When you write out a std::string, you are writing out a structure that contains a pointer to the actual character data, which lives on the heap. When you read that back, the pointer will no longer be pointing to the data for your string.
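A quick way to see this for yourself: sizeof a std::string is a property of the type, so it is the same whether the string holds 2 characters or 1000. The characters therefore cannot all be inside the bytes that f.write((char*)&John, sizeof(John)) copies. (Many implementations store very short strings inline, the "small-string optimization", but longer ones are allocated separately.) The two helper functions below are hypothetical, just for the demonstration.

```cpp
#include <cstddef>
#include <string>

// sizeof is fixed at compile time, so it cannot depend on the contents.
std::size_t bytesForShort()
{
    std::string s = "Al";
    return sizeof(s);
}

std::size_t bytesForLong()
{
    std::string s(1000, 'x'); // 1000 characters: far more than sizeof(s)
    return sizeof(s);
}
```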
So then, in order to write this class into a binary file, I would have to write the variables independently, right?

So since my class contains an int, a float and a string, I would have to use my get functions to get the data, put each value in a local variable, and then write that variable into the binary file?

The int and the float are trivial. You can write them out directly.

The string is the problem. Using your getter (getName) isn't going to solve your problem because getName() returns a std::string (still a complex data type).

std::string does have the c_str() function which will return a const char * pointer to the actual data for your string. You can use that pointer to write out the data for the string, but you have the problem that now you're writing a variable length item to a binary file. If you write out the c_str() data and the null terminator, you can then determine the end of the string when you read it back. Another option is to write a length word in front of the string which can be used to determine the length when reading it back.
I see. So let us assume the string is gone and my class contains just the int and the float. Is my code correct for writing class objects to, and reading them from, a binary file?
If you have only the int and the float, then you're fine and can write the class object directly to the binary file.

I would, however, suggest that instead of lines 46-48 (the three f.write calls) you make a Write() function in your class. The Write() function would be responsible for serializing your class to the binary file. If the members of your class change, only your Write() function needs to change. main() shouldn't care about "how" to write a Dude object.

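void Dude::Write (ostream & f)
{
    // Dumps the object's raw bytes: only valid while Dude is POD (no std::string).
    f.write ((char *)this, sizeof(Dude));
}

  // replaces main lines 46-48 (f is the ofstream opened in main)
  John.Write (f);
  Alex.Write (f);
  Don.Write (f);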


Likewise, your class should have a Read() function that knows how to restore a Dude object from a binary file back to a Dude object.

I see, thanks for clearing that up!

So, regarding complex data structures such as a string...

How could I go about that?
Create your own structure that records additional details, such as the length of each string.
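#include <istream>
#include <ostream>
#include <string>

struct Dude
{
    Dude() : signature(0xE8A9) {}
    unsigned short signature;
    // Your data
    // Your complex data
    std::string s1, s2;
    int data_length;
    int string_length1;
    int string_length2;

    void write(std::ostream &os);
    void read(std::istream &is);
};

void Dude::write(std::ostream &os)
{
    // Each length includes the '\0' that c_str() provides.
    string_length1 = static_cast<int>(s1.size()) + 1;
    string_length2 = static_cast<int>(s2.size()) + 1;
    data_length = string_length1 + string_length2;
    // Write the plain fields one by one. Writing *this as raw bytes would also
    // dump s1 and s2, i.e. their internal heap pointers, which are useless on disk.
    os.write((const char*)&signature, sizeof(signature));
    os.write((const char*)&data_length, sizeof(data_length));
    os.write((const char*)&string_length1, sizeof(string_length1));
    os.write((const char*)&string_length2, sizeof(string_length2));
    os.write(s1.c_str(), string_length1);
    os.write(s2.c_str(), string_length2);
}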

You can use that as base.
What is "signature(0xE8A9)"? Is that a memory address?
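No, it is not a memory address, just an arbitrary constant (a "magic number") that serves as a unique signature for struct Dude. When reading the file, you can check it to make sure you really are looking at a Dude entry and catch errors early (especially when you have to read multiple Dude entries in a row).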

#include <iostream>

void Dude::read(std::istream &is)
{
    // Read and check the signature before trusting anything else in the record.
    unsigned short sig = 0;
    is.read((char*)&sig, sizeof(sig));
    if (is && sig == 0xE8A9)
    {
        std::cout << "Signature for the Dude entry matched. Begin reading further..." << std::endl;
    }
    else
    {
        std::cout << "Unknown entry detected! Possibly corrupt data" << std::endl;
    }
}
This entire process is called "serialization".

Simply containing non-POD data or references isn't the end of the story. Different schemes can be used to write out hierarchies of objects, depending on the sophistication of the objects you need to write.

If your class doesn't use reference semantics (pointers, references, or any other aliases), isn't part of a class hierarchy, and contains only POD, it can be serialized trivially. Just take the address of the object and its size, and write it out.

If your class uses references, then you'll have to serialize each referent, recursively.

Things get more complicated as your objects do. The C++ FAQ makes excellent suggestions about how to do this well.
https://isocpp.org/wiki/faq/serialization

Don't forget a version number and prefer human-readable text.
To serialize strings, your resulting output should be like <len><string><len><string>, etc.

So, for example:
class Data {
public:
    Data(std::string a, std::string b) : a_(a), b_(b) {}

    void write(std::ostream& out) const {
        out.write(reinterpret_cast<char*>(&a_.size()), sizeof(a_.size()));
        out.write(a_.c_str(), a_.size());
        out.write(reinterpret_cast<char*>(&b_.size()), sizeof(b_.size()));
        out.write(b_.c_str(), b_.size());
    }

    void read(std::istream& in) {
        std::string::size_type len;
        out.read(reinterpret_cast<char*>(&len), sizeof(len));
        a_ = std::string(len, '\0');
        out.read(&a_[0], len);

        out.read(reinterpret_cast<char*>(&len), sizeof(len));
        b_ = std::string(len, '\0');
        out.read(&b_[0], len);
    }

private:
    std::string a_;
    std::string b_;
};


Serialization can get very complex, very quickly. If you plan on doing anything non-trivial, I'd recommend either writing class wrappers for serializing strings and the like, or using a pre-existing library such as Boost.Serialization.
@TwilightSpectre - Lines 16 and 20 are going to result in undefined behavior.

At lines 15 and 19, you've allocated a string of one byte (a null). Reading directly into a_[0] for len bytes will overwrite the string's internal storage and possibly other data. No string operation is involved (other than returning a reference), so the string does NOT know to increase the allocation of its internal storage if required.


BTW, lines 14, 16, 18 and 20 should use in, not out.
By the way:
out.write(reinterpret_cast<char*>(&a_.size()), sizeof(a_.size()));
string::size() returns a plain int. Is it even valid to use the address-of operator like that?
Even if it is valid, doing such things can be considered dangerous.
string::size() returns a plain int.

No, std::string::size() returns std::string::size_type (which is std::size_t for std::string), not an int. size_t is an implementation-defined unsigned type.

Serialization can get very complex, very quickly.

Very, very true, especially when trying to serialize data between different compilers, operating systems and processor types, where you run into different sizes for the POD types and byte-order (endianness) issues.

@AbstractionAnan - I thought lines 15 and 19 were allocating a string of len bytes, each of them a null, so when I write to &a_[0], I know I have enough room. Some more research does show that it is possible for a std::string to break when used this way, but it is very unlikely: since C++11 a string must store its characters contiguously, and the copy-on-write implementations that could have broken it are no longer conforming.

Still, that is a valid point; just because it has always worked for me doesn't make it safe. The principle stands, however, and the undefined behaviour can be removed by reading into a std::vector<char> instead of a std::string and copying it into a string afterwards. Interestingly enough, the way I did it is the way given in the example at cppreference: http://en.cppreference.com/w/cpp/io/basic_istream/read

And yes, the outs were a typo...

EDIT:
@SakurasouBusters
That was 'somewhat' a typo. Yes, the ampersand was deliberate: I'm passing a pointer, so I need to take the address somehow. What wasn't correct was taking the address of an rvalue (the temporary returned by size()); again, something stupid I did. It just goes to show I need to test my code before I post it here.

For interest's sake, here's a coliru of the program with typos removed:
http://coliru.stacked-crooked.com/a/c084f5bcc3959f73
@twilightSpectre - I take back what I said about lines 16 and 20 being undefined behavior.
I missed that you were allocating the string for len bytes. That will ensure enough space for the read to work.

Edit: I still think it's an approach that should be avoided. It takes advantage of "guilty knowledge" that the string is stored contiguously starting at a_[0]. Yes, it works, but it bypasses the safety of the string class. The string class is designed so that all operations that modify the string go through its provided operators or functions.