Question regarding boost::serialization and placement new

Hey guys,

The TL;DR is: How does load_construct_data from boost::serialization know how much memory to allocate for a class when loading through a pointer?

The long version: my question is spelled out in the comments of the code below, which I wrote to demonstrate it. This is a complete, runnable program, assuming you've properly linked boost::serialization to your project.

#include <cstddef> // NULL
#include <iomanip>
#include <iostream>
#include <fstream>
#include <string>

#include <boost/archive/text_iarchive.hpp>
#include <boost/archive/text_oarchive.hpp>

class non_default_constructor; // Forward declaration for boost serialization namespacing below


// In order to "teach" boost how to save and load a class that has no default constructor, you must overload these functions
// in the boost::serialization namespace. Declare them here.
namespace boost { namespace serialization {
	template<class Archive>
	inline void save_construct_data(Archive& ar, const non_default_constructor* ndc, const unsigned int version);
	template<class Archive>
	inline void load_construct_data(Archive& ar, non_default_constructor* ndc, const unsigned int version);
}}

// Here is the actual class definition with no default constructor
class non_default_constructor
{
public:
	explicit non_default_constructor(std::string initial)
	: some_initial_value{initial}, state{0}
	{

	}

	std::string get_initial_value() const { return some_initial_value; } // For save_construct_data

private:
	std::string some_initial_value;
	int state;

	// Notice that we only serialize state here, not the
	// some_initial_value passed into the ctor
	friend class boost::serialization::access;
	template<class Archive>
	void serialize(Archive& ar, const unsigned int version)
	{
		std::cout << "serialize called" << std::endl;
		ar & state;
	}
};

// Define the save and load overloads here.
namespace boost { namespace serialization {
	template<class Archive>
	inline void save_construct_data(Archive& ar, const non_default_constructor* ndc, const unsigned int version)
	{
		std::cout << "save_construct_data called." << std::endl;
		ar << ndc->get_initial_value();
	}
	template<class Archive>
	inline void load_construct_data(Archive& ar, non_default_constructor* ndc, const unsigned int version)
	{
		std::cout << "load_construct_data called." << std::endl;
		std::string some_initial_value;
		ar >> some_initial_value;

		// Use placement new to construct a non_default_constructor class at the address of ndc
		::new(ndc)non_default_constructor(some_initial_value);
	}
}}


int main(int argc, char *argv[])
{

	// Now let's say that we want to save and load a non_default_constructor object through a pointer.

	non_default_constructor* my_non_default_constructor = new non_default_constructor{"initial value"};

	// Scope the archive so that it is destroyed (and finishes writing) before the stream closes.
	{
		std::ofstream outputStream("non_default_constructor.dat");
		boost::archive::text_oarchive outputArchive(outputStream);
		outputArchive << my_non_default_constructor;
	}

	// The above is all fine and dandy. We've serialized an object through a pointer.
	// Boost calls save_construct_data first, then serialize().

	// The output archive file will look exactly like this:

	/*
		22 serialization::archive 17 0 1 0
		0 13 initial value 0
	*/


	/* If I want to load that class back into an object at a later time,
	I'd declare a pointer to a non_default_constructor */
	non_default_constructor* load_from_archive = nullptr;

	// Notice load_from_archive does not point at an object yet. It doesn't make
	// sense to initialize it with one, because we're trying to load from
	// a file, not create a whole new object with "new".

	std::ifstream inputStream("non_default_constructor.dat");
	boost::archive::text_iarchive inputArchive(inputStream);

	// <><><> HERE IS WHERE I'M CONFUSED <><><>
	inputArchive >> load_from_archive;

	// The above should call load_construct_data which will attempt to
	// construct a non_default_constructor object at the address of
	// load_from_archive, but HOW DOES IT KNOW HOW MUCH MEMORY A NON_DEFAULT_CONSTRUCTOR
	// class uses?? Placement new just constructs at the address, assuming
	// memory at the passed address has been allocated for construction.

	// So my predicament is this:
	// (1) I don't want to construct a dummy non_default_constructor before loading into it, because that's just awful
	// programming practice (example: load_from_archive = new non_default_constructor{"dummy value"})
	// (2) I want to verify that *something* is (or isn't) allocating memory for a non_default_constructor
	// class to be constructed at the address of load_from_archive.

	std::cout << load_from_archive->get_initial_value() << std::endl; // THIS WORKS, BUT HOW? IS IT VALID MEMORY?

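	// Both objects are owned here: one came from "new", the other was allocated while loading the archive.
	delete load_from_archive;
	delete my_non_default_constructor;

	return 0;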
}
Not sure what the standard is for b.u.m.ping, but figured I'd do it just once.
For posterity, I have the answer:

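As the Boost source shows ( https://www.boost.org/doc/libs/1_73_0/boost/archive/detail/iserializer.hpp ), when loading through a pointer Boost first heap-allocates raw, uninitialized memory of sizeof(T) for the object. As far as I can tell it uses the class's own operator new if the class defines one, and the global operator new otherwise. It does this under the assumption that subsequent code will use placement new to construct an object within the allocated memory, which is exactly what the overloaded load_construct_data does. So the address passed to load_construct_data is not whatever happened to be in my uninitialized pointer: it is storage Boost has already allocated, and my pointer is set to that storage once loading finishes.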

The one caveat is that heap allocation is conditional, because you may be deserializing the same object into multiple different pointers. Because of the way object tracking works, only the first deserialization results in a heap allocation; subsequent deserializations simply return a pointer to the already allocated memory. (Serialization works similarly: if you serialize multiple pointers to the same object, only one copy of the object is written.) That is described here: https://www.boost.org/doc/libs/1_61_0/libs/serialization/doc/special.html#objecttracking