Serialization

I finished a C++ serializer generator: https://github.com/Helios-vmg/cppserialization
It supports reference cycles, smart pointers, strings, various standard containers, and inheritance.
There's no explicit support for format versioning, but it can be implemented on top of the existing infrastructure. There's an optional "type hash" system that ensures an older stream can be fully decoded by a newer deserializer that implements the exact contained types.
The one limitation is that pointers to the middle of an object or container can't be detected.
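
To make that limitation concrete, here is a minimal sketch, not the library's code. It assumes a registry keyed on the start address of each known object; `Item`, `AddressMap`, `lookup_id`, and `interior_pointer_is_missed` are hypothetical names made up for the illustration. A pointer to a member sits at a different address than the object itself, so an exact-address lookup finds nothing.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical stand-in for a serializable object.
struct Item {
    std::uint32_t a;
    std::uint32_t b;
};

// Hypothetical registry: start address of each known object -> object ID.
using AddressMap = std::map<const void *, std::uint32_t>;

// Returns the ID registered for exactly this address, or 0 if there is none.
std::uint32_t lookup_id(const AddressMap &map, const void *p) {
    auto it = map.find(p);
    return it == map.end() ? 0 : it->second;
}

// Returns true if a pointer into the middle of the object is NOT found.
bool interior_pointer_is_missed() {
    Item item{1, 2};
    AddressMap map;
    map[&item] = 1;               // only the object's start address is registered
    const void *inner = &item.b;  // points into the middle of the object
    assert(lookup_id(map, &item) == 1);
    return lookup_id(map, inner) == 0;
}
```

Calling `interior_pointer_is_missed()` returns true: the object is found by its own address, but `&item.b` is not.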

Example input: https://github.com/Helios-vmg/cppserialization/blob/master/random_graph/random_graph.xml
What would be the point of serializing a vector...?
Huh? For... the same reason one would serialize any collection of items? Maybe I'm not understanding the question.
How invasive is it, and how much boilerplate is needed to use it with an existing codebase?
It wants to generate the classes that will be serialized. If you have existing objects that you want to serialize, it'd be best to translate them into input files for the generator. The basic workflow is to list in the XML the members you want serialized, and add an include for a header where you declare functions and any members that won't be serialized. If you further derive from a generated class and try to deserialize by passing Derived as the template parameter to DeserializerStream::begin_deserialization(), you'll get back a nullptr because the generator isn't aware of the derived class.
Um, what? What does XML have to do with this? That sets off multiple alarm bells for me.
The serializable types and the pointer enumerations are generated from an XML file that contains the description of said types.

Here's what the generator outputs for the input above.

random_graph.generated.h
#include <cstdint>
#include <string>
#include <vector>
#include "serialization_utils.h"
#include "Serializable.h"
#include "SerializerStream.h"
#include "DeserializerStream.h"

class graph;
class node;

class graph : public Serializable
{
private:
    std::vector<node *> nodes;
private:
    node * first;
#include "graph.h"
public:
    graph(DeserializerStream &);
    virtual ~graph();
    virtual void get_object_node(std::vector<ObjectNode> &) const override;
    virtual void serialize(SerializerStream &) const override;
    virtual std::uint32_t get_type_id() const override;
    virtual TypeHash get_type_hash() const override;
    virtual std::shared_ptr<SerializableMetadata> get_metadata() const override;
    static std::shared_ptr<SerializableMetadata> static_get_metadata();
    virtual void rollback_deserialization() override;
};

class node : public Serializable
{
private:
    std::vector<node *> children;
private:
    std::uint64_t name;
private:
    std::string data;
#include "node.h"
public:
    node(DeserializerStream &);
    virtual ~node();
    virtual void get_object_node(std::vector<ObjectNode> &) const override;
    virtual void serialize(SerializerStream &) const override;
    virtual std::uint32_t get_type_id() const override;
    virtual TypeHash get_type_hash() const override;
    virtual std::shared_ptr<SerializableMetadata> get_metadata() const override;
    static std::shared_ptr<SerializableMetadata> static_get_metadata();
    virtual void rollback_deserialization() override;
};

random_graph.generated.cpp
#include "random_graph.generated.h"
#include <utility>

extern std::pair<std::uint32_t, TypeHash> random_graph_id_hashes[];
extern size_t random_graph_id_hashes_length;

std::shared_ptr<SerializableMetadata> get_metadata_b5b8eeb6f0dc0289();

graph::graph(DeserializerStream &ds)
    : nodes(proxy_constructor<std::vector<node *>>(ds))
    , first(proxy_constructor<node *>(ds))
{}

void graph::get_object_node(std::vector<ObjectNode> &v) const
{
    for (const auto &i0 : (*this).nodes) {
        v.push_back(::get_object_node(i0));
    }
    v.push_back(::get_object_node(this->first));
}

void graph::serialize(SerializerStream &ss) const
{
    ss.serialize(this->nodes);
    ss.serialize(this->first);
}

std::uint32_t graph::get_type_id() const
{
    return 2;
}

TypeHash graph::get_type_hash() const
{
    return TypeHash( { 0xee, 0xf9, 0x3e, 0x1d, 0x14,
        0x48, 0x28, 0x04, 0x27, 0x7f, 0xca, 0x01, 0x72, 0x46,
        0x40, 0x32, 0xd1, 0xa4, 0xfd, 0xbc, 0xc3, 0x38, 0x52,
        0x40, 0x59, 0xfa, 0x1e, 0x86, 0x14, 0x54, 0xad, 0x4d, });
}

std::shared_ptr<SerializableMetadata> graph::get_metadata() const
{
    return this->static_get_metadata();
}

std::shared_ptr<SerializableMetadata> graph::static_get_metadata()
{
    return get_metadata_b5b8eeb6f0dc0289();
}


node::node(DeserializerStream &ds)
    : children(proxy_constructor<std::vector<node *>>(ds))
    , name(proxy_constructor<std::uint64_t>(ds))
    , data(proxy_constructor<std::string>(ds))
{}

void node::get_object_node(std::vector<ObjectNode> &v) const
{
    for (const auto &i1 : (*this).children) {
        v.push_back(::get_object_node(i1));
    }
}

void node::serialize(SerializerStream &ss) const
{
    ss.serialize(this->children);
    ss.serialize(this->name);
    ss.serialize(this->data);
}

std::uint32_t node::get_type_id() const
{
    return 1;
}

TypeHash node::get_type_hash() const
{
    return TypeHash( { 0x54, 0x5e, 0xa5, 0x38, 0x46,
        0x10, 0x03, 0xef, 0xdc, 0x8c, 0x81, 0xc2, 0x44, 0x53,
        0x1b, 0x00, 0x3f, 0x6f, 0x26, 0xcf, 0xcc, 0xf6, 0xc0,
        0x07, 0x3b, 0x32, 0x39, 0xfd, 0xed, 0xf4, 0x94, 0x46, });
}

std::shared_ptr<SerializableMetadata> node::get_metadata() const
{
    return this->static_get_metadata();
}

std::shared_ptr<SerializableMetadata> node::static_get_metadata()
{
    return get_metadata_b5b8eeb6f0dc0289();
}

random_graph.aux.generated.cpp
#include "Serializable.h"
#include "random_graph.generated.h"
#include <utility>
#include <cstdint>

std::pair<std::uint32_t, TypeHash> random_graph_id_hashes[] = {
    { 1, TypeHash({ 0x54, 0x5e, 0xa5, 0x38, 0x46, 0x10,
        0x03, 0xef, 0xdc, 0x8c, 0x81, 0xc2, 0x44, 0x53, 0x1b,
        0x00, 0x3f, 0x6f, 0x26, 0xcf, 0xcc, 0xf6, 0xc0, 0x07,
        0x3b, 0x32, 0x39, 0xfd, 0xed, 0xf4, 0x94, 0x46, })},
    { 2, TypeHash({ 0xee, 0xf9, 0x3e, 0x1d, 0x14,
        0x48, 0x28, 0x04, 0x27, 0x7f, 0xca, 0x01, 0x72, 0x46,
        0x40, 0x32, 0xd1, 0xa4, 0xfd, 0xbc, 0xc3, 0x38, 0x52,
        0x40, 0x59, 0xfa, 0x1e, 0x86, 0x14, 0x54, 0xad, 0x4d, })},
};
size_t random_graph_id_hashes_length = 2;

void *allocator_b5b8eeb6f0dc0289(std::uint32_t type)
{
    switch (type) {
    case 1:
        return ::operator new (sizeof(node));
    case 2:
        return ::operator new (sizeof(graph));
    }
    return nullptr;
}

void constructor_b5b8eeb6f0dc0289(std::uint32_t type, void *s, DeserializerStream &ds)
{
    switch (type) {
    case 1: {
        node *temp = (node *)s;
        new (temp) node(ds);
    }
    break;
    case 2: {
        graph *temp = (graph *)s;
        new (temp) graph(ds);
    }
    break;
    }
}

void rollbacker_b5b8eeb6f0dc0289(std::uint32_t type, void *s)
{
    switch (type) {
    case 1: {
        node *temp = (node *)s;
        temp->rollback_deserialization();
    }
    break;
    case 2: {
        graph *temp = (graph *)s;
        temp->rollback_deserialization();
    }
    break;
    }
}

bool is_serializable_b5b8eeb6f0dc0289(std::uint32_t type)
{
    switch (type) {
    case 1:
        return true;
        break;
    case 2:
        return true;
        break;
    }
    return false;
}

std::shared_ptr<SerializableMetadata> get_metadata_b5b8eeb6f0dc0289()
{
    std::shared_ptr<SerializableMetadata> ret(new SerializableMetadata);
    ret->set_functions(allocator_b5b8eeb6f0dc0289,
        constructor_b5b8eeb6f0dc0289, rollbacker_b5b8eeb6f0dc0289, is_serializable_b5b8eeb6f0dc0289);
    for (auto &p : random_graph_id_hashes)
        ret->add_type(p.first, p.second);
    return ret;
}


The serializer proper, which is hand-written, uses these functions to generate an address->object ID map: https://github.com/Helios-vmg/cppserialization/blob/master/postsrc/SerializerStream.cpp#L23
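
As a rough illustration of what an address->object ID map involves, here is a minimal sketch. It is not the code at the link above: `SketchNode`, `assign_ids`, and `ids_in_two_node_cycle` are hypothetical names, and the traversal follows a plain `children` vector, whereas the generated classes report their outgoing pointers through get_object_node(). The point is only that remembering which addresses already have an ID is what lets reference cycles terminate.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <unordered_map>
#include <vector>

// Hypothetical node type used only for this sketch.
struct SketchNode {
    std::vector<SketchNode *> children;
};

// Walks everything reachable from root and gives each distinct address an ID,
// starting at 1. An address that already has an ID is skipped, so cycles end.
std::unordered_map<const SketchNode *, std::uint32_t>
assign_ids(const SketchNode *root) {
    std::unordered_map<const SketchNode *, std::uint32_t> ids;
    std::queue<const SketchNode *> pending;
    if (root)
        pending.push(root);
    std::uint32_t next_id = 1;
    while (!pending.empty()) {
        const SketchNode *current = pending.front();
        pending.pop();
        assert(current != nullptr); // null pointers are never queued
        if (!ids.emplace(current, next_id).second)
            continue; // this address was already assigned an ID
        ++next_id;
        for (const SketchNode *child : current->children)
            if (child)
                pending.push(child);
    }
    return ids;
}

// Builds a two-node cycle (a -> b -> a) and counts the IDs handed out.
std::size_t ids_in_two_node_cycle() {
    SketchNode a, b;
    a.children.push_back(&b);
    b.children.push_back(&a);
    return assign_ids(&a).size();
}
```

For the two-node cycle, `ids_in_two_node_cycle()` returns 2: each node gets exactly one ID even though each is reachable from the other.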
I work in a code base that actually does something similar to this (but with more features).

I think you're selling yourself short by calling this a serializer. When I hear the word "serializer" I imagine attaching it to an existing class in my code so that I can save and restore it.

What you've built is the first step for describing an entire data model which happens to be serializable. This is useful when you need many different classes of raw data (in named fields) which don't need any specialized functionality but need extremely robust getters and setters (or direct access to fields). In a real project, messing up the boilerplate code could mean corrupting user data. With auto-generated code you don't risk messing up any individual class and you don't need to write their unit tests. In your project's build you can generate the code as a preprocessing step.

Now some comments about your implementation, going by only the sample input file:

Avoid guids/long longs in method names. Unless you've been very careful to make sure that compiler errors will never ever happen in these methods (except when maintaining the serializer, obviously) and that you never have to step into these using a debugger, someone is going to hate you for this. Stick to readable names if at all possible.

I would generally avoid depending on order like that in XML. Change <public /> Stuff... into <public>Stuff</public>.

On that note, why would fields ever not be public? These classes are auto-generated. Where would you be able to access the private members?

Get rid of the <include> tags. You should either auto-include the needed files or put everything into one file (I've had no problems working with tens of thousands of lines of generated code).

Is vector a special case in that its contents can be described inline? It might be useful to be able to "construct" your own types like this.

You should try adding version support! In a real data model most changes will simply add new fields (for example, adding SSN to Facebook profiles). You can add version numbers to all the fields in the XML input and automatically set things to the default version when converting up to a version with a new field. More difficult changes actually require arbitrary code to convert. You should allow the user to supply code that converts version N to version N+1; as long as you keep all your old converters around your program will be able to read any old file. Bonus points if you also allow downgrading; I've worked with web services that require this for some complicated reason.
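
The N to N+1 converter chain described above could look something like the following minimal sketch. Everything in it is hypothetical and unrelated to the generator's actual API: `Profile`, `Converter`, `make_converters`, `upgrade`, and `current_version` are made-up names, and the record is a plain in-memory struct rather than a deserialized stream.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical record; "ssn" is the field that was added in version 2.
struct Profile {
    std::uint32_t version;
    std::string name;
    std::string ssn;
};

// One converter per source version: converters[N] upgrades N to N + 1.
using Converter = std::function<void(Profile &)>;

const std::uint32_t current_version = 2;

std::map<std::uint32_t, Converter> make_converters() {
    std::map<std::uint32_t, Converter> converters;
    converters[1] = [](Profile &p) {
        p.ssn = "";    // the new field gets its default value
        p.version = 2;
    };
    return converters;
}

// Applies converters one step at a time until the record is current.
void upgrade(Profile &p, const std::map<std::uint32_t, Converter> &converters) {
    while (p.version < current_version) {
        auto it = converters.find(p.version);
        if (it == converters.end())
            throw std::runtime_error("no converter for this version");
        const std::uint32_t before = p.version;
        it->second(p);
        assert(p.version == before + 1); // each converter advances exactly one step
    }
}

// Upgrades a version 1 record and returns the version it ends up at.
std::uint32_t upgrade_old_profile() {
    Profile p{1, "Alice", "stale"};
    upgrade(p, make_converters());
    return p.version;
}
```

Because each converter handles only one step, adding version 3 later means adding `converters[2]` and bumping `current_version`; old version 1 records would then pass through both converters in order.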

Oh, and you should show off references between user-defined objects in your input file. I'd like to see how they are handled.
Avoid guids/long longs in method names. Unless you've been very careful to make sure that compiler errors will never ever happen in these methods (except when maintaining the serializer, obviously) and that you never have to step into these using a debugger, someone is going to hate you for this. Stick to readable names if at all possible.
Do you mean for example is_serializable_b5b8eeb6f0dc0289()? I needed some way to prevent clashes between different is_serializable()s generated by different instantiations of the generator, linked into the same executable.
If a compiler error happens in these, I would consider that a bug in the generator or the interpreter.

I would generally avoid depending on order like that in XML. Change <public /> Stuff... into <public>Stuff</public>.
I considered this, but I wanted something that reflected the structure of the C++ code. Access specifiers aren't really structured.

On that note, why would fields ever not be public? These classes are auto-generated. Where would you be able to access the private members?

Get rid of the <include> tags. You should either auto-include the needed files or put everything into one file (I've had no problems working with tens of thousands of lines of generated code).
The user can ask to have files included that declare members that don't take part in the de/serialization process, such as getters/setters, other constructors, and more complex, non-serializable data members.

Is vector a special case in that its contents can be described inline? It might be useful to be able to "construct" your own types like this.
Could you expand on this?

You should try adding version support! In a real data model most changes will simply add new fields (for example, adding SSN to Facebook profiles). You can add version numbers to all the fields in the XML input and automatically set things to the default version when converting up to a version with a new field. More difficult changes actually require arbitrary code to convert. You should allow the user to supply code that converts version N to version N+1; as long as you keep all your old converters around your program will be able to read any old file. Bonus points if you also allow downgrading; I've worked with web services that require this for some complicated reason.
I do plan to use versioning in the project I will use this with. My idea was to have a simple generator that was easy to implement correctly, and then have the client code do the heavy lifting for versioning.
This is what I have planned:
<class name="VersionedSerializable">
	<uint32_t name="version_no"/>
</class>
<class name="Foo1">
	<!--The syntax might be wrong, I can't remember it ATM.-->
	<base name="VersionedSerializable"/>
	<string name="some_data"/>
</class>
<class name="Foo2">
	<base name="Foo1"/>
	<string name="some_more_data"/>
</class> 
The generator makes sure that a serialized object can be checked for whether the deserializer understands it, so that deserialization fails early if it doesn't and constructs the correct type if it does.
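
The "fail early" check could be sketched roughly as follows. This is an assumption-laden illustration, not the library's implementation: `SketchHash` is a hypothetical 32-byte digest standing in for TypeHash, and `KnownTypes`, `stream_is_understood`, and `rejects_changed_type` are made-up names. The idea is to compare the (type ID, hash) pairs announced by a stream against those the deserializer was built with, before constructing any object.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical 32-byte digest, standing in for the library's TypeHash.
using SketchHash = std::array<std::uint8_t, 32>;

// Type ID -> hash of that type's description.
using KnownTypes = std::map<std::uint32_t, SketchHash>;

// Returns true only if every (ID, hash) pair announced by the stream matches
// a type the deserializer knows. On a mismatch the stream should be rejected
// before any object is constructed.
bool stream_is_understood(const KnownTypes &known, const KnownTypes &in_stream) {
    for (const auto &entry : in_stream) {
        auto it = known.find(entry.first);
        if (it == known.end() || it->second != entry.second)
            return false;
    }
    return true;
}

// A stream whose type 1 carries a different hash is rejected.
bool rejects_changed_type() {
    SketchHash original{};  // all zeros
    SketchHash changed{};
    changed[0] = 0x54;      // differs from original in the first byte
    KnownTypes known{{1, original}};
    KnownTypes stream{{1, changed}};
    assert(stream_is_understood(known, known)); // identical tables are accepted
    return !stream_is_understood(known, stream);
}
```

With a scheme like this, Foo1 and Foo2 from the XML above would simply be two entries with different IDs and hashes, and the ID found in the stream would select which one to construct.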

Oh, and you should show off references between user-defined objects in your input file. I'd like to see how they are handled.
What do you mean?