Seeking general solution to this problem (explanatory only)

There are N containers containing elements of type Thing*. Many Thing objects have identical data values, but are still separate Thing objects. The N containers intersect with each other in general.

When the program exits, the N containers are saved on file by writing out the data values of each Thing they contain. Hence each Thing object of each of the N containers is stored on file. When the program runs again, each Thing object is retrieved by reading the txt file, and the N containers now have the elements that they had before, with the difference that now each Thing* are all distinct pointers because the program could not deduce from the txt file which were identical (remember that two Thing objects having identical data values are not necessarily the same object). As a result, the N containers no longer intersect as they did before, i.e. everything is all wrong.

What is the general solution to this problem?
Last edited on
Maintaining an object registry as input or output is performed.

Brute force (two pass for output) example:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
#include <iostream>
#include <map>
#include <vector>

struct thing { std::string data ; };

int main()
{
    std::vector<thing> things { {"zero"}, {"one"}, {"two"}, {"three"}, {"zero"}, {"two"} } ;
    const thing* p = std::addressof( things.front() ) ;

    struct container { std::string name ; std::vector<const thing*> objects ; } ;
    container containers[] =
    {
        { "first container", { p+4, p, p+3, p+4, p+1, p+5, p+2, p+4, p, p+2, p+3 } },
        { "second container", { p+3, p+2, p, p+5, p+2, p+4, p+3, p+1 } }
    };

    std::size_t seq_number = 0 ;
    std::map< const thing*, std::size_t > object_map ;

    // populate object map
    for( const auto& cntr : containers )
    {
        for( const thing* ptr : cntr.objects )
        {
            auto iter = object_map.find(ptr) ;
            if( iter == object_map.end() ) object_map.emplace( ptr, seq_number++ ) ;
        }
    }

    // write object map
    std::cout << "objects are: \n" ;
    for( const auto& pair : object_map )
        std::cout << '#' << pair.second << ' ' << pair.first->data << '\n' ;

    // write object sequence numbers in container
    for( const auto& cntr : containers )
    {
        std::cout << '\n' << cntr.name << " holds objects: " ;
        for( const thing* ptr : cntr.objects ) std::cout << '#' << object_map[ptr] << ' ' ;
        std::cout << '\n' ;
    }

    // input would be symmetric:
    // 1. read and create object map
    // 2. read sequence numbers and populate containers from map
}

http://coliru.stacked-crooked.com/a/c987aa039f531425
Here's a solution I thought of while you wrote yours. Not sure if its doing the same thing or not, but I save the addresses as well (this makes the file look really ugly and slows the performance, but it works at least), and then check if the address has been read already or not by looking up a map during the reading.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <list>
#include <set>
#include <map>

struct Thing {
	int value;
	void save (std::ostream& os) const {os << this << '\n' << value << '\n';}
	void load (std::istream& is) {is >> value;}
};

std::vector<Thing*> vector;
std::list<Thing*> list;
std::set<Thing*> set;

int main() {
	Thing a{2}, b{2}, c{2}, d{2};
	vector = {new Thing{3}, new Thing{2}, &a, &b, new Thing{2}};
	list = {new Thing{3}, new Thing{2}, &a, &b, &c, new Thing{4}};
	set = {new Thing{2}, new Thing{5}, &c, &d, new Thing{2}, &a, &b};
	
	std::string fileName = "Save.txt";
	std::ofstream outfile (fileName);
	outfile << vector.size() << '\n';
	for (const Thing* x : vector)
		x->save(outfile);
	outfile << list.size() << '\n';
	for (const Thing* x : list)
		x->save(outfile);
	outfile << set.size() << '\n';
	for (const Thing* x : set)
		x->save(outfile);
	outfile.close();  // Wasted a lot of time debugging because I forgot this line.
	vector.clear();  list.clear();  set.clear();

	std::ifstream infile(fileName);
	int num, value;
	std::string address;
	std::map<std::string, Thing*> alreadyLoaded;
	infile >> num;
	for (int i = 0; i < num; i++) {
		infile >> address;
		if (alreadyLoaded.find(address) != alreadyLoaded.end()) {
			std:: cout << "already loaded.\n";
			infile >> value;
			vector.emplace_back(alreadyLoaded[address]);
		}
		else {
			Thing* thing = new Thing;
			thing->load(infile);
			vector.emplace_back(thing);
			alreadyLoaded[address] = thing;
		}
	}
	infile >> num;
	for (int i = 0; i < num; i++) {
		infile >> address;
		if (alreadyLoaded.find(address) != alreadyLoaded.end()) {
			std:: cout << "already loaded.\n";
			infile >> value;
			list.emplace_back(alreadyLoaded[address]);
		}
		else {
			Thing* thing = new Thing;
			thing->load(infile);
			list.emplace_back(thing);
			alreadyLoaded[address] = thing;
		}
	}
	infile >> num;
	for (int i = 0; i < num; i++) {
		infile >> address;
		if (alreadyLoaded.find(address) != alreadyLoaded.end()) {
			std:: cout << "already loaded.\n";
			infile >> value;
			set.emplace(alreadyLoaded[address]);
		}
		else {
			Thing* thing = new Thing;
			thing->load(infile);
			set.emplace(thing);
			alreadyLoaded[address] = thing;
		}
	}
}

Output:
1
2
3
4
5
already loaded.
already loaded.
already loaded.
already loaded.
already loaded.
Last edited on
> Not sure if its doing the same thing or not

It does more or less the same thing.

A good serialisation library can do a lot more; cater to many different types, handle object graphs which can be cyclic (one object may internally refer to other objects, which may in turn refer to more objects), support run-time polymorphism, provide for robust versioning and more.
For instance Boost.Serialization http://www.boost.org/doc/libs/1_57_0/libs/serialization/doc/index.html


> Wasted a lot of time debugging because I forgot this line

Use local block scopes to limit the life-time of objects. C++ has first class support for RAII; it is almost always a mistake to write Java-like or C-like spaghetti for resource management.

1
2
3
4
5
6
7
8
9
10
11
12
13
{
    std::ofstream outfile (fileName);

    // use outfile

}

{
    std::ifstream infile(fileName);

    // use infile

}
Since many people can benefit from this design, here it is more generally:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <map>
#include <vector>
#include <list>
#include <set>
#include <iterator>
#include <algorithm>
#include <type_traits>

struct Thing {
	std::string data;
	void save (std::ostream& os) const {os << data << '\n';}
	void load (std::istream& is) {std::string str;  while (std::getline (is, data) && data.empty());}
};

class System {
	private:
		static std::map<Thing*, int> ThingMap;
		static int seq_number;
	public:
		static const std::map<Thing*, int>& getThingMap() {return ThingMap;}
		static inline void insertInThingMap (Thing*);
		template <typename T, typename... Containers> static void save (std::ostream& os, Containers&... containers) {
			saveMap (os, ThingMap);
			SaveContainers<T, Containers...>()(os, ThingMap, containers...);
		}
		template <typename T, typename... Containers> static void load (std::istream& is, Containers&... containers) {
			std::map<int, T> inverseMap = loadMap (is, ThingMap);
			LoadContainers<T, Containers...>()(is, inverseMap, containers...);
		}
		static void clearThingMap() {ThingMap.clear();}
		static void resetSequenceNumber() {seq_number = 0;}
	private:
		template <typename, typename...> struct SaveContainers;
		template <typename, typename...> struct LoadContainers;
		template <typename T> static inline void saveMap (std::ostream&, const std::map<T, int>&);
		template <typename T> static inline std::map<int, T> loadMap (std::istream&, std::map<T, int>&);
		template <typename T, typename Container> static void addToContainer (Container& container, const std::map<int, T*>& inverseMap,
		const std::list<int>& sequenceNumbers, std::true_type) {
			for (int x : sequenceNumbers) container.emplace (inverseMap.find(x)->second);		
		}
		template <typename T, typename Container> static void addToContainer (Container& container, const std::map<int, T*>& inverseMap,
		const std::list<int>& sequenceNumbers, std::false_type) {
			for (int x : sequenceNumbers) container.emplace_back (inverseMap.find(x)->second);		
		}
};
std::map<Thing*, int> System::ThingMap;
int System::seq_number = 0;

inline void System::insertInThingMap (Thing* x) {
	const auto it = ThingMap.find(x);
	if (it == ThingMap.end())
		ThingMap.emplace (x, seq_number++);
}

template <typename T>
inline void System::saveMap (std::ostream& os, const std::map<T, int>& map) {
	os << map.size() << '\n';
    for (const std::pair<T, int>& x : map) {
        os << x.second << '\n';
		x.first->save (os);
	}
}

template <typename T>
inline std::map<int, T> System::loadMap (std::istream& is, std::map<T, int>& map) {
	std::map<int, T> InverseMap;
	int MapSize, sequenceNumber;
	is >> MapSize;
	for (int i = 0; i < MapSize; i++) {
		is >> sequenceNumber;
		typename std::remove_cv<T>::type t = new typename std::remove_cv<typename std::remove_reference<typename std::remove_pointer<T>::type>::type>::type;
		t->load(is);
		map.emplace (t, sequenceNumber);
		InverseMap.emplace (sequenceNumber, t);
	}
	return InverseMap;
}

template <typename T>
struct System::SaveContainers<T> {
	void operator()(std::ostream&, const std::map<T, int>&) const {}
};

template <typename T, typename First, typename... Rest>
struct System::SaveContainers<T, First, Rest...> : System::SaveContainers<T, Rest...>{
	void operator()(std::ostream& os, const std::map<T, int>& map, const First& first, const Rest&... rest) const {
		for (const T& t : first)
			os << map.find(t)->second << ' ';
	    os << '\n';
		System::SaveContainers<T, Rest...>::operator()(os, map, rest...);
	}
};

template <typename T> struct HasInsert: public std::false_type {};

template<typename T, typename A>
struct HasInsert<std::set<T, A>> : public std::true_type {};

template<typename T, typename A>
struct HasInsert<std::map<T, A>> : public std::true_type {};  // What about unordered_map, multimap, unordered_set, multiset ???  Need to improve on this!

template <typename T>
struct System::LoadContainers<T> {
	void operator()(std::istream&, const std::map<int, T>&) const {}
};

template <typename T, typename First, typename... Rest>
struct System::LoadContainers<T, First, Rest...> : System::LoadContainers<T, Rest...> {
	void operator()(std::istream& is, const std::map<int, T>& inverseMap, First& first, Rest&... rest) const {
		std::list<int> sequenceNumbers;
		std::string stringOfSequenceNumbers;
		while (std::getline (is, stringOfSequenceNumbers) && stringOfSequenceNumbers.empty());
		std::istringstream line (stringOfSequenceNumbers);
		std::copy (std::istream_iterator<int>(line), std::istream_iterator<int>(), std::back_inserter(sequenceNumbers));
		addToContainer (first, inverseMap, sequenceNumbers, HasInsert<First>());
		System::LoadContainers<T, Rest...>::operator()(is, inverseMap, rest...);
	}
};

int main() {
    std::vector<Thing> master { {"zero"}, {"one"}, {"two"}, {"three"}, {"zero"}, {"two"} };
    Thing* p = std::addressof (master.front());
	
	std::vector<Thing*> vector = { new Thing{"four"}, p, p+2, p+5, new Thing{"two"} };
	std::list<Thing*> list = { new Thing{"one"}, p+3, p, p+2, new Thing{"two"} };
	std::set<Thing*> set = { new Thing{"two"}, p, p+5, new Thing{"three"} };
	
    // Populate System::ThingMap.
    for (Thing* x : vector)
    	System::insertInThingMap(x);
    for (Thing* x : list)
    	System::insertInThingMap(x);
    for (Thing* x : set)
    	System::insertInThingMap(x);

    std::cout << "Contents of System::getThingMap() are: \n";
    for (const std::pair<const Thing*, int>& x : System::getThingMap())
        std::cout << '#' << x.second << ' ' << x.first << ' ' << x.first->data << '\n';

    std::cout << "\nContents of vector are: \n";
    for (const Thing* x : vector)
    	std::cout << "     " << x->data << ' ' << x << std::endl;
    std::cout << "\nContents of list are: \n";
    for (const Thing* x : list)
    	std::cout << "     " << x->data << ' ' << x << std::endl;
    std::cout << "\nContents of set are: \n";
    for (const Thing* x : set)
    	std::cout << "     " << x->data << ' ' << x << std::endl;

    // Saving:
    std::cout << "\n\nSaving...\n";
    std::string fileName = "Save.txt";
    {
		std::ofstream outfile (fileName);
		System::save<Thing*> (outfile, vector, list, set);
		vector.clear();  list.clear();  set.clear();
		System::clearThingMap();  System::resetSequenceNumber();
		std::cout << "Everything cleared and all Thing objects destroyed as the program exits...\n\n";
		std::cout << "---------------------------------------------\n";
	}

	// Loading:
	std::cout << "\nLoading...\n";
	{
		std::ifstream infile (fileName);
		System::load<Thing*> (infile, vector, list, set);
	}
    
    std::cout << "\nContents of the loaded System::getThingMap() are: \n";
    for (const std::pair<const Thing*, int>& x : System::getThingMap())
        std::cout << '#' << x.second << ' ' << x.first << ' ' << x.first->data << '\n';
    std::cout << "\nContents of the loaded vector are: \n";
    for (const Thing* x : vector)
    	std::cout << "     " << x->data << ' ' << x << std::endl;
    std::cout << "\nContents of the loaded list are: \n";
    for (const Thing* x : list)
    	std::cout << "     " << x->data << ' ' << x << std::endl;
    std::cout << "\nContents of the loaded set are: \n";
    for (const Thing* x : set)
    	std::cout << "     " << x->data << ' ' << x << std::endl;
}
Last edited on
Some points worth mentioning:
std::map<Thing*, int> is populated, saved, and retrieved. But during retrieval, the inverse map std::map<int, Thing*> is needed to be constructed to populate the containers back. Then that inverse map is destroyed because it is not needed anymore.

My meta-functions
1
2
3
4
5
6
7
template <typename T> struct HasInsert: public std::false_type {};

template<typename T, typename A>
struct HasInsert<std::set<T, A>> : public std::true_type {};

template<typename T, typename A>
struct HasInsert<std::map<T, A>> : public std::true_type {};


does not cover all types that has the insert function (vs. push_back). The more general meta-function for that needs to be worked on still.
Last edited on
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
#include <type_traits>
#include <utility>

namespace detail
{
    using two_chars = char(&)[2] ;

    template < typename T > struct has_iterator
    {
        template < typename U > static char test( const typename U::iterator* ) ;
        template < typename U > static two_chars test( ... ) ;
        static constexpr bool value = sizeof( test<T>(nullptr) ) == sizeof(char) ;
    };

    template < typename T > struct has_key_type
    {
        template < typename U > static char test( const typename U::key_type* ) ;
        template < typename U > static two_chars test( ... ) ;
        static constexpr bool value = sizeof( test<T>(nullptr) ) == sizeof(char) ;
    };

    template < typename T, typename = void > struct has_insert : std::false_type {};
    template < typename T > struct has_insert< T, typename std::enable_if< has_key_type<T>::value >::type >
    {
        using insert_t = std::pair< typename T::iterator, bool > (T::*)( const typename T::key_type& ) ;
        template < typename U > static char test( const insert_t* ) ;
        template < typename U > static two_chars test( ... ) ;
        static constexpr bool value = sizeof( test<T>(nullptr) ) == sizeof(char) ;
    };

    template < typename T, typename = void > struct has_find : std::false_type {};
    template < typename T >
    struct has_find< T, typename std::enable_if< has_iterator<T>::value && has_key_type<T>::value >::type >
    {
        using find_t = typename T::iterator (T::*)( const typename T::key_type& ) ;
        template < typename U > static char test( const find_t* ) ;
        template < typename U > static two_chars test( ... ) ;
        static constexpr bool value = sizeof( test<T>(nullptr) ) == sizeof(char) ;
    };
}

template < typename T >
struct is_associative_container : std::conditional< detail::has_iterator<T>::value &&
                                                    detail::has_key_type<T>::value &&
                                                    detail::has_insert<T>::value &&
                                                    detail::has_find<T>::value,

                                                    std::true_type,
                                                    std::false_type >::type {};

http://coliru.stacked-crooked.com/a/e498943ff3902785
What is the standard procedure when an element is removed from a serialization map? Should the keys be adjusted so that they remain consecutive integers? Or just leave the keys as they are? If the former, then when loading from a program restart, seq_number can simply be set to the size of the map (since the keys are consecutive and starting at 0). If the latter, then seq_number will have to be one greater than the largest key found from the map.
Last edited on
The lifetime of a serialisation map is limited to the duration of the serialisation. It is freshly created for each serialisation and discarded at the end of the serialisation.

If it is a permanent object map (which is also used for serialisation), it is best not to reassign ids. In such a scenario, objects would be identified by their ids; for instance, we wouldn't have a container of pointers; instead we would have a container of object ids. We don't want a god-like component (to update all the object ids scattered throughout the program). Not reassigning ids would also be more efficient; we don't want an O(N) destructor either.
There still needs to be a further redesign to deal with proper removals from the serialization map ThingMap. Suppose N containers contain the Thing* object p. If p is removed from one of the containers, we do NOT call 'ThingMap.erase(p);' because other containers still contain p. But once p has been removed from ALL those containers, then 'ThingMap.erase(p);' shall be called. So how do we know when to erase p from ThingMap and when we don't? We can't check every single container, because those containers could be scattered across the program, and it would be too difficult to maintain that checking procedure when new containers are introduced, and it is very inefficient too. So I came up with the following solution (tested to work correctly), and wondered if there is a better solution still:

Change the serialization map ThingMap from std::map<Thing*, int> to std::map<Thing*, std::pair<int,int>>, where the first int is the serial number and the second int is the use_count. When a new key is inserted, the use_count is 1. The use_count is increased by 1 everytime the map is checked due to an insertion into a container, and decreased by one whenever a removal from a container is made. Only when the use_count is zero do we remove the key from the map.

The following are the key changes to the code:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
class System {
	static std::map<Thing*, std::pair<int,int>> ThingMap;  // *** Changed
        static int seq_number;
        // ...
};

inline void System::insertInThingMap (Thing* x) {
	const auto it = ThingMap.find(x);
	if (it == ThingMap.end())
		ThingMap.emplace(x, std::make_pair(seq_number++, 1));
	else
		ThingMap[x].second++;  // *** Added.  Updating the use_count of x.
}

template <typename List>
inline void System::removeItem (List& list, Thing* thing) {
	list.remove(thing);
	decreaseUseCount(thing);
}

inline void System::decreaseUseCount (Thing* thing) {
	std::pair<int, int>& pair = ThingMap[thing];
	pair.second--;
	if (pair.second == 0)
		ThingMap.erase(thing);
}


I thought of using std::shared_ptr<Thing>::use_count, but we don't know what a pointer's use_count would be apart from those containers, when only those containers are being monitored by ThingMap (outside of those containers, I don't see any use for ThingMap since its only purpose is saving and loading those containers correctly).
Last edited on
Repeat: The lifetime of a serialisation map is limited to the duration of the serialisation.
It is freshly created for each serialisation and discarded at the end of the serialisation.

Clarified: Containers should not be modified (through another thread) while serialisation is in progress.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
// begin serialise (write)
{
      // containers are not modified once serialization starts
      // till the function returns / scope is exited
     
     // step 1: create object map
     std::size_t seq_number = 0 ;
     std::map< const thing*, std::size_t > object_map ;
     // add entries to object map
  
     // serialise as earlier
     
     // serialisation is complete
     // the life-time of the object_map  is now over; it is destroyed    
     // containers may be modified after the function returns / scope is exited
} // end seralise 
Ok, I think I misunderstood you the first time. You said:

"A good serialisation library can do a lot more; cater to many different types, handle object graphs which can be cyclic (one object may internally refer to other objects, which may in turn refer to more objects), support run-time polymorphism, provide for robust versioning and more."

I didn't understand those terms because I've never taken a computer course before, but from this, I concluded that as objects are created and inserted into various containers, they are inserted into the object map immediately because the object map can serve other useful purposes than just saving the elements to txt file at the end. Such objects are continually being created over the course of the running of the program, and hence the object map is continually updated (and hence I figured that as objects are removed from the containers, the object map is continually updated by element removal as well). The object map is used for various reasons throughout the program in the meantime (the stuff you mentioned above which I didn't understand). Then at the end of the program the object map is saved to txt file. Program restart then reads the txt file and restores the object map back to the way it was, and then the containers restored back to the way they were by reading the (inverse map of) the object map.

Now I reread "If it is a permanent object map (which is also used for serialisation), it is best not to reassign ids. In such a scenario, objects would be identified by their ids; for instance, we wouldn't have a container of pointers; instead we would have a container of object ids. We don't want a god-like component (to update all the object ids scattered throughout the program)."

Now from this I think the real procedure this whole time was meant to be: Objects are created and inserted into containers throughout the running of the program. There is no such thing as an object map yet. Program comes to an end and everything will be written to txt files. Only now is an object map created. It reads in all the containers and is populated with their elements. Then the object map is written to file as the program exits. Then the program restarts and the object map is created from the file, from which all the containers are restored as well. Object map is then destroyed and never created again until the game once again is ready for saving and exiting. Now I'm on the right track? The code I wrote so far was assuming the first design. But it is the second design that I should follow, right? But to follow the second design, the object map must know about the containers. The containers are really scattered and are continuously created and destroyed in my program, but I think they can be determined by the object map. The containers know about the object map (hence I was following the first design), but I wasn't sure whether object map knew about the containers. But I think it does, or can.
Last edited on
> Then the object map is written to file as the program exits.
> Then the program restarts and the object map is created from the file,
> from which all the containers are restored as well.
> Object map is then destroyed and never created again until the game once again
> is ready for saving and exiting. Now I'm on the right track?

Yes. You could also save or restore at any intermediate point in the program, creating a fresh object map each time. As long as the containers are not modified while save/restore is in progress.


> to follow the second design, the object map must know about the containers.

No; avoid tightly coupled components. Pass the containers as arguments. Something like:
1
2
template < typename... CONTAINER > 
std::ostream& save( std::ostream& stm, const CONTAINER&... cntrs ) ;


Topic archived. No new replies allowed.