fast destruction of std::vector<std::pair<int, int> >

Dear C++ experts

I apologize for my frequent similar submissions,
but I am really in trouble.

I am bothered by the high overhead of destroying an STL container like
std::unordered_map<int, std::vector<std::pair<int, int> > >,
or
std::unordered_map<int, std::unordered_map<int, std::vector<std::pair<int, int> > > >,
particularly the destruction of the innermost element,
std::vector<std::pair<int, int> >.
(In reality these are not std but TBB's concurrent containers.)

I heard that destruction of std::vector<Object> entails a high cost.
However, if the Object type is primitive (such as char*, int, etc.),
its destruction seems to be fast, as described below.

<https://stackoverflow.com/questions/22709344/does-stdvectortclear-call-destructor-of-the-content>

In my case, my Object type is std::pair<int, int>, so I could split it into individual containers (two std::vector<int>).
But since the size of the type std::pair<int, int> is known in advance,
I suppose there might be a smart way to quickly destroy std::vector<std::pair<int, int> >.

Finally, could I ask you the questions below?

(1) Can a vector of a primitive type (such as std::vector<int>) be destroyed faster than a vector of an Object type?

(2) Is there a way to destroy a fixed-size object (like std::pair<int,int>) in a manner similar to case (1) above?

I would appreciate it if you could help me.

Kind regards

I wouldn't expect the destruction of std::vector<std::pair<int, int>> to be expensive, as there's no code run per element; it really ought to be just the call to operator delete.

But as you have collections of these things, do you have any idea of how many you have?

I can now see why you were investigating a custom heap; that makes sense. However, it would be worth trying to find out where exactly the time is being spent. You could add timing code to your destructor and clear each container within a timed section.

This example has room for improvement, but it's a start:
#include <chrono>
#include <iostream>
#include <string>
#include <tuple>
#include <unordered_map>
#include <vector>

struct Time {
	std::chrono::high_resolution_clock::time_point pt_;

	void now() { pt_ = std::chrono::high_resolution_clock::now(); }
	auto pt() const { return pt_; }
};

inline auto deltaNs(const Time &stop, const Time& start) {
	return std::chrono::duration_cast<std::chrono::nanoseconds>(stop.pt() - start.pt()).count();
}

struct Timer {
	Time start_, stop_;

	void start() { start_.now(); }
	void stop()  { stop_.now(); }

	auto durationNs() const { return deltaNs(stop_, start_); }
};

class RecordTimer {
public:
	RecordTimer(Timer& t) : t(t) { t.start(); }
	~RecordTimer() { t.stop(); }
private:
	Timer& t;
};

class Test {
	using value_type = std::tuple<int, int, int, int, int>;
	using values_type = std::vector<value_type>;
	using container_type = std::unordered_map<int, values_type>;

	container_type stuff;
	std::string name;

public:
	Test() : name("hello") {
		constexpr value_type value{1, 2, 3, 4, 5};
		values_type values;
		for (int i = 0; i < 4*1024; ++i)
			values.push_back(value);

		for (int i = 0; i < 4*1024; ++i)
			stuff[i] = values;
	}

	void run() {
		// stop the optimizer removing all the code
		long sum{ static_cast<long>(name.size()) };
		for (auto&& entry : stuff)
			for (auto&& values : entry.second)
				sum += std::get<0>(values) + std::get<4>(values);
		std::cout << "sum: " << sum << "\n";
	}

	~Test() {
		Timer name_timer;
		{
			RecordTimer rec(name_timer);
			name.clear();
		}

		Timer stuff_timer;
		{
			RecordTimer rec(stuff_timer);
			stuff.clear();
		}

		std::cout << "name_timer: " << name_timer.durationNs() << " ns\n";
		std::cout << "stuff_timer: " << stuff_timer.durationNs() << " ns\n";
	}
};

int main() {
	Test test;
	test.run();
}
Dear kbw

I really appreciate your reply.
First of all, I am going to measure the performance of each destruction step.
(Then I would like to consider how to improve it.)

Thank you for your kind reply.

>> I wouldn't expect the destruction of std::vector<std::pair<int, int>>
>> to be expensive as there's no code run per element,
>> it really ought to be just the call to operator delete.

I understand that the reason a vector of a primitive type is destroyed quickly is the above behavior. (With a vector of a usual Object type, a destructor call occurs per element.)
Can a vector of a primitive type be destroyed faster than one of an Object type?

The distinction isn't primitive vs class type but whether a type is trivially destructible.

Since C++17, std::pair<int, int> is indeed trivially destructible. But std::pair<T, U> is more complicated than intuition suggests, and it is not trivial overall (i.e., static_assert(!std::is_trivial_v<std::pair<int, int>>) holds) no matter its template arguments.

You could try replacing it with a trivial aggregate:
struct my_pair { int first, second; };

But I have found that these data-flow issues are hard to address unless the question is posed properly.

Recall that an unordered_map is merely a clever arrangement of ints in a region of memory. A description of what happens to this data can be used to eliminate all the extra work your program is doing.

For example, consider the ideal case, in which a large memory buffer is allocated up front and treated as a stack of bytes from which the data structure monotonically sources memory. Then once the data structure is no longer needed, it can be discarded without destruction, the stack pointer reset and the memory reused by a new data structure. This is a blazing-fast scheme that does nothing that isn't required, but it only works if data usage follows known patterns.

Tell us about the data usage in your program?
Last edited on
Mitsuru wrote:
I apologize for my frequent similar submissions,
but I am really in trouble.


If these topics are related, could I suggest you stick to one topic with slightly different but related questions, rather than creating new ones? Thanks :+) It is also a good idea to post links to your other similar topics.

mbozzi wrote:
Tell us about the data usage in your program?


Indeed, because at the moment this sounds like a classic XY problem. If it is, the answer is often a better algorithm or design pattern. Did you mention that you are often destructing a container then rebuilding it again? If so, what is the reason for that?

Please tell us what your actual real world problem is. :+)

Sorry, I don't have anything concrete to offer, but hopefully my post might be even the tiniest bit helpful nonetheless. :+)

Mitsuru wrote:
I heard that destruction of std::vector<Object> entails high cost.


I can't help thinking that premise might be inaccurate: the cost of deleting a std::vector<Object> may be more than a std::vector<int> (there is more work to do), but the cost of writing a custom delete or destructor may be more than letting the container destroy itself naturally at the end of its scope. I guess this is the zero-overhead principle.

It will be interesting to see what figures you have for the timing, hopefully you can include the timing for natural container destruction as a base case to compare other strategies to.

I look forward to your replies.

Regards
Thank you for the several replies.
The common thing you all asked me about is my data usage,
so I will explain it as well as I can.
(In the above query I said int, but std::size_t is what is really used.)


There are N elements indexed by std::size_t,
and they are grouped into M groups, also indexed by std::size_t,
in a mutually exclusive and collectively exhaustive manner.

A collection of pairs of elements is generated in advance (i.e., a pair is (an element N1 in group M1, an element N2 in group M2)), and this collection is changed many times.
Based on it, I would like to map each group index pair to its element index pairs (i.e., M1, M2 |-> a set of (N1, N2)).

To realize the above intention, I used a container like
tbb::concurrent_unordered_map<std::size_t, 
tbb::concurrent_unordered_map<std::size_t,
tbb::concurrent_vector<std::pair<std::size_t, std::size_t> > > > 
mapper_data;

// generation of mapper_data

// pseudo relation
auto N1N2_pairs = mapper_data[M1][M2];

// destruction of mapper_data (in reality, it happens on exit from this scope)


Here, the vector lengths (N1N2_pairs.size()) differ from each other (the length varies case by case).

The reason I asked about destruction is that destroying mapper_data takes 5~6 times longer than generating it.


I hope this explains it.

Regards
The above generation stage is done in a shared-memory parallel manner (TBB),
so the generation stage can be made to take less time.

Even so, I would like to know how to speed up the destruction stage (and, if possible, the generation stage as well).
The above generation stage is done in shared-memory parallel manner (tbb).


Right, as per the documentation: generation is concurrent, but erasure is not.

Your explanation is still abstract; I am wondering what the real world problem is, and why the collection of pairs is changed so often. In an XY problem, you have shown us Y (your solution) but not X (the actual problem). It's possible there is a better solution using a different data structure, algorithm, or design pattern.

Based on a collection of pairs in N elements (i.e. a pair is (N1 element in M1 group & N2 element in M2 group). the collection is generated in advance.


I am confused by this: M1 has N1 and N2 as its pair even though N2 is from M2, where M2 is a separate item in the second-level unordered_map? You have said the collection is generated from the pairs, but the pairs are part of the data structure?

I have a vague idea that one could try to decouple the pairs collection from the main data structure, but I am guessing because I don't know the reasons in a real world sense why you have such a complex data structure.

I would like to map from each group index pair to element index pairs(i.e. M1, M2 |-> a set of N1, N2)


This sounds like a std::vector<std::pair> :

1,1
2,1
3,1
4,1
5,1
6,2
7,2
8,2
9,3
10,4
11,4
12,5
13,5
14,5

Where the first number is N, the second is the group (M as you named it). The set of group 1 and 2 is N(1 to 8) ; the set of group 3 and 5 is N(9,12,13,14) . That information could be stored as a vector of vector.

Hence I am confused about why you have a vector of pairs inside a map inside a map. If the M's are mutually exclusive and the N's are unique, why the need for a map? A vector would work there.

Regards
the real world problem is, and why the collection of pairs is changed so often


Among the N elements, pairs (couples of two elements) are made, and the couples change often (there is no gender; every element has the opportunity to pair with every other).

I am confused by this: A M1 has N1 and N2 as it's pair even though N2 is from M2, where M2 is a separate item in the second level unordered_map? You have said the collection is generated from the pairs, but the pairs are part of the data structure?


The number of elements is N, and the number of groups is M.
Every element belongs to exactly one group, like below:

Group    M_1  M_2  M_3  ...
===========================
Elements N_1  N_5  N_9  ...
         N_2  N_6  N_10 ...
         N_3  N_7  N_11 ...
         N_4  N_8  N_12 ...

A collection of pairs is given, like

{(N_1, N_6), (N_2, N_8), (N_5, N_10)}

Hence why I am confused why you have vector of pair inside a map inside a map.


The map (F) I need is like this:

F(M_1, M_2) = {(N_1, N_6), (N_2, N_8)}


I hope this explains the details of my data structure and algorithm.

Regards
I hope this explains the details of my data structure and algorithm.


Not really; you explained that a pair is a couple, but the rest is still abstract. What is a couple: a molecule, a bacterium, or something else? We are still missing the problem statement.

What discriminates a group?

I am bothered by high overhead of destruction of STL container like
Really?
I heard that destruction of std::vector<Object> entails high cost.

Wow!

How many pairs that have to be destroyed are we talking about here? How many microseconds are there to save? A billion pairs per minute? Or just 1000 in a single run? Less, I'll bet. Where is the benchmark evidence?

Chances are individual pairs don't have to be destroyed anyway.

There's probably more time being wasted in chasing this very badly structured question than there is in any potential (if that) time saving.

Houston, we don't have a problem here. The shuttle never even entered the real world. Time to move on.
What are the maximum N and M?

And why do you need a map of a map? Why not just a map of pair(M1,M2) (like I suggested in an earlier post) ?

Here's your data in a somewhat more readable form:

1  2  3  Groups
=======
1  5  9  Elements
2  6 10
3  7 11
4  8 12

Pairs: {(1,6),(2,8),(5,10)}

Mapping: F(1,2) => {(1,6),(2,8)}

Do the elements of each couple always come from different groups? I.e., in the above situation, could the couple (1,2) exist?
Are the elements always evenly divided between the groups?
Are the groupings of the elements fixed for the duration of the program?

If there is an online description of the problem, post a link.
Dear dutch

Thank you for your reply.

And why do you need a map of a map? Why not just a map of pair(M1,M2) (like I suggested in an earlier post) ?

I tried what you told me (a hashed key of std::pair).
I also thought it was a nice idea, but when I tried it,
a time cost appeared in the generation stage,
mainly due to the overhead of calculating the hash value (like max_x*y + x).
(Of course, I tried it again after trying several approaches to my problem.)

Do the elements of each couple always come from different groups?

No, elements can be paired within the same group.
Are the elements always evenly divided between the groups?

Basically, yes. But in some cases the elements are not evenly divided.
(For general usage, it is desirable to implement the code based on the latter case.)
Are the groupings of the elements fixed for the duration of the program?

This is fixed.

If there is an online description of the problem, post a link.

I am sorry but there is no online description.
What are the maximum N and M?

Can you post a complete description of the problem?
Dear againtry

How many pairs that have to be destroyed are being talked about here? How many microseconds is there to save? A billion pairs per minute? Or just 1000 in a single run? Less, I'll bet. Where is the benchmark evidence?


This is a nice message, and kbw recommended measuring the performance of each destruction step.
(I am now checking this.)

I understand that my question is, as you said, a badly structured question,
because the data type and the container I used are so commonplace.

Following what everyone suggested,
I should make clear the data structure behind the map.

Regards
Dear dutch

What are the maximum N and M?

Now I use N = 250000 and M = 5000.

Can you post a complete description of the problem?


Indeed, what I said is almost all of it,
because I am making a toy model related to coupling and grouping.

I am sorry, but I hope to understand whether such a C++ implementation is usual, or, if not, whether there is a way to resolve it.

Kind regards,
Using a separate chrono timer, the timings show the pair vector going out of scope significantly quicker than the two vectors. (Still only a small amount of time.)
Sizes range from 1 000 000 to 10 000 000 elements.


TEST 1: TWO SEPARATE VECTORS OF INT's
16616892
34327521
51215085
69111602
86863480
103661112
119625830
134466354
150418500
168635861

TEST 2: SINGLE VECTOR OF PAIRS
9241684
20160191
28355487
38156638
45784181
54911954
63274346
76891756
83492575
92045430
Complete
Program ended with exit code: 0
Dear againtry

Thank you for your verification.

Though I cannot understand the details of what you did,
it helps me.

What is the unit of the elapsed time?
And how many times are generation/destruction called for each output line?
You can try it yourself and modify it to suit what you are really trying to do.

While it is not the end of the world, there is a difference between destruction, deletion, etc., and that is why I have talked about 'going out of scope', which may or may not be what your concern is.

#include <chrono>
#include <iostream>
#include <utility>
#include <vector>

#include "stopwatch.h"


int64_t test_vector_two_ints(Stopwatch &watch, int n)
{
    const int NO_INTS{n * 1000000};
    std::vector<int> Vector_1;
    std::vector<int> Vector_2;
    
    for(int i = 0; i < NO_INTS; i++)
    {
        Vector_1.push_back(i);
        Vector_2.push_back(i);
    }
    // Reset just before returning: the vectors go out of scope as the
    // function exits, so (time read by the caller) - (value returned here)
    // measures their destruction.
    watch.reset();
    return watch.getTime_chrono();
}

int64_t test_vector_pairs(Stopwatch &watch, int n)
{
    std::vector<std::pair<int, int>> Vector;
    
    const int NO_PAIRS{n * 1000000};
    
    std::pair<int, int> temp;
    for(int i = 0; i < NO_PAIRS; i++)   // was <=, which added one extra pair
    {
        temp.first = i;
        temp.second = NO_PAIRS - i;
        
        Vector.push_back(temp);
    }
    // As above, the caller's next reading minus this return value measures
    // the vector's destruction on scope exit.
    watch.reset();
    return watch.getTime_chrono();
}

int main()
{
    Stopwatch sw;
    
    std::cout << "TEST 1: TWO SEPARATE VECTORS OF INT's\n";
    for(int i = 1; i <= 10; i++)
    {
        int64_t start = test_vector_two_ints(sw, i);
        int64_t finish = sw.getTime_chrono();
        
        std::cout << finish - start << '\n';
    }
    std::cout << '\n';
    
    std::cout << "TEST 2: SINGLE VECTOR OF PAIRS\n";
    for(int i = 1; i <= 10; i++)
    {
        int64_t start = test_vector_pairs(sw, i);
        int64_t finish = sw.getTime_chrono();
        
        std::cout << finish - start << '\n';
    }
    
    std::cout << "Complete\n";
    
    return 0;
}


// stopwatch.h
#include <chrono>
#include <cstdint>

using namespace std::chrono;

class Stopwatch
{
private:
    steady_clock::time_point start_chrono_clock;
public:
    Stopwatch(){ reset(); }
    void reset(){ start_chrono_clock = steady_clock::now(); }
    
    // Read the elapsed time; on common implementations steady_clock ticks
    // in nanoseconds, but duration_cast<nanoseconds> would make that explicit.
    int64_t getTime_chrono(){ return (steady_clock::now() - start_chrono_clock).count(); }
};