My Merge Sort is too slow!!

closed account (G28XjE8b)
Hello,
No matter how much I try, my Merge Sort is incredibly slow, let's forget comparing it with std::sort. I have tried to pinpoint the problem, but I can't find a way to fix this slow problem.

My merge sort uses a vector to cache data, I really worry about memory leak.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
template <class T>
void mergeDataAscending(T arr1[], int size1, T arr2[], int size2, vector<char> &cache_vector)
{
	int i;
	cache_vector.reserve(sizeof(T) * (size1 + size2));
	T *cache_data = (T*)(&cache_vector[0]);

	int idx_c = 0;
	int idx1_s = 0;
	int idx2_s = 0;

	int size1_s = size1;
	int size2_s = size2;

	while(size1_s && size2_s)
	{
		if(arr1[idx1_s] <= arr2[idx2_s])
		{
			new (&cache_data[idx_c]) T(arr1[idx1_s]);
			idx_c++;
			idx1_s++;
			size1_s--;
		}
		else
		{
			new (&cache_data[idx_c]) T(arr2[idx2_s]);
			idx_c++;
			idx2_s++;
			size2_s--;						
		}
	}

	while(size1_s)
	{
		new (&cache_data[idx_c]) T(arr1[idx1_s]);
		idx_c++;
		idx1_s++;
		size1_s--;
	}

	while(size2_s)
	{
		new (&cache_data[idx_c]) T(arr2[idx2_s]);
		idx_c++;
		idx2_s++;
		size2_s--;	
	}

	for(i = 0; i < idx_c; i++)
	{
		arr1[i].~T();
		new (&arr1[i]) T(cache_data[i]);
		cache_data[i].~T();
	}
}

template <class T>
void mergeDataDescending(T arr1[], int size1, T arr2[], int size2, vector<char> &cache_vector)
{
	int i;
	cache_vector.reserve(sizeof(T) * (size1 + size2));
	T *cache_data = (T*)(&cache_vector[0]);

	int idx_c = 0;
	int idx1_s = 0;
	int idx2_s = 0;

	int size1_s = size1;
	int size2_s = size2;

	while(size1_s && size2_s)
	{
		if(arr1[idx1_s] >= arr2[idx2_s])
		{
			new (&cache_data[idx_c]) T(arr1[idx1_s]);
			idx_c++;
			idx1_s++;
			size1_s--;
		}
		else
		{
			new (&cache_data[idx_c]) T(arr2[idx2_s]);
			idx_c++;
			idx2_s++;
			size2_s--;				
		}
	}

	while(size1_s)
	{
		new (&cache_data[idx_c]) T(arr1[idx1_s]);
		idx_c++;
		idx1_s++;
		size1_s--;
	}

	while(size2_s)
	{
		new (&cache_data[idx_c]) T(arr2[idx2_s]);
		idx_c++;
		idx2_s++;
		size2_s--;
	}

	for(i = 0; i < idx_c; i++)
	{
		arr1[i].~T();
		new (&arr1[i]) T(cache_data[i]);
		cache_data[i].~T();
	}
}

template <class T>
void recursiveMergeSortAscending(T arr[], int size, vector<char> &cache_vector)
{
	if(size <= 1) return;

	int half = size / 2;

	recursiveMergeSortAscending(&arr[0], half, cache_vector);
	recursiveMergeSortAscending(&arr[half], size - half, cache_vector);

	mergeDataAscending(&arr[0], half, &arr[half], size - half, cache_vector);
}

template <class T>
void recursiveMergeSortDescending(T arr[], int size, vector<char> &cache_vector)
{
	if(size <= 1) return;

	int half = size / 2;

	recursiveMergeSortDescending(&arr[0], half, cache_vector);
	recursiveMergeSortDescending(&arr[half], size - half, cache_vector);

	mergeDataDescending(&arr[0], half, &arr[half], size - half, cache_vector);
}

template <class T>
void mergeSort(T arr[], int size, vector<char> &cache_vector, bool bAscending = true)
{
	if(cache_vector.empty()) cache_vector.push_back(0);

	if(bAscending) 
	recursiveMergeSortAscending(arr, size, cache_vector);
	else
	recursiveMergeSortDescending(arr, size, cache_vector);
}


Any help and suggestion will be hugely appreciated.
I was going to suggest using std::sort() but it seems like you want to create your own sort algorithm.

Try implementing a quick sort algorithm. If you don't know what that is google it. Also, I think the reason for the speed of you sort algorithm has to do with caching data with vectors and the use of recursive functions.
Last edited on
closed account (G28XjE8b)
@boost lexical cast
We haven't learned quick sort yet, we are practicing merge sort. Please help.
You're worried about memory leaks, so you're using placement-new directly into the memory buffer of a std::vector with zero size?

This is a staggeringly bad idea. What problem were you hoping to avoid?
I can almost guarantee that std::allocator does the same job, better, by default, inside the guts of the vector object.

One issue is that since you never resize the vector, your vector is actually in an indeterminate state; you're violating the preconditions required for access.

We haven't learned quick sort yet
Please tell me that your instructor didn't tell you to do this.
Last edited on
closed account (G28XjE8b)
Please tell me that your instructor didn't tell you to do this.

My instructor didn't tell me to do this.

I am just concerning why my merge sort is slow. My merge sort does not crash or something, just slow.

You're worried about memory leaks, so you're using placement-new directly into the memory buffer of a std::vector with zero size?

The vector is merely unused, I just use the memory that the vector holds. As long as the index does not violate the vector capacity, I don't think it will cause segmentation fault.
The vector is merely unused, I just use the memory that the vector holds. As long as the index does not violate the vector capacity, I don't think it will cause segmentation fault.

You're using a vector in a way it was not intended to be used, resulting in undefined behavior. That may result in a segmentation fault, or it may not. It may result in your code being slow. It may have no observable effect at all. (In debug mode on one popular compiler, this will result in an exception being thrown.)

Making wrong code faster doesn't seem like a terribly brilliant use of effort.

Merge sort's application is typically on linked lists, not arrays or vectors, but I would be curious what makes you think your implementation is "slow."





closed account (G28XjE8b)
You're using a vector in a way it was not intended to be used, resulting in undefined behavior. That may result in a segmentation fault, or it may not. It may result in your code being slow. It may have no observable effect at all. (In debug mode on one popular compiler, this will result in an exception being thrown).

That is why I have prepared to make sure the vector has at least one element so that I can access the whole vector memory at will. When the vector destructs, it will automatically clean the memory up.

if(cache_vector.empty()) cache_vector.push_back(0);

Merge sort's application is typically on linked lists, not arrays or vectors, but I would be curious what makes you think your implementation is "slow.

It is really slow, because I am trying to match the speed std::sort, and my merge sort is many times slower.
Last edited on
That is why I have prepared to make sure the vector have at least one element so that I can access the whole vector memory at will. When the vector destructs, it will automatically clean the memory up.
if(cache_vector.empty()) cache_vector.push_back(0);

This does not in any way mitigate the undefined behavior. The vector destructor will run if the vector is created, regardless of whether the size is 0 or not.
closed account (G28XjE8b)
This does not in any way mitigate the undefined behavior. The vector destructor will run if the vector is created, regardless of whether the size is 0 or not.

I am fine with "undefined behavior" as long as it can give me speed! If I use vector.push_back(), the vector has to check to resize every time and my merge sort will become even more terrible!

vector<char> &cache_vector
I am using reference everywhere in my code. What makes you think the vector destructor will be called?
That is why I have prepared to make sure the vector has at least one element so that I can access the whole vector memory at will.

You are violating the preconditions of element-wise access, so your code is broken. What's wrong with std::allocator that you couldn't simply emplace elements in the vector?

When the vector destructs, it will automatically clean the memory up.
Your objects will never be destroyed if an exception is thrown. If you were using the vector properly, it would take care of that for you.

It is really slow, because I am trying to match the speed std::sort, and my merge sort is many times slower.

Almost certainly your library's std::sort is a highly-optimized hybrid sort that will almost certainly perform better in the average case than any merge-sort, no matter how many micro-optimizations you make.

Yes, both algorithms are almost certainly asymptotically equivalent (linearithmic) in time on average; this doesn't imply they are the same speed in real life.
Last edited on
I am fine with "undefined behavior" as long as it can give me speed! If I use vector.push_back(), the vector has to check to resize every time and my merge sort will become even more terrible!

It can't "give you speed," but it does make your code wrong. If you wish to avoid reallocations, use resize rather than reserve. [edit: And get rid of your use of placement new.]

I am using reference everywhere in my code. What makes you think the vector destructor will be called?

That's the way things in work in C++. Those references must refer to an actual vector.
Last edited on
closed account (G28XjE8b)
You are violating the preconditions of element-wise access, so your code is broken. What's wrong with std::allocator that you couldn't simply emplace elements in the vector?

I don't know std::allocator at all, so I can't use it.

Your objects will never be destroyed if an exception is thrown. If you were using the vector properly, it would take care of that for you.

I copied the data into the vector buffer, but I also have taken care so that the temporary elements in the vector cache get destructed.

1
2
3
4
5
6
7
8
9
10
11
for(i = 0; i < idx_c; i++)
{
	// Destroy the array element
	arr1[i].~T();
	
	// Copy new data to the array element
	new (&arr1[i]) T(cache_data[i]);

	// Destroy the element in the cache
	cache_data[i].~T();
}


Almost certainly your library's std::sort is a highly-optimized hybrid sort that will almost certainly perform better in the average case than any merge-sort, no matter how many micro-optimizations you make.

I am personally fine with my merge sort being two or three times slower, but it is many times slower!

It can't "give you speed," but it does make your code wrong. If you wish to avoid reallocations, use resize rather than reserve.

If I do that, there are some classes with their default constructor and if they are called every time when I merge data, my merge sort becomes even more terrible. I still need to overwrite the vector data though.

That's the way things in work in C++. Those references must refer to an actual vector.

Of course I need to specify a valid vector so that the merge sort works.
closed account (G28XjE8b)
All I want from the vector is the address of the first element in the vector, so that is why I just need to make sure the vector has at least one valid element. That is the trick.

T *cache_data = (T*)(&cache_vector[0]);

Why was std::array<> or even std::unique_ptr<T[]> not acceptable in favor of a vector?

I cannot dynamically specify the size for std::array. And if I have to sort several million of elements that consume more than 2Mb thread stack, the program will overflow and it crashes.

I want to take advantage of vector.reverse() and I use the vector like a buffer. There is no need for unique_ptr<T[]>, even it is the array version of unique_ptr.

It should also make me feel easier, I just need to call vector.shrink_to_fit() and the vector memory will be all cleaned up.

It can't "give you speed," but it does make your code wrong. If you wish to avoid reallocations, use resize rather than reserve. [edit: And get rid of your use of placement new.]

I can use the assignment operator, but it seems that placement new is a better choice.
Last edited on
Ah -- I edited my post too slowly. Apologies. (I know why std::array and unique_ptr<> aren't suitable.)

I copied the data into the vector buffer, but I also have taken care so that the temporary elements in the vector cache get destructed.

If an exception is thrown between when you explicitly invoke the destructors of your objects and when they are constructed, they will not be destroyed. If you were using std::vector correctly, they would be.

And if I have to sort several million of elements that consume more than 2Mb thread stack, the program will overflow and it crashes.

The recursive version of your algorithm is prone to stack overflow. The number of recursions should be log(n), though -- at what point are you expecting stack overflow?

I don't know std::allocator at all, so I can't use it.
std::allocator is essentially a wrapper around placement new. It constructs elements in-place and manages the allocation and deallocation of memory for them. It's used by default in all the standard containers that satisfy the AllocatorAware constraint. The allocator type can be specified as the second template argument to std::vector.

The allocator interface is why, e.g., std::vector::reserve() does not invoke any constructors even though reserve allocates a memory block. In other words, the default allocator (along with std::allocator_traits) manages a simple memory pool, and manages elements inside that memory pool upon request.

std::unique_ptr<T[]> obviously doesn't use an allocator and will therefore require T has a default constructor (or that the array is aggregate-initialized.) There's a similar problem with std::array, not with vector.

Eliminate placement new and use the vector correctly. It will be no slower -- better construct the elements in place, taking advantage of perfect forwarding as it is provided to you.
std::vector::emplace_back()

Regarding performance:
I am personally fine with my merge sort being two or three times slower, but it is many times slower!

How many times is "many times"? Are we talking orders of magnitude? Some smaller constant factor? Micro-optimization will not solve your problem.
Last edited on
Let me ask a simple question:
Is the performance of your merge-sorting algorithm a problem?
If no, then stop worrying about it.

If yes, then the conventional wisdom towards optimization is this:
The very first thing you change is the algorithm. If there are is an unequivocally better algorithm, implement that.

If there is not, then check to see if you know something about your data that enables some algorithmic optimization --- particular partitioning schemes for the quick-sort algorithm, for instance, may offer enormous benefits to performance on input with certain characteristics. Example: 3-way partitioning quicksort on stepped input.

Other things to check next -- is there any way you can reduce the input size? Trade memory for time?

Only if you are in the inner-loop of an inner-loop should you consider writing targeted micro-optimizations as a last-resort. You should expect to fight for the last few percentage points of saved time. This becomes less realistic as compilers improve.

The vector has to check to resize every time and my merge sort will become even more terrible

Don't waste your time -- your code won't run much faster by omitting a single comparison.
Last edited on
I took the liberty of modifying the original code to use a vector correctly and fixed the caching so it wasn't copying twice the number of values it needed to. Note that a cache doesn't really buy you much. I'd prefer to do it in-situ myself. Ascending/descending stuff removed for clarity. (That should probably be a point of customization in the template as opposed to requiring another suite of functions.)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
template <class T>
void mergeData(T arr1 [], int size1, T arr2 [], int size2, std::vector<T> &cache_vector)
{
    cache_vector.insert(cache_vector.end(), arr1, arr1 + size1);

    T* dest = arr1;

    arr1 = cache_vector.data();
    T* arr1_end = arr1 + cache_vector.size();
    T* arr2_end = arr2 + size2;

    while (arr1 != arr1_end && arr2 != arr2_end)
    {
        if (*arr1 < *arr2) *dest++ = *arr1++;
        else *dest++ = *arr2++;
    }

    while (arr1 != arr1_end)
        *dest++ = *arr1++;

    while (arr2 != arr2_end)
        *dest++ = *arr2++;

    cache_vector.clear();
}

template <class T>
void recursiveMergeSort(T arr [], int size, std::vector<T> &cache_vector)
{
    if (size <= 1) return;

    int half = size / 2;

    recursiveMergeSort(&arr[0], half, cache_vector);
    recursiveMergeSort(&arr[half], size - half, cache_vector);

    mergeData(&arr[0], half, &arr[half], size - half, cache_vector);
}

template <class T>
void mergeSort(T arr [], int size)
{
    std::vector<T> cache_vector;
    cache_vector.reserve(size / 2 + 1);

    recursiveMergeSort(arr, size, cache_vector);
}
It seems OP has closed his a/c ;) but the rest of us can still learn from each other
Topic archived. No new replies allowed.