Binary Sort Performs Worse than std::sort

I'm trying out my own implementations of a few sorting algorithms/containers and am comparing performance through benchmarks. I've found that my implementation of a binary tree for sorting (that is to say, the time it takes to insert every element into the tree) has provided consistently worse performance than that of the std::sort implementation on VS2012. This holds true for a group of data that's 50 elements, 500,000 elements, or 5 million elements. All elements are randomly generated integers.

Is this performance difference just a result of the algorithms themselves, or is it a result of my implementation? (Source below)

.h:
http://pastebin.com/i51Jq1Rh

.inl:
http://pastebin.com/zs6gaU1z

GitHub:
https://github.com/tylercamp/SortMark

The sorting itself starts from within the Run method.

I feel as though part of the overhead in my implementation is the amount of recursion that's going on, which I don't think is going to happen as much in an std::sort implementation. This thought is the result of how std::sort works with a linear data set while my binary tree uses, well, a tree.



EDIT: Here are some results of one of my benchmarks:

Element count: 5000000
Name                     | Validation             | Time (seconds)
Binary Tree Sort           Succeeded                10.868
STL std::sort              Succeeded                1.619


EDIT2:
After showing this to a friend, he mentioned that I made a lot of dereferences. Cache misses are definitely going to be an issue, and since quicksort is working with a contiguous memory set (data's in an std::vector) it's got the upper hand since it's doing an in-place sort. Will benchmark with an std::list instead and see how things go.

EDIT3:
std::sort doesn't work with std::list since std::sort requires random access iterators; instead I tried it duking it out against std::list::sort, where my binary tree beat it in about half the time. It's promising, but list::sort isn't the same algorithm, so I'll see if I can find a way to either invoke cache hits against std::sort, or if I can get my binary tree to work with contiguous memory.

EDIT4:
I modified the node reserve so that instead of holding pointers to nodes allocated off the heap, it simply contains node objects that are now stored in contiguous memory. This cut sorting time of 5 million elements in half. Not enough to outpace std::sort, though:
Element count: 5000000
Name                     | Validation             | Time (seconds)
Binary Tree Sort           Succeeded                5.967
STL std::sort              Succeeded                0.707883

(Just realized, I think this test is in Release mode and the previous results were from a Debug build. Ratio is actually worse in this scenario, possibly due to all of the pass-by-value.)
Last edited on
AFAIK binary trees aren't meant to be fast for sort but fast for search hence Binary Search Tree.
The insertion operation is essentially the sort, which should all be happening in O(n log n) time, IIRC. QuickSort also hits O(n log n) for a search sort.
Last edited on
How would you use quicksort to search?
I'm not using it to search, I'm using it to sort. I'm using the way that the binary tree automatically sorts elements to use it as a sorting algorithm as well.

Now I see what you mean. Fixed the typo.
Last edited on
Have you tried using iteration over recursion?
Last edited on
Trying it now, just getting distracted. In the meantime, had another discussion with my friend and I strongly believe the performance is related to CPU cache. Here's a transcript:

http://pastebin.com/8QT6W8kk
perhaps because of the dynamic allocation new Node<T>( ); whenever you insert a value to the binary tree that slows down?
That only occurs when there aren't enough nodes reserved, and in addition to creating that new node there is a warning message that is output to the console. That warning hasn't shown up, so that allocation isn't occuring.

EDIT:

Performance is even worse when using an iterative algorithm:

Element count: 5000000
Name                     | Validation             | Time (seconds)
Binary Tree Sort           Succeeded                7.668
STL std::sort              Succeeded                0.679356


Element count: 5000000
Name                     | Validation             | Time (seconds)
Binary Tree Sort           Succeeded                7.277
STL std::sort              Succeeded                0.653391


Implementation:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
template <typename T>
void sort::BinaryTree<T>::Run( const std::vector<T> & data )
{
	Node<T> * currentParent;
	Node<T> ** currentSlot;

	for( uint i = 0; i < data.size( ); i++ )
	{
		currentParent = nullptr;
		currentSlot = &m_Root;

		while (true)
		{
			if ( *currentSlot == nullptr )
			{
				Node<T> * newNode = GetNewNode( );
				newNode->Parent = currentParent;
				newNode->Value = data[i];

				*currentSlot = newNode;
				break;
			}
			else
			{
				currentParent = *currentSlot;

				if( data[i] < (*currentSlot)->Value )
					currentSlot = &currentParent->Left;
				else
					currentSlot = &currentParent->Right;
			}
		}
	}
}


Branch prediction is a player here, but I'd guess that it will cause problems for both algorithms since data is random, and shouldn't really be considered.
Last edited on
I'm going to just chalk this up to cache misses, unless anyone's got anything to say otherwise. I was expecting to be corrected on the complexity of sorting between binary-trees and quick-sort, but oh well. Marking as solved.
Topic archived. No new replies allowed.