Fastest comparison algorithm based on a predefined container?

Pages: 123
Suppose you have defined a container of elements and you want do define a comparison function between elements based on the ordering of the elements in that container. What algorithm for the comparison would be the most efficient?

Example:
master = {tom, sam, mary, sally, bert, frank}
subset = {bert, sam, mary, tom}
goal: change subset to {tom, sam, mary, bert}

My current idea is to simply iterate from the beginning of the container, and whichever of the two elements is found first is the lesser (assuming the second is not the same as the first). It seems kind of naïve though. Can someone think of a better performing algorithm? This is what I have so far:
 ``1234567891011121314151617181920212223242526272829303132333435363738394041`` ``````#include #include #include #include #include #include template bool lessThan (const typename std::iterator_traits::value_type& a, const typename std::iterator_traits::value_type& b, ForwardIterator first, ForwardIterator last) { while (first != last) { if (*first == a && *first != b) return true; if (*first == b) return false; ++first; } throw std::logic_error ("Neither element found in the given the range."); } class Object {} *apple = new Object, *table = new Object, *chair = new Object, *bowl = new Object; std::list objects = {apple, table, chair, bowl}; bool compare (Object* a, Object* b) { return lessThan (a, b, objects.begin(), objects.end()); } int main() { try { bool less = compare (table, bowl); std::cout << std::boolalpha << "less = " << less << std::endl; less = compare (chair, apple); std::cout << "less = " << less << std::endl; less = compare (table, table); std::cout << "less = " << less << std::endl; } catch (const std::logic_error& error) {std::cout << error.what() << std::endl;} std::vector v = {bowl, table, apple}; std::sort (v.begin(), v.end(), compare); }``````

Output:
less = true
less = false
less = false

This method I think is poor if the container is really, really long though.
Would perhaps forcing the container to have a random access iterator, like vector, and then writing a specialized comparison function based on that perform even faster? Or perhaps force the container to be a map to integers, and compare the elements by comparing the integer mapped values?
Last edited on
I'm not 100% sure what you want, but if my understanding is correct, would you like to sort container B based on the order of the elements in container A? If not, it is near impossible to create a generalized comparison function to order elements based on the order of them in a specific container - what happens if your preferred container is {1, 2, 3, 2} and you try to compare 2 < 3 or 3 < 2?
I don't understand why `compare (table, bowl)` is true, or much else for that matter.
@kbw Because I defined the container:
 `` `` ``std::list objects = {apple, table, chair, bowl};``

which states that table < bowl.
Last edited on

`std::list<Object*> objects = {apple, table, chair, bowl, table};`
@LB. I missed that. Ok, pre: the predefined container shall have no repeats. Then I believe the desired comparison function based on the container will be well-defined.

Would using the container:
std::map<Object*, int> = {{apple, 1}, {table, 2}, {chair, 3}, {bowl, 4}};
allow faster comparison if we simply compare the integers? Hence no iteration involved like my first method (logarithmic time vs linear time--imagine the container were really big and the 2 elements compared were at the very end!)?
Last edited on
I think it is prudent to ask first, do you really need the comparison functor? If the end goal is just to arrange a container in a similar order to a template container, then I'd say you should avoid using comparators.
It sounds like you're using the wrong container. What's the real problem you're trying to solve?
->I think it is prudent to ask first, do you really need the comparison functor? If the end goal is just to arrange a container in a similar order to a template container, then I'd say you should avoid using comparators.

If you have a subset of an ordered container, whose ordering of elements do not match the larger container's ordering of elements, how else can you reorder the subset without defining a comparison function? That is the final goal.

My final lines above
 ``12`` ``````std::vector v = {bowl, table, apple}; std::sort (v.begin(), v.end(), compare);``````

gets the job done. I thought the only issue was performance, but if there is a better method altogether without even using the compare functor, I'd like to hear what that is.

Or maybe take a copy of the original set, and remove from that set all the elements that are absent from the smaller set? That would work, and no comparison function needed, but I have a feeling that will perform slower, depending on how many elements are absent.
Last edited on
 If you have a subset of an ordered container, whose ordering of elements do not match the larger container's ordering ...

I'm probably missing something.

If some elements of a container are copied to (or pointed to from) another container that specifies a different sort order, why does the order in the larger container matter?

Why will a normal sort predicate not suffice?
Last edited on
@kbw

master = {tom, sam, mary, sally, bert, frank}
subset = {bert, sam, mary, tom}

goal: change subset to {tom, sam, mary, bert}

What normal sort predicate are you referring to? What is the default < operator?
Last edited on
If you use selection sort you do not need a comparator. Just construct a new container by pulling elements from the old one in the order of the template.
Last edited on
Here's what I found (using std::list). The removal method is much slower than my original method:
 ``1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465`` ``````#include #include #include #include #include #include #include #define show(variable) std::cout << #variable << " = " << variable << std::endl; struct Timer { const std::clock_t begin; Timer() : begin (std::clock()) {} ~Timer() { const std::clock_t end = std::clock(); std::cout << double (end - begin) / CLOCKS_PER_SEC << " seconds." << std::endl; std::cout << "std::difftime = " << std::difftime (end, begin) << std::endl; }; }; template bool lessThan (const typename std::iterator_traits::value_type& a, const typename std::iterator_traits::value_type& b, ForwardIterator first, ForwardIterator last) { while (first != last) { if (*first == a && *first != b) return true; if (*first == b) return false; ++first; } throw std::logic_error ("Neither element found in the given the range."); } class Object {} *apple = new Object, *table = new Object, *chair = new Object, *bowl = new Object; std::list objects = {apple, table, chair, bowl}; bool compare (Object* a, Object* b) { return lessThan (a, b, objects.begin(), objects.end()); } int main() { const long N = 1000000; std::vector v = {bowl, table, apple}; { Timer timer; for (int i = 0; i < N; i++) std::sort (v.begin(), v.end(), compare); } { Timer timer; for (int i = 0; i < N; i++) { std::list copy = objects; for (Object* x : objects) { if (std::find(copy.begin(), copy.end(), x) == copy.end()) copy.remove(x); } } } } /* Output: 1.973 seconds. std::difftime = 1986 9.124 seconds. std::difftime = 11114 */``````
Oh yeah, I forgot to ask. Why are you using std::list at all? std::vector is orders of magnitude more efficient for this.
If you're going to time stuff, compile it with optimizations.

For your code, I get the following output:
 ```0.031 seconds. std::difftime = 31 0.531 seconds. std::difftime = 531```

For a slight modification where lines 52 through 55 are elided:
 ```0.046 seconds. std::difftime = 46 0.531 seconds. std::difftime = 531```

What does that tell you about what you're timing? What kind of meaningful result did you expect to get with 3 element containers?

(1) If you are just sorting < 50 elements, it hardly matters what sort method you use.

(2) But if speed really is a concern (as part of sorting a larger collection), an insertion sort really is fastest for < 50 elements.

(3) If you have two or more presorted sequences that need to be put together into a single sorted sequence, use a merge sort. (See <algorithm>)

(4) It is unlikely you need to write special sorting stuff anyway -- unless you are handling large amounts of specialized data, use the STL sorting methods. They really will work fastest for generalized data (such as sorting strings).

(5) Keep in mind that a std::map or std::set sorts stuff you add to the container, so these containers are essentially sorting algorithms as well.

Hope this helps.
Pointers/reference_wrappers are inexpensive to copy and test for equality, and there is no good way to implement the comparator, so a custom selection sort should be the most effective.
LB wrote:
Just construct a new container by pulling elements from the old one in the order of the template.
Last edited on
Here is what you do:
- Put your master list into symbol table i.e. std::map or std::unordered_map (if performance degrades with std::map) and map each element to the index it belongs to in this master list.

- Next, use this comparison code:
 ``123`` ``````template bool comp( const T& obj1, const T& obj2) { return masterListMap[obj1] < masterListMap[obj2]; }``````

- Finally take your subset list (has to be an array or a container that supports indexing - std::vector, std::array, std::deque, etc) and feed this list to the function std::sort (`#include <algorithm> `)and call std::sort like this:
`std::sort(subset.begin(), subset.end(), comp);`

This should sort the subset in the order they appear in the master list. Hope that helps. Note I have not written any code to test this method, but I know it should work and any compiler errors you get, just post it here

EDIT: std::sort uses introsort to sort a list so you are guaranteed Nlog2N time complexity:
http://en.wikipedia.org/wiki/Introsort

But note that depending on the symbol table type you choose, the worst case performance of this algorithm as a whole degrades to log2M * Nlog2N, where M is the size of master list and N is size of sublist. But even at this, don't get greedy and decide to go for the hashtable implementation of std::unordered_map, unless it is needed
Last edited on
I was somewhat certain that Smac89's recommendation would be much better, but was surprised at the results:
 ``123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111`` ``````struct Timer { const std::clock_t begin; Timer() : begin (std::clock()) {} ~Timer() { const std::clock_t end = std::clock(); std::cout << double (end - begin) / CLOCKS_PER_SEC << " seconds." << std::endl; }; }; template bool lessThan (const typename std::iterator_traits::value_type& a, const typename std::iterator_traits::value_type& b, ForwardIterator first, ForwardIterator last) { while (first != last) { if (*first == a && *first != b) return true; if (*first == b) return false; ++first; } throw std::logic_error ("Neither element found in the given the range."); } class Object {} *apple = new Object, *table = new Object, *chair = new Object, *bowl = new Object, *doll = new Object, *banana = new Object, *house = new Object, *baby = new Object, *candy = new Object, *plate = new Object; const std::list master0 = {apple, table, chair, bowl, doll, banana, house, baby, candy, plate}; bool compare (Object* a, Object* b) { return lessThan (a, b, master0.begin(), master0.end()); } std::unordered_map master1 = { {apple, 1}, {table, 2}, {chair, 3}, {bowl, 4}, {doll, 5}, {banana, 6}, {house, 7}, {baby, 8}, {candy, 9}, {plate, 10} }; bool comp_unordered_map (Object* a, Object* b) { return master1[a] < master1[b]; } template bool comp_unordered_mapT (const T& obj1, const T& obj2) { return master1[obj1] < master1[obj2]; } std::map master2 = { {apple, 1}, {table, 2}, {chair, 3}, {bowl, 4}, {doll, 5}, {banana, 6}, {house, 7}, {baby, 8}, {candy, 9}, {plate, 10} }; bool comp_map (Object* a, Object* b) { return master2[a] < master2[b]; } int main() { const long N = 1000000; const std::vector subset = {bowl, table, apple, plate, house}; std::vector v = subset; std::cout << "OP method:" << std::endl; { Timer timer; for (int i = 0; i < N; i++) {std::sort (v.begin(), v.end(), compare); v = subset;} // v reset after sort } std::cout << "\nUsing removal method:" << std::endl; { Timer timer; for (int i = 0; i < N; i++) { std::list copy = master0; for (Object* x : master0) { if (std::find(copy.begin(), copy.end(), x) == copy.end()) copy.remove(x); } } } std::cout << "\nUsing std::unordered_map:" << std::endl; { Timer timer; for (int i = 0; i < N; i++) {std::sort (v.begin(), v.end(), comp_unordered_map); v = subset;} } std::cout << "\nUsing std::unordered_mapT:" << std::endl; { Timer timer; for (int i = 0; i < N; i++) {std::sort (v.begin(), v.end(), comp_unordered_mapT); v = subset;} } std::cout << "\nUsing std::_map:" << std::endl; { Timer timer; for (int i = 0; i < N; i++) {std::sort (v.begin(), v.end(), comp_map); v = subset;} } } /* Output: OP method: 0.531 seconds. Using removal method: 1.844 seconds. Using std::unordered_map: 1.546 seconds. Using std::unordered_mapT: 1.547 seconds. Using std::_map: 1.391 seconds. */``````

Perhaps my samples are too small, but the difference is significant regardless. I don't see why unordered_map/map is slower, and by so much.
Last edited on
I'm guessing it has to do with hash collisions because you have not defined an effective hash function for objects of type Object. (Wonder how std::hash hashes pointers...
http://stackoverflow.com/questions/20953390/what-is-the-fastest-hash-function-for-pointers
http://jfdube.wordpress.com/2011/10/12/hashing-strings-and-pointers-avoiding-common-pitfalls/ )

Note: You are sorting an already sorted list for most of these timings (i.e. after you have sorted the list the first time, it is already sorted, so subsequent sorts are just there to drag time) so although OP's method might seem faster, and others seem slower it is not an accurate report on the performance of the aforementioned methods.

Also when you post benchmark reports, it is good to include the specifications of the machine that carried out this computation. i.e. processor, ram, os type, etc
Last edited on
Pages: 123