When is an Assumed Median sort faster?

I have an array of pointers to structures and I need to sort that array multiple times based on various structure elements. For example: read 300 elements from an external source. For each element, allocate memory for the structure and save the pointer to that structure in an array. I then need to sort the array 20 different times based on different elements and save the results in other parts of the structure. A simple in-line bubble sort will work, but would it be faster to create an assumed-median function and recurse on the halves until the sort is complete?

In other words, would the time required for the recursive function calls outweigh the reduced comparisons of the median sort? I know that a bubble sort is best for, say, 5 or 10 elements, and the assumed-median sort is best for 10,000 elements. Does anyone know the break-even element count?


Use std::sort(). It is an optimised sort for the general case.
http://en.cppreference.com/w/cpp/algorithm/sort

Typically it is implemented as introsort, a hybrid that starts with quicksort and falls back to heapsort and insertion sort: https://en.wikipedia.org/wiki/Sort_(C++)
OK, let me get more specific here. This is a stock market analysis program that I am writing. I start with a list of market symbols. For each symbol I allocate a struct:

struct Market {
	char Symbol[16];
	int Composites[10];
	int Ratings[10];
	double RORs[16];	// M1,M3,M6,Y1,Y3,Y5
	int Ranks[16];		// M1,M3,M6,Y1,Y3,Y5
} Markets[1000];


I download historical prices and calculate the Rates of Return (RORs) for 1 month, 3 months, etc., and store a pointer to each struct in an array. I then sort the array of pointers on each ROR, and after each sort I record the Rank for that ROR: ABC was the best (1) investment for the past month, the sixth best (6) for the last 3 months, etc.

The next step is to create composite ratings by processing a secondary list of various ROR combinations. For example, add the M3, M6, and Y1 ranks together, store that number in Composites[x], sort the list again on Composites[x], and store the resulting index in Ratings[x] for each requested combination.

I’m actually working with 16 different RORs, not the 6 commented ones. A list of 300 or more markets sorted up to 26 times can result in a fair amount of processing time, which is why I’m looking for the most efficient sort algorithm. Most of the packaged algorithms do a lot of unnecessary testing which makes sure they never crash and burn, but they do take longer to run. I can validate the data once, when I calculate the RORs.




Most of the packaged algorithms do a lot of unnecessary testing which makes sure they never crash and burn, but they do take longer to run.

What makes you believe this is true?
Just to show that they don't, look what happens when you give std::sort a sequence and a comparator that do not satisfy its preconditions:
http://coliru.stacked-crooked.com/a/5cdfe9d1334f01a6

Edit: checked the GCC implementation. No sanity checks inside the algorithm, not even a check that begin precedes end.
Yes, universal package code is a lot slower than properly written custom code!

My last job was Senior Software Engineer for the Market Data division of a local hedge fund. One of my responsibilities was to make changes to the internal code that managed our database. When you have to keep up with every trade being made during the day in real time, you get to see just how slow the packaged math algorithms are compared to raw custom database code.

For example, suppose you are tracking the average of the last 50 trades made on a symbol. The standard math-package average routine would test each number for its existence, add all 50 numbers together, then divide by the number of real numbers in the list of 50. When writing database code, you keep a list of the last 50 trades for each symbol you follow and a rotating pointer to the end of that list. When a new trade happens, you test it against the previous average to see if you have to ignore it (the average was $10.42 and the new trade was $1026; someone entered the decimal point wrong). If you keep it, you use the pointer to subtract the 50th old trade from the total, replace it with the new trade, add the new trade to the total, and divide by 50, so you have the latest average ready for the next trade. One more test checks whether you are at the end of the list of 50 and moves the pointer back to the beginning if necessary.

The package does 50 tests, 50 adds with pointer moves, and a division. The custom code does one test and, if the trade is kept, one subtract, one add, one divide, and one more test to advance the pointer. That's it!

When you are running this code a few million times a minute, every operation counts, and the package code takes way too long to run.

So this is an algorithm problem, not an implementation problem. BTW, there is no standard average routine in the standard library.
Your question was about sorting a sequence without any assumptions. I still want proof that the standard implementation is suboptimal for that case, or that the sorting routine is a bottleneck in your program and you are not doing premature optimisation.
I fail to see what any of that has to do with sorting.
What does this have to do with sorting?

I’m trying to minimize the size of my database. My current software is in Beta testing mode and the algorithms and concept seem to be working fine. The code is currently gathering the data properly to take dated snapshots of the market. The Beta code currently runs in a batch mode to create multiple programmable views of the market and then stores everything in HTML format on your local disk drive. This requires a lot of data sorting to determine the order in which each market appears on each page. From this programmable viewpoint: what are the best investments?

Using the software I can look at the market as of today, decide to invest in either ABC or XYZ, etc., and then highlight each symbol or market sector with a different color. Those investments might look great on today’s chart. Each chart has a Next and a Previous link. By pressing the Previous link I can see how those investments would have looked last month, and the month before, etc.

I am trying to move the code from a Beta batch-mode process into an interactive Windows Forms process. I have two choices. I can save my historical data as merely a list of Rates of Return and run the sort algorithms every time I hit a Next or Previous link. The second choice is to do all of the sorting each time I create a snapshot and store all of the sort indexes along with the rates of return, which would roughly triple the size of the database.

I like the flexibility of option 1 because it would allow me to change the list of markets on the fly, but it will require running all the sorts each time the date is changed. The key to the eventual software design is the ability to create a custom market view, including potential investments, and then see how those investment decisions would have performed in the past by rapidly clicking the Previous link.

In the Beta version I merely have to view the previous month's page, which has already been created. In the release version I have to create the page on the fly. If I already know the order of the page, the creation can be instant. How much of a time lag will there be if I have to sort everything multiple times to determine the proper page order before I create the page?


How much of a time lag will there be if I have to sort everything multiple times to determine the proper page order before I create the page?
I do not know. Why don't you test it? Profile your code and find out whether this spot needs optimisation.

Another thing which really helps in these situations is preemptive generation: start generating a page as a background process before it is requested. For example, if your application has "next month" and "previous month" buttons, there is a high chance that those buttons will be clicked most often. You can start the process of generating those pages (actually just one; the other should still be cached) each time one of those buttons is pressed.
Thanks, I think the preemptive code approach might be the best alternative.

I could run a parallel task that does all the sorting and restart that code every time the user changes the list of markets they want to watch. The page creation is almost instantaneous once the order of the markets is known.