Efficiency: Linked list or sorting an array?

I have an algorithm and I want to make it as efficient as possible. Basically it just involves putting numbers in order. I have two options, but I have no idea which one would be more efficient:

1. Using a doubly linked list. Every time a user wants to add a new number, the algorithm will start searching the correct place for the number from the beginning of the list. This is efficient per se, but once there are about a million numbers and a number has to be put in at the end of the list, the algorithm must go through all the 999 999 numbers before it.

2. Using a container to store all the numbers first, then sorting the numbers. In this case, adding all the numbers is fast, but the actual sorting will take a long time.

Which option would be more efficient? I was thinking of using maybe merge sort or quick sort in option 2. Yes, I'm aware I could just use vector and sort, but that's not my goal here.
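
For concreteness, a minimal sketch of what option 1 could look like with std::list (the function name insert_sorted is just illustrative, not part of the question):

#include <list>

// Walk the list from the front until the first element that is not smaller,
// then insert the new value before it. Finding the position is O(N).
void insert_sorted(std::list<int>& seq, int value)
{
    auto it = seq.begin();
    while (it != seq.end() && *it < value)
        ++it;
    seq.insert(it, value);   // the insertion itself is O(1) once the position is known
}
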
I suggest looking at the heap data structure:
http://en.wikipedia.org/wiki/Heap_%28data_structure%29
http://www.cprogramming.com/tutorial/computersciencetheory/heap.html

Filling it with numbers and retrieving them in order is O(N log N), which is more effective than both of your approaches.
The <algorithm> header contains a bunch of functions to work with heaps:
http://en.cppreference.com/w/cpp/algorithm/make_heap
http://en.cppreference.com/w/cpp/algorithm/push_heap
http://en.cppreference.com/w/cpp/algorithm/pop_heap
and others.
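
A rough sketch of that approach, assuming a max-heap kept in a std::vector and the <algorithm> functions linked above (so numbers come out largest-first; use a comparator if you need ascending order):

#include <algorithm>
#include <iostream>
#include <vector>

int main()
{
    std::vector<int> heap;
    for (int x : {5, 1, 4, 2}) {
        heap.push_back(x);
        std::push_heap(heap.begin(), heap.end());   // O(log N) per insertion
    }
    while (!heap.empty()) {
        std::pop_heap(heap.begin(), heap.end());    // moves the largest element to the back
        std::cout << heap.back() << ' ';            // prints 5 4 2 1
        heap.pop_back();
    }
}
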
I'd suggest 1.

> the algorithm must go through all the 999 999 numbers before it.
not if you use binary search:

http://www.cplusplus.com/reference/algorithm/lower_bound/?kw=lower_bound
http://www.cplusplus.com/reference/algorithm/upper_bound/?kw=upper_bound
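
A sketch of that suggestion (insert_with_lower_bound is just an illustrative name). Note that std::lower_bound accepts the list's bidirectional iterators, so it cuts the comparisons to O(log N), but it still advances through the nodes one at a time:

#include <algorithm>
#include <list>

void insert_with_lower_bound(std::list<int>& seq, int value)
{
    // O(log N) comparisons, but O(N) iterator increments on a std::list
    auto pos = std::lower_bound(seq.begin(), seq.end(), value);
    seq.insert(pos, value);
}
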
> Filling it with numbers and retrieving them in order is O(N log N),
> which is more effective than both of your approaches.
merge sort is also O(N log N)


> the algorithm must go through all the 999 999 numbers before it.
>> not if you use binary search:
You will perform O(lg N) comparisons per insertion. But because it is a list, reaching the cell for each comparison requires O(N) next operations, so the total time over N insertions is O(N^2).
If you use an array instead, each insertion takes O(N) element moves, so the total time is again O(N^2).


> Yes, I'm aware I could just use vector and sort
What kind of container did you have in mind for your point 2, then?
> 2. Using a container to store all the numbers first, then sorting the numbers.

Yes; container: std::vector, algorithm: std::sort. If and only if actual measurements reveal that std::sort is not fast enough to meet the performance constraint, use a specialised sort that can take advantage of the knowledge that we are sorting numbers. For instance, a spread sort. https://en.wikipedia.org/wiki/Spreadsort (Steven Ross's library was recently reviewed and accepted into boost).
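
A minimal sketch of that default. The Boost spreadsort call is commented out because the header path and namespace are written from memory and should be checked against the Boost.Sort documentation:

#include <algorithm>
#include <vector>
// #include <boost/sort/spreadsort/spreadsort.hpp>

void sort_numbers(std::vector<int>& numbers)
{
    std::sort(numbers.begin(), numbers.end());                              // the default choice
    // boost::sort::spreadsort::spreadsort(numbers.begin(), numbers.end()); // only if measurement shows std::sort is too slow
}
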

I emphasize the importance of cache effects. In my experience, all but true experts tend to forget those when algorithms are discussed.

And, yes, my recommendation is to use std::vector by default. More generally, use a contiguous representation unless there is a good reason not to. Like C, C++ is designed to do that by default.
- Stroustrup http://www.stroustrup.com/bs_faq.html#list



> 1. Using a doubly linked list. Every time a user wants to add a new number,
> the algorithm will start searching the correct place for the number from the beginning of the list.
> This is efficient per se, but once there are about a million numbers
> and a number has to be put in at the end of the list,
> the algorithm must go through all the 999 999 numbers before it.

You may be surprised if you make some actual measurements.

Consider a simple example: generate N random integers and insert them into a sequence so that each is in its proper position in the numerical order. For example, if the random numbers are 5, 1, 4, and 2, the sequence should grow like this:
5
1 5
1 4 5
1 2 4 5

Once the N elements are in order, we remove them one at a time by selecting a random position in the sequence and removing the element there. For example, if we choose positions 1, 2, 0, and 0 (using 0 as the origin), the sequence should shrink like this:
1 2 4 5
1 4 5
1 4
4

Now, for which N is it better to use a linked list than a vector (or an array) to represent the sequence? If we naively apply complexity theory, that answer will be something like, “Is this a trick question? A list, of course!” We can insert an element into and delete from a linked list without moving other elements. In a vector, every element after the position of an inserted or deleted element must be moved. Worse still, if you don’t know the maximum number of elements in advance, it’s occasionally necessary to copy the entire vector to make room for another element.

Depending on the machine architecture and the programming language, the answer will be that the vector is best for small to medium values of N. When I ran the experiment on my 8-Gbyte laptop, I found N to be much larger than 500,000.

...

In an attempt at fairness, I didn’t use a binary search to speed up insertion into the vector. Nor did I use random access to find the deletion point in the vector version. This keeps the number of elements traversed the same for all versions. In fact, I used identical generic code for the vector and the lists.

Is this an academic curiosity? No. Infrastructure application developers tell me that compactness and predictable access patterns are essential for efficiency.
- Stroustrup http://www.stroustrup.com/Software-for-infrastructure.pdf
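
For anyone who wants to reproduce that experiment, here is a rough sketch under the same rules: linear scan for insertion, linear traversal to the deletion point, identical generic code for both containers. Timings, and the crossover value of N, will of course vary by machine.

#include <chrono>
#include <iostream>
#include <iterator>
#include <list>
#include <random>
#include <vector>

template <class Seq>
long long run(int n, unsigned seed)
{
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> value(0, 1000000000);
    auto start = std::chrono::steady_clock::now();

    Seq seq;
    for (int i = 0; i < n; ++i) {              // grow: keep the sequence ordered, no binary search
        int v = value(gen);
        auto it = seq.begin();
        while (it != seq.end() && *it < v)
            ++it;
        seq.insert(it, v);
    }
    for (int size = n; size > 0; --size) {     // shrink: erase the element at a random position
        std::uniform_int_distribution<int> pos(0, size - 1);
        auto it = seq.begin();
        std::advance(it, pos(gen));
        seq.erase(it);
    }

    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
}

int main()
{
    const int n = 20000;   // increase this to see the gap widen
    std::cout << "vector: " << run<std::vector<int>>(n, 42) << " ms\n";
    std::cout << "list:   " << run<std::list<int>>(n, 42) << " ms\n";
}
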


Repeat: treat std::vector as the default container of choice.