|I have an idea ...|
|I emphasize the importance of cache effects. In my experience, all but true experts tend to forget those when algorithms are discussed.|
And, yes, my recommendation is to use std::vector by default. More generally, use a contiguous representation unless there is a good reason not to. Like C, C++ is designed to do that by default.
- Stroustrup http://www.stroustrup.com/bs_faq.html#list
|Consider a simple example: generate N random integers and insert them into a sequence so that each is in its proper position in the numerical order. For example, if the random numbers are 5, 1, 4, and 2, the sequence should grow like this:|
1 4 5
1 2 4 5
Once the N elements are in order, we remove them one at a time by selecting a random position in the sequence and removing the element there. For example, if we choose positions 1, 2, 0, and 0 (using 0 as the origin), the sequence should shrink like this:
1 2 4 5
1 4 5
Now, for which N is it better to use a linked list than a vector (or an array) to represent the sequence? If we naively apply complexity theory, that answer will be something like, “Is this a trick question? A list, of course!” We can insert an element into and delete from a linked list without moving other elements. In a vector, every element after the position of an inserted or deleted element must be moved. Worse still, if you don’t know the maximum number of elements in advance, it’s occasionally necessary to copy the entire vector to make room for another element.
Depending on the machine architecture and the programming language, the answer will be that the vector is best for small to medium values of N. When I ran the experiment on my 8-Gbyte laptop, I found N to be much larger than 500,000.
In an attempt at fairness, I didn’t use a binary search to speed up insertion into the vector. Nor did I use random access to find the deletion point in the vector version. This keeps the number of elements traversed the same for all versions. In fact, I used identical generic code for the vector and the lists.
Is this an academic curiosity? No. Infrastructure application developers tell me that compactness and predictable access patterns are essential for efficiency.
- Stroustrup http://www.stroustrup.com/Software-for-infrastructure.pdf
echo '======== clang++ ========' && clang++ -std=c++14 -stdlib=libc++ -O3 -Wall -Wextra -pedantic-errors main.cpp && ./a.out echo '========== g++ ==========' && g++ -std=c++14 -O3 -Wall -Wextra -pedantic-errors main.cpp && ./a.out ======== clang++ ======== vector: 1330 millisecs list: 3630 millisecs ========== g++ ========== vector: 780 millisecs list: 4110 millisecs
echo '======== clang++ ========' && clang++ -std=c++14 -stdlib=libc++ -O3 -Wall -Wextra -pedantic-errors main.cpp && ./a.out echo '========== g++ ==========' && g++ -std=c++14 -O3 -Wall -Wextra -pedantic-errors main.cpp && ./a.out ======== clang++ ======== vector: 3070 millisecs list: 3120 millisecs ========== g++ ========== vector: 2570 millisecs list: 3390 millisecs
// auto itrVec = std::begin(myVec) + pos; // O(1)