STL Containers

I am having some difficulty understanding the STL containers list and forward_list. I understand that list supports bidirectional iteration, while forward_list supports only forward iteration. One thing that really throws me off is about ranges. I understand that both list and vector support half-open ranges:

[begin,end)

What is the reason that the end(), cend(), rend(), and crend() methods return iterators that point past the object? My book says it is for supporting empty ranges, but I don't quite understand that. Additionally, why do forward_lists have a method called before_begin() so that they support fully open ranges?

(begin,end)

What is the reason for this strange behavior?

One final question: lists provide many methods like splice() and insert(). But why do all forward_lists have special methods suffixed with a "_after"? Ex: splice_after() and insert_after()
Last edited on
> What is the reason that the end(), cend(), rend(), and crend() methods return iterators that point past the object?
> My book says it is for supporting empty ranges

Supporting empty ranges is one reason; for instance in an empty container, begin() == end().

Another is that with end being 'one past the last', operation on sequences are simple and efficient:

1
2
3
4
5
6
7
8
// classic: begin: iterator to the first item, end: iterator to one-past-the-last item 
template < typename ITERATOR, typename T > ITERATOR find( ITERATOR begin, ITERATOR end, const T& v ) 
{
    // != : does not demand random access iterators
    for( ; begin != end ; ++begin ) if( *begin == v ) return begin ; 
    
    return end ; // we don't need a special 'invalid position in sequence' iterator
}

Compare with:
1
2
3
4
5
6
7
8
9
10
template < typename ITERATOR, typename T > 
// ifirst: iterator to the first item, ilast: iterator to the last item
ITERATOR find( ITERATOR ifirst, ITERATOR ilast, const T& v ) 
{
    // != skips the last element
    for( ; ifirst != ilast ; ++ifirst ) if( *ifirst == v ) return ifirst ; 
    
    if( *ilast == v ) return ilast ; // this does not work if the sequence is empty
    else return ??? ; // not found; what do we return here?
}


Also, in many sequences (for example, an iterator that iterates over messages coming over a socket, with the sequence ending when the client closes the socket), it is impossible to tell before hand what the last element is. In contrast, a check for EOF or input failure (as one past the last element) is easy to implement.

> why do all forward_lists have special methods suffixed with a "_after"

From the proposal:
Since singly linked lists have strictly less functionality than doubly linked lists, the only reason to include them in the library is performance. It's important to notice, furthermore, that the performance difference isn't a matter of asymptotic complexity, but a matter of bytes and cycles. Inserting a new node is O(1) in either a singly linked list or a doubly linked list, but in a singly linked list it's faster. This fact influences most of the design decisions in this proposal. The main goal of this proposal is that forward_list should have zero space or time overhead relative to a hand-written C-style singly linked list.
...
The main disadvantage of this design is that programmers have to use insert_after and erase_after instead of the more familiar insert and erase. The advantage, however, is that all other operations are more efficient: iterator dereference is a single pointer dereference, and the list object itself doesn't need to use a second word to store a tail pointer. Admittedly these performance issues are small: they involve changes in small constant factors, not changes of asymptotic complexity. I still consider them important, however, because those constant factors are the only reason for using a singly linked list instead of a doubly linked list the first place. The main design goal for this proposal is that forward_list should have zero overhead relative to a hand-written C-style linked list like xmlEnumeration, and a design in which iterators point to the preceding list node does not satisfy that goal. Long experience shows that for some uses the restrictions of C-style or lisp-style singly linked lists are not onerous. For other uses, std::list is still available. - http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2543.htm
Thanks. I have one more question. What is the difference between std::list and std::vector (besides the performance and algorithm complexity)? I know that std::list is used for quick O(1) inserts/deletes and slower, linear O(n) lookups. But what is actually different in the usage? are they both dynamic arrays?
Last edited on
How they are stored in memory is the biggest difference. A vector, like an array or a string, is stored in a solid block of memory. Every element is an offset of the first one, giving you random access to any element. A list can be scattered all over memory and each element need to store the address of the next, and/or previous, element. You have to iterate through every element in order to reach the one you want.
> I know that std::list is used for quick O(1) inserts/deletes and slower, linear O(n) lookups.
> But what is actually different in the usage? are they both dynamic arrays?

std::list<> is a doubly linked list; for a pictorial representation of what admkrk described, see: https://en.wikipedia.org/wiki/Doubly_linked_list

Because of the different way in which list stores its data, it has some inherent properties which make it the sequence container of choice in certain contexts:

1. Modifiers of std::list<> do not invalidate iterators, references or pointers to the other elements.

2. Elements can be transferred from one std::list<T,A> to another std::list<T,A> (same T, compatible A) without actually moving or copying the elements, and without invalidating iterators, references or pointers to the transferred elements.
http://en.cppreference.com/w/cpp/container/list/splice

3. std::list<> provides 'strong-exception-guarantees' for modifiers; if an exception is thrown during a standard-library operation on the list, the list rolls back to the state in which it was before the operation started. In contrast, std::vector<> provides only the 'basic-exception-guarantee' for some modifiers; if an exception is thrown during these standard-library operation on the vector, the vector remains in a 'good state' and no resources are leaked.
http://stroustrup.com/3rd_safe.pdf


> I know that std::list is used for quick O(1) inserts/deletes and slower, linear O(n) lookups.

In the vast majority of real-life scenarios, even when there are a large number of inserts/deletes, std::vector<> would still be the sequence container of choice.

I emphasize the importance of cache effects. In my experience, all but true experts tend to forget those when algorithms are discussed.

And, yes, my recommendation is to use std::vector by default. More generally, use a contiguous representation unless there is a good reason not to. Like C, C++ is designed to do that by default.
- Stroustrup http://www.stroustrup.com/bs_faq.html#list


Consider a simple example: generate N random integers and insert them into a sequence so that each is in its proper position in the numerical order. For example, if the random numbers are 5, 1, 4, and 2, the sequence should grow like this:
5
1 5
1 4 5
1 2 4 5

Once the N elements are in order, we remove them one at a time by selecting a random position in the sequence and removing the element there. For example, if we choose positions 1, 2, 0, and 0 (using 0 as the origin), the sequence should shrink like this:
1 2 4 5
1 4 5
1 4
4

Now, for which N is it better to use a linked list than a vector (or an array) to represent the sequence? If we naively apply complexity theory, that answer will be something like, “Is this a trick question? A list, of course!” We can insert an element into and delete from a linked list without moving other elements. In a vector, every element after the position of an inserted or deleted element must be moved. Worse still, if you don’t know the maximum number of elements in advance, it’s occasionally necessary to copy the entire vector to make room for another element.

Depending on the machine architecture and the programming language, the answer will be that the vector is best for small to medium values of N. When I ran the experiment on my 8-Gbyte laptop, I found N to be much larger than 500,000.

...

In an attempt at fairness, I didn’t use a binary search to speed up insertion into the vector. Nor did I use random access to find the deletion point in the vector version. This keeps the number of elements traversed the same for all versions. In fact, I used identical generic code for the vector and the lists.

Is this an academic curiosity? No. Infrastructure application developers tell me that compactness and predictable access patterns are essential for efficiency.
- Stroustrup http://www.stroustrup.com/Software-for-infrastructure.pdf


When the type of the elements in the sequence are more complex than an integer (say std::string), still consider using std::vector<> as the sequence container. Move semantics has profoundly changed the way that we think about performance.

For instance, for a sequence of 16000 random string insertions, 8000 removals from the middle:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
#include <iostream>
#include <vector>
#include <list>
#include <algorithm>
#include <random>
#include <ctime>
#include <iomanip>

std::string random_string() // generate a random string
{
    static char alphabet[] = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
                             ",./<>?;':\"[]\\{}|!@#$%^&*()_+-=`~" ;
    static constexpr std::size_t ALPHABET_SIZE = sizeof(alphabet) - 1 ;
    static std::mt19937 rng( std::random_device{}() ) ;
    static std::uniform_int_distribution<int> distribution( 10, ALPHABET_SIZE ) ;

    const auto sz = distribution(rng) ;
    std::shuffle( alphabet, alphabet+ALPHABET_SIZE, rng ) ;
    return { alphabet, alphabet+sz } ;
}

// insert a string maintaining sorted order
template < typename A, template<typename,typename> class CNTR >
void insert( CNTR< std::string, A >& sorted_seq, const std::string& v ) // invariant: sorted
{ sorted_seq.insert( std::upper_bound( std::begin(sorted_seq), std::end(sorted_seq), v ), v ) ; }

// erase the number at the middle of the sequence, maintaining sorted order
template < typename A, template<typename,typename> class CNTR >
void erase( CNTR< std::string, A >& seq ) // invariant: !empty()
{ seq.erase( std::next( std::begin(seq), seq.size()/2 ) ) ; }

template < typename A, template<typename,typename> class CNTR >
auto test_it( CNTR< std::string, A >& seq, const std::vector<std::string>& values )
{
    const auto start = std::clock() ;
    for( const auto& v : values ) insert( seq, v ) ;
    while( seq.size() > values.size()/2 ) erase(seq) ;
    return ( std::clock() - start ) * 1000.0 / CLOCKS_PER_SEC ;
}

int main()
{
    constexpr std::size_t N = 16'000 ;
    std::vector<std::string> values(N) ;
    std::generate( std::begin(values), std::end(values), random_string ) ;

    std::cout << "vector: " ;
    std::vector<std::string> vec ;
    std::cout << std::setw(5) << test_it( vec, values ) << " millisecs\n" ;

    std::cout << "  list: " ;
    std::list<std::string> lst ;
    std::cout << std::setw(5) << test_it( lst, values ) << " millisecs\n" ;
} 

echo '======== clang++ ========' && clang++ -std=c++14 -stdlib=libc++ -O3 -Wall -Wextra -pedantic-errors main.cpp && ./a.out
echo '========== g++ ==========' && g++ -std=c++14 -O3 -Wall -Wextra -pedantic-errors main.cpp && ./a.out
======== clang++ ========
vector:  1330 millisecs
  list:  3630 millisecs
========== g++ ==========
vector:   780 millisecs
  list:  4110 millisecs

http://coliru.stacked-crooked.com/a/216ea83c08605c45

Even when we refuse to exploit random access that std::vector<> provides, it still performs on par with std::list<> (for the above test, on the same machine.)

echo '======== clang++ ========' && clang++ -std=c++14 -stdlib=libc++ -O3 -Wall -Wextra -pedantic-errors main.cpp && ./a.out
echo '========== g++ ==========' && g++ -std=c++14 -O3 -Wall -Wextra -pedantic-errors main.cpp && ./a.out
======== clang++ ========
vector:  3070 millisecs
  list:  3120 millisecs
========== g++ ==========
vector:  2570 millisecs
  list:  3390 millisecs

http://coliru.stacked-crooked.com/a/49dea20dd2be73d2
Topic archived. No new replies allowed.