merging sets in linear time

Dear all,

I have two std::sets S1 and S2 which I need to merge.
I know that the largest element of S1 is less than the smallest element of S2.

the additional information about S1 and S2 should enable me to do the insertion of each element in costant time instead of log(N).

what is the fastest way to calculate the union of the two sets?


Thank you all,
Panecasareccio

Theoretically, you have to write a loop that calls the hinted overload of set::insert passing S1.end() as the hint.

In practice, at least LLVM libc++ and GNU libstdc++ implementations of the range insert: S1.insert(S2.begin(), S2.end()); execute exact same loop.
Using a custom data structure, your operation could be done in constant time.
If you are using a linked list with an extra node that stores the max and min values in the list as well as the front and back nodes. If the largest element of S1 is less that the smallest element of S2 then all you do is link the front node of one to the back node of the other.

In reality, you would most likely use std::set and is would take (to be highly exact):
min(S1.size(), S2.size())log(max(S1.size(), S2.size())


Also I do not know how you only get logN time.
> In reality, you would most likely use std::set
he is. Second paragraph: ``I have two std::sets''

> I do not know how you only get logN time.
it was referring to the order of inserting one element to the set.
@ne555 I missed that, thank you.
> In reality, you would most likely use std::set and is would take (to be highly exact):
> min(S1.size(), S2.size())log(max(S1.size(), S2.size())

Linear time (amortised constant time for each element), with a properly formed hint
(The new element is inserted just before the position indicated by the hint).
ie. O(N) where N is the number of elements inserted.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
#include <iostream>
#include <set>
#include <cassert>
#include <ctime>

template < typename T, typename U >
void merge( std::set<T>& larger, const std::set<U>& smaller )
{
    assert( *smaller.rbegin() < *larger.begin() ) ;
    for( auto iter = smaller.rbegin() ; iter != smaller.rend() ; ++iter ) 
        larger.emplace_hint( larger.cbegin(), *iter ) ;
}

struct timer
{
    ~timer()
    {
        auto end = std::clock() ;
        std::cout << double( end - begin ) / CLOCKS_PER_SEC << " secs.\n" ;
    };

    const std::clock_t begin = std::clock() ;
};

void time_it( std::size_t n )
{
    std::set<int> smaller ;
    std::set<double> larger ;
    for( std::size_t i = 0 ; i < n ; ++i )
    {
        smaller.emplace_hint( smaller.cend(), i ) ;
        larger.emplace_hint( larger.cend(), i+n ) ;
    }
    std::cout << "n == " << n << "  " ;
    {
        timer t ;
        merge( larger, smaller ) ;
    }
    assert( larger.size() == n*2 ) ;
}

int main ()
{
    std::size_t n = 256 * 1024 ;
    for( int i = 0 ; i < 5 ; ++i )
    {
        time_it(n) ;
        n *= 2 ;
    }
}

http://coliru.stacked-crooked.com/a/7d696394cfe3c36b
Last edited on
@JLBorges Thank you. Being out coded by you never ceases to give me enjoyment as I learn something new every time.
Topic archived. No new replies allowed.