Multiplicity in map

Dear forumers,

I have a question maybe related to maps.

I would like to find solution for the following problem: I have lots of vectors, some of the are different, some of them are not. I would like to count the occurence of every vector and store every vector only once. I need to be performed in the most effectise, fastest way.

I though about using a map, so if I try to insert an element which was previosly used then the size of the map won't increase. So this way I can count the occurences alltogether but not one by one.

Could help me to find a fast solution?

Thank you very much!
Use std::set. http://en.cppreference.com/w/cpp/container/set

Copying vectors is expensive, we can avoid making copies by wrapping the vectors in std::reference_wrapper.
http://en.cppreference.com/w/cpp/utility/functional/reference_wrapper

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
#include <iostream>
#include <vector>
#include <set>
#include <functional>

std::size_t count_unique_vectors( const std::vector< std::vector<int> >& vectors )
{
    using vec_wrapper = std::reference_wrapper< const std::vector<int> > ;
    static const auto cmp_less = [] ( const vec_wrapper& a, const vec_wrapper& b )
                                 { return a.get() < b.get() ; } ;

    std::set< vec_wrapper, decltype(cmp_less) > set(cmp_less) ;
    for( const auto& vec : vectors ) set.insert( std::cref(vec) ) ;
    return set.size() ;
}

int main()
{
    const std::vector< std::vector<int> > my_vectors =
    {
        { 1, 2, 3, 4 }, { 1, 3, 2, 4 }, { 1, 2, 3, 4 }, { 4, 2, 3, 1 }, { 1, 2, 3, 4 },
        { 1, 2, 3 }, { 1, 2, 3  }, { 1, 2, 4 }, { 2, 3, 1 }, { 1, 2, 3 }, { 1, 1, 3 },
        { 1, 2 }, { 2, 1 }, { 1, 2 }, {}, { 1, 2 }, {1}, {2}, {}, {2}, {1}, {}, {1}, {1,2}, {2}
    };

    std::cout << my_vectors.size() << " vectors in all, of which "
              << count_unique_vectors(my_vectors) << " are unique\n" ;
}

http://coliru.stacked-crooked.com/a/470c178030be270c
Thank you very much JLBorges!

Maybe I wasn't enough clear. I would like to get a list of the occurences of every different vector.

So the 12 unique vectors should be stored somewhere since I would like to use them later. But only the unique ones can be stored because there are millions of vectors and they can't fit all in the memory like a vector of vectors. So the input vectors are generated in real time by another function and if the vector is unique then store it set the occurence to 1, if it's not unique then only increase it's occurence.

Thank you!
Yes, your idea of using a map to store the unique vectors and their frequency of occurrence is sound. It should be fast enough if the vectors are not large.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
#include <iostream>
#include <vector>
#include <map>
#include <random>
#include <algorithm>

// simulate random vectors generated in real time
std::vector<int> generate_random_vector()
{
    static const std::size_t min_sz = 3 ;
    static const std::size_t max_sz = 5 ;
    static std::mt19937 rng ; // deliberately not seeded
    static std::uniform_int_distribution<std::size_t> rand_sz( min_sz, max_sz ) ;
    static int population[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 } ;

    const std::size_t n = rand_sz(rng) ; // generate a random size in [ min_sz, max_sz ]
    std::shuffle( std::begin(population), std::end(population), rng ) ;
    return { population, population+n } ; // return vector initialised with the first n elements of the shuffled sequence
}

int main()
{
    const std::size_t n_vectors = 4'000'000 ;
    std::map< std::vector<int>, std::size_t > counts ; // key: vector mapped value: #occurrences

    // if the vector is unique then store it set the occurence to 1, if it's not unique then only increase it's occurence. 
    for( std::size_t i = 0 ; i < n_vectors ; ++i ) ++counts[ generate_random_vector() ] ;

    std::cout << "vectors generated: " << n_vectors << "  unique vectors: " << counts.size() << '\n' ;
}

http://coliru.stacked-crooked.com/a/2d902f62382c4974
Note: the time reported includes the time taken to generate the random vectors, which is not insignificant.
Thank you very much! It works perfectly!
Topic archived. No new replies allowed.