Adding Unique Items to Map & Counting Duplicates

I have an unordered_map like so:
std::unordered_map<std::string, int> materialToAmount;
// ^ Example key/value: {"cobblestone", 1} 


I want to add 1 to the value. How can I do this?

My ultimate goal is to:
1. Read from a file with the format: (Done)
OAK_PLANK OAK_PLANK OAK_PLANK
COBBLESTONE REDSTONE COBBLESTONE
COBBLESTONE IRON_INGOT COBBLESTONE

2. Then store each material name into a map as a key. (Work in progress)
- Only store the material name once.
- Set the value to 1 for first insertion.
- Add an additional 1 to the value for each identical material found.
- End Result Example: {"cobblestone", 4}
- Can use unordered or normal, whichever works best with ~400 keys. I'm using unordered at the moment.

Is there a better way to go about this? I've been working on this for a long time and am having trouble wrapping my head around it.

Help would be greatly appreciated, thanks!


- - -
Cheers to anyone who understands what this program is used for based solely off the material names. :P
Use a normal std::map with the key as the item and the mapped value as the count of items.

For example:

#include <iostream>
#include <string>
#include <cctype>
#include <map>
#include <sstream>
#include <iomanip>

// strip white space, convert everything to upper case
std::string make_key( std::string str )
{
    std::string key ;
    for( unsigned char c : str ) if( !std::isspace(c) ) key += std::toupper(c) ;
    return key ;
}

std::map< std::string, int > make_map( std::istream& stm )
{
    std::map< std::string, int > map ;

    // for each item read from the stream, increment the count
    std::string item ;
    while( stm >> item ) ++map[ make_key(item) ] ;
    // note: if the key is seen for the first time, zero-initialised count (zero) is incremented
    //       otherwise the existing count is incremented

    return map ;
}

int main()
{
    std::istringstream file( "OAK_PLANK OAK_PLANK OAK_PLANK\n"
                             "cobblestone redstone cobblestone\n"
                             "COBBLESTONE iron_ingot COBBLESTONE\n"
                             "IRON_INGOT REDSTONE iron_ingot oak_plank COBBLESTONE\n" ) ;

    const auto map = make_map(file) ;

    for( const auto& [key,cnt] : map ) std::cout << std::quoted(key) << " - " << cnt << '\n' ;
}

http://coliru.stacked-crooked.com/a/08af40463c79e9cc
Maybe the if( !std::isspace(c) ) is redundant?
It is indeed redundant for this specific example (where the spaces are already skipped by formatted input).
Well, ok, but since std::toupper() returns “Converted character or ch if no uppercase version is defined by the current C locale” ( https://en.cppreference.com/w/cpp/string/byte/toupper ), perhaps we could just skip it, can’t we?
Skip the std::toupper() if the strings (item names) are case sensitive:
if, for example, COBBLESTONE and Cobblestone are two different items.
#include <iostream>
#include <sstream>
#include <string>
#include <cctype>
#include <map>
using namespace std;

string toupper( string str )
{
   for ( char &c : str ) c = toupper( c );
   return str;
}

int main()
{
   stringstream in( "OAK_PLANK OAK_PLANK OAK_PLANK         \n"
                    "cobblestone redstone cobblestone      \n"
                    "COBBLESTONE iron_ingot COBBLESTONE    \n"
                    "IRON_INGOT REDSTONE iron_ingot oak_plank COBBLESTONE \n" );

   map<string,int> freq;
   for ( string s; in >> s; ) freq[toupper(s)]++;
   for ( auto p : freq ) cout << p.first << ": " << p.second << '\n';
}


COBBLESTONE: 5
IRON_INGOT: 3
OAK_PLANK: 4
REDSTONE: 2
Uh! I think I eventually got it, JLBorges: you want your function to ensure that the ‘key’ string is just one word, don’t you?
Nice; it makes the function far more general.

But now, since this code has been buzzing in my mind for hours, I can’t help asking you: since you don’t modify the original argument, why don’t you pass it by const reference?

Probably because he is modifying that string inside the function. And he is returning the modified string to the calling function via the return statement. This allows the user the choice as to whether they want to modify the original string using assignment in the calling function.



I probably should have specified this from the beginning, I’m sorry: all my questions were about the function make_key() in the above JLBorges’ code.
> since you don’t modify the original argument, why don’t you pass it by const reference?

I should have passed the string by reference to const.

My first idea was to modify the string in situ, without taking care to remove white space. As I was writing the function, I thought that this would be a bit too crude (too much special casing), and added the stripping of white space. I forgot to modify the signature of the function when I did that.

Ideally, I should have taken care of punctuation too, so that OAK_PLANK and Oak Plank would map to the same key.
Ok, sorry, I was just trying to understand. Thank you for your answers.
Apologies for my very late reply. I had some stuff I needed to get done first.

Thank you all for your replies!

JLBorges wrote:
Use a normal std::map with the key as the item and the mapped value as the count of items.

Would I only use unordered maps if I had thousands, or millions, of keys/values to store and didn't need for them to be in a specific order?


Thank you for providing full code! I have a few questions for your code.

Also, didn't know std::isspace was a thing. I looked it up (http://www.cplusplus.com/reference/cctype/isspace/), pretty neat!

Please Note:
- When I say things like "wouldn't it make more sense for..", I mean based on my knowledge, and I'm sure you have a logical reason for your code, so could you please explain why you did it that way? :P I am NOT saying or pretending that I'm an expert or that my way is better than yours. Just don't want it to come off the wrong way haha. :)
- Please be patient with me. I'm sorry I ask so many questions, I just want to make sure I fully understand it.

Questions:

1.) In for( unsigned char c : str ) if( !std::isspace(c) ) key += std::toupper(c) ; does unsigned char just mean it takes up less space than a normal char? I read: https://stackoverflow.com/questions/75191/what-is-an-unsigned-char but I don't fully understand what it means.

2.)
std::map< std::string, int > make_map( std::istream& stm )
{
    std::map< std::string, int > map ;

a. It is a function called make_map that takes the parameter of an input stream (right?) and stm stands for stream to map..?

b. To clarify, the function is of type map, meaning it returns a map, yes? Wouldn't it make more sense to have a void function that takes the map by reference as a parameter (void make_map(std::istream &stm, std::map<std::string, int> &map))? Then the map itself would always be updated in main() without returning anything. Otherwise it has to make a map in the make_map() function and in main(), which would be more costly (I think).

3.) For the code:
    // for each item read from the stream, increment the count
    std::string item ;
    while( stm >> item ) ++map[ make_key(item) ] ;
    // note: if the key is seen for the first time, zero-initialised count (zero) is incremented
    //       otherwise the existing count is incremented 

a. The while( stm >> item ) part means it goes through every line in the file and stores the info into item variable until EOF has been reached and it ends the loop, correct?

b. For the code: ++map[make_key(item)]
I read this: http://www.cplusplus.com/reference/map/map/operator[]/
And based on that link, I believe the make_key(item) part means it calls the function make_key(), which strips white space and makes everything uppercase. Then it assigns the key within the map. And the ++ part means it adds the key to the next key position in the map. Is this correct?
- Although now that I think about it, why would you need to strip white space? There wouldn't be any white space to strip because the code while( stm >> item ) acts like std::cin (I think) so it only takes the characters until the next white space and skips the white space. In other words, there is no white space to remove, right? Edit: Just read your comment:
JLBorges wrote:
It is indeed redundant for this specific example (where the spaces are already skipped by formatted input).


4.) In main() it says: const auto map = make_map(file) ;
Does this mean a map is created in main() and in the make_map() function? So as mentioned before, wouldn't it make more sense to make the map in main() and pass by reference in make_map()? Sorry, this question is a bit repetitive.

5.) for( const auto& [key,cnt] : map ) std::cout << std::quoted(key) << " - " << cnt << '\n' ;
This goes through every key and value in the map and prints "key" - cnt \n. How does it know what cnt is though? I don't see it declared. Also, what does cnt stand for?

#. The code you provided only works with C++17, not with C++11 or C++14.
Something in this line (I think) is the issue: for( const auto& [key,cnt] : map ) std::cout << std::quoted(key) << " - " << cnt << '\n' ;
Do you know which part it is? If you can point me to it, I could try finding an alternative (as a challenge for myself). :)

My classes so far have used those versions, and while this is a project for my free time, I should probably stick to the same version we use so my future programs won't give errors from having C++17 elements while my professor tries to compile them with C++11 or 14.

If you managed to read all of that and respond, thank you so much for your patience with me! And if you didn't, it's okay, I understand.

- - -
lastchance, I'll read & respond to your comment later. I'm going to go eat dinner. :)


> Would I only use unordered maps if I had thousands, or millions, of keys/values to store
> and didn't need for them to be in a specific order?

In general, for large data sets, unordered map gives better average insertion and look up performance.

However, with unordered maps, the memory usage tends to be higher; its worst case performance may be an issue (that is: even though average performance is much better, a specific insertion or look up may be worse than the guaranteed logarithmic performance for a map).

As a general rule, favour unordered map for large data sets.


> In for( unsigned char c : str ) if( !std::isspace(c) ) key += std::toupper(c) ;
> does unsigned char just mean it takes up less space than a normal char?

No. Both char and unsigned char have the same size.
The reason for using unsigned char: https://en.cppreference.com/w/cpp/string/byte/toupper#Notes


> Wouldn't it make more sense to have a void function that has a function parameter of the map passed by reference.

There are two reasons why passing a map by reference would not be more efficient:

a. Named Return Value Optimisation (NRVO) https://en.cppreference.com/w/cpp/language/copy_elision

b. Even if, for some reason, the optimisation (NRVO) can't be applied, the map object would be moved (cheap) rather than copied (expensive).


> For the code: ++map[make_key(item)]

make_key(item) makes the key (strips white space, converts to upper case).
Once the key is made, there are two possibilities:

a. This key does not exist in the map. In this case, a new key-value pair is inserted into the map, with the mapped value initialised to zero. The ++ part increments the mapped value (from zero to one); in the end we have the key in the map with an associated value of one.

b. This key already exists in the map. In this case, we get a reference to the value associated with the key; the ++ part increments that value (if the count was 6 earlier, it now becomes 7).


> for( const auto& [key,cnt] : map )

Structured binding (C++17): http://www.nuonsoft.com/blog/2017/07/26/c17-structured-bindings/

Thank you for your reply! Again, I'm sorry for my late reply.

That was very informative.

Especially with the structured binding! That's pretty neat! That's a pretty useful feature. But if I start using it in my homework I guess my professor would get errors and it'd lead to a zero for my grade. So hm I probably won't use it while I'm trying to get a degree. Still, was a cool thing to learn about! And I know how to change it now to support lower versions. Thank you!



Thank you for your reply!

for ( string s; in >> s; ) freq[toupper(s)]++;
Pretty neat! I didn't realize (or maybe I forgot, idk) that a for loop could be used in such a way!

One quick question for:
for ( auto p : freq ) cout << p.first << ": " << p.second << '\n';
Why did you make it auto? Why not string? Doesn't auto make it automatically detect which data type it should be and sets it as the proper one? I would think (this is just a guess) that setting as auto would have a slight performance difference over just setting the type. Otherwise, what's the point of setting the data types at all other than readability? Or is that the only reason?

In: https://en.cppreference.com/w/cpp/language/auto
I read: in the type specifier of a variable: auto x = expr;. The type is deduced from the initializer.

Looking forward to your reply (even if I may be a bit slow to reply), thanks! :)
I used 'auto' there for convenience, maybe laziness. The full type in this instance would be pair<const string,int>, since that is the type of each element in 'freq' (the key of a map element is const).