Hash maps.

Hello all. I'm quite new to C++'s implementation of hash maps. I've messed around with them in Java, which seems really simple compared to this, but I want to try to figure them out in C++ as well.

So I found some code on a website that got me started, but there are some things I don't understand. Here's the code:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
#include <hash_map>
#include <string>
#include <iostream>
#define _DEFINE_DEPRECATED_HASH_CLASSES 0

using namespace std;
using namespace stdext;

int main()
{
	hash_map<string, int> yeah;
	string hello = "hello";

	yeah["hello"] = yeah.comp("hello");
	yeah["Mellow Gold"] = 233;
	if (yeah["mellow Gold"])
		cout << "YEY!" << endl;
	cout << yeah.comp("hello");
	cout << yeah["hello"];

	yeah["okay"] = 23;
	cout << endl << yeah["okay"] << endl;

	return 0;
}



There are some things I don't understand:

- Why is there the "#define _DEFINE...?" Is it defining something that is used in the hash_map lib or what?

- What's stdext? And I guess it's used so I don't have to do stdext:: everytime I want to use a member of that namespace?

- I'm not really sure how to call the hash function/create my own. I want to hash on the string values, so if I do "yeah.comp("hello")," that returns the hash value and stores it for yeah["hello"], right? Is that the right way of doing it?

What I really want to do is build a dictionary of words, so I will have strings (that will be my words), and I want to count how many times these words occur in my text file. I'm choosing a hash_map because of how fast the look up is. I'm just stuck on how I implement this.

My current thought process/problem is... I want to basically have a struct that just has my word and its count, but when I go to put this in the hash map, I'm not really sure how I do this, since I don't want the hash function to hash on the whole struct, I just want the string's value to hash. So let's say I have a struct that looks like this:

1
2
3
4
struct word {
      string myWord;
      int count = 0;
};


Then do I declare the hash_map like:

 
hash_map<word, int, hash<word::myWord>, hash_compare<word::myWord, less<word::myWord>>> myHash;


I know hash_map's 3rd argument is the hash function and the 4th argument is to check whether two keys are equal, but as you can see, I'm not really sure how to use these...

I'm sure this is really confusing, so please ask me questions so I can clarify.
Why is there the "#define _DEFINE...?"
hash_map never quite made it into STL, but by then a number of vendors had implementated it. You're obvisouly using a Microsoft compiler. The macro is a Microsoft thing, it switches off useless warnings when you use the hash_map.

What's stdext?
As hash_map never made it into the standard, it doesn't live in namespace std, instead it lives in stdext on your compiler.

I'm not really sure how to call the hash function/create my own.
You use a hash_map in the same way you'd use std::map.

My current thought process/problem is...
If your dictionary maps words to frequency, you don't need a seperate word record as the dictionary is the data structure.

The dictionary declaration could be as you've used in your sample:
std::map<std::string, unsigned> or
stdext::hash_map<std::string, unsigned>
Cool, I understand everything up to calling the hash function. I've never used map, so I don't know how this works.

If I wanted to hash a word, say, "hello," how do I hash it? If I do:

1
2
3
4
5
6
hash["hello"];
cout << yeah["hello"];    // This just gives me zero as the value.
hash["hello"] = hash.comp("hello"); 
cout << yeah["hello"];    /* This appears to give me the hash value returned from hash.comp, but is this the right way to hash?
                          /  And what does yeah["hello"] return?
                          /  Just the value at that index, not the actual hash value? */


I don't think I understand how this hash_map works then... let's say my dictionary contains these words:

hello
good
jack
johnson

And let's say these are the only 4 words we've added so far (with no repeats of these words)... now let's say the next word is "hello". When I go to hash this, I want to realize that it's already in the hash table, so with the current "hello" in the dictionary, I want to increase the count for it, but if my dictionary is a hash table, how can I increase a counter for it if the hash table is just that, a hash table?

I guess I might be visioning it wrong. I see a hash table as basically an array that brings you to the index that you hash on, so if "hello" hashes to 34848, then that would be my index. So at this index, how can I specify whether it holds an int or a struct or whatever I want? I just messed around with this:

1
2
3
4
5
6
hash_map<string, int> hash;

hash["hello"] = 1;
hash["hello"]++;
hash["hello"]++;
cout << hash["hello"] << endl;


When I output hash["hello"], I get 3, which is what I want for increasing the word count, but let's say I want that index for hash["hello"] to hold a struct, is this possible, and when I do just hash["hello"], is that storing hello in my hash table? If this is confusing, please let me know.
Last edited on
I think I just figured it out... I thought:

hash_map<string, int> hash;

declared a hash map where the string was hashed as an int, but the int portion is the actual data for that hash_map, so if I replace int with my own struct, then when I hash to that string, I can insert my struct data. Is this correct?
You guys on here are a big help to a lot of us.
Yes, I think. The map is the data structure, you don't need another one. It has all your words that their frequencies. The code you've posted correctly updates the counts.
Alright cool, so I've made some advancement on this little project. The next thing I want to tackle is doing a sort by count for the hash map, so what I did was converted the hash map into a vector (just stored the word structs from the hash map into the vector) and run a mergeSort algorithm on the vector of words, but this is quite inefficient because I take the hash map and store it in a vector (which will definitely kill CPU time, especially with large dictionaries).

So I was wondering if there's a more efficient way to go about this? Ideally, I'd like to just use the hash map and sort it, but that's not possible because of the way it's declared. Any thoughts?
Topic archived. No new replies allowed.