Double hashing question

Hello there,

I'm writing a double-hashing hash table and have a question about it.
These are my hash functions:

unsigned int DoubleHashTable::HashFunction1(unsigned int const data)
{
    return (data % GetTableSize());
}

unsigned int DoubleHashTable::HashFunction2(unsigned int const data, unsigned int count)
{
    return ((HashFunction1(data) + count * (5 - (data % 5)) % GetTableSize()));
}

and this is my insert function below:

void DoubleHashTable::SetData(unsigned int const data)
{
    unsigned int probe = HashFunction1(data);

    if (m_table[probe].GetStatus())
    {
        unsigned int count = 1;
        while (m_table[probe].GetStatus() && count <= GetTableSize())
        {
            probe = HashFunction2(data, count);
            count++;
        }
    }

    m_table[probe].Insert(data);
}

After inserting the maximum number of items into the table, for example putting 100 items into a table of size 100, I can see that some slots are left empty, which I think is the problem. I know I don't have an edge case for when my hash function cannot find a correct index for the given data. But I can't follow the concept of double hashing, since the 2nd hash function is a backup for the 1st. What happens if the 2nd hash function also cannot find a correct index in the table? Is that actually possible? Give me some advice on this, please. Thanks!
With the way you have designed your insert function, it is impossible for the second hash function not to find an index to insert the data into. That is not to say it will be the right one. In the worst case, all the positions the second hash function discovers are already occupied, and if this happens N times, where N is the size of the hash table, the insert function will simply overwrite an existing entry in the table.

I don't think this is what you want. The idea of a hash table is to always have space for something. One way to maintain this is to have a load factor, or density, which says how full the hash table is allowed to get. When the table reaches this density, you increase the size of the table and rehash everything; that gives you more space to work with, which should mean fewer collisions.
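
Just to sketch the idea (the 3/4 density and the names here are made up, not taken from your class):

#include <cstddef>

// Returns true when inserting one more item would push the table past a
// chosen density (3/4 here). Integer math avoids floating point.
bool ShouldGrow(std::size_t itemCount, std::size_t tableSize)
{
    return (itemCount + 1) * 4 > tableSize * 3;
}

// In SetData you would then do something like:
//   if (ShouldGrow(m_itemCount, GetTableSize()))
//       Rehash(/* some larger, preferably prime, size */);  // reinsert every stored key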

The second problem with this hash table implementation is precisely that second hash function. As you mentioned, after inserting 100 elements into the table there are still empty spots left, so you need to use a different hash function. Another solution would be to fall back to linear probing, where you simply do a linear search for the next available position.
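
For reference, a double-hashing probe is usually computed along these lines (just a sketch; R is a small prime like the 5 in your code, and the important parts are that the step is never zero and that the whole sum is reduced modulo the table size):

#include <cstddef>

// Sketch of a typical double-hashing probe. R is a prime smaller than the
// table size; attempt = 0, 1, 2, ...
std::size_t DoubleHashProbe(unsigned int key, std::size_t attempt, std::size_t tableSize)
{
    const std::size_t R = 5;
    std::size_t h1   = key % tableSize;        // first hash
    std::size_t step = R - (key % R);          // second hash, always in 1..R
    return (h1 + attempt * step) % tableSize;  // whole sum wrapped into the table
}

If the table size itself is a prime, the step and the table size never share a factor, so the probe sequence visits every slot once before repeating. With a size like 100, a step of 2, 4 or 5 only ever cycles through a fraction of the slots, which is one way to end up with untouched entries even though you inserted 100 items.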
Smac,

Thanks for your reply. I understand your advice; as you mentioned, I have a load factor to track the usage of the table, so I can extend the table size when it goes over 75%. This case is just an example to help me understand double hashing. What I want to know is the problem with my hash functions. In my head, hash1 and hash2 have O(N) as the worst case for a search, which means they should always find the right index unless there is no empty space left. But the result shows me that some indexes are left blank. So I'm guessing my hash functions are wrong and I can't follow the concept for designing the hash functions for double hashing. Can you give me some tips for that? Thanks again!
Your hash function is not wrong; in fact, there is no wrong hash function (hmm... I'm not sure). You just need to find one that gives a better distribution of the keys in the table. Also, when you are designing your hash function, you should have retrieval in mind as well; the purpose, after all, is to have fast insertion and retrieval. You might find this site helpful: http://eternallyconfuzzled.com/tuts/algorithms/jsw_tut_hashing.aspx

More links:
https://gist.github.com/badboy/6267743
http://stackoverflow.com/questions/664014/what-integer-hash-function-are-good-that-accepts-an-integer-hash-key
http://burtleburtle.net/bob/hash/integer.html

These are all talking about integer hashing, which is what you seem to need (there is a small sketch at the end of this post). If you need to hash strings, then the first link I gave you describes a few good ones.
This site also compares a few good string hashing functions and is worth a look:
http://programmers.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed

As you read these, you will meet terms like avalanche, collision, and so on; the eternallyconfuzzled link explains them.
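
To make it concrete, here is a tiny sketch in the spirit of those integer-hashing articles (the constant is Knuth's multiplicative constant; it is only an illustration, the functions in the links are better studied):

#include <cstdint>

// Multiplicative integer hash: the multiply spreads nearby keys far apart,
// and the xor-shift folds the high bits back into the low ones.
std::uint32_t IntHash(std::uint32_t key)
{
    key *= 2654435761u;
    key ^= key >> 16;
    return key;
}

// You would then index with IntHash(data) % GetTableSize().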
Smac,

Thanks for the reply. I will read through the links you provided.
Like you said, a hash function cannot really be wrong, since it only computes on the given number and the table size; that's what I thought too. The only real difference is how fast it is and how well it distributes keys into the table, in other words how efficient it is. But one thing I still can't understand: we don't actually know how big the given data will be. In other words, even if we can estimate the size of the table for the given data with a really huge prime number, the hashing still does the same thing to find the correct index in the table, like:

hash1(T) = T % N, where N is the size of the table
hash2(T) = (hash1(T) + i * (P - (T % P))) % N, where P is a prime number
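
For example, with toy numbers this is how I picture the probe sequence being generated (assuming the modulo wraps the whole sum):

#include <iostream>

int main()
{
    const unsigned int N = 7;   // table size (prime)
    const unsigned int P = 5;   // prime picked for hash2
    const unsigned int T = 10;  // key to insert

    unsigned int h1   = T % N;        // hash1(T) = 3
    unsigned int step = P - (T % P);  // 5 - 0 = 5, never zero

    // With N prime, these N probes should all land on different slots.
    for (unsigned int i = 0; i < N; ++i)
        std::cout << (h1 + i * step) % N << ' ';  // prints 3 1 6 4 2 0 5
    std::cout << '\n';
}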

The efficiency of the hash function is decided by P, which is a prime number picked by the programmer. If I'm right, then there is still something I've missed... anyway, I hope the articles and examples from the links you gave me will answer my question. Thanks a lot!