Hashing

Hello guys. I've started practicing hashing and I'm curious whether anyone has worked with these kinds of functions and how effective they are. These are functions I learned at university.
This one is for double and float:

#include <cmath>   // frexp, fabs

unsigned int HashCode(double d)
{
    if (d == 0)
        return 0U;
    int exponent;
    double mantissa = frexp(d, &exponent);
    // mantissa is in [0.5, 1), so 2*|mantissa| - 1 is in [0, 1);
    // scale that fraction to the full unsigned range. The cast must come
    // after the multiplication, otherwise the result is always 0.
    return (unsigned int)((2 * fabs(mantissa) - 1) * ~0U);
}

For strings:
unsigned int HashCode(char * s)
{
    unsigned int res = 0;
    unsigned int a = 7;            // shift amount per character
    for (int i = 0; s[i] != 0; i++)
        res = res << a ^ s[i];     // i.e. (res << a) ^ s[i]
    return res;
}

EDIT: my bad, a=7
I'm not sure about the first one. For the second, since a==0 throughout the function (before your edit), res<<a is the same as res, so the hash is just the XOR of all the characters.

Some general thoughts on hashing:
- a hash table is almost always faster than a search tree such as std::set<>.
- Often you have an idea of how many items will go into the table, so you can set the number of buckets accordingly.
- For no particular reason, I usually aim for bucketCount == itemCount/4, i.e. an average of 4 items per bucket.

Hash tables can be WORSE than a tree when the comparison/hash key is large. For example, if it's a 1 KB string then you have to scan the entire string to compute the hash value, but you might only have to compare a few bytes to tell which string is larger. In that case the short comparisons in the tree outweigh the increased number of items that you must look at.

Another caveat: unordered_map can use a lot of RAM.
The string one looks to be too expensive.
Personally I am a huge fan of *extract an integer from the data with a simple computation* and then use that integer to re-seed a <random> generator that gives an index in the range of your table. When I say simple computation, for some strings that is as cheap as *(int*)(&stringvar[someindexwheredataisnotallthesame]). Not all strings work that easily ... know your data, etc.

This works because the first value from the generator is always the same with the same seed. The second value is always the same with the same seed too, so you can use the random function to generate your second, third, etc. keys for collisions if you use that re-hash approach, and it's cheap to do. Rehashing has its own problems, but most of the time, in practice, when I use a hash table it's for a list of known values, not 'user data', and I will go for a perfect hash by fiddling with the table size and random parameters to get it.

**I am assuming by hash you mean a hash table, not an 'encryption' type unique-ID hash.