Comparing

So
Last edited on
so you run the program twice with the same data and swap out the functions. Count the # of collisions.

Table size you can get by trial and error, until the collisions go away. A binary search is recommended; in 10 or so runs you should be able to nail it down.

the magic numbers are the constants in the code. Why are they what they are, and how do they impact what happens? I don't know the answer to this, but this will be the hardest thing to figure out if you did not write the functions. You need to understand what the input looks like and how the bytes of the input are shuffled to generate the keys. Things like does "aaab" give a similar key to "aaac" or "aaaa" are critical. If those gave 1,2,3 for the indices into your storage table, for example, this is a BAD hash function. Similar input must give very different output for a good function (see sha-1 for an example of this). Table size affects the modulo ... if your table size is 2, for example, every other key will give 0 or 1... this collides and does not work of course. But what you are looking for there is space filled... run the program, fill the table, then look at the table to see how many locations are empty and what the average gap between actual data elements is. If elements are generally 10+ empties between entries, for example, you are wasting a lot of space, but have room to hold more data. If the average is 2, you are probably having collisions already. Its zenny though, is a gap of 5 good? Depends on the application and all somewhat.

If the data is constant or slowly changing, you can usually create a perfect hash for it (no collisions at all). If it is very random, that is more difficult. If it changes at all, you can't be sure you have a perfect hash and must handle the collisions somehow. If you can be sure, the problem is much easier. For that reason I usually only hash things that do not change, like a list of constant values, usually strings. Collisions ruin the whole point of the data structure, to me... I want 1 lookup access to data. If I can't get it, I am not happy.




Last edited on
how do you run the program twice with the same data and swap out the functions?
Did i set up my main function right??? Cause I have errors as of now.


Last edited on
compile it, run it, get your statistics and data.
replace hash1 with hash2 and repeat.

Then compare the data and write your report.

I would think you would call each function with quite a bit of data, not just one string, and again check several nearly identical inputs against the outputs as I said. You should call the functions with at least tablesize/4 number of records.
Do i need the main function at all?
here are the errors when i run it:
lin_1103.cpp: In function ‘int main()’:
lin_1103.cpp:35:19: error: expression list treated as compound expression in initializer [-fpermissive]
uint hash1("",1);
^
lin_1103.cpp:35:19: warning: left operand of comma operator has no effect [-Wunused-value]
lin_1103.cpp:36:19: error: expression list treated as compound expression in initializer [-fpermissive]
uint hash2("",1);
^
lin_1103.cpp:36:19: warning: left operand of comma operator has no effect [-Wunused-value]
lin_1103.cpp:37:33: error: ‘hash1’ cannot be used as a function
cout << hash1(" parameter", 1)<< endl;
^
lin_1103.cpp:38:26: error: ‘hash2’ cannot be used as a function
cout << hash2("met ",1);
Last edited on
yes. main() is the entry point of the program; it tells the compiler where to put the "start here" stuff for when the program begins to execute. It is required unless building something that is not an executable program (a library/dll for example).

you have some syntax garble.

try this:

int main()
{
uint h1 = hash1("",1);
uint h2 = hash2("",1);
cout << h1 << endl << h2 << endl;
//cout << hash1("parameter", 1)<< endl;
//cout << hash2("met",1)
}

and if that works uncomment the next 2.


Last edited on
what do those magic numbers represent in those two hash functions?
I honestly do not know. I can see that they are xored and multiplied to the data but why those specific values? I am not sure.

I am lazy (I have mentioned this before) and about 90% of my hash functions look like
-> turn data into integer, by jacking bytes into it.
-> srand(integer value above)
-> rand()%tablesize

once in a while this does not work well and I had to get creative. In one of them I used sha-1.

But that does not help you. Again, I simply do not know what (if anything) makes these values special or work any better than any other value. Its probably some known technique, but not one I am familiar with. It almost looks like a sideways linear congruential approach, but not quite. ?

Last edited on
Please DON'T delete your posts once you're done with the question. It makes this useless as a resource for others to learn from. It's a selfish abuse of this forum to take something from it, and not give back.
Topic archived. No new replies allowed.