hash table function

Pages: 12

As far as I know - we were asked to check the capacity of the smaller data structure and if it exceeded a certain value then resize.
I don't think multiple resizes were asked - I'd better check.
As far as I could understand it was an exercise in making my own hash table and thinking about capacity and collisions - then retrieving the data. The resizing part was to get me thinking about how to do it without losing any data, that why an array was chosen to get us thinking about the table, data and resizing.

That was why I used the pointers, originally when I first started with this I just had an array of strings, which in turn got me thinking about resizing laters.

With all the help I have received so far my code just about does what is asked.

Thanks for everyone that has taken the time to help me - its is fair to say my code has certainly evolved form my original efforts - very much appreciated!!!

dhayden (5795)

Ask yourself this: will your hash table be efficient if you insert 10 items in it?

How about 1000?
How about 100,000,000?

I think you'll that you're supposed to start with a small number of buckets. Once you reach, say 70% capacity, you increase the size, probably by some multiple like 50% or 100%.

But if you keep inserting items, it will reach 70% of the new capacity. Now you're supposed to resize again.

And so on, as much as needed.

jonnin (11333)

if you want to polish it for real use cases, resize is expensive, and rehash/collisions are expensive. typically you want several free spaces to every one used, not something as tight as 70% full more like 1/3 or even 1/4 full. And memory is cheap so a large starting size default is not a bad plan either, to delay resize as long as possible.

I highly suggest you allow your hash algorithm to be overridden so that when you have a large amount of well known data you can exploit your knowledge of the data better to reduce collisions to nearly zero.

I tend to try a hash table as my first approach to most problems that involve data of any real volume. While the language has decent tools for it, having a pocket library you can use when speed is critical is worth a few days to code it up carefully.

Last edited on

dhayden (5795)

I highly suggest you allow your hash algorithm to be overridden so that when you have a large amount of well known data you can exploit your knowledge of the data better to reduce collisions to nearly zero.

I'll second that point.

In addition, I've found that in real world situations, you usually have some idea of how many items will go into the hash table, so you can start with an appropriate size. This means that you won't have to resize too many times.

I'll also say that any time I've written a hash table, I have NOT done the way you're doing it James. I make each bucket a list of items, i.e., vector<list<item> > buckets. If two items hash to the same bucket then there are two items in the list. James, It sounds as though your prof wants you to implement it the way you have it.

jamesfarrow (211)

Chaps thanks for the taking time to reply. An update - my code, although it 'works' can only resize once and thats it. Pointed out by dhayden earlier. I need to think of a way where the hash table can resize more than once.
My idea of using pointers was the right approach.
The exercise is designed to get me thinking about hash tables and implementation, pointers etc
Back to the drawing board, well not completely but... :)

Thanks again for taking the time to help - I appreciate it.
No doubt I will be back shortly will more questions 🤣

jamesfarrow (211)

rethink and now can rehash multiple times.

class hash
{
private:
    
    int size = 0;
    int max_size = 10;
    std::string* data;
    int prime_number{7};// prime number to multiply when using hash function
    
    
    
public:
    
    hash();//constructor
    //~hash();//destructor
    
    void add_string(std::string str);
    unsigned hash_function(std::string str);//to create hash for array storage
    bool contains(std::string);
    void rehash();
    
};

#endif /* hash_table_hpp */


#include "hash_table.hpp"
#include <iostream>


hash::hash() : data(new std::string[max_size]) {}


unsigned hash::hash_function(std::string str)
{
    unsigned value{};
    for(int i{}; i < str.size();i++)
    {
        value += str[i];
    }
    return value * prime_number;
}

void hash::add_string(std::string str)
{
    int index = hash_function(str) % max_size;
    while(data[index] != "")
    {
        if(data[index] == str) return;
        index = (index + 1) % max_size;
    }
    data[index] = str;
    size++;
    if(size > max_size / 2) rehash();
}

bool hash::contains(std::string str)
{
    unsigned index{hash_function(str) % max_size};
    if(data[index] == str) return true;
    //if above fails loop through data array unitl either found or end of loop
    while(data[index] != "")
    {
        if(data[index] == str) return true;
        index = (index + 1) % max_size;
    }
    return false;//if all above fails default return false
    
}

void hash::rehash()
{
    std::cout << "rehashing table" << std::endl;
    //variable to store size of new hash table
    int new_size = max_size * 2;
    //this is a pointer to new larger array of new size
    std::string* new_data = new std::string[new_size];
    //loop through smaller array and rehash data to new larger array
    for(int i{}; i < max_size;i++)
    {
        if(data[i] != "")
        {
            new_data[hash_function(data[i]) % new_size] = data[i];
        }
    }
    //delete data in original - this leaves the pointer
    //not pointing to anything
    delete [] data;
    //now point data to new data
    data = new_data;
    max_size = new_size;
}

dhayden (5795)

Your hash function isn't very good. ABC gives the same hash as CBA and ACB. But that's easy to fix.

rehash() doesn't account for collisions. It would be great if you can use add_string() for that, so maybe something like this

void hash::rehash()
{
    std::cout << "rehashing table" << std::endl;

    // Remember the current data and size
    int old_size = max_size;
    std::string *old_data = data;

    // Double the size and create a new, empty data array
    max_size *= 2;
    data = new std::string[max_size];

    // Insert the old values
    for(int i{}; i < old_size;i++)
    {
        if(old_data[i] != "")
        {
            add_string(old_data[i]);
        }
    }
    //delete old data
    delete [] old_data;
}

Last edited on

Topic archived. No new replies allowed.

Pages: 12