Hash tables

Hi i am having to create a hash table for a project, but i have never actually created one before, and to be honest my programming skills are not very good, so i am struggling a bit. I have managed to patch something together through different sources, but i think my add/insert function is not working correctly and i can not figure out why, i was hoping someone might be able to help me out. The hash function is working because it is printing out the values, but when i am trying to print out what should be in the buckets nothing is printing, so i am assuming it is a problem with the add/insert function like i said.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
  .h
#pragma once
#include <list>
#include <string>

using namespace std;

class hashNode
{
	public:
		string name;
		string type;
		hashNode *next;
	
		hashNode(string name, string type)
		{
			this->name;
			this->type;
			next = NULL;
		}

};

class hashTable
{
	private:
		hashNode **table;

	public:
		const int TABLE_SIZE = 10;
		hashTable();
		~hashTable();

		int hash(string name);
		void add(string name, string type);
		void printTable();


};

.cpp

#include <iostream>
#include "Hash.h"


//Create the hash table, populate with hashnodes all set to null
hashTable::hashTable()
{
	table = new hashNode * [TABLE_SIZE];
	for (int i = 0; i < TABLE_SIZE; i++)
	{
		table[i] = NULL;
	}
}

hashTable::~hashTable()
{
	for (int i = 0; i < TABLE_SIZE; i++)

		if (table[i] != NULL)

			delete table[i];

	delete[] table;
}



int hashTable::hash(string name)
{
	int hash = 0;
	int bucket;

	for (int i = 0; i < name.length(); i++)
	{
		hash = hash + (int)name[i];
		cout << "hash =	" << hash << endl;
	}

	bucket = hash % TABLE_SIZE;

	return bucket;
}

void hashTable::add(string name, string type)
{
	hashNode *temp = new hashNode(name, type);
	int location = hash(name);
	if (table[location] == NULL)
	{
		table[location] = temp;
	}
	else
	{
		temp->next = table[location];
		table[location] = temp;
	}

}

void hashTable::printTable()
{	
	for (int i = 0; i < TABLE_SIZE; i++)
	{
		if (table[i] != NULL)
		{
			cout << "name = " << table[i]->name
				<< "type = " << table[i]->type << endl;
		}
	}

}

main.cpp

#include <iostream>
#include "hash.h"


using namespace std;

int main()
{
	hashTable hash;
	hash.printTable();

	hash.add("bob", "pug");


	hash.printTable();

	system("pause");

}
Uh. Instead of making a list, your making a bunch of grapes all clustered around a single starting point in add, because instead of attaching the new node to the last node’s next pointer, your attaching the first node to the new node’s next pointer. Is there a reason why your not actually using the normal standard list class? If there’s a reason then ok, but if not you should think about switching to that and just rewriting that table class maybe.
What do you think lines 17 and 18 above are doing?
Uh. Instead of making a list, your making a bunch of grapes all clustered around a single starting point in add, because instead of attaching the new node to the last node’s next pointer, your attaching the first node to the new node’s next pointer. Is there a reason why your not actually using the normal standard list class? If there’s a reason then ok, but if not you should think about switching to that and just rewriting that table class maybe.


Uh I guess ive just never used the list class before, so ive just done it this way. Like I said I really am not good at programming nor anything to do with data structures, ive just unfortunately been tasked with this project which is way out of my skill range.

What do you think lines 17 and 18 above are doing?


I originally had the name and type as private, so had that in for them.
That’s ok, we can show you. We actually have some of the documentation right here on cplusplus.com! It’s right here: http://www.cplusplus.com/reference/list/list/
It’s probably a big help I’d think. You get all the needed functionality from that I’m guessing.
Also, Your hash function seems a little dangerous, it could very easily create exactly similar keys. (For example "abc", "bca", and "cba" would all generate the same key)

I originally had the name and type as private, so had that in for them.

I know this was directed at dutch, but I don’t think either of us understand what you mean for that to do still.
ok so i looked over the add/insert function again, and i believe i understand why the it wasnt working for the else section in the if else statement, but in the for the first part, am i not just creating a new node, checking if the table location is empty, and if it is, setting the empty location to the data of the node that was created? And as for the hash function, i did know it wasnt a great one, im not sure how to make it much better, but i will work that out at a later date, just wanting to get the functionality up and running first, then i will get it working better :) thanks for the heads up though


Edit: Dont i need lines 17 and 18 for my constructor?
Last edited on
When the list is empty you function works fine for making that first node, yes. I was only discussing after the first node had been made. That’s the only part where things got fishy to me.

> Edit...
No. Those lines are doing absolutely nothing. Actually I’m pretty sure the program shouldn’t even compile with those lines.
Actually I’m pretty sure the program shouldn’t even compile with those lines.

Sadly, it will compile, but it should definitely give warnings.

@kmce, Those lines do absolutely nothing. They should be:

1
2
		this->name = name;
		this->type = type;

Last edited on
or, strongly consider not making your code need the extra this-> clutter by having distinct names.

1
2
3
4
5
6
7
hashNode(string in_name, string in_type)
		{
			name = in_name;
			type = in_type;
			next = NULL;  //c++ has nullptr... its all 0 to me but FYI. 
		}
Last edited on
When the list is empty you function works fine for making that first node, yes. I was only discussing after the first node had been made. That’s the only part where things got fishy to me.


Sorry, so should it actually be working when i just enter one new entry then? since i am just creating the first node?

Those lines do absolutely nothing. They should be:
this->name = name;
this->type = type;


of course it should be. Stupid mistake on my part. thanks for that
Is this school or real code?
the map containers are like hash tables (but, I find them a little sluggish, and do not use them for large things). Are you building something you don't need to, or are you doing it to learn about the ideas etc?

I used a hash function to look for something across 2 (huge) files a few weeks ago. The key parts of it are here:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
int main()
{	
    const int hsize = 10000;
    vector<vector <string> > vs(hsize);
	ifstream ifs(" ... ");  //any text file
	string s;
	uint64_t dx;
    while(ifs >> s)
	{
	  dx = jhash(&s[0],s.length(),hsize); //jhash turns raw byte data into a number from 0-hsize here
	  vs[dx].push_back(s);                     
	}
     //use the table for something. 	
}


writing a hash function (jhash here) that distributes the data you have evenly can take some time and thought. Or you can trust your luck to sha or similar encodings.

to use the table, you would call jhash on your key again, go to that vs[index] and search that vector linearly with find(). As long as the distribution across vs is good and the table size is appropriate, its only going to search a few items and will be done very fast.
Last edited on
Adding to the front of the list is fine. Honestly, I don't understand why people think you should always add to the back of a linked list. It causes people to write horrific code to traverse the list if they implement it themselves.

But you can simplify add(). You only need the code in the "else" part because it works whether the list is empty or not.
1
2
3
4
5
6
7
void hashTable::add(string name, string type)
{
        hashNode *temp = new hashNode(name, type);
        int location = hash(name);
        temp->next = table[location];
        table[location] = temp;
}


The problem with printTable is that you're only printing the first item in the list.
In this version, I'm printing the bucket number just to make sure things are hashing right.
1
2
3
4
5
6
7
8
9
10
void hashTable::printTable()
{
        for (int i = 0; i < TABLE_SIZE; i++)
        {
            for (hashNode *p = table[i]; p; p = p->next) {
                cout << "bucket " << i << " name = " << p->name
                     << ". type = " << p->type << endl;
            }
         }
}


For testing, I modifed the main program to read names and types from cin:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
int main()
{
        hashTable hash;
        cout << "Before:\n";
        hash.printTable();

        string name, type;
        while (cin >> name >> type) {
            hash.add(name, type);
        }

        cout << "After:\n";
        hash.printTable();

        system("pause");

}


With this input:
HP-21   woodstock
HP-25   woodstock
HP-12C  voyager
HP-10C  voyager
HP-35   classic
HP-67   classic
HP-34C  Spice
HP-33E  Spice

I get this output:
Before:
After:
bucket 0 name = HP-25. type = woodstock
bucket 1 name = HP-35. type = classic
bucket 1 name = HP-10C. type = voyager
bucket 3 name = HP-12C. type = voyager
bucket 6 name = HP-67. type = classic
bucket 6 name = HP-21. type = woodstock
bucket 7 name = HP-34C. type = Spice
bucket 8 name = HP-33E. type = Spice
sh: pause: command not found

Is this school or real code?
the map containers are like hash tables (but, I find them a little sluggish, and do not use them for large things). Are you building something you don't need to, or are you doing it to learn about the ideas etc?


It is for a school project, so I am building something I do need to, but also to learn about the ideas of it since i have never touched on the concept of hash tables before.


Adding to the front of the list is fine. Honestly, I don't understand why people think you should always add to the back of a linked list. It causes people to write horrific code to traverse the list if they implement it themselves.


Does adding to the front make it slower? Ive not really done a lot with complexity? just wondering however.

And hopefully without sounding like a complete idiot, could you possibly explain a part of your printTable please? The second p in the for loop, I am wondering what that is for?

1
2
3
4
5
6
7
8
9
10
void hashTable::printTable()
{
        for (int i = 0; i < TABLE_SIZE; i++)
        {
            for (hashNode *p = table[i]; p; p = p->next) {
                cout << "bucket " << i << " name = " << p->name
                     << ". type = " << p->type << endl;
            }
         }
}


Does adding to the front make it slower?
Just the opposite. Adding to the front is faster. In fact, with a singly linked list, unless you keep a tail pointer, adding to the back of the list is extremely slow, O(n) because you must traverse the list to find its end.

The second p in the for loop, I am wondering what that is for?

You have 10 buckets in your hash table. Each bucket points to a linked list of hashNodes. The "for p" loop loops through the list for table[i] and prints the entries.

This loop takes advantage of C++'s generalization of for loops. You can put almost anything you want inside one.
1
2
3
for (initial; condition; step) {
   statements;
}

is (nearly) identical to:
1
2
3
4
5
initial;
while (condition) {
    statements;
    step;
}

So the for loop that I wrote is the same as this, which you might be able to understand better:
1
2
3
4
5
6
hashNode *p = table[i];
while(p) {
    cout << "bucket " << i << " name = " << p->name
            << ". type = " << p->type << endl;
    p = p->next;
}

Just the opposite. Adding to the front is faster. In fact, with a singly linked list, unless you keep a tail pointer, adding to the back of the list is extremely slow, O(n) because you must traverse the list to find its end.


Interesting, I was thinking that because you were adding in the front, it would take time because a new node would need to be created at the end, then everything would need to move down, or is that just now how it happens lol?

Thank you for explaining, i think just seeing the p on its own is what confused me, now that you explained it i see that it is just a standard for loop.

Now for the next part, trying to read in from a file, this should be interesting, never done this either. I hope it is straight forward
Nothing moves in a linked list. You have for example 3 chunks of memory, and the only way they are related is the 'next' pointer inside each of those chunks. Get a new (4th) one, they stay where they were, it just points to what is already there. Arrays are prone to having to shift data to insert into them. There are pros and cons to both... you can go to any array location at any time, but lists you have to move from start to end every time; you can't for example binary search a traditional linked list.


Last edited on
consider having these classes
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
element{ //use a meaningful name based on the context
   string name, type;
};
list_node{
   element data;
   list_node *next;
};
list{
   list_node *head;
   insert_at_front(element); //basically, a stack
   print();
   ~list();
};
hash_table{
   list table[TABLE_SIZE]; //¿is a static array good enough? ¿or do you want it to grow?
   //the functions that you've defined
};
that way your implementation is clearer and can be more easily tested
1
2
3
4
5
6
7
8
9
void hashTable::add(element value){
   int location = hash(value);
   table[location].insert(value);
}
void hashTable::print(){
   for(int K=0; K<TABLE_SIZE; ++K)
      if(not table[K].is_empy())
         table[K].print();
}
Nothing moves in a linked list. You have for example 3 chunks of memory, and the only way they are related is the 'next' pointer inside each of those chunks. Get a new (4th) one, they stay where they were, it just points to what is already there. Arrays are prone to having to shift data to insert into them. There are pros and cons to both... you can go to any array location at any time, but lists you have to move from start to end every time; you can't for example binary search a traditional linked list.


of course, i am forgetting they are not stored the same way as arrays. Basically when the new element is added, the pointer to the original first element is "broken" and then the new element is added and then that pointer then points to the element that was the original first, correct?

Thanks ne555 I will have a look at those changes
of course, i am forgetting they are not stored the same way as arrays. Basically when the new element is added, the pointer to the original first element is "broken" and then the new element is added and then that pointer then points to the element that was the original first, correct?

it depends.
a picture might help -- the efficient insert at head is just this:

head= 3->4->5->null

tmp = new node;
tmp->data = 2;
tmp->next = head;
head = tmp;
after that, which is a basic insert in front, you have this:
head = 2->3->4->5->null
the 3,4,5,null sub-list never changed: look above, and none of those nodes were touched in any way. All we did was make the new one's next point to the head of the old list, and bam, it is connected. Move head backwards, and its all tidy.

to stick the 2 after the 5, though, you have to iterate the list until you get to 5 because 5-> next is null, then say 5->next is tmp and tmp->next is null. Iterating the list every time is costly as the list gets bigger.

and we still didn't do what you said :)
what you described is inserting in the middle. If you want to put 3.1415 between 3 and 4, you have to find the 3, hold onto the 4 in a temp, append 3.14 to either the end of the 3 list or the beginning of the 4 list, and reconnect the 3 next to 3.1415 and so on.
if you keep a tail pointer adding to the end is only a tiny bit more trouble than adding to the front, but you do have to maintain that tail pointer so there is a little more work doing that. Inserting in the middle is always the worst, you have to find the location (average N/2 linear search), break the lists, and put it all back together. If you don't have tail, you have an inefficient loop to add to the end. But adding to or removing from the front, no matter what, is always exceedingly simple. That means that lists make great stacks or if you don't care about order, just use the top and that works great too.
Last edited on
Topic archived. No new replies allowed.