Implementation of Hashing

Need help in writing the code. I'm posting the Lab description and code I've done so far.

Purpose: Use Hashing Techniques
Create a ADT to handle the data in the Customer.csv data file: (last,first,id)
perez,diana,86824983-3587182
oxford,greg,49451687-6884854
smith,tsung,34722447-9802850
Place each ADT data object into a Hashing structure using a custom hashing function. Demonstrate you can hash the name data as key and id as value, and visa versa. One of the Keys or Values to the hash structure is required to be an ADT type. This will require overloading appropriate operators (in C++) to support the < operator.

Write a hashing function that produces (ideally) a maximum of 5% collisions. Also, ideally, your space usage should be around 75% of the container.



Here's the lab I've done so far but its not running properly.
#include <iostream>
#include <fstream>
#include <string>

using namespace std;

class HashMapADT {
private:
unsigned int hash(const string& key);
int getIndex(const string& key, bool override_duplicated_key);
const static unsigned int hash_size = 514;
string keys[hash_size];
string values[hash_size];
public:
HashMapADT(); //default constructor
void insertData(const string& key, const string& value);
void printData();
};
/**
* Initialize the param of key and value
*/
HashMapADT::HashMapADT() {
for (int i = 0; i < hash_size; i++) {
keys[i] = string();
values[i] = string();
}
}

unsigned int HashMapADT::hash(const string& k)
{
unsigned int value = 0 ;
for (int i = 0; i < k.length(); i++)
value = 37*value + k[i];
return value;
}
/**
* print hash table
*/
void HashMapADT::printData() {
int checkEmpty = 1;
for (int i = 0; i < hash_size; i++){
if (!keys[i].empty()){
cout << keys[i] << ":" << values[i] << endl;
checkEmpty = 0;
}
}
if (checkEmpty)
cout << "Hash table is empty" << endl;
}
/**
* Get index before inserting an element into hash table because of duplicate entry.
* If the key is same or already there in hash table then it will not insert it again
*/
int HashMapADT::getIndex(const string& key, bool override_duplicate_key = true) {
unsigned int h = hash(key) % hash_size, offset = 0, index;

while (offset < hash_size) {
index = (h + offset) % hash_size;
if (keys[index].empty() || (override_duplicate_key && keys[index] == key))
return index;

offset++;
}
return -1;
}
/**
* Insert string of key and value to hash table
*/
void HashMapADT::insertData(const string& key, const string& value) {
int index = getIndex(key);
if (index == -1) {
cout << "Table is full!" << endl;
return;
}

keys[index] = key;
values[index] = value;
}


int main() {
HashMapADT hmap;
cout << "Before inserting an element. The hash table is:" << endl;
cout << "==========================================="<<endl;
hmap.printData();
cout << endl;
cout << "After insertion of an element. The hash table is:"<<endl;
cout << "=================================="<<endl;
string key, value;
string fname[30], lname[30], custid[30];
ifstream myfile("Customer.csv");
int fid = 0, lid = 0, cid = 0;
if(!myfile)
{
cout<<"Error opening output file"<<endl;
return 0;
}
int count = 0;
while(!myfile.eof())
{
getline(myfile,fname[fid], ',');
getline(myfile,lname[lid],',');
getline(myfile,custid[cid],'\n');
fid++; lid++; cid++; count++;
}
int inc;
if (count>0){
for(inc=0; inc<count; inc++)
{
key = fname[inc]+"_"+lname[inc];
value = custid[inc];
hmap.insertData(key, value);
}
}


hmap.printData();
cout << endl;

return 0;
}
What does 'not running properly' mean?

You will read beyond the the end of the stream:
1
2
3
4
5
6
7
while(!myfile.eof()) // This is a problem
{
getline(myfile,fname[fid], ','); // Imagine the eof occurs here
getline(myfile,lname[lid],','); // Despite the eof it is read
getline(myfile,custid[cid],'\n'); // Despite the eof it is read
fid++; lid++; cid++; count++; // Worse: The variables are also increase
}
Another poblem is that you do not check whether the indexes remain within the limits. If you have more than 30 lines in the file your program will most likely crash.

Why don't you use count only? fid, lid, cid are not used anywhere else and don't have any other value than count.
Topic archived. No new replies allowed.