Hash table with speed required

Hi, everyone!
I'm a C++ beginner, and I'm working on a project about implementing a hash table to store patients' data for a hospital. As shown in the attached data, each row gives the index (key), gender, height, and weight of a patient.

The goal is to retrieve data from the table efficiently.
I have to start from the attached main.cpp file and implement HashTable.h and HashTable.cpp to make the whole project work.

One requirement is that the algorithm should be as fast as possible; the speed will be graded by linear interpolation.

I'm looking for a fast algorithm, and some example code that fits the attached main.cpp.

Thanks for your help!!

patients' data:
0050253839 female  155 43
0054203884 male    177 73
0076792815 female  178 71
0037780853 female  158 87
........ 


main.cpp:

#include <iostream>
#include <fstream>
#include <ctime>
#include <chrono>
#include <cstdlib>
#include <vector>
#include "HashTable.h"

using namespace std;

/*
class to evaluate time cost
*/
template <typename Timer = std::chrono::milliseconds>
struct measure
{
    template <typename F, typename... Args>
    static typename Timer::rep execution(F func, Args &&... args)
    {
        auto start = std::chrono::high_resolution_clock::now();
        func(std::forward<Args>(args)...);
        auto duration = std::chrono::duration_cast<Timer>(std::chrono::high_resolution_clock::now() - start);
        return duration.count();
    }
};

/*
function to be evaluated
*/
void EvaluateFunc(HashTable hashTable, vector<string> testCases)
{
    string gender;
    int height, weight;
    // drain every test case; note that pop_back() shrinks the vector,
    // so a "for (i = 0; i < testCases.size(); i++)" loop would stop halfway
    while (!testCases.empty())
    {
        gender = hashTable[testCases.back()].getGender();
        height = hashTable[testCases.back()].getHeight();
        weight = hashTable[testCases.back()].getWeight();
        testCases.pop_back();
    }
}

int main()
{
    /*
    HashTable is the class implemented by yourself, 5 member functions you need to implement as belows:
        1. addItem(key, gender, height, weight) : add data into your hash table
        2. operator[key]    : return item by selected key
        3. getGender()      : return gender by item
        4. getHeight()      : return height by item
        5. getWeight()      : return weight by item
    */
    HashTable hashTable;

    // for evaluation
    vector<string> testCases;

    // read data line by line
    std::ifstream infile("data.txt");
    string key, gender;
    int height, weight;

    while (infile >> key >> gender >> height >> weight)
    {
        hashTable.addItem(key, gender, height, weight);

        // collect evaluated test cases
        testCases.push_back(key);
    }

    /*
    basic test cases
    ANS:
    hashTable["0015239667"].getGender(): female
    hashTable["0015239667"].getHeight(): 160
    hashTable["0015239667"].getWeight(): 57
    */
    cout << "hashTable[\"0015239667\"].getGender(): " << hashTable["0015239667"].getGender() << endl;
    cout << "hashTable[\"0015239667\"].getHeight(): " << hashTable["0015239667"].getHeight() << endl;
    cout << "hashTable[\"0015239667\"].getWeight(): " << hashTable["0015239667"].getWeight() << endl;

    // Evaluate the speed of your hash table
    auto totalCost = measure<std::chrono::nanoseconds>::execution(EvaluateFunc, hashTable, testCases);
    cout << "Mean: " << totalCost / testCases.size() << " ns" << endl;

    return 0;
}
One requirement is that the algorithm should be as fast as possible


"As fast as possible" is a pretty bad requirement to state. But let's run with it. How much space can your hash table take up? Looks like your key is a string. What restrictions exist on that string? How many characters will be in it, and which characters are acceptable (basic ASCII, just letters, uppercase/lowercase, spaces allowed)? Is the key simply that 10-digit number at the start of the data? Are all numbers from 0 to 9999999999 possible?

To be fast, you need your hash function to be fast, and you need few hash collisions.

The fastest you'll get will be having that string directly indicate a memory location; basically, just an offset from the start of the table. This will require a huge, huge amount of memory. If the key is that 10-digit number, I reckon you could have one pointer per possible key, in about 80 GB of table; this would guarantee you zero collisions and be a super-fast hashing function. Presumably you don't have 80 GB of memory, but do you have, say, 1 GB?


"As fast as possible" means no collisions — don't even detect or handle them, you have to ensure they can't happen. That means an in-depth study of your data, and it isn't always achievable, but it is where you have to start. Once you go down the collision-handling path, it's slower.

You also should not use strings as the keys, as that makes it slower. Here, for example, the first value, if you drop the first 2 bytes ("00"), leaves you 8 bytes, exactly fitting into a 64-bit integer, which will be faster to process: uint64_t* key = (uint64_t*)(&firststring[2]); and from there run with it as an integer into an efficient hash algorithm that maps it to your table. Ideally you would have access to other info that you could mash into your keys, like their name or address or something.

In practice, you are probably looking to map a 64-bit key into a table of 2^20 to 2^25 or so slots without any repeats... but there are MANY values the key cannot have if it's all numeric. It's really only 2^27-ish possible values, which is only a few-hundred-MB table... which works as long as the first 2 digits are always zero (?). And then it gets back into that data study: is 0000000001 a valid key? Or are they all bigger than 10^8?


The speed will be determined by a linear interpolation.
No, the speed is determined by how fast it runs. What the heck are you going to interpolate?

Hashing is great and all, but what are you really doing? Let's say you get your O(1) insert and fetch down to a nanosecond. Now what? Is a high-speed save and fetch all you wanted?