string vs wide string hashcode time

I have a "basic" question about the time it takes to compute a hash code for strings because Qt doesn't have support for single byte wide hash code calculation for non-literal strings under certain circumstances. So I am trying to decide whether I need to write my own hash table class.

I know that for calculating a hash code, big O is the same for a single byte wide char string as it would be for a string of wchar_t's. But in terms of the processor time, is one faster than the other? Does the fact that wchar_t's characters take twice as many bytes make it slower or does the fact that processors can handle at least 32 bits at once make it the same? (or possibly even faster?) Assume the hash code is an unsigned int.

Also, as a side question. Has anyone ever experimented with hash table efficiency where a hash table reallocates more space when efficiency falls below a certain value? I am defining efficiency as (size - collisions) / size where size is the number of items in the table.

> So I am trying to decide whether I need to write my own hash table class.

No.

http://en.cppreference.com/w/cpp/string/basic_string/hash
http://en.cppreference.com/w/cpp/container/unordered_map
Ok. Thanks. However I am unfamiliar with the usage of some of the template arguments.

What are these?
class KeyEqual = std::equal_to<Key>,
class Allocator = std::allocator< std::pair<const Key, T> >

Also... I need to be able to inherit from this template. Is it possible to do that? For example...

class InfoHashTable : public std::unordered_map<char*, Infoclass*, std::hash<std::string> >
class KeyEqual = std::equal_to<Key> provides the equality comparison for keys.
Make the key one of the string types eg. std::string and the parameter can be defaulted.

class Allocator = std::allocator< std::pair<const Key, T> > provides the memory management policy; use the default for that too.

std::unordered_map<> is not designed to be derived from. Contain and forward instead.

1
2
3
4
5
6
7
8
9
class InfoHashTable 
{
       // ...

    private:
       // make the key type std::string (instead of char*)
       // and use defaults for the other template parameters.
       std::unordered_map< char* std::string, Infoclass* > hashtable ;
};


If the pointer Infoclass* is an owning pointer, consider making it a smart pointer instead of a raw pointer.
does supplying a char* to the std::string class cause a data copy or a string that wraps the raw char*?

I cannot use std::string as a native argument for the key so unless std::string offers a way to wrap the pointer without copying it, this will cause similar problems I ran into using Qt's hashtable class.

Containing and forwarding is also annoying as it requires extra allocation and call time. Are you sure this can't be done? I was able to do it using QHash except that the only single byte wide class in Qt used with strings does not have a default constructor.

Unless I can find a way around the above two issues, I need my original question answered.
Last edited on
> Containing and forwarding is also annoying as it requires extra allocation and call time.

If the forwarding functions are inlineable, it won't require anything more than what inheritance would have required.


> I cannot use std::string as a native argument for the key

Consider replacing raw c-style strings with std::string.

If you must work with raw c-style strings, and the key is intended to be the logical sequence of characters rather than a physical address:

a. make the key a const char* instead of a char*
b. provide a custom hash function
c. provide a key_equal which uses std::strcmp

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
#include <iostream>
#include <unordered_map>
#include <cstring>

struct Infoclass{ int v ; operator int() const { return v ; } } ;

struct InfoHashTable
{
    struct hash
    {
        std::size_t operator() ( const char* key ) const // djb2 (dan bernstein)
        {
            std::size_t h = 5381;
            while( std::size_t c = *reinterpret_cast<const unsigned char*>(key++) ) h = h*33 + c ;
            return h ;
        }
    };

    struct equal_to
    {
        bool operator() ( const char* a, const char* b ) const { return !std::strcmp(a,b) ; }
    };

    std::unordered_map< const char*, Infoclass*, hash, equal_to > hashtable ;

    bool insert( const char* key, Infoclass* value ) { return hashtable.emplace(key,value).second ; }

    Infoclass* find( const char* key )
    {
        const auto iter = hashtable.find(key) ;
        return iter != hashtable.end() ? iter->second : nullptr ;
    }
};

int main()
{
    char keys[][12] = { "one", "two", "three", "four", "five", "six" } ;
    constexpr std::size_t N = sizeof(keys) / sizeof(keys[0]) ;

    Infoclass objects[N] = { {1}, {2}, {3}, {4}, {5}, {6} } ;

    InfoHashTable iht ;
    for( std::size_t i = 0 ; i < N ; ++i ) iht.insert( keys[i], objects+i ) ;

    for( const char* k : { "two", "five", "twelve", "three", "eight" } )
    {
        std::cout << '\'' << k << '\'' ;
        const auto p = iht.find(k) ;
        if(p) std::cout << " => " << *p << '\n' ;
        else  std::cout << " was not found\n" ;
    }
}

http://coliru.stacked-crooked.com/a/d501dc5f50aa25c7
I really do appreciate the effort you put into a programming lesson. It has given me some new things to consider. But could you or someone else please answer my original question?

I may have gaps in my programming knowledge because much of it is self taught. But I am with it enough to know what I need to know to solve my problem. And that is an answer to my question. Because you seem to be having a hard time with this let me spell out my problem in detail for everyone.

I am working on a prototype for an extremely large project that will eventually need to distinguish between potentially tens of thousands of info class instances extremely quickly. A hashtable can provide this but there are several issues I need to deal with.

My project deals with THREE different libraries (none of them the std library) with FOUR different base string classes (one of them being a wide string) and I need direct access to all of them as potential sources of keys. Using the std::string class would introduce a FIFTH which would be complete nonsense because converting between the all the different classes requires unnecessary processing time. I also need to be able to use string literals as keys. The only format I can access that ALL FOUR of these formats have in common is the char* (or const char*... I was being generic). The fourth string type is the QString class (which is the wide character string). I want to keep string copying time and memory use to a minimum.

As it is, I feel my best approach to maximize efficiency is to write my own hashtable class in order to deal all the different string types and key sources. I know enough to do be able to do that. My shortcoming is the knowledge of the intricacies of the hashing functions themselves. Since Qt is the main/base library I am using for the project, and since they provide hashing functions for both single byte and wide strings, I need to weigh the pros and cons of either writing my own single wide hash table class to deal with all the different string types, or to simply use QString as the key. The main unknown in this decision is the unknown I expressed in my original question (and the thread title):

In terms of the processor time, is [hashing] one [single byte wide/double byte wide] faster than the other? Does the fact that wchar_t's characters take twice as many bytes make it slower or does the fact that processors can handle at least 32 bits at once make it the same? (or possibly even faster?) Assume the hash code is an unsigned int.

If you have other input that can help, I will be glad to hear it. But please answer my original question.

EDIT: If I assume that the same basic hash code function you wrote can also be used for a wide character string, then I can deduce from what you wrote that the hashing functions for both would take the same amount of time. Can you confirm this?
Last edited on
Because you seem to be having a hard time with this [...]

I'm curious. At what point did you decide being snarky and condescending to the members of this community would be beneficial in encouraging them to help you further?
Last edited on
> In terms of the processor time, is [hashing] one [single byte wide/double byte wide] faster than the other?

No.

Assuming that 'single byte wide/double byte wide' implies hashing sequences of character types of fixed width. Hashing null-terminated multibyte strings (NTMBS) would be considerably more expensive.


> Does the fact that wchar_t's characters take twice as many bytes make it slower

No.


> (or possibly even faster?)

No.


> the same basic hash code function you wrote can also be used for a wide character string

Yes. For both null-terminated byte strings (each byte encodes one character) and null-terminated wide strings.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
struct hash
{
    static constexpr std::size_t INITIAL_VALUE = 5381;
    static constexpr std::size_t MULTIPLIER = 33;

    std::size_t operator() ( const char* key ) const // djb2 (dan bernstein)
    {
        std::size_t h = INITIAL_VALUE;
        while( std::size_t c = *reinterpret_cast<const unsigned char*>(key++) ) h = h*MULTIPLIER + c ;
        return h ;
    }

    std::size_t operator() ( const wchar_t* key ) const
    {
        std::size_t h = INITIAL_VALUE;
        while( std::size_t c = *key++ ) h = h*MULTIPLIER + c ;
        return h ;
    }
};

struct equal_to
{
    bool operator() ( const char* a, const char* b ) const { return !std::strcmp(a,b) ; }
    bool operator() ( const wchar_t* a, const wchar_t* b ) const { return !std::wcscmp(a,b) ; }
};
THANK YOU!!!
Last edited on
I am sorry about being condescending. I was just very surprised that with all the great advise I had asked the question 3 times without getting an answer (once in the title). I suppose though the way I asked it the third time (before my condescending statement) didn't focus on my need for an answer so I am sorry about that.
Topic archived. No new replies allowed.