Some qustions on managing a large number of objects with a hash table

I am building an object manager that manages several type of objects that serve different purposes. All of these must be searchable by a string that represents the object's name. I have a few questions in regard to efficiency of each lookup.

Keep in mind that the number of each type of objects is not known. However, it IS known that there will be a lot more of one type of object than another.

1. Is it better to have one very large hash table or to break it down into smaller tables? For example, would it be better to create several hash tables based on object type or lump them all into one hash table (using a base type)?

2. If I decide to break it into several hash tables, I want to provide both generic and type specific lookup functions. I am considering using a template and I was wondering if it is possible to mix both typed and non-typed template arguments. For example, would this work?

1
2
3
4
5
6
7
8
9
10
template<class ReturnType, int N> ReturnType* find(const QString& objectName)
{
	return reinterpret_cast<ReturnType*>(mObjectTables[N][objectName]);
}

#include "SomeType.h"
int main()
{
	SomeType* object = find<SomeType, 5>("MyObjectName");
}
Last edited on
template<class ReturnType, int N> ReturnType* find(const QString& objectName)
{
return reinterpret_cast<ReturnType*>(mObjectTables[N][objectName]);
}
If you're going to do this, you may as well just lump all the objects together and make the object's type a part of the hash function. You'll need a metafunction to map a type to an integer to use in the hash function.

1
2
3
4
5
6
7
8
9
10
11
12
template <typename T>
struct TypeToInt{
};

template <>
struct TypeToInt<SomeType>{
    static const int value = 5;
};

template<class ReturnType> ReturnType* find(const QString& objectName){
    return reinterpret_cast<ReturnType*>(mObjectTable[std::make_pair(objectName, TypeToInt<ReturnType>::value)]);
}
Hi,

Is there any particular reason why you can't use a std::unordered_map with the value part of the pair being assigned a pointer to derived type? In other words, use the container in a polymorphic fashion? This container uses a hash function internally.

That way you can use an STL container with std algorithms/ functions and avoid casting and rolling your own versions of these.
I haven't planned to roll out my own version. My application uses the Qt library and I was planning on using the QHash object. I could use an unordered map but the only benefit I see from that is being able to write my own hash function as helios suggests.

TheIdeasMan... I am not sure I understand what your are saying. From the documentation I have seen the unordered map takes a template argument of T for the value type; so a pointer to a derived type won't hold all the necessary types. Even if I was going to split the table, I am not going to split it that many times (There are going to literally be dozens to possibly a hundred or more derived classes in the table). I already have around 40+ classes going into the table. I am going to use a base type; but not necessarily the most generic one.

So Helios... you are suggesting I write my own hash function and stick them all in one table. TheIdeasMan... you seem to think I should split it?
primem0ver wrote:
From the documentation I have seen the unordered map takes a template argument of T for the value type; so a pointer to a derived type won't hold all the necessary types.


I think it will: declare it taking a pointer to base; assign it with pointer to Derived. A pointer to Derived is a valid pointer to Base - that is one of the big features of virtual polymorphism in C++. It means that one doesn't have to write lots of functions. So for this reason, I don't think it should be split.

My application uses the Qt library and I was planning on using the QHash object. I could use an unordered map but the only benefit I see from that is being able to write my own hash function as helios suggests.


Ah, well QHash looks to be almost the same thing as std::unordered_map. Either way, if you can populate it with pointer to Derived, you may not need to write a custom hash function at all.

I am going to use a base type; but not necessarily the most generic one.


If you do it in the polymorphic fashion described above, then you would use the most generic one.

The Base type can be used in the code calling the function as well:

1
2
3
4
5
6
7
8
9
int main()
{
        QHash<QString, BaseType* > MyHash;

     // fill MyHash with QString and pointer to Derived

       QHash<QString, BaseType* >::const_iterator Iter =  MyHash.find("MyObjectName");
	
}


If you use Iter.value(), you will get a pointer to Derived because of the polymorphism and that the container was filled with pointers to Derived.

Just out of interest, what sort of things do have in your 40 or 100 odd types of objects? There might be ways to deal with that many types.
TheIdeasMan,

LOL. Thanks for the lesson even though I am familiar enough with C++ to know pretty much everything you said about types stored in the hash table. I just don't know the terminology so sometimes the way you say things confuses me. I am aware that one can implicitly downcast to a base class. In my experience, the opposite though cannot be done implicitly.

My only uncertainty is in what you are saying or implying about the _cast operations. I am not always 100% sure which cast method is best for a specific re-casting situation. It seems to me that if the table stores a pointer to a base class, that is what I am going to get when I pull it out of the table. So I need to re-cast it as the derived class if I want a method that returns the cast I am looking for for each type of item. But from what I understand you to be saying I don't need to use a casting mechanism? That doesn't make any sense to me when my BaseClass pointer in the table could refer to a DerivedClass1 or DerivedClass2, or DerivedClass3 etc...

About 50-70% of the classes in my application will need to be searchable and they all get put into the object manager. UI compone nts, plugin classes, plugin data classes, and other types of classes. Currently there are 3 plugin classes (with a fourth and fifth planned) and around 25-30 plugin data classes alone. There will be a fairly large number of different types of UI components as well.

This is all useful... but to my original question. Is a single generic table better (i.e. more efficient) or multiple more specific tables when there is the potential for thousands of objects to be stored?
Last edited on

Hi,

Sorry if my message came across as a "lesson", but I have come across quite a few people who don't quite understand this :+)

primem0ver wrote:
I am aware that one can implicitly downcast to a base class. In my experience, the opposite though cannot be done implicitly.


But from what I understand you to be saying I don't need to use a casting mechanism? That doesn't make any sense to me when my BaseClass pointer in the table could refer to a DerivedClass1 or DerivedClass2, or DerivedClass3 etc...


The key idea for this is the declaration is a pointer to Base, but the container is populated with pointer to Derived. Quite simply, a pointer to Derived is a valid pointer to Base.

That's the thing, there is no casting for the polymorphism to work. The compiler uses the vtable to check that a pointer to Derived really is a descendant of Base class, if it is, then one can use that pointer to call the (pure) virtual functions that exist in the Base class. So all the Derived classes should supply a concrete definition of those virtual Base class functions. But, seen as the pointer is a pointer to Derived, the Derived class functions are called. That's it, there is no conversion or casting done by compiler, nor is there any need for the code to do any explicitly. Often the need for casting could point to problems with the design.

As I said earlier this is a really powerful thing: One can have one set of interface functions for all the Derived classes, not different functions for each Derived class. Adding a new Derived classes just means providing a concrete definition for the virtual Base class interface functions, keeping the same names.

This is all useful... but to my original question. Is a single generic table better (i.e. more efficient) or multiple more specific tables when there is the potential for thousands of objects to be stored?


So, keep it all in one table :+) In my mind the number and type of the objects is irrelevant - seen as you are using a Hash container. This type of container is efficient for lookup because of the hash function, perhaps more importantly it will be better when there are lots of insertions after the initial construction of data. Having one table is also better than multiple ones because one doesn't have to have extra logic to decide which of the other tables to go looking in. Just on having thousands of objects: thousands is a drop in the ocean; millions is more normal. Perhaps the limiting factors are: the size of 1 object and how much RAM one has.

Can you provide short descriptions of the real world classes you have for your UI components, plugins, plugin data and other classes? There might still be other ways of doing this. In my mind it is often a danger when someone expresses their problem in generic fashion (Base, Derived, foo, bar etc). By danger I mean we could spend a lot of time discussing the generic solution when there may be a better solution. Often alternative solutions may be envisaged when the real situation is presented :+)

Regards :+)
TheIdeasMan wrote:
That's it, there is no conversion or casting done by compiler, nor is there any need for the code to do any explicitly

I know that this is true for down-casting. Now I also see why you feel that it isn't necessary for "up" casting if tasks can be done by virtual functions. However, in my case up casting is going to be necessary because of the vast differences in classes that exist across the entire span of my project.

TheIdeasMan wrote:
Can you provide short descriptions of the real world classes you have for your UI components, plugins, plugin data and other classes?

I would have to think about that one. I could probably share some that are general enough to have a non-specific application (such as UI components) and varied enough to show the significance of the difference between examples of classes that will go in the table. However, I am not comfortable with sharing the class structure, particularly those dealing with the plugins in a public environment.
Topic archived. No new replies allowed.