Stupid questions about data structures

Forum

Forum
Beginners
Stupid questions about data structures

Stupid questions about data structures

Oct 28, 2019 at 2:01pm

I am trying to learn more about data structures, and i am wondering are hash tables the only type of tables or are there other types. The only thing I seem to find on google is hash tables so i am beginning to think they are the only type, but then I wonder why are they not just called hash tables then instead of tables and start to doubt myself. Can someone clarify this

Oct 28, 2019 at 2:10pm

Ganado (6834)

A "table" is just a way to organize information. Tabulation is the act of writing things into tables. There are look-up tables (LUTs), database tables (relational database management), arrays that are accessed by index that you could call a "table", and I'm sure other variations.

A look-up table is typically used to access a pre-computed result. For example, some early programs (like the Apollo space program software) used look-up tables for their sine and cosine tables to avoid having to do trigonometric math operations (relatively expensive for that time). In other cases, you might use an ordered map for looking up a stored value (O(log n) access), for example, map["dog"] == 42, map["cat"] = 123. Or just an array with index offset, for example, you could store the results of the fibonacci sequence as fibo[0] = 0, fibo[1] = 1, fibo[2] = 1, fibo[3] = 2, etc. This could be a "table".

A database table is a set of rows with different properties in each row that contain information someone might want to query, update, or delete.
https://en.wikipedia.org/wiki/Table_(database)

A spreadsheet (like Excel) is a huge array of table cells.

A hash table is another way of mapping input to the output information that you're trying to reach. Typically hash tables want O(1) look-up on average, and they do this by computing a hash that shouldn't have many collisions (should be relatively unique).

Last edited on Oct 28, 2019 at 4:24pm

Oct 28, 2019 at 2:38pm

Niccolo (720)

Ganado has the goods here, so I'm going to focus on your point about hash tables.

As Ganado points out, tables are just about any organization of data in rows and columns. The rows are typically accessed by some key or index. The columns represent the various pieces of data for any retrieved row in the table. If the table held objects, the columns would be the member data in the object.

The hash table is a popular means of keying rows for fast retrieval when an simple numeric index is not realistic and a sorted key (comprised of some piece(s) of data) is either slower or impractical. One may substitute many options for a hash table, but the popularity of the hash table seems to overwhelm Google's ability to find information on such alternatives.

The selection of the means by which the rows are organized for fast retrieval doesn't otherwise impact the structure of the table itself. The nature of the table is merely the organization of that data when "keyed", by whatever means. I believe the association of the word "table" with the "hash" keying concept is a matter of naming convention for that one method. I would have preferred hash container. When the key method is, say, a balanced binary tree, I don't find it being referenced as a table even when the data retrieved would otherwise qualify as a table. The "sound" of the name "binary tree table", or some facsimile, seems odd, but then, too, hash "table" seems odd to me when the data to be retrieved is only one entry (as in a table with only 1 column). We don't usually refer to an array of structures as a table, but it is essentially one.

Like Ganado's explanation, one may reserve the word "table" for a structure of data which actually represents the individual columns as a container instead of an object. In this approach, some container fashioned to manage the key (be that hash, binary tree, sort or unsorted array, etc) merely returns a container of the columns with data. This would be useful especially when it is expected that many columns in the table format may be empty.

I assert this makes the "table" a bit less formally defined than those containers typically discussed in texts on data structures and algorithms. The "standard" containers, like the trees, arrays and hash tables, are a lot more strictly defined than are tables. SQL database engines store data in "tabular form", and are therefore a major focus of tables, but store their information on disk in a server (with caches in RAM no doubt), whereas much of the focus of attention on the standard containers is relative to storage in RAM (where storage on disk is usually treated as an auxiliary subject). For example, we don't typically refer to a file of fixed sized records as an array, but essentially it is. Tables, on the other hand, are frequently discussed in the context of SQL, which implies everything from network connections to disk storage.

Oct 28, 2019 at 3:23pm

deleted account xyzzy (5768)

Hash tables are one of many different types of data structures, many directly available with C++ as containers.

https://www.hackerearth.com/practice/data-structures/hash-tables/basics-of-hash-tables/tutorial/

In C++ hash tables are also known as associative containers:

set/multiset, map/multimap, unordered_set/unordered_multiset & unordered_map/unordered_multimap.

http://www.cplusplus.com/reference/stl/

Last edited on Oct 28, 2019 at 3:30pm

Oct 28, 2019 at 3:29pm

jonnin (11493)

I have gotten along pretty well by reading 'table' as 'vector' or 'array' when talking C++.
Which is the effective but overly simple way of restating Ganado's post. When talking about algorithms, an additional constraint, most of the time 'table' means 'not searched' in a traditional fashion, but directly accessed.

Now I will diverge a little...
to me, a lookup table and a hash table are the same core/major idea.
ASCII is a table: the letter 'A' is the value 65. If you want to get an 'A' out, you go to the 65th location. Whether that 65 came from some screwy numerical computation or is just a hard-coded offset does not matter (in the first, its a 'hash', in the second, its a 'lookup') that much in the big picture: the point is, at some location that you can acquire, your data is found. A lookup table, in the 'big picture', is just a hash table where the hash key is known instead of computed and indeed, the key itself is often 'data' rather than a meaningless number. A key point is that because the lookup table is all known up front, there is no collision issues and everything just gets one slot cleanly.

The details do matter if you are coding one.
really quickly:
int smallfact[] = {0,1,2,6,24,120};
int factorial3 = fact[3]; //lookup table, you want the factorial of 3, 3 is not calculated or computed, its just 'known' at this point in the code, whether user input or a variable or a constant or whatever.

int hash(int data)
{
srand(data);
return rand()%10;
}
int tbl[10];
tbl[hash[1234]]= 1234; // hash table. you don't know WHICH location 1234 is in, just that it went in your table somewhere, and to get it back, you have to hash 1234 again to find the location. Its still a lookup, but you have an extra step here, and another extra step(s) if you need to handle collisions, which may involve multiple lookups (with collisions it could be in any of several potential slots in the table, and you check them all or until found it).

in any case, you have an array like thing and data put into it, and some way to find it via its location directly (as opposed to a linear or binary search to find it). The ability to 'go get it' instead of 'search for it' is the key to what makes a 'table' conceptually.

interestingly, for a while CPUS had the trig lookups on the chip as well. They may still do that, I do not know.

Other cool stuff:
most switch statments are coded as tables where the compiler can do it, making them very efficient.
I use a lot of tables: they are very high performance and modern computers have ram to spare to hold the data. Tables are a great example of the time space tradeoff.
the random seed hash table (<random> is better but I still have to look up the details to write code with it so in quick example i used rand() ) is surprisingly effective in practice.

Last edited on Oct 28, 2019 at 3:47pm

Oct 28, 2019 at 4:20pm

Ganado (6834)

When talking about algorithms, an additional constraint, most of the time 'table' means 'not searched' in a traditional fashion, but directly accessed.

I like that summary, I think it fits well.

Topic archived. No new replies allowed.