Improve hash table algorithm

Hi.

The exercise is to find all the pairs in a vector that have a difference of k = 2
In the vector below, the pairs are: (1,3) (7,5) (7,9) (5,3)

I'm using hash tables to achieve this goal in a faster way than using a nested for loop.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
constexpr int k = 2;

vector<int> nums = { 1, 7, 5, 9, 2, 12, 3 };
const int size = nums.size();

// hash tables used for lookups
unordered_set<int> lookup{ nums.begin(), nums.end() };

// pairs found
vector<std::pair<int, int>> pairs;
		
for (int i = 0; i < size; i++)
{
   // search in the hash table for those values that are equal either to currentNumber+k OR to currentNumber-k
   auto search = std::find_if(lookup.begin(), lookup.end(), 
			[nums, i, k](const int& val)->bool { return nums[i] + k == val || nums[i] - k == val; });
		
    // if found, save the pair	
    if(search != lookup.end())
	pairs.push_back(make_pair(nums[i], *search));
}


It all works fine.

BUT comparing this algorithm to the "trivial" one, the output is:
(1,3) (7,9) (5,7) (9,7) (3,1)
instead of
(1,3) (7,5) (7,9) (5,3)

This happens because the hash table performs a "global" lookup, whereas in the "trivial" algorithm, the vector would only compare the currentNumber to the numbers that come AFTER it (as a contiguous sequence)

Any idea about how to avoid those "duplicates" and, hence, improve my algorithm?

Thanks in advance!
Last edited on
There might be a better way, but I would first try the following:

1. Sort the data, O(n log n)
{ 1, 2, 3, 5, 7, 9, 12 };

2. Remove duplicates, if any. (could be combined into step 3 to prevent another traversal)
{ 1, 2, 3, 5, 7, 9, 12 };

3. Have two indices to keep track of where you are (one for each pair), increment them like the movement of an inch worm, so to speak.

Counter A: nums[0] --> 1
Counter B: nums[1] --> 2
    difference less than 2, so increment counter B

Counter A: nums[0] --> 1
Counter B: nums[2] --> 3
    difference equal to 2, so add to list of pairs, and increment counter A

Counter A: nums[1] --> 2
Counter B: nums[2] --> 3
    difference less than 2, so increment counter B

Counter A: nums[1] --> 2
Counter B: nums[3] --> 5
    difference greater than 2, so increment counter A

Counter A: nums[2] --> 3
Counter B: nums[3] --> 5
    difference equal to 2, so add to list of pairs, and increment counter A.

etc.

Last edited on
...

what if you hashed each value in nums.

then get the first element. it is 1.
there are only 2 possible values with a difference of 2 for an input of 1.
search hash table for 3? found it, generate output.
search hash table for -1 (if negatives allowed). didn't find it.
second value 7. look for 5 and 9, generate output.
...
if you generate output in sorted format (lowest, highest) or (equal, equal) form
and you store that into something that ignores duplicates (if already there, don't store it again) that should give you what you want.

That is O(N), right?


@jonnin

store that into something that ignores duplicates


That's the point.

std::pair's operator== only considers equal those pairs where first == first and second == second.

(1,2) (1,2) are equal
(1,2) (2,1) are not

I was wondering if there's an STL function that checks if two pairs are equivalent... otherwise I'm forced to implement my own operator== for std::pairs

I would like to avoid the latter, and find a more elegant way...
Last edited on
@jonnin
True, that seems like it's O(n) if there's no rehashing done (i.e. big-enough pre-allocation), and if inserts and queries are both O(1) as well...

Updated psuedo code based off of what jonnin said:

for i in range[0, N-1]
    n = nums[i]
    Hash { n-2, n } pair into hash table.
    If pair not already in hash table, add to vector.
    Hash { n, n+2 } pair into hash table.
    If pair not already in hash table, add to vector.



Yeah, that looks O(n) to me.

@gedamial
Making your own orderless pair is trivial, but I don't think you need to.
Last edited on
I was saying that when you create the pairs, enforce a rule that first <= second yourself. It isnt built into anything, you have to actually do it by hand I think. Or you can write the compare operator. I think its less comparisons and work to do it by sorting the pairs yourself.

I am trying to think of a greatly reduced compare operator but they are all too complex so far..
you can do it without a jump, how fast do you really need this?
Last edited on
> compare operator
Everything he needs is already built-in. He shouldn't have to define any new operators, imo.

Edit: Oh, so apparently the standard library DOESN'T implement a way to hash std::pair. That's annoying. But not hard, you just have to implement a simple hash yourself, which is relatively easy because int is a built-in type.
https://stackoverflow.com/questions/20590656/error-for-hash-function-of-pair-of-ints
Last edited on
@Ganado
If pair not already in hash table, add to vector

Yeah I was focusing on that.
When the pair is (1,3) all is fine... but when an EQUIVALENT pair pops up (3,1) it should be ignored

@jonnin
how fast do you really need this?

Just O(n)
Last edited on
@gedamial
Don't try to hash (3,1) in the first place. Only ever hash (n-2, n) or (n, n+2). Problem solved. (In my previous post I accident wrote +/-1 instead of +/-2, fixed now).

Edit: (duplicate edit of previous post copied here)
So apparently the standard library DOESN'T implement a way to hash std::pair. That's annoying. But not hard, you just have to implement a simple hash yourself, which is relatively easy because int is a built-in type.
https://stackoverflow.com/questions/20590656/error-for-hash-function-of-pair-of-ints
Last edited on
right, that is what I was trying to say ^^^^^^^^^^^
that is the right way I think. A comparison operator to figure out what you know intuitively for this data is going to cost more, whether built in or written, even if written well.
Thanks to everyone!
Topic archived. No new replies allowed.