Binary searching by a key containing two variables?

So I have an array of distinct pairs of natural numbers. That is, my array looks something like this (for instance):

{ (1,5) , (6,4), (5,3), (2,3), (2,4) }

And my task is, for a given pair (a,b) find (if it exists) the pair (c,d) which has c > a and d > b and, in addition, is the "minimal" such pair (that is, c is minimal and, if there are multiple pairs which satisfy this with the same c, then d is minimal.

So for instance, if I run the query for the above array for pair (1,1), it should return (2,3) and if I feed it (5,2) it should return (6,4).

Now the problem is that I have to run multiple queries so linear time is not good enough. And my question is: is there any way in / any law by which I can "sort" such an array so that I can later run binary searches on it to find my answer?

I was thinking I could simply go like this: let (x1,y1) and (x2,y2) be two pairs to compare. Then:

1
2
3
4
5
6
if(x1<x2)
    //pair 1 is smaller
else if(x1==x2 && y1<y2)
    //pair 1 is smaller
else
    //pair 2 is smaller 


This seemed to solve most cases, however.. if I have an array which looks like this:

 
{  ....   (50,60)  ..... }


where (50, 60) is at the middle of the array. And I run a query for (40,70) for instance. The problem is that the pair that I want to find could be either on the left or on the right of (50,60) (we could have something like (45,90) on the right of (50, 60), but then again we could only have (x,10) on the left of (50,60) where x<50 so no pair on the left would satisfy our condition and I'd have to look to the right).

So to sum it up, if I'm running a query for pair (x1,y1) and through my binary search I come across and element (x2,y2) which has x1<x2 and y1>=y2, then my algorithm cannot decide whether it should keep searching left or right. So I have to search both sides and in a worst case scenario this ends up being O(n).

So is there a more clever way in which you can do this? Any help would be appreciated. Thanks in advance :)
Have you considered taking advantage of a std::multimap?
http://www.cplusplus.com/reference/map/multimap/
http://en.cppreference.com/w/cpp/container/multimap
It's not a perfect fit but it may help.
Depending on your restrictions/how perfomance is measured there are different approaches.
Most obvious is to sort values by first number then second. Find first pair larger than first number of querry and then make linear search for first pair where second number is larger than querry one. Still O(N)

By sacrificing some memory (m×n in worst case where n is number of pairs and m is number of unique first number) you can make it O(log n):
there would be container of unique first numbers; each element containing list of all pair which might suit you sorted by second number. Two binary searches = O(log n)

By sacrificing way more memory (k×s, k and s are difference between max and min first and second number) you can make it O(1): huge 2D array U such as U[x][y] contains best match for querry (x, y)
Most obvious is to sort values by first number then second. Find first pair larger than first number of querry and then make linear search for first pair where second number is larger than querry one. Still O(N)

By sacrificing some memory (m×n in worst case where n is number of pairs and m is number of unique first number) you can make it O(log n):
there would be container of unique first numbers; each element containing list of all pair which might suit you sorted by second number. Two binary searches = O(log n)


{ (8 9), (30 40), (50 20), (50 30), (50 40), (55 30), (60 70), (75 75) }

And you want to run the query for (40 60). You do binary search for the first pair for which the first number is greater than 40 and you get (50 20). But then through the entire list of pairs starting with 50 you won't find any which will have their second number smaller than 60. Nor (50 20) nor (50 30) nor (50 40) satisfy this. However there is a pair (in fact more than one in this case) later on in the array which satisfies the conditions of the query (in this case the minimal one would be (60,70)).

And there would be sort of the same problem in the second case as well. Third case just seems like it would use unrealistically much memory unfortunately.

I do appreciate the help though. I wasn't really expecting to get any answers on this

EDIT: I only now understand what you meant in the second case, sorry. The problem with that is it would require precalculations and I'm trying to use this for a dynamic data structure. It might work in some way, however. I will have to think about it. Again, thanks for the answer. Any other solutions?
Last edited on
@MiiNiPaa: i don't understand your propose structure for second case, ¿may show an example?

With O(n^2) memory, O(lg n) query time,
Sort the points by the 'y' coordinate (yes, the second one).
Then make each point to contain a list of the points that have a higher 'y' coordinate. Sort each of these list by the 'x' coordinate.

So first you make the lower bound on 'y', and find the list of candidate points (those that may contain the query). Then lower bound on 'x', to get best fit.


> The problem with that is it would require precalculations
> and I'm trying to use this for a dynamic data structure
perhaps you should tell us what are you trying to solve.
In this case, inserting or deleting a point would take O(n lg n)
ne555 wrote:
i don't understand your propose structure for second case, ¿may show an example?
 upper
 bound
(first
 number)
  ↓
[ 8 ] -> (8 9), (50 20), (50 30), (55 30), (30 40), (50 40), (60 70), (75 75)
  ↓
[ 30] -> (50 20), (50 30), (55 30), (30 40), (50 40), (60 70), (75 75)
  ↓ 
[ 50] -> (50 20), (50 30), (55 30), (50 40), (60 70), (75 75)
  ↓
[ 55] -> (55 30), (60 70), (75 75)
  ↓
[ 60] -> (60 70), (75 75)
  ↓
[ 75] -> (75 75)
      ↑
    upper
    bound
   (second
    number)  
Actually, by having single array sorted by second variable and making first lookup table (vertical in my diagram) store pointers to different parts of the array will bring it to O(n) memory usage. And it is easily supported structure which does not need much maintenance and can be recalculated quickly if needed.Nope, missed another case where it will not work.

Edit: My approach here will give least y possible instead of x. You need to search by second number first as ne555 suggested. Everything else still holds true.

sebihp2007 wrote:
And you want to run the query for (40 60). You do binary search for the first pair for which the first number is greater than 40 and you get (50 20). But then through the entire list of pairs starting with 50 you won't find any which will have their second number smaller than 60.
...and you continue searching until you hit end of array or find suitable number. In your example algorithm stops when it hit (60 70).
Last edited on
If these are 32 bit integers then just put them into a 64 bit integer. Loop through that array. Find the smallest number that's greater then x.
Thanks everyone :).. I managed to figure out a way to do this in O(log(n)^2) time and O(n) space using segment trees (which means O(n) space is something like 3n); but it fits my needs. I will mark this as solved now.
Topic archived. No new replies allowed.