How do I compare the similarity of two vectors?

This is possibly more of an algorithm question rather than c++ related.

Suppose I have two vectors:

std::vector<double> first = {1.0,2.01,3.05,4.05}; 
std::vector<double> second = {1,2,3,3,4}; // very similar except no decimals and a second "3" 


And now I want an algorithm that will tell me how similar these two vectors are. That is, I want to know how much of first is similar to second. An appropriate answer in this case might be about 80%, as only 1 out of 5 elements would have to be removed (second[2]) from second to make it almost identical to first.

Are you aware of any (preferably fast) methods of achieving this? I'm open to suggestions for alternatives.
First, define your distance properly

For instance, http://en.wikipedia.org/wiki/Levenshtein_distance
where the substitution weight is proportional to the relative error
and the removal weight sets the upper limit.
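
A minimal sketch of that idea, under some assumptions that are mine rather than the poster's: insertions and removals cost 1, a substitution costs the relative error clamped to [0, 1], and the result is turned into a similarity by dividing by the longer length. The names substitution_cost, weighted_edit_distance and similarity are made up for illustration.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Substitution cost: relative error between a and b, clamped to [0, 1].
// (Assumed weighting; any cost in [0, 1] would fit the same scheme.)
double substitution_cost(double a, double b) {
    const double scale = std::max(std::fabs(a), std::fabs(b));
    if (scale == 0.0) return 0.0;  // both values are zero
    return std::min(1.0, std::fabs(a - b) / scale);
}

// Levenshtein-style distance: unit cost for insert/remove,
// real-valued cost for substitution. Uses two rows of the DP table.
double weighted_edit_distance(const std::vector<double>& x,
                              const std::vector<double>& y) {
    const std::size_t n = x.size(), m = y.size();
    std::vector<double> prev(m + 1), curr(m + 1);
    for (std::size_t j = 0; j <= m; ++j) prev[j] = static_cast<double>(j);
    for (std::size_t i = 1; i <= n; ++i) {
        curr[0] = static_cast<double>(i);
        for (std::size_t j = 1; j <= m; ++j) {
            curr[j] = std::min({prev[j] + 1.0,      // remove x[i-1]
                                curr[j - 1] + 1.0,  // insert y[j-1]
                                prev[j - 1] + substitution_cost(x[i - 1], y[j - 1])});
        }
        std::swap(prev, curr);
    }
    return prev[m];
}

// Similarity in [0, 1]: 1 means identical, 0 means nothing in common.
double similarity(const std::vector<double>& x, const std::vector<double>& y) {
    const std::size_t longest = std::max(x.size(), y.size());
    if (longest == 0) return 1.0;  // two empty vectors are identical
    return 1.0 - weighted_edit_distance(x, y) / static_cast<double>(longest);
}
```

With the two vectors from the question this comes out at about 0.79, close to the 80% estimated there. The cost is O(n*m) time and O(m) memory.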


What problem are you trying to solve?
The problem is this:

Suppose I have two "cells", each composed of a series of genes, which in turn are really just sets of numbers called codons (they're basically doubles). I want to give each cell a number, or metric, which indicates the degree of similarity between its genome and that of another. The problem is that while two codons might be very similar, they're not necessarily the same (3.01 and 3.02 are similar, but not the same).

I'm not sure Levenshtein distance will help; from what I can tell it's just for discrete sets such as integers and letters. But I think it's a start, so I'll do some more reading :)
This is possibly more of an algorithm question rather than c++ related.


Correct. You really need to define what you mean by "similar".
What would you suggest? Suppose I give you two integers or floats, how would you go about comparing them?
Again, you need to define 'similar' for yourself.
You said this:
3.01 and 3.02 similar, but not the same


but you *could* argue they are not similar at all, that they are completely different numbers.
You could define a tolerance, so that if the float falls within that tolerance of your integer you class them as 'similar'. Or, if the ratio of the float to the int is close to one, you could class that as similar.
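
Both tests could be sketched roughly as follows. The function names, and the choice to treat values of opposite sign as never similar in the ratio test, are assumptions made for illustration; the thresholds are left to the caller.

```cpp
#include <algorithm>
#include <cmath>

// Absolute test: a and b are "similar" if they differ by at most `tolerance`.
bool within_tolerance(double a, double b, double tolerance) {
    return std::fabs(a - b) <= tolerance;
}

// Relative test: the ratio of the smaller magnitude to the larger one
// must be at least `min_ratio` (for example 0.95).
bool ratio_near_one(double a, double b, double min_ratio) {
    if (a == 0.0 || b == 0.0) return a == b;  // a ratio is undefined with a zero
    if (std::signbit(a) != std::signbit(b)) return false;  // assumed: opposite signs never match
    const double lo = std::min(std::fabs(a), std::fabs(b));
    const double hi = std::max(std::fabs(a), std::fabs(b));
    return lo / hi >= min_ratio;
}
```

The absolute test treats 1000.0 vs 1000.04 the same as 0.01 vs 0.05, while the ratio test scales with the magnitude of the values, so which one is appropriate depends on what the numbers represent.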
I suppose I could quantify the degree of similarity in terms of the following:

1) If the integer component (i.e. the result of a cast to int) of two floats is the same, then the "difference" between them shall be their fractional difference. So, if we compare 3.1 and 3.2, the difference should be 10% of what the Levenshtein distance would be for two different integers.

2) Any two numbers that are more than an integer apart should be considered "different", as per the Levenshtein distance.
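
One possible reading of these two rules as a per-element cost, to be used as the substitution cost in a Levenshtein-style distance. The name codon_cost is hypothetical, and the rules leave one case open: numbers such as 2.99 and 3.01, which are less than an integer apart but have different integer components. This sketch assumes they count as fully different.

```cpp
#include <algorithm>
#include <cmath>

// Cost of substituting codon a with codon b, in [0, 1].
// 0 means identical, 1 means "different" in the Levenshtein sense.
double codon_cost(double a, double b) {
    // std::trunc mirrors a cast to int (rounds toward zero) without overflow.
    if (std::trunc(a) == std::trunc(b)) {
        // Rule 1: same integer component, so use the fractional difference.
        // The clamp covers values in (-1, 1), which all truncate to zero.
        return std::min(1.0, std::fabs(a - b));
    }
    // Rule 2 (and the assumed handling of the open case): fully different.
    return 1.0;
}
```

Under this reading, 3.1 vs 3.2 costs about 0.1, while 2.99 vs 3.01 costs 1.0 even though the values are only 0.02 apart. If that jump is unwanted, a cost based on the clamped absolute difference alone would avoid it.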

I'll give that a go and see how it pans out.