Comparing DNA sequences using arrays

This is what the program outcome should look like...

Enter your DNA sequence: ATTCGACTGA
Enter the number of potential relatives: 3

Please enter the name of relative #1: Peter
Please enter the name of relative #2: James
Please enter the name of relative #3: John

Please enter the DNA sequence for Peter: TTTCGACTGA
Please enter the DNA sequence for James: AAACGTCAGT
Please enter the DNA sequence for John: ATTGCAGTCA

Percent match for Peter: 90%
Percent match for James: 50%
Percent match for John: 60%

This is what mine is doing right now...
Enter your DNA sequence: ATTCGACTGA
Enter the number of potential relatives: 3

Please enter the name of relative #1: Peter
Please enter the name of relative #2: James
Please enter the name of relative #3: John

Please enter the DNA sequence for Peter: TTTCGACTGA
Please enter the DNA sequence for James: AAACGTCAGT
Please enter the DNA sequence for John: ATTGCAGTCA

Percent match for Peter: 10%
Percent match for Peter: 10%
Percent match for Peter: 10%
Percent match for Peter: 10%
Percent match for Peter: 10%
Percent match for Peter: 10%
Percent match for Peter: 10%
Percent match for Peter: 10%
Percent match for Peter: 10%
Percent match for James: 20%
Percent match for James: 20%
Percent match for James: 20%
Percent match for James: 20%
Percent match for James: 20%

I think the problem is in my last function, where I loop through the array holding my DNA sequence and the array holding my relatives' DNA sequences to find the similarity. Here it is:

void getPercentMatch(char relativeSequence[][11], char dnaSeq[], char names[][256], int relatives)
{
    for (int i = 0; i < relatives; i++)
    {
        for (int j = 0; j < 10; j++)
        {
            if (dnaSeq[j] == relativeSequence[i][j])
            {
                cout << "Percent match for " << names[i] << ": " << i * 10 << "%" << endl;
            }
        }
    }
}

Point me in the right direction?
Your "cout" line should be at the end of the i loop, not the middle of the j loop.

You should have a counter for the number of matches, initialised to 0 at the start of the i loop and incremented each time a base in your DNA matches the base at the same position in the relative's DNA.

The percentage will then be count*10, not i*10, assuming you are always comparing 10 bases.
What is a percentage? How does one count one?

It looks like you have two words of the same length and you count the positions that hold identical characters. Let's take ABBA and BABA:
ABBA
BABA
  **

Two positions with identical characters.

If the two words are identical, then all positions are identical and the count equals the length of the words. Identical words surely mean 100%?

Yes, you count something and then see that it is some fraction of something else. ABBA and BABA have 2/4, which equals 50/100, also known as 50%.
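The arithmetic could be sketched like this for words of any length. The helper names countMatches and percentMatch are made up for illustration, and the sketch assumes both words have the same length.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: number of positions at which the two words agree.
int countMatches(const std::string& a, const std::string& b)
{
    int count = 0;
    for (std::size_t i = 0; i < a.size() && i < b.size(); i++)
    {
        if (a[i] == b[i])
        {
            count++;
        }
    }
    return count;
}

// Hypothetical helper: the match count as a whole-number percentage of the
// word length. Assumes both words have the same length; returns 0 for
// empty words to avoid dividing by zero.
int percentMatch(const std::string& a, const std::string& b)
{
    if (a.empty())
    {
        return 0;
    }
    return countMatches(a, b) * 100 / static_cast<int>(a.size());
}
```

Here percentMatch("ABBA", "BABA") gives 2 * 100 / 4 = 50, and two identical words give 100.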

Like lastchance said: count first and calculate later.


Furthermore, do not make one function process all the pairs. Write a simpler one that merely compares one "relative" to your "word", then call that function once for each "relative".
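One way that split might look, using the arrays from the question. The helper name percentMatchFor is hypothetical, and the sketch still assumes exactly 10 bases per sequence.

```cpp
#include <iostream>
using namespace std;

// Hypothetical helper: compares ONE relative with your sequence and
// returns the percentage. Assumption: both hold exactly 10 bases.
int percentMatchFor(const char relative[], const char dnaSeq[])
{
    int count = 0;
    for (int j = 0; j < 10; j++)
    {
        if (dnaSeq[j] == relative[j])
        {
            count++;
        }
    }
    return count * 10;
}

// The function from the question now only loops over the relatives
// and prints what the helper returns.
void getPercentMatch(char relativeSequence[][11], char dnaSeq[], char names[][256], int relatives)
{
    for (int i = 0; i < relatives; i++)
    {
        cout << "Percent match for " << names[i] << ": "
             << percentMatchFor(relativeSequence[i], dnaSeq) << "%" << endl;
    }
}
```

The comparison can then be checked on its own, without typing in names and sequences each time.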


skaa mentions longest common subsequence. Real DNA can have mutation, insertion and deletion events. Counting identical characters position by position handles only mutations. Let's look at the two words again, but introduce gaps:
-ABBA
BA-BA
 * **

The "percentage" might actually be 3/5, i.e. 60% rather than the 50% of the first method.
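For the curious, a rough sketch of the gapped idea using the standard dynamic-programming table for the longest common subsequence. The function names are invented here, and the scoring rule is an assumption: the alignment pairs up only the characters of the common subsequence and puts every other character opposite a gap, so it has a.size() + b.size() - matches columns. Real alignment tools score things differently.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Length of the longest common subsequence of a and b,
// filled in with the usual dynamic-programming table.
int lcsLength(const std::string& a, const std::string& b)
{
    std::vector<std::vector<int>> table(a.size() + 1, std::vector<int>(b.size() + 1, 0));
    for (std::size_t i = 1; i <= a.size(); i++)
    {
        for (std::size_t j = 1; j <= b.size(); j++)
        {
            if (a[i - 1] == b[j - 1])
            {
                table[i][j] = table[i - 1][j - 1] + 1;   // extend a common subsequence
            }
            else
            {
                table[i][j] = std::max(table[i - 1][j], table[i][j - 1]);
            }
        }
    }
    return table[a.size()][b.size()];
}

// Hypothetical scoring (an assumption, not a standard): matched columns
// divided by the total columns of a matches-and-gaps-only alignment.
int gappedPercentMatch(const std::string& a, const std::string& b)
{
    int matches = lcsLength(a, b);
    int columns = static_cast<int>(a.size() + b.size()) - matches;
    return columns == 0 ? 0 : matches * 100 / columns;
}
```

For ABBA and BABA the common subsequence has length 3 and the alignment has 4 + 4 - 3 = 5 columns, so gappedPercentMatch("ABBA", "BABA") gives 60.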

For some reason I have a feeling that it is enough for you to do the first method.