### I need help coding a function that counts the amount of times a sequence occurs in a gnome?

I am having issues in C++ trying to get a function to count the number of occurrences.
I am tasked with the following:
int countMatches(string genome, string sequence1, float min_score)
The countMatches function takes three parameters, a string containing the genome to
search, a string containing the sequence to find, and a floating point value containing
the minimum similarity score that will be considered a match.
The function does the following:
● The function should find all the positions of the genome where the genome’s
substring matches sequence1 with a similarity greater than or equal to the
min_score.
● The function should return the count of the number of such positions it found.

The input is ccgccgccga, cgc, .6
The output is 3

My code:
int main()
{
    string genome;
    string sequence3;
    float minscore;

    cout << "Please enter Genome: " << endl;
    cin >> genome;
    cout << "Please enter sequence: " << endl;
    cin >> sequence3;
    cout << "Please enter minimum score: " << endl;
    cin >> minscore;
    countMatches (genome, sequence1, minscore);
}

int countMatches (string genome, string sequence3, float minscore)
{
    float n = 0;
    float c = 0;
    float m = 0;
    for (int i = 0; i <= genome.length(); i++)
    {
        int d;

        if (sequence3 != genome.substr(i, sequence3.length()))
        {
            int a = 0;

            while (a <= sequence3.length())
            {
                if (genome[a] == sequence3[a] )
                {
                    m = m+1;
                }
                a++; //add 1 and redo the a loop until a is equal to the length
            }
            d = m/sequence3.length();
            cout << "similarity : " << d;
        }

        if (sequence3 == genome.substr(i,sequence3.length()) || d >= minscore)
        {
            c = c + 1;
        }
    }

    cout << " Matches : " << c << endl; //display matches
}

ok, but if you are looking for ccc and the string to search is
cccccccccccc how many do you have :)

Also, we need to understand the fuzzy matching algorithm better. It's not really explained what the similarity score is in the problem; we have to infer it from the (not working) code...
Are garden gnomes living beings now?
Modern science is amazing!

Sorry for the laugh. Gotta admit it is kind of funny. Anyway, +1 to jonnin for asking the question that needs to be asked: how is your fuzzy matching algorithm supposed to work? Because actual comparative genomics is significantly more involved...
You too good for Moodle?
Take your g(e)nome (sorry, I saw the same as @Duthomas!) string and numerically slide the sequence string along underneath.

For each possible position, count the number of matching characters between the genome substring and the searched-for sequence, divide this by the sequence length (careful not to use integer division) and compare with minscore. If the matching fraction equals or exceeds minscore, then increment the count of matching subsequences.
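As a rough illustration, the sliding procedure above might be sketched like this. It assumes the similarity score is simply the fraction of characters that agree position by position, which the assignment text does not spell out. The helper name `similarity` is made up for this sketch; only the `countMatches` signature comes from the assignment.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of the assignment): fraction of positions at
// which the window of `genome` starting at `start` agrees with `sequence`.
// Assumes the window fits, i.e. start + sequence.size() <= genome.size().
float similarity(const std::string& genome, std::size_t start,
                 const std::string& sequence)
{
    std::size_t same = 0;
    for (std::size_t k = 0; k < sequence.size(); ++k)
    {
        if (genome[start + k] == sequence[k])
        {
            ++same;
        }
    }
    // Cast before dividing so this is not integer division. The result is
    // kept as float, the same type as min_score, before the two are compared.
    return static_cast<float>(same) / static_cast<float>(sequence.size());
}

int countMatches(std::string genome, std::string sequence1, float min_score)
{
    // No window fits if the sequence is empty or longer than the genome.
    if (sequence1.empty() || sequence1.size() > genome.size())
    {
        return 0;
    }

    int matches = 0;
    // Stop at the last start position where a full-length window still fits.
    for (std::size_t i = 0; i + sequence1.size() <= genome.size(); ++i)
    {
        const float score = similarity(genome, i, sequence1);
        if (score >= min_score)
        {
            ++matches;
        }
    }
    return matches;
}
```

Under that assumption, `countMatches("ccgccgccga", "cgc", 0.6f)` returns 3. Note that the comparison indexes the genome at `start + k`, not at `k`, and that the match counter inside the helper starts from zero for every window.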

For your test example ccgccgccga , cgc, .6:
```
matches = 0;
[ccg]ccgccga - score 0.333 - NO
c[cgc]cgccga - score 1.000 - YES - matches becomes 1
cc[gcc]gccga - score 0.333 - NO
ccg[ccg]ccga - score 0.333 - NO
ccgc[cgc]cga - score 1.000 - YES - matches becomes 2
ccgcc[gcc]ga - score 0.333 - NO
ccgccg[ccg]a - score 0.333 - NO
ccgccgc[cga] - score 0.667 - YES - matches becomes 3
return matches (equals 3)
```

For @jonnin:
 ```ok, but if you are looking for ccc and the string to search is cccccccccccc how many do you have :) ```

10