I need help coding a function that counts the amount of times a sequence occurs in a gnome?

I am having issues in C++ trying to write a function that counts the number of occurrences.
I am tasked to do the following:
int countMatches (string genome, string sequence1 , float min_score)
The countMatches function takes three parameters, a string containing the genome to
search, a string containing the sequence to find, and a floating point value containing
the minimum similarity score that will be considered a match.
The function does the following:
● The function should find all the positions of the genome where the genome’s
substring matches sequence1 with a similarity greater than or equal to the
min_score .
● The function should return the count of the number of such positions it found.

The input is ccgccgccga, cgc, .6
The output is 3

My code:
int main()
{
    string genome;
    string sequence3;
    float minscore;

    cout << "Please enter Genome: " << endl;
    cin >> genome;
    cout << "Please enter sequence: " << endl;
    cin >> sequence3;
    cout << "Please enter minimum score: " << endl;
    cin >> minscore;
    countMatches (genome, sequence1, minscore);

}

int countMatches (string genome, string sequence3, float minscore)
{

    float n = 0;
    float c = 0;
    float m = 0;
    for (int i = 0; i <= genome.length(); i++)
    {
        int d;

        if (sequence3 != genome.substr(i, sequence3.length()))
        {

            int a = 0;

            while (a <= sequence3.length())
            {
                if (genome[a] == sequence3[a] )
                {
                    m = m+1;
                }
                a++; //add 1 and redo the a loop until a is equal to the length
            }
            d = m/sequence3.length();
            cout << "similarity : " << d;
        }

        if (sequence3 == genome.substr(i,sequence3.length()) || d >= minscore)
        {
            c = c + 1;
        }

    }

    cout << " Matches : " << c << endl; //display matches
}


jonnin (11333)

ok, but if you are looking for ccc and the string to search is
cccccccccccc how many do you have :)

Also, we need to understand the fuzzy matching algorithm better. It's not really explained what the similarity score is in the problem; we have to infer it from the (not working) code...

Duthomhas (13130)

Are garden gnomes living beings now?
Modern science is amazing!

Sorry for the laugh. Gotta admit it is kind of funny. Anyway, +1 to jonnin for asking the question that needs to be asked: how is your fuzzy matching algorithm supposed to work? Because actual comparative genomics is significantly more involved...

DrKnox (1)

You too good for Moodle?

lastchance (6980)

Take your g(e)nome string (sorry, I saw the same as @Duthomhas!) and slide the sequence string along underneath it, one position at a time.

For each possible position, count the number of matching characters between the genome substring and the searched-for sequence, divide this by the sequence length (careful not to use integer division) and compare with minscore. If the matching fraction equals or exceeds minscore, increment the count of matching subsequences.

For your test example ccgccgccga , cgc, .6:

matches = 0;
[ccg]ccgccga - score 0.333 - NO
c[cgc]cgccga - score 1.000 - YES - matches becomes 1
cc[gcc]gccga - score 0.333 - NO
ccg[ccg]ccga - score 0.333 - NO
ccgc[cgc]cga - score 1.000 - YES - matches becomes 2
ccgcc[gcc]ga - score 0.333 - NO
ccgccg[ccg]a - score 0.333 - NO
ccgccgc[cga] - score 0.667 - YES - matches becomes 3
return matches (equals 3)
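
The walk-through above can be sketched in C++ roughly as follows. This is only one reading of the assignment: it assumes the similarity score is the fraction of characters that agree position by position, which is what the trace uses but the problem text never states. similarityAt is a made-up helper name, and returning 0 when the sequence is empty or longer than the genome is my assumption too.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: fraction of characters of `sequence` that equal the
// genome characters in the window starting at `start`. Computed in float so
// it is compared with min_score at the same precision.
float similarityAt(const std::string& genome, std::size_t start,
                   const std::string& sequence)
{
    assert(!sequence.empty() && start + sequence.size() <= genome.size());
    std::size_t same = 0;
    for (std::size_t k = 0; k < sequence.size(); ++k)
    {
        if (genome[start + k] == sequence[k])
            ++same;
    }
    // Cast before dividing; same / sequence.size() would be integer division.
    return static_cast<float>(same) / static_cast<float>(sequence.size());
}

int countMatches(std::string genome, std::string sequence1, float min_score)
{
    // Assumption: if no full-length window fits, nothing can match.
    if (sequence1.empty() || sequence1.size() > genome.size())
        return 0;

    int matches = 0;
    // Stop while a full-length window still fits inside the genome.
    for (std::size_t i = 0; i + sequence1.size() <= genome.size(); ++i)
    {
        if (similarityAt(genome, i, sequence1) >= min_score)
            ++matches;
    }
    return matches;
}
```

Calling countMatches("ccgccgccga", "cgc", 0.6f) visits the same eight windows as the trace and returns 3. Note the loop bound: it stops at the last position where a whole window fits, so it never compares past the end of the genome.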

For @jonnin:

ok, but if you are looking for ccc and the string to search is
cccccccccccc how many do you have :)
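
A minimal sketch for that case, assuming overlapping positions each count separately (that is what sliding one character at a time does; the assignment text does not say either way). countOverlapping is a hypothetical name, and it only handles exact matches, i.e. a minimum score of 1.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts exact occurrences of `sequence` in `genome`, letting them overlap:
// after each hit the search resumes one character later instead of
// skipping past the whole match.
int countOverlapping(const std::string& genome, const std::string& sequence)
{
    assert(!sequence.empty());  // an empty pattern would match everywhere
    int count = 0;
    std::size_t pos = genome.find(sequence);
    while (pos != std::string::npos)
    {
        ++count;
        pos = genome.find(sequence, pos + 1);
    }
    return count;
}
```

For a run of n identical characters (n at least 3) searched for three of the same character, this gives n - 2, since every start position except the last two fits a full window.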


Topic archived. No new replies allowed.
