Bioinformatics program with string of DNA

This is just one part of the program. We have a file that contains a string of DNA (ACAAGATGCCATTGTCCCCCGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGCCACCGCTGCCCTGCC), but much longer. I was able to read it into my program, pass it to a function, and convert it to RNA. Now I have to store it so that it comes out three letters at a time, because I will be using it later to help with a search.

I was told to use substr, but I am confused about how to use it and whether I should call it in main or in the function after the sequence is converted.

Here is the code I need help with.
#include <cstdlib>
#include <iostream>
#include <fstream>
#include <string>

using namespace std;
  class AminoAcid{
      public:
             AminoAcid();
             AminoAcid(string n, string three, string mass, string cods);
            void set_chart(string);
            void transcript(string&);
            string DNA_file, chart;            
      private:
              string name;
              string etterName;
              string codons;
              double molarMass;
              string RNA;
};
int main(int argc, char *argv[])
{
ifstream infile;
    infile.open ("input.txt");                 
    if(infile.fail()){
                  cout <<"error.\n";
    }
    else{     
         string temp;  
         while (!infile.eof()){    
              getline(infile,temp);  
              DNA.DNA_file.append(temp); 
              DNA.DNA_file.substr(0,3);                  
        }
        cout << DNA.DNA_file<<" " <<endl;
        cout<<DNA.DNA_file.size();
        cout<<endl;
        DNA.transcript(DNA.DNA_file);
        infile.close(); //closes file
}  
    system("PAUSE");
    return 0;
}

void AminoAcid::transcript(string& DNA){ 
    int changeT;
    changeT = DNA.find("T");
    cout << DNA << "\n" << endl;
        while (changeT > 0){
            DNA.replace(changeT, 1, "U");
            changeT = DNA.find("T");
        }
       RNA=DNA;
       cout << RNA << endl;
}     
1. Reinventing the wheel. There are many bioinformatics programs already. You should use existing libraries for such mundane jobs.

2. What you do in the code does not quite make sense. Perhaps that is due to the things you do not show? Perhaps it is the naming of the variables? Hard to tell.

3. How do you know the correct reading frame? How do you know that the sequence actually codes for a protein?

4. What data do you actually need to keep? Protein sequence?

This ain't right, but:
#include <map>
#include <vector>
#include <string>
#include <iostream>
#include <fstream>

class AminoAcid {
  char code;  // one-letter amino acid code
public:
  AminoAcid( char aa ) : code(aa) {}
  // 'short' is a reserved keyword in C++, so the accessor needs another name
  char letter() const { return code; }
};

int main() {
  // maps each DNA codon to its one-letter amino acid code
  std::map<std::string, char> codonTable;
  codonTable[ std::string("TTT") ] = 'F';
  // and the rest

  std::ifstream infile("input.txt");
  char a, b, c;
  std::vector<AminoAcid> sequence;
  // read three characters at a time; stops at the first incomplete triplet
  while( infile.get(a) && infile.get(b) && infile.get(c) ) {
    std::string codon( 3, a );
    codon[1] = b;
    codon[2] = c;
    auto it = codonTable.find( codon );
    // codons that are missing from the table are silently skipped
    if ( codonTable.end() != it ) {
      sequence.emplace_back( it->second );
    }
  }

  for ( auto aa : sequence ) {
    std::cout << aa.letter();
  }
  return 0;
}



Oh, and note that substr returns a new string object; it does not modify the string it is called on. See http://www.cplusplus.com/reference/string/string/substr/
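To make that concrete: the line DNA.DNA_file.substr(0,3); in the original post builds a three-letter copy and then throws it away, because nothing receives the return value. Here is a minimal sketch of catching the result instead. The helper name codonAt is made up for illustration and is not from the thread.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not from the thread): returns the triplet that
// starts at index `pos`, or a shorter string if fewer than three
// letters remain.
std::string codonAt(const std::string& seq, std::size_t pos) {
    // substr(pos, len) copies up to `len` characters into a NEW string;
    // `seq` itself is left untouched. `pos` must not exceed seq.size(),
    // otherwise substr throws std::out_of_range.
    return seq.substr(pos, 3);
}
```

So with a string holding "ACAAGA", codonAt(s, 0) gives "ACA" and codonAt(s, 3) gives "AGA", while s still holds the full sequence afterwards.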
Well, my code actually does work, and I know it codes for a protein.

I just need it to output ACA AGA instead of ACAAGA.

That is why I need help with substr.
You can index strings like arrays (sort of), so why not use a nested for loop (not very efficient, but whatever)? Something like:

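// step through the string three characters at a time
for (size_t i = 0; i < temp.length(); i += 3)
{
      // print one triplet, stopping early if the string runs out
      for (size_t j = i; j < i + 3 && j < temp.length(); j++)
      {
            cout << temp[j];
      }
      cout << " ";
}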

Not quite what I have above, but you get the point, without having to use library functions like substr.
Seag + substr ~>
// needs <algorithm>, <iostream>, <iterator> and <string>
std::string temp;
// fill temp from input

// print as RNA triplets
for ( size_t index = 0; index < temp.size(); index += 3 ) {
  // substr returns a copy, so temp itself stays unchanged
  std::string triplet { temp.substr( index, 3 ) };
  std::replace_if( std::begin(triplet), std::end(triplet),
                   [](char x){return 'T' == x;}, 'U' );
  std::cout << triplet << ' ';
}
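Since the original question was about storing the triplets for a later search rather than only printing them, here is a rough sketch of the same substr loop filling a std::vector. The names toCodons and hasCodon and the frame parameter are assumptions made up for this example, not something from the posts above. It also uses plain std::replace for the T to U step, which is a swapped-in alternative to the find/replace loop in the original transcript function.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: transcribe DNA to RNA (T -> U) and split the
// result into triplets, starting at reading-frame offset `frame`.
std::vector<std::string> toCodons(std::string seq, std::size_t frame = 0) {
    // Replace every 'T', including one at index 0.
    std::replace(seq.begin(), seq.end(), 'T', 'U');

    std::vector<std::string> codons;
    // The condition i + 3 <= size drops a trailing fragment that is
    // shorter than three letters.
    for (std::size_t i = frame; i + 3 <= seq.size(); i += 3) {
        codons.push_back(seq.substr(i, 3));
    }
    return codons;
}

// Hypothetical helper: true if `codon` occurs anywhere in the list.
bool hasCodon(const std::vector<std::string>& codons,
              const std::string& codon) {
    return std::find(codons.begin(), codons.end(), codon) != codons.end();
}
```

Calling toCodons("ACAAGATGCC") gives the three strings ACA, AGA and UGC, and the leftover C is dropped. Passing 1 or 2 as the second argument shifts the reading frame, which changes every triplet, so the earlier question about knowing the correct frame still applies.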
