How to find a matching character in a string read from an input file?

Hello everyone:

I am trying to write code that searches a string for the first character that matches any of the characters specified in its arguments.
The function recommended to me was the C++ compare function, but the example given on this website is:

Ref: cplusplus.com
// comparing apples with apples
#include <iostream>
#include <string>

int main ()
{
  std::string str1 ("green apple");
  std::string str2 ("red apple");

  if (str1.compare(str2) != 0)
    std::cout << str1 << " is not " << str2 << '\n';

  if (str1.compare(6,5,"apple") == 0)
    std::cout << "still, " << str1 << " is an apple\n";

  if (str2.compare(str2.size()-5,5,"apple") == 0)
    std::cout << "and " << str2 << " is also an apple\n";

  if (str1.compare(6,5,str2,4,5) == 0)
    std::cout << "therefore, both are apples\n";

  return 0;
}


Below is part of my code. I would like to use any function (like compare) to find the matching characters, such as "1NCAGAATTTGCATCATGAACGATGAGCTGATCGTGANGNN", in an input .txt or .fastq file.
I have written this code for a DNA search.
Part of my code:
 ifstream read_stream(read_files.at(i).c_str());
 read_stream.seekg(read_sizes[my_rank-1]);
 //checking the position in file;
 file_pos = 0;
 if(my_rank != comm_sz -1){
     while(file_pos < read_sizes[my_rank]){
      read_stream >> read_temp;
      //Get position in input sequence
       file_pos = read_stream.tellg();
       //write the code to searches the string for the first character that matches any of the characters specified in its arguments.           
      // charachters for comparision would be: "1NCAGAATTTGCATCATGAACGATGAGCTGATCGTGANGNN"
          {
             code::
          }
     }
 }


input file:

>chr1
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
cttttctggtccaatctatttggaattctgtaggcttcttttatgttcat
ggatatctctttctttaagtttgggaagttttcttctctaattttgttaa
agatatttgctggtcctttaagttgaaaatcttcattctcacctactctt
gttgtccatatgtttgggcttttcattgCNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
taagcagattataattaagattctacatgtatatacagatacatatatgc
actgaaaagaaggccatgggtatgcaaaagaacaagaaaatgaatatgga
agggactggagggaggaaaaggaagaaatgatgcaattatattgtaatat
ccaaaatagaaaaaAAAAAAAAAAAGAGTAACGGAATGGgaaaaaaaaag
aaagaaaaaagaaaaagaaaagaaaaagaaaaagaaaaaaaGCCCCCTTT
TCCCCCTTTGATTTTTCTCCTTAAAATATTCGGGCAAGAAAGGATAGATA
GAAAGGACATCAAAGTTCAGGTAACAAACTGTGTCTTCATGCCCTGACAC
AGCCTTGGCTCATTCATAGTTAACACAGAACTTAGTGAATATCATTTACA
TATGTAAATCACCCaggcttctcatagtttcttattctcttttgttttct
gttctgcctttgattctatttcaaatgaactacaggtttttaactctgcc
you can use find().
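
For "the first character that matches any of the characters specified", the closest standard member is probably std::string::find_first_of rather than find or compare. A minimal sketch, where the helper name first_match_position is made up for illustration:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the zero-based index of the first character
// of `text` that also appears in `charset`, or std::string::npos if no
// character of `text` appears in `charset`.
std::size_t first_match_position(const std::string& text, const std::string& charset)
{
    return text.find_first_of(charset);
}
```

By contrast, find(str) looks for the whole of str as one contiguous substring, which is a different question.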
@jonnin

I have tried find and find_first_of before, as below, but it didn't work. I talked to my supervisor, and he told me I can use the compare function.
if(read_temp.size() == 40 && std::string::npos == read_temp.find_first_of("@1NCAGAATTTGCATCATGAACGATGAGCTGATCGTGANGNN")){
    found = read_map.find(read_temp); // look the read up without modifying the map
    if (found == read_map.end()){
        read_map[read_temp] = 1; // not in the map yet: insert it with a count of 1
    }else{
        read_map[found->first]++; // already in the map: increment its count
    }
}
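
One thing that may explain the result: the condition above is true only when find_first_of returns npos, that is, when the read shares no character at all with the list. The three members discussed so far answer different questions. The sketch below shows them side by side; the helper names are hypothetical and only there for illustration.

```cpp
#include <string>

// Hypothetical helpers, named for illustration only.

// True when `read` contains at least one character from `charset`.
bool contains_any_of(const std::string& read, const std::string& charset)
{
    return read.find_first_of(charset) != std::string::npos;
}

// True when `pattern` occurs inside `read` as one contiguous substring.
bool contains_sequence(const std::string& read, const std::string& pattern)
{
    return read.find(pattern) != std::string::npos;
}

// True when `read` is exactly equal to `pattern` (compare returns 0).
bool is_exact_match(const std::string& read, const std::string& pattern)
{
    return read.compare(pattern) == 0;
}
```

If the goal is to find a whole read such as "1NCAGAATTT..." in the file, contains_sequence or is_exact_match is likely the closer fit. find_first_of treats its argument as a set of single characters.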
I misread the problem, not doing well today. Let me think for a min.

how about this:
	unsigned int lut[256] = {0};
		
	string theseinit = "ratatatmooblahblah";
	
	for(int i = 0; i < theseinit.size(); i++)
	   lut[(unsigned char)theseinit[i]] ++; //
   
	string testit = "zzzzzzzzzzzhxxxx";
	for(int i = 0; i < testit.size(); i++)
		if(lut[(unsigned char)testit[i]]) 
			cout <<"found " << testit[i] << " @ location " <<i << endl;


It's a lot like your map solution.
Sure.
Let me try it.
#include <iostream>
#include <string>

int main()
{
    std::string theseinit = "ratatatmooblahblah";
    std::string testit = "zzzzzzzzzzzhxxxx";
    std::string::size_type find_pos{};
    bool match{false};

    for (const auto& elem : theseinit)
    {
        if(testit.find(elem) != std::string::npos)
        {
            if(theseinit.find(elem) == theseinit.find_first_of(elem))
            //so as not to check later instances of theseinit elements that have already not matched
            {
                find_pos = testit.find(elem);
                match = true;
                break;
            }
        }
    }
    if(!match)
    {
       std::cout << "not found \n";
    }
    else
    {
       std::cout << "first matching element is '" << testit[find_pos] << "' at position " << find_pos  + 1 << "\n";
    }
}
Good call. My second loop probably needed a break once a match is found. The table should also probably be bool instead of int, and be set to true instead of incremented. My headache is better and my brain is coming back online. Minor, but better. Mine won't work on Unicode, though; the string versions can be set up to do that.

A for loop tied to find is O(N^2). I believe mine was just O(N). I couldn't find a way to use a built-in string function without increasing the work done, but I would love to see it if someone figures that out.
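
The lookup-table idea with those two changes (bool flags, stop at the first hit) could look roughly like this. It is a sketch, not the code posted earlier: the function name first_match_lut is invented here, and it assumes 8-bit char, so it covers single-byte text only.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Hypothetical helper: returns the zero-based index of the first character
// of `text` that occurs anywhere in `charset`, or std::string::npos.
// Assumes 8-bit char, so the table has 256 entries.
std::size_t first_match_lut(const std::string& text, const std::string& charset)
{
    std::array<bool, 256> present{}; // value-initialized: every flag is false

    // One pass over the character set to mark which byte values are wanted.
    for (unsigned char c : charset)
        present[c] = true;

    // One pass over the text; return as soon as a marked character is seen.
    for (std::size_t i = 0; i < text.size(); ++i)
        if (present[static_cast<unsigned char>(text[i])])
            return i;

    return std::string::npos;
}
```

Each string is walked once, so the work is proportional to the combined length of the two strings.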

With std::regex the cost is roughly O(m*n), where m is the regex size:
https://stackoverflow.com/questions/5892115/whats-the-time-complexity-of-average-regex-algorithms
With a largish search string, as in this example, there is probably no benefit, and there is also the time needed to construct the std::regex object.
# include <iostream>
# include <string>
# include <regex>

int main()
{
   std::string stringToSearch = "zzzzzzzzzzzhxxxx";

   std::string word = "ratatatmooblablahx";
   std::regex re{"[" + word + "]"};
   std::smatch m;

   std::cout << std::boolalpha << std::regex_search(stringToSearch, m, re) << "\n";
   if(std::regex_search(stringToSearch, m, re)) std::cout << m[0] << "\n";
}
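
A small variation on the regex version, as a sketch only: it runs the search once and reports where the match starts via std::smatch::position. The helper name first_match_regex is hypothetical, and it assumes the character set is non-empty and contains only letters and digits, since characters such as ']', '\', '^' or '-' have special meaning inside a bracket expression.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: returns the zero-based index of the first character
// of `text` that belongs to `charset`, or std::string::npos if none does.
// Assumption: `charset` is non-empty and holds only letters and digits, so
// it can be placed inside [...] without escaping.
std::size_t first_match_regex(const std::string& text, const std::string& charset)
{
    const std::regex re{"[" + charset + "]"};
    std::smatch m;
    if (std::regex_search(text, m, re))
        return static_cast<std::size_t>(m.position(0)); // start of the whole match
    return std::string::npos;
}
```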

Nice one!
I think it's possible to do what you're trying, but try not to rush. Those examples might lead your code to suffer from a lot of bugs and errors that are going to be hard to detect later. There are programs that might help you with that, such as Checkmarx, but I recommend you make sure you code well and detect them on your own.
Good luck.
@gunnerfunner and @jonnin
Thank you all for your comments and your help. It really helped me understand how to write this function. I really appreciate your time.
@benhart
Thanks for your comment. Sure, I will definitely write my own code.