Searching Algorithm

1011101111101111011000110011

Me and a friend are working on a project that requires us to create a C++ program (that isnt object oriented) that reads and searches for this pattern of 28 bits in a specific txt file of 2000 bits. Once it finds a full pattern match, it should assign the beginning bit position of the match. e.g. "Match occurs starting at bit position 43".

One constraint: assume no overlapping of pattern matches.

I am just beginning to learn algorithms for C++, so i have asked for help in many places. Any and all help is greatly appreciated.

Edit: to clarify, i am not asking anyone to write this for me. I just want help in regards on where to start, What algoritm you think best fits my case, etc.
Last edited on
if your file is a consequence of chars '0' and '1' then you can use a library function strstr
You could read it into a string then use the std::string::find method. http://www.cplusplus.com/reference/string/string/find/

or the c equivalent would be what interester mentioned. Though if you do the c way you will have to find the index based on the pointer. Maybe subtract the value of the first position from that.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
const char bitpattern[] = {"1011101111101111011000110011"};

LET occurence_start = 0
start:

WHILE getnextchar() != bitpattern[0]
DO
    increment occurence_start
END

FOR k in 1 to len(bitpattern)
DO
    IF getnextchar() == bitpattern[k] then
    DO
        increment k
    ELSE
        increment occurence_start
        SET k = 0
        GOTO start
    END IF
END

print (Match occurs starting at bit position " + occurence_start)
occurence_start += k
GOTO start 


It might need a bit of tweaking to make it linear, but this should find all matches without requiring extra space. In the worst case it is not to far from linear time because of the small subset that is being searched (O(kN)), but if the subset or the search area grows larger, this algorithm might be approaching quadratic time in the worst case. i.e. N2.
So this is my first try:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
#include <ctime>
#include <iostream>
#include <string>
#include <fstream>
#include <stdio.h>

using namespace std;


void readFromSearchFile(int data1[], int data2[])
{
   ifstream myfile ("charbits.txt");
   if (myfile.is_open())
   {
       int i = 0;
       while ( myfile.good() && i<28)
       {
           data1[i] = myfile.get() - '0';
           i++;
       }
       i=i-28;
       while ( myfile.good() )
       {
           data2[i] = myfile.get() - '0';
           i++;
       }
       myfile.close();
   }

   else cout << "Unable to open file";
}

int main()
{
   clock_t t1, t2;
   //Input initializations

   int data[2000];
   int key[28];
   int result[40];
   for(int i=0; i<40; i++)
   {
        result[i]=-1;
   }
   readFromSearchFile(key,data);

   t1=clock();
   for(long int i = 1; i<=1000000; i++)
   {
   //routine

        int k=0;
        int a=0;
        for(int j=0; j<1972; j++)
        {
            if(data[j]==key[0])

            //while(data[j]==key[k])
            //{
            //    k++;
            //    j++;
            //}
            if(k>27)
            {
                result[a]=j-27;
                a++;
                j=j-1;
            }
            else
            {
                j=j-k;
            }
            k=0;
        }
   }
   t2=clock();
   //output
    int j=0;
   while(result[j]!=-1)
   {
        cout << "Match occurs starting at bit position " << result[j] << "\n";
        j++;
   }

   cout << "Time difference is " << (t2-t1)/CLK_TCK << " " << " microseconds.";
   return 0;
}


This compiles in 16 seconds. Now i just need it to go faster.

Smac89, i'll start playing with that code and see if i can get a faster result


If anyone wants to try it themselves, here's the data in the text file, where the first 28 bits are the pattern itself (name the text file charbits.txt):
10111011111011110110001100110111011100011110010110000
111111010010110010101010101010111100100110111000111101100111
111011111110011111100111001110110001001110110010110000001010
110101001101001001101010100111011100110100111010100000100110
101011111100100111011110001001110111101011010001101111011011
001001101011000011000101101011111000100010100000110001101101
111011010101100010100000111100110010100101001010011110001111
010111101100001011010010110111111111010010111110000010000011
011111101110000000101000010001101111001100001001101101101001
111000100111011011100110101110111001001011011010000110110111
001000001010000011101011101011011001101010101001011100100010
000110011111000011000001011110001110011110010001010101001000
000111100101110001001011011111010011011000001100100010001010
101010011100101111110010111010010000101011000001100110111100
001011001011000101011010000111000110111101110011010101001111
111100111001000001101100101101101101100000011101010000100100
000001100010110011101000100001100001011000101101110110010011
100010001110101010001111110000010011101110110000011001011100
001001100010111100110110100001000101110010011001100110100110
111101011101100010010100100001001100101100010010110100010110
011110101100011111100001101111111001110110110000110011111101
100001000000100101000001101000101000110001100000110110010010
110010010010011101100000000111110100110010101110111011111011
110110001100100011110001000110000111100010011111010001101100
101000101001101111001010101000000001100001101101110011010111
110001100001001011100101110101101101011000101100111011100000
010000011110110111011111011110110001100110010010011110010011
010111000011110010011010110001101110010011001111001011101111
101111011000110011101110111110111101100011001110010001110001
000111011110001010100101001001100111111100111001110001111101
111101101000111000011100001001011110100111101100100111000011
011100101101001101100011100010111110011000100011111110100010
010101110111110111101100011001100100001001111100011001011010
1111000000110111111011101110110001101010001101001011100
Just tried it with if statements, and got it to compile in 4(ish) seconds!
Its not pretty, but it works using nested if statements:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
#include <ctime>
#include <iostream>
#include <string>
#include <fstream>
#include <stdio.h>

using namespace std;


void readFromSearchFile(int data1[], int data2[])
{
   ifstream myfile ("charbits.txt");
   if (myfile.is_open())
   {
       int i = 0;
       while ( myfile.good() && i<28)
       {
           data1[i] = myfile.get() - '0';
           i++;
       }
       i=i-28;
       while ( myfile.good() )
       {
           data2[i] = myfile.get() - '0';
           i++;
       }
       myfile.close();
   }

   else cout << "Unable to open file";
}

int main()
{
   clock_t t1, t2;
   //Input initializations

   int data[2000];
   int key[28];
   int result[40];
   for(int i=0; i<40; i++)
   {
        result[i]=-1;
   }
   readFromSearchFile(key,data);

   t1=clock();
   for(long int i = 1; i<=1000000; i++)
   {
   //routine
        int a=0;
        for(int j=0; j<1972; j++)
        {
            if(data[j]==key[0])
                if(data[j+1]==key[1])
                    if(data[j+2]==key[2])
                        if(data[j+3]==key[3])
                            if(data[j+4]==key[4])
                                if(data[j+5]==key[5])
                                    if(data[j+6]==key[6])
                                        if(data[j+7]==key[7])
                                            if(data[j+8]==key[8])
                                                if(data[j+9]==key[9])
                                                    if(data[j+10]==key[10])
                                                        if(data[j+11]==key[11])
                                                            if(data[j+12]==key[12])
                                                                if(data[j+13]==key[13])
                                                                    if(data[j+14]==key[14])
                                                                        if(data[j+15]==key[15])
                                                                            if(data[j+16]==key[16])
                                                                                if(data[j+17]==key[17])
                                                                                    if(data[j+18]==key[18])
                                                                                        if(data[j+19]==key[19])
                                                                                            if(data[j+20]==key[20])
                                                                                                if(data[j+21]==key[21])
                                                                                                    if(data[j+22]==key[22])
                                                                                                        if(data[j+23]==key[23])
                                                                                                            if(data[j+24]==key[24])
                                                                                                                if(data[j+25]==key[25])
                                                                                                                    if(data[j+26]==key[26])
                                                                                                                        if(data[j+27]==key[27])
                                                                                                                        {
                                                                                                                            result[a]=j;
                                                                                                                            a++;
                                                                                                                            j=j+27;
                                                                                                                        }


        }
   }
   t2=clock();
   //output
    int j=0;
   while(result[j]!=-1)
   {
        cout << "Match occurs starting at bit position " << result[j] << "\n";
        j++;
   }

   cout << "Time difference is " << (t2-t1)/CLK_TCK << " " << " microseconds.";
   return 0;
}

If anyone can make this faster, you're a saint! :D
Maybe something like:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
#include <iostream>
#include <string>

int main()
{
    const std::string file = 
        "10111011111011110110001100110111011100011110010110000"
        "111111010010110010101010101010111100100110111000111101100111"
        "111011111110011111100111001110110001001110110010110000001010"
        "110101001101001001101010100111011100110100111010100000100110"
        "101011111100100111011110001001110111101011010001101111011011"
        "001001101011000011000101101011111000100010100000110001101101"
        "111011010101100010100000111100110010100101001010011110001111"
        "010111101100001011010010110111111111010010111110000010000011"
        "011111101110000000101000010001101111001100001001101101101001"
        "111000100111011011100110101110111001001011011010000110110111"
        "001000001010000011101011101011011001101010101001011100100010"
        "000110011111000011000001011110001110011110010001010101001000"
        "000111100101110001001011011111010011011000001100100010001010"
        "101010011100101111110010111010010000101011000001100110111100"
        "001011001011000101011010000111000110111101110011010101001111"
        "111100111001000001101100101101101101100000011101010000100100"
        "000001100010110011101000100001100001011000101101110110010011"
        "100010001110101010001111110000010011101110110000011001011100"
        "001001100010111100110110100001000101110010011001100110100110"
        "111101011101100010010100100001001100101100010010110100010110"
        "011110101100011111100001101111111001110110110000110011111101"
        "100001000000100101000001101000101000110001100000110110010010"
        "110010010010011101100000000111110100110010101110111011111011"
        "110110001100100011110001000110000111100010011111010001101100"
        "101000101001101111001010101000000001100001101101110011010111"
        "110001100001001011100101110101101101011000101100111011100000"
        "010000011110110111011111011110110001100110010010011110010011"
        "010111000011110010011010110001101110010011001111001011101111"
        "101111011000110011101110111110111101100011001110010001110001"
        "000111011110001010100101001001100111111100111001110001111101"
        "111101101000111000011100001001011110100111101100100111000011"
        "011100101101001101100011100010111110011000100011111110100010"
        "010101110111110111101100011001100100001001111100011001011010"
        "1111000000110111111011101110110001101010001101001011100";
        
    const std::string key = "1011101111101111011000110011";
    
    std::size_t pos = 0;
    while((pos = file.find(key, pos)) != std::string::npos)
    {
        std::cout << "key found at position: " << pos << std::endl;
        pos += key.size();
    }
}


http://coliru.stacked-crooked.com/a/d0a8c6449186c2b5
If you are trying to squeeze out the last bit of performance, you might want to look at the Boyer-Moore algorithm. It might give you a slight improvement.

With binary strings (as opposed to strings using a larger set of characters) it may not give you much improvement. And the string::find() function may use Boyer-Moore anyway (I have no idea).
Topic archived. No new replies allowed.