searching problem (help needed asap)

Pages: 12345
bool five_chars_are_repeated( const std::string& str, std::size_t pos ) ;

Was the function you were meant to implement.

Did you test your code? What value do you suppose pos holds when you use it on line 4? Why are you returning 0 after you've returned true or false?

If you don't understand what you're doing, ask for clarification. Don't just make stuff up in the hopes that someone will eventually write it for you. Think about what you're doing.
Oops - double post.
Last edited on
obviously am very confused, need clarification pleaseeee.
1
2
3
4
5
6
7
8
9
bool tand( string & temp_file_arr)
{
	size_t pos;
	if( temp_file_arr.size() < pos + 9 ) return true ;
	string first_five = temp_file_arr.substr(0,5);
	string next_five = temp_file_arr.substr(5,5);
	if( first_five == next_five ) return true ;
    else return false ;
}


was wondering where is pos supposed to get its initialized value from?
bool five_chars_are_repeated( const std::string& str, std::size_t pos ) ;

In the above prototype (posted now for the third time) pos gets it's value from the calling code.
Am sorry it sounds embarrasing that still i dont really understand.
if pos gets its value from its calling code, that means pos should have been initialized somewhere in the main. am askin ?
if pos gets its value from its calling code, that means pos should have been initialized somewhere in the main. am askin ?


In a sense. There need be no variable named pos in main, and the code needn't be called only from main.

Here is one way you might use the function:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
#include <string>
#include <iostream>

bool five_chars_are_repeated( const std::string& str, std::size_t pos ) ;
bool any_five_chars_repeated( const std::string & str) ;

int main()
{
     std::string input ;
     std::cin >> input ;

    if ( any_five_chars_repeated(input) )
        std::cout << "Contains an immediately repeating 5 character sequence.\n" ;
    else
        std::cout << "Doesn't contain an immediately repeating 5 character sequence.\n" ;

}

bool any_five_chars_repeated(const std::string& str)
{
    for ( unsigned i=0;  i < str.length()-5; ++i )
        if ( five_chars_are_repeated( str, i ) )
            return true ;

    return false ;
}

bool five_chars_are_repeated( const std::string& str, std::size_t pos )
{
    // ...
}
Honestly i appreciate the time you guys are taking to put me through this.but at this point am totally confused.
1. string str; is supposed to be the whole string to be searched. right?
2. in the immediate code above, the function (five_chars_are_repeated) is being called in the function (any_five_chars_are_repeated)?
3. i still dont get where pos gets its value from atol. cos i get an error saying pos is uninnitialized
Last edited on
1. string str; is supposed to be the whole string to be searched. right?

"str" is the name given to the argument fed to any_five_chars_repeated and also the argument fed to five_chars_are_repeated. They refer to the same variable, although if we renamed str to abracadabra in five_chars_are_repeated and left it as str in any_five_chars_repeated, it wouldn't make any difference; the name of a reference doesn't have any direct relationship to what it aliases. They also happen to refer to the variable in main known as input, although they could refer to any variable that is fed to the function.


2. in the immediate code above, the function (five_chars_are_repeated) is being called in the function (any_five_chars_are_repeated)?

Yes.


3. i still dont get where pos gets its value from atol. cos i get an error saying pos is uninnitialized

You, apparently, still haven't paid any attention to the function prototype that has been posted four times which differs from the one you're trying to implement. Change the prototype to match.

In the code above, pos is initialized with the value of i in any_five_chars_repeated.
Must i creat two functions to get it done? Because thats what is getting me confused
> Must i create two functions to get it done?

Four functions including main()

1
2
3
4
std::string file_to_string( const std::string& path2file ) ;
bool five_chars_are_repeated( const std::string& str, std::size_t pos ) ;
bool any_five_chars_repeated( const std::string & str) ;
int main() ;
Whooop!!!!!!! Its the one with the size_t thats giving me the confusion really
JLB/CIRE....... this is wat i came up with. running it gives me a position of any five repeated. but am still in doubt of the code. am i on the right track? pleassseee







1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
bool nextfive( string & file_arr, size_t pos )
{
	if( temp_file_arr.size() < 12 ) return false ;
	string first_five = file_arr.substr(pos,5);
	string next_five = file_arr.substr(pos+1,5);
	if (first_five ==next_five)  return true ;
    else return false ;
}
	
bool tand( string & file_arr)
{
	for (int  p =0; p < file_arr.length() -5; p++)
	{
		if (nextfive(file_arr, p))
		{
		  
			   cout<< nextfive << p <<endl;
		}
	  
	} 
	
	
	return 0;
}
I know this might not actually help, but I'd rather use the KMP (Knuth-Morris-Pratt) Algorithm for pattern matching in this case.

Here's it:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
#include <iostream>
#include <vector>
using namespace std;
 
vector<int> KMP(string S, string K)
{
        vector<int> T(K.size() + 1, -1);
        vector<int> matches;
 
        if(K.size() == 0)
        {
            matches.push_back(0);
            return matches;
        }
        for(int i = 1; i <= K.size(); i++)
        {
                int pos = T[i - 1];
                while(pos != -1 && K[pos] != K[i - 1]) pos = T[pos];
                T[i] = pos + 1;
        }
 
        int sp = 0;
        int kp = 0;
        while(sp < S.size())
        {
                while(kp != -1 && (kp == K.size() || K[kp] != S[sp])) kp = T[kp];
                kp++;
                sp++;
                if(kp == K.size()) matches.push_back(sp - K.size());
        }
 
        return matches;
}
@werlay:
1
2
3
4
5
1. Try to compile your code first. Fix the compile time errors.

2. Once it compiles, run it and test it if it works.

3. If it does not, modify the code and go back to step 1.


1
2
3
4
5
6
7
8
9
10
11
bool nextfive( string & file_arr, size_t pos )
{
	if( temp_file_arr.size() < 12 ) return false ; // *** won't compile
        // *** there is also a logical error

	string first_five = file_arr.substr(pos,5); // fine

	string next_five = file_arr.substr(pos+1,5); // *** logical error
	if (first_five ==next_five)  return true ;
    else return false ;
}
Last edited on
Actually, this is how I solved it (for .txt files).

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
#include <fstream>
#include <iomanip>
#include <string>
#include <vector>

using namespace std;

// Knuth-Morris-Pratt Algorithm -- Nothing to be explained. Google it.
vector<unsigned> KMP(string S, string K){
	vector<unsigned> T(K.size() + 1, -1);
	vector<unsigned> matches;

	if (K.size() == 0){
		matches.push_back(0);
		return matches;
	}
	for (unsigned i = 1; i <= K.size(); i++){
		int pos = T[i - 1];
		while (pos != -1 && K[pos] != K[i - 1])
			pos = T[pos];
		T[i] = pos + 1;
	}

	unsigned sp = 0;
	unsigned kp = 0;
	while (sp < S.size()){
		while (kp != -1 && (kp == K.size() || K[kp] != S[sp]))
			kp = T[kp];
		kp++;
		sp++;
		if (kp == K.size())
			matches.push_back(sp - K.size());
	}
	
	return matches;
}

int main(int argc, char *argv[]){
	fstream inFILE_S("text.txt", ios::in); // Input text file
	fstream inFILE_K("search.txt", ios::in); // Search string -- word/phrase to be searched
	fstream outFILE("KMP.txt", ios::out); // Output -- the positions where the word given in the file "search" was found in the file "text"

	string S;  // The string holding the whole text
	string Aux; // The string which will read line by line the text
	string K;  // The word/phrase to be found

	getline(inFILE_K,K); // Read the word/phrase to be found
	while (inFILE_S.good()){ 
		getline(inFILE_S,Aux); // Read each line of the text
		S.append(Aux); // Add it to the string
	}	

	vector<unsigned> matches(KMP(S,K)); // Call of function KMP. Initialise a vector with all positions found.
	vector<unsigned>::iterator it;
	unsigned counter = 0;
        // Printing out the data
	for (it = matches.begin(); it != matches.end(); ++it){
		counter++;
		outFILE << "Occurrence number " << setw(3) << counter << ": " << "From position " << setw(4) << *it <<
			" to position " << setw(4) << *it + K.size() << " in the given text.\n";
	}

	return 0;
}


Works like a charm. I hope it helps you.

PS: It might be a little different from your task, but I did this one to prove the power of the Knuth-Morris-Pratt pattern matching algorithm. This algorithm saved me from hours of coding not only once.

PS 2: Using the std::string::append allows you to put the whole file into one single string. Only after that can you actually write the pattern matching code.

EDIT: The file "text.txt" is your "file_name.dat", the "search.txt" holds the word which you are looking for, while the "KMP.txt" holds the result, all the positions of the found matches.

EDIT 2: I hope I was clear, but if not, I will explain each line in detail.

EXAMPLE:

text.txt
There’s a family at our school from the Ukraine. Each morning, the mom walks her five kids to our school, drops off the two oldest children at the flagpole and then walks back home with the three youngest. But before she leaves, she swings past my classroom to check on Alex. She looks through the window, catches his eye, and smiles. Then she waves to me and repeats the same procedure outside her other son’s room. She wants to make sure they made it safely into their classrooms. Later, when school’s over, she waits for her two oldest kids at the flagpole, and she smiles at me when she sees Alex. And I smile back.

The Ukrainian mom does not sign permission slips for her sons to go on field trips. She’s not comfortable with the idea of letting them leave the school, so she usually keeps them home on those days.

Last week, while I was collecting permission slips for an upcoming field trip, Alex asked to spend the day in his older brother’s classroom so that he wouldn’t have to stay home. I spoke with the other teacher to make the arrangements and we talked briefly about the family. We agreed that the Ukrainian mom was “over-protective.”

That’s right. We derisively called this wonderful mom “over-protective.”

This one got to me more than the others. Maybe it was the proximity to Christmas. Or maybe it was the age of the victims.

Or maybe this time we have to face the fact that we’re entirely unable to protect our most innocent and our most vulnerable from our most evil. And their weapons.

Like you, I’m supposed to go back to school tomorrow and talk to my students. I’m supposed to make them feel safe. I’m sure I’ll think of something. And we’ll get through the day, and then the week and then the year.

But I’ll tell you this: I have no idea what to say to the over-protective Ukrainian mom when I see her at the flagpole.

I’m not even sure I’ll be able to look her in the eye.


search.txt
the


KMP.txt
Occurrence number 1: From position 36 to position 39 in the given text.
Occurrence number 2: From position 63 to position 66 in the given text.
Occurrence number 3: From position 116 to position 119 in the given text.
Occurrence number 4: From position 143 to position 146 in the given text.
Occurrence number 5: From position 160 to position 163 in the given text.
Occurrence number 6: From position 186 to position 189 in the given text.
Occurrence number 7: From position 294 to position 297 in the given text.
Occurrence number 8: From position 368 to position 371 in the given text.
Occurrence number 9: From position 400 to position 403 in the given text.
Occurrence number 10: From position 440 to position 443 in the given text.
Occurrence number 11: From position 465 to position 468 in the given text.
Occurrence number 12: From position 547 to position 550 in the given text.
Occurrence number 13: From position 730 to position 733 in the given text.
Occurrence number 14: From position 750 to position 753 in the given text.
Occurrence number 15: From position 761 to position 764 in the given text.
Occurrence number 16: From position 794 to position 797 in the given text.
Occurrence number 17: From position 917 to position 920 in the given text.
Occurrence number 18: From position 941 to position 944 in the given text.
Occurrence number 19: From position 1010 to position 1013 in the given text.
Occurrence number 20: From position 1015 to position 1018 in the given text.
Occurrence number 21: From position 1036 to position 1039 in the given text.
Occurrence number 22: From position 1081 to position 1084 in the given text.
Occurrence number 23: From position 1108 to position 1111 in the given text.
Occurrence number 24: From position 1249 to position 1252 in the given text.
Occurrence number 25: From position 1254 to position 1257 in the given text.
Occurrence number 26: From position 1274 to position 1277 in the given text.
Occurrence number 27: From position 1318 to position 1321 in the given text.
Occurrence number 28: From position 1329 to position 1332 in the given text.
Occurrence number 29: From position 1376 to position 1379 in the given text.
Occurrence number 30: From position 1489 to position 1492 in the given text.
Occurrence number 31: From position 1602 to position 1605 in the given text.
Occurrence number 32: From position 1674 to position 1677 in the given text.
Occurrence number 33: From position 1687 to position 1690 in the given text.
Occurrence number 34: From position 1692 to position 1695 in the given text.
Occurrence number 35: From position 1705 to position 1708 in the given text.
Occurrence number 36: From position 1710 to position 1713 in the given text.
Occurrence number 37: From position 1773 to position 1776 in the given text.
Occurrence number 38: From position 1825 to position 1828 in the given text.
Occurrence number 39: From position 1884 to position 1887 in the given text.


Though, the KMP algorithm is a case-sensitive pattern matching algorithm, so whatever word/phrase you enter, you have to pay attention to upper or lower case letters.

~ Raul ~
Last edited on
jumper007,

There are no comments in your code.
nothing to explain what string K and S are,
what goes in the text files,
how to make the program work or what it's doing.
Check it now. It's edited with explanations.

~ Raul ~

PS: I really don't understand why some people try to "re-invent the wheel", when there are already fully optimized algorithms for specific things (KMP being one of them), but whatever, it's everyone's choice whether to use an algorithm or their own code.

PS 2: The only reason for which I wrote the code was that no one actually solved the problem from what I saw, and even if someone did, it was not the optimal solution to it (I just checked if any of the posts had a solution using KMP, and since they didn't, it is not optimal).

PS 3: I guess @werlay can close the topic.

If you need a full explanation (on KMP), write me in PM and I'll gladly answer you.
Last edited on
@JLB
1
2
if( temp_file_arr.size() < 12 ) return false ; // *** won't compile
        // *** there is also a logical error 


: can i do without this part?

string next_five = file_arr.substr(pos+1,5); // *** logical error


i couldnt find the logical error here.
bool five_chars_are_repeated( const std::string& str, std::size_t pos ) ;

Take a sheet of blank paper and a pencil. Now, walk through your logic, writing down what is happening.

For example:

> can i do without this part?

Let str be "abcd12345yz" (11 chars) and pos be 16.
Is a check for string length required?

Let str be "abcd1234512345z" (15 chars) and pos be 4.
What should the check for string length be?

etc.


> i couldnt find the logical error here

Let str be "abcd1234512345xyz", and pos be 4
Which are the two substrings to be compared?
At what position does the second substring start? Is it pos+1 (4+1)?

etc.
Pages: 12345