How does find() work internally?


Functions such as find(), how do they work internally.
Do they compare exactly every character in a str/char array until they have the right sequence? (brute force) Or how does it work?

Can it be done in any other way, or is this the fastest way to search for a string / sequence in a char array?


Best regards
Volang
The C++ standard makes a few restictions on std::find, which you can read about here:

https://en.cppreference.com/w/cpp/algorithm/find

You'll also find there an example conforming implementation. However, every provider of std::find can do it their own way, so if you want to know how your version is doing it, you'll need to follow the source code through to that function and read it.
Are you asking about std::string::find (rather than std::find)?
https://en.cppreference.com/w/cpp/string/basic_string/find

Conceptually, std::string::find is similar to std::search
https://en.cppreference.com/w/cpp/algorithm/search
a string is an unordered array.
basic theory is going to tell you that to find an item in an unordered array, you have to iterate and look for it one by one. However you slice it up, you still have to look at it iteratively.

consider an array of char:
"the quick brown fox jumped over a lazy dog"
find the first instance of 'x'.
its not sorted, so binary search is out. its not hashed. there isn't any information to exploit... its a chunk of text that could be anything, could start with x, x could be last, anywhere in the middle, or not even there at all. Lets consider that 'not even there at all'... do you know a way to figure that out without touching each letter? It can't be done. So the worst case of 'not even there' is at least O(n).

the only way you can make this faster is, if the string is very, very large, you can multi-thread it and search N smaller strings. And, if you are looking for a substring and not a single letter, you can overlap the smaller strings by the size of the search-target to ensure you don't cut it in the middle of the target. Creating a thread costs a bit of time: you have OS level requests that need to be made and acknowledged. If the strings are small, threading this up will take longer than just looping once.

for specific programs and special snowflake code, you may be able to write a faster search by assisting the search with some kind of ordering or pseudo-ordering that you do upon data acquisition. For example if you had a lookup table of the array locations of every instance of the letter 'x' because you knew you would need it later and built it.... you could search faster...
Last edited on
Do they compare exactly every character in a str/char array until they have the right sequence? (brute force) Or how does it work?

Can it be done in any other way, or is this the fastest way to search for a string / sequence in a char array?
std::string::find() works by linear search ("brute force", as you call it), because the standard gives various requirements that force the implementation of std::string to use a naive array of characters.
However, other forms of search are possible if the internal data structure is more sophisticated: https://en.wikipedia.org/wiki/Rope_(data_structure)
Last edited on
There are some hideously clever string matching algorithms. A big insight is to work from the back of the pattern string rather than the front. The wikipedia page is a nice place to start:
https://en.wikipedia.org/wiki/String-searching_algorithm
Thank you all for your replays. Very helpful and gave me some ideas :)

Good luck today!
The standard library makes it easy to use the Boyer–Moore and the Boyer–Moore–Horspool algorithms. I haven't tested them but I suspect that they are mostly useful when searching a lot of text. If you're only using it to search a short text once the extra preprocessing step is probably not worth it.

https://en.cppreference.com/w/cpp/utility/functional/boyer_moore_searcher
https://en.cppreference.com/w/cpp/utility/functional/boyer_moore_horspool_searcher
Last edited on
Topic archived. No new replies allowed.