The best way to match and replace

I have a large table of words that I need to process.
Since it has to handle a massive number of calls, I want the process to run as fast as possible.

What I've tried:
declaring
const char* WORDS[10000] = {new const char{12, 23 ...};
and matching the document position by position, from 0 to the end.
So if a document has a size of 20000,
it loops over each index and tries every entry of WORDS from 0 to 10000,
then loops again to check whether each char is equal.

That makes a loop of roughly 20000 * 10000 * (length of WORDS[m]) iterations.

I tested it and it is very slow (it takes several seconds).


So I wonder, what if I write it like this:
void match_p(char const* str, int start, char const* &out, int &out_size)
{
	// matching each using if statement
	if(str[start] == 65 && str[start + 1] == 65)
	{
		out = "AAAA";
		out_size = 4;
	}
	else if(...
	// I could use some macro to generate this code
}


Would this be the fastest way to do so?

PS: Since I am still a beginner, I haven't tried many approaches yet, because I keep getting syntax errors while coding. I decided to ask for the best approach first so that I can then start digging into the code. Thanks :)
What do you actually need? It isn't clear from your post. And you probably really should use C++ containers instead of char arrays.
Let's say I have two word lists:

List 1:
tager, robbit, cot, dag, shep
List 2:
tiger, rabbit, cat, dog, sheep

And an arbitrary input string, argv[1].

Say the input string may contain words from List 1.
I want to match those words and replace them with the corresponding words from List 2.

What is the fastest way to do so? ( Fastest processing speed )
I would use a std::map<std::string, std::string> to match values in list 1 to values in list 2. As you read in your input, try to find each word in the map. If the find is successful, output the value from the map. If the find is unsuccessful, output the value from the input.
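The lookup described above could be sketched roughly as follows. This is only a minimal illustration, assuming the input words are separated by whitespace; the helper name replace_words is made up for this example, and `auto` requires a C++11 compiler.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: replaces every whitespace-separated word of `text`
// that appears as a key in `replacements` with the mapped value.
// Words that are not in the map are copied through unchanged.
std::string replace_words(const std::string& text,
                          const std::map<std::string, std::string>& replacements)
{
    std::istringstream in(text);
    std::string word, result;
    while (in >> word) {
        auto it = replacements.find(word); // O(log n) lookup in the map
        if (!result.empty())
            result += ' ';                 // separate output words by one space
        result += (it != replacements.end()) ? it->second : word;
    }
    return result;
}
```

Called with a map holding the pairs tager/tiger and cot/cat, replace_words("tager hello cot", map) would return "tiger hello cat". Note that runs of whitespace in the input collapse to single spaces in this sketch.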
The fastest way is usually std::unordered_map<std::string, std::string>, which offers constant-time lookup on average.
In the following example I assume that you have a file named "dictionary.txt" with a structure similar to:
tager tiger
robbit rabbit
//etc...
And that you need to output the words from the argument list to cout, replacing those that exist in list 1 with the corresponding words from list 2:
#include <fstream>
#include <iostream>
#include <string>
#include <unordered_map>

int main(int argc, char* argv[])
{
    // Maps each word from list 1 to its replacement from list 2.
    std::unordered_map<std::string, std::string> dictionary;

    /* Read the key/value pairs from the file into the dictionary.
     * Probably the most costly part of the program for large files.
     */
    std::ifstream input("dictionary.txt");
    std::string key, value;
    while (input >> key >> value) {
        dictionary[key] = value;
    }

    for (int i = 1; i < argc; ++i) { // Loop over all arguments except the program name
        std::string word(argv[i]); // Create a std::string from the C string
        auto it = dictionary.find(word); // Look the word up only once
        if (it != dictionary.end())
            word = it->second; // Replace the word if it is in the dictionary
        std::cout << word << ' '; // Output it
    }
}
Note that this example uses C++11 features (unordered_map and auto, for example).
If you do not have access to a C++11 compiler (which I recommend you get), you can replace unordered_map with map and use dictionary.find() with an explicit iterator type, at the cost of some speed.

With the words from the example and the input:
tager hello world cot while robbit
the output will be:
tiger hello world cat while rabbit
Thanks, I will make it this way:)

What if the input doesn't contain space characters?
Looping over the arguments won't work in that case :\
If the characters exist in the user's preferred system encoding, and both the input string and the file are saved in that encoding, it will. Otherwise, no. I have the Russian language set, so Cyrillic letters work fine, but some German symbols do not. After I switched the encoding for non-Unicode applications to German, those letters started to work, but the Cyrillic ones stopped.
If somebody with experience of working with Unicode in the standard library comes along, they might be able to help you more.
You can simply keep the first array sorted and apply either std::lower_bound or std::equal_range to it.
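A rough sketch of that idea, assuming the two lists are stored together as pairs in a vector that has already been sorted by the word to be replaced. The names Entry and lookup are made up for this illustration, and the lambda requires C++11.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// (word from list 1, replacement from list 2); hypothetical name.
typedef std::pair<std::string, std::string> Entry;

// Hypothetical helper: `table` must be sorted by the first element of each pair.
// Returns the replacement for `word`, or `word` itself if it is not in the table.
std::string lookup(const std::vector<Entry>& table, const std::string& word)
{
    // Binary search for the first entry whose key is not less than `word`.
    auto it = std::lower_bound(table.begin(), table.end(), word,
        [](const Entry& e, const std::string& w) { return e.first < w; });

    // lower_bound only finds a position; check that the key really matches.
    if (it != table.end() && it->first == word)
        return it->second;
    return word;
}
```

For the lists in this thread the sorted table would be cot, dag, robbit, shep, tager, so lookup(table, "dag") would return "dog" and lookup(table, "hello") would return "hello". The lookup cost is O(log n), the same order as std::map, but the data sits in contiguous memory.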