The superfast string array finding - Follow up!!!!!!!!!

Ok. Hi!
I got an idea. It's superfast finding character method.
Now this idea has come true.
...I'd like to hear some comments and opinions please... :)

Features :
1 - No need to get length, compare every character through the string. It will give the result instantly (lighting speed) , and with perfect accuracy.
2 - Highest Optimization, tweaking and many more...
3 - Gives the best privacy for everyone, safely comparing.
4 - Save times, memory.
5 - No complex, memory leak.

Targets, Usage :
Manages a very large of database (string, characters) (PERFORMANCE)
At any time, if you don't want to keep string data, delete it, only keep the output values for further comparisons. (CONVENIENT)
Eg : You have a dictionary text file and you loaded it. If the users search words many times through your program and with your some additional features, that might cause slow. You only load and parse once and then delete it immediately, save memory and times. Faster than 1000x !!! (SPEED, MEMORY)

But simple programs please use the popular one, it's better.

Requirements :
You only need integer variables and a function which is used to parse string.

How to use :
I found an algorithm on the internet that is used to encrypt data into a hex value.

1
2
3
4
5
///////////////////////////////////////////////////////////////////////////////////////////
/*//*/unsigned int GetEncryptCode(const char *Txt){register unsigned long Encrypt = 0;int len = strlen(Txt);if(!len)return 0;Encrypt ^= 0xFFFFFFFF;
/*//*/register int size = 0;while (++size < len)
/*//*/Encrypt=(Encrypt >> 8) ^ crc32_table[(Encrypt ^ (unsigned char)Txt[size]) & 0xFF];Encrypt ^= 0xFFFFFFFF;return Encrypt;}
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////// 


Then, how to use it?

Please look at this Linear Search example (Only name finding, no word descriptions) :

Original :
1
2
3
4
5
6
7
8
9
10
11
12
13
14
char Word[65535][256]; //Suppose it contains all words of a dictionary
int nWordTotal;
char *strFind = "zoo";

bool find (char *string){
for(int i = 0; i < nWordTotal;i++)
if(!stricmp(Word[i], string))return true;return false;}

int main(){
//Load words into "Word" variable
//Try a search
find(strFind);//It's horrible! It will waste your time !!!
return 0;
}


The program trick :
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
unsigned int nWord[65535] //Suppose it contains all words of a dictionary
int nWordTotal;
char *strFind = "zoo";

bool find (char *string){
unsigned int nCheck = GetEncryptCode(string);

for(int i = 0; i < nWordTotal;i++)
if(nCheck == nWord[i])return true;return false;}


int main(){
//Load words (Example)
char temp[256];
for(int i = 0;;i++){
temp = //Source
Word[i] = GetEncryptCode(temp);
nWordTotal++;}

//Try a search
find(strFind);//It's right !!!
return 0;}


Now, remark two examples :
Memory consuming :
- Original : 65535 * 256 = 16.776.960 bytes (16 Mb)
- Optimized : 65535 * 4 = 262.140 bytes (Less than 64x !!!!)

Convenience :
- Original : Loads directly and instantly....
- Optimized : Parses input strings once.

Speed :
- Original : Checks through a function, (maybe) very slow.
- Optimized : Checks between variables and variables, and only uses a single comparison , so it's so much faster!!!!

THEN FINALLY, WHO'LL WIN?
The superfast string array finding win !!!!!!!!!!!!!!!!!!!


How to Figure out :
To get an encrypted value of a string, use :

GetEncryptCode(char*)
Attach your string to (char*) slot. Every different string will produce a different hex value, Eg :
0xAF92FC85,0x475CEE12
Ah.. I'm very tired (11 h). (??) I will continue providing guidelines information...



What do you think? It's very faster than the popular one. Maybe many further programs will use a lot of string searching in many types, so it's very proper choice.

What do you think? Feedback please!!!
Last edited on
Yes, it's certainly a good idea. You discovered the hash function:

http://en.wikipedia.org/wiki/Hash_function

It's not bad for a beginner

the problem is just to guarantee a unique value for each unique string.
the problem is just to guarantee a unique value for each unique string.


If you manage this, the universe will implode. :)
See: "pigeonhole principle".
You can get clashes but it depends on the length of your key and how evenly the hash codes are spread over that space.

To index protein sequences in a database I used a 27 character base 64 SHA-1 hash code. This gives 64^27 = 5.84600654 x 10^48 distinct hash keys which should be evenly spread over the space.

http://bioinformatics.anl.gov/seguid/overview.aspx

I have no worries about clashing keys here

Topic archived. No new replies allowed.