Performance when using TCHAR

I have started to move over to using Unicode, wide-character null-terminated strings in my Windows programmes. Accordingly I set the 'Use Unicode Character Set' Visual C++ compiler option. It is my understanding that once you do that, the many macros which determine whether you transparently call ...A() or ...W() API functions automatically shift over to calling the wide-character variants. As this is a compile-time setting, all the choices are made and hardcoded into the resultant executable at compile/link time, BEFORE it is ever run. Therefore using, for example, the macro GetOpenFileName() in the source code instead of specifically calling GetOpenFileNameW() has no impact on run-time performance.

The next logical step, instead of explicitly using wchar_t, is to declare null-terminated string character arrays as TCHAR. Then, so long as I also employ the _tcs... variants of the CRT string functions and use the TEXT() or _T() macros to create string literals, the preprocessor will choose, again transparently, whether to create an executable using standard multibyte or Unicode wide characters (and their associated functions), all determined by the 'Use Unicode Character Set' switch. That way I can cover both eventualities with the same source code.
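
To make the mechanism concrete, here is a minimal sketch of the same kind of compile-time switch in portable C++. The names DEMO_UNICODE, demo_tchar, DEMO_T, demo_tcslen and the two helper functions are hypothetical stand-ins for UNICODE/_UNICODE, TCHAR, _T() and _tcslen. They are not SDK names, and the real headers are more involved. The only point is that the character type, the literal prefix and the function that gets called are all fixed before the program ever runs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical stand-in for the UNICODE/_UNICODE switch (not a real SDK macro).
// Comment it out to get the narrow build of this sketch.
#define DEMO_UNICODE

#ifdef DEMO_UNICODE
typedef wchar_t demo_tchar;          // plays the role of TCHAR
#define DEMO_T(x) L##x               // plays the role of _T() / TEXT()
#define demo_tcslen std::wcslen      // plays the role of _tcslen
#else
typedef char demo_tchar;
#define DEMO_T(x) x
#define demo_tcslen std::strlen
#endif

// Size of the character type selected at compile time.
std::size_t demo_char_size()
{
    return sizeof(demo_tchar);
}

// Length of a literal written once and compiled as either narrow or wide.
std::size_t demo_length()
{
    const demo_tchar* text = DEMO_T("Hello");
    return demo_tcslen(text);
}
```

With DEMO_UNICODE defined, demo_length() compiles to a direct call to std::wcslen on a wide literal. No test of the switch is left in the generated code.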

So, with all that - I THINK!!! - properly under my belt, I am fairly sure that using TCHAR and its friends will not affect run-time performance at all. However, in his otherwise excellent article http://www.codeproject.com/Articles/76252/What-are-TCHAR-WCHAR-LPSTR-LPWSTR-LPCTSTR-etc the author makes it sound as if using Unicode EXPLICITLY through wchar_t, ...W() API functions and wcs... CRT calls is faster than the TCHAR alternative.

At the end of the day my question is - have I got the right end of the stick; TCHAR makes no difference to executable performance?
I've ranted about TCHARs numerous times on this forum.

The next logical step, instead of explicitly using wchar_t is to declare null-terminated string character arrays as TCHAR.


Explicitly using wchar_t, as far as I'm concerned, should be the final step. Moving toward TCHAR is not an improvement.

So, with all that - I THINK!!! - properly under my belt, I am fairly sure that using TCHAR and its friends will not affect run-time performance at all


You are correct about everything you've said so far, including this statement.

TCHARs are just a typedef... and the TCHAR functions/structs are just macros. There is 0 runtime penalty for using them.

TCHAR makes no difference to executable performance?


Yes. You are correct.


But TCHARs are retarded and I strongly recommend against using them. See this post for a list of reasons why:

http://www.cplusplus.com/forum/windows/105027/#msg566904
VERY interesting posts Disch!

It sounds like you would recommend not even using the crutch of the function-name macros, and for example not using CreateWindowEx() even with Use Unicode Character Set turned on, but always calling CreateWindowExW() or the other ...W() functions directly?
That's what I do.
You could make your own TCharToWideChar() if you really wanted to. Just saying.
You could. Or you could just use wide chars for everything.

Also, TCharToWideChar would be inefficient if TCHAR=wide char. Unnecessary string copy and all that.
Although your original question has been answered, unless I missed this detail, no one mentioned that all this magic is performed by the preprocessor. So by the time the compiler proper runs, TCHAR is already fixed as a typedef for either char or wchar_t, depending on whether UNICODE and _UNICODE are defined or not.

Let me state what works for me on this issue. Rather than going the way of Visual Studio, I explicitly place the following #defines at the top of my source code *.cpp files...

#ifndef  UNICODE            
#define  UNICODE           
#endif                      
#ifndef _UNICODE     
#define  _UNICODE
#endif 



That is, unless I specifically want an ansi build. But here we'll assume we want wide character.

Then, whenever you need a null terminated character string literal, simply preface it with an 'L', i.e., L"Hello, World!".

Next, simply use the macroed functions, i.e., CreateWindowEx() instead of CreateWindowExW(), etc.

Now let's analyze what I've just done, and compare it to the alternatives. First, because the character set to use is hardcoded right into the source, there is no possibility of someone taking your source and trying to create an ansi build with it. It won't compile. Is this a limitation? At this point I think not. Maybe 14 years ago when both Win 2000 and Win 95/98 machines were in use. But not now.

Second, once those defines are in place, the only extra typing/coding difficulty you will encounter is the need to preface your strings with that 'L' symbol. Certainly not as bad as typing _T("") or even worse TEXT("").

Third, you won't have to remember which Win Api functions need the 'W' suffix. The #defines will pick those out for you. This is a biggie. It isn't always obvious. For example, a quick look at DefWindowProc() indicates that it shouldn't need a wide character version, because none of the parameters are obvious strings. However, it does. The LPARAM can be a pointer to a character string, so it needs a 'W' version.

The above is what works for me. I've been struggling with this issue since Windows 95 times, and have tried everything imaginable. What I have described above is what I have found to be easiest. Once those #defines are in place, all you need to do is preface null terminated character literals with that 'L'. Of course, you'll need to use wchar_t and wchar_t* in place of chars. But if you need them, you can still use chars. Just don't try feeding them to functions expecting wchar_ts!
freddie makes a fantastic point. I agree 100%
To add to what's already been said, the API natively uses Unicode. Ascii calls are translated to and from Unicode in user-space. So, it ought to be marginally faster to use the native Unicode calls.
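
As a rough illustration of that point, the sketch below mimics what a narrow entry point has to do before it can hand work to a wide one. Both function names are hypothetical and nothing here is real Windows API code. The real wrappers convert using the process's ANSI code page, whereas the character-by-character widening below assumes plain ASCII input purely to keep the sketch portable.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical "native" wide entry point: does the real work on wide text.
std::size_t demo_title_length_w(const std::wstring& title)
{
    return title.size();
}

// Hypothetical narrow wrapper: it must first build a wide copy of its
// argument and only then forward to the wide entry point.
std::size_t demo_title_length_a(const std::string& title)
{
    std::wstring wide;
    wide.reserve(title.size());
    for (unsigned char c : title)            // assumption: plain ASCII input only
        wide += static_cast<wchar_t>(c);
    return demo_title_length_w(wide);        // extra allocation and copy happened above
}
```

The extra allocation and copy in the narrow wrapper is the cost being described. Whether it is measurable depends on how often strings cross that boundary.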
One other issue that bears on this is string processing performance. In the tests I've run, ansi builds are faster due to (I'm guessing) buffers only needing to be half as big. For example, if you have 20 million text characters needing to be processed in some way, that translates to 20 million bytes of chars or 40 million bytes of wchar_ts. In the tests I've run the wide character versions of the code took approx twice as long. Your mileage may vary. This is a different issue from what kbw mentioned above though.

However, if I had an app that had to do a lot of time critical text processing, I'd strongly consider doing the visual front end with wchar_ts, and use chars in the processing code.
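
The buffer arithmetic can be checked with a few lines. This is only a sketch: buffer_bytes is a hypothetical helper, and char16_t is used as a stand-in for the 2-byte wchar_t of Windows, because wchar_t is 4 bytes on many other platforms.

```cpp
#include <cassert>
#include <cstddef>

// Bytes needed to hold `count` characters of type CharT (terminator not included).
template <typename CharT>
constexpr std::size_t buffer_bytes(std::size_t count)
{
    return count * sizeof(CharT);
}
```

For 20 million characters, buffer_bytes<char> gives 20,000,000 bytes, and buffer_bytes<char16_t> gives twice that on platforms where char16_t is 2 bytes wide, which is twice the memory traffic for the same text.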
In the tests I've run, ansi builds are faster
Yeah, but does that test user space or kernel space code?

Yeah, but does that test user space or kernel space code?


I would say in whatever space functions from the C Standard Library and functions from the C++ Standard Library run in. Don’t know if that answers your question. The dichotomy between those two areas is not one in which I usually think.

But I can provide some interesting ready-made test programs to test string manipulations, char versus wchar_t. But let me first mention the context in which these programs were developed. About five years ago a new member came into the PowerBASIC programming community, and the reason he acquired and tested that compiler was that he had seen claims that it was ‘as fast as C’. But in his tests it was proving to be much, much slower than C. He threw this out into the PowerBASIC user forums and also provided his test programs, which were rather small, simple manipulations of mostly integers. To make a long story short, MS Visual C++ was beating PowerBASIC by a huge margin, something on the order of ten to a hundred times faster. One of the PowerBASIC assembler gurus disassembled the compiled programs of both compilers, i.e., the PowerBASIC-created one and the MS Visual C++ one, to see what was going on and why the PowerBASIC compiler was doing so poorly compared to the C compiler. What was found out was astonishing, to say the least!

The original program was apparently something compiler writers use to test optimization schemes, and could be easily factored apart and rewritten in much simpler terms that eliminated loops involving millions of iterations. That’s why the Microsoft C code was beating the PowerBASIC code; the compiler was rewriting the original source as more efficient code and compiling that! The PowerBASIC binary, on the other hand, simply reflected what the original coder wrote.

This caused kind of an uproar in the PowerBASIC Community. One segment was saying “Hey! Give up! We’ve been beaten by a better compiler!” Another segment was crying “Foul! That’s not fair! It’s not the same code!”

This went around for a while, and finally a member who is into asm and writing fast code suggested that perhaps a more legitimate test program could be produced, one that would be more of a real-world test of applications running in the wild as compared to something a compiler writer schemed up. So he suggested this little string test, which he felt would be complicated enough that crafty compiler optimization code wouldn’t know what to do with it other than to compile it as is …

1) Create a 15 MB string of nulls
2) Change every 7th null to a "P"
3) Replace every "P" with a "PU" (hehehe)
4) Replace every null with an "8"
5) Put in a carriage return & line feed every 90 characters

Note item #3 above where the ‘hehehe’ is at. It was John Gleason who came up with that, and he is an expert asm guy who knows how to torture hardware! And item #5 also grows the string.
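
For reference, the five steps can be sketched compactly as a single function. This is an illustrative version, not any of the benchmarked programs: run_string_test is a hypothetical name, dashes are used in place of nulls (as in the C programs in this thread), and the size is a parameter so that small inputs can be checked by hand.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Runs the five steps of the string test on a string of n characters.
std::string run_string_test(std::size_t n)
{
    const std::size_t kLineLength = 90;

    std::string s(n, '-');                          // 1) n filler characters
    for (std::size_t i = 6; i < n; i += 7)          // 2) every 7th becomes 'P'
        s[i] = 'P';

    std::string grown;                              // 3) "P" -> "PU" into a second
    grown.reserve(n + n / 7 + 1);                   //    buffer, so nothing is shifted
    for (char c : s)
    {
        grown += c;
        if (c == 'P')
            grown += 'U';
    }

    std::replace(grown.begin(), grown.end(), '-', '8');   // 4) filler -> '8'

    std::string out;                                // 5) CR/LF after each full line
    out.reserve(grown.size() + 2 * (grown.size() / kLineLength));
    for (std::size_t pos = 0; pos < grown.size(); pos += kLineLength)
    {
        const std::size_t len = std::min(kLineLength, grown.size() - pos);
        out.append(grown, pos, len);
        if (len == kLineLength)
            out += "\r\n";
    }
    return out;
}
```

Calling run_string_test(14) is small enough to follow on paper: two 'P's are placed, each gains a 'U', and no line break is added because the result is shorter than 90 characters.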

At the time nobody took up John’s challenge, but I made a mental note of it because it interested me. About a year or two later I was doing work on my own C++ String Class which I regularly use instead of the one in the C++ Standard Library, and I worked quite a few different implementations of this algorithm in PowerBASIC, C and C++, and in C++ I compared my String Class against the one in the Standard Library. I’ll not get into the String Class end of it here, but I’ll just provide some compilable code showing the difference between ansi and wide character runs of the algorithm described above. The only thing significantly different is that I only used 2 MB strings instead of 15 MB ones.

The first implementation of the algorithm above wasn’t written by me but by Vijayan over in the Daniweb C++ community. However I modified it quite a bit and made a wide character version of it. Anyway, here is the ansi version using std::string. All these programs output the last 4000 characters to a Message Box upon completion, and in the Message Box Title state how many ‘ticks’ it took…

#include "Windows.h"
#include "stdio.h"
#include <string>
#include <algorithm>
typedef std::string::size_type size_type;
typedef std::string::iterator iterator;

int main()
{
 enum {K=7, N=2000000, N1=N+N/7, BLOCKSZ=90, M=N1+2*N1/BLOCKSZ, TAIL=4000};
 const char* crnl="\r\n" ;
 int tick=GetTickCount();
 char szBuffer[64],szTmp[16];

 std::string str( N, ' ' );                 //create a string containing N ' '
 for(unsigned i = K-1 ; i < N ; i += K )    //replace every Kth ' ' with a 'P'
     str[i] = 'P';
 std::string temp ;                         //replace every 'P' with a 'PU'
 temp.reserve(N1) ;                         //we could do this in quadratic time by:
 iterator i = str.begin() + K ;             //for(size_type i = K ; i < N1 ; i += K+1 ) str.insert( str.begin()+i, 'U' ) ;
 const iterator end = str.end() - K ;       //however, by using some temporary memory, we can do it in linear time by:
 for(  ; i < end ; i += K )
 {
      temp.insert( temp.end(), i, i+K ) ;
      temp += 'U' ;
 }
 temp.insert( temp.end(), i, str.end() ) ;  //copy the tail fragment and finally modify the original str with a single assignment
 str = temp ;                               //this is presumably what PowerBASIC would have done for: Replace "P" With "PU" In s
 std::replace_copy(str.begin(), str.end(), str.begin(), ' ', '8' );   // replace every ' ' with an '8'
 std::string dest;
 dest.reserve(M);
 iterator j = str.begin();                  //copy blocks of BLOCKSZ chars to dest, appending a cr-nl to each copied block
 const iterator last = str.end() - BLOCKSZ ;
 for(; j < last; j += BLOCKSZ)
 {
     dest.insert( dest.end(), j, j+BLOCKSZ );
     dest += crnl ;
 }
 dest.insert( dest.end(), j, str.end());    // copy what is left
 std::string s;
 s=dest.substr(dest.size()-TAIL);
 strcpy(szBuffer,"Here Is Your String John In ");
 tick=GetTickCount()-tick;
 sprintf(szTmp,"%u",tick);
 strcat(szBuffer,szTmp);
 strcat(szBuffer, " Ticks John!");
 MessageBoxA(NULL,s.c_str(),szBuffer,MB_OK);   //narrow strings, so call the ...A() version explicitly

 return 0;
}


continued...

And here is my std::wstring version of that …


#include "Windows.h"
#include "stdio.h"
#include <string>
#include <algorithm>
typedef std::wstring::size_type size_type;
typedef std::wstring::iterator iterator;

int main()
{
 enum {K=7, N=2000000, N1=N+N/7, BLOCKSZ=90, M=N1+2*N1/BLOCKSZ, TAIL=4000};
 const wchar_t* crnl=L"\r\n" ;
 int tick=GetTickCount();
 wchar_t szBuffer[64],szTmp[16];

 std::wstring str( N, L' ' );                 //create a string containing N L' '
 for(unsigned i = K-1 ; i < N ; i += K )      //replace every Kth L' ' with a L'P'
     str[i] = L'P';
 std::wstring temp ;                          //replace every L'P' with a L'PU'
 temp.reserve(N1) ;                           //we could do this in quadratic time by:
 iterator i = str.begin() + K ;               //for(size_type i = K ; i < N1 ; i += K+1 ) str.insert( str.begin()+i, 'U' ) ;
 const iterator end = str.end() - K ;         //however, by using some temporary memory, we can do it in linear time by:
 for(  ; i < end ; i += K )
 {
      temp.insert( temp.end(), i, i+K ) ;
      temp += L'U' ;
 }
 temp.insert( temp.end(), i, str.end() ) ;    //copy the tail fragment and finally modify the original str with a single assignment
 str = temp ;                                 //this is presumably what PowerBASIC would have done for: Replace "P" With "PU" In s
 std::replace_copy(str.begin(), str.end(), str.begin(), L' ', L'8' );   // replace every ' ' with an '8'
 std::wstring dest;
 dest.reserve(M);
 iterator j = str.begin();                    //copy blocks of BLOCKSZ chars to dest, appending a cr-nl to each copied block
 const iterator last = str.end() - BLOCKSZ ;
 for(; j < last; j += BLOCKSZ)
 {
     dest.insert( dest.end(), j, j+BLOCKSZ );
     dest += crnl ;
 }
 dest.insert( dest.end(), j, str.end());      // copy what is left
 std::wstring s;
 s=dest.substr(dest.size()-TAIL);
 wcscpy(szBuffer,L"Here Is Your String John In ");
 tick=GetTickCount() - tick;
 swprintf(szTmp,L"%u", tick);
 wcscat(szBuffer,szTmp);
 wcscat(szBuffer, L" Ticks");
 MessageBoxW(NULL,s.c_str(),szBuffer,MB_OK);

 return 0;
}


I just tested these two versions on a very old Windows laptop from around 2005 and came up with these numbers. This is using tdm - GCC 4.4.1…

                        GetTickCount()     Program Size
======================================================
Ansi                     112.6                   46 KB
Unicode                  131.2                   60 KB


Those numbers are the average of five runs, as are all my results I will show. So the wide character runs are taking somewhat longer than the ansi runs.

I was not satisfied with these speeds and wondered what I could do by dropping down to C low-level memory buffer manipulations and eliminating the C++ Standard Library String Class. So my next program shows that, and in keeping with the theme of the original poster’s question, it uses TCHARs and the TCHAR macros. So by commenting in/out the UNICODE and _UNICODE #defines at top, you can produce ansi/wide character builds…

//#define  UNICODE
//#define  _UNICODE
#include <Windows.h>  //for MessageBox(), GetTickCount() and GlobalAlloc()
#include <tchar.h>
#include <String.h>   //for strncpy(), strcpy(), strcat(), etc.
#include <cstdio>     //for sprintf()

enum                                              // Exercise
{                                                 // =======================================
 NUMBER         = 2000000,                        // 1)Create a 2MB string of dashes
 LINE_LENGTH    = 90,                             // 2)Change every 7th dash to a "P"
 NUM_PS         = NUMBER/7+1,                     // 3)replace every "P" with a "PU" (hehehe)
 PU_EXT_LENGTH  = NUMBER+NUM_PS,                  // 4)replace every dash with an "8"
 NUM_FULL_LINES = PU_EXT_LENGTH/LINE_LENGTH,      // 5)Put in a CrLf every 90 characters
 MAX_MEM        = PU_EXT_LENGTH+NUM_FULL_LINES*2  // 6)Output last 4K to Message Box
};

int __stdcall WinMain(HINSTANCE hInstance, HINSTANCE hPrevIns, LPSTR lpszArg, int nCmdShow)
{
 TCHAR szMsg[64],szTmp[16];             //for message box
 int i=0,iCtr=0,j;                      //iterators/counters
 TCHAR* s1=NULL;                        //pointers to null terminated
 TCHAR* s2=NULL;                        //character array bufers

 DWORD tick=GetTickCount();                           //Get Initial Tick Count Number
 s1=(TCHAR*)GlobalAlloc(GPTR,MAX_MEM*sizeof(TCHAR));  //Allocate two buffers big enough to
 s2=(TCHAR*)GlobalAlloc(GPTR,MAX_MEM*sizeof(TCHAR));  //hold the original NUMBER of chars
                                                      //plus substitution of PUs for Ps and
 for(i=0; i<NUMBER; i++)                              //CrLfs after each LINE_LENGTH chunk.
     s1[i]=_T('-');
                                      // 1) Create 2MB string of dashes putting a 'P' every
 for(i=0; i<NUMBER; i++, iCtr++)      //    seventh char;
 {

     if(iCtr==7)
     {
        s1[i]=_T('P');
        iCtr=0;
     }
 }

 iCtr=0;                              // 3) Substitute 'PUs' for 'Ps'  This is
 for(i=0; i<NUMBER; i++)              //    tricky!  Note the buffer needs to
 {                                    //    grow.  See John's (hehehe) above!
     if(_tcsncmp(s1+i,_T("P"),1)==0)
     {
        _tcscpy(s2+iCtr,_T("PU"));
        iCtr+=2;
     }
     else
     {
        s2[iCtr]=s1[i];
        iCtr++;
     }
 }

 for(i=0; i<PU_EXT_LENGTH; i++)         // 4) Replace every '-' with an 8;
 {
     if(s2[i]==_T('-'))
        s2[i]=56;   //56 is '8'
 }

 i=0, j=0, iCtr=0;                      // 5)Put in a CrLf every 90 characters
 while(i<PU_EXT_LENGTH)
 {
    s1[j]=s2[i];
    i++, j++, iCtr++;
    if(iCtr==LINE_LENGTH)
    {
       s1[j]=13, j++;
       s1[j]=10, j++;
       iCtr=0;
    }
 }
 s1[j]=0, s2[0]=0;
 _tcsncpy(s2,&s1[j]-4001,4000);         // 6) Output last (right most) 4 K to
 s2[4000]=0;                            //    MessageBox().
 tick=GetTickCount()-tick;
 _tcscpy(szMsg,_T("Here's Your String John In "));   //Let me clue you in on something.
 _stprintf(szTmp,_T("%u"),(unsigned)tick);           //You'll get real tired of this
 _tcscat(szMsg,szTmp);                               //sprintf(), strcpy(), strcat()
 _tcscat(szMsg,_T(" ticks!"));                       //stuff real fast.  It'll wear you
 MessageBox(0,s2,szMsg,MB_OK);                       //right into the ground!
 GlobalFree(s1), GlobalFree(s2);

 return 0;
}


This cuts those last numbers down a good bit. Here are the results of five runs on my old laptop…


                          GetTickCount()     Program Size
======================================================
Ansi                       43.8                      7 KB
Unicode                    93.8                      8 KB
 strcpy(szBuffer,"Here Is Your String John In ");
 tick=GetTickCount()-tick;
 sprintf(szTmp,"%u",tick);
 strcat(szBuffer,szTmp);
 strcat(szBuffer, " Ticks John!");



Intermediate buffer copy and TWO strcats!?!? blasphemy!

 
sprintf( szBuffer, "Here Is Your String John In %u Ticks John!", tick );
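
A bounds-checked variant of the same idea, offered as a sketch only: format_title is a hypothetical helper, and std::snprintf (C++11) is swapped in for sprintf so the 64-byte buffer cannot be overrun if the message text is ever lengthened.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Builds the message box title with one formatted write and no intermediate buffer.
std::string format_title(unsigned ticks)
{
    char buffer[64];
    std::snprintf(buffer, sizeof buffer,
                  "Here Is Your String John In %u Ticks John!", ticks);
    return std::string(buffer);
}
```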
Note with this C Standard Library code the ansi version is running twice as fast as the wide version. Finally, I developed an exact PowerBASIC duplicate of the above program, and it came in with essentially the same timings. In other words, it beat the C++ Standard Library String Class code and matched the C Library code. When I posted that on the PowerBASIC forums another assembler low level guru, Paul Dixon, redid it by adding some more optimizations that I hadn’t thought of. These caused it to beat my C code somewhat. Here is his original ansi version …

#Dim              All
%GPTR             = 64
%NUMBER           = 2000000   'Number of characters to fool around with in buffers
%LINE_LENGTH      = 90        'We'll eventually create lines out of what's in the buffers
%RIGHT_BLOCK      = 4000      'of this length to output to a message box the RIGHT_BLOCK amount
Declare Function GlobalAlloc  Stdcall  Lib "kernel32.dll"  Alias "GlobalAlloc"  (Byval wFlags As DWord, Byval dwBytes As DWord) As Long
Declare Function GlobalFree   Stdcall  Lib "kernel32.dll"  Alias "GlobalFree"   (Byval hMem As DWord) As Long
Declare Function GetTickCount Stdcall  Lib "kernel32.dll" Alias "GetTickCount"  () As DWord
Declare Function QueryPerformanceFrequency Lib "KERNEL32.DLL" Alias "QueryPerformanceFrequency" (lpFrequency As Quad) As Long
Declare Function QueryPerformanceCounter  Lib "KERNEL32.DLL" Alias "QueryPerformanceCounter" (lpPerformanceCount As Quad) As Long

Function PBMain() As Long
  Local freq, count0, count1, q As Quad
  Local NUM_FULL_LINES As Long
  Local PU_EXT_LENGTH As Long
  Local pAsciz As Asciiz Ptr
  Local s1,s2 As Byte Ptr
  Local MAX_MEM As Long
  Local NUM_PS As Long
  Register t As Long
  Register r As Long
  Register k As Long
  Local w As Word

  QueryPerformanceFrequency freq                      ' Get timer frequency.
  NUM_PS         = %NUMBER / 7 + 1                    ' Think 'Number of Ps'
  PU_EXT_LENGTH  = %NUMBER + NUM_PS                   ' How big will the buffer grow to after adding 'U' after every 'P'?
  NUM_FULL_LINES = PU_EXT_LENGTH / %LINE_LENGTH       ' How many lines after inserting CrLf every LINE_LENGTH chars?
  MAX_MEM        = PU_EXT_LENGTH + NUM_FULL_LINES * 2 ' So what will the final required buffer size be???
  QueryPerformanceCounter count0                      ' Read the timer at the start of the test
  s1=GlobalAlloc(%GPTR,MAX_MEM)
  If s1 Then
     s2=GlobalAlloc(%GPTR,MAX_MEM)
     If s2 Then
        'fill string with "-" 8 bytes at a time
        For r = s1 TO (s1 + %NUMBER) -8 Step 8
            Poke Quad, r,&h2d2d2d2d2d2d2d2d  '&h2d = 45 = "-"
        Next
        'and fill in any surplus bytes left if not a multiple of 8
        For r = r TO (s1 + %NUMBER-1)
            Poke Byte, r,&h2d  '&h2d = 45
        Next
        'replace every 7th character with "P"
        For r = s1 + 6 TO s1 + %NUMBER -1 Step 7
            Poke Byte, r,80
        Next
        'replace every "P" with "PU"
        t=s2
        For r = s1 TO s1 + %NUMBER -1
            k = Peek(Byte,r)
            If k = 80 Then
                Poke Word,t,&h5550   '&h5550 = "PU"
                t += 2
            Else
                Poke Byte,t,k
                t += 1
            End If
        Next
        'replace every "-" with "8"
        For r = s2 TO s2 + PU_EXT_LENGTH -1
            t = Peek(Byte,r)
            If t = 45 Then
               Poke Byte, r,56
            End If
        Next
        'add $CRLF after every 90 characters
        t = s1
        For k = s2 TO s2 + PU_EXT_LENGTH -91 Step 90
            'copy 88 characters
            For r = k TO k+80 Step 8
                q = Peek(Quad,r)
                Poke Quad,t,q
                t+=8
            Next
            'plus left over 2 characters and the CRLF
            w = Peek(Word,r)
            Poke Word,t,w
            Poke Word, t+2,&h0a0d
            t+=4
        Next
        For r = k TO s2 + PU_EXT_LENGTH -1
            'copy leftover characters
            Local l As Byte
            l= Peek(Byte,r)
            Poke Byte,t,l 'Peek(Byte,r)
            t+=1
        Next
        Poke Byte, t,0
        pAsciz = t - 4001
        QueryPerformanceCounter count1   'read the timer at the End of the test
        MsgBox @pAsciz, %MB_OK, "Here's Your String John In " & FORMAT$(1000*(count1-count0)/freq,"######0.000") & " Milli-Seconds!"
        GlobalFree(s2)
     End If
     GlobalFree(s1)
  End If

  PBMain=0
End Function


Here are the timings on these…

                          GetTickCount()   Program Size
======================================================
Ansi                       43.1                   13 KB
Unicode                    53.2                   10 KB

But I haven’t provided the wide character version of the above, so here is that..

continued...
#Compile Exe
#Dim All
%UNICODE            = 1
#If %Def(%UNICODE)
    Macro ZStr      = WStringz    'This is exactly how C/C++ programmers handle the ansi/unicode
    Macro BStr      = WString     'issue.  They have a macro called TCHAR that reduces to a single
    %SIZEOF_CHAR    = 2           'byte char data type if UNICODE isn't defined and wchar_t if it
#Else
    Macro ZStr      = Asciiz      'is defined.  wchar_t is a 'typedef' of an unsigned short int in
    Macro BStr      = String      'C or C++, and that is a WORD or two byte sequence.  Just what
    %SIZEOF_CHAR    = 1           'unicode uses.
#EndIf
%GPTR               = 64
%NUMBER             = 2000000     'Number of characters to fool around with in buffers
%LINE_LENGTH        = 90          'We'll eventually create lines out of what's in the buffers
%RIGHT_BLOCK        = 4000        'of this length to output to a message box the RIGHT_BLOCK amount
Declare Function GlobalAlloc  Stdcall Lib "kernel32.dll"  Alias "GlobalAlloc"  (Byval wFlags As DWord, Byval dwBytes As DWord) As Long
Declare Function GlobalFree   Stdcall Lib "kernel32.dll"  Alias "GlobalFree"   (Byval hMem As DWord) As Long
Declare Function GetTickCount Stdcall Lib "kernel32.dll"  Alias "GetTickCount" () As DWord


Function PBMain() As Long
  Local      NUM_FULL_LINES As Long
  Local      PU_EXT_LENGTH  As Long
  Local      MAX_MEM        As Long
  Local      NUM_PS         As Long
  Local      s1             As Word Ptr
  Local      s2             As Word Ptr
  Register   t              As Long
  Register   i              As Long
  Register   j              As Long
  Local      q              As Quad
  Local      w              As DWord
  Local      fp             As Integer
  Local      pZStr          As ZStr Ptr
  Local      iTicks         As Long

  NUM_PS         = %NUMBER / 7 + 1                    ' Think 'Number of Ps'
  PU_EXT_LENGTH  = %NUMBER + NUM_PS                   ' How big will the buffer grow to after adding 'U' after every 'P'?
  NUM_FULL_LINES = PU_EXT_LENGTH / %LINE_LENGTH       ' How many lines after inserting CrLf every LINE_LENGTH chars?
  MAX_MEM        =PU_EXT_LENGTH  + NUM_FULL_LINES * 2
  iTicks = GetTickCount()
  s1=GlobalAlloc(%GPTR, MAX_MEM * %SIZEOF_CHAR)
  If s1 Then
     s2=GlobalAlloc(%GPTR,MAX_MEM * %SIZEOF_CHAR)
     If s2 Then
        For i = s1 To (s1 + %NUMBER * 2) - 8 Step 8        'fill string with "-" 8 bytes at a time
          Poke Quad, i, &H002d002d002d002d                 ' <<< Hexidecimal representation of four
        Next i                                             'unicode dashes.  A Quad int is 8 bytes.
        For i = i To (s1 + %NUMBER * 2 - 2)                'and fill in any surplus bytes left if not a multiple of 8
          Poke Word, i, &h002d  '&h2d = 45
        Next i
        For i = s1 + 12 To s1 + %NUMBER * 2 - 2 Step 14    'replace every 7th character with "P"
          Poke Word, i, &H0050                             '50 hex is 'P'; 80 decimal
        Next i
        t=s2
        For i = s1 TO s1 + %NUMBER * 2 - 2 Step 2
          j = Peek(Word, i)
          If j = 80 Then                                   ' 80 decimal is 'P'
             Poke Dword, t, &h00550050                     '&h5550 = "PU"
             t += 4
          Else
             Poke Word, t, j
             t += 2
          End If
        Next
        For i = s2 TO s2 + PU_EXT_LENGTH * 2 - 2 Step 2    'replace every "-" with "8"
          t = Peek(Word, i)
          If t = 45 Then
             Poke Word, i, &H0038
          End If
        Next i
        t = s1                                             'add $CRLF after every 90 characters
        For j = s2 To s2 + PU_EXT_LENGTH * 2 - 182 Step 180
          For i = j To j + 168 Step 8                      'copy 88 characters
            q = Peek(Quad, i)
            Poke Quad,t, q
            t+=8
          Next i
          w = Peek(DWord, i)                               'plus left over 2 (now 4) characters and the CRLF
          Poke DWord, t, w
          Poke DWord, t+4, &h000a000d
          t+=8
        Next j
        For i = j To s2 + PU_EXT_LENGTH * 2 - 2 Step 2     'copy leftover characters
          Local l As Word
          l= Peek(Word, i)
          Poke Word, t, l 'Peek(Byte,r)
          t+=2
        Next i
        pZStr = t - %RIGHT_BLOCK * %SIZEOF_CHAR
        iTicks=GetTickCount() - iTicks
        MsgBox @pZStr, %MB_OK, "Here's Your String John In " & Str$(iTicks) & " Ticks!"
        GlobalFree(s2)
     End If
     GlobalFree(s1)
  End If

  PBMain=0
End Function


So as you can see, in these tests of mine, wide character string processing code is taking somewhat longer, at least on the machines I’ve tested. Your mileage may vary.
The narrow version takes ~4.7 times as long as the wide version on my computer. The std::string narrow version takes 0.8 times as long as the wide version, probably because of the increased allocation.

Problems:
* You generally can't benchmark code with GetTickCount() based on just one run. The function has a time resolution of around 15 ms.
* The tests make the implicit assumption that an application can switch to ANSI encoding without losing anything, as far as the string processing routines are concerned. In reality this is almost never the case.
* These tests perform just one API call. Your average Windows application will pass strings to the API far more often than it will do heavy string processing.
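
One way around the first problem is to time many runs with a higher-resolution clock and report the average. The sketch below uses std::chrono::steady_clock from standard C++11 rather than any Windows timer. average_milliseconds is a hypothetical helper name, and the number of runs is left to the caller.

```cpp
#include <cassert>
#include <chrono>

// Calls work() `runs` times and returns the mean wall-clock time per call in milliseconds.
template <typename Work>
double average_milliseconds(Work work, int runs)
{
    typedef std::chrono::steady_clock timer_clock;

    const timer_clock::time_point start = timer_clock::now();
    for (int i = 0; i < runs; ++i)
        work();
    const std::chrono::duration<double, std::milli> elapsed = timer_clock::now() - start;

    return runs > 0 ? elapsed.count() / runs : 0.0;
}
```

Timing the whole batch and dividing keeps the clock's own granularity from dominating short measurements. It still says nothing about the other two problems in the list.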
Those tests aren't really hitting the Windows API, so you're really just comparing the cost of using 1 byte chars on fixed width strings with 2 byte chars on fixed width strings.

You're not hitting the conversion costs in user32.dll.
Disch said ...


Intermediate buffer copy and TWO strcats!?!? blasphemy!


Wow! That is horrible. What was I thinking? I stand corrected!

My father always said a poor excuse was better than none, so mine goes something along the lines of ... by the time I got to that point in the code after dealing with all those pointer manipulations, my mind was fried. Not enough horsepower left to spend much time figuring how to get a string in a message box title!
Helios said …


You generally can't benchmark code with GetTickCount() based on just one run. The function has a time resolution of around 15 ms.


I’m aware of that. When I first started working on this with high level C++ String Class code (as opposed to the quite fast low level implementations I just posted), my programs were taking several minutes to complete the job. Try this, but be forewarned you had better not hold your breath!

#include "windows.h"
#include <cstdio>
#include <string>
#define  NUMBER 2000000
using namespace std;


string& ReplaceAll(string& context, const string& from, const string& to)
{
 size_t lookHere=0;
 size_t foundHere;

 while((foundHere=context.find(from,lookHere)) != string::npos)
 {
  context.replace(foundHere, from.size(), to);
  lookHere=foundHere+to.size();
 }

 return context;
}


int main(void)
{
 unsigned t1=0,t2=0;
 char szBuffer[64];
 int iCount=0;
 string s2;

 t1=GetTickCount(), t2=t1;
 puts("Starting....");
 string s1(NUMBER,'-');
 t2=GetTickCount()-t2;
 printf("Done Creating String With %u Of These - :   milliseconds elapsed - %u\n",NUMBER,t2);
 t2=t1;
 for(int i=0; i<NUMBER; i++)
 {
     iCount++;
     if(iCount%7==0)
        s1[iCount-1]='P';
 }
 t2=GetTickCount()-t2;
 printf("Done Inserting 'P's In s1!                   :   milliseconds elapsed - %u\n",t2);
 t2=t1;
 ReplaceAll(s1,"P","PU");
 t2=GetTickCount()-t2;
 printf("Done Replacing 'P's With PU!                 :   milliseconds elapsed - %u\n",t2);
 t2=t1;
 ReplaceAll(s1,"-","8");
 t2=GetTickCount()-t2;
 printf("Done Replacing '-'s With 8!                  :   milliseconds elapsed - %u\n",t2);
 t2=t1;
 s2.reserve(2400000);
 puts("Now Going To Create Lines With CrLfs!");
 for(int i=0; i<NUMBER; i=i+90)
     s2+=s1.substr(i,90)+"\r\n";
 t2=GetTickCount()-t2;
 printf("Done Creating Lines!                         :   milliseconds elapsed - %u\n",t2);
 s1=s2.substr(s2.length()-4000,4000);
 t1=GetTickCount()-t1;
 sprintf(szBuffer,"Here Is Your String John In %u Ticks!",t1);   //t1 is unsigned, so %u
 MessageBoxA(NULL,s1.c_str(),szBuffer,MB_OK);                     //narrow strings, so call the ...A() version explicitly
 getchar();

 return 0;
}


There are printf calls in there to let you know the tick count as it progresses through the algorithm. Here’s a typical output run on the old and slow laptop I used for the other runs posted…

Starting....
Done Creating String With 2000000 Of These - :   milliseconds elapsed - 0
Done Inserting 'P's In s1!                   :   milliseconds elapsed - 47
Done Replacing 'P's With PU!                 :   milliseconds elapsed - 272953
Done Replacing '-'s With 8!                  :   milliseconds elapsed - 273110
Now Going To Create Lines With CrLfs!
Done Creating Lines!                         :   milliseconds elapsed - 273141


So do you see now why 15-16 ticks timer resolution isn’t a problem!?! In case you haven’t looked, that code took over 273,000 ticks to run the algorithm!
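
Nearly all of that time is in the "P" to "PU" step. Presumably that is because each in-place replace() that lengthens the string has to shift everything after it, which makes the loop quadratic. The same-length "-" to "8" pass needs no shifting and is quick. As a sketch of the alternative, a hypothetical replace_all_linear can build the result in a second string in a single pass:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns a copy of text with every occurrence of from replaced by to.
// The output is appended to left to right, so no existing characters are ever shifted.
std::string replace_all_linear(const std::string& text,
                               const std::string& from,
                               const std::string& to)
{
    if (from.empty())
        return text;                         // nothing sensible to search for

    std::string result;
    result.reserve(text.size());
    std::size_t pos = 0;
    for (;;)
    {
        const std::size_t hit = text.find(from, pos);
        if (hit == std::string::npos)
            break;
        result.append(text, pos, hit - pos); // unchanged stretch before the match
        result += to;
        pos = hit + from.size();
    }
    result.append(text, pos, std::string::npos);   // tail after the last match
    return result;
}
```

Usage would be s1 = replace_all_linear(s1, "P", "PU"); in place of the ReplaceAll call, at the cost of one temporary buffer.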

I’m actually telling this story backwards. Back when John Gleason posted his little string test idea over in the PowerBASIC Forums he also posted a PowerBASIC implementation of the idea. It went like this …

Continued…