strtok uses

First off, I visit this site quite regularly, but I've never posted. I find lots of help here and thought I'd return the favor. I recently worked on a project that required the strtok function. The problem I kept running into was that strtok was changing my original variable. I was finally able to fix it by tokenizing a copy, which leaves the original unchanged.



A little background: This is set up as a function of a child class. The char variable is obviously declared elsewhere, but I showed it for the sake of clarity. Also, strcpy_s was used because I have VS, however strcpy works also (parameters would be different). Hopefully with the comments, the rest of the code is clear enough to be understood easily:
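#include <cstdlib>   // atoi
#include <cstring>   // strtok, strcpy_s (MSVC)

char decimalNumber[] = "12.34";

int ChildClass::getNumberBeforeDecimal()
{
	char numBeforeDecimal[6] = "";
	char* token;

	// Copy decimalNumber into numBeforeDecimal so strtok modifies the copy.
	strcpy_s(numBeforeDecimal, sizeof(numBeforeDecimal), decimalNumber);

	// The first call returns the text before the '.', so token points to "12".
	token = strtok(numBeforeDecimal, ".");

	// A further strtok(NULL, ".") would return "34", the part after the decimal.
	return token ? atoi(token) : 0;  // converts the token and returns 12
}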


Now, I haven't tested this exact code (feel free to correct it if I made a mistake). I took the code that I originally had written (yes, it worked!) and tried to make it generic enough to be understood without screwing it up. Oh, and I know there's an easier way of getting numbers before a decimal. This is just for the purpose of helping understand one use of strtok. With little effort this could be used to return the numbers after the decimal.
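For anyone who wants both halves, here is a minimal sketch of that idea. The DecimalParts struct and the splitDecimal name are made up for illustration and are not part of the original project. It assumes the input looks like "12.34" and does no real validation.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical helper type (not from the original post): holds both halves.
struct DecimalParts {
    int before;  // digits before the '.'
    int after;   // digits after the '.', 0 if there is no second part
};

DecimalParts splitDecimal(const char* decimalNumber)
{
    assert(decimalNumber != nullptr);

    // strtok writes '\0' over each delimiter, so tokenize a private copy.
    std::string copy(decimalNumber);

    DecimalParts parts = {0, 0};
    char* token = std::strtok(&copy[0], ".");   // first call: text before '.'
    if (token != nullptr) {
        parts.before = std::atoi(token);
        token = std::strtok(nullptr, ".");      // second call: text after '.'
        if (token != nullptr)
            parts.after = std::atoi(token);
    }
    return parts;
}
```

Calling splitDecimal("12.34") should give before == 12 and after == 34, and the caller's string is left untouched. Note that atoi drops leading zeros, so "12.05" comes back with after == 5.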

Feel free to post other uses of strtok if you want.

Peace,
S. Jones
If you wanted a copy of it before you used strtok(), you could have just done:

string copy = original;
strtok(original, ".");




I think that is exactly the part that trips most people up, and the reason for the OP's post. The documentation really should have x-large, bold text in strobing red and orange that says:

strtok() changes your string!


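This can be a particular problem if you are working with const or const-reference data and just tell the C++ compiler to shut up about the argument type warning. strtok() will then write into data that was promised not to change, which is surprising at best and undefined behavior at worst.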
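To make the point concrete, here is a small sketch. Both function names are invented for this example. The first one reports whether strtok altered the buffer it was handed; the second shows one way to tokenize const data by working on a scratch copy instead of casting the const away.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical demo: returns true if strtok altered the buffer it was given.
bool strtokChangedBuffer(char* buffer, const char* delims)
{
    assert(buffer != nullptr && delims != nullptr);
    const std::string before(buffer);
    std::strtok(buffer, delims);   // overwrites the first delimiter with '\0'
    return before != buffer;       // buffer now ends at that '\0'
}

// Hypothetical helper: tokenizes const text without modifying it.
std::vector<std::string> splitCopy(const char* text, const char* delims)
{
    assert(text != nullptr && delims != nullptr);
    // Scratch copy, including the terminating '\0'.
    std::vector<char> scratch(text, text + std::strlen(text) + 1);
    std::vector<std::string> tokens;
    for (char* t = std::strtok(scratch.data(), delims); t != nullptr;
         t = std::strtok(nullptr, delims))
        tokens.push_back(t);
    return tokens;
}
```

After strtokChangedBuffer(s, ".") on a buffer holding "12.34", the buffer reads as just "12". splitCopy leaves its input alone, at the cost of one allocation per call.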
I think you would be better off refactoring the code to use strtok_r instead of strtok, because strtok keeps its position in hidden static state and is therefore neither thread-safe nor reentrant. For an example of why strtok_r would be better, consider this SSCCE:

(I changed strcpy_s to strlcpy, which is more widely implemented on Unix-like systems.)

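#include <stdio.h>
#include <stdlib.h>   /* atoi */
#include <string.h>   /* strtok; strlcpy on BSD and macOS */

int getNumberBeforeDecimal(char *decimalNumber)
{
	char numBeforeDecimal[6]="";
	char* token; 

	/* Work on a copy so the caller's string is left alone. */
	strlcpy(numBeforeDecimal,decimalNumber, sizeof(numBeforeDecimal));

	/* Note: the second call returns the text after the '.',
	   e.g. "23" for "14.23". */
	strtok(numBeforeDecimal, ".");
	token = strtok(NULL, ".");
	
	return token ? atoi(token) : 0;
}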

int main(int argc, char *argv[])
{
  char s[] = "14.23:23.41";
  
  char *tok = strtok(s,":");
  while(tok!=NULL) {
     int num = getNumberBeforeDecimal(tok);
     tok = strtok(NULL, ":");
     printf("pre-decimal: %d\n", num); 
  }
  
  return 0;
}


All going well, you would expect to see one line per field (this helper returns the digits after each '.'):

pre-decimal: 23
pre-decimal: 41

but instead you see

pre-decimal: 23

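This is because the nested strtok calls inside the helper overwrote the outer loop's cursor. strtok keeps its position in a single static char *, so by the time the outer loop calls strtok(NULL, ":") the scan of s has been forgotten and NULL comes back.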
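The static cursor can be pictured with a simplified stand-in like the one here. This is only a sketch for illustration: sketchStrtok is a made-up name, and real library implementations differ in detail, but they keep the same kind of hidden state.

```cpp
#include <cassert>
#include <cstring>

// Simplified stand-in for strtok, for illustration only.
char* sketchStrtok(char* str, const char* delims)
{
    static char* cursor = nullptr;   // one cursor shared by every caller

    assert(delims != nullptr);
    if (str != nullptr)
        cursor = str;                // a non-null argument restarts the scan
    if (cursor == nullptr)
        return nullptr;

    cursor += std::strspn(cursor, delims);    // skip leading delimiters
    if (*cursor == '\0')
        return nullptr;                       // nothing left to return

    char* token = cursor;
    cursor += std::strcspn(cursor, delims);   // advance to the end of the token
    if (*cursor != '\0')
        *cursor++ = '\0';                     // overwrite the delimiter, step past it
    return token;
}
```

If an outer scan of "a:b" is interrupted by a call such as sketchStrtok(inner, ".") on a buffer holding "x.y", the next sketchStrtok(nullptr, ":") continues inside that inner buffer and returns "y" rather than "b". That is the same kind of clobbering the SSCCE runs into.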

A better solution would be:
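#include <stdio.h>
#include <stdlib.h>   /* atoi */
#include <string.h>   /* strtok_r (POSIX); strlcpy on BSD and macOS */

int getNumberBeforeDecimal(char *decimalNumber)
{
	char numBeforeDecimal[6]="";
	char *token, *p; 

	strlcpy(numBeforeDecimal,decimalNumber, sizeof(numBeforeDecimal));

	/* p holds this scan's position, so the caller's scan is untouched. */
	strtok_r(numBeforeDecimal, ".", &p);
	token = strtok_r(NULL, ".", &p);
	
	return token ? atoi(token) : 0;
}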

int main(int argc, char *argv[])
{
  char s[] = "14.23:23.41", *p;
  
  char *tok = strtok_r(s,":", &p);
  while(tok!=NULL) {
     int num = getNumberBeforeDecimal(tok);
     tok = strtok_r(NULL, ":", &p);
     printf("pre-decimal: %d\n", num); 
  }
  
  return 0;
}


Now, as expected, the output is:

[mackco00:/prv/src] $ ./testTok
pre-decimal: 23
pre-decimal: 41
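Except that strtok_r() is a POSIX function rather than standard C, and the three-argument strtok_s() is a Microsoft extension (C11's optional Annex K defines a strtok_s with a different signature) --meaning that neither one is portable everywhere.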
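If you do want the reentrant call on both kinds of system, one common workaround is a small macro shim. The sketch here assumes either a POSIX system or MSVC, whose three-argument strtok_s takes the same arguments as strtok_r. The sumIntegerParts function is a made-up example and not code from this thread.

```cpp
#include <cassert>
#include <cstdlib>
#include <string.h>   // strtok_r (POSIX) or strtok_s (MSVC)

#if defined(_MSC_VER)
// Assumption: MSVC's strtok_s(str, delims, &state) mirrors strtok_r.
#define strtok_r strtok_s
#endif

// Hypothetical helper: sums the integer parts of fields like "14.23:23.41".
// The buffer is modified in place, just as with strtok.
int sumIntegerParts(char* fields)
{
    assert(fields != nullptr);
    int total = 0;
    char* outerState = nullptr;
    for (char* field = strtok_r(fields, ":", &outerState); field != nullptr;
         field = strtok_r(nullptr, ":", &outerState)) {
        char* innerState = nullptr;   // separate state for the nested scan
        char* whole = strtok_r(field, ".", &innerState);
        if (whole != nullptr)
            total += std::atoi(whole);
    }
    return total;
}
```

Because each scan carries its own state pointer, the nested scan of one field cannot disturb the outer scan over the ':' separators.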

Personally, I think that strtok() is a pathetic function to begin with --one that can easily be rewritten to be much more intelligent (and safe).

The strspn() and strcspn() functions are much better suited to this kind of job.

Off the top of my head:
#include <stdlib.h>
#include <string.h>

typedef struct
  {
  const char* result;
  const char* source;
  const char* delimiters;
  }
  token_t;

token_t tokinit( const char* source, const char* delimiters )
  {
  token_t result;

  result.source     = source;
  result.delimiters = delimiters;
  result.result     = (const char*)malloc( strlen( source ) +1 );

  return result;
  }

const char* toknext( token_t* tok )
  {
  size_t n;

  if (!tok || !(tok->source) || !(tok->delimiters) || !(tok->result))
    return NULL;

  tok->source += strspn( tok->source, tok->delimiters );
  n = strcspn( tok->source, tok->delimiters );
  strncpy( (char*)(tok->result), tok->source, n );
  *((char*)(tok->result) +n) = '\0';
  tok->source += n;

  return tok->result;
  }

void tokend( token_t* tok )
  {
  if (tok)
    {
    if (tok->result)
      free( (void*)(tok->result) );
    tok->result     =
    tok->source     = 
    tok->delimiters = NULL;
    }
  }
#include <stdio.h>

int getNumberBeforeDecimal( const char* decimalNumber )
  {
  char numBeforeDecimal[ 6 ] = {'\0'};
  token_t tok;
  int result;

  strncpy( numBeforeDecimal, decimalNumber, sizeof( numBeforeDecimal ) -1 );

  tok = tokinit( numBeforeDecimal, "." );
  result = atoi( toknext( &tok ) );
  tokend( &tok );

  return result;
  }

int main()
  {
  char s[] = "14.23:23.41";

  token_t tok = tokinit( s, ":" );
  while (*toknext( &tok ))
    {
    int num = getNumberBeforeDecimal( tok.result );
    printf( "pre-decimal: %d, from \"%s\"\n", num, tok.result );
    }
  tokend( &tok );

  return 0;
  }


In C++, there is never any valid need to use strtok() --avoid it like the plague.

My $0.02.
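For what it's worth, here is a rough sketch of what the C++ route might look like with only the standard library. The split and numberBeforeDecimal names are invented for this example. It swaps strtok for std::getline and std::string::find, so nothing is modified in place and there is no hidden state.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Splits text on a single delimiter character. Unlike strtok, empty
// fields between two adjacent delimiters are kept.
std::vector<std::string> split(const std::string& text, char delim)
{
    std::vector<std::string> fields;
    std::istringstream in(text);
    std::string field;
    while (std::getline(in, field, delim))
        fields.push_back(field);
    return fields;
}

// Returns the integer before the first '.', or 0 if that part is empty.
// std::stoi throws std::invalid_argument if the part is not numeric.
int numberBeforeDecimal(const std::string& decimalNumber)
{
    const std::string whole = decimalNumber.substr(0, decimalNumber.find('.'));
    return whole.empty() ? 0 : std::stoi(whole);
}

// Usage sketch with the same input string used earlier in the thread.
void demo()
{
    const std::vector<std::string> fields = split("14.23:23.41", ':');
    assert(fields.size() == 2);
    assert(numberBeforeDecimal(fields[0]) == 14);
    assert(numberBeforeDecimal(fields[1]) == 23);
}
```

The trade-off is a few allocations for the std::string copies, which rarely matters for input of this size.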
It's much clearer to use regular expressions to do stuff like this, especially in a larger context like the real one you're using it in, not just this simplified example.

You can get an excellent regular expression library from boost.org

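#include <boost/regex.hpp>
using namespace boost;
// '^' anchors each search at the current position p
const regex token("^(\\d+\\.\\d+)(:|$)");
cmatch m;
const char* p = s;
while ( regex_search(p, m, token) ) {
    // m[1] holds one number, e.g. "14.23"
    p = m[0].second;  // continue right after this match
}

and so forth... There are a few more details.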

Admittedly, sometimes regular expressions are overkill.
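If pulling in Boost is not an option, the same idea can be sketched with std::regex from C++11. This swaps in the standard library for Boost, and integerParts is a made-up name. It only collects every "digits.digits" match and does not check what sits between them.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Collects the integer part of every "digits.digits" field in the input.
// std::stoi throws std::out_of_range if a run of digits is too long for int.
std::vector<int> integerParts(const std::string& input)
{
    static const std::regex field("(\\d+)\\.(\\d+)");
    std::vector<int> result;
    for (std::sregex_iterator it(input.begin(), input.end(), field), end;
         it != end; ++it) {
        assert(it->size() == 3);   // whole match plus two capture groups
        result.push_back(std::stoi((*it)[1].str()));
    }
    return result;
}
```

For "14.23:23.41" this should yield 14 and 23. Capture group 2 holds the digits after the decimal if those are wanted instead.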

It is my programming philosophy that if the input comes from a source that is not 100% reliable (such as the user), then you should not make any assumptions about its structure or correctness. It's always a judgment call how loosely you may interpret the input. For example, if someone enters a telephone number as "617-55512-12", don't just strip out the dashes; assume that he made an error and that there is a very good chance the number itself is not what the user intended. How strict you get about this depends in large part on the consequences of bad input: are you writing a flight simulator, or controlling the flight of real planes with real humans in them?
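Applied to the decimal strings in this thread, a strict check might look like the sketch here. The isPlainDecimal name and the exact rules are assumptions chosen for illustration; how strict to be is the judgment call described above.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>

// Hypothetical strict check: accepts only "digits.digits", e.g. "12.34".
// Rejects empty parts, signs, spaces, a second '.', and trailing junk.
bool isPlainDecimal(const char* text)
{
    assert(text != nullptr);
    std::size_t digitsBefore = 0;
    std::size_t digitsAfter = 0;
    const char* p = text;

    while (std::isdigit(static_cast<unsigned char>(*p))) {
        ++p;
        ++digitsBefore;
    }
    if (*p != '.')
        return false;            // no decimal point where one is required
    ++p;
    while (std::isdigit(static_cast<unsigned char>(*p))) {
        ++p;
        ++digitsAfter;
    }
    // Both parts must be non-empty and nothing may follow the digits.
    return digitsBefore > 0 && digitsAfter > 0 && *p == '\0';
}
```

Running this before any tokenizing means malformed input such as "1.2.3" or "12.34abc" is rejected outright instead of being silently reinterpreted.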