Checking for null characters in the middle of a character sequence

I'm having some problems with null characters being in the middle of a character sequence, resulting in the rest of the characters being ignored. I have tried the following:

char *StringRemoveNull(char *str)
{	
	int nullcount = 0;
	char *retch = new char[MAX_BYTES];

	for (int i=0;i<MAX_BYTES;i++)
	{
		int npos = i-nullcount;

		if (str[i] != '\0')
			retch[npos] = str[i];
		else if (str[i+1] == '\0')  //Replace with a check to see if the next element is out of range?
		{
			retch[npos] = '\0';
			break;
		}
		else
			nullcount++;
	}

	return retch;
}


However, if there are 2 or more null characters next to each other in the array it causes issues - I need a way to check if a null character has any other characters after it so that I know whether to remove it or not.

Thanks.
I'm sorry, but I didn't really understand what you are asking. You want to parse a string containing more than one null character; that I get. But into what? Do you want to transform something like:

 
"abc\0def\0ghi"


into:

 
"abcdefghi"
Yes, but I need to keep the null character at the end. The problem is that I don't know where 'the end' is without asking for a parameter containing the actual length of the sequence.
Just post your code and we'll help you keep the problem from occurring in the first place.

As for your question:
There is no way to do this by searching for \0.

You can either post the code that generates this data and we'll help you fix it or, if the nulls aren't the result of a bug on your side, you will have to write some reasonably smart code that parses the data and determines which parts you are interested in (assuming you use the strings for sentences and haven't overwritten part of a string).


The premise behind a C-style char* string is that it cannot contain nulls -- since a null terminates the character sequence.

If you want to have them, then you must have additional constraints on the data structure.

The C++ STL string class does it by keeping track of how many characters there are in the string. Hence, the string doesn't have to end with a null -- the length of the string is part of the structure. This is the most robust way to do it.
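
As a minimal sketch of the length-tracking approach (the helper names make_binary_string and count_nulls are made up for illustration), std::string can be built from a pointer and an explicit length, after which the embedded nulls are ordinary characters:

```cpp
#include <cstddef>
#include <string>

// Build a std::string from a raw buffer of known length.
// The (pointer, length) constructor copies every byte, including '\0'.
std::string make_binary_string(const char* data, std::size_t length)
{
    return std::string(data, length);
}

// Count the embedded null bytes. size() reports the true length,
// whereas strlen() on the same data would stop at the first '\0'.
std::size_t count_nulls(const std::string& s)
{
    std::size_t n = 0;
    for (char c : s)
        if (c == '\0') ++n;
    return n;
}
```

Member functions such as find() and size() work on the whole string; only functions that take a plain const char* (strlen, strstr and friends) stop at the first null.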

An older way to do it (mostly for environment tables) is to consider it a sequence of null-terminated character strings, where the sequence itself is terminated by an empty string. For example:

"NAME=Johnny Five\0AGE=7\0"

Notice that that sequence ends with two consecutive nulls: the explicit \0 plus the one the compiler appends to every string literal. Notice also that this means that you cannot have more than one consecutive null character anywhere else in your sequence.
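
A rough sketch of how such a block could be walked, assuming the block really is terminated by an empty string (split_string_block is a name invented here, not a standard function):

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Split a block of consecutive null-terminated strings, where the block
// itself ends with an empty string (i.e. two '\0' bytes in a row).
std::vector<std::string> split_string_block(const char* block)
{
    std::vector<std::string> items;
    while (*block != '\0') {              // an empty string marks the end
        std::size_t len = std::strlen(block);
        items.emplace_back(block, len);
        block += len + 1;                 // skip the string and its terminator
    }
    return items;
}
```

If the final empty string is missing, the loop reads past the end of the block, so this only works when you control the format.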

Finally, a C-style way to do it would be to introduce a 'quote' character. Just as:

"Hello \"world\""

uses the backslash (\) to 'quote' the double-quote (") characters (among other things), you can do the same. For example, you can use '\1' as the 'quote' character (to indicate that the next character carries no special meaning), where "\1\1" resolves to a single '\1' and "\1\0" resolves to a single '\0', etc, and just "\0" (with no preceding '\1') is the end-of-sequence character.
This method works for any string, but requires special handling to account for the quoting.
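
Here is one possible sketch of that quoting scheme, using '\1' as the quote character as described above. The names kQuote, quote_encode and quote_decode are hypothetical, and the decoder assumes well-formed input (a quote character is always followed by another byte):

```cpp
#include <cstddef>
#include <string>

const char kQuote = '\1';   // hypothetical choice of quote character

// Encode arbitrary bytes: every '\0' or '\1' in the input is preceded
// by the quote character, so an unquoted '\0' can only mean "end".
std::string quote_encode(const std::string& raw)
{
    std::string out;
    for (char c : raw) {
        if (c == '\0' || c == kQuote) out.push_back(kQuote);
        out.push_back(c);
    }
    return out;   // c_str() supplies the unquoted '\0' that ends the sequence
}

// Decode: read bytes until an unquoted '\0' is reached.
std::string quote_decode(const char* encoded)
{
    std::string out;
    for (; *encoded != '\0'; ++encoded) {
        if (*encoded == kQuote) ++encoded;   // take the next byte literally
        out.push_back(*encoded);
    }
    return out;
}
```

The encoded form can be passed around as an ordinary char* string, at the cost of one extra byte for every null or quote character in the data.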

Hope this helps.
Thanks Duoas. Anyway, I'll make my scenario a little bit clearer.

I am writing an application to receive packets, check the data they contain, call any function related to that data, and then forward them to the target destination. At the moment I'm using the 'strstr' function to check for certain data within the packet; however, it only checks the data up to the first null terminator (which in this case is '\x00').
Since you are working with binary data, you should avoid using string functions. Unfortunately, standard C has no equivalent of strstr() for binary memory (some platforms, such as glibc and the BSDs, provide memmem() as an extension, so rename the function below if yours already declares it), but it is easy enough to construct one:

#include <string.h>

/* ----------------------------------------------------------------------------
 * memmem()
 * ----------------------------------------------------------------------------
 * function
 *   Like strstr(), but for binary data -- seeks to find 'tofind' in 'source'.
 *
 * returns
 *   A pointer to the matching data.
 *   NULL if not found.
 */
const void* memmem(
  const void* source, size_t sourcesize,
  const void* tofind, size_t tofindsize
  ) {
  int         value;
  const void* found;

  /* An empty or missing 'source' cannot contain anything */
  if (!source || !sourcesize) return NULL;

  /* Like strstr(), an empty 'tofind' matches at the start of 'source' */
  if (!tofind || !tofindsize) return source;

  value = *(const unsigned char*)tofind;

  /* While we can find potential matches */
  while ((found = memchr( source, value, sourcesize )))
    {
    /* Keep track of what amount remains of the source */
    sourcesize -= (const unsigned char*)found - (const unsigned char*)source;

    /* If there isn't enough of the source left to match, then return 'not found' */
    if (sourcesize < tofindsize) return NULL;

    /* Did we find an exact match? */
    if (memcmp( found, tofind, tofindsize ) == 0) return found;

    /* Ready to find the next potential match */
    source = (const unsigned char*)found + 1;
    sourcesize--;
    }

  /* No match found */
  return NULL;
  }

Untested!
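
If you are compiling as C++, the standard library's std::search can do the same job on raw bytes. A small sketch (find_bytes is just an illustrative wrapper name, not a library function):

```cpp
#include <algorithm>
#include <cstddef>

// Find the first occurrence of 'needle' in 'haystack', treating both as
// raw bytes. Returns a pointer to the match, or nullptr if there is none.
const unsigned char* find_bytes(
    const unsigned char* haystack, std::size_t haystack_size,
    const unsigned char* needle,   std::size_t needle_size)
{
    const unsigned char* end = haystack + haystack_size;
    const unsigned char* hit =
        std::search(haystack, end, needle, needle + needle_size);
    return hit == end ? nullptr : hit;
}
```

Because both lengths are passed explicitly, null bytes in the packet or in the pattern are compared like any other byte.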

Hope this helps.
I was just about to post in here saying that I found a working solution, but thanks Duoas - if I have problems with my fix then I'll use that method.

I solved the problem by using the return value of the 'recvfrom' function to determine the size of the incoming data. With this I create a separate char* and copy all the characters that aren't null from the original incoming data into that.
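
For anyone reading later, the idea looks roughly like this. It is a simplified sketch rather than my actual code: strip_nulls is a made-up name, and 'received' stands for the byte count returned by recvfrom, which should be checked for an error value before it is used as a length:

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string>

// Copy every byte of the buffer except '\0' into a std::string.
// 'received' is the number of valid bytes in 'buffer'.
std::string strip_nulls(const char* buffer, std::size_t received)
{
    std::string out;
    out.reserve(received);
    std::remove_copy(buffer, buffer + received,
                     std::back_inserter(out), '\0');
    return out;
}
```

The result no longer contains any nulls, so c_str() on it can be passed safely to strstr.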