strange behavior with pointer arithmetic

The first example I show below makes sense. We are defining an array of char pointers. Thus, the char * is the type of the array. As such, when we increment the index and reference the item at the next index, it will always be a char * pointer, which points to the first character of the string.

#include <stdio.h>

int main(void)
{
  char *s[] = {"a-z", "1-6", "A-Z", "5-5", NULL};
  int i = 0;

  while(s[i]){
    printf("the string is: %s\n", s[i]);
    ++i;
  }

  return 0;
}


But this next example is confusing. Here s[0] is dereferenced. Since s[0] is a char*, dereferencing removes one level of pointer from the type, so we go from char* to char. Since ++expr and * are both unary operators with right-to-left associativity, we first use pointer arithmetic to move the char* to the next item, and then we dereference it. The item is '-' and not the ASCII character 'b', because the increment was applied to the char* and not to the character itself. Once the pointer has been incremented, we dereference it and get '-'. The next iteration of the while loop does the same thing and gives 'z'.

But what happens after that boggles my mind. We get some empty character and then we get '1'. But wait a minute, that '1' is not in the string at index 0, and the example below clearly shows that I am indexing position 0. What's going on?

#include <stdio.h>

int main(void)
{
  char *s[] = {"a-z", "1-6", "A-Z", "5-5", NULL};
  int i = 0;

  while(s[i]){
    printf("the character is: %c\n", *++s[0]);
    ++i;
  }

  return 0;
}

output:
the character is: -
the character is: z
the character is: 
the character is: 1





I think what you have here is undefined behaviour, because you are accessing the array out of bounds in the last two iterations of the while loop.
If you were to open some kind of "memory" window to see what s looks like,
you would probably see (once you made your way from binary to hex to characters) this:
s = a-z\01-6\0A-Z\05-5\0

So first: in the first example, how does your printf("%s", s[i]) know where to stop when it prints the first string? How does it know where one string ends and the next one begins?
=> \0 = "end of string"

What your second example is doing is taking advantage of the fact that the strings in your array happen to sit right next to each other in memory.

Accessing s[0] means we access a char pointer to 'a'.
We increment that pointer and then it points to '-'.
(Consider 'a' is at the address 0xFF00 and we increment that => 0xFF01 => hey, there is '-'.)
Then we dereference it, so we go from 0xFF01 to '-' and print that.

Next we go from '-' to 'z' the same way.

Your "empty character" is the '\0' it prints.
After that, because the strings are laid out in memory the way I showed above, we arrive at '1'.

The condition of the while-loop is a bit strange but the way it is it stops now =)

That's all the magic ^.^
I'm going to make another post to add a comment about this type of code:

Well, it is not undefined behaviour, but what it is, is freakin' hell.
To make it simple and short: avoid using this type of code.
C++ is a language that allows this to happen, which doesn't mean you should do it.
Java would've already given you the finger and told you to go away if you wanted to do something like this.

It's simply a type of code that often results in nasty errors that take hours to find.
You're going to have an easy time screwing things up.

It's nice to know that this works and how it works, since it shows how well you actually understand what the hell is happening in your memory and how the hell pointer arithmetic works.
But don't really "use" it ^.^

Have a nice day

Edit: after some thinking, I guess I might have made a slight mistake.
Sorry abhishekm71, you were right,
but I wasn't entirely wrong either.
There are a few sentences to add:
You are lucky your strings are laid out this way: "a-z\01-6\0A-Z\05-5\0". (It very often happens to be this way, but there is no rule that it has to.)
That's where undefined behaviour takes place. The strings might not be laid out like this and could instead each sit somewhere isolated. Then you would run into problems.
=> don't ever write this code ^.^

To the "out-of-bounds"-part => yeah accessing '1' is out of bounds, the '\0' is actually part of the string and should always be there
*urgh* hope I clarified
The first example I show below makes sense. We are defining an array of char pointers. Thus, the char * is the type of the array. As such, when we increment the index and reference the item at the next index, it will always be a char * pointer, which points to the first character of the string.

The type of the array is "array of 5 char*" or "char*[5]", it is not "char*". Your second sentence makes no sense logically. For a pointer to point to the first element in the array, it would be a pointer to pointer to char. In many cases we can treat the name of the array as a pointer to the first element, but there are cases where we cannot.
Thanks for the responses. I have one more question about this. Notice the array ends with NULL. Is the sole purpose of that so the while loop knows when to exit? Is the while loop not intelligent enough to exit when it reaches the end of the array?
There are other ways to terminate a loop. If you know the length of the array, then you can iterate only that many times.

However, I will strongly recommend you to start using std::string and std::vector in place of array of pointers.
Hey @johnmerlino

As others have pointed out, this is an out-of-bounds memory access problem. The pointer went out of range, which is dangerous of course. But what I want to say is that I'm very excited about this, because many security bugs lie in code like this: somehow the value of a particular variable gets changed *on purpose*, the code goes out of range, and boom! The whole system collapses part by part. Isn't that somehow cool? And whoever achieves this must have a thorough understanding of algorithms and code, so you might want to pay attention to this kind of *tiny* problem, since it might become your weapon or your weak spot. XD

As for the NULL stuff, I think it's a kind of coding hack. If you put a NULL at the end of the array, you don't have to refer to the size of the array like this:
for (int i=0;i<array_size;++i)
{
// do something
}


Instead, you can just use NULL as the termination condition, which saves you a variable. But I don't like this hack; I guess I prefer the old-school method, which looks more natural. And by the way, since we are in C++, standard library types like std::string and std::vector can be combined to achieve the same goal, and the range-based for loop in C++11 is a much more convenient way to loop through an array.
I learned that pointers and arrays are treated in much the same way by the compiler. So *s[] is basically saying that s refers to an address that in turn holds another address. Try taking out the *. Try writing a loop that first outputs a-z, then make that loop run 3 more times, once for each group.