Is *ptr++ = tolower(*ptr); well defined?

A guy in the office thinks that the subject code is well defined. I'm pretty sure that it's undefined behavior because the post-increment could occur before or after evaluating ptr on the right-hand side of the assignment. Can anyone point me to a definitive answer?
The expression is well-defined. tolower() is passed the dereference of the original value of the pointer.
Side-effects in an expression only cause undefined behavior if you have multiple side-effects affecting the same datum in the same expression.
[searches work's codebase to make sure that abomination is not in mine]


...are you sure helios? I'm not pretending to know more, but my g++ compiler says it "may" be undefined. Not sure what clang++ would say.
D:\code\cplusplus231847>g++ main.cpp -Wpedantic -Wall
main.cpp: In function 'int main()':
main.cpp:11:9: warning: operation on 'ptr' may be undefined [-Wsequence-point]
     *ptr++ = tolower(*ptr);
      ~~~^~


Maybe I'm just setting it up wrong? False positive?
(Note it still happens if I do -std=c++17 as well, gnu++14 is the default in GCC 7.1.0)

#include <iostream>
#include <cctype>   // declares tolower

int main()
{
    char s[] = "DEF";
    char* ptr = s;
    
    *ptr++ = tolower(*ptr);
    
    std::cout << s << std::endl;
}
dEF


If we remove the tolower, isn't this basically the same as doing:
*(ptr++) = *(ptr); (redundant parenthesis for emphasis)

I don't understand how that's more defined than
ptr++ = ptr;

(I tend to completely avoid situations like this, in case you couldn't tell. Why risk being "fancy"? It's not worth it. [This comment is directed at OP's guy in the office])
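
For what it's worth, a minimal sketch of the non-fancy version, assuming the intent is "lowercase the current character, then advance". The helper name lower_and_advance is made up for illustration and is not from anyone's code in this thread:

```cpp
#include <cctype>

// Hypothetical helper (not from the thread): lowercase the character at ptr,
// then advance ptr, using two separate statements.
void lower_and_advance(char*& ptr)
{
    // Read and write the pointed-to character; ptr itself is not modified here.
    *ptr = static_cast<char>(std::tolower(static_cast<unsigned char>(*ptr)));
    // The increment is its own full expression, so it is sequenced after the line above.
    ++ptr;
}
```

Called as `char* p = s; lower_and_advance(p);`, this leaves no ordering question between the read of ptr and its increment under any C++ standard.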

______________________________________

Edit:

http://rextester.com/l/cpp_online_compiler_clang

clang++ -Wall -std=c++14 -stdlib=libc++ -O2 -o a.out source_file.cpp

gives
source_file.cpp:11:9: warning: unsequenced modification and access to 'ptr' [-Wunsequenced]
    *ptr++ = tolower(*ptr);
        ^             ~~~
1 warning generated.
dEF


However, this online compiler does not have C++17. I don't know C++17 well enough to know if it makes the behavior implementation-defined as opposed to undefined.
> If we remove the tolower, isn't this basically the same as doing:
> *(ptr++) = *(ptr); (redundant parenthesis for emphasis)
>
> I don't understand how that's more defined than
> ptr++ = ptr;
In both cases you have two side effects: one from ++ and the other from =.
In the former case ++ is applied to the pointer and = is applied to the value pointed to by the pointer.
In the latter case ++ is still applied to the pointer, but = is now applied to that same pointer.

int x[2] = {0};
x[0] = x[0]; // Case 1: defined
x[0]++ = x[0]; // Case 2: undefined
int i = 0;
x[i++] = x[i]; // Is this closer to case 1 or case 2? 
If I had to guess, I'd say line 5 is undefined (but if you are correct, it is actually well-defined).

But if we take that example, I guess what I still don't understand is: aren't there two possibilities? Since both accesses to i fall between the same pair of sequence points, isn't the "rule" that you don't know whether the i++ operation will be done before or after the i read (and therefore it's undefined pre-C++17, implementation-defined post-C++17)?

i = 0;
x[i++] = x[i]


Outcome 1:
If i++ is evaluated first,
i = 0;
i++; → i = 1, returns 0, then i @ x[i] evaluated, equivalent code: x[0] = x[1];

Outcome 2:
If i++ is evaluated last,
i = 0;
i @ x[i] evaluated, then i++; → i = 1, returns 0, equivalent code: x[0] = x[0];
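
The two outcomes above can be written out as separate, well-defined statements. This is only a sketch to make the two orderings concrete; the function names increment_first and read_first are hypothetical:

```cpp
// Hypothetical helpers (names are not from the thread) that make each
// ordering of x[i++] = x[i] explicit, so that both are well defined.

// Outcome 1: the increment happens before the right-hand i is read.
void increment_first(int* x, int& i)
{
    int old = i;    // the value that i++ yields
    i = i + 1;      // side effect of i++
    x[old] = x[i];  // the right-hand side sees the incremented i
}

// Outcome 2: the right-hand i is read before the increment.
void read_first(int* x, int& i)
{
    int value = x[i];  // the right-hand side sees the original i
    int old = i;       // the value that i++ yields
    i = i + 1;         // side effect of i++
    x[old] = value;
}
```

With x = {10, 20} and i = 0, increment_first stores 20 into x[0] while read_first leaves x[0] at 10, which is exactly the difference I'm asking about.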

Hopefully that example makes sense... could you explain at which point I misunderstand?
"i @ x[i]" meaning "referring to the i that's inside x[i], as opposed to the i in i++"

(also sorry dhayden if it seems like I'm taking over the topic, but I think it is still on-topic :p)

________________________________

Or, in other words, can you explain why
x[i++] = x[i];
is always equivalent to
x[i] = x[i+1]; (left-hand-side i++ evaluated first)

and can't be
x[i] = x[i]; (right-hand side i evaluated first)

________________________________

Edit 2:
I guess the answer lies in this paragraph?
http://en.cppreference.com/w/cpp/language/eval_order
> The side effect (modification of the left argument) of the built-in assignment operator [...] is sequenced after the value computation (but not the side effects) of both left and right arguments, ...
but that still doesn't nail it down, since the i++ is a side effect of the left argument, no?

tl;dr this doesn't make sense to me, so I'm just going to avoid it like the plague.
According to intro.execution, paragraph 15 of the standard,
> If a side effect on a scalar object is unsequenced relative to either another side effect on the same scalar object or a value computation using the value of the same scalar object, and they are not potentially concurrent, the behavior is undefined.
Unless I'm misunderstanding what "value computation" means, this would seem to imply that an expression such as i + i++ is undefined (and so would OP's expression), but oddly enough the examples following this paragraph only use double side effects on the same scalar:
void f(int, int);
void g(int i, int* v) {
    i = v[i++]; // the behavior is undefined
    i = 7, i++, i++; // i becomes 9
    i = i++ + 1; // the behavior is undefined
    i = i + 1; // the value of i is incremented
    f(i = -1, i = -1); // the behavior is undefined
}

If anyone has a better interpretation of this paragraph, feel free to provide it.
> Is *ptr++ = tolower(*ptr); well defined?

Yes.

> The assignment operator (=) and the compound assignment operators all group right-to-left. All require a modifiable lvalue as their left operand; their result is an lvalue referring to the left operand. The result in all cases is a bit-field if the left operand is a bit-field. In all cases, the assignment is sequenced after the value computation of the right and left operands, and before the value computation of the assignment expression. The right operand is sequenced before the left operand. With respect to an indeterminately-sequenced function call, the operation of a compound assignment is a single evaluation.
http://eel.is/c++draft/expr.ass


(Note: the sentence "The right operand is sequenced before the left operand." was added in C++17, which is what makes it well defined.)
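
One way to observe that rule without touching any undefined behaviour is to make both operands function calls that record when they run. This is a sketch with made-up names (order_log, slot, left_operand, right_operand, assignment_order); because the operands are function calls, their order before C++17 is merely unspecified rather than undefined:

```cpp
#include <string>

// Hypothetical names for illustration; none of these come from the thread.
static std::string order_log;  // records the order in which operands run
static int slot = 0;           // the object being assigned to

int& left_operand()  { order_log += 'L'; return slot; }
int  right_operand() { order_log += 'R'; return 42; }

// Returns the order in which the two operands of = were evaluated.
std::string assignment_order()
{
    order_log.clear();
    left_operand() = right_operand();
    return order_log;
}
```

A compiler that implements the C++17 rule should return "RL" here. Under earlier standards either "RL" or "LR" is a conforming result.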
Even in pre-C++17 it is well-defined, because you are only assigning it a value once, and the assignment must occur after the full expression is sequenced.

Keep in mind that

i = v[i++]

is not well-defined (pre-C++17) because there are two side-effects trying to modify i in the same expression, whereas:

*ptr++ = f(*ptr)

is well-defined because there is only one modifying side effect on ptr: the post-increment. The other side effect modifies the object referenced by the pointer, and this modification:

 • pre-C++17: is sequenced before the post-evaluation modification of ptr
 • C++17 and later: is sequenced after the evaluation of the left-side of the assignment operation.

Other examples provided by helios

i = 7, i++, i++;   // WD because the , is a "sequence point"
i = i++ + 1;       // pre-C++17: UB because the ++ and = are not relatively sequenced
                   // C++17 and later: WD because rhs is sequenced before lhs, so ++ comes first
i = i + 1;         // WD because there is only one modifying operation
f(i = -1, i = -1); // pre-C++17: UB because ',' here is not the comma-operator, and function argument evaluation is not relatively sequenced.
                   // C++17 and later: argument evaluations are indeterminately sequenced, so this is no longer UB
// (Though I'd be shocked if any compiler failed to assign -1 to i.)
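
To make the difference between the two kinds of comma concrete, here is a small sketch. All names (call_log, arg_a, arg_b, sum, comma_operator, argument_order) are invented for the example:

```cpp
#include <string>

// Hypothetical names for illustration only.
static std::string call_log;

int arg_a() { call_log += 'a'; return 1; }
int arg_b() { call_log += 'b'; return 2; }
int sum(int x, int y) { return x + y; }

// Comma operator: operands are evaluated strictly left to right.
int comma_operator()
{
    int i = 0;
    i = 7, i++, i++;
    return i;  // 9
}

// Argument list: the commas are separators, so the order in which
// arg_a() and arg_b() are called is unspecified ("ab" or "ba").
std::string argument_order()
{
    call_log.clear();
    static_cast<void>(sum(arg_a(), arg_b()));
    return call_log;
}
```

comma_operator() always returns 9, whereas the string returned by argument_order() may differ between compilers.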

Hope this helps.

If ptr is not an object of a user-defined type (ie. if it is a raw pointer),
*ptr++ = tolower(*ptr); engendered undefined behaviour prior to C++17.
(unsequenced modification and access to a scalar.)
> x[0]++ = x[0]; // Case 2: undefined
no, that's well defined.
it's a compilation error.


> *ptr++ = f(*ptr)
> is well-defined because there is only one modifying side effect on ptr: the post-increment
if left side evaluates first, *ptr = f(*(ptr+1))
if right side evaluates first, *ptr = f(*ptr)
only since c++17 there is a defined order of evaluation
Duthomhas:
> The other side effect modifies the object referenced by the pointer, and this modification:
>
> • pre-C++17: is sequenced before the post-evaluation modification of ptr
But when is the ptr read sequenced relative to the post-increment side effect? Is the post-increment side effect defined to be executed after all the other operations of the expression (e.g. function calls) have been executed? This was my understanding, which was why I said the behavior of OP's expression is undefined, but if so, where is this defined?
FWIW:
Even in C++17, both g++ and clang++ continue to generate undefined behaviour warnings for constructs which are no longer undefined in C++17.

info g++8 states:
-Wsequence-point
...
The C++17 standard will define the order of evaluation of operands in more cases: in particular it requires that the right-hand side of an assignment be evaluated before the left-hand side, so the above examples are no longer undefined. But this warning will still warn about them, to help people avoid writing code that is undefined in C and earlier revisions of C++.

info clang-devel is silent about this (for -Wunsequenced); but it seems to mimic the GNU behaviour

#include <cctype>

int main()
{
    char cstr[] = "hello" ;
    char* ptr = cstr ;

    *ptr++ = std::tolower(*ptr) ;
}


> alias c++
clang++-devel -std=c++17 -stdlib=libc++ -Wall -Wextra -pedantic-errors -g
> c++ --version | grep clang && c++ -c main.cpp
clang version 7.0.0
main.cpp:8:9: warning: unsequenced modification and access to 'ptr' [-Wunsequenced]
    *ptr++ = std::tolower(*ptr) ;
        ^                  ~~~
1 warning generated.
>
> alias g++
g++8 -std=c++17 -Wall -Wextra -pedantic-errors -g -Wl,-rpath=/usr/local/lib/gcc8
> g++ --version | grep g++ && g++ -c main.cpp
g++8 (FreeBSD Ports Collection) 8.0.1 20180211 (experimental)
main.cpp: In function 'int main()':
main.cpp:8:9: warning: operation on 'ptr' may be undefined [-Wsequence-point]
     *ptr++ = std::tolower(*ptr) ;
      ~~~^~
I stand corrected.
closed account (E0p9LyTq)
> FWIW:
> Even in C++17, both g++ and clang++ continue to generate undefined behaviour warnings for constructs which are no longer undefined in C++17.

With VS2017 and language standard set to C++17/latest your first code snippet compiles without undefined behavior warnings.

VS2017 warns about line 6 "warning C4244: '=': conversion from 'int' to 'char', possible loss of data" without stopping compiling and linking. /W4 needs to be set, /W3 (VS2017 default) or lower no warnings at all.
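
That conversion warning comes from tolower taking and returning int. A minimal sketch of a wrapper that spells out both conversions, assuming a plain char is wanted back (the name to_lower_char is hypothetical):

```cpp
#include <cctype>

// Hypothetical helper (not from the thread).
char to_lower_char(char c)
{
    // std::tolower takes an int whose value must be representable as
    // unsigned char (or be EOF), so convert before the call...
    int lowered = std::tolower(static_cast<unsigned char>(c));
    // ...and narrow explicitly afterwards, which documents the int-to-char
    // conversion that a warning such as C4244 points at.
    return static_cast<char>(lowered);
}
```

With the explicit static_cast the narrowing is intentional, so a conversion warning of this kind should no longer apply to the call site.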

/Wall shows a non-fatal problem with <cctype>:
C:\Program Files (x86)\Windows Kits\10\Include\10.0.16299.0\ucrt\ctype.h(175): warning C4514: '_ischartype_l': unreferenced inline function has been removed
Thanks everyone, especially JLBorges who, as usual, provided a definitive answer.

It seems that some people were confused about the issue. The question is whether the post-increment of the pointer is guaranteed to occur after the pointer is evaluated on the right hand side or could it occur before. It boils down to whether the right hand side is sequenced before the left:
#include <cctype>   // tolower
#include <iostream>

int main()
{
    char arr[2] {'A', 'B'};
    char *ptr = arr;
    *ptr++ = tolower(*ptr);

    // The two blocks below are alternative expansions of the statement
    // above; each one resets ptr so that it starts from arr[0].

    // Right side evaluated first:
    {
        ptr = arr;
        char ch = tolower(*ptr);// evaluate right side
        char *oldPtr = ptr;     // evaluate left side
        ptr = ptr+1;            // do side effect of left side
        *oldPtr = ch;           // do assignment: arr[0] = tolower(arr[0])
    }

    // Left side evaluated first:
    {
        ptr = arr;
        char *oldPtr = ptr;     // evaluate left side
        ptr = ptr+1;            // do side effect of left side
        char ch = tolower(*ptr);// evaluate right side
        *oldPtr = ch;           // do assignment: arr[0] = tolower(arr[1])
    }
}


JLBorges showed that before C++17, the order (and thus the behavior) was undefined. With C++17, the right side is sequenced first.
The Microsoft compiler (Visual Studio 2017) is really bad in this respect; it does not diagnose this kind of undefined behaviour at all (try it with -std:c++14).

We are in bad shape: of the three mainstream compilers, the two compilers which know how to diagnose these kinds of errors (GNU and LLVM) emit spurious warnings in C++17; and the third compiler (Microsoft) does not know how to diagnose these at all.
> The question is whether the post-increment of the pointer is guaranteed to occur after the
> pointer is evaluated on the right hand side or could it occur before.

For a raw pointer, the behaviour (prior to C++17) is undefined: the execution of unsequenced evaluations can overlap (this possible overlap is what engenders undefined behaviour).

For a user-defined type (where overloaded operators are function calls), the behaviour (prior to C++17) is unspecified: the execution of two indeterminately sequenced evaluations cannot overlap or interleave, though either one could be executed first.
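
A sketch of the user-defined case, using a made-up pointer-like class (CharCursor and lower_first are hypothetical names). Since operator++ and operator* are function calls here, their bodies cannot interleave, so the only open question before C++17 is which one runs first:

```cpp
#include <cctype>

// Hypothetical wrapper, for illustration only: a pointer-like class whose
// operators are ordinary member functions.
class CharCursor
{
public:
    explicit CharCursor(char* p) : p_(p) {}
    char& operator*() const { return *p_; }
    CharCursor operator++(int) { CharCursor old(*this); ++p_; return old; }
private:
    char* p_;
};

// Writes a lowercased character into arr[0]. In C++17 the source is arr[0];
// before C++17 it could be arr[0] or arr[1], depending on which operand
// the compiler evaluates first.
void lower_first(char* arr)
{
    CharCursor cur(arr);
    *cur++ = static_cast<char>(std::tolower(static_cast<unsigned char>(*cur)));
}
```

For arr = "AB", a C++17 compiler should leave "aB"; an earlier standard also permits "bB", but nothing worse than that.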