Understanding std::memory_order_release and std::memory_order_acquire

Please consider the following example code:
#include <atomic>
#include <thread>
#include <cassert>

std::atomic<int> data[5];
std::atomic<bool> sync1(false), sync2(false);

void thread_1()
{
    data[0].store(42, std::memory_order_relaxed);
    data[1].store(97, std::memory_order_relaxed);
    data[2].store(17, std::memory_order_relaxed);
    data[3].store(-141, std::memory_order_relaxed);
    data[4].store(2003, std::memory_order_relaxed);
    sync1.store(true, std::memory_order_release);  // release: publishes the data stores
}

void thread_2()
{
    while (!sync1.load(std::memory_order_acquire));  // acquire: synchronizes-with thread_1's release
    sync2.store(true, std::memory_order_release);    // release: passes the synchronization on
}

void thread_3()
{
    while (!sync2.load(std::memory_order_acquire));  // acquire: synchronizes-with thread_2's release
    assert(data[0].load(std::memory_order_relaxed) == 42);
    assert(data[1].load(std::memory_order_relaxed) == 97);
    assert(data[2].load(std::memory_order_relaxed) == 17);
    assert(data[3].load(std::memory_order_relaxed) == -141);
    assert(data[4].load(std::memory_order_relaxed) == 2003);
}

int main()
{
    std::thread _1{thread_1};
    std::thread _2{thread_2};
    std::thread _3{thread_3};
    _1.join(); _2.join(); _3.join();
}


My question is this: I can see that the store to sync1 is a release operation, so when thread_2 waits on it, the release synchronizes-with the acquire load, and all the data stores in thread_1 are guaranteed to be visible side effects in thread_2, and likewise in thread_3 (through the second release-acquire pair on sync2). What I wanted to clarify is this: can the order of the data store operations (marked with memory_order_relaxed) be reordered by the compiler? Meaning that, say, data[2]'s store could theoretically happen before data[0]'s store?

My reasoning is that the release-acquire pairs guarantee synchronization, so all the data stores are guaranteed to have completed by the time the acquire operation completes; but since they are memory_order_relaxed and not memory_order_seq_cst, they don't have to happen in the exact program order (evaluation order) in which I listed them. Is my understanding correct?

What if there were additional operations after sync1's store? Are those also not guaranteed to be ordered, i.e., could they become visible before the release store?
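To make that concrete, here is a hypothetical variant of thread_1 (the atomic extra is a name I made up for illustration; it is not in the code above):

std::atomic<int> extra(0);

void thread_1_variant()
{
    data[0].store(42, std::memory_order_relaxed);
    // ... the other relaxed stores to data[1]..data[4] ...
    sync1.store(true, std::memory_order_release);
    // This store is sequenced after the release store, but a release is a
    // one-way barrier: it prevents earlier operations from moving below it,
    // not later operations from moving above it. Another thread could
    // therefore observe extra == 7 before it observes sync1 == true.
    extra.store(7, std::memory_order_relaxed);
}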
can the order of the data store operations (marked with memory_order_relaxed) thus be reordered by the compiler?
I could be wrong, but I believe the compiler has no concept of order in this sense. The memory order constants are meaningless to it. It's the CPU, not the compiler, that reacts to those values. And yes, I believe the CPU is free to internally reorder writes prior to a memory barrier, regardless of what code the compiler generated. Obviously, this is only in relation to other threads; code within the same thread will behave as if the writes happened in the order specified by the machine code.
I read in some places that this kind of reordering can also happen at compile time:
https://preshing.com/20120625/memory-ordering-at-compile-time/
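For example, even with plain non-atomic variables (toy names of my own), nothing stops the compiler itself from emitting the stores in a different order:

int x = 0, y = 0;

void plain_writes()
{
    x = 1;  // the compiler may legally emit the store to y
    y = 2;  // before the store to x; no barrier orders them
}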

Obviously, this is only in relation to other threads; code within the same thread will behave as if the writes happened in the order specified by the machine code.


Can you give me an example of what you mean by this? I'm trying to wrap my head around what you said here.
My previous post was exclusively in response to this question:
can the order of the data store operations (marked with memory_order_relaxed) thus be reordered by the compiler?
In other words, my response was "yes, not only may the compiler reorder those writes, the CPU may reorder them as well".
The write with std::memory_order_release is different, because it involves a memory barrier, which the compiler does understand and does honor.
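To sketch the difference (a, b, and flag are toy names of my own):

#include <atomic>

std::atomic<int> a(0), b(0);
std::atomic<bool> flag(false);

void writer()
{
    // These two relaxed stores may be reordered relative to each other,
    // by the compiler and/or by the CPU.
    a.store(1, std::memory_order_relaxed);
    b.store(2, std::memory_order_relaxed);
    // The release store acts as the barrier: neither store above may be
    // moved below it.
    flag.store(true, std::memory_order_release);
}

void reader()
{
    while (!flag.load(std::memory_order_acquire));
    // Both a == 1 and b == 2 are now guaranteed visible, even though the
    // order in which they became visible was unspecified.
}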

Can you give me an example of what you mean by this? I'm trying to wrap my head around what you said here.
It's really nothing very complex. If the machine code in thread A contains
mem[0] = 42;
mem[1] = mem[0];
mem[2] = 77;
the code will behave as expected: mem[1] ends up holding 42.
If you have two threads running in parallel such that their instructions are scheduled like this:
mem[0] = 42;     //Thread A
mem[1] = mem[0]; //Thread B
mem[2] = 77;     //Thread A 
the value of mem[1] will be uncertain, because the propagation of effects across threads is non-deterministic unless the code asks for determinism by using a memory barrier. For example, thread B might mistakenly believe that the value of mem[0] in its core-specific cache is up to date, even though thread A has already committed the new value to memory, or thread A might have delayed sending the value for a few cycles.
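A sketch of how that interleaved example could be made deterministic (again with toy names), by publishing the value with a release store and waiting for it with an acquire load:

#include <atomic>

std::atomic<int> mem0(0);
std::atomic<bool> ready(false);
int mem1 = 0;

void thread_A()
{
    mem0.store(42, std::memory_order_relaxed);
    ready.store(true, std::memory_order_release);  // publish mem0
}

void thread_B()
{
    while (!ready.load(std::memory_order_acquire));  // wait for the publish
    mem1 = mem0.load(std::memory_order_relaxed);     // guaranteed to read 42
}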