Concurrent multithreading sync issues

Okay everyone, I have a problem that is KILLING me. I'm reaching out to you guys in hopes that someone can point out the problem.

Long story short

Mutex locks are not forcing my variables to be physically written, resulting in a condition_variable failing to wake up, resulting in program deadlock.

Long story long

I'm writing an NES emulator which runs the CPU in the main thread, and the PPU (graphics) in a spinoff thread. Emulation runs a frame at a time. Once a frame completes, I need the PPU to stop and wait for the next frame to start.

This is accomplished with 2 variables.

1) an "EmulationOn" boolean to indicate if a frame is currently being emulated
2) a "timestamp" which is used to sync up the CPU/PPU. When the PPU finishes the frame, it gets set to a crazy high value. It is reset to 0 at frame start.

Bear with me... there's a lot of code but it's not that hard to see the problem. Overall emulation flow is below. All the 'SuperLog' stuff is dumped to a log file:

void EmulateOneFrame()
{
    mPpuSyncSystem->SuperLog( "N:  Start of frame...\n" );
    mClock.EmulationOn = true;
    mCpu.AdjustTimestamp( -mPpu.GetLastFrameLength() );
    mPpu.SetVideoOut(video);
    mPpu.NextFrame();     //   <- HERE, this resets the timestamp to zero

    mPpuSyncSystem->SuperLog( "<ON> = %d\n", mClock.EmulationOn ? 1 : 0 );

    // ... emulation is started here, once both CPU and PPU threads finish
    //     emulating the frame, the PPU thread waits and execution resumes
    //     here...
    
    mClock.EmulationOn = false;
    mPpuSyncSystem->SuperLog( "<OFF> = %d\n", mClock.EmulationOn ? 1 : 0 );
    mPpuSyncSystem->SuperLog( "N:  End of frame...\n" );
}

Some other relevant areas in the code:


// Called by above code at frame start:
void NesPpu::NextFrame()
{
    mClock->PpuTimestamp_ = 0;   //<-  HERE, notice, PpuTimestamp_ is zero'd
    mSync->SuperLog("P: Timestamp reset T=%d  &clk=%08X\n", (timestamp_t)mClock->PpuTimestamp_, mClock);
    // ... other stuff here to reset the PPU state in prep to begin a new frame
    mSync->WakeThis();  // <- calls notify_one on the condition_variable to
         // wake the PPU thread.  This much is working.
}

//  Used as the condition callback for the condition_variable which the PPU
//  waits for once it finishes a frame
bool NesPpu::TODORemoveMe()
{
    bool result = (mClock->EmulationOn && (mClock->PpuTimestamp_ != timestamp_never)) || mSync->WantToBeJoined();
    mSync->SuperLog( "P:  checking:  on=%d, ts=%d OUT=%d   &clk=%08X\n", mClock->EmulationOn ? 1 : 0, (timestamp_t)(mClock->PpuTimestamp_),  result ? 1 : 0, mClock );
    return result;
}

// .. when the PPU finishes a frame...
        mClock->PpuTimestamp_ = timestamp_never;  // timestamp_never==0x7FFFFFFF

        mSync->SuperLog( "P:  About to wait for frame to start...\n" );
        mSync->Wait( [&] () { return TODORemoveMe(); }, mMasterSync );
        mSync->SuperLog( "P:  Begin Frame emulation...\n" );

  // (ignore the mMasterSync thing, this effectively just calls
  //   condition_variable::wait )

So when running this... it works fine for a while, then deadlocks. Checking the log reveals the problem:


P:  About to wait for frame to start...                  <- PPU finished frame
P: Thread waiting...                                     <- PPU thread now waiting
C: Thread Waking...                                      <- CPU thread is poked (still waiting, this just means notify_one has been called and it should recheck its condition)
P:  checking:  on=1, ts=2147483647 OUT=0   &clk=023DF788 <- PPU thread condition is checked, 'OUT=0' so thread remains waiting
C: Thread Waking...                                      <- poke
P: Thread Waking...                                      <- poke
C: ...waiting complete                                   <- CPU thread now wakes up and resumes
<OFF> = 0                                                <- EmulationOn changed to false, to signal this frame is done
N:  End of frame...                                      <- Frame end
N:  Start of frame...                                    <- Next frame starts
P: Timestamp reset T=0  &clk=023DF788                    <- PPU resets timestamp  !!!!IMPORTANT notice timestamp=0
P: Thread Waking...                                      <- poke PPU thread
<ON> = 1                                                 <- EmulationOn is true, new frame is starting
(reg write)  2000=10                                     <- CPU needs to sync with PPU on reg writes
C: Thread waiting...                                     <- so CPU waits
P: Thread Waking...                                      <- PPU is poked
P: Thread Waking...                                      <-  ... twice
P:  checking:  on=1, ts=2147483647 OUT=0   &clk=023DF788 <- PPU's condition is checked !!!! NOTE ts != 0

The timestamp here is the zinger. Notice how it's clearly being set to zero at the start of the frame, but is still being read back as crazy-high when the condition variable is checking the status.

The program deadlocks here, as now both CPU and PPU threads are waiting, and neither one ever gets poked again.

Now I know that variables used to sync multiple threads need to be guarded. I also know that marking them as 'volatile' isn't enough, as you have to ensure memory is actually serialized in the right order and race conditions have to be avoided. As such, both EmulationOn and my timestamp variables are 'atomic'. std::atomic isn't supported by my compiler, so I wrote my own ghetto Atomic class which basically ensures all accesses are locked behind a mutex:

template <typename T>
class Atomic
{
public:
                Atomic(T x = T())           : v(x)  { }
                Atomic(const Atomic<T>& x)  : v(x)  { }

    operator T () const
    {
        T copy;
        {
            threadlib::lock_guard<threadlib::mutex> lk(m);
            copy = v;
        }
        return copy;
    }

    void operator = (const T& x)
    {
        threadlib::lock_guard<threadlib::mutex> lk(m);
        v = x;
    }

    void operator = (const Atomic<T>& x)
    {
        *this = static_cast<T>(x);
    }
    
    void operator += (const T& x)
    {
        threadlib::lock_guard<threadlib::mutex> lk(m);
        v += x;
    }

private:

    volatile T                  v;
    mutable threadlib::mutex    m;
};

Does anyone have any ideas why this is happening? I'm ripping my hair out over this.

I thought that a mutex lock combined with the actual variable being volatile would ensure reads/writes across threads would be in every way protected, but now I'm thinking maybe there's more that's needed?

I'm using boost for threads/mutexes/etc (my compiler doesn't have std::thread implemented yet)

Thanks.

rollie (304)

Can't say threads are my forte, but a couple questions:
1) after the first output "checking...", I don't ever see "P: Begin Frame Emulation" in the logs. Were some entries in the log file omitted for brevity?

2) it seems strange to see 2 pokes to the ppu in the last few entries of the log - is this expected?

3) I'd be interested in seeing the relevant methods of mSync and AdjustTimestamp; maybe a clue is in one of them. Also any lines of code for which we already see log entries (for example "Thread waiting", "waiting complete", etc). Additional log entries could help too, to compare/contrast with the portion that's failing.

The Atomic wrapper looks correct though.

coder777 (8439)

Disch wrote:
it gets set to a crazy high value

What do you mean? accidentally or deliberately?

That value in the log is ts=2147483647 (= 7FFFFFFF = INT_MAX). That doesn't seem to be accidental.

I don't see where Atomic is involved. I doubt somehow that atomic is always the best way to protect variables (copy only?). Considering that the most dangerous action is

if(x ...)
  change x

which must be usually protected
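
A minimal sketch of that check-then-change hazard and one way to close it, assuming a C++11 compiler with <mutex> and <thread> (the code in this thread uses boost through the threadlib alias instead). CappedCounter and RunCappedCounterDemo are hypothetical names for illustration, not part of the emulator. The idea is that the test and the modification sit under one lock, so no other thread can run between them; wrapping each individual read or write in its own lock would not give that guarantee.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical example type: a counter that must never exceed a cap.
class CappedCounter
{
public:
    explicit CappedCounter(int cap) : mCap(cap), mValue(0) { }

    // The check and the change happen under ONE lock, so no other thread
    // can modify mValue between "if (x ...)" and "change x".
    bool TryIncrement()
    {
        std::lock_guard<std::mutex> lk(mMutex);
        if (mValue >= mCap)
            return false;
        ++mValue;
        return true;
    }

    int Get() const
    {
        std::lock_guard<std::mutex> lk(mMutex);
        return mValue;
    }

private:
    const int           mCap;
    int                 mValue;
    mutable std::mutex  mMutex;
};

// Starts several threads that all try to push the counter past its cap,
// then returns the final value.
int RunCappedCounterDemo(int cap, int threads, int attemptsPerThread)
{
    CappedCounter counter(cap);
    std::vector<std::thread> pool;
    for (int i = 0; i < threads; ++i)
    {
        pool.emplace_back([&counter, attemptsPerThread] {
            for (int n = 0; n < attemptsPerThread; ++n)
                counter.TryIncrement();
        });
    }
    for (auto& t : pool)
        t.join();
    return counter.Get();
}
```

Calling RunCappedCounterDemo(100, 4, 1000) should leave the counter at exactly the cap, because every increment is checked and applied while the lock is held.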

Disch (13742)

Thanks for the responses, guys.

@rollie:

1) No, that is the log as-is (well, the last few lines of it anyway). And that's exactly the problem. The "Begin Frame Emulation" line doesn't show because the PPU thread is still waiting.

2) I was paranoid and poking the thread more than is really needed because I wanted to avoid these deadlocks. While it isn't exactly necessary, it wouldn't be causing this problem.

3) Certainly:

AdjustTimestamp only modifies the CPU timestamp, not the PPU timestamp, so it's pretty irrelevant:

void NesCpu::AdjustTimestamp(timestamp_t adj)
{
    mClock->CpuTimestamp += adj;
}

The mSync->Wait() function:

/* member variables defined as:
    threadlib::mutex                    mMutex;
    threadlib::condition_variable       mCv;

    threadlib = boost in this case.  I gave it the alias so I could change it to std when
     std::thread is supported.
*/
void SyncSystemPreemptive::Wait(function<bool()> condition, SyncSystem* switchto)
{
    SuperLog( "%c: Thread waiting...\n", logid); // logid = 'C' for CPU, 'P' for PPU
    switchto->WakeThis();  // <- 'switchto' is handle to the thread we're switching to.
                      //  when the PPU waits, this pokes the CPU thread to make sure it wakes up
                      //  and vice versa
    if(!condition())
    {
        threadlib::unique_lock<threadlib::mutex> lk(mMutex);
        mCv.wait( lk, [&] () -> bool { switchto->WakeThis(); return condition(); } );
        // and because I'm paranoid, I also poke the other thread every time the condition is checked here
    }
    SuperLog( "%c: ...waiting complete\n", logid);
}

// The "poke" / WakeThis function:
void SyncSystemPreemptive::WakeThis()
{
    SuperLog( "%c: Thread Waking...\n", logid);
    mCv.notify_one();
}

A correct portion of the log (note I reran and overwrote my previous log, so the &clk pointer value is different -- I just put that in as a sanity check to make sure I was always checking the same PPU timestamp):

P:  About to wait for frame to start...                    <- ppu starts waiting
P: Thread waiting...
C: Thread Waking...
P:  checking:  on=1, ts=2147483647 OUT=0   &clk=0321FE90   <- condition checked, failed because timestamp is
C: Thread Waking...                                              'never' so ppu still asleep
P:  checking:  on=1, ts=2147483647 OUT=0   &clk=0321FE90   <- another check (looks like spurious wakeups, since they
P: Thread Waking...                                              don't really seem to be triggered)
C: Thread Waking...
C: ...waiting complete
<OFF> = 0                                                  <- end of frame
N:  End of frame...
P:  checking:  on=1, ts=2147483647 OUT=0   &clk=0321FE90   <- ppu checked again, still fails
N:  Start of frame...                                      <- frame start
P: Timestamp reset T=0  &clk=0321FE90                      <- ppu timestamp reset (T=0) !!important!!
P: Thread Waking...
C: Thread Waking...
<ON> = 1                                                   <- EmulationOn set to true
P:  checking:  on=1, ts=0 OUT=1   &clk=0321FE90            <- ppu checked again, PASSES because on=1 and timestamp=0 !!CORRECT!!
P: ...waiting complete                                     <- ppu wakes up
P:  Begin Frame emulation...                               <- and starts a frame

coder777 wrote:
What do you mean? accidentally or deliberately?

Deliberately. The timestamp gets set to 0x7FFFFFFF when it finishes emulating a frame:

// .. when the PPU finishes a frame...
        mClock->PpuTimestamp_ = timestamp_never;  // timestamp_never==0x7FFFFFFF

        mSync->SuperLog( "P:  About to wait for frame to start...\n" );
        mSync->Wait( [&] () { return TODORemoveMe(); }, mMasterSync );
        mSync->SuperLog( "P:  Begin Frame emulation...\n" );

The point is, the timestamp is being reset to zero when the frame starts, but that isn't "sticking", and when the condition variable is poked, it still thinks the timestamp==0x7FFFFFFF even though it was zero'd.

I don't see where Atomic is involved

The timestamp is declared as atomic:

(*checks* OSHT it's not in this test version! oh well, this is how it's supposed to be: I'll explain myself afterwards:)

    typedef Atomic<bool>            atomicbool;
    typedef Atomic<timestamp_t>     atomictime;
    typedef Atomic<int>             atomicint;

//...
class Clock
{
public:
    Clock()
        : EmulationOn(false)
        , CpuTimestamp( 0 )
        , PpuTimestamp_( 0 )
        , NextNmi( timestamp_never )
    {
    }

    // Should threads be emulating?
    atomicbool              EmulationOn;

    // Interdependent system timestamps
    atomictime              CpuTimestamp;
    atomictime              PpuTimestamp_;

    // Upcoming important times
    atomictime              NextNmi;
};

As I mentioned, my test program (which is where these logs are coming from) did not have atomictime typedef'd that way and was not using the Atomic class, so that might be the problem here. However the MAIN version of my test program (without logging) does use Atomic and is still deadlocking, although it might not be deadlocking in the same place.

So I guess I need to fix the log version and rerun to see if the problem is what I think it is.

Bah. Sorry for the confusion everyone! I'll get back on this after work today and repost with a clarification of the problem. For now I guess you can disregard.

I doubt somehow that atomic is always the best way to protect variables (copy only?). Considering that the most dangerous action is

I've been doing a fair amount of reading on the subject, and it basically comes down to the way memory is pipelined on modern architectures. Even if you write to variables in a specific order... ie:

one = 1;
two = 2;

And even if the generated assembly writes to them in that order... they still may not be physically written in that order. Releasing a mutex [should] enforce that all writes are physically written. If you want to read on the subject, here is a link:

http://cbloomrants.blogspot.ca/2009/01/01-25-09-low-level-threading-junk-part.html
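
As a rough illustration of the mutex-release point, here is a small sketch, assuming C++11 std::mutex and std::thread rather than the boost/threadlib types used in this thread. SharedPair, Publish, TryConsume and RunPublishDemo are hypothetical names. The writer stores both values and unlocks; the reader locks the same mutex before reading, and that unlock/lock pair on one mutex is what makes the writer's stores visible to the reader.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical shared state: every field is only touched while holding m.
struct SharedPair
{
    std::mutex  m;
    int         one = 0;
    int         two = 0;
    bool        ready = false;
};

// Writer: stores both values and the flag under the lock.
// The unlock at the end of the scope publishes all three writes.
void Publish(SharedPair& s)
{
    std::lock_guard<std::mutex> lk(s.m);
    s.one = 1;
    s.two = 2;
    s.ready = true;
}

// Reader: locks the SAME mutex, so once it sees ready == true it also
// sees the values written before the writer unlocked.
bool TryConsume(SharedPair& s, int& one, int& two)
{
    std::lock_guard<std::mutex> lk(s.m);
    if (!s.ready)
        return false;
    one = s.one;
    two = s.two;
    return true;
}

// Runs one writer thread and polls from the calling thread.
bool RunPublishDemo()
{
    SharedPair s;
    int one = -1, two = -1;
    std::thread writer([&s] { Publish(s); });
    while (!TryConsume(s, one, two))
        std::this_thread::yield();
    writer.join();
    return one == 1 && two == 2;
}
```

The polling loop is only there to keep the sketch short; a real waiter would sleep on a condition variable instead of spinning.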

Disch (13742)

The same problem persists even after fixing the Atomic issue. So all information in above post is correct.

I wonder if I need to lock that cv mutex before waking the thread. That seems ridiculous, but I'm running out of ideas.

*tries*

EDIT:

Okay... it seems to be working now, but I'm a little baffled as to why it wasn't working before.

I made the following changes:

void SyncSystemPreemptive::Wait(function<bool()> condition, SyncSystem* switchto)
{
    SuperLog( "%c: Thread waiting...\n", logid);
    switchto->WakeThis();
    if(!condition())
    {
        threadlib::unique_lock<threadlib::mutex> lk(mMutex);
        mCv.wait( lk, [&] () -> bool { /*switchto->WakeThis();*/ return condition(); } );  // <- removed this
    }
    SuperLog( "%c: ...waiting complete\n", logid);
}

void SyncSystemPreemptive::WakeThis()
{
    SuperLog( "%c: Thread Waking...\n", logid);
    threadlib::lock_guard<threadlib::mutex> lk(mMutex); // <- added this
    mCv.notify_one();
}

It's a little frustrating, but at least it seems to be working now.

Thanks for the help everyone!

EDIT2:
Removing the lock_guard in WakeThis also seems to work. So I guess the additional call to WakeThis in the wait condition is what was causing the problem. That doesn't make any sense to me, but whatever.

Cubbi (4774)

I guess the additional call to WakeThis in the wait condition is what was causing the problem. That doesn't make any sense to me, but whatever.

Why was there a call to (eventually) notify_one() from within the wait condition at all? The wait condition is supposed to only check the condition, as quickly as possible, and go back to sleep if it was spurious.

a mutex lock combined with the actual variable being volatile would ensure reads/writes across threads

'volatile' is irrelevant, it has nothing to do with threads. Atomics are also unnecessary: the mutex establishes the necessary synchronization of all non-atomic memory accesses.

webJose (2948)

Just a quick note. In Microsoft compilers, it appears that volatile also creates a synchronization context for the variable. This means that any volatile variable is also thread-safe when using the Microsoft compiler.

See http://msdn.microsoft.com/en-us/library/12a04hfd.aspx .

Disch (13742)

Why was there a call to (eventually) notify_one() from within the wait condition at all?

Paranoia, mostly. I wanted to avoid deadlocks due to me not having poked the other thread.

I agree it doesn't need to be there, which is why I removed it. But I still don't see how its presence was causing this problem. I'd think all it would do is hurt performance.

'volatile' is irrelevant, it has nothing to do with threads

volatile means that writes are actually written to memory and reads are actually read from memory. It prevents memory accesses from being optimized away (by keeping the variable in a register or something).

While it's not everything necessary to make a variable threadsafe, it's at least partially related.

Atomics are also unnecessary: the mutex establishes the necessary synchronization of all non-atomic memory accesses.

Nope. Atomics are absolutely necessary. If I take them out the program deadlocks almost immediately.

Mutexes establish the necessary synchronization in the current thread, but not program wide.

Read that article I linked. That guy really goes into detail.
Also, I have a C++11 STL reference book which says the same.

webJose wrote:
Just a quick note. In Microsoft compilers, it appears that volatile also creates a synchronization context for the variable

I've read this as well, but it doesn't seem to be the case for VS2010 express. I've tried several times to remove that Atomic class, and each time it throws a wrench in the works.

webJose (2948)

Ok, no worries. I am not actually following this closely as I arrived when it was marked as fixed. I just wanted to let people know about this singularity with Microsoft.

Cheers.

Cubbi (4774)

Nope. Atomics are absolutely necessary. If I take them out the program deadlocks almost immediately.

That means it makes the error easier to reproduce, that is a good thing. If you can make a complete, compilable testcase, it could be tracked down. As posted, it appears to be unnecessarily complicated -- or perhaps underspecified.

Mutexes establish the necessary synchronization in the current thread, but not program wide.

They wouldn't be useful if they only synchronized "the current thread" (what would that even do?). The mutex lock and unlock operations synchronize two threads, which is all you have.

Disch (13742)

I'll come up with a minimal repro example when I get home from work tonight. But I'm 99% sure you are making common false assumptions about how mutexes work.

They wouldn't be useful if they only synchronized "the current thread" (what would that even do?)

I think we might be talking about two different things.

A mutex ensures that:
1) No more than one thread has the mutex locked at a time

2) Upon unlock, it is guaranteed that all writes have been completed. Any other thread which reads those variables will read back the newly written value (assuming, of course, the other threads are actually reading memory and not a cached/optimized copy in a register -- this is why "volatile" is important)

The key here is task #2. Without that guarantee, the processor might still be in mid-write when the mutex is released, which would result in other threads reading a partially complete value.

It is not possible for unlocking a mutex in one thread to guarantee that all writes in all threads have been completed. If thread A unlocks a mutex, that means thread A's writes have been completed. There is no way thread B's writes can have that same guarantee unless they are also unlocking a/the mutex.

Seriously... read that article. That guy explains it a lot better than I am.

http://cbloomrants.blogspot.ca/2009/01/01-25-09-low-level-threading-junk-part.html

Cubbi (4774)

Seriously... read that article

I did, it's the first (and the least focused) part of a multipart blog post on basics of (mostly Windows-specific) lockfree programming. Which is fine, but it's not particularly relevant.

Disch (13742)

So then you read the section where he talks about memory barriers and why they're important.

Since a mutex acts as a memory barrier, that makes it necessary to use for variables which are used to synchronize threads.

Cubbi (4774)

yes, since mutex introduces synchronization between two threads, it is sufficient to guarantee that the value stored in one thread is available in another.

volatile has nothing whatsoever to do with this (except in MSVC), and atomics/barriers, if used carefully, offer an alternative, lock-free, synchronization mechanism.
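
For reference, a sketch of the lock-free alternative mentioned here, assuming a compiler that provides C++11 std::atomic (which the original poster's compiler did not). RunReleaseAcquireDemo is a hypothetical name. The payload is a plain int; only the flag is atomic, and the release store paired with the acquire load is what orders the payload write before the payload read.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Publishes a plain int through an atomic flag and returns what the
// consuming thread observed.
int RunReleaseAcquireDemo()
{
    int payload = 0;                    // plain, non-atomic data
    std::atomic<bool> ready(false);     // flag used for synchronization

    std::thread producer([&] {
        payload = 42;                                   // 1) write the data
        ready.store(true, std::memory_order_release);   // 2) publish it
    });

    // Spin until the flag is observed. The acquire load pairs with the
    // release store above, so the write to payload happens-before the
    // read below.
    while (!ready.load(std::memory_order_acquire))
        std::this_thread::yield();

    int seen = payload;
    producer.join();
    return seen;
}
```

No mutex and no volatile are involved; the ordering comes entirely from the release/acquire pair on the flag.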

rollie (304)

I may have an idea for you Disch:


1 P:  About to wait for frame to start...                  <- PPU finished frame
2 P: Thread waiting...                                     <- PPU thread now waiting
3 C: Thread Waking...                                      <- CPU thread is poked (still waiting, this just means notify_one has been called and it should recheck its condition)
4 P:  checking:  on=1, ts=2147483647 OUT=0   &clk=023DF788 <- PPU thread condition is checked, 'OUT=0' so thread remains waiting
5 C: Thread Waking...                                      <- poke
P: Thread Waking...                                      <- poke
6 C: ...waiting complete                                   <- CPU thread now wakes up and resumes
7 <OFF> = 0                                                <- EmulationOn changed to false, to signal this frame is done
8 N:  End of frame...                                      <- Frame end
9 N:  Start of frame...                                    <- Next frame starts
10 P: Timestamp reset T=0  &clk=023DF788                    <- PPU resets timestamp  !!!!IMPORTANT notice timestamp=0
11 P: Thread Waking...                                      <- poke PPU thread
12 <ON> = 1                                                 <- EmulationOn is true, new frame is starting
(reg write)  2000=10                                     <- CPU needs to sync with PPU on reg writes
13 C: Thread waiting...                                     <- so CPU waits
14 P: Thread Waking...                                      <- PPU is poked
15 P: Thread Waking...                                      <-  ... twice
16 P:  checking:  on=1, ts=2147483647 OUT=0   &clk=023DF788 <- PPU's condition is checked !!!! NOTE ts != 0

In the non-working code, the PPU starts waiting, tries to wake up the CPU thread [3], moves on to checking condition before the PPU thread's "wait" call [4], and then calling wait, which begins by calling 'WakeThis()'. You complete the WakeThis call, and begin calling your predicate (still in PPU thread). You calc result, and enter the Log function, but before the log function does anything with the input, we switch to the CPU thread, which attempts to wake PPU thread (which may not be sleeping yet) [6], and exits the Wait function [7].

The rest of the code is all CPU thread (note that I believe [10] is actually the CPU thread altering a variable in the PPU thread, so may be more properly prefixed with 'C'). When the CPU thread finishes, it calls WakeThis on the PPU thread [14], fails its condition, and then calls wake on the PPU again [15].

Now I'm not sure what [15] is from, because based on the mutex before wait() in Wait(), the CPU thread should NOT be able to call wait at all until the PPU thread releases that mutex. Assuming this is true, is the CPU thread attempting to wake the PPU thread as part of its condition? If so, I would expect the CPU thread then locks at the mutex before wait(), which may cause problems when another thread attempts to wake it (not sure if this is an expected case for the threading system). Alternately, if the mutex for some reason isn't blocking the CPU thread from continuing, it would again be waiting an entire iteration ahead of the PPU thread, which maybe causes some problems?

Maybe I'm grasping, but this might at least explain the output of ts=2147483647 when it's clearly been set to 0 before that. I can't say why removing the WakeThis() from the wait predicate fixes the problem either...

Disch (13742)

You might have lost me, rollie. XD

You're right about [10]... that is being done from the CPU thread so that is inappropriately marked.

I'm not sure why [15] is there either. I must be doubling up on WakeThis calls somewhere. I'd have to look into it (though I lack interest since it doesn't appear to be a problem).

based on the mutex before wait() in Wait(), the CPU thread should NOT be able to call wait at all until the PPU thread releases that mutex.

My calls to wait are not mutex guarded. The mutex I put in a unique_lock and give to wait() afaik is useless. I only do it because the API seems to require it. I am not touching that mutex anywhere but in the Wait() function.

I believe its intention was to guard the variables (so I guess here, I could use that mutex to guard accesses to my timestamp - but I'm not using it for that purpose). From what I understand, condition_variable has the mutex unlocked while waiting, locks it before checking the predicate, then unlocks it again if it goes back to waiting.
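
That description matches how the predicate overload of wait is specified: it behaves like a loop that re-checks the predicate with the lock held and sleeps with the lock released. A sketch of that equivalence, assuming C++11 std::condition_variable (boost's version follows the same model); WaitUntil and RunWaitDemo are hypothetical names:

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hand-written equivalent of cv.wait(lk, pred).
// Precondition: lk is locked by the calling thread.
template <typename Predicate>
void WaitUntil(std::condition_variable& cv,
               std::unique_lock<std::mutex>& lk,
               Predicate pred)
{
    // pred() always runs with the lock held.
    while (!pred())
        cv.wait(lk);    // atomically unlocks lk and sleeps;
                        // relocks lk before returning
}

// One thread sets a flag under the mutex and notifies; the caller waits.
bool RunWaitDemo()
{
    std::mutex              m;
    std::condition_variable cv;
    bool                    go = false;

    std::thread waker([&] {
        {
            std::lock_guard<std::mutex> lk(m);  // change the condition under
            go = true;                          // the SAME mutex the waiter uses
        }
        cv.notify_one();
    });

    std::unique_lock<std::mutex> lk(m);
    WaitUntil(cv, lk, [&] { return go; });
    bool result = go;
    lk.unlock();

    waker.join();
    return result;
}
```

Because the waker changes go while holding m, the change cannot land in the gap between the waiter's predicate check and the moment it goes to sleep.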

@Cubbi:

I think we're both saying the same thing. I know volatile does not make a variable threadsafe. I only use it as extra assurance to prevent variable caching. Since the mutex acts as a memory barrier, I realize that I probably don't need to make the variable volatile, but whatever.

My original point was that you said I didn't need to use the Atomic class:

you wrote:
Atomics are also unnecessary: the mutex establishes the necessary synchronization of all non-atomic memory accesses.

The atomic class is the mutex. Without the Atomic class, there's no mutex, so no synchronization.

The only other mutexes are those used by the logger (irrelevant) and the ones given to wait() (also irrelevant, since they are only locked in one thread).

Cubbi (4774)

Without the Atomic class, there's no mutex, so no synchronization.

Huh? Atomicity and mutual exclusion are two different concepts, and synchronization is yet another, different, concept.

the ones given to wait() (also irrelevant, since they are only locked in one thread).

If that is true, that condition variable is quite likely useless (or causes undefined behavior), but it would help to see a complete testcase.
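
To make the concern concrete: if the waited-on state is changed without holding the mutex passed to wait(), a notify can arrive after the waiter has evaluated its predicate but before it has gone to sleep, and that wakeup is lost. Whether this is exactly what happened in the posted code cannot be confirmed without a full testcase. The sketch below shows the conventional arrangement under stated assumptions: C++11 <mutex> and <condition_variable>, and a hypothetical FrameGate struct standing in for the Clock and SyncSystem objects, with every state change made under the condition variable's own mutex.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

const int kTimestampNever = 0x7FFFFFFF;   // same value as timestamp_never

// Hypothetical stand-in for Clock + SyncSystem: the state and the
// condition variable share ONE mutex.
struct FrameGate
{
    std::mutex              m;
    std::condition_variable cv;
    bool                    emulationOn  = false;
    int                     ppuTimestamp = kTimestampNever;
    bool                    quit         = false;
};

// Runs frameCount frames and returns how many the "PPU" thread emulated.
int RunFrames(int frameCount)
{
    FrameGate g;
    int framesEmulated = 0;

    std::thread ppu([&] {
        for (;;)
        {
            std::unique_lock<std::mutex> lk(g.m);
            // Same condition as the posted predicate, checked under g.m.
            g.cv.wait(lk, [&] {
                return (g.emulationOn && g.ppuTimestamp != kTimestampNever) || g.quit;
            });
            if (g.quit)
                return;
            ++framesEmulated;                   // stand-in for emulating a frame
            g.ppuTimestamp = kTimestampNever;   // frame done, written under g.m
            lk.unlock();
            g.cv.notify_all();                  // wake the CPU side
        }
    });

    for (int i = 0; i < frameCount; ++i)
    {
        std::unique_lock<std::mutex> lk(g.m);
        g.ppuTimestamp = 0;                     // frame start, written under g.m
        g.emulationOn  = true;
        g.cv.notify_all();                      // wake the PPU side
        g.cv.wait(lk, [&] { return g.ppuTimestamp == kTimestampNever; });
        g.emulationOn = false;                  // frame end
    }

    {
        std::lock_guard<std::mutex> lk(g.m);
        g.quit = true;
    }
    g.cv.notify_all();
    ppu.join();
    return framesEmulated;
}
```

With this layout no separate Atomic wrapper is needed for the two variables, since the one mutex both guards them and closes the check-then-sleep gap.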

Disch (13742)

Huh? Atomicity and mutual exclusion are two different concepts, and synchronization is yet another, different, concept.

I feel like we're going in circles here.

All 3 of those different concepts do 1 thing in particular: they form a memory barrier. They guarantee that once a variable in one thread is written, other threads will see the newly written value.

Yes, they are all different, but they all have that one thing in common. That one thing is what I'm using my Atomic class for. Since using a memory barrier directly is OS dependent, I am avoiding it. And since atomicity is only supported in the language through the std::atomic class (not supported by my compiler), I can't use it.

That means my only option for variable synchronization is using a mutex. Which I am wrapping inside my Atomic class to ensure that all accesses to that variable are mutex guarded.

If that is true, that condition variable is quite likely useless

I don't understand why you'd think that. The condition variable allows a thread to sleep while it waits for a certain condition to be true.

The mutex provided to the condition variable is supposed to be used to guard accesses to the condition. However I don't need to use it that way because my condition is already guarded through another mutex via the Atomic class.

it would help to see a complete testcase.

I started to make a small repro case, but I gave up due to lack of interest. I think I outlined my setup pretty clearly above. If anything else about it is still unclear to you, I'm happy to clarify.

coder777 (8439)

They guarantee that once a variable in one thread is written, other threads will see the newly written value.

No. If two (or more) threads are looking at the same memory address, they see the changes immediately once a thread makes them. For this nothing else but doing the operation is required.

Atomic for a POD is really needless. Simple operations with PODs resolve to a single processor instruction. More atomic it can't get.
Atomic is only for complex variables to act like POD.

This is certainly a race condition. You just can't tell when a certain instruction is executed. Atomic doesn't help in these cases (it just makes sure that a value is completely read/written), and it doesn't adjust the order. (Well, somehow it does, since it slows down your program)
