Confused on Lock-Free Stack (Linked List) implementation

In C++ Concurrency in Action, 2e, The author describes a lock-free thread safe linked list implementation. Right now, he is describing the pop() method and how to safely delete the nodes in a sort of "garbage-collector" like method to make sure that no other threads called pop on the same instance. Here is some of that code for pop:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
#include <atomic>
#include <memory>

template<typename T>
class lock_free_stack
{
private:
    std::atomic<unsigned> threads_in_pop;
    void try_reclaim(node* old_head);
public:
    std::shared_ptr<T> pop()
    {
        ++threads_in_pop;
        node* old_head=head.load();
        while(old_head &&
              !head.compare_exchange_weak(old_head,old_head->next));
        std::shared_ptr<T> res;
        if(old_head)
        {
            res.swap(old_head->data);
        }
        try_reclaim(old_head);
        return res;
    }
};


The important thing is that a counter is incremented atomically every time pop() is called. Then, the try_reclaim function will decrement said counter. The try_reclaim function simply "tries" to delete (free the memory of) old_head if there are no other threads executing pop(). If it can't do so, then it simply adds it to a list of nodes to be deleted at a later time when a single thread is the only thread that executed pop(). Here is try_reclaim's implementation:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
    void try_reclaim(node* old_head)
    {
        if(threads_in_pop==1) //#1
        {
            node* nodes_to_delete=to_be_deleted.exchange(nullptr);
            if(!--threads_in_pop) //#2
            {
                delete_nodes(nodes_to_delete);
            }
            else if(nodes_to_delete)
            {
                chain_pending_nodes(nodes_to_delete);//#3
            }
            delete old_head; //#4 THIS IS THE PART I AM CONFUSED ABOUT
        }
        else
        {
            chain_pending_node(old_head);
            --threads_in_pop;
        }
    }

The implementations of the other functions called here are irrelevant (they just add nodes to a chain of nodes to be deleted), so I have omitted them. The part I am confused about in the code is #4 (marked). Here, the author calls delete on the old_head that was passed in. However, why doesn't he check to see if threads_in_pop is still zero at this point before deleting old_head? He double checks in lines #2 and #1 to make sure another thread isn't currently in pop(), so why doesn't he check again before proceeding to delete old_head? Couldn't another thread have possibly called pop() right after #3, thus incrementing the counter, and by the time the first thread reaches #4, then threads_in_pop would no longer be zero?

That is to say, couldn't it be possible that threads_in_pop is, for example, 2, by the time the code reaches #4? How could he safely delete old_head in that case? Could someone explain please?
Last edited on
why doesn't he check again before proceeding to delete old_head? Couldn't another thread have possibly called pop() right after #3, thus incrementing the counter, and by the time the first thread reaches #4, then threads_in_pop would no longer be zero? [...] How could he safely delete old_head in that case?
The issue is not safety. The calling thread is the only one that knows about old_head, so the pointer can be released at any point without any checks. I think what the author was trying to do is prevent lock contention on free(). Perhaps they reasoned that if threads_in_pop gave the same value twice in a row then the application was out of a parallelism-heavy section, so releasing memory now would not slow anything down.

Personally, I would not have bothered with try_reclaim(). I'd just keep a second lock-free list to store unused nodes, and pop from it when a new element is pushed to the outer stack.
...prevent lock contention on free(). Perhaps they reasoned that if threads_in_pop gave the same value twice in a row then the application was out of a parallelism-heavy section.


I'm confused. What is "lock contention" on free? There are no locks here since the data structure operates atomically and is lock-free, so what exactly do you mean?

Also, you mention that the calling thread is the only thread that knows about old_head, so does that mean that old_thread could have been deleted before line 5 in the source code? What about before line 3 in the source code?

Here is what the author said:

The basic problem is that you want to free a node, but you can’t do so until you’re
sure there are no other threads that still hold pointers to it. If only one thread ever
calls pop() on a particular stack instance, you’re home free...

...On the other hand, if you need to handle multiple threads calling pop() on the
same stack instance, you need some way to track when it’s safe to delete a node. This
means you need to write a special-purpose garbage collector for nodes...

...If there are no threads calling pop() , it’s perfectly safe to delete all the nodes cur-
rently awaiting deletion. Therefore, if you add the nodes to a “to be deleted” list when
you’ve extracted the data, then you can delete them all when there are no threads call-
ing pop() . How do you know there aren’t any threads calling pop() ? Simple—count
them. If you increment a counter on entry and decrement that counter on exit, it’s
safe to delete the nodes from the “to be deleted” list when the counter is zero. It will
have to be an atomic counter so it can safely be accessed from multiple threads...

...If the count of threads_in_pop is 1 when you’re trying to reclaim the node,
you’re the only thread currently in pop() , which means it’s safe to delete the node
you just removed
, and it may also be safe to delete the pending nodes. If the count
is not 1, it’s not safe to delete any nodes, so you have to add the node to the pending
list...

...This works reasonably well in low-load situations, where there are suitable quies-
cent points at which no threads are in pop() . But this is potentially a transient situa-
tion, which is why you need to test that the threads_in_pop count decrements to zero
d before doing the reclaim and why this test occurs before you delete the just-
removed node.


That second to last paragraph basically translates try_reclaim into English sequentially. Notice that he says (when referring to line 3 in the source code) that if you are the ONLY thread in pop, then it is safe to "delete the node you just removed," which is what line 14 does. My question is, if a thread has not yet executed line 14, and it is preempted to another few threads, which both call pop, and THEN delete is called, is it still safe?
Last edited on
I'm confused. What is "lock contention" on free? There are no locks here since the data structure operates atomically and is lock-free, so what exactly do you mean?
Memory management is not lockfree. Allocating and releasing memory involves at least manipulating a complex process-wide data structure, and at most a context switch into kernel.

Also, you mention that the calling thread is the only thread that knows about old_head, so does that mean that old_thread could have been deleted before line 5 in the source code? What about before line 3 in the source code?
I don't know what you mean.

...On the other hand, if you need to handle multiple threads calling pop() on the
same stack instance, you need some way to track when it’s safe to delete a node. This
means you need to write a special-purpose garbage collector for nodes...
This paragraph is absurd. The entire point of the while loop with the compare exchange is to ensure that the calling thread is popping from the linked list atomically. The loop can only end in one of two circumstances: EITHER the list is empty, OR the current thread contains the only pointer into the removed node. If anything else happens then the CPU is not working properly.

A lockfree linked list is like baby's first lockfree data structure. I don't know how the author managed to confuse themselves to this degree.
EITHER the list is empty, OR the current thread contains the only pointer into the removed node. If anything else happens then the CPU is not working properly.


That is what I thought. So I'm guessing the author is incorrect in this paragraph?

I don't know what you mean.


After reading your post, I think I may understand, correct me if I am wrong here: After the compare_exchange while loop, which is a read-write-modify operation, IF the code breaks out of the loop, that can only mean one thing: that the ORIGINAL head has been modified, and you now hold the only existing pointer to old_head, am I correct?

If that is the case, at what point is it alright to delete old_head inside of pop() (what line number in my above source code)?
Last edited on
OR the current thread contains the only pointer into the removed node.

any number of threads may execute head.load() at the same time and hold copies of the same head pointer in their local "old_head" variables.
One of them then does the CAS and moves on to delete the pointer (imagine there is no threads_in_pop test)
Then the rest of them attempt CAS, but the pointer they hold was freed, so they segfault if lucky
Last edited on
The entire body of try_reclaim() can be safely replaced with delete old_head;.
Last edited on
@Cubbi
One of them then does the CAS and moves on to delete the pointer (imagine there is no threads_in_pop test)
Then the rest of them attempt CAS, but the pointer they hold was freed, so they segfault if lucky


I see. Because the second argument to compare_exchange dereferences the pointer even if the compare_exchange function will fail because it is being passed as an argument. That makes sense now that I look at it.

But in that case, could you please tell me what happens here in the try_reclaim function?
delete old_head;

Why doesn't the author write this instead?
if(!threads_in_pop) delete old_head;

Why is that test not necessary before deleting old_head?

Last edited on
any number of threads may execute head.load() at the same time and hold copies of the same head pointer in their local "old_head" variables.
Ah, you're right. It's missing the old_head=head.load(); inside the body of the while. Again, very weird code.
Why doesn't the author write this instead?
if(!threads_in_pop) delete old_head;

other threads may come into pop() and call head.load(), but they came in after your thread's threads_in_pop==1, therefore after your CAS, therefore the pointer they got out of head.load() is not a copy of your old_head.
Ah, you're right. It's missing the old_head=head.load(); inside the body of the while. Again, very weird code.


I believe that the compare_exchange of the standard library does that already. It doesn't need to be in the loop.

If the atomic variable is equal to OPERAND1, then it is set to OPERAND2 and the function returns TRUE. If not, then OPERAND1 is set to the correct value of the atomic variable, and the function returns FALSE (which means the loop will try again). So I don't think you need to explicitly set old_head to head in the body.
other threads may come into pop() and call head.load(), but they came in after your thread's threads_in_pop==1, therefore after your CAS, therefore the pointer they got out of head.load() is not a copy of your old_head.


That makes sense. Thank you! I was banging my head against the wall trying to solve this problem for hours. I appreciate your response.
Cubbi: I still don't understand. If compare_exchange() sets the first argument to the value of the atomic if the exchange didn't happen, how could more than one thread hold the same pointer when they leave the loop? That doesn't make sense to me.
@helios
I think the problem is not with the thread holding the same pointer when they leave the loop, it's with the compare_exchange() call itself.

Suppose multiple threads get to this point:
 
node* old_head=head.load();


That means they will all hold the same pointer to the same head instance. Now suppose one of these threads completes its compare_exchange loop and moves on to delete the node. Since the original head has changed, it thinks that it holds the only pointer to the resource, since all other threads that hold the same pointer will fail in the compare_exchange loop, have their old_head updated to the current head, and thus they will be safe. However:

 
head.compare_exchange_weak(old_head,old_head->next))


This is just a regular function call. That means, when a preempted thread that calls compare_exchange_weak again, it must first pass its old_head and old_head->next to std::atomic::compare_exchange_weak.

Before the function is even evaluated, we have a problem. The second parameter of the function call dereference old_head and accesses next within it in order to pass it as an argument. This could be bad if the original thread has deleted that memory.

So even before the function is called and executed, old_head has to be dereferenced in order to call compare_exchange and pass it the parameters thus it could dereference deleted memory. This could lead to a segfault if lucky

In order for the dereference "old_head->next" to not accidentally dereference deleted memory, the original thread must make sure it is the only thread that has executed pop(), which is the purpose of the atomic counter. Thus, when threads_in_pop ==1, that means that it is the only thread holding that pointer. Even if other threads come on after that, increasing threads_in_pop, none of them will hold the same pointer because the original head was updated in the compare_exchange and thus they will load() the updated head in pop(). So once the original thread gets to that point in try_reclaim, it can safely delete it. If not, it must chain it onto a list of nodes to be deleted for another thread to delete when it is the only one left after the pop call.

At least that's what I think is right after reading Cubbi's post. I could be wrong.
Last edited on
Before the function is even evaluated, we have a problem. The second parameter of the function call dereference old_head and accesses next within it in order to pass it as an argument. This could be bad if the original thread has deleted that memory.
Let's decompose the code a bit. The relevant code is really
1
2
3
4
5
6
7
8
9
10
11
auto old_head = this->head;
while (true){
    if (!old_head)
        break;
    if (this->head != old_head){
        old_head = this->head;
        continue;
    }
    this->head = old_head->next;
}
delete old_head;
In order for a use-after-free situation to arise, two threads executing this code in parallel would need to be scheduled thusly:
1
2
3
4
5
6
7
auto old_head = this->head;  //A
auto old_head = this->head;  //B
if (this->head != old_head){ //A (check is false)
if (this->head != old_head){ //B (check is false)
this->head = old_head->next; //A
delete old_head;             //A
this->head = old_head->next; //B 
In other words, the execution of one thread's compare-exchange would need to be interleaved with that of another thread's, which would make it non-atomic.

What am I missing?

EDIT: Never mind, I get it.
Last edited on
In order for a use-after free

No, it doesnt have to be used within the body of the function call. I'm saying the fact that "old_head->next" is passed as an argument to the function means that it is dereferenced before it even reaches the body of the function. old_head has to refer to valid memory before "old_head->next" can be evaluated (dereferenced). It is dereferenced BEFORE you pass it to the compare_exchange function (when you do old_head->next in the argument list of the call to compare_exchange) and thus this dereference could be bad if old_heads memory has been deleted.
Last edited on
Topic archived. No new replies allowed.