The Proof Of Correctness Of The Treiber Stack And Its Aba Problem Mitigation Using Stam’S Schemes

2019-09-26 · Leonardo Benicio

Summary

A comprehensive technical exploration of the proof of correctness of the treiber stack and its aba problem mitigation using stam’s schemes, covering key concepts, practical implementations, and real-world applications.

Contents

The Ghost in the Machine: Why Your Lock-Free Stack Isn’t (Yet) Correct

Imagine a world without locks. No Mutex, no synchronized blocks, no sem_await. Just threads moving at the speed of light, cooperating on shared data with nothing more than a single, atomic instruction. This is the siren song of lock-free programming—a promise of performance without the plague of deadlocks, priority inversion, and convoying that plague traditional locking. It’s the dream of every systems programmer who has ever watched a 64-core machine waste 60 of them waiting for a single pthread_mutex_lock.

At the heart of this dream lies one of the most elegant, most studied, and most deceptively simple data structures: the Treiber Stack. Conceived by R. Kent Treiber in his seminal 1986 IBM research report, it is the “Hello, World!” of lock-free data structures. It’s a linked-list stack, and its fundamental operation is a thing of beauty. To push a new node, you simply read the current head, point your new node to it, and then use a single atomic instruction—Compare-And-Swap (CAS)—to swap the head to your new node, but only if the head hasn’t changed since you read it.

// Pseudo-code for Treiber Stack Push
void push(Node *new_node) {
    Node *old_head;
    do {
        old_head = top;  // Read the current top
        new_node->next = old_head;
    } while (!CAS(&top, old_head, new_node));
}

*It’s atomic. It’s non-blocking. It’s pure, concurrent genius. For a decade, it was the gold standard. A researcher could write a proof of its correctness—a formal argument showing that it was a “linearizable” stack, that its operations appear to happen atomically at a single point in time. The proof relied on a single, seemingly unshakeable assumption: the value you read with CAS is a unique identifier for a state of the world.*

But th… but that assumption is a castle built on sand. The world of concurrent memory is not static; it is a mercurial ocean where the same bit pattern can reappear after a node has been freed and reallocated. The elegant CAS‑based loop can silently succeed on a stale pointer, corrupting the entire structure. This is the ABA problem – the ghost in the machine that has haunted lock‑free programming for decades, transforming a beautiful solution into a ticking time bomb.

In this post, we will dissect the ABA problem from its atomic roots to its modern solutions. We’ll walk through concrete failure scenarios, explore why simple fixes like reference counting fail in the face of concurrency, and then dive into the sophisticated C++ and Rust mechanisms that finally exorcise the ghost. By the end, you’ll understand not just the problem, but the philosophy behind lock‑free correctness – and why even a “safe” lock‑free stack remains a challenging piece of engineering.

1. The Treiber Stack: A Closer Look

Before we can understand what breaks, we must first fall in love with what works – at least in isolation.

The Push Operation

The Treiber stack is a singly linked list where a shared global pointer top points to the head node. A push operation allocates a new node, sets its next pointer to the current head, and then atomically replaces top with the new node provided that the head hasn’t changed since it was read. The CAS instruction is a primitive supported by virtually all modern CPUs: it takes an address, an expected old value, and a desired new value. It atomically compares the memory at that address to the expected value; if they match, it updates the memory to the new value and returns true; otherwise it returns false and leaves the memory unchanged.

void push(Node *new_node) {
    Node *old_head;
    do {
        old_head = load(&top);           // acquire semantics implied
        new_node->next = old_head;
    } while (!CAS(&top, old_head, new_node));
}

Notice the loop. If another thread pushes a node between the load and the CAS, the expected value old_head will be stale, the CAS will fail, and the loop retries with a fresh read. This is a classic lock‑free technique: no lock is held, but progress is guaranteed because some thread will succeed eventually.

The push operation is non‑blocking: it does not rely on other threads to finish their operations. Even if a thread is suspended in the middle of a push, the CAS on another thread will either succeed or fail, and the system as a whole will make progress.

The Pop Operation

Pop is trickier. It reads the head, reads the next pointer of that node, and then attempts to CAS top from the old head to the next node. Again, if another thread pops between the read and the CAS, the CAS fails and we retry.

Node *pop() {
    Node *old_head, *new_head;
    do {
        old_head = load(&top);
        if (old_head == NULL) return NULL;
        new_head = load(&(old_head->next));
    } while (!CAS(&top, old_head, new_head));
    return old_head;
}

This looks symmetrical to push, but it hides a subtle danger: after we free the node that was popped, another thread might still have a pointer to it, leading to a use‑after‑free. For now, let’s assume that the popped node is never freed – maybe we return it to a management layer that never recycles memory. This assumption is the first crack in the armour.

The Proof of Linearizability

In a sequential execution, a stack behaves as a LIFO (Last‑In‑First‑Out) structure. In a concurrent execution, we want each operation to appear to take effect at a single point in time – the linearization point. For a Treiber push, the linearization point is the successful CAS: the push is considered to have happened exactly when the new node becomes the head. For a pop, the linearization point is also the successful CAS, which simultaneously removes the old head. Because these points are instantaneous (an atomic instruction), the stack operations are linearizable if the memory management is correct.

The proof works because the global pointer top is the sole reference to the stack. If we assume that each node address is unique forever (i.e., no memory reuse), then the CAS that matches the expected old_head can only succeed if the head was indeed that node at that instant. There is no way for a stale old_head to reappear after the node has been removed because that address will never be allocated again – or so the proof assumes.

But reality, especially in systems programming, does not permit such a luxury. We must free memory. And freeing memory opens the door to the ABA problem.

2. The ABA Problem: The Ghost Revealed

The ABA problem is a classic race condition that can plague any lock‑free algorithm using CAS on a pointer. It occurs when a location is read, then later modified, and then modified back to its original value between the read and the CAS. The CAS sees that the location still contains the original value, assumes nothing has changed, and performs an update – but the world has changed and the assumption is wrong.

Let’s illustrate with a concrete scenario for the Treiber stack.

A Fatal Sequence of Events

Imagine a singly linked list with three nodes: A (head), B, C. The top pointer points to A.

top -> A -> B -> C -> NULL

Thread 1 decides to pop the top node (A). It reads old_head = A and then reads A->next which is B. At this point, Thread 1 is preempted – perhaps by a context switch or a cache miss.

Thread 2 now runs and pops A. It successfully CASes top from A to B. Now the stack is:

top -> B -> C -> NULL

Thread 2 then pushes two new nodes, D and E. It allocates D and E, pushes them. But due to the memory allocator, the address of the newly allocated node D happens to be the same as the old address of node A (because A was freed and its memory reused). So now the stack is:

top -> D (address = &A) -> E -> B -> C -> NULL

Thread 2 then pops D (which has the same address as A). Now the stack is back to:

top -> E -> B -> C -> NULL

Note that top points to E, not A. But crucially, the address &A is now the address of a node that is no longer part of the stack. However, the memory at that address now contains some other data (maybe part of a different data structure or a newly allocated node from another thread). But in our scenario, no one else is using that exact address.

Now Thread 1 resumes. It tries to CAS top from old_head (which it believes is still A, i.e., the address &A) to new_head (which is B). The CAS sees that top currently holds the value &A? Wait, top currently holds &E, not &A. So the CAS would fail, and Thread 1 would retry. ABA is not yet triggered.

But let’s make the scenario more insidious. Suppose that after Thread 2 pops D, it pushes another node F that happens to be allocated at the address &A again. Then top points to F (address &A). Now the stack is:

top -> F (address &A) -> ...

After that, Thread 2 pops F again (so top now points to the next node, say G). So top no longer points to &A. Still Thread 1’s CAS would see top = &G, not &A, and fail. To get ABA, we need the value stored at top to change back exactly to the expected old_head value while the old node has been freed and its memory reused.

Consider this sequence:

Thread 1 reads top = A. (A is node 0x1000)
Thread 2 pops A, freeing it. top = B.
Thread 2 pushes a new node X, which coincidentally gets the same address 0x1000. top = X (address 0x1000).
Thread 2 pops X, freeing it again. top = B again? Or top = C? We need top to become exactly address 0x1000 again.

Thus, after step 4, top must point to the same address 0x1000 that was previously A, and later X. But X was popped, so that address is now free again. Thread 2 can push a third node Y, which again gets address 0x1000. Now top = Y (address 0x1000). The expected value old_head is &A which equals 0x1000. The current top is 0x1000. The CAS sees a match! But what does top point to? It points to Y, a completely different node than the original A. Y may have different next pointers, different data, etc. Thread 1’s new_head is B (the old next from A). But B may no longer be reachable in the stack – it might have been freed as well. So CAS replaces top with B, linking in a stale node that may already be freed or corrupted. The stack is now broken, leading to undefined behaviour, crashes, or silent data corruption.

This scenario requires that the same memory address be reused three times: first as A, then as X, then as Y. This is not as improbable as it sounds. Memory allocators, especially slab allocators, recycle recently freed memory aggressively. A busy stack can cause a node to be freed and reallocated many times in a tight loop, making the pattern common.

The ABA problem is not limited to stacks. Any lock‑free data structure that reads a pointer, then later performs a CAS on that pointer, is vulnerable – including queues, hash tables, and binary search trees.

Why a Simple Tag Won’t Always Save You

The classic response to ABA is to use a tag or version counter paired with the pointer. Instead of using a plain pointer, we use a double‑word CAS (DCAS) that atomically compares and swaps both the pointer and a tag. Even if the pointer repeats, the tag increments each time the memory location is updated, so the two‑word combination remains unique.

On 32‑bit systems, a double‑word CAS (known as CMPXCHG8B) was available. On 64‑bit systems, CMPXCHG16B can atomically update a 16‑byte structure. This allowed data structures to embed a 64‑bit pointer and a 64‑bit tag (or 64‑bit pointer + version). But these instructions are not universally available (e.g., on ARM, no double‑word CAS exists before ARMv8.1). Moreover, if the pointer itself is 64 bits, we need 128 bits of atomic state – which may be too large for some lock‑free data structures that store additional metadata inside the pointer (e.g., low bits for flags).

Tagged pointers work well in many cases, but they only shift the problem: instead of worrying about pointer reuse, we must worry about tag reuse. If the counter wraps around, ABA reappears. For a 16‑bit tag wrapping in a busy system, that could happen in minutes. A 64‑bit tag would be safe for cosmic timescales, but you might not have 64 spare bits. A common solution is to use a 48‑bit pointer (the upper bits of x86_64 are reserved for user space) and a 16‑bit tag. But that still leaves a chance, albeit astronomically small, of wrap‑around during a concurrent window. Formal proofs often ignore wrap‑around, assuming that the thread will be preempted for an unnaturally long time – but in safety‑critical systems, this is unacceptable.

The ABA problem is not just a theoretical curiosity. It has caused real‑world bugs in production systems, including the Linux kernel (see the infamous atomic_ops library bug) and database engines. Understanding it is essential for anyone building lock‑free data structures.

3. Memory Reclamation and Dangling Pointers

Beyond the ABA problem, another challenge lurks: how to safely reclaim memory that has been removed from the data structure. In a lock‑free stack, after a pop succeeds, the popped node is logically removed. But other threads might still be holding references to that node – for example, a thread that read the node’s next pointer before it was popped. If we immediately free the node, those other threads may access freed memory, causing use‑after‑free bugs.

This is the memory reclamation problem. It is intimately related to ABA because freed memory can be recycled, leading to ABA. But even without ABA, a dropped reference can crash the system.

Reference Counting and Its Failure

A naive solution is to use reference counting: each node has a reference count; we increment it whenever a thread “holds” a reference (e.g., after reading the pointer), and decrement when done. When the count reaches zero, the node can be freed. However, reference counting itself is not lock‑free – atomic increments and decrements are expensive but possible. More critically, it doesn’t solve ABA: the reference count might be non‑zero, but the pointer could still be recycled if the count is not properly synchronised. Also, to decrement the count, you need a pointer to the node – but if the node is already freed by another thread, you can’t decrement safely.

Moreover, reference counting introduces cycle problems (though a stack is acyclic) and the need for a safe way to access the node’s counter without using the node itself (chicken‑and‑egg). In practice, reference counting is rarely used for lock‑free memory reclamation; more advanced techniques are required.

Non‑Blocking Reclamation Schemes

Several families of memory reclamation have been developed for lock‑free data structures:

Hazard Pointers (Maged Michael, 2004): Each thread publishes a list of pointers that it is about to dereference. Before freeing a node, a thread checks that no thread has listed that node as a hazard. If so, the node is not freed – it is placed on a “retire list” and freed later when safe.
RCU (Read‑Copy‑Update) (Paul McKenney, 2002): Reads are cheap (no locks, no atomics); writers publish new versions and wait until all pre‑existing readers have finished before reclaiming old versions. RCU is ideal for read‑mostly workloads but requires a grace period detection mechanism.
Epoch‑Based Reclamation (EBR) (Fraser, 2004): Threads annotate their epochs. A writer can only free a node after all threads that were active in the epoch when the node was retired have moved to a new epoch. This is less heavyweight than hazard pointers but can delay reclamation.
Quiescent State‑Based Reclamation (similar to EBR) used in the Linux kernel for RCU.

Each scheme has trade‑offs: performance, memory overhead, progress guarantees (blocking vs. wait‑free). We will examine hazard pointers in detail later, as they are a direct solution to ABA and memory reclamation for the Treiber stack.

4. Real-World Consequences

To appreciate why you should care, consider two real‑world incidents.

The Linux Kernel `atomic_ops` Bug

In the early 2000s, the Linux kernel used a non‑blocking memory allocator SLAB that relied on a lock‑free LIFO list. A bug was discovered that manifested as random memory corruption under heavy memory pressure. The root cause was the ABA problem: when a slab node was freed and immediately reallocated for a different purpose, the pointer value matched, CAS succeeded incorrectly, and the allocator’s internal linked list turned into a cycle. The bug was fixed by switching to a version‑tagged pointer (using the cmpxchg8b instruction on 32‑bit x86). This incident prompted widespread adoption of tagged pointers in kernel data structures.

The MongoDB WiredTiger Storage Engine

MongoDB’s WiredTiger engine uses lock‑free skip lists for its transaction cache. In earlier versions, occasional unrecoverable node corruption was traced back to an ABA scenario involving node reuse after deletion. The fix involved using hazard pointers and careful memory ordering. The bug led to data inconsistencies and forced a redesign of the reclamation subsystem.

These examples underline that ABA is not a textbook curiosity – it’s a threat to reliability in any high‑performance concurrent system.

5. Solutions to ABA and Memory Reclamation

Now let’s explore the most practical solutions, starting with the most direct: tagged pointers.

5.1 Tagged Pointers

The idea is to store a monotonically increasing counter alongside a pointer, and atomically update both together. On x86_64, CMPXCHG16B operates on a 16‑byte double quadword. We can pack a 64‑bit pointer and a 64‑bit tag into a 16‑byte structure.

typedef struct __attribute__((aligned(16))) {
    void *ptr;
    uint64_t tag;
} TaggedPointer;

bool CAS(TaggedPointer *dest, TaggedPointer old, TaggedPointer new) {
    // Use inline assembly for CMPXCHG16B
}

However, not all platforms support 16‑byte CAS. ARMv8.1 introduced CASP (Compare and Swap Pair), but earlier ARMv8‑A required a load‑linked/store‑conditional loop. On 32‑bit systems, CMPXCHG8B works, giving a 32‑bit pointer and a 32‑bit tag.

The tag must be incremented on every modification of the pointer – even when the pointer value remains the same (e.g., push‑pop‑push with same address). That way, the hoist that Thread 1 read (ptr, tag) = (A, X) will later find (A, X+1) after a reuse, and the CAS fails. The probability of a full tag wrap‑around within the window of a loop is negligible if the tag is large enough.

But tagged pointers have drawbacks:

They consume extra bits, potentially reducing the available address space or requiring alignment tricks.
They are not portable (no 16‑byte CAS on many architectures).
They do not solve the memory reclamation problem – you still need to know when it is safe to free a node. A tag doesn’t protect against use‑after‑free if a thread dereferences a node while it is being freed.

Hence, tagged pointers are often combined with a deferred reclamation scheme.

5.2 Hazard Pointers

Maged Michael’s hazard pointers provide a safe memory reclamation and ABA prevention mechanism. The idea is simple: each thread maintains a small array of hazard pointers that it publishes before accessing a shared object. Before freeing an object, a thread checks all hazard pointers: if the object is listed, it is not freed; it is deferred to a retired list and freed later when no hazard exists.

For the Treiber stack, the pop operation would:

Read top into a hazard pointer (publish it).
Read top->next.
Attempt CAS. If successful, the node is no longer in the stack; we can try to reclaim it.
Before freeing, we check if any other thread has published a hazard pointer pointing to that node. If yes, we must defer freeing.

This prevents ABA because once a node is removed from the stack, no thread can successfully CAS back to it if they use hazard pointers: the node’s address will not be recycled until every thread has indicated it is safe. But ABA can be avoided even without hazard pointers in the CAS itself? Actually, hazard pointers alone do not prevent ABA; they only ensure safe reclamation. To prevent ABA, we must ensure that the pointer value in top does not change to an old freed node and then back. Using hazard pointers, we never free a node that any thread might still be pointing to, so the reuse of that exact address is delayed until all threads have moved past it. This effectively prevents the ABA scenario because the node’s memory cannot be reused while any thread could still have a reference. However, if a node is freed and then later reallocated for a different data structure (not the stack), the address could still reappear. But hazard pointers only protect a specific data structure; cross‑structure reuse can still cause ABA. That’s why tagged pointers are more robust.

Therefore, a common combination is: use tagged pointers to protect against ABA from any source, and use hazard pointers (or epoch‑based reclamation) for safe memory reclamation. The two together provide robust correctness.

// Hazard pointer example for pop (simplified)
Node *pop(Stack *s) {
    Node *head;
    do {
        head = (Node*)atomic_load(&s->top);
        hazard_pointer_publish(&hp[thread_id], head);
        // re-read to ensure consistency
        Node *check = (Node*)atomic_load(&s->top);
        if (check != head) {
            hazard_pointer_clear(&hp[thread_id]);
            continue; // retry
        }
        Node *next = (Node*)atomic_load(&head->next);
    } while (!CAS(&s->top, head, next));
    hazard_pointer_clear(&hp[thread_id]);
    // now reclaim head
    retire_node(head, reclaim_func);
    return head;
}

5.3 Epoch-Based Reclamation (EBR)

EBR works with global epochs that advance periodically. Each thread announces the current epoch it is in. When a node is removed from the data structure, it is marked with the current epoch. The node can be freed only when all threads that were active in that epoch have moved past it (i.e., entered a newer epoch). This avoids scanning hazard lists, which can be more efficient.

In the context of ABA, EBR also prevents reuse of addresses while threads might still hold references, so it indirectly reduces the chance of ABA. But it does not fully eliminate it because addresses can be reused across different epochs if the memory allocator returns the same block.

Thus, an additional version counter is still recommended.

5.4 RCU (Read-Copy-Update)

RCU provides a lightweight read side with no fences (on some platforms) and a slightly heavier write side that waits for a grace period. In a lock‑free stack using RCU, a pop would be a reader that merely traverses the list; but removal requires a grace period. RCU is excellent for read‑intensive workloads, but it requires that mutations be rare. A Treiber stack, where every pop removes a node, is write‑heavy, so RCU may not be ideal.

5.5 Using Double CAS without Native Support

If your platform lacks CMPXCHG16B or CASP, you can emulate double‑word CAS with a lock‑based fallback or use a separate tag array indexed by the pointer’s bits. For example, some implementations maintain a global version number for each memory page and scrub the page when addresses are reused. This is complex and slow.

A more practical approach for many projects is to use a lock‑free data structure that avoids the ABA problem entirely, such as Michael and Scott’s queue, which uses a two‑pointer technique to avoid CAS on head and tail simultaneously.

6. Beyond the Stack: ABA in Other Data Structures

The ABA problem isn’t limited to stacks. Consider a lock‑free FIFO queue using a linked list, with a sentinel node. The dequeuing operation often reads the head and attempts to CAS it to the next node. If a node is freed and its address reappears, ABA can corrupt the queue. The ticket‑based ABA solution with tagged pointers is common.

Lock‑free hash tables that use open addressing with CAS may also suffer ABA if a bucket’s key changes from value X to Y and back to X. The CAS that expects X might succeed incorrectly. To avoid this, hash tables often use version counters per bucket.

Lock‑free binary search trees (e.g., the Harris‑based BST) are notorious for ABA because they operate on multiple nodes simultaneously (e.g., marking a node as logically deleted). The ABA problem appears when a node is deleted, its memory reused, and the same address appears in a different part of the tree. This can break the mark‑and‑splice algorithm.

7. Formal Verification Challenges

Proving that a lock‑free data structure is correct is notoriously difficult. The classic proof of the Treiber stack assumes no memory reuse. Once we introduce real memory management, the proof becomes much more complex. Model checkers like Spin and TLA+ can explore state spaces, but they often abstract away memory allocation or assume unlimited memory. Real‑world verification requires tools like VCC, VerCors, or KIV that can reason about separation logic and dynamic memory.

One approach is to prove that the memory reclamation scheme prevents ABA: e.g., that hazard pointers ensure a thread never reads a node that has been freed. Formal proofs of such schemes exist (e.g., for hazard pointers in the context of the Michael‑Scott queue). However, they are not trivial and often require a PhD‑level understanding.

For most practitioners, the pragmatic path is to rely on well‑tested libraries (e.g., concurrentqueue from moodycamel) rather than inventing your own lock‑free data structures from scratch.

8. Modern Lock-Free Programming in C++ and Rust

C++ Standard Atomics

C++11 introduced a powerful memory model and atomic operations. With std::atomic and std::atomic_compare_exchange_weak, you can write portable lock‑free code – but the ABA problem remains your responsibility. C++ does not provide a standard tagged pointer, but you can implement one using std::atomic<uint128_t> on platforms that support it (available via __uint128_t in GCC/Clang). For portability, you can fall back to a spinlock when double‑width CAS isn’t supported.

C++20 added std::atomic_ref and improved support for atomic shared pointers, but the standard still lacks a hazard pointer library. However, there are proposals for std::hazard_pointer (likely in C++26). Until then, you can use third‑party implementations like the one in Folly or libcds.

Rust’s Memory Model

Rust’s ownership model prevents many memory safety bugs by construction. However, when writing lock‑free code using std::sync::atomic::AtomicPtr, you can still have dangling pointers and ABA because Rust does not statically prevent memory reuse in concurrent contexts. The unsafe block is required for any shared mutation, and the programmer must enforce correctness.

Rust’s crossbeam library provides epoch‑based reclamation (EBR) and hazard pointers. For example, crossbeam_epoch allows manipulating atomic pointers while guaranteeing that nodes are not freed while they are still reachable. The combination of Rust’s strong type system and crossbeam’s reclamation lets you write lock‑free data structures with high confidence.

use crossbeam_epoch::{self as epoch, Atomic, Owned};
use std::sync::atomic::Ordering;

struct TreiberStack<T> {
    head: Atomic<Node<T>>,
}

struct Node<T> {
    data: T,
    next: Atomic<Node<T>>,
}

impl<T> TreiberStack<T> {
    pub fn push(&self, t: T) {
        let guard = &epoch::pin();
        let new_node = Owned::new(Node {
            data: t,
            next: Atomic::null(),
        });
        loop {
            let head = self.head.load(Ordering::Relaxed, guard);
            new_node.next.store(head, Ordering::Relaxed);
            if self.head.compare_exchange_weak(
                head, new_node, Ordering::Release, Ordering::Relaxed, guard
            ).is_ok() {
                break;
            }
        }
    }

    pub fn pop(&self) -> Option<T> {
        let guard = &epoch::pin();
        loop {
            let head = self.head.load(Ordering::Acquire, guard);
            if head.is_null() {
                return None;
            }
            let next = unsafe { head.deref().next.load(Ordering::Relaxed, guard) };
            if self.head.compare_exchange_weak(
                head, next, Ordering::AcqRel, Ordering::Relaxed, guard
            ).is_ok() {
                let data = unsafe { (*head.as_raw()).data.take().unwrap() };
                unsafe { guard.defer_destroy(head); }
                return Some(data);
            }
        }
    }
}

Note that compare_exchange_weak uses the guard to publish a hazard pointer for the head read, preventing it from being reclaimed. This effectively solves ABA by ensuring that the address of the old head cannot be reused while this thread holds a reference.

9. The Philosophical Lesson: Why Lock‑Free is Hard

The ABA problem reveals a fundamental tension in lock‑free programming: we want to use the smallest possible atomic granularity (a single CAS) to ensure liveness, but that granularity is too coarse to see the full history of modifications. We are trying to reason about a system where the only observable state is the value at a memory address, but that value is a bit pattern with no history. The ghost of ABA reminds us that concurrency is not just about visibility, but about identity.

Every lock‑free data structure must grapple with the question: “How do I know that the value I just read is still the same object, not a reincarnation?” The answer is always to augment the value with more information (tags) or to protect the lifetime of the object (hazard pointers, RCU, EBR). There is no free lunch.

Moreover, the performance benefits of lock‑free are often overestimated. Under high contention, CAS‑based loops can degrade due to retries and memory contention. Some studies show that a well‑designed spinlock can outperform lock‑free stacks for short critical sections. Lock‑free is not a performance panacea; it is a correctness property (freedom from deadlock, availability in real‑time systems). The ABA problem is one of the many pitfalls that make lock‑free programming a discipline for experts.

10. Conclusion: Exorcising the Ghost

The Treiber stack is an elegant data structure that epitomises the beauty of lock‑free programming. But as we have seen, its correctness hinges on an assumption that real systems cannot provide: that memory addresses are eternal and unique. The ABA problem – the ghost in the machine – can silently corrupt the stack when memory is recycled.

We have explored the ghost’s nature, watched it strike in gory detail, and learned how to exorcise it with tagged pointers, hazard pointers, epoch‑based reclamation, and the protective armour of languages like Rust. Each solution introduces its own complexities: hardware requirements, overhead, and concurrency reasoning.

As you venture into the world of lock‑free data structures, remember that the ghost never truly disappears. It merely waits for a moment of inattention – a missed tag increment, a too‑early free, a lazy memory ordering – to strike again. The only defence is rigorous thinking, formal verification where possible, and a healthy respect for the subtlety of concurrent memory.

So, the next time you fire up your editor to write a lock‑free stack, pause. Consider the ghost. And then choose your tools wisely – because your lock‑free stack isn’t (yet) correct without them.