Back to docs

Concurrency primitives

Compare-and-swap loops

Compare-and-swap (CAS) loops are the bedrock of lock-free programming. Instead of acquiring a mutex, a thread reads a value, computes the next state, and atomically swaps it in only if nothing else has touched it. When the swap fails, the thread loops and tries again with the freshly observed value. Done right, CAS loops scale across cores; done wrong, they livelock, leak ABA bugs, or burn the CPU.

1. The basic shape

Every CAS loop has the same four moves: load the current value, derive a candidate next value, attempt the atomic exchange, and on failure refresh the expected value and retry. The hardware guarantees that exactly one writer wins per cache-line generation, so progress is global even if any single thread starves.

2. Reference implementation

Below is a saturating counter that increments until it reaches a target. We use compare_exchange_weakbecause spurious failures are cheap on ARM and let the compiler emit a tighter LL/SC sequence. The expected value is passed by reference and updated for us on failure, so we never re-load explicitly inside the hot path.

// Lock-free counter using a CAS loop
#include <atomic>

std::atomic<long> counter{0};

long increment_until(long target) {
  long current = counter.load(std::memory_order_relaxed);
  for (;;) {
    if (current >= target) return current;
    long next = current + 1;
    // compare_exchange_weak updates 'current' on failure
    if (counter.compare_exchange_weak(
          current, next,
          std::memory_order_acq_rel,
          std::memory_order_relaxed)) {
      return next;
    }
    // Spurious failure or contention: retry with refreshed 'current'
  }
}

3. Pitfalls and mitigations

  • ABA: a value can change from A to B and back to A between your load and your CAS. Tag pointers or use hazard pointers when the protected datum is a freelist node.
  • Livelock: under heavy contention a CAS loop can spin without making global progress. Mix in an exponential backoff or a pause hint.
  • Memory order: prefer acq_rel on success and relaxed on failure unless you are publishing data through the same cache line.