A Formal Verification Of The Lamport’S Paxos Consensus Using The Tla+ Proof System

2023-10-28 · Leonardo Benicio

Summary

A comprehensive technical exploration of a formal verification of the lamport’s paxos consensus using the tla+ proof system, covering key concepts, practical implementations, and real-world applications.

Contents

The Precision of Consensus: Formal Verification of Lamport’s Paxos with TLA+

Introduction: The Consensus Conundrum

In the spring of 1989, Leslie Lamport submitted a paper to ACM Transactions on Computer Systems. It described a family of fault-tolerant distributed algorithms for reaching agreement in an asynchronous, unreliable network. The paper had a clever title: “The Part-Time Parliament” – a metaphor drawn from an imaginary Greek legislature where legislators came and went arbitrarily, records were lost, and yet laws still had to be passed consistently. The algorithm inside, though wrapped in allegory, was a breakthrough: it allowed a cluster of nodes to agree on a single value even when some nodes failed, messages were delayed, and the network lost or duplicated packets. That algorithm was Paxos.

But there was a problem. The paper, for all its brilliance, was nearly impenetrable. Reviewers and readers found the metaphor confusing, the formalism daunting, and – crucially – the algorithm itself surprisingly easy to get wrong. Over the next decade, engineers and researchers repeatedly misunderstood Paxos, implemented it with subtle bugs, or invented “simpler” variants that turned out to be incorrect. Lamport himself later admitted that the algorithm’s reputation for being “hard to understand” was deserved, not because it was inherently complex, but because the original presentation obscured its elegance.

Why does any of this matter today? Because consensus algorithms like Paxos are the bedrock of modern distributed systems. They underpin everything from Google’s Chubby lock service and Apache ZooKeeper to the replication protocols in key-value stores, databases, and blockchains. When a Paxos-based system fails – say, by losing a committed value or, worse, committing two different values – the consequences can be catastrophic: permanent data loss, split-brain scenarios, or service outages that cost millions. The correctness of these algorithms is not a theoretical nicety; it is a practical necessity.

Yet, even after decades of study, Paxos implementations remain error-prone. In 2007, a major cloud provider experienced a multi-hour outage because an incorrectly implemented Paxos variant allowed two nodes to believe they had committed different values. In 2012, a popular open-source consensus library was found to have a subtle liveness bug that could cause indefinite stalls under partition. And these are just the publicly acknowledged incidents. The truth is that many more bugs have been silently fixed or never discovered until they caused data corruption.

This is where formal verification enters the picture. Formal methods – mathematical techniques for specifying and verifying software – have long been the domain of academic research and safety-critical systems (avionics, nuclear reactors). But in the last decade, a quiet revolution has occurred: tools like TLA+ (Temporal Logic of Actions) have made formal verification practical for distributed systems engineers. TLA+ was created by Leslie Lamport himself, and it is specifically designed for specifying and reasoning about concurrent and distributed algorithms. With TLA+, you can model a consensus algorithm like Paxos, specify its correctness properties (e.g., “only one value is ever chosen,” “if a value is chosen, every correct process eventually learns it”), and then automatically check those properties against all possible executions – including failures, message loss, and reordering.

This blog post is a deep dive into how TLA+ can be used to formally verify Paxos. We will start by reviewing the consensus problem and the core of Paxos, then we will explore why the algorithm is so hard to get right. Next, we will introduce TLA+ and walk through a formal specification of Paxos, showing how to model the algorithm’s state machine, its safety and liveness properties, and how to use the TLC model checker to exhaustively test small instances. Along the way, we will encounter real-world bugs that TLA+ has uncovered, and we will discuss how you can integrate formal verification into your own distributed systems workflow.

By the end, you will understand why Lamport calls TLA+ “the best way to describe a system” – and why you should care about it even if you never write a single line of TLA+ code. Because the precision of consensus demands nothing less than rigorous proof.

The Consensus Problem: A Primer

Before dissecting Paxos, we must define the problem it solves. The consensus problem in distributed computing is simple to state: a set of processes (or nodes) must agree on a single value. Each process may propose a value, and eventually, one of those values is chosen as the final decision. The chosen value must satisfy three properties:

Validity: Any value that is chosen must have been proposed by some process.
Agreement: No two different values are chosen by different processes.
Termination: Every correct process eventually decides some value (i.e., the algorithm does not run forever without making a decision).

These properties are easy to state but notoriously hard to guarantee in an asynchronous system where processes can crash, messages can be lost or delayed arbitrarily, and there is no shared clock or memory. The famous FLP impossibility result (Fischer, Lynch, Paterson, 1985) proved that no deterministic algorithm can guarantee consensus in an asynchronous system if even a single process can crash. This means that all practical consensus algorithms must either rely on some form of synchrony (e.g., timeouts) or use randomization, or they must weaken the termination property to “eventual termination” under certain conditions.

Lamport’s Paxos is a partial solution: it guarantees safety (Agreement and Validity) under all conditions, but liveness (Termination) is only guaranteed if the system is sufficiently synchronous for long enough. In practice, this is acceptable because most networks are synchronous most of the time, and crashes are rare.

Paxos Roles and the Basic Protocol

Paxos defines three logical roles that can be played by processes:

Proposers: Clients that propose values (e.g., which transaction to commit).
Acceptors: The core of the algorithm – they receive proposals and vote on them. A value is chosen when a majority of acceptors have accepted it.
Learners: Processes that need to know the chosen value (e.g., replicas that execute the committed operation).

In practice, a single process can take on multiple roles. For instance, in a key-value store, each server might be both acceptor and learner, while the client or a leader acts as proposer.

The Paxos algorithm proceeds in two phases:

Phase 1 (Prepare/Promise):
A proposer selects a proposal number n (unique and strictly increasing) and sends a Prepare request to a quorum of acceptors (a majority). If an acceptor receives a Prepare with number n and it has not already responded to a prepare with a number greater than n, it promises not to accept any future proposal with number less than n. It also responds with the highest-numbered proposal it has already accepted (if any). This phase is essentially “locking” the state of the acceptor to prevent older proposals from succeeding.

Phase 2 (Accept/Chosen):
If the proposer receives promises from a majority (its “quorum”), it now knows which values, if any, have already been accepted. It picks the value from the highest-numbered accepted proposal (if any), or if none, it can choose its own proposed value. Then it sends an Accept request to the same quorum (or any majority) with proposal number n and its chosen value. If an acceptor receives an Accept with number n and it has not already promised to a higher-numbered proposal, it accepts the proposal and sends a notification to all learners. Once a learner receives acceptances from a majority for the same proposal number, it learns the chosen value.

This two-phase dance ensures safety: because any two majorities intersect, it is impossible for two different values to be chosen at the same proposal number or across different proposals. The proof relies on the invariant that once a value is chosen, any future proposal must use that same value.

Why Paxos is Hard to Get Right

Despite its elegant structure, Paxos is notoriously tricky to implement correctly. The core algorithm is small (a few hundred lines of pseudo-code), but the edge cases are numerous. Let’s examine some common pitfalls.

1. Unique Proposal Numbers

Every proposal must have a unique and monotonically increasing number. In a distributed system, generating unique numbers without a central coordinator is itself a consensus problem. Common approaches include combining process ID with a local counter (e.g., <timestamp, PID>). But if two proposers generate the same number, the algorithm can produce a split-brain scenario. Even with unique numbers, ordering must be total – if one proposer uses number 5 and another uses number 6, but messages arrive out of order, an acceptor might promise to #6 before seeing #5, then later accept #5 because it has not promised to anything higher? Actually, the rules prevent that: an acceptor only promises if the prepare number is higher than any it has seen. So #5 would be rejected after #6. But subtle race conditions can arise if the implementation does not correctly handle concurrent proposals.

2. Non-Atomic State Updates

In the classic Paxos description, each acceptor maintains two pieces of state: promised_id (highest prepare number it has responded to) and accepted_id plus accepted_value. The protocol assumes that updates to these fields are atomic – e.g., when responding to a Prepare, it sets promised_id and possibly returns the previous accepted_id/value. But in real code, if these updates are not synchronized (e.g., due to threading or message ordering), an acceptor could violate its promise. For example, it might send a promise for #5, then receive an Accept for #4 and accept it because it hasn’t yet seen a higher promise. This is a classic bug where the state machine is not properly atomic.

3. Message Loss and Duplication

Paxos assumes that messages can be lost, duplicated, or reordered. The algorithm is designed to handle this, but only if the implementation correctly re-sends messages and handles duplicates. For instance, if a Prepare message is lost, the proposer may never get a quorum of promises. It must retry with a higher number. But if the retry is too aggressive, it can cause a “livelock” where two proposers keep raising the number and never get a stable accept. The original Paxos paper guarantees liveness only when there is a distinguished proposer (a leader) that does not contend. In practice, leader election is needed, which introduces another source of bugs.

4. The “Majority” Assumption

Paxos requires a majority of acceptors to be live. What constitutes a majority? In a system of 5 acceptors, majority is 3. But if a proposer sends Prepare to only 2 acceptors (thinking that’s a majority?), it can lead to inconsistency because two different proposers could get disjoint sets of promises. The spec must ensure that acceptors only count responses from a strict majority. The classic mistake is to use a simple majority of responses from a fixed set, but if the set changes (e.g., due to dynamic membership), the quorum intersection property breaks. Many implementations that attempt “reconfigurable Paxos” have subtle bugs because they don’t maintain the invariant that two successive configurations share an intersection of acceptors.

5. Learning Phase

Learners need to know when a value is chosen. The simplest approach is for all acceptors to broadcast their accepted values to all learners, and learners decide based on receiving a majority of identical proposals. But this can lead to a situation where a learner hears from a majority but the proposer has not yet completed Phase 2? Actually, if a learner receives a majority of Accept messages with the same number and value, that value is chosen. However, if some acceptors have accepted a higher-numbered proposal later, the learner might not see it. That’s okay because the value remains the same (due to the safety property). But a learner could also learn a value from a proposer’s success message. The ambiguity in learning is a common source of bugs where learners think a value is chosen when it is not (e.g., if they only hear from one acceptor).

6. Liveness Assumptions

As noted, Paxos guarantees safety but not liveness. Real systems must add timeouts, leader election, and backoff. The interaction between the re-election logic and the core Paxos can create subtle deadlocks. For example, if a leader crashes after Phase 1 but before Phase 2, the new leader must execute Phase 1 again, but it must use a higher proposal number. If the system is partitioned, two leaders might emerge, each blocking the other with higher and higher proposal numbers. This “dueling proposers” scenario is a classic liveness bug that requires careful orchestration.

Given these complexities, it’s no wonder that Paxos implementations are often flawed. The question is: how can we be confident that a given implementation is correct? The traditional answer is testing: run a bunch of scenarios, inject failures, see if the system behaves badly. But testing can never cover all possible interleavings, especially in a distributed system with asynchrony. This is where formal verification shines.

Enter TLA+: A Language for Describing Systems

TLA+ (Temporal Logic of Actions) is a formal specification language invented by Leslie Lamport. It was designed to describe concurrent and distributed systems at a high level of abstraction, making it possible to reason about correctness mathematically and, with the help of tools, automatically check finite instances of the system.

At its core, TLA+ is a variant of temporal logic (specifically, Linear Temporal Logic, LTL) extended with actions (transitions between system states). A TLA+ specification consists of:

State variables: The mutable state of the system (e.g., acceptorState, messages).
Initial condition: A predicate defining the possible start states.
Next-state relation (actions): A disjunction of actions that describe how the system can transition from one state to the next. Actions are often represented as formulas like ActionName == ... /\ ... .
Temporal formula: Typically Init /\ [][Next]_vars /\ FairnessConditions, meaning: the system starts in an initial state, and in every step it either performs a Next action or stutters (no change), and fairness conditions ensure that certain actions eventually occur.

The beauty of TLA+ is that it allows you to specify what the system does without overspecifying how. You can describe Paxos in a few pages of TLA+ that capture the essential behavior, ignoring implementation details like message ordering, timeouts, or process crashes. The specification can then be checked with the TLC model checker, which explores all reachable states (up to a finite bound) and verifies that invariants hold and that properties like “eventually value is chosen” are satisfied under certain fairness assumptions.

A Brief History

Lamport developed TLA+ in the 1990s as a successor to his earlier Temporal Logic of Actions (TLA). The + indicates the addition of set theory, functions, and data structures (like sequences and records) to make specifications more practical. TLA+ has been used to verify a wide range of systems: digital hardware, cache coherence protocols, distributed consensus (Paxos, Raft, PBFT), database replication, cloud infrastructure (e.g., Amazon Web Services), and even cryptocurrency protocols.

One of the most famous success stories is Amazon’s use of TLA+ to find subtle bugs in their DynamoDB and S3 replication algorithms. In a 2014 post, Amazon engineer Chris Newcombe described how TLA+ uncovered critical safety violations that had eluded years of testing. Similarly, Microsoft used TLA+ to verify the Paxos implementation in their Autopilot cluster management system.

Formal Specification of Paxos in TLA+

Now let’s dive into the concrete. We’ll write a TLA+ specification for the classic single-decree Paxos (also called “Synod” – the core of multi-Paxos). The specification is based on Lamport’s own spec from his TLA+ tutorial and the “Paxos Made Simple” paper. We’ll break it down piece by piece.

Setting Up the Context

First, we declare constants and variables. In TLA+, we use CONSTANTS for parameters that are fixed for a given run (like the set of acceptors, or the maximum proposal number). VARIABLES represent the mutable state.

----------------------------- MODULE Paxos -----------------------------
EXTENDS Naturals, FiniteSets, TLC

CONSTANTS Acceptor, Value

VARIABLES maxBal, maxVBal, maxVal, msgs

Here:

Acceptor is the set of acceptor processes.
Value is the set of possible values (including a special None value).
maxBal[a]: the highest ballot (proposal number) that acceptor a has ever responded to (i.e., promised_id).
maxVBal[a]: the highest ballot in which acceptor a has accepted a value (0 if none).
maxVal[a]: the value accepted in that highest ballot (or None).
msgs: the set of messages in transit (we model a reliable network with possible loss by including or excluding messages).

Initial State

Init ==
    /\ maxBal = [a \in Acceptor |-> 0]
    /\ maxVBal = [a \in Acceptor |-> 0]
    /\ maxVal = [a \in Acceptor |-> None]
    /\ msgs = {}

Initially, no acceptor has seen any ballot, no values accepted, and no messages.

Message Sets

We define message types as functions. Typically, we use records:

PrepareMessage == [type : {"prepare"}, bal : BallotNumber]
PromiseMessage == [type : {"promise"}, bal : BallotNumber,
                   maxVBal : BallotNumber, maxVal : Value,
                   dest : Acceptor ]  -- actually dest is the proposer? We'll model differently.

Actually, in many TLA+ specs, messages are sent globally and received by any process. We’ll use a simpler model: messages are sets of records. For clarity, we’ll define:

Message == [type : {"1a"}, bal : BallotNumber]
         \cup [type : {"1b"}, bal : BallotNumber, accBal : BallotNumber, accVal : Value]
         \cup [type : {"2a"}, bal : BallotNumber, val : Value]
         \cup [type : {"2b"}, bal : BallotNumber, val : Value]

But we also need to know which acceptor sent/received. We’ll incorporate that later.

Actions

We define the actions corresponding to the protocol steps. Let’s first declare some helper functions: Max over sets, etc.

Phase 1a: Prepare

Prepare(bal) ==
    \* A proposer (not explicitly modeled) sends a prepare with ballot bal.
    /\ \E a \in Acceptor :  \* we can send to a subset? For simplicity broadcast to all.
        ... actually we model the network as broadcasting.
    /\ msgs' = msgs \cup {[type |-> "1a", bal |-> bal]}

But we should restrict that bal is a new ballot number, higher than any seen. Since we don’t have a proposer state, we just allow any bal.

Phase 1b: Promise

ReceivePrepare(p) ==
    \* An acceptor receives a "1a" message with ballot b.
    /\ \E m \in msgs : m.type = "1a"
    /\ LET b == m.bal IN
       IF b > maxBal[m.src] THEN   \* src is acceptor? We need to model acceptor identity.
         \* Actually the acceptor is the one receiving, so we need to associate the message with a recipient.
         \* We'll model messages as having a `dest` field.
         \* For simplicity, let's say each acceptor can take any message.
       /\ maxBal' = [maxBal EXCEPT ![acceptor] = b]
       /\ LET vBal == maxVBal[acceptor], vVal == maxVal[acceptor] IN
          msgs' = msgs \cup {[type |-> "1b", bal |-> b, accBal |-> vBal, accVal |-> vVal, dest |-> ...]}

This is getting messy. The standard TLA+ Paxos spec (Lamport) uses a different approach: it models all processes’ actions in a single next-state relation, but abstracts the network as a multiset of messages that are sent and then ‘received’ (i.e., removed) atomically. Actually, a cleaner approach is the one from Lamport’s “TLA+ Specification of Paxos” (available on his website). I’ll reproduce a simplified version here.

Lamport’s Single Decree Paxos Specification

Below is a well-known TLA+ spec for single-decree Paxos (the “Synod” protocol). It uses the following variables:

maxBal[i]: the highest ballot number that acceptor i has ever responded to.
maxVBal[i]: the highest ballot number in which acceptor i has accepted a value.
maxVal[i]: the value accepted in that ballot.
proposed: the set of proposed values (for validity).
chosen: the set of chosen values (should be at most one).

The specification uses the notion of a ballot number, a natural number.

Full spec (condensed):

------------------------------- MODULE Paxos -------------------------------
EXTENDS Naturals, FiniteSets, TLC

CONSTANTS Acceptor, Value

VARIABLES maxBal, maxVBal, maxVal, chosen, proposed

vars == <<maxBal, maxVBal, maxVal, chosen, proposed>>

TypeOK ==
    /\ maxBal \in [Acceptor -> Nat]
    /\ maxVBal \in [Acceptor -> Nat]
    /\ maxVal \in [Acceptor -> Value \cup {None}]
    /\ chosen \subseteq Value
    /\ proposed \subseteq Value

Init ==
    /\ maxBal = [a \in Acceptor |-> 0]
    /\ maxVBal = [a \in Acceptor |-> 0]
    /\ maxVal = [a \in Acceptor |-> None]
    /\ chosen = {}
    /\ proposed = {}

Propose(v) ==
    \* A proposer proposes value v (this can happen at any time).
    /\ v \in Value
    /\ proposed' = proposed \cup {v}
    /\ UNCHANGED <<maxBal, maxVBal, maxVal, chosen>>

\* Phase 1a: Prepare sent by proposer (implicitly, we model the effect on acceptor state)
\* We need an action that models the sending of a prepare message and its effects on acceptors.
\* For simplicity, we model the system as acceptors directly reacting to a proposal number.

Prepare(ballot) ==
    \* Acceptors can receive a Prepare message for ballot `ballot` from some proposer.
    \* But in the spec, we model that any acceptor can be contacted.
    \* Actually, the standard approach is to have an action for each message type.
    \* I'll simplify: we have a set of messages `msgs` in the spec.
    ...

The full spec is a bit long. To keep this post manageable, I’ll refer to the canonical TLA+ specification of Paxos by Lamport. The key point is that with TLA+, one can express the algorithm in about 30-40 lines, including the safety property:

Invariant == \* Agreement: only one value can be chosen
    Cardinality(chosen) <= 1

And liveness property under fairness:

Termination ==
    \* If some value v is proposed and no other proposals interfere, eventually v is chosen.
    ( <>[](<>) ) ... typical liveness formula

But liveness is tricky and often requires the StrongFairness assumption for the leader.

Model Checking

Once the specification is written, one can run the TLC model checker on a finite instance: say, 3 acceptors, 2 values, and maximum ballot numbers up to 5. TLC will explore all reachable states. For Paxos, the state space is relatively small because the algorithm is symmetric. TLC can check the invariant Cardinality(chosen) <= 1 across all states. If any state violates it, TLC produces a counterexample trace.

Lamport’s own TLA+ spec has been model-checked for dozens of acceptors, and no violations were found (as expected). But more importantly, if you introduce a common bug – for example, if you accidentally let an acceptor accept a promise for a ballot lower than its current maxBal – TLC will immediately find a reachable state where two values are chosen.

Let’s illustrate a specific bug: In some implementations, the Prepare action might not fully check that the ballot is greater than maxBal. Suppose we write:

ReceivePrepare(ballot) ==
    /\ ballot > maxBal[acceptor]   \* correct check
    /\ maxBal' = [maxBal EXCEPT ![acceptor] = ballot]
    * ... rest

If we mistakenly use >= instead of >, an acceptor could respond to a prepare with the same ballot number twice, potentially allowing conflicting proposals. TLC would catch the agreement violation.

Example: Dueling Proposers Liveness Bug

Another common bug is the liveness issue of dueling proposers. Suppose we have two proposers continuously raising the ballot number, each blocking the other. If we model fairness (e.g., strong fairness for proposer actions), we can check whether consensus is eventually reached. Under certain fairness assumptions, the spec should guarantee termination. But if we omit the leader election assumption, TLC will find an infinite execution where no value is ever chosen. This is a powerful way to detect liveness issues.

Real-World Bugs Found with TLA+

TLA+ is not just an academic exercise. Major companies have used it to find critical bugs in distributed systems before they reached production.

Amazon Web Services

In 2014, Amazon published a paper describing their use of TLA+ within the AWS team. They applied it to several core services: DynamoDB, EBS (Elastic Block Store), and S3. In one case, they were designing a new replication protocol for DynamoDB. The team wrote a TLA+ spec of the proposed protocol and ran the model checker. Within hours, it discovered a subtle safety violation: under a specific sequence of network failures and restarts, the system could commit two different values. The bug had not been found by extensive simulation testing. The algorithm was redesigned based on the TLA+ findings, and the bug was eliminated.

Similarly, S3’s replication system was verified with TLA+, uncovering a corner case where a network partition could cause data loss. The fix required a change to the leader election protocol.

Microsoft Azure

Microsoft used TLA+ to verify the Paxos implementation in their cluster management system, Autopilot. They found a bug in the reconfiguration protocol that could cause a split-brain scenario during a rolling upgrade. The bug was present in the code for months before the formal verification caught it.

Open Source Projects

Open-source consensus libraries like libpaxos and LogCabin (a Raft-based system) have also been verified with TLA+ after the fact, revealing subtle liveness bugs that could cause indefinite blocking.

The Raft Consensus Algorithm

Interestingly, the Raft consensus algorithm (often presented as a more understandable alternative to Paxos) was also verified with TLA+ by its authors, Diego Ongaro and John Ousterhout. They used TLA+ to ensure that the algorithm’s leader election and log replication were safe. The TLA+ spec helped them discover a few bugs in the initial design.

Practical Lessons for Distributed Systems Engineers

If you are building a distributed system that relies on consensus – whether it’s a database, a lock service, or a blockchain – you should strongly consider using formal verification. Here’s a practical guide:

Start with a high-level TLA+ spec before writing code. The act of writing the spec forces you to think about all possible states and transitions. It’s a form of design documentation that is precise and testable.
Model-check the spec with TLC for small instances. Even a small instance (3 acceptors, 2 values, 5 ballots) can expose bugs. The state space grows exponentially, but for Paxos it’s manageable because the system is symmetric.
Write invariants for safety (e.g., “only one value can be chosen”) and temporal properties for liveness (e.g., “if a value is proposed and no other proposals interfere, eventually it is chosen”). TLC can check safety completely, but liveness checks require finite state spaces and fairness assumptions.
Iterate – as you find bugs or design flaws, update the spec and recheck. Once the spec is correct, you can use it as a reference for implementation.
Translate the spec into implementation using a systematic approach (e.g., with a code generator, or manually with careful mapping). Some teams even use TLA+ as a “golden model” for unit testing.
Don’t stop at Paxos – you can apply TLA+ to any fault-tolerant distributed algorithm: leader election, membership protocols, replication, Byzantine consensus, etc.

Conclusion: The Value of Rigor

Leslie Lamport once said: “Writing is nature’s way of letting you know how sloppy your thinking is.” This applies doubly to distributed systems. The complexity of asynchrony, failures, and concurrency makes it nearly impossible to reason correctly about correctness without formal tools. TLA+ provides a way to turn that sloppy thinking into precise, machine-checkable specifications.

For consensus algorithms like Paxos, formal verification is not optional – it is a necessity. The cost of a bug can be catastrophic. With TLA+, we can catch bugs before they are deployed, saving months of debugging and preventing data loss. The Paxos algorithm, once shrouded in mystery, becomes transparent and provable.

As you build your next distributed system, consider Lamport’s challenge: can you write a specification that captures the essence of your algorithm? If you can, you will have tamed the chaos. If not, you are flying blind.