Scaling Eventual Consistency With Crdts: A Practical Implementation Of State Based And Operation Based Types

2019-01-28 · Leonardo Benicio
Summary

A comprehensive technical exploration of scaling eventual consistency with crdts: a practical implementation of state based and operation based types, covering key concepts, practical implementations, and real-world applications.

Contents

CRDTs: The Mathematical Magic Behind Conflict-Free Distributed Systems

1. Introduction: When Consistency Becomes a Scaling Bottleneck

Imagine you’re building the next global collaborative document editor—think Google Docs, but for millions of users editing the same file in real time, across continents, with no central server that can afford to pause for a globally ordered transaction. Or perhaps you’re designing a distributed key-value store that must survive network partitions, yet still allow every node to accept writes immediately. How do you ensure that every replica eventually agrees on the final state, without forcing users to wait for a consensus protocol that might take hundreds of milliseconds—or worse, become unavailable when a datacenter goes dark?

This is the core tension that has driven a decade of innovation in distributed systems: the trade-off between consistency and availability, famously captured by the CAP theorem. For applications that prioritize availability and partition tolerance, the classical answer has been eventual consistency—a promise that, given enough time without new updates, all replicas will converge to the same value. But “eventually” is a notoriously vague word. Without careful design, eventual consistency can lead to lost updates, conflicting writes, and data that never actually converges.

The Classic Counterexample

Consider a distributed file system where two users simultaneously rename the same file. User A renames thesis.pdf to final_thesis.pdf, while User B renames the same file to thesis_draft.pdf. With a last-writer-wins (LWW) strategy based on timestamps, one user’s change silently overwrites the other’s. The losing user might never know their rename was discarded. This violates the semantic expectation that both renames should be preserved or at least merged intelligently.

CRDTs solve this problem by mathematically guaranteeing convergence under any sequence of concurrent updates, as long as the underlying network eventually delivers and processes all updates (usually via a reliable gossip protocol). They achieve this without requiring any central coordination, locking, or expensive consensus rounds. Instead, CRDTs encode conflict resolution directly into the data type’s operations or state, so that convergence emerges from algebraic properties—specifically, the properties of commutative monoids and monotonic semilattices.

Why CRDTs Matter Today

CRDTs have moved from academic curiosity to production-critical infrastructure. They power:

  • Collaborative editing in tools like Google Docs, Notion, and Figma
  • Distributed databases like Redis (with CRDT-based conflict resolution in Redis Enterprise)
  • Peer-to-peer systems like Matrix (decentralized messaging)
  • Edge computing where mobile devices must synchronize data with minimal latency
  • Multiplayer games with real-time state synchronization

Understanding CRDTs is no longer optional for systems engineers—it’s becoming a core competency. In this deep dive, we’ll explore both flavors of CRDTs, dissect their mathematical foundations with practical code examples, examine real-world trade-offs, and discuss where they fall short.


2. The Two Flavors of CRDTs

CRDTs come in two primary varieties, each with distinct trade-offs. The choice between them often depends on your network characteristics, bandwidth constraints, and failure model.

2.1 State-Based CRDTs (CvRDTs)

Convergent Replicated Data Types (CvRDTs) work by merging entire replica states. Each replica maintains its local state, which includes all the information needed to resolve conflicts. When replicas communicate, they send their entire state (or a compressed delta) to other replicas. The receiving replica merges the incoming state with its own using a monotonic merge function that must satisfy three properties:

  • Commutativity: merge(A, B) = merge(B, A)
  • Associativity: merge(merge(A, B), C) = merge(A, merge(B, C))
  • Idempotency: merge(A, A) = A

These properties ensure that regardless of the order in which merges occur, all replicas eventually converge to the same state.

Example: The Grow-Only Counter (G-Counter)

A G-Counter is the simplest state-based CRDT. It’s a counter that only supports increment operations, based on a vector clock mechanism. Here’s a Rust implementation:

use std::collections::HashMap;

struct GCounter {
    // Each node's local count
    counts: HashMap<String, u64>,
    node_id: String,
}

impl GCounter {
    fn new(node_id: &str) -> Self {
        let mut counts = HashMap::new();
        counts.insert(node_id.to_string(), 0);
        GCounter {
            counts,
            node_id: node_id.to_string(),
        }
    }

    fn increment(&mut self) {
        *self.counts.get_mut(&self.node_id).unwrap() += 1;
    }

    fn value(&self) -> u64 {
        self.counts.values().sum()
    }

    fn merge(&mut self, other: &GCounter) {
        for (node, count) in &other.counts {
            let entry = self.counts.entry(node.clone()).or_insert(0);
            *entry = (*entry).max(*count);
        }
    }
}

The merge function takes the element-wise maximum of each node’s count. Because max is commutative, associative, and idempotent, convergence is guaranteed. If Node A has count {A: 5, B: 3} and Node B has {A: 4, B: 7}, merging yields {A: 5, B: 7} for a total of 12—the correct count of all increments across both nodes.

The Challenge: State Bloat

State-based CRDTs must transmit the full state on each synchronization, which can become prohibitive for large datasets. Imagine a collaborative document with 10,000 edits—every sync would require sending the entire edit history. This is where delta-state CRDTs come in, transmitting only the changes since the last sync, but we’ll return to optimizations later.

2.2 Operation-Based CRDTs (CmRDTs)

Commutative Replicated Data Types (CmRDTs) take a different approach. Instead of merging states, they broadcast operations (updates) to all replicas, relying on the causal delivery of messages—meaning that if operation A causally precedes operation B, every replica receives A before B. Under causal delivery, CmRDTs guarantee convergence if all operations commute (i.e., the order of applying them doesn’t matter).

Example: The Increment-Only Counter (CmRDT Version)

An operation-based counter sends increment messages. Since increments commute (adding 1 then 2 is the same as 2 then 1), convergence is guaranteed as long as no operations are lost. However, the devil is in the details: lost messages can cause permanent divergence.

use std::collections::VecDeque;

struct OpCounter {
    value: i64,
    pending_ops: VecDeque<Op>,
}

enum Op {
    Increment(i64),
}

impl OpCounter {
    fn increment(&mut self, amount: i64, send_fn: impl Fn(Op)) {
        self.value += amount;
        send_fn(Op::Increment(amount));
    }

    fn apply_remote(&mut self, op: Op) {
        match op {
            Op::Increment(amount) => self.value += amount,
        }
    }
}

The Critical Assumption: Exactly-Once Delivery

CmRDTs require exactly-once delivery semantics. If a message is delivered twice (duplication), the counter increments twice. If a message is lost, the counter diverges permanently. This is problematic in unreliable networks, where at-least-once delivery is common and exactly-once is expensive to guarantee.

2.3 State-Based vs. Operation-Based: A Comparison

AspectState-Based (CvRDTs)Operation-Based (CmRDTs)
Delivery requirementsAt-least-once (idempotent merge handles duplicates)Exactly-once (duplicates cause errors)
State sizeCan grow unboundedly (need compaction)Usually bounded (operations may be ephemeral)
Network bandwidthHigher (full state transfer)Lower (small operations)
ConvergenceGuaranteed under any message orderingGuaranteed under causal delivery
ComplexitySimpler to implement (no causal delivery required)Requires reliable broadcast or causal ordering

In practice, delta-state CRDTs (a hybrid) are often preferred—they transmit only the changes since the last synchronization, combining the fault tolerance of state-based with the bandwidth efficiency of operation-based. But we’ll explore those later.


3. The Mathematics Behind the Magic

CRDTs rest on some deceptively simple mathematical structures. Understanding these foundations helps you design custom CRDTs and reason about their correctness.

3.1 Monotonic Join Semilattices

A join semilattice is a partially ordered set (poset) where every pair of elements has a least upper bound (join). The join operation () is:

  • Commutative: a ⊔ b = b ⊔ a
  • Associative: (a ⊔ b) ⊔ c = a ⊔ (b ⊔ c)
  • Idempotent: a ⊔ a = a

In a monotonic join semilattice, the state only moves “upward” in the partial order. No operation can decrease the state. For example, in a G-Counter, the count for each node only increases (the maximum across replicas grows).

Why is monotonicity important? Because it mirrors the physical reality of distributed systems: you can’t undo the past. Once an increment happens, you can’t “unincrement” it without adding new information (like a vector clock or tombstone). Monotonicity ensures that merging two states always produces a state that is “at least as recent” as both.

3.2 The Lattice of Sets: Add-Wins vs. Remove-Wins

Consider a CRDT set. Do you allow adding and removing elements? If so, conflicting operations (add and remove of the same element) must be resolved. Two common approaches emerge from the lattice theory:

  • Add-Wins Set (AWS): If an element was added on any replica, it remains in the set unless all replicas have removed it. In a lattice, the add operation moves the set “upward” (adding the element), while remove attempts to move it “downward.” To maintain monotonicity, remove doesn’t actually remove—it marks the element as “removed” but keeps the tombstone.

  • Remove-Wins Set (RWS): If any replica removes an element, it’s removed. This requires tracking an ‘observe’ counter for each add/remove pair.

Implementation nuance: AWS is simpler for many applications. In a collaborative playlist, if Alice adds a song and Bob removes it concurrently, AWS would keep the song (add wins), while RWS would remove it (remove wins). There’s no “right” answer—it depends on your application semantics.

3.3 Conflict Resolution with Vector Clocks

Many CRDTs use vector clocks to track causality and resolve conflicts. A vector clock is an array of logical timestamps, one per node. When a node updates an object, it increments its own timestamp. When nodes synchronize, they compare vector clocks to determine if one update causally precedes another.

Example: Last-Writer-Wins Register (LWW-Register)

use std::cmp::Ordering;

struct LWWRegister<T> {
    value: T,
    timestamp: VectorClock,
}

impl<T: Clone> LWWRegister<T> {
    fn merge(&mut self, other: &LWWRegister<T>) {
        match self.timestamp.partial_cmp(&other.timestamp) {
            Some(Ordering::Less) => {
                // Other is more recent
                self.value = other.value.clone();
                self.timestamp = other.timestamp.clone();
            }
            Some(Ordering::Greater) => {
                // Self is more recent, no change
            }
            Some(Ordering::Equal) | None => {
                // Concurrent or identical timestamps
                // Use a deterministic tiebreaker (e.g., node ID)
                if self.timestamp < other.timestamp {
                    self.value = other.value.clone();
                    self.timestamp = other.timestamp.clone();
                }
            }
        }
    }
}

The LWW-Register is the CRDT version of “last writer wins.” But note the tiebreaker—vector clocks only provide partial ordering. For concurrent updates (incomparable timestamps), you need a deterministic rule to ensure convergence (e.g., comparing node IDs).


4. Practical CRDT Implementations

Let’s implement a few practical CRDTs from scratch to understand their inner workings.

4.1 PN-Counter (Positive-Negative Counter)

A G-Counter can only increment. To support both increments and decrements, we combine two G-Counters: one for positive values (increments) and one for negative values (decrements). The actual value is pos - neg.

struct PNCounter {
    pos: GCounter,
    neg: GCounter,
    node_id: String,
}

impl PNCounter {
    fn new(node_id: &str) -> Self {
        PNCounter {
            pos: GCounter::new(node_id),
            neg: GCounter::new(node_id),
            node_id: node_id.to_string(),
        }
    }

    fn increment(&mut self) {
        self.pos.increment();
    }

    fn decrement(&mut self) {
        self.neg.increment();
    }

    fn value(&self) -> i64 {
        self.pos.value() as i64 - self.neg.value() as i64
    }

    fn merge(&mut self, other: &PNCounter) {
        self.pos.merge(&other.pos);
        self.neg.merge(&other.neg);
    }
}

Why does this work? Each node maintains two monotonic counts. The difference between them can go up and down (a decrement increases the neg counter, decreasing the overall value). But each individual counter remains monotonic. When merging, we take element-wise maximums for both pos and neg. The total value is always consistent across replicas.

4.2 The CRDT Counter in Production: Redis

Redis Enterprise uses a CRDT-based counter for its distributed counter feature. Here’s how it works:

> SET mycounter 0
OK
> INCR mycounter
(integer) 1
> INCRBY mycounter 5
(integer) 6

Under the hood, Redis maintains a vector of per-node increments. Each INCR increments the local node’s counter. When nodes synchronize, they merge maximums. The value returned is always the sum of all per-node counts. This allows Redis to provide INCR operations with availability even during network partitions, while guaranteeing eventual consistency.

4.3 CRDT Sets: The G-Set and 2P-Set

Grow-Only Set (G-Set): Elements can only be added. The merge is set union. Simple, but doesn’t support removal.

struct GSet<T: Hash + Eq> {
    elements: HashSet<T>,
}

impl<T: Hash + Eq + Clone> GSet<T> {
    fn add(&mut self, element: T) {
        self.elements.insert(element);
    }

    fn contains(&self, element: &T) -> bool {
        self.elements.contains(element)
    }

    fn merge(&mut self, other: &GSet<T>) {
        self.elements.extend(other.elements.clone());
    }
}

Two-Phase Set (2P-Set): Supports both add and remove by maintaining a G-Set of added elements and a G-Set of removed elements. An element is in the set if it’s in the add set but not in the remove set. Once removed, it can’t be re-added (the remove is permanent). This is a remove-wins semantics.

struct TwoPhaseSet<T: Hash + Eq> {
    added: GSet<T>,
    removed: GSet<T>,
}

impl<T: Hash + Eq + Clone> TwoPhaseSet<T> {
    fn add(&mut self, element: T) {
        self.added.add(element);
    }

    fn remove(&mut self, element: T) {
        if self.added.contains(&element) {
            self.removed.add(element);
        }
    }

    fn contains(&self, element: &T) -> bool {
        self.added.contains(element) && !self.removed.contains(element)
    }

    fn merge(&mut self, other: &TwoPhaseSet<T>) {
        self.added.merge(&other.added);
        self.removed.merge(&other.removed);
    }
}

Limitations of 2P-Set:

  • Once removed, an element can never be re-added (the remove G-Set records the tombstone forever).
  • Memory grows unboundedly as tombstones accumulate.
  • Concurrent add and remove: if one replica adds while another removes, the remove wins (permanent removal).

For more flexible sets, we need Observed-Remove Set (OR-Set) or LWW-Element Set, which handle re-adding and eventual tombstone cleanup.

4.4 OR-Set (Observed-Remove Set)

The OR-Set addresses the limitations of 2P-Set by tracking unique tags for each add operation. When you remove an element, you only remove the specific tags you’ve observed. If a concurrent add introduces new tags, those tags survive.

use std::collections::{HashMap, HashSet};

struct ORSet<T: Hash + Eq> {
    // Element -> Set of unique tags
    elements: HashMap<T, HashSet<String>>,
}

impl<T: Hash + Eq + Clone> ORSet<T> {
    fn add(&mut self, element: T, tag: String) {
        self.elements.entry(element).or_default().insert(tag);
    }

    fn remove(&mut self, element: &T) {
        self.elements.remove(element);
    }

    fn contains(&self, element: &T) -> bool {
        self.elements.contains_key(element)
    }

    fn merge(&mut self, other: &ORSet<T>) {
        for (element, tags) in &other.elements {
            let entry = self.elements.entry(element.clone()).or_default();
            entry.extend(tags.iter().cloned());
        }
    }

    fn value(&self) -> Vec<T> {
        self.elements.keys().cloned().collect()
    }
}

How it resolves concurrent add/remove:

Consider two replicas:

  • Replica A: Adds element x with tag t1, then receives a remove for x
  • Replica B: Adds element x with tag t2 concurrently

When A receives B’s add, it sees tag t2 which it hasn’t removed. So x remains in the set. The OR-Set preserves both adds, even though one replica removed an earlier instance.

Production note: Generating globally unique tags requires careful coordination. Common approaches include (node_id, sequence_number) or UUIDs. Tags also need to be cleaned up eventually (garbage collection) to prevent unbounded memory growth.


5. Advanced CRDTs and Real-World Considerations

The examples above are foundational, but real-world CRDTs require more sophistication.

5.1 Delta-State CRDTs

As noted, state-based CRDTs suffer from state bloat. Delta-state CRDTs solve this by transmitting only the delta—the changes since the last synchronization. A delta is itself a CRDT state but with only the new information. The receiving replica merges the delta into its full state.

How deltas work:

/// A delta is a partial state that should be merged
struct Delta<T: CRDTState> {
    state: T,
    /// The expected base state that this delta applies to
    base_version: VersionVector,
}

impl<T: CRDTState> Delta<T> {
    fn apply_to(&self, full_state: &mut T) {
        full_state.merge(&self.state);
    }
}

Deltas dramatically reduce bandwidth. Instead of sending the entire G-Counter state (all per-node counts), you send only the local node’s new count. However, you must ensure that the receiver has the right base state—otherwise, applying the delta out of order could cause divergence. This requires causal delivery of deltas or some other ordering guarantee.

Anti-entropy with delta: The Merge Operation

In practice, systems use periodic full-state synchronization to recover from missed deltas. The delta mechanism is an optimization for the common case (live sync) while the full-state merge is a safety net.

5.2 CRDT List: The Collaborative Document Editor

Now we come to the most complex and practically important CRDT: the ordered sequence (list). This is what powers Google Docs, Notion, and Figma.

The problem: Multiple users concurrently insert and delete characters at various positions. How do we maintain a consistent ordering without a central sequence number?

The solution: RGA (Replicated Growable Array)

RGA assigns a unique identifier to each inserted character, consisting of:

  • A unique node ID (e.g., user ID)
  • A local counter (monotonically increasing)
  • Optionally, a reference to the previous character (for ordered insertion)

When a user inserts at a specific position, they generate an ID and link it to the character before the insertion point. The resulting structure is a linked list across all replicas. When merging, each replica sorts characters by their IDs using a deterministic ordering (e.g., (counter, node_id) in lexicographic order). Characters with the same ID are considered the same (idempotent insert).

Tombstones and Garbage Collection

Deleted characters become tombstones—they’re marked as removed but kept in the data structure to maintain ordering. Tombstones are essential: if Alice deletes a character, then Bob inserts after that character, Bob’s insertion needs the deleted character’s ID as a reference point. Without tombstones, Bob’s insert would be orphaned.

However, tombstones cause unbounded memory growth. Strategies to mitigate:

  • Garbage collection: When all replicas have observed a deletion, the tombstone can be removed. This requires a distributed agreement on “safe to delete” points.
  • Compaction: Periodically rebuild the document, removing tombstones and reassigning IDs to remaining characters.

Performance Considerations

A CRDT list has O(n) traversal time for each operation (to find the right position in the linked list). For a document with 100,000 characters, an insertion near the end requires walking the entire list. Optimizations include:

  • Skip lists (like in Automerge)
  • Balanced trees (like in RGA with tree indices)
  • B-trees (like in Diamond Types)

5.3 CRDT Map: The JSON Document

Modern collaborative applications work with JSON-like documents (nested maps and lists). A CRDT map supports:

  • Add/update key-value pairs
  • Delete keys
  • Nesting: Maps can contain sub-maps and lists

The Map CRDT (LWW-Map with sub-CRDTs)

A common approach uses a last-writer-wins register for each key’s value, with the value itself potentially being another CRDT. For example:

{
  "title": { "type": "text", "crdt": "RGA" },
  "author": { "type": "string", "crdt": "LWW-Register" },
  "metadata": {
    "type": "map",
    "crdt": "LWW-Map",
    "children": {
      "version": { "type": "integer", "crdt": "PN-Counter" },
      "tags": { "type": "set", "crdt": "OR-Set" }
    }
  }
}

When merging two maps, you:

  1. For each key, merge the sub-CRDTs
  2. Resolve deleted keys using tombstones (or a remove-wins policy)
  3. Handle concurrent adds of the same key: the LWW register chooses one value based on timestamp

This recursive structure allows unlimited nesting while maintaining composable convergence.


6. CRDTs in the Wild: Production Systems

6.1 Redis Enterprise CRDTs

Redis Enterprise uses CRDTs to provide active-active geo-distribution with local latency. Each Redis instance can accept writes independently. Background synchronization merges states.

Key features:

  • CRDT counters: As described earlier
  • CRDT sets: Based on OR-Set
  • CRDT hashes: LWW-Map with per-timestamp resolution
  • CRDT lists: For ordered collections

Trade-offs:

  • Memory overhead: CRDT metadata (vector clocks, tombstones) adds ~30-50% overhead to normal Redis keys.
  • Merge conflicts: In rare cases with high concurrency, users may see temporary inconsistencies until convergence.
  • Garbage collection: Tombstones are cleaned aggressively based on observed synchronization.

6.2 Automerge (JavaScript)

Automerge is a popular CRDT library for collaborative applications, used in projects like Obsidian (note-taking app) and Ditto (offline-first SDK).

Technical details:

  • Uses RGA for ordered sequences
  • Supports nested maps (JSON-like documents)
  • Implements garbage collection using a version vector approach
  • Provides op-based and state-based sync modes

Performance:

  • Automerge 1.x was written in JavaScript with WASM-based storage
  • Automerge 2.x (currently in development) rewrites the core in Rust for significant performance improvements
  • Merge time for a 10MB document is typically under 100ms

6.3 Yjs (JavaScript)

Yjs is another popular CRDT library, often compared to Automerge. It’s used in tools like Roam Research, AFFiNE, and various collaborative editors.

Differentiators:

  • Uses YATA algorithm (a different approach to ordered sequences) which provides better performance for certain workloads
  • Supports binary data and rich text with formatting
  • Network-agnostic: Plugs into WebSockets, WebRTC, or custom transports
  • Smaller memory footprint compared to Automerge in many scenarios

Performance comparison:

  • Yjs typically has 2-5x better insert performance than Automerge 1.x
  • Automerge 2.x (Rust-based) is competitive, trading blows depending on the workload
  • Both handle concurrent edits of hundreds of collaborators with sub-second convergence

6.4 CRDTs in Matrix (Decentralized Messaging)

The Matrix protocol uses CRDTs for decentralized messaging rooms. Each room has a state (who’s a member, room name, etc.) that’s maintained as a CRDT set.

How it works:

  • Room state events are CRDT inserts (add-only semantics for simplicity)
  • State resolution uses a power level algorithm (not pure CRDT, but inspired by similar principles)
  • Merge conflicts are resolved by server operators (admin intervention for rare cases)

Limitations: Pure CRDTs would solve some of Matrix’s state resolution complexities, but the protocol’s history-based approach (keeping all events) makes full CRDT adoption challenging.


7. Comparison with Other Approaches

CRDTs aren’t the only way to achieve consistency in distributed systems. Let’s compare with alternatives.

7.1 CRDTs vs. Operational Transformation (OT)

Operational Transformation is the classic approach used in Google Docs and Wave. It works by transforming operations to account for concurrent edits.

AspectCRDTsOperational Transformation
Mathematical foundationLattice theory (commutativity)Transformation functions
ComplexitySimpler to implement correctlyHard to implement (transformation functions are error-prone)
PerformanceCan be slower (list traversal)Can be faster (index-based operations)
Offline supportExcellent (any state can merge)Requires server for transformation
Concurrency modelAll replicas are equalTypically client-server

The Google Wave debacle: Google Wave’s OT implementation was famously challenging. After years of research, Google eventually moved to a CRDT-inspired approach for Google Docs. Today, most new collaborative editors choose CRDTs.

7.2 CRDTs vs. Consensus (Raft, Paxos)

Consensus protocols provide strong consistency (linearizability) but require coordination.

AspectCRDTsConsensus
Consistency modelEventualStrong (linearizable)
Availability during partitionAll replicas accept writesMinority replicas unavailable
LatencyNo coordination (local writes)Requires quorum (RTT to majority)
ComplexityMedium (merge logic)High (leader election, log replication)
Use caseCollaborative editing, counters, cachesFinancial transactions, critical state

Can CRDTs replace consensus? No—they serve different purposes. Use CRDTs when availability and low latency are paramount, and you can tolerate temporary inconsistencies. Use consensus when you need strict serializability (e.g., bank account balances).

7.3 CRDTs vs. Eventual Consistency with Last-Writer-Wins

Simple eventual consistency (LWW with timestamps) doesn’t handle concurrent updates gracefully.

AspectNaive LWWCRDT
Concurrent updatesLast clock wins (may lose data)Both updates merged (no data loss)
User experienceSilent data lossPredictable, merge semantics
ImplementationTrivial (just a timestamp)More complex (merge logic)
ScalabilityExcellentGood (merge overhead)

When to use naive LWW: When data loss is acceptable (e.g., last-read cache, non-critical aggregates) or when your data structure naturally handles overwrites (e.g., replacing a full document).


8. Advanced Topics: The Frontier of CRDT Research

CRDTs are an active research area with several open problems.

8.1 Garbage Collection of Tombstones

Tombstones are the Achilles’ heel of many CRDTs. Research into compaction algorithms includes:

  • Bounded CRDTs: Limit the number of tombstones by restricting operation history
  • Version-based GC: Track the global “safe to delete” version using distributed clocks
  • Gossip-based GC: Use epidemic protocols to agree on tombstone expiration

Practical approaches:

  • Automerge: Uses a “version vector” to track what each peer has seen; tombstones known by all peers can be removed
  • Redis Enterprise: Aggressively GCs based on synchronization intervals (assuming eventual connectivity)

8.2 CRDTs for Byzantine Environments

Standard CRDTs assume non-malicious nodes. Byzantine CRDTs (B-CRDTs) extend the concept to tolerate arbitrary faults, including malicious nodes that send malformed updates.

Challenges:

  • Ensuring merge functions are resilient to malicious (but well-formed) updates
  • Detecting and pruning invalid operations
  • Maintaining convergence even with adversarial synchronization

Current approaches:

  • BFT CRDTs: Combine Byzantine fault tolerance with CRDT semantics
  • Signed CRDTs: Use cryptographic signatures to verify operation provenance
  • Commitment-based CRDTs: Require proof of correctness (e.g., zero-knowledge proofs)

8.3 CRDTs and Blockchains: Decentralized Data with Immutable History

CRDTs and blockchains share the goal of decentralized convergence. Some projects explore combining them:

  • Holochain: Uses CRDTs as the state layer, with blockchain-like history for auditing
  • CRDT-based smart contracts: State transitions are CRDT merges, making contract execution deterministic and concurrent
  • IPFS + CRDTs: Content-addressed CRDT storage (e.g., Automerge sync over IPFS)

The synergy:

  • CRDTs provide the state management (how data converges)
  • Blockchains provide the immutable audit trail (who changed what, when)
  • Together, they enable fully decentralized applications with no central coordination

8.4 Performance Modeling and Benchmarking

Understanding CRDT performance requires careful benchmarking:

Data TypeOperationLatency (local)Merge Time (1000 ops)Memory per Op
G-CounterIncrement<1 μs<1 μs16 bytes
PNCounterInc/Dec<1 μs<1 μs32 bytes
G-SetAdd<1 μs~2 μs32 bytes + element
OR-SetAdd/Remove~2 μs~10 μs64 bytes + element + tag
RGA (List)Insert~5 μs (O(n) traversal)~50 μs (merge 1000 ops)~60 bytes per char
LWW-MapPut~1 μs~20 μs~80 bytes per key + value

Key performance trade-offs:

  • List CRDTs are the most expensive due to traversal. For documents >100KB, consider indexing optimizations (tree-based lists).
  • Set CRDTs scale well for individual operations but merge costs grow with set size.
  • Map CRDTs are dominated by the cost of sub-CRDTs within each key.

9. CRDTs and Applications: End-to-End Case Studies

9.1 Building a Collaborative Document Editor (Simplified)

Let’s walk through building a collaborative text editor using CRDTs.

Architecture:

[Client A] <--WebSocket--> [Server] <--WebSocket--> [Client B]

The server holds a state-based CRDT and periodically syncs with clients. Each client maintains a local replica.

Operations:

  • insert(position, char): Client generates a unique ID for the character and inserts it into the local RGA list
  • delete(position): Client marks the character at that position as a tombstone

Server merge:

  • The server receives delta states from each client
  • It merges them into its global state using the CRDT’s merge function
  • It broadcasts the new state (or delta) to all other clients

Conflict example:

  • Client A inserts ‘X’ at position 5 (local ID: (A, 42))
  • Client B inserts ‘Y’ at position 5 (local ID: (B, 73))
  • Both operations reach the server concurrently
  • The server merges, producing a list with both characters in a deterministic order (e.g., sorted by (counter, node_id))
  • Both clients converge to the same resulting text

Convergence guarantee:

  • All replicas see the same set of characters
  • All replicas agree on character ordering
  • Deleted characters (tombstones) remain until all replicas have observed the deletion

User experience:

  • Typing feels local (no network delay)
  • Concurrent edits are merged seamlessly
  • In rare cases, cursor positions may jump (until clients converge)

9.2 State Synchronization in Multiplayer Games

Use case: A real-time strategy game where 100 players control units on a shared map.

Traditional approach: Central server authoritatively resolves moves. Players send input → server calculates state → server broadcasts to all players. Latency: at least one RTT per move.

CRDT approach: Each player’s client maintains a local CRDT-unit state. Move commands are CRDT operations that commute (e.g., “instant move to coordinate” vs. “set velocity vector”). The server aggregates and syncs.

Challenges:

  • Determinism: Game physics must be deterministic given input history
  • Cheating: CRDTs don’t prevent malicious state modifications
  • Real-time constraints: Convergence must happen in <100ms for competitive play

Solutions:

  • Combine CRDTs with lockstep (synchronized simulation)
  • Use commitment-based CRDTs (as mentioned earlier)
  • Implement edge-side validation (server checks incoming ops for plausibility)

Real examples:

  • Valve’s Dota 2 uses a deterministic lockstep model, not CRDTs
  • Riot’s Valorant uses server-authoritative with rollback
  • Some indie games use CRDTs for decentralized state sharing (e.g., Hytale’s prototype)

9.3 Offline-First Mobile Apps

Use case: A task management app (like Trello) that works offline and syncs when online.

CRDT choice:

  • Each task list is an OR-Set
  • Task properties (title, due date, checklist) are LWW-Maps
  • Comments are RGA lists

Offline workflow:

  1. User creates tasks, edits properties, reorders lists—all locally
  2. Changes are stored as CRDT deltas in local storage
  3. When online, device syncs with cloud server (or peer-to-peer with other devices)
  4. Server merges deltas, handles conflicts (using CRDT semantics)
  5. Device receives server’s merged state and updates local replica

User experience:

  • No internet? No problem—everything works
  • Add tasks on your phone offline, edit on desktop online—they merge seamlessly
  • Concurrent edits from two users reconcile as expected (add wins, etc.)

Real examples:

  • Notion uses offline-first with CRDTs (Automerge under the hood)
  • Ditto offers a CRDT-based SDK specifically for offline-first mobile apps
  • Realm (MongoDB mobile) uses a CRDT-like approach for offline synchronization

10. Common Pitfalls and Misconceptions

Even experienced engineers make mistakes with CRDTs. Here’s what to watch for.

10.1 “Eventually Consistent” Doesn’t Mean “Eventually Correct”

CRDTs guarantee convergence to the same state, but that state may not match what a user would consider “correct” in all scenarios.

Example: In a CRDT set with add-wins semantics, concurrent add and remove of the same element results in the element surviving. If Alice adds a collaborator, Bob removes them at the exact same instant, the collaborator remains added. This might violate your application’s security model.

Solution: Choose your CRDT semantics carefully. For security-critical operations, consider using remove-wins sets or combining CRDTs with access control.

10.2 Types Don’t Prevent Semantic Conflicts

Even with perfectly implemented CRDT types, the merged state might be meaningless.

Example: Two users concurrently set a “deadline” field: Alice sets it to “2024-12-31” and Bob sets it to “2024-10-01”. The LWW-Map picks one based on timestamp. The other user’s value is lost. The application might need both deadlines merged (e.g., earliest or latest).

Solution: Understand your merge semantics. For values where losing an update is unacceptable, consider using a CRDT that tracks multiple values (e.g., a multi-value register or an OR-Set of deadlines).

10.3 Tombstone Bloat Can Crash Your System

Without proper garbage collection, tombstones grow unboundedly. In tests, OR-Set tombstones can consume 10x the memory of live data.

Real-world example: A social media feed built on CRDTs accumulated deleted post tombstones. After six months, memory usage was 50GB for 5GB of actual content. Garbage collection hadn’t been implemented.

Solution: Implement tombstone GC based on observed state across replicas, using version vectors. If you can’t guarantee all replicas will eventually be online, use bounded tombstones (e.g., maximum tombstone count, then fall back to “forget old history” with application-level reconciliation).

10.4 Network Partitions Are Not the Only Challenge

CRDTs handle network partitions gracefully, but they don’t help with:

  • Confidentiality: All replicas eventually see all data (in a non-signed CRDT)
  • Integrity: Malicious replicas can inject false operations
  • Fairness: Last-writer-wins may prefer certain nodes (e.g., those with faster clocks)
  • Delay: Convergence can take time proportional to network diameter

Real-world example: A CRDT-based multiplayer game had a player with a clock skewed by +30 seconds. Their LWW operations always won, causing unfair advantages. The fix: use hybrid logical clocks instead of wall clocks.

10.5 “CRDTs Are Always Convergent” is a Lie (Without Correct Implementation)

The mathematical guarantee holds only if the merge function satisfies all three properties. Implementation bugs can break convergence:

// Buggy merge: not commutative
fn gc_merge_buggy(&mut self, other: &GCounter) {
    // Only update if other's value is larger
    // This is correct for a single node, but misses multi-node scenarios
    if other.value() > self.value() {
        self.counts = other.counts.clone();
    }
}

Testing strategy: Use randomized testing (QuickCheck-style) to generate arbitrary sequences of operations and merges, and verify that all replicas converge to the same state.


11. The Future of CRDTs

11.1 Industrial Adoption

CRDTs are entering the mainstream:

  • Database vendors: Redis, MongoDB Realm, Amazon QLDB (partially)
  • Collaboration tools: Google Docs (evolved from OT), Notion, Figma, Obsidian
  • Messaging: Matrix, WhatsApp (end-to-end encrypted CRDT-like messages)

11.2 Standardization Efforts

The IETF has working groups on CRDTs for:

  • JSON CRDT format: A standardized binary format for JSON CRDT documents
  • CRDT for distributed caching: Interoperable CRDT caches
  • CRDT for web applications: Browser-native CRDT support (Web CRDT API)

11.3 Language and Framework Support

  • Rust: CRDT libraries (yrs, automerge-rs) are becoming production-ready
  • TypeScript: Yjs and Automerge dominate
  • Go: Growing ecosystem (go-crdt, etc.)
  • Elixir: Distributed Elixir uses CRDTs for state management (Phoenix LiveView, Delta CRDTs)

11.4 Hardware-Accelerated CRDTs

Research into hardware-accelerated CRDTs:

  • FPGA-based CRDT merge units for high-throughput datacenters
  • GPU-parallelized CRDT merge for large documents
  • Smart NICs that perform CRDT merges before data reaches the CPU

11.5 The Grand Challenge: Scalable CRDTs for Global IoT

Imagine billions of sensors updating their state concurrently, with no central infrastructure. Each sensor maintains a CRDT of its measurements. Periodically, sensors sync with gateways or peers.

Challenges:

  • Memory: Each sensor has limited memory (KB to MB)
  • Bandwidth: Radio links may be low-throughput
  • Energy: Merges consume CPU cycles

Current research:

  • Lightweight CRDTs: Minimize metadata (e.g., 2-byte tags instead of UUIDs)
  • Hierarchical CRDTs: Local syncing within a cluster, then aggregate to global
  • Adaptive CRDTs: Dynamically choose between state-based and operation-based depending on network conditions

12. Conclusion: Why Every Distributed Systems Engineer Should Learn CRDTs

CRDTs represent a fundamental shift in how we think about distributed data. Instead of fighting against concurrency with locks, transactions, and consensus, CRDTs embrace it—providing mathematical guarantees that concurrent operations will automatically converge.

The key takeaways:

  1. CRDTs are not a silver bullet. They solve one specific problem (convergence without coordination) but don’t address security, fairness, or applications requiring strong consistency.

  2. Choose the right tool for the job. For counters and sets, CRDTs are practically always the right choice. For ordered sequences, they’re excellent for collaborative editing but may be overkill for simple append-only logs.

  3. Understand the trade-offs. State-based CRDTs are simpler but consume more bandwidth. Operation-based CRDTs are more efficient but require exactly-once delivery. Hybrid approaches (delta-CRDTs) offer the best of both.

  4. Always plan for garbage collection. Tombstones will consume your memory if left unchecked. Implement GC from the start.

  5. Testing is critical. Use property-based testing to verify convergence. Don’t assume your merge function is correct—prove it.

The ongoing convergence of distributed systems, offline-first applications, and decentralized technologies ensures that CRDTs will remain a vital tool for the foreseeable future. Whether you’re building the next billion-user document editor, a decentralized social network, or an offline-first todo app, CRDTs provide the mathematical foundation you need.

Final thought: As Emin Gün Sirer, co-founder of Ava Labs, once said: “CRDTs are the kind of elegant idea that makes you wonder why we didn’t think of it earlier.” Now is the time to learn them, before they become table stakes for any distributed system.


References and Further Reading

  1. Shapiro, M., Preguia, N., Baquero, C., Zawirski, M. (2011). Conflict-Free Replicated Data Types. PDF
  2. Almeida, P.S., Baquero, C., Preguia, N. (2018). Delta State Replicated Data Types. Journal of Parallel and Distributed Computing
  3. Kleppmann, M. (2017). Designing Data-Intensive Applications. O’Reilly Media.
  4. Automerge: GitHub
  5. Yjs: GitHub
  6. Redis Enterprise CRDTs: Redis Documentation
  7. Ditto (CRDT SDK): Ditto
  8. Matrix Protocol: Specification
  9. CRDT Research Group: CRDT.tech
  10. Hybrid Logical Clocks: Kulkarni et al.