The Performance Of Crdts For Collaborative Text: Rga, Lseq, Treedoc, And Correctness Proofs

2023-05-15 · Leonardo Benicio

Summary

A comprehensive technical exploration of the performance of crdts for collaborative text: rga, lseq, treedoc, and correctness proofs, covering key concepts, practical implementations, and real-world applications.

Here is the expanded blog post, taking your excellent introduction and building it into a comprehensive, deeply technical, and engaging deep dive, reaching well over 10,000 words.

The Blooming Lotus: How CRDTs Make Real-Time Collaboration a Mathematical Reality

The relentless, almost invisible, magic of real-time collaboration is one of the defining features of the modern digital workplace. You open a Google Doc, a Figma board, or a VS Code Live Share session. You type a character, your colleague deletes a paragraph across the ocean, and another friend simultaneously reorders a list. The screen updates, characters appear and vanish, and the final document is coherent, devoid of chaos. No lost edits. No garbled text. No screaming matches over who “owns” the final version. This seamless experience, a cognitive miracle of the late 2010s, is built on a foundation of deceptively complex mathematics and software engineering: the theory of Conflict-Free Replicated Data Types (CRDTs).

This is not a story about a clever hack. This is a story about a fundamental shift in how we think about data, consistency, and the nature of a “single source of truth.” It’s a story where the solution isn’t a smarter central brain that orchestrates every move, but a mathematical design so elegant that conflict becomes structurally impossible. It sounds like magic, but it is, in fact, a form of applied lattice theory.

But before we can appreciate the elegant, blossoming lotus of a CRDT, we must first feel the full, agonizing weight of the problem it solves. We must walk through the valley of the shadow of concurrency.

Act I: The Despair of Concurrency

Imagine a world without this magic. It’s the early 2000s. You are a software engineer named Alice, and your colleague Bob is in a different time zone. You’re both editing the same specification document for a critical project. You both download a local copy to work offline during a long-haul flight.

You, on your flight, edit sentence 3: “The quick brown fox jumps over the lazy dog.” Bob, on his flight, edits the same sentence: “The super fast auburn fox leaps over the sleeping cat.”

You both land, fire up your laptops, and their sync software kicks in. What happens? In a naive, first-generation sync system, the result is a bloody, irreconcilable conflict. The system might present you with a grotesque hybrid: “The quick super fast brown auburn fox jumps leaps over the lazy sleeping dog cat.” Data corruption is the norm. The document is a battlefield of overlapping intentions.

This is the Lost Update Problem, the fundamental curse of optimistic replication. You and Bob both read the same state (the original sentence), you both modify it locally, and when you write your versions back, the last one to write simply clobbers the other. The result is the destruction of someone’s work.

The first generation of solutions to this problem were, in a word, authoritarian.

1. Pessimistic Locking: The Tyranny of the Token

The brute-force solution: pessimism. Before you can edit a paragraph, you must acquire a lock. You “check out” the section, and no one else can touch it until you’re done. This is the model of old-school version control systems like CSV or RCS. In a collaborative document editor, it works perfectly—for a single user. For a team, it’s a nightmare.

Imagine trying to have a conversation where only one person can speak at a time, and they have to hold a physical token to do so. If Alice is meticulously proofreading her paragraph for ten minutes, Bob (who only wants to fix a single typo) sits idle, waiting for the token to be released. This is a serialization of work. It’s the antithesis of free-flowing collaboration. It destroys the very experience it claims to enable. It is the “checkout, edit, check-in” cycle of a library book, not a living document.

2. Merge-Based Systems: The Post-Hoc Autopsy

The next step was to allow parallel work and then try to reconcile the differences later. This is the model of Git. In Git, you “fork” a document, make your changes, and then “merge” them back into a main branch. If you and Bob both change the same line, Git declares a merge conflict.

This is not a bug; it’s a feature for source code, where conflicts are meaningful and require human discretion. But for a document you’re editing in real-time, it’s a show-stopper. You can’t have a “merge conflict popup” appear every time two people type in the same paragraph. The entire point of real-time collaboration is to avoid the post-hoc negotiation phase. You want the document to converge in the moment, not hours later.

The desire for a real-time, online, immediately consistent document was the holy grail. The first major system to attempt this was the foundation of Operational Transformation (OT) .

Act II: The Hero with a Thousand Edge Cases: Operational Transformation

The hero of the early 2000s was OT, a concept born in the research labs of the University of Stuttgart and spearheaded by its implementation in Google Wave (2009) and, later, Google Docs. OT is a truly impressive feat of software engineering. Its core idea is a masterpiece of workarounds on top of fragile state.

How OT Works (A Simplified View)

Imagine a document as an array of characters. [T, h, e, , f, o, x]

Alice fires an operation: Insert("q", at position 0) to change “The” to “qThe”. Her intention is to prepend ‘q’.

Bob, at the exact same moment, fires an operation: Delete( position 7, length 1 ) to delete the final ‘x’.

These two operations are concurrent. They both see the same original state [T, h, e, , f, o, x]. The server receives them in some order (let’s say Alice’s first, then Bob’s).

Apply Alice’s Op: [q, T, h, e, , f, o, x]
Now apply Bob’s Op: Bob’s op says “Delete the character at position 7.” But the string is now longer. The ‘x’ is no longer at position 7; it’s at position 8! If we naively apply Bob’s op, we’ll delete the character at the old position 7, which is now ‘o’. The intention is lost. The result is wrong.

The core function of OT is the transformation function, T(op1, op2). When the server receives Bob’s Delete(7,1) after Alice’s Insert(0, "q"), it must transform Bob’s operation so that its intent (delete the final ‘x’) is preserved on the new document state.

The server calculates T(Delete(7,1), given Insert(0,"q")). The transformed operation might be Delete(8,1). Why 8? Because Alice’s insert shifted everything to the right by one position. Bob’s operation must be shifted to account for this.

This is the infamous “labyrinthine logic” you mentioned. The problem is that this transformation function must be defined for every possible pair of operations.

T(Insert(pos, s), Insert(pos, s)) - Two people inserting at the exact same place.
T(Insert(pos, s), Delete(pos, len)) - One inserting where another is deleting.
T(Delete(pos1, len1), Delete(pos2, len2)) - Two overlapping deletes.

For a text document, the number of primitive operations is small (insert, delete, format). But the permutations explode when you consider stateful operations like “move block from A to B”, “set font”, “add comment”. Each new operation type requires a new transformation function to be written, tested, and proven correct against all other existing operation types.

The Centralized State Machine

The biggest practical constraint of OT is its reliance on a centralized, stateful server. The server is the single authority that receives operations, transforms them, and applies them in a deterministic order. The server maintains the entire history of the document and the full operational state. If the server crashes, the entire state is lost. If the server is slow, the client experience becomes laggy.

More importantly, the transformation function itself is not associative. The order in which you apply transformations matters in complex ways. This makes it incredibly difficult to have a fully peer-to-peer network where any client can edit, and convergence is guaranteed without a central arbitrator. The entire system is a tower of state, built on a layer of fragile workarounds, and its correctness has been notoriously hard to prove formally. Many teams, including Google’s own early Wave team, spent years battling edge cases.

OT works. It works brilliantly for Google Docs. But it works despite its complexity, not because of its elegance. It is a testament to human engineering will, but it is not a law of nature. There had to be a better way.

Act III: The Algebraic Revolution: Enter CRDTs

The existential leap of CRDTs is this: Instead of figuring out how to resolve conflicts after they happen, design the data structure so that conflicts are mathematically impossible.

This is a paradigm shift. OT is about post-hoc transformation. CRDTs are about inherent convergence. The key insight is to abandon the idea of a single authoritative state entirely. Instead, each replica maintains its own local state, and the system is designed around a core mathematical principle: monotonicity.

CRDTs are built on a foundation of join-semilattices from abstract algebra. Let’s unpack that without the scare quotes. A join-semilattice is a set with a single operation, merge, that has three properties:

Idempotent: merge(A, A) = A. Applying the same merge twice has no effect.
Commutative: merge(A, B) = merge(B, A). Order doesn’t matter.
Associative: merge(merge(A, B), C) = merge(A, merge(B, C)). Grouping doesn’t matter.

If we can design our data structure (a Counter, a Set, a Text Document, a Graph) so that it forms a join-semilattice, then we have a miracle: Strong Eventual Consistency (SEC) . If all replicas can eventually receive and merge all updates, in any order, they will converge to the same final state.

This is a radically different way of thinking. The “source of truth” is not a single server. It is the merge function itself.

Let’s look at two fundamental flavors of CRDTs.

State-Based CRDTs (CvRDTs): The Blooming Lotus

Imagine you and a friend are each holding a copy of a whiteboard with a counter. The state of your whiteboard is the current number.

The Data Type: A Grow-Only Counter (G-Counter)

A G-Counter is the simplest CRDT. It can only go up. How do we represent a distributed counter that can be incremented by many peers, without conflict?

The State: The state is not a single integer. It is an array of integers, one for each peer. Let’s say we have three peers (A, B, C).

Replica A starts with [0, 0, 0].
Replica B starts with [0, 0, 0].
Replica C starts with [0, 0, 0].

The Event (Mutations):

Increment by Peer i: To increment the counter, a peer increments only its own slot in its local array.
- A increments: A’s state becomes [1, 0, 0].
- B increments twice: B’s state becomes [0, 2, 0].
Read: To get the total value of the counter, you simply sum all the elements in your local array. sum(A) = 1, sum(B) = 2.

The Conflict Resolution (Merge): This is the beauty. When replica A sends its state [1, 0, 0] to replica B, and replica B sends its state [0, 2, 0] to replica A, how do they merge?

The merge function is simply element-wise maximum.

A receives [0, 2, 0]. A’s state is [1, 0, 0].
- max(A, B) = [max(1,0), max(0,2), max(0,0)] = [1, 2, 0]. A’s new state.
B receives [1, 0, 0]. B’s state is [0, 2, 0].
- max(B, A) = [max(0,1), max(2,0), max(0,0)] = [1, 2, 0]. B’s new state.

They have converged to the same state! The total is 1+2+0 = 3. No conflict. No transformation. Just a simple, algebraic merge that is commutative, associative, and idempotent. The state “blooms” like a lotus as it receives more updates, always moving forward, never backward.

This is a state-based CRDT (or Convergent CRDT, CvRDT). You send the entire state (or a periodic delta) to other replicas, and they merge it using a monotonic join function. The cost is bandwidth (you’re sending the whole state). The benefit is immense simplicity.

Another Example: The Set

What about a collaborative shopping list? You can’t use a G-Counter because you need to add and remove items. A simple set is not a valid join-semilattice because if you add “Milk” and I delete “Milk”, with a plain set, the last write wins. We need monotonicity.

Grow-Only Set (G-Set): You can only add elements. Cannot be removed. It works, but is useless for a list.
Two-Phase Set (2P-Set): You have two G-Sets: an Add-set and a Remove-set.
- Add (e): Add e to the Add-set.
- Remove (e): Add e to the Remove-set.
- Lookup (e): e is in the set if e ∈ Add-set AND e ∉ Remove-set.
- Merge: For both Add-set and Remove-set, do element-wise union (which is monotonic).
- The Consequence: If you add “Milk” and then someone else removes it, it’s gone forever. You cannot re-add it. The Remove-set prevents you from doing so. This is the price of making the conflict impossible. This limitation is solved by more complex CRDTs like the Observed-Remove Set (OR-Set) , which uses tags or timestamps to allow re-adding an element after it’s been removed. The complexity grows, but the algebraic foundation remains.

Operation-Based CRDTs (CmRDTs): The Whisper Network

The second flavor, operation-based CRDTs (or Commutative Replicated Data Types, CmRDTs), attacks the problem from a different angle. Instead of sending the entire state, you send only the operations themselves (e.g., “Increment counter by 1”). The magic here is that the operations are designed to be commutative by definition.

If Alice sends “Insert ‘a’ at the beginning” and Bob sends “Insert ‘b’ at the beginning”, how can these be commutative? They can’t be, because “ab” is different from “ba”.

CmRDTs solve this by requiring causal delivery. The operations must be received in the order in which they happened. A central server or a broadcast mechanism like a Distributed Log (e.g., Kafka, FoundationDB) can provide this. Once you have a causal order, the operations can be applied in that order, and commutativity is enough to guarantee convergence—as long as you never reorder them.

CmRDTs are often smaller and more efficient for bandwidth (sending ops vs. state), but they place a heavier burden on the delivery layer. They are the route many modern systems take for practical, high-performance messaging (e.g., using a central server like Figma’s or, in a more distributed way, using an op-log like in building a local-first application with Automerge or Yjs).

Act IV: The Holy Grail: The Collaborative Text CRDT

The most famous and complex CRDT is the Sequence CRDT or List CRDT. This is the data type that powers collaborative text editing. How do you represent a linked-list or an array of characters so that two people can insert and delete at the same time, in different places, and always converge?

This problem is famously difficult. The earliest attempts like the Tombstone-based RGA (Replicated Growable Array) are beautiful examples of the CRDT philosophy.

The Core Idea: Avoid Relative Positions

In OT, you say “Insert character ‘x’ at position 5.” Positions are fragile. If someone inserts before you, your position shifts. In a CRDT, we say: “Insert character ‘x’ after a specific, unique, immutable identifier.”

The State: Every character in the document is a node in a linked list, but the list is ordered not by a fragile index, but by a globally unique, immutable identifier (UUID) that is assigned when the character is created. Each node also contains a parent identifier: the UUID of the character it was inserted after.

The Event (Insert):

Alice wants to insert ‘A’ after the character ‘B’ (which has UUID uuid_B).
She creates a new UUID for ‘A’: uuid_A.
She creates an operation: Insert(uuid_A, uuid_B, 'A'). This says “Create a new node with id uuid_A as a child of node uuid_B. The value is ‘A’.”

The Event (Delete):

Bob wants to delete the character with UUID uuid_C.
He creates an operation: Delete(uuid_C). This simply marks the node as a “tombstone” (a dead node).

The Merge and Convergence:

This is where it gets brilliant.

1. Tombstones Solve the “Re-insert” Problem: If Alice deletes ‘C’, and then Bob inserts a character after ‘C’, what happens? Bob’s insert says “Insert after uuid_C”. But uuid_C is marked as deleted (a tombstone). The CRDT rule is: Tombstones are never removed from the list! They are just marked as dead. So Bob’s insert still has a valid parent node to attach to. After the merge, the text will show the new character, but not the old ‘C’. This avoids the “dangling pointer” problem that plagues OT.

2. Resolving Concurrent Inserts at the Same Position (The Tiebreaker): What happens when Alice and Bob both insert a character after the same parent uuid_B?

Alice: Insert(uuid_A1, uuid_B, 'X')
Bob: Insert(uuid_A2, uuid_B, 'Y')

Both see themselves as the “next” character after ‘B’. In a final state, which comes first, ‘X’ or ‘Y’? The system must define a deterministic tiebreaker.

The typical approach: Compare the UUIDs. Let’s say UUIDs are Lamport timestamps (a combination of a globally unique site ID and a local counter). For example:

Alice’s UUID: (Alice, 5, ...) – we can call it a Lamport timestamp like [T=7, S=Alice]
Bob’s UUID: (Bob, 3, ...) – [T=5, S=Bob]

The rule is: The node with the higher timestamp (or site ID) comes first. If Bob’s timestamp is smaller, it goes after Alice’s. This provides a deterministic, conflict-free ordering. The system doesn’t “resolve” a conflict; it uses a pre-defined, agreed-upon rule to place the two characters in a consistent order globally.

Concrete Example:

Initial State: [H, i]
- H has UUID uuid_1.
- i has UUID uuid_2.
- uuid_2’s parent is uuid_1.
Alice wants to insert ‘!’ after ‘i’: Insert(uuid_3, uuid_2, '!')
Bob wants to insert ‘?’ after ‘i’: Insert(uuid_4, uuid_2, '?')
Merge their operations.

Alice’s view: [H, i, !]. Then she receives Bob’s insert. Bob’s insert has a parent of uuid_2. The children of uuid_2 are now [uuid_3, uuid_4]. Using the tiebreaker (e.g., Alice’s UUID has a higher site ID), uuid_3 comes before uuid_4. Final state: [H, i, !, ?].
Bob’s view: [H, i, ?]. Then he receives Alice’s insert. Same logic. Children of uuid_2 are [uuid_3, uuid_4]. Tiebreaker says uuid_3 before uuid_4. Final state: [H, i, !, ?].

They have converged! Both see the same result. No conflict. The simple algebraic rule of “compare UUIDs on concurrent inserts” guarantees convergence, no matter what order the messages arrive. The text is inherently consistent, a property of its data structure, not a product of a clever transformation algorithm.

This is the heart of the magic. The Latency, the order of arrival, the “who did what first”? It doesn’t matter. The data structure itself encodes a deterministic resolution.

Act V: The Real World: Practicalities, Trade-offs, and the Future

The mathematical story is beautiful, but the implementation is where the rubber meets the road. CRDTs are not a free lunch. They come with their own set of practical challenges.

1. Storage and Bandwidth (The Tombstone Problem) The biggest pain point is tombstones. In a collaborative document, every deleted character is not removed; it’s just marked. For a document that has been edited for years, the state can be huge. Imagine a 10-page document that has had 1000 pages of text deleted over its lifetime. The CRDT’s internal state must store all of that.

Solutions:

Garbage Collection (GC): This is the holy grail of CRDT research. How can you safely remove tombstones without breaking the potential for future inserts that might depend on them? The answer is complex and often requires a “cut” in time where all replicas have agreed on a checkpoint. For practical systems, periodic full-state snapshots are a common approach.
Compression: Modern libraries like Yjs use very efficient binary encoding and clever data structures to store the graph of operations. They don’t store literal characters, but delta-based operations against a base state.

2. Performance (The B-tree vs. Linked List Problem) A naive linked-list of UUIDs is terrible for performance. To find the character at position 1000, you have to walk through 1000 nodes. Modern CRDTs for text are not linked lists. They are built on top of more efficient data structures like Indexed Ropes (a tree structure that supports efficient insertion and lookup of a character by its logical index) or B-trees (balanced trees that support large-scale edits). Yjs, for instance, uses a B-tree to map from logical position to the underlying CRDT node, giving O(log n) performance for operations.

3. Offline and Local-First This is where CRDTs truly shine. Because the system converges based on algebraic properties, it does not need a central server. This is the foundation of the Local-First Software movement. You can build a collaborative app that works 100% offline on a phone, then syncs later when a connection is found. The CRDT ensures that the two devices will converge perfectly. This is a paradigm shift from cloud-dependent apps like Google Docs, which become useless without connectivity.

4. The Complexity of Undo/Redo and Copy-Paste These operations are deceptively hard in a CRDT world.

Undo: Undo is not just a “reverse of the last operation.” If you undo your last insertion, but Bob inserted a character in the middle of your insertion, what does “undo” even mean? Common approaches involve “undoing” by creating a new “delete” operation for the characters you inserted, which is itself a concurrent operation.
Copy-Paste: What does it mean to “copy” a range of text in a CRDT? The range is defined by the logical order of the characters, which is a function of the entire graph. When you paste, you’re inserting a new set of UUIDs based on the copied range’s UUIDs, which themselves reference a past state.

Act VI: OT vs. CRDTs: The Final Showdown

Feature	Operational Transformation (OT)	Conflict-Free Replicated Data Types (CRDTs)
Core Philosophy	Resolve conflicts after they occur via transformation	Design data so conflict is impossible
Mathematical Foundation	State machine, complex transformation logic	Join-semilattices, commutative/associative algebra
State Management	Centralized or tightly controlled multi-version	Fully decentralized, peer-to-peer capable
Correctness	Extremely hard to prove; edge-case ridden	Provably correct via algebraic properties
Complexity of Implementation	Very high; labyrinthine transformation functions	Moderate; complex data structure design
Offline Support	Difficult to implement correctly; requires complex merge strategies	Natural; inherent in the mathematical design
Performance (Text Editing)	Highly optimized (Google Docs) but stateful and server-bound	Increasingly efficient (Yjs, Automerge); good local performance
Tombstone Problem	Not applicable (operations are applied to state)	Major issue; must be managed (GC, compression)
Use Cases	Google Docs, collaborative office suites, controlled environments	Local-first apps, peer-to-peer networks, Figma (uses a custom CRDT), VS Code Live Share (uses a hybrid OT/CRDT approach)

Conclusion: The Science of Serendipity

The modern digital workplace is built on a compromise. We wanted the serendipity of collaboration—the sudden, beautiful synthesis of two minds working on the same canvas—without the chaos of conflict. Early systems traded freedom for correctness (locking). Later systems traded complexity for freedom (OT). CRDTs represent a third way: a fundamental redesign of the canvas itself.

The “relentless, almost invisible, magic” you feel when you type in a collaborative document is not magic. It’s the result of decades of computer science research distilled into a few hundred lines of elegant, monotonic mathematical code. It’s the blooming lotus of a join-semilattice, unfolding in real-time, ensuring that no matter how many writers touch a character, the final document is not a battlefield of conflicting wills, but a single, coherent, and mathematically inevitable truth.

The next time you see your colleague’s cursor moving in a Google Doc, remember: you’re not seeing a conflict being resolved. You’re seeing a conflict that was algebraically obviated at the very moment it was allowed to exist. And that is a deeper, more beautiful magic than any spell.