Implementing A Distributed Graph Processing System Using Pregel: Vertex Centric Bsp And Checkpointing

A comprehensive technical exploration of implementing a distributed graph processing system using pregel: vertex centric bsp and checkpointing, covering key concepts, practical implementations, and real-world applications.
Contents
The Unreasonable Effectiveness of Graphs (and the Brutal Reality of Scale)
We live in a world of connections. The social networks we scroll through, the recommendation engines that whisper “you might also like,” the web of documents that forms the internet, and the intricate networks of genes and proteins that define our biology—all are, at their core, graphs. They are the fundamental data structure for representing relationships, dependencies, and influence. A graph is not just a collection of nodes and edges; it is a lens through which we can understand the flow of information, the strength of communities, and the shortest path between two disparate ideas.
But here’s the uncomfortable truth: the graphs that matter are impossibly large. A social graph with billions of users and trillions of follow-relationships cannot be stored on a single machine, let alone processed by one. Think about a core operation like determining the influence of every user (PageRank) or detecting fraud rings (connected components). A naive, single-machine approach would take weeks, if not years, and would crash long before finishing.
This is the challenge of distributed graph processing. It is one of the most intellectually demanding and practically important problems in modern distributed systems. You must partition a massive, interconnected graph across dozens, hundreds, or even thousands of machines. Then, you must execute an iterative algorithm on this fragmented landscape, all while machines are failing, networks are lagging, and data is being shuffled across the cluster. A single straggler machine can hold your entire computation hostage. A single node failure can force you to restart a day-long job from scratch.
This is the problem that inspired Google’s engineers to create a radical new paradigm. Enter Pregel, a system that tamed the chaos of distributed graphs with a deceptively simple idea borrowed from parallel computing: the Bulk Synchronous Parallel (BSP) model. Pregel introduced a vertex-centric programming abstraction that made writing distributed graph algorithms feel almost like writing a single-machine program. It became the blueprint for a generation of graph processing systems. But to truly appreciate Pregel, we must first understand the beast it was built to conquer.
The Beast: Why Graph Processing at Scale is Hard
Graph algorithms are notoriously difficult to parallelize, and even more so to distribute. The core challenge lies in the nature of graphs themselves: they are inherently irregular. Unlike dense linear algebra or matrix operations where data can be neatly tiled and each element processed independently, graph computations often involve accessing the neighbors of a vertex, which are scattered across the graph. This leads to several fundamental problems:
1. Poor Locality
In a distributed system, the graph is partitioned across machines. When a vertex sends a message to its neighbor, that neighbor may live on a different machine. The cost of a remote network round trip is orders of magnitude higher than a local memory access. As the algorithm iterates, the communication pattern becomes unpredictable, leading to high network contention. For example, in a PageRank computation, every vertex sends its partial rank to all its out-neighbors. If those neighbors are spread across many machines, the shuffle traffic can saturate the network.
2. Load Imbalance
Graphs are power-law distributed: a small fraction of vertices (the “hubs”) have an enormous number of edges, while the vast majority have very few. In a naive partitioning, one machine might end up holding the vertex for a celebrity with millions of followers, while another holds low-degree users. That machine becomes a bottleneck—it has to process far more messages and updates than its peers. This is the straggler problem: the overall job is only as fast as the slowest machine.
3. Iterative Nature
Most graph algorithms are iterative: they repeat a local computation until convergence. In a distributed setting, each iteration involves a communication phase (all vertices send messages) and a computation phase (vertices update their state). The barrier synchronization between iterations means that every machine must finish its work before the next iteration can begin. If one machine is slow, all other machines sit idle waiting. This is especially painful when the algorithm requires hundreds of iterations to converge.
4. Fault Recovery
In a cluster of thousands of machines, failures are the norm, not the exception. A single machine crash during a multi-hour iterative job can corrupt the entire computation. Without careful checkpointing, you would have to restart from scratch. But checkpointing the entire graph state after each iteration is expensive—both in storage and I/O. The system must find a way to recover quickly without losing too much progress.
5. The Abstraction Gap
Before Pregel, engineers who wanted to run graph algorithms on a cluster had to use general-purpose distributed computing frameworks like MapReduce. But MapReduce is fundamentally limited for graph processing. Each iteration of an algorithm like PageRank would require multiple MapReduce jobs: one to emit the partial rank contributions, another to sum them, and a third to apply the damping factor. This not only wastes resources (each job writes to disk) but also forces the programmer to think in terms of key-value pairs rather than vertices and edges. The cognitive overhead is enormous.
These challenges motivated the development of a specialized system. Pregel’s genius was not in inventing new distributed computing primitives, but in providing a programming model that matched the way graph algorithms are naturally expressed, while hiding the complexity of distribution, fault tolerance, and communication.
The BSP Model: A Tried-and-True Foundation
At the heart of Pregel lies the Bulk Synchronous Parallel (BSP) model, originally proposed by Leslie Valiant in 1990. BSP is a parallel computing model that formalizes the execution of a program as a sequence of supersteps. Each superstep consists of three phases:
- Concurrent Computation: Each processor performs local computation on its own data. No communication occurs during this phase.
- Communication: Processors exchange data (messages) with each other. The communication is asynchronous within the superstep but must complete before the next phase.
- Barrier Synchronization: All processors wait until every processor has finished both computation and communication. Only then does the next superstep begin.
This model is appealing for distributed graph processing because it imposes structure on an otherwise chaotic process. By enforcing a global synchronization barrier, BSP eliminates many of the race conditions and non-determinism that plague asynchronous distributed algorithms. The programmer can think of the algorithm as a simple loop: in each superstep, each vertex processes incoming messages, updates its state, and sends messages to other vertices for the next superstep. The order of message delivery within a superstep is unspecified, but all messages sent in superstep S are guaranteed to be received at the start of superstep S+1.
BSP has a known cost model: the time per superstep is the sum of the maximum local computation time across all processors, the total communication volume (often measured in words transferred), and the number of synchronization steps. This allows developers to predict performance and identify bottlenecks.
Pregel adopted BSP but extended it with graph-specific features. In Pregel, each processor (called a worker) is responsible for a partition of the graph. The computation is expressed from the perspective of a single vertex. This is the vertex-centric programming model, which we will explore in detail.
Pregel: A System Overview
Pregel was designed and built at Google around 2009, as described in the seminal paper “Pregel: A System for Large-Scale Graph Processing” by Grzegorz Malewicz et al. (2010). It was used internally for many of Google’s graph processing tasks, such as PageRank, shortest paths, and clustering.
The system architecture is straightforward:
- Master: A single master node coordinates the job. It partitions the input graph, assigns partitions to workers, manages synchronization, and detects failures.
- Workers: Multiple worker machines each hold a portion of the graph’s vertices and edges. They execute the vertex compute function for every vertex in their partition during each superstep.
- Persistent Storage: The input graph is stored in a distributed file system (e.g., GFS or HDFS). The output (final vertex states) is written back to the same storage.
The master is responsible for:
- Reading the input graph and partitioning it (e.g., by hashing vertex IDs).
- Sending each worker its initial partition.
- Sending the “start superstep” signal to all workers.
- Waiting for all workers to report completion of a superstep.
- Broadcasting the “next superstep” signal.
- Handling worker failures by reassigning their partitions to live workers.
- Terminating the computation when all vertices vote to halt.
Workers are responsible for:
- Maintaining the local portion of the graph in memory (vertices, edges, messages).
- Executing the user-defined vertex compute function for every active vertex.
- Sending outgoing messages to other workers (via a network library).
- Keeping track of which vertices are active or inactive.
Pregel uses a master-slave architecture with a simple heartbeat mechanism for fault detection. If the master does not hear from a worker for a certain timeout, it marks that worker as failed and re-executes the incomplete supersteps on the remaining workers.
The Vertex-Centric Programming Model
The key innovation in Pregel is the vertex-centric programming model. The developer implements a single function that is called for every vertex in each superstep. This function has access to:
- The vertex’s current value (mutable state, e.g., the PageRank score).
- The set of outgoing edges (each edge has a target vertex ID and a mutable value).
- The messages received from other vertices in the previous superstep.
The function can:
- Modify the vertex’s state.
- Send messages to any vertex in the graph (not just neighbors) – but typically to neighbors via their edges.
- Modify outgoing edges (add/remove).
- Vote to halt – if a vertex decides it no longer needs to be active (converged), it votes to halt. The master considers the job done when all vertices have voted to halt and no messages are in transit.
Here is a pseudo-code definition of the vertex interface:
class Vertex:
# Input:
# vertex_id: unique identifier
# value: mutable state
# edges: list of outgoing edges (target_id, edge_value)
# messages: list of incoming messages from previous superstep
def compute(self, messages):
# Implement algorithm-specific logic
# Example: For PageRank, sum incoming contributions, apply damping, broadcast new rank.
pass
def send_message_to(self, target_id, message):
# Sends a message to another vertex (will be delivered in next superstep)
pass
def vote_to_halt(self):
# Signals that this vertex is done and can be ignored in future supersteps
pass
The beauty of this model is that the developer never thinks about distribution. They write the algorithm as if all vertices exist in a single global address space. The system handles the routing of messages, the load balancing via partitioning, and the synchronization barriers.
Two Fundamental Properties
Determinism: If the same input and same compute function are used, the output is deterministic (ignoring floating-point variations). This is because within a superstep, computation is local and message sending is queued; the order of vertex processing within a worker does not affect the messages sent (since messages are only aggregated at the end of the superstep). However, the order in which a vertex processes incoming messages is undefined, so algorithms should be designed to be commutative (e.g., sum of contributions) or handle unordered inputs.
Scalability: The system scales linearly with the number of vertices and edges, assuming the graph can be partitioned reasonably. In practice, the communication volume grows with the number of edges that cross machine boundaries (the “cut” size). Good partitioning minimizes this.
A Concrete Example: PageRank in Pregel
Let’s implement the classic PageRank algorithm in Pregel. PageRank assigns a numerical weight to each vertex representing its importance. The algorithm iterates until convergence:
- Initially, every vertex has rank = 1/N (N = total vertices).
- In each iteration, each vertex sends its current rank divided by its out-degree to all its neighbors.
- Each vertex sums the incoming ranks, applies the damping factor d (typically 0.85), and sets its new rank as (1-d)/N + d * sum(incoming).
- The algorithm stops when the total change across all vertices falls below a threshold.
In Pregel, we define the vertex compute function as follows:
def pagerank_compute(vertex, messages):
if superstep == 0:
# Initialization
vertex.value = 1.0 / total_vertices
# Send initial rank to neighbors
for edge in vertex.edges:
send_message_to(edge.target_id, vertex.value / len(vertex.edges))
else:
# Sum contributions from incoming messages
sum_contributions = sum(messages)
# Apply damping factor
new_rank = (1.0 - DAMPING) / total_vertices + DAMPING * sum_contributions
# Check for convergence (compare with previous value)
delta = abs(new_rank - vertex.value)
vertex.value = new_rank
if delta > THRESHOLD:
# Still changing: send new contributions
for edge in vertex.edges:
send_message_to(edge.target_id, vertex.value / len(vertex.edges))
# Stay active
else:
# Converged: vote to halt
vote_to_halt()
# Note: We must still send messages if we have outgoing edges?
# Actually, if we vote to halt, we won't be called again unless we receive a message.
# But for PageRank, if a vertex converges, it should still send its final rank to neighbors?
# The typical implementation sends one final update, then votes to halt.
# However, there's a nuance: if we stop sending, neighbors might converge prematurely.
# The standard approach: always send contributions as long as the vertex is active.
# Best practice: keep sending until no messages are sent at all (i.e., all vertices converge).
# If we didn't vote to halt, the vertex remains active for the next superstep.
A key detail: the vertex must send messages every superstep until it decides to halt. If it halts, it will not be called again unless it receives a message from another vertex (which would wake it up). So the convergence is cooperative: all vertices gradually stop sending messages, leading to global termination.
The master checks after each superstep whether all vertices have voted to halt and no messages are in transit. If so, the computation ends.
Complexity Analysis
- Each superstep: each vertex computes in O(1) time (assuming degree handling is O(degree)). So per superstep, time per worker is proportional to the number of vertices it holds.
- Communication: each active vertex sends a message to each neighbor. Total messages per superstep = number of edges if all vertices are active. In practice, the number of active vertices decreases.
- Number of iterations: typically 20-30 for PageRank to converge. With billions of vertices, the communication cost is the main bottleneck.
Pregel handles this by allowing combiners to reduce message volume, which we cover next.
Advanced Features: Combiners, Aggregators, and Mutation
Combiners
In many graph algorithms, the messages sent to a vertex can be combined without losing information. For example, in PageRank, messages are numeric contributions that need to be summed. If multiple vertices from the same source machine all send messages to the same target vertex, the system can sum them locally before sending them over the network. This reduces network traffic significantly.
Pregel allows users to define a combiner function that takes a list of messages destined for the same vertex and produces a single aggregated message. The combiner must be commutative and associative. The system applies the combiner on the sender side (before network transmission) and optionally on the receiver side (to combine multiple incoming messages from different senders).
For PageRank, the combiner is simply sum. For shortest paths, it’s min. For connected components, it’s min or max depending on the algorithm.
Combiners are optional but highly recommended. They can reduce communication by an order of magnitude in dense graphs.
Aggregators
Aggregators provide a mechanism for global computation. An aggregator is a value that is computed from all vertices’ contributions in a superstep and made available to all vertices in the next superstep. Examples:
- Summing the total residual error across all vertices to check global convergence.
- Counting the number of active vertices.
- Finding the maximum distance in a BFS.
Pregel supports two types of aggregators:
Per-superstep aggregator: Each vertex can contribute a value (e.g., its residual). The system aggregates (e.g., sums) all contributions and presents the result to each vertex in the next superstep. This allows vertices to see global properties, such as the total error.
Persistent aggregator: The value persists across supersteps, e.g., a counter of iterations.
Aggregators are powerful for implementing algorithms that need global information, like determining when to terminate or adapting algorithm parameters dynamically.
Mutation
Graphs are not static. Often algorithms need to add or remove vertices and edges dynamically. Pregel allows mutation of the graph structure during the computation. A vertex can:
- Add an outgoing edge (target_id, value).
- Remove an outgoing edge.
- Add a new vertex (with initial value).
- Remove itself (and its incident edges).
Mutations are not applied immediately; they are queued and applied at the end of the superstep. This ensures that the graph structure seen by compute() during a superstep is consistent. After the superstep, the graph changes are committed, and the new vertices/edges become visible in the next superstep.
This feature enables algorithms that grow the graph (e.g., crawling), or that prune irrelevant parts (e.g., removing dead ends in PageRank).
Fault Tolerance in Pregel
Fault tolerance is critical for long-running jobs. Pregel uses a checkpointing approach based on the barrier synchronization.
How Checkpointing Works
At the beginning of a superstep (or at a user-specified interval), the master initiates a checkpoint. Each worker saves the state of its vertices and edges to persistent storage (e.g., GFS). The master records that a checkpoint has been taken after superstep S.
If a worker fails during superstep S, the master detects it via heartbeat timeout. It then reassigns the failed worker’s partition to one or more live workers. Those workers load the most recent checkpoint (which is after superstep S-1) and replay the computation from that point. Since all messages sent in superstep S are lost when the failed worker went down, the master also instructs all other workers to replay superstep S from the checkpoint. This ensures that the computation is deterministic: every worker re-executes the same superstep S with the same input, regenerating the lost messages.
Key benefits:
- Only the most recent superstep needs to be replayed, not the entire job.
- The checkpoint overhead is proportional to the graph size, but it can be amortized over many iterations (e.g., checkpoint every 10 supersteps).
Semantic Guarantees
Pregel provides exactly-once message delivery semantics from the perspective of a superstep. Because of checkpointing and replay, every message sent in superstep S is received exactly once at the start of superstep S+1, even in the presence of failures (as long as the system has enough replicas to recover the lost state).
This is a strong guarantee that simplifies algorithm correctness: the developer does not need to worry about duplicates or lost messages.
Handling Master Failure
The master itself can be made fault-tolerant using standard techniques (e.g., ZooKeeper leader election). Since the master’s state (which superstep, which partitions assigned to which workers) is relatively small, it can be replicated. If the master fails, a new master is elected and restores the state from the checkpoint.
Limitations of Checkpointing
The main drawback is the cost of writing the full graph state to disk. For graphs with billions of vertices, checkpointing can take minutes. In practice, engineers choose a checkpoint interval that balances recovery time with overhead. For extremely large graphs, incremental checkpointing or lineage-based recovery (like Spark’s RDDs) might be more efficient, but Pregel’s approach was sufficient for Google’s workloads.
Performance and Optimizations
Graph Partitioning
The performance of Pregel depends heavily on the quality of graph partitioning. The initial partition is typically done by hashing the vertex ID. This is simple but often leads to many cross-partition edges (high cut). A better approach is to use graph partitioning algorithms (e.g., METIS, ParMETIS) that try to minimize the number of edges crossing partitions, while balancing the number of vertices per partition. However, these algorithms are themselves expensive and may not be feasible for trillion-edge graphs.
Google used a technique called edge-cut partitioning: they assign vertex IDs to partitions such that the number of edges between partitions is minimized. For some use cases, they also used vertex-cut partitioning (where edges are assigned to machines, and vertices may be replicated), but that is more characteristic of later systems like PowerGraph.
Straggler Mitigation
Stragglers—workers that are significantly slower than others—are a major problem. Pregel uses a few techniques:
- Speculative execution: If a worker is taking too long, the master can start a backup copy of the slow worker’s partition on another worker. The first to finish is used, and the other is killed. This is similar to MapReduce’s speculative execution.
- Dynamic load balancing: Within a worker, it can redistribute vertices if it detects that some vertices (high degree) are taking disproportionate time. However, Pregel’s initial design did not include dynamic rebalancing; it left that to the partitioning step.
Message Batching
To reduce network overhead, Pregel batches messages before sending. A worker collects all messages destined for a particular remote worker and sends them in a single network packet. This reduces TCP connection overhead and improves throughput.
Combining and Aggregating
As mentioned, combiners can drastically reduce the number of messages. For PageRank, the number of messages sent over the network can be reduced by a factor equal to the average in-degree (since many incoming contributions are summed per vertex).
Memory Management
Vertices and edges are kept in memory. For graphs larger than available RAM, Pregel must spill to disk, which is very slow. Google’s clusters were large enough to keep their graphs in memory. This is a key limitation: Pregel assumes the graph fits in the aggregate memory of the cluster. For graphs that are truly massive (e.g., exceeding total cluster memory), alternative approaches (e.g., out-of-core or streaming) are needed.
Comparison with Other Systems
Pregel was the first major system to popularize the vertex-centric BSP model, but many successors have refined the approach.
Apache Giraph
Giraph is the open-source implementation of Pregel under the Apache Software Foundation, originally developed by Yahoo. It runs on top of Hadoop and uses ZooKeeper for coordination. Giraph is essentially identical to Pregel in its architecture, but with some modern improvements:
- It can read input from HDFS (any format) and write output to HDFS.
- It supports worker-to-worker communication via Netty (a high-performance NIO library).
- It includes built-in combiners and aggregators.
- It has been used for massive graph processing at Facebook and other companies.
Giraph inherits Pregel’s limitations (memory requirement, BSP overhead) but has been optimized with edge partitioning and other features.
Apache Spark GraphX
GraphX is a distributed graph processing library built on top of Spark’s RDD abstraction. It uses a vertex-cut partitioning strategy and a property graph model (vertices and edges have arbitrary attributes). GraphX does not strictly follow the BSP model; instead, it implements graph operations using Spark’s join and aggregation transformations, which are more flexible but can incur overhead due to shuffle.
GraphX advantages:
- Seamless integration with other Spark components (MLlib, SQL, streaming).
- Resilient to failures via RDD lineage (no checkpointing needed).
- Supports both vertex-centric (Pregel API) and graph-parallel (triangle counting, connected components) operations.
Disadvantages:
- Higher latency per iteration due to Spark’s lineage tracking and potential spilling to disk.
- Not as pure in the BSP model; the Pregel API is emulated, which can be slower than native Pregel implementations.
PowerGraph (GraphLab)
PowerGraph (now part of GraphLab / Dato / Turi) introduced a GAS (Gather-Apply-Scatter) model that is more expressive than Pregel for algorithms with high-degree vertices. It uses vertex-cut partitioning (splitting a high-degree vertex across multiple machines) to avoid the “celebrity vertex” bottleneck. This allows algorithms like Alternating Least Squares (ALS) for matrix factorization to run efficiently, which are hard to express in Pregel.
PowerGraph also supports asynchronous execution, avoiding global barriers when possible. This can lead to faster convergence for some algorithms.
Comparison Table
| Feature | Pregel/Giraph | GraphX | PowerGraph |
|---|---|---|---|
| Model | Vertex-centric BSP | Vertex-centric + relational | GAS (vertex-cut) |
| Partitioning | Edge-cut | Edge-cut + vertex-cut | Vertex-cut |
| Consistency | Bulk synchronous (barrier) | Bulk synchronous (Spark) | Asynchronous possible |
| Fault tolerance | Checkpointing | RDD lineage | Checkpointing + lineage |
| Memory requirement | All graph in memory | All graph in memory | All graph in memory |
| High-degree handling | Poor (stragglers) | Moderate (skewed joins) | Good (distributes vertex) |
| Typical use | PageRank, SSSP, Connected Comp | General graph + ML | ML, recommendation, graph mining |
Real-World Applications of Pregel
Pregel has been used for a wide variety of practical problems at Google:
- PageRank and Web Graph Analysis: Computing importance scores for billions of web pages to power Google’s search ranking.
- Shortest Path Computation: For Google Maps, computing the fastest routes across the entire road network.
- Connected Components: Identifying clusters of related pages (e.g., for detecting spam farms).
- Semi-Clustering: Finding communities in social graphs for friend recommendations.
- Social Graph Mining: Analyzing the structure of Google+ (now defunct) to detect influence and engagement.
Outside of Google, companies like Facebook use Giraph to compute the “Friends of Friends” recommendations, LinkedIn uses it for “People You May Know”, and Twitter has used it for trending topics and graph analytics.
Limitations and Lessons Learned
While Pregel was groundbreaking, it is not a silver bullet. Its limitations drove the development of later systems:
- Memory Bound: The entire graph must fit in the aggregate RAM of the cluster. For graphs larger than that, Pregel becomes unusable. This motivates the need for out-of-core or streaming systems.
- Barrier Synchronization: The global barrier can be wasteful for algorithms that converge at different rates or where some parts of the graph stabilize quickly. Asynchronous systems (like PowerGraph’s asynchronous mode) can be faster.
- High-Degree Vertices: Pregel is inefficient for graphs with power-law degree distributions. A single vertex with billions of edges can overload a single worker. Vertex-cut partitioning mitigates this.
- Complexity of Real Graphs: Real graphs are dynamic; edges are added and removed constantly. Pregel’s mutation feature is limited and expensive for incremental updates. Streaming graph processing systems (e.g., Naiad, Flink) are better suited for continuous updates.
- Expressiveness: Some algorithms (e.g., collaborative filtering via matrix factorization) are hard to express as vertex-centric computations. The GAS model is more expressive.
Despite these limitations, Pregel’s influence is undeniable. It showed that a simple abstraction could make distributed graph processing accessible and efficient for a wide class of problems.
The Future of Graph Processing
Where do we go from here? Modern graph processing systems are pushing the boundaries in several directions:
- GPU-Accelerated Graph Processing: Using GPUs to massively parallelize vertex computation, though memory and bandwidth constraints remain.
- Shared-Nothing vs. Shared-Memory: Hybrid approaches that use RDMA (Remote Direct Memory Access) to build shared-memory abstractions over distributed clusters, as in systems like Grappa.
- Out-of-Core Processing: Systems like GraphChi and TurboGraph process graphs from disk, enabling trillion-edge graphs on a single machine.
- Streaming and Dynamic Graphs: Frameworks like Timely Dataflow (Naiad) support iterative and streaming graph computations with low latency.
- Graph Neural Networks: A new wave of graph processing is driven by deep learning on graphs, requiring specialized systems (e.g., PyTorch Geometric, DGL) that combine message passing with automatic differentiation.
Pregel planted the seed. Now the garden is full of diverse, powerful trees.
Conclusion
Pregel was not the first distributed graph processing system, nor the last. But it was the one that got the abstraction right. By embracing the BSP model and exposing a vertex-centric API, Pregel allowed developers to write graph algorithms that were both elegant and scalable. It gracefully handled failures, load imbalance, and network communication, hiding the gritty reality of distributed systems behind a clean interface.
The “unreasonable effectiveness of graphs” remains true today. As our world becomes more connected—through IoT, social media, biological networks, and knowledge graphs—the need for efficient, scalable graph processing will only grow. Pregel showed us that the path forward is not to build more complex systems, but to find the right abstractions that simplify complexity.
And that is the real lesson of Pregel: sometimes the most powerful tool is not the fastest engine, but the one that makes the impossible problem feel almost trivial.