Distributed Systems

Content Delivery Networks: DNS-Based Routing, Anycast, Edge Caching, and the Economics of CDN Peering · 2026-03-04
How Akamai, Cloudflare, and Fastly keep the web fast — DNS-based request routing, anycast IPs, consistent hashing at the edge, and the business of CDN peering that makes it all economically viable.
Edge Computing: The Fog/Mist/Cloud Continuum, K3s, and the Computation Offloading Decision · 2026-02-16
How the edge-cloud continuum reshapes where computation happens — from K3s and MicroK8s for edge-native Kubernetes to 5G MEC, the offloading decision problem, and why latency and bandwidth dictate architecture at the edge.
Geo-Distributed Systems: WAN Latency, Multi-Leader Replication, and the Speed-of-Light Constraint · 2026-02-13
How the speed of light shapes the architecture of global-scale systems — from multi-leader and leaderless replication to CRDTs, Spanner, CockroachDB, and the fundamental tension between consistency and latency.
P2P Networks: BitTorrent's Incentives, IPFS's Merkle DAGs, and the Decentralized Web Vision · 2026-01-18
From tit-for-tat choking algorithms to content-addressed Merkle DAGs — how BitTorrent and IPFS engineered the two most successful decentralized protocols in internet history.
Distributed Hash Tables: Chord, Pastry, Kademlia, and the Structured Overlay Revolution · 2025-12-08
How consistent hashing, finger tables, prefix-based routing, and the XOR metric turned P2P networks from unscalable floods into efficient, provably correct structured overlays.
Epidemic Protocols: Gossip, HyParView, Plumtree, and the Mathematics of Infection-Style Dissemination · 2025-12-08
How push, push-pull, and pull gossip propagate information with tunable reliability guarantees — plus HyParView for membership and Plumtree for efficient broadcast in large-scale dynamic networks.
Distributed Snapshots: The Chandy-Lamport Algorithm, Lai-Yang, and the Foundations of Consistent Global State · 2025-10-31
How do you capture a consistent snapshot of a running distributed system without stopping the world? The Chandy-Lamport algorithm, its non-FIFO extension by Lai and Yang, and the deep connection to checkpointing and deadlock detection.
Inside Vector Databases: Building Retrieval-Augmented Systems that Scale · 2025-10-26
How modern vector databases ingest, index, and serve embeddings for production retrieval-augmented generation systems without falling over.
Clock Synchronization: Lamport Clocks, Vector Clocks, Hybrid Logical Clocks, and the CRDT Connection · 2025-10-08
From scalar Lamport clocks that are consistent with causality to vector clocks that characterize it precisely, through hybrid logical clocks that bridge physical and logical time: the intellectual lineage of distributed timekeeping.
Time in Distributed Systems: NTP, PTP, TrueTime, and the Impossibility of Perfect Synchronization · 2025-10-01
From Marzullo's algorithm in NTP to hardware timestamping in PTP and Google's TrueTime in Spanner — how distributed systems wrestle with the fundamental impossibility of perfectly synchronized clocks.
GPUDirect Storage in 2025: Optimizing the End-to-End Data Path · 2025-09-16
How modern systems move data from NVMe and object storage into GPU kernels with minimal CPU overhead and maximal throughput.
Algebraic Topology in Distributed Computing: Wait-Free Solvability and Simplicial Complexes · 2025-07-06
Discover how algebraic topology (simplicial complexes, Sperner's lemma, and homology) provides the deepest known framework for understanding which concurrent and distributed tasks are fundamentally solvable, as pioneered in Herlihy and Shavit's work on the topological structure of asynchronous computability.
From MapReduce to Spark: The Arc of Data-Parallel Systems · 2025-05-19
MapReduce taught fault-tolerant batch processing at scale; Spark generalized it with resilient distributed datasets (RDDs) and DAG scheduling.
Linearizability and Serializability: A Formal Hierarchy of Consistency Models · 2025-01-28
Build a rigorous understanding of consistency models from linearizability to eventual consistency, with formal definitions, counterexamples, and the practical implications for distributed database design.
Exactly-Once in Streaming: What It Means and How Systems Achieve It · 2025-01-22
Disentangle marketing from mechanisms: idempotence, transactions, and state snapshots behind ‘exactly-once’.
The FLP Impossibility Result: Why Distributed Consensus Is Fundamentally Hard · 2025-01-15
Explore the landmark Fischer-Lynch-Paterson result that proved no deterministic algorithm can achieve consensus in an asynchronous system with even one faulty process — and how the field evolved around this impossibility.
Latency-Aware Edge Inference Platforms: Engineering Consistent AI Experiences · 2023-03-12
A full-stack guide to designing, deploying, and operating low-latency edge inference systems that stay predictable under real-world constraints.
Designing CRDT-Powered Collaboration Platforms that Stay Consistent · 2022-08-17
Deep dive into how conflict-free replicated data types underpin real-time editors, whiteboards, and multiplayer apps without sacrificing UX.
State Machine Replication: Viewstamped Replication Protocol, Zab (ZooKeeper Atomic Broadcast), and the Consensus-Scalability Continuum · 2021-07-27
A deep exploration of state machine replication — how Viewstamped Replication and Zab enable fault-tolerant services through ordered command execution, and how the consensus-scalability continuum shapes modern distributed systems design.