Distributed-Systems

Inside Vector Databases: Building Retrieval-Augmented Systems that Scale · 2025-10-26
How modern vector databases ingest, index, and serve embeddings for production retrieval-augmented generation systems without falling over.
Distributed Systems: Consensus, Consistency, and Fault Tolerance · 2025-10-20
Fundamentals of distributed systems: failure models, consensus algorithms (Paxos, Raft), CAP theorem, consistency models, gossip, membership, CRDTs, and practical testing strategies like Jepsen.
The 100‑Microsecond Rule: Why Tail Latency Eats Your Throughput (and How to Fight Back) · 2025-10-04
A field guide to taming P99 in modern systems—from queueing math to NIC interrupts, from hedged requests to adaptive concurrency. Practical patterns, pitfalls, and a blueprint you can apply this week.
The Quiet Calculus of Probabilistic Commutativity · 2025-09-27
A practical calculus for quantifying when non-commutative operations in distributed systems can be safely executed without heavyweight coordination.
The Hidden Backbone of Parallelism: How Prefix Sums Power Distributed Computation · 2025-09-21
Discover how the humble prefix sum (scan) quietly powers GPUs, distributed clusters, and big data frameworks—an obscure but essential building block of parallel and distributed computation.
TCP Congestion Control: From Slow Start to BBR · 2023-02-11
A comprehensive exploration of TCP congestion control algorithms, from classic approaches like Tahoe and Reno to modern innovations like BBR. Learn how these algorithms balance throughput, fairness, and latency across diverse network conditions.
Timeouts, Retries, and Idempotency Keys: A Practical Guide · 2022-09-08
Make your distributed calls safe under partial failure. How to budget timeouts, avoid retry storms, and use idempotency keys without shooting yourself in the foot.
Designing CRDT-Powered Collaboration Platforms that Stay Consistent · 2022-08-17
A deep dive into how conflict-free replicated data types underpin real-time editors, whiteboards, and multiplayer apps without sacrificing UX.
Raft Fast‑Commit and PreVote in Practice · 2020-11-09
What fast‑commit and PreVote actually change in Raft, how they affect availability during leader changes, and where the footguns are.
Safe Rollback Strategies for Distributed Databases · 2020-11-08
A comprehensive guide to designing, executing, and validating rollbacks in distributed database environments without compromising data integrity or customer trust.
Consistent Hashing: Distributing Data Across Dynamic Clusters · 2020-03-28
A deep dive into consistent hashing, the elegant algorithm that enables scalable distributed systems. Learn how it works, why it matters for databases and caches, and explore modern variations like jump consistent hashing and rendezvous hashing.
Tuning the Dial: Adaptive Consistency at Planet Scale · 2020-03-11
Inside the engineering of databases that adjust consistency on the fly without breaking user trust.
When Data Centers Learned to Sleep: Energy-Aware Scheduling in Practice · 2019-07-19
An engineer’s chronicle of how hyperscale fleets embraced energy-aware scheduling without sacrificing latency or trust.