Distributed-Systems
- The 100‑Microsecond Rule: Why Tail Latency Eats Your Throughput (and How to Fight Back)
· 2025-10-04
A field guide to taming P99 in modern systems—from queueing math to NIC interrupts, from hedged requests to adaptive concurrency. Practical patterns, pitfalls, and a blueprint you can apply this week.
- The Quiet Calculus of Probabilistic Commutativity
· 2025-09-27
A practical calculus for quantifying when non-commutative operations in distributed systems can be safely executed without heavyweight coordination.
- The Hidden Backbone of Parallelism: How Prefix Sums Power Distributed Computation
· 2025-09-21
Discover how the humble prefix sum (scan) quietly powers GPUs, distributed clusters, and big data frameworks—an obscure but essential building block of parallel and distributed computation.
- Timeouts, Retries, and Idempotency Keys: A Practical Guide
· 2022-09-08
Make your distributed calls safe under partial failure. How to budget timeouts, avoid retry storms, and use idempotency keys without shooting yourself in the foot.
- Raft Fast‑Commit and PreVote in Practice
· 2020-11-09
What fast‑commit and PreVote actually change in Raft, how they affect availability during leader changes, and where the footguns are.
- Tuning the Dial: Adaptive Consistency at Planet Scale
· 2020-03-11
Inside the engineering of databases that adjust consistency on the fly without breaking user trust.
- When Data Centers Learned to Sleep: Energy-Aware Scheduling in Practice
· 2019-07-19
An engineer’s chronicle of how hyperscale fleets embraced energy-aware scheduling without sacrificing latency or trust.