Blog
Posts, notes, and articles.

Inside Vector Databases: Building Retrieval-Augmented Systems that Scale
2025-10-26 How modern vector databases ingest, index, and serve embeddings for production retrieval-augmented generation systems without falling over.

Learned Indexes: When Models Replace B‑Trees
2025-10-04 A practitioner's guide to learned indexes: how they work, when they beat classic data structures, and what it takes to ship them without getting paged.

The 100‑Microsecond Rule: Why Tail Latency Eats Your Throughput (and How to Fight Back)
2025-10-04 A field guide to taming P99 in modern systems, from queueing math to NIC interrupts and from hedged requests to adaptive concurrency. Practical patterns, pitfalls, and a blueprint you can apply this week.

The Quiet Calculus of Probabilistic Commutativity
2025-09-27 A practical calculus for quantifying when non-commutative operations in distributed systems can be safely executed without heavyweight coordination.

The Hidden Backbone of Parallelism: How Prefix Sums Power Distributed Computation
2025-09-21 Discover how the humble prefix sum (scan) quietly powers GPUs, distributed clusters, and big data frameworks: an obscure but essential building block of parallel and distributed computation.

GPUDirect Storage in 2025: Optimizing the End-to-End Data Path
2025-09-16 How modern systems move data from NVMe and object storage into GPU kernels with minimal CPU overhead and maximal throughput.

MPI vs. OpenMP in 2025: Where Each Wins
2025-07-04 A practical guide to choosing between message passing and shared-memory parallelism for modern HPC and hybrid nodes.

From MapReduce to Spark: The Arc of Data-Parallel Systems
2025-05-19 MapReduce taught fault-tolerant batch at scale; Spark generalized it with resilient distributed datasets (RDDs) and DAG scheduling.

Auditing the Algorithm: Building a Responsible AI Pipeline That Scales
2025-04-05 How we operationalized responsible AI with automated audits, governance rituals, and transparent reporting.

Scheduling: Trading Latency for Throughput (and Back Again)
2025-02-12 Queue disciplines, work stealing, and CPU affinity: how scheduler choices shape p50/p99, and when to bias for one over the other.

Exactly-Once in Streaming: What It Means and How Systems Achieve It
2025-01-22 Disentangle marketing from mechanisms: idempotence, transactions, and state snapshots behind ‘exactly-once’.

Tuning CUDA with the GPU Memory Hierarchy
2024-11-27 Global, shared, and register memory each have distinct latency and bandwidth. Performance comes from the right access pattern.

Seeing in the Dark: Observability for Edge AI Fleets
2024-08-16 A practitioner's guide to instrumenting, monitoring, and debugging machine learning models running at the edge.

Adaptive Feature Flag Frameworks for Hyper-Growth SaaS
2024-08-15 A comprehensive field guide to building resilient, data-driven feature flag platforms that keep hyper-growth SaaS releases safe, fast, and customer-centric.

Amdahl’s Law vs. Gustafson’s Law: What They Really Predict
2024-06-15 When does parallelism pay off? Compare Amdahl’s and Gustafson’s models, see where each applies, and learn how to reason about speedups in practice.

Countdown to Quantum: Migrating an Enterprise to Post-Quantum Cryptography
2024-01-29 Practical lessons from a multi-year effort to adopt quantum-safe cryptography without breaking production.

Sealing the Supply Chain: Zero-Trust Build Pipelines That Scale
2023-10-08 An engineer’s map for rebuilding the software supply chain around zero-trust principles without stopping delivery.

Reverse Indexing and Inverted Files: How Search Engines Fly
2023-07-19 Tokenization, postings lists, skip pointers, and WAND: a tour of the data structures that make full‑text search fast.

Latency-Aware Edge Inference Platforms: Engineering Consistent AI Experiences
2023-03-12 A full-stack guide to designing, deploying, and operating low-latency edge inference systems that stay predictable under real-world constraints.

Keeping the Model Awake: Building a Self-Healing ML Inference Platform
2023-02-14 A field report on taming production machine learning inference with proactive healing, adaptive scaling, and human empathy.

Timeouts, Retries, and Idempotency Keys: A Practical Guide
2022-09-08 Make your distributed calls safe under partial failure. How to budget timeouts, avoid retry storms, and use idempotency keys without shooting yourself in the foot.

Teaching GraphQL to Cache at the Edge
2022-09-03 A deep dive into making GraphQL play nicely with edge caches without breaking declarative APIs.

Designing CRDT-Powered Collaboration Platforms that Stay Consistent
2022-08-17 A deep dive into how conflict-free replicated data types underpin real-time editors, whiteboards, and multiplayer apps without sacrificing UX.

Instrumenting Without Spying: Privacy-Preserving Telemetry at Scale
2021-05-27 How we rebuilt our telemetry pipeline to respect user privacy without sacrificing insight.