Performance
- Learned Indexes: When Models Replace B‑Trees
· 2025-10-04
A practitioner's guide to learned indexes: how they work, when they beat classic data structures, and what it takes to ship them without getting paged.
- The 100‑Microsecond Rule: Why Tail Latency Eats Your Throughput (and How to Fight Back)
· 2025-10-04
A field guide to taming P99 in modern systems—from queueing math to NIC interrupts, from hedged requests to adaptive concurrency. Practical patterns, pitfalls, and a blueprint you can apply this week.
- Tuning CUDA with the GPU Memory Hierarchy
· 2024-11-27
Global, shared, and register memory each have distinct latency and bandwidth. Performance comes from matching your access pattern to the level you are reading from.
- Teaching GraphQL to Cache at the Edge
· 2022-09-03
A deep dive into making GraphQL play nicely with edge caches without breaking declarative APIs.
- Cache‑Friendly Data Layouts: AoS vs. SoA (and the Hybrid In‑Between)
· 2021-03-18
How memory layout choices shape the performance of your hot loops. A practical guide to arrays‑of‑structs, struct‑of‑arrays, and hybrid layouts across CPUs and GPUs.
- Speculative Prefetchers: Designing Memory Systems That Read the Future
· 2019-02-14
A field guide to building and validating speculative memory prefetchers that anticipate demand in modern CPUs and data platforms.
- Computer Architecture: A Quantitative Approach (6th ed.)
- Computer Systems: A Programmer's Perspective (3rd ed.)
- Improving the Scalability and Performance of a Rails Application: A Case Study with Consul