Performance
- CPU Microarchitecture: Pipelines, Out-of-Order Execution, and Modern Performance
· 2025-12-04
An in-depth exploration of CPU microarchitecture: instruction pipelines, hazards, branch prediction, out-of-order execution, register renaming, superscalar and SIMD units, and how software maps to hardware for performance.
- Learned Indexes: When Models Replace B‑Trees
· 2025-10-04
A practitioner's guide to learned indexes: how they work, when they beat classic data structures, and what it takes to ship them without getting paged.
- The 100‑Microsecond Rule: Why Tail Latency Eats Your Throughput (and How to Fight Back)
· 2025-10-04
A field guide to taming P99 in modern systems—from queueing math to NIC interrupts, from hedged requests to adaptive concurrency. Practical patterns, pitfalls, and a blueprint you can apply this week.
- Queueing Theory for Systems Engineers: From M/M/1 to Heavy-Tail Distributions and Tail-at-Scale
· 2025-07-18
Master queueing theory as a practical tool for systems design: the M/M/1 model, Little's Law, Jackson networks, the dramatic impact of heavy-tailed service times on tail latency, and how to apply these insights to load balancers, microservices, and capacity planning.
- Kernel Bypass Networking: DPDK, io_uring, and the RDMA Revolution
· 2025-02-10
Dive into how modern systems escape the kernel networking stack for microsecond-scale performance: DPDK's poll-mode drivers, io_uring's submission rings, RDMA's one-sided operations, and the trade-offs each approach demands.
- Tuning CUDA with the GPU Memory Hierarchy
· 2024-11-27
Global, shared, and register memory each have distinct latency and bandwidth. Performance comes from matching your access pattern to each level.
- Lock-Free Data Structures: Concurrency Without the Wait
· 2024-07-18
Explore how lock-free algorithms achieve thread-safe data access without traditional locks. Learn the theory behind compare-and-swap, the ABA problem, memory ordering, and practical implementations that power high-performance systems.
- Memory Allocators: From malloc to Modern Arena Allocators
· 2023-09-14
A deep dive into memory allocation strategies, from the classic malloc implementations to modern arena allocators, jemalloc, tcmalloc, and custom allocators that power high-performance systems.
- TCP Congestion Control: From Slow Start to BBR
· 2023-02-11
A comprehensive exploration of TCP congestion control algorithms, from classic approaches like Tahoe and Reno to modern innovations like BBR. Learn how these algorithms balance throughput, fairness, and latency across diverse network conditions.
- Garbage Collection Algorithms: From Mark-and-Sweep to ZGC
· 2022-11-22
A comprehensive exploration of garbage collection algorithms, from classic mark-and-sweep to modern concurrent collectors like G1, Shenandoah, and ZGC. Learn how automatic memory management works and the trade-offs that shape collector design.
- Teaching GraphQL to Cache at the Edge
· 2022-09-03
A deep dive into making GraphQL play nicely with edge caches without breaking declarative APIs.
- CPU Caches and Cache Coherence: The Memory Hierarchy That Makes Modern Computing Fast
· 2022-07-12
A comprehensive exploration of how CPU caches bridge the processor-memory speed gap. Learn about cache architecture, replacement policies, coherence protocols, and how to write cache-friendly code for maximum performance.
- Virtual Memory and Page Tables: How Modern Systems Manage Memory
· 2022-05-19
A comprehensive exploration of virtual memory, page tables, and address translation. Learn how operating systems provide memory isolation, enable overcommitment, and optimize performance with TLBs and huge pages.
- Branch Prediction and Speculative Execution: How Modern CPUs Gamble on the Future
· 2021-08-15
Explore how modern processors predict branch outcomes and execute instructions speculatively, the algorithms behind branch predictors, the performance implications for your code, and the security vulnerabilities like Spectre that emerged from these optimizations.
- B-Trees and LSM-Trees: The Foundations of Modern Storage Engines
· 2021-07-14
An in-depth exploration of B-Trees and LSM-Trees, the two dominant data structures powering databases from PostgreSQL to RocksDB. Learn their trade-offs, internal mechanics, and when to choose each for your workload.
- CPU Caches and Memory Hierarchy: The Hidden Architecture Behind Performance
· 2021-06-22
A deep exploration of CPU cache architecture, from L1 to L3 caches, cache lines, associativity, replacement policies, and cache coherence. Learn how memory hierarchy shapes modern software performance.
- NVMe and the Storage Stack: The NVMe Command Set, Submission/Completion Queues, SPDK, and the Death of the SCSI/SATA Bottleneck
· 2021-05-31
A deep exploration of NVMe technology — how the command set and queue model eliminate the SCSI bottleneck, and why user-space storage via SPDK achieves microsecond-latency I/O on commodity flash.
- System Calls: The Gateway Between User Space and Kernel
· 2021-04-18
An in-depth exploration of how applications communicate with the operating system kernel through system calls. Learn about the syscall interface, context switching, and how modern OSes balance security with performance.
- Cache‑Friendly Data Layouts: AoS vs. SoA (and the Hybrid In‑Between)
· 2021-03-18
How memory layout choices shape the performance of your hot loops. A practical guide to arrays‑of‑structs, struct‑of‑arrays, and hybrid layouts across CPUs and GPUs.
- Compiler Optimizations: From Source Code to Fast Machine Code
· 2020-09-23
A deep dive into how modern compilers transform your code into efficient machine code. Explore optimization passes from constant folding to loop vectorization, and learn how to write code that compilers can optimize effectively.
- Unikernels: Specializing the OS for a Single Application, from MirageOS to IncludeOS and the Performance-Security Trade-offs
· 2020-05-26
A deep exploration of unikernel architecture — how compiling an application directly into a specialized operating system kernel produces dramatic performance and security benefits while challenging decades of OS design orthodoxy.
- Speculative Prefetchers: Designing Memory Systems That Read the Future
· 2019-02-14
A field guide to building and validating speculative memory prefetchers that anticipate demand in modern CPUs and data platforms.
- Computer Architecture: A Quantitative Approach (6th ed.)