Performance

CPU Microarchitecture: Pipelines, Out-of-Order Execution, and Modern Performance · 2025-12-04
An in-depth exploration of CPU microarchitecture: instruction pipelines, hazards, branch prediction, out-of-order execution, register renaming, superscalar and SIMD units, and how software maps to hardware for performance.
Learned Indexes: When Models Replace B‑Trees · 2025-10-04
A practitioner's guide to learned indexes: how they work, when they beat classic data structures, and what it takes to ship them without getting paged.
The 100‑Microsecond Rule: Why Tail Latency Eats Your Throughput (and How to Fight Back) · 2025-10-04
A field guide to taming P99 in modern systems—from queueing math to NIC interrupts, from hedged requests to adaptive concurrency. Practical patterns, pitfalls, and a blueprint you can apply this week.
Tuning CUDA with the GPU Memory Hierarchy · 2024-11-27
Global, shared, and register memory each have distinct latency and bandwidth. Performance comes from the right access pattern.
Lock-Free Data Structures: Concurrency Without the Wait · 2024-07-18
Explore how lock-free algorithms achieve thread-safe data access without traditional locks. Learn the theory behind compare-and-swap, the ABA problem, memory ordering, and practical implementations that power high-performance systems.
Memory Allocators: From malloc to Modern Arena Allocators · 2023-09-14
A deep dive into memory allocation strategies, from the classic malloc implementations to modern arena allocators, jemalloc, tcmalloc, and custom allocators that power high-performance systems.
TCP Congestion Control: From Slow Start to BBR · 2023-02-11
A comprehensive exploration of TCP congestion control algorithms, from classic approaches like Tahoe and Reno to modern innovations like BBR. Learn how these algorithms balance throughput, fairness, and latency across diverse network conditions.
Garbage Collection Algorithms: From Mark-and-Sweep to ZGC · 2022-11-22
A comprehensive exploration of garbage collection algorithms, from classic mark-and-sweep to modern concurrent collectors like G1, Shenandoah, and ZGC. Learn how automatic memory management works and the trade-offs that shape collector design.
Teaching GraphQL to Cache at the Edge · 2022-09-03
A deep dive into making GraphQL play nicely with edge caches without breaking declarative APIs.
CPU Caches and Cache Coherence: The Memory Hierarchy That Makes Modern Computing Fast · 2022-07-12
A comprehensive exploration of how CPU caches bridge the processor-memory speed gap. Learn about cache architecture, replacement policies, coherence protocols, and how to write cache-friendly code for maximum performance.
Virtual Memory and Page Tables: How Modern Systems Manage Memory · 2022-05-19
A comprehensive exploration of virtual memory, page tables, and address translation. Learn how operating systems provide memory isolation, enable overcommitment, and optimize performance with TLBs and huge pages.
Branch Prediction and Speculative Execution: How Modern CPUs Gamble on the Future · 2021-08-15
Explore how modern processors predict branch outcomes and execute instructions speculatively, the algorithms behind branch predictors, the performance implications for your code, and the security vulnerabilities like Spectre that emerged from these optimizations.
B-Trees and LSM-Trees: The Foundations of Modern Storage Engines · 2021-07-14
An in-depth exploration of B-Trees and LSM-Trees, the two dominant data structures powering databases from PostgreSQL to RocksDB. Learn their trade-offs, internal mechanics, and when to choose each for your workload.
CPU Caches and Memory Hierarchy: The Hidden Architecture Behind Performance · 2021-06-22
A deep exploration of CPU cache architecture, from L1 to L3 caches, cache lines, associativity, replacement policies, and cache coherence. Learn how memory hierarchy shapes modern software performance.
System Calls: The Gateway Between User Space and Kernel · 2021-04-18
An in-depth exploration of how applications communicate with the operating system kernel through system calls. Learn about the syscall interface, context switching, and how modern OSes balance security with performance.
Cache‑Friendly Data Layouts: AoS vs. SoA (and the Hybrid In‑Between) · 2021-03-18
How memory layout choices shape the performance of your hot loops. A practical guide to arrays‑of‑structs, struct‑of‑arrays, and hybrid layouts across CPUs and GPUs.
Compiler Optimizations: From Source Code to Fast Machine Code · 2020-09-23
A deep dive into how modern compilers transform your code into efficient machine code. Explore optimization passes from constant folding to loop vectorization, and learn how to write code that compilers can optimize effectively.
Speculative Prefetchers: Designing Memory Systems That Read the Future · 2019-02-14
A field guide to building and validating speculative memory prefetchers that anticipate demand in modern CPUs and data platforms.
Computer Architecture: A Quantitative Approach (6th ed.)
Computer Systems: A Programmer's Perspective (3rd ed.)
Improving the Scalability and Performance of a Rails Application: A Case Study with Consul