Performance

GPUDirect Storage in 2025: Optimizing the End-to-End Data Path · 2025-09-16
How modern systems move data from NVMe and object storage into GPU kernels with minimal CPU overhead and maximal throughput.
Lock-Free Data Structures: Concurrency Without the Wait · 2024-07-18
Explore how lock-free algorithms achieve thread-safe data access without traditional locks. Learn the theory behind compare-and-swap, the ABA problem, and memory ordering, and see the practical implementations that power high-performance systems.
Amdahl’s Law vs. Gustafson’s Law: What They Really Predict · 2024-06-15
When does parallelism pay off? Compare Amdahl’s and Gustafson’s models, see where each applies, and learn how to reason about speedups in practice.
Memory Allocators: From malloc to Modern Arena Allocators · 2023-09-14
A deep dive into memory allocation strategies, from classic malloc implementations to jemalloc, tcmalloc, modern arena allocators, and the custom allocators that power high-performance systems.
Garbage Collection Algorithms: From Mark-and-Sweep to ZGC · 2022-11-22
A comprehensive exploration of garbage collection algorithms, from classic mark-and-sweep to modern concurrent collectors like G1, Shenandoah, and ZGC. Learn how automatic memory management works and the trade-offs that shape collector design.
Branch Prediction and Speculative Execution: How Modern CPUs Gamble on the Future · 2021-08-15
Explore how modern processors predict branch outcomes and execute instructions speculatively: the algorithms behind branch predictors, the performance implications for your code, and the security vulnerabilities, such as Spectre, that emerged from these optimizations.