Blog
Posts, notes, and articles.

Memory Allocation and Garbage Collection: How Programs Manage Memory
2025-02-20
A deep dive into how programming languages allocate, track, and reclaim memory. Understand malloc internals, garbage collection algorithms, and the trade-offs that shape runtime performance.

Scheduling: Trading Latency for Throughput (and Back Again)
2025-02-12
Queue disciplines, work stealing, and CPU affinity: how scheduler choices shape p50/p99, and when to bias for one over the other.

RISC-V: The Open ISA Revolution and the Cambrian Explosion of Processor Design
2025-02-11
How a Berkeley research project became the Linux of instruction sets, rewiring the economics of custom silicon from embedded MCUs to vector supercomputers with the RVV extension and the CHERI security story.

Kernel Bypass Networking: DPDK, io_uring, and the RDMA Revolution
2025-02-10
Dive into how modern systems escape the kernel networking stack for microsecond-scale performance: DPDK's poll-mode drivers, io_uring's submission rings, RDMA's one-sided operations, and the trade-offs each approach demands.

Processing-in-Memory: UPMEM, Samsung HBM-PIM, and the Near-Data Computing Paradigm
2025-02-10
How moving compute to where the bits live rewrites the rules of memory-bound computation, from UPMEM's DRAM-scale PIM to Samsung's HBM-PIM and the programming model that still keeps us up at night.

Linearizability and Serializability: A Formal Hierarchy of Consistency Models
2025-01-28
Build a rigorous understanding of consistency models from linearizability to eventual consistency, with formal definitions, counterexamples, and the practical implications for distributed database design.

Exactly-Once in Streaming: What It Means and How Systems Achieve It
2025-01-22
Disentangle marketing from mechanisms: idempotence, transactions, and state snapshots behind ‘exactly-once’.

The FLP Impossibility Result: Why Distributed Consensus Is Fundamentally Hard
2025-01-15
Explore the landmark Fischer-Lynch-Paterson result that proved no deterministic algorithm can achieve consensus in an asynchronous system with even one faulty process, and how the field evolved around this impossibility.

Neuromorphic Computing: Loihi 2, TrueNorth, Spiking Networks, and Where Neuromorphic Wins
2025-01-05
A deep survey of neuromorphic computing from IBM TrueNorth and Intel Loihi 2 through spiking neural networks, STDP learning, event-driven computation, and the application domains where neuromorphic excels and where it falls short.

Optical Computing: Silicon Photonics, Optical Matrix Multiplication, and the Integration Challenges
2024-12-27
A deep analysis of optical computing from silicon photonic interconnects through optical matrix multiplication for AI, examining the energy-latency promise against the formidable integration challenges.

Tuning CUDA with the GPU Memory Hierarchy
2024-11-27
Global, shared, and register memory each have distinct latency and bandwidth. Performance comes from the right access pattern.

Write-Ahead Logging: The Unsung Hero of Database Durability
2024-09-10
Dive deep into write-ahead logging (WAL), the technique that lets databases promise durability without sacrificing performance. Learn how WAL works, why it matters, and how modern systems push its limits.

Network Topologies for HPC: Fat-Trees, Dragonfly, Torus, and the Cost-Diameter-Bandwidth Optimization
2024-08-31
A rigorous survey of HPC network topologies, including fat-tree (InfiniBand), Dragonfly (Cray Cascade), torus (Blue Gene), and Slim Fly, analyzing the fundamental tradeoffs in cost, diameter, bisection bandwidth, and fault tolerance.

Bloom Filters and Probabilistic Data Structures: Trading Certainty for Speed
2024-08-22
Explore how Bloom filters, Count-Min sketches, and HyperLogLog sacrifice perfect accuracy for dramatic space and time savings, and learn when that trade-off makes sense.

Seeing in the Dark: Observability for Edge AI Fleets
2024-08-16
A practitioner's guide to instrumenting, monitoring, and debugging machine learning models running at the edge.

Adaptive Feature Flag Frameworks for Hyper-Growth SaaS
2024-08-15
A comprehensive field guide to building resilient, data-driven feature flag platforms that keep hyper-growth SaaS releases safe, fast, and customer-centric.

Lock-Free Data Structures: Concurrency Without the Wait
2024-07-18
Explore how lock-free algorithms achieve thread-safe data access without traditional locks. Learn the theory behind compare-and-swap, the ABA problem, memory ordering, and practical implementations that power high-performance systems.

Amdahl’s Law vs. Gustafson’s Law: What They Really Predict
2024-06-15
When does parallelism pay off? Compare Amdahl’s and Gustafson’s models, see where each applies, and learn how to reason about speedups in practice.

Concurrency Primitives and Synchronization: From Spinlocks to Lock-Free Data Structures
2024-03-15
A comprehensive exploration of concurrent programming fundamentals, covering mutexes, spinlocks, semaphores, condition variables, memory ordering, and lock-free programming techniques that enable safe parallel execution.

Unicode and Character Encoding: From ASCII to UTF-8 and Beyond
2024-03-15
A comprehensive guide to how computers represent text. Understand the evolution from ASCII through Unicode, the mechanics of UTF-8 encoding, and how to handle text correctly in modern software.

Interconnects: PCIe, CXL, NVLink, and the Emerging Composable-Disaggregated Architecture
2024-03-08
A deep technical survey of modern interconnects (PCIe generations 1 through 6, the CXL.io/CXL.cache/CXL.mem protocols, NVLink and NVSwitch) and how they enable composable-disaggregated infrastructure.

Transactional Memory: HTM, STM, and Why Intel TSX Kept Getting Disabled
2024-02-25
A deep analysis of transactional memory: hardware (Intel TSX, IBM POWER), software (STM), the transactional lock elision pattern, and the bug saga that repeatedly forced Intel to disable TSX via microcode.

Simultaneous Multithreading: Resource Sharing, Security Implications, and the SMT Performance-Security Tradeoff
2024-02-01
A deep dive into SMT/Hyper-Threading: how frontend and backend resources are shared between threads, security vulnerabilities such as PortSmash and TLBleed, and the evolving performance-security tradeoff.

Countdown to Quantum: Migrating an Enterprise to Post-Quantum Cryptography
2024-01-29
Practical lessons from a multi-year effort to adopt quantum-safe cryptography without breaking production.