Storage
- Database Internals: Storage Engines, Transactions, and Recovery
· 2025-12-21
A deep technical walkthrough of how databases store data, ensure correctness, and recover from crashes — covering B-trees, LSM-trees, write-ahead logging, MVCC, isolation levels, and replication.
- Learned Indexes: When Models Replace B‑Trees
· 2025-10-04
A practitioner's guide to learned indexes: how they work, when they beat classic data structures, and what it takes to ship them without getting paged.
- GPUDirect Storage in 2025: Optimizing the End-to-End Data Path
· 2025-09-16
How modern systems move data from NVMe and object storage into GPU kernels with minimal CPU overhead and maximal throughput.
- Error-Correcting Codes: Reed-Solomon, LDPC, and How Distributed Storage Survives Failure
· 2025-05-18
Build error-correcting codes from the ground up: finite field arithmetic, Reed-Solomon encoding and decoding via Lagrange interpolation, LDPC codes and belief propagation, and how modern distributed storage systems use erasure coding to survive disk failures with minimal overhead.
- Write-Ahead Logging: The Unsung Hero of Database Durability
· 2024-09-10
Dive deep into write-ahead logging (WAL), the technique that lets databases promise durability without sacrificing performance. Learn how WAL works, why it matters, and how modern systems push its limits.
- File Systems and Storage Internals: How Data Persists on Disk
· 2023-09-22
A comprehensive exploration of file system architecture, from inodes and directories to journaling and copy-on-write. Understand how operating systems organize, protect, and efficiently access persistent data.
- Distributed File Systems: GFS Design, HDFS Architecture, the Colossus Evolution, and Single-Master Metadata Bottlenecks
· 2021-06-18
A deep exploration of distributed file systems: how Google's GFS pioneered the single-master model, how HDFS adapted it for the Hadoop ecosystem, and how Colossus and other modern systems have evolved beyond the single-master metadata bottleneck.
- Persistent Memory Programming: DAX Mappings, PMDK Libraries, Crash Consistency Without Write-Ahead Logging, and the Optane Legacy
· 2021-06-14
A deep exploration of persistent memory — how DAX enables direct byte-addressable access to non-volatile memory, how the PMDK libraries solve the crash consistency problem at the instruction level, and the lessons of Intel Optane.
- NVMe and the Storage Stack: The NVMe Command Set, Submission/Completion Queues, SPDK, and the Death of the SCSI/SATA Bottleneck
· 2021-05-31
A deep exploration of NVMe technology — how the command set and queue model eliminate the SCSI bottleneck, and why user-space storage via SPDK achieves microsecond-latency I/O on commodity flash.
- Merkle Trees and Content‑Addressable Storage
· 2020-08-17
From Git to distributed object stores: how Merkle DAGs enable integrity, deduplication, and efficient sync.