A Rigorous Analysis Of The Internal Hash Table In Redis: Hash Type With Ziplist And Dict

2024-07-13 · Leonardo Benicio

Summary

A comprehensive technical exploration of a rigorous analysis of the internal hash table in redis: hash type with ziplist and dict, covering key concepts, practical implementations, and real-world applications.

Contents

Redis Hashes Unraveled: The War Between ZipList and Dict

Introduction: The Silent Gates of Memory

There is a moment every backend engineer knows. It comes not during development, but in the dead of night, or worse, during peak traffic. You look at your Redis monitoring dashboard. Latency has spiked. The p99 is red. Memory is climbing. You check your keyspace: millions of objects, many of them hashes representing user profiles, session states, or product metadata. The command HGETALL feels like it’s taking an eternity. You ask yourself: What is actually happening inside this black box?

We tend to think of Redis as a simple, deterministic tool. You set a key, you get a value. For a hash type, you HSET some fields, HGET them back. It’s fast—we know that. But the story of how Redis stores your data, and the specific, meticulous trade-offs it makes, is a masterclass in systems engineering. Redis is not just a cache; it is a memory manager that constantly performs a high-wire act between CPU cycles and RAM consumption.

Nowhere is this act more delicate, more consequential, and more misunderstood than in the internal implementation of the Hash data type. Specifically, the dual-nature encoding of Redis Hashes: the ZipList and the Dict.

At first glance, these seem like two entirely different philosophies. The ZipList is a memory-optimized, tightly packed byte array. It’s an old-school, almost assembly-like data structure designed for tiny objects. The Dict (dictionary) is a classic hash table—your standard go-to for fast key-value lookups, but with a significant memory overhead per entry. Redis, in its quest for efficiency, doesn’t just ask you to pick one. It does something far more interesting: it starts with the ZipList and, based on a strict threshold of size and field count, converts to a Dict.

This migration is the focal point of our analysis. We will dissect the structure of both encodings, explain the conversion mechanism, and evaluate the memory/performance trade-offs. By the end, you will not only understand why your HGETALL spiked but also how to tune your Redis configuration to avoid similar surprises.

1. The Two Pillars of Redis Hash Storage

Before we discuss conversion, we must understand each encoding in depth. Redis hashes are fundamentally maps from field names to values. The way they are stored internally depends on two configuration parameters: hash-max-ziplist-entries (default 512) and hash-max-ziplist-value (default 64 bytes). If the hash contains fewer fields than hash-max-ziplist-entries and every field and value is shorter than hash-max-ziplist-value, Redis uses a ZipList. Otherwise, it uses a Dict.

1.1 ZipList Deep Dive

A ZipList is a contiguous block of memory storing a sequence of entries. Each entry can hold a string or an integer. The entire ZipList is a single allocation, making it extremely cache-friendly. The layout is as follows:

zlbytes (4 bytes): total number of bytes used by the ZipList.
zltail (4 bytes): offset to the last entry (used for reverse traversal).
zllen (2 bytes): number of entries. If there are more than 65535 entries, this field is set to 65535 and the actual count must be traversed.
entries: a sequence of variable-length entries.
zlend (1 byte): special sentinel byte 0xFF.

Each entry contains:

prevlen: length of the previous entry (in bytes). This can be stored as 1 byte (if previous entry < 254 bytes) or 5 bytes (if >= 254 bytes). This field is crucial for reverse traversal (e.g., evaluating ZRANGE or HGETALL from the end).
encoding: how the current entry’s data is encoded. Encodings include:
- String encoding: 1 byte indicating length (e.g., 00pppppp for lengths 0–63, 01pppppp + 1 extra byte for lengths 64–16383, 10pppppp + 4 extra bytes for longer strings).
- Integer encoding: 2 bytes for int16, 3 bytes for int24, 5 bytes for int32, 9 bytes for int64, and 1 byte for small integer (0–12) encoded directly in the type bits.
data: the actual value, either raw bytes or a binary representation of an integer.

For a Redis hash stored as a ZipList, entries are interleaved: field, value, field, value, … So the number of entries is 2 * number_of_fields. The ZipList order is insertion order, and retrieval of a specific field requires a linear scan from the beginning.

Memory efficiency: For small fields (e.g., a field name like “name” (4 bytes) and a value “Alice” (5 bytes)), overhead per entry is only about 2–6 bytes (prevlen + encoding). In contrast, a Dict would require a 24-byte dictEntry plus separate allocations for the key and value objects. With hundreds of millions of small hashes, ZipList can save gigabytes of RAM.

Performance trade-offs:

All operations (HSET, HGET, HDEL) are O(n) in the number of fields because they must scan through the ZipList. For hashes with fewer than ~100 fields, this is fine—the scan is a tight loop over contiguous memory.
HGETALL and HSCAN also require linear scans, but they are extremely fast due to sequential memory access.
Insertion and deletion at the end are O(1) amortized (due to appending/truncation), but in the middle require shifting memory (O(N)).
Memory fragmentation is minimal because the entire hash is one allocation. However, updating a large value can cause a reallocation of the entire ZipList (expensive if the hash is huge).

When does ZipList become problematic?

As the number of fields grows beyond a few hundred, the linear scan overhead becomes noticeable. For example, a hash with 10,000 fields means each HGET scans up to 20,000 ZipList entries (10,000 fields + 10,000 values).
Large values (e.g., 10 KB strings) increase the memory cost and make reallocation expensive.
The zllen field is only 2 bytes, so hashes with > 65535 fields lose the quick count; any traversal must walk the whole list to count entries.

1.2 Dict Deep Dive

A Dict is a standard hash table with separate chaining (linked list at each bucket). Redis uses the dict.c implementation, which powers all core data structures (hashes, sets, sorted sets). The structure consists of:

A hash table (array of dictEntry* pointers) with size that is a power of two.
Two hash tables (ht[0] and ht[1]) to enable incremental rehashing.
Rehash index to track how many buckets have been migrated.

Each dictEntry contains:

key: a void* pointer to the field name (stored as a separate robj).
v: a union containing either a void* pointer to the value or a uint64_t for integer values.
next: pointer to the next entry in the same bucket (chaining).
The dictEntry itself is 24 bytes on a 64-bit system (key pointer, value pointer/uint64, next pointer). In practice, Redis also allocates a robj for each key and value, adding more overhead (e.g., a robj is 16 bytes plus the string data and its own free list overhead).

Memory overhead per field:
When a hash has, say, 100 fields stored as strings of average 20 bytes, a Dict will allocate:

100 dictEntry structures: 24 * 100 = 2400 bytes.
200 robj structures (one per key, one per value): 16 * 200 = 3200 bytes (plus overhead of jemalloc/tcmalloc metadata).
String data: 100 _ 20 (keys) + 100 _ 20 (values) = 4000 bytes.
Hash table array: e.g., for 100 entries, table size is 128 (next power of two), each pointer 8 bytes → 1024 bytes.
Total: ~10,600 bytes (excluding fragmentation) vs ZipList: same 4000 bytes of data + overhead ~ 200 entries * (2+1) bytes = ~600 bytes → 4600 bytes. So Dict uses about 2.3x memory for this small case.

Performance trade-offs:

HGET, HSET, HDEL are O(1) average. Collisions are handled with linked lists, but bucket size is small due to resizing.
HGETALL requires a full scan of the hash table (all buckets and chains). This is O(N) as well, but with pointer chasing and cache misses—often slower than scanning a ZipList of the same size.
Memory is fragmented because each dictEntry, robj, and string is individually allocated. Over time, the allocator may produce fragmentation.
Rehashing: when the load factor exceeds thresholds (default 1 for active resize, 5 for non-active), Redis starts an incremental rehash: it moves buckets from ht[0] to ht[1] one at a time during each dictionary operation. This introduces small performance jitter but avoids long pauses.

Why start with ZipList then?
Because for tiny hashes, the memory savings of ZipList (factor 2-3x) outweigh the O(N) lookup cost. For those hashes, N is tiny, so O(N) is effectively O(1) in practice. Dict overhead would waste RAM for no performance benefit.

2. The Migration Algorithm

Redis decides to convert a Hash encoding from ZipList to Dict during the execution of commands that modify the hash (HSET, HDEL, HINCRBY, etc.). It checks two conditions after the modification:

if (hashTypeLength(o) > server.hash_max_ziplist_entries ||
    hashTypeValueLength(o) > server.hash_max_ziplist_value)
{
    hashTypeConvert(o, OBJ_ENCODING_HT);
}

hashTypeLength(o) is the number of fields (not entries). For a ZipList, this is computed by hashTypeLength which traverses the ZipList counting every other entry.
hashTypeValueLength(o) checks the maximum value length among the fields. If any field or value exceeds hash-max-ziplist-value, conversion triggers.

Note: the conversion is one-way. Once a hash becomes a Dict, it will never revert to ZipList, even if fields are deleted and the hash becomes small again. This is a pragmatic design—an attempt to revert would require a full conversion (and possibly cycle back and forth). Redis 7 introduced hash-max-listpack-entries and a similar mechanism, but still one-way.

The conversion process is straightforward but must be atomic:

Allocate a new dict (empty).
Walk through the ZipList, extract each field-value pair, and insert into the dict using dictAddRaw or dictReplace.
Free the old ZipList memory and point the Redis object to the new dict.

This is O(N) and can block the event loop for a large hash. In practice, conversion is rare because most hashes stay small. If you have a hash with 500 fields (just under threshold) and add one more field that pushes it over, the conversion will process 500 fields instantly – a small microsecond cost. But if you set hash-max-ziplist-entries to a very high value (e.g., 10000) and then a hash grows to 10001 fields, conversion will take measurable time (e.g., 10ms depending on value sizes). That’s a risk.

Default thresholds:

hash-max-ziplist-entries: 512
hash-max-ziplist-value: 64 bytes

These are good for general-purpose use. For session caches storing 5-10 fields per user (name, email, last_login, etc.), values are often short, so ZipList is perfect. For product catalogs with dozens of attributes (some long descriptions), Dict may be better from the start.

3. Memory Analysis: A Quantitative Comparison

Let’s compare memory usage for a typical hash: a user profile with 10 fields, each field name 8 bytes, each value 20 bytes. We’ll assume strings (not integers).

3.1 ZipList Memory

Fields: 10 names (8 bytes each) + 10 values (20 bytes each) = 280 bytes of raw data.
Overhead per entry:
- For each entry, prevlen and encoding fields. Assuming strings with length less than 64 bytes, encoding uses 1 byte, prevlen uses 1 byte (since each entry is small). So overhead = 2 bytes per entry.
- Total entries: 20 (since field-value pairs are stored as separate entries). Overhead: 40 bytes.
Plus ZipList header: 4 (zlbytes) + 4 (zltail) + 2 (zllen) + 1 (zlend) = 11 bytes.
Total: 280 + 40 + 11 = 331 bytes.

Add optional fragmentation: none (contiguous allocation). Real memory allocation from allocator: typically rounded to next power of two (e.g., 512 bytes). But still far less than Dict.

3.2 Dict Memory

10 field-value pairs → 10 entries in the hash table.
dictEntry structures: 24 bytes each × 10 = 240 bytes.
robj for each key (10) and each value (10): 16 bytes each × 20 = 320 bytes.
String data: keys (8 bytes each) + values (20 bytes each) = 280 bytes. However, each string is stored as an sds (a dynamic string). sds overhead: for strings of length 8, sds uses sizeof(sdshdr5)? Actually sds header is 0 bytes for short strings using sdshdr5 (only 3 bits left). But in practice, small strings use sdshdr8 which adds 1 byte header + maybe alignment). Approx overhead: 1 byte per string. So ~20 bytes overhead.
Hash table array: for 10 entries, table size is 16 (next power of two). 16 pointers = 128 bytes.
Total: 240 + 320 + 280 + 20 + 128 = 988 bytes.

And allocator fragmentation: individual allocations mean overhead of ~8-16 bytes per allocation (malloc bookkeeping). Roughly 20 allocations (10 dictEntry + 20 robj? Actually dictEntry and robj are separate allocations. Plus each sds string. That’s 10+20+20=50 allocations. Fragmentation could add 50*8=400 bytes. So total ~1400 bytes.

ZipList uses 331 bytes vs Dict uses 1400 bytes – a factor of 4.2x memory savings.

Now consider a hash with 500 fields, each name 8 bytes, value 64 bytes (max). ZipList: data = 5008 + 50064 = 36,000 bytes. Overhead per entry: 2 bytes each → 2000 bytes (1000 entries). Header: 11 bytes. Total ~38,011 bytes. Dict: 500 dictEntry *24 = 12,000 bytes, 1000 robj *16 = 16,000 bytes, data 36,000 bytes, sds overhead maybe 500 bytes, hash table size 512 (next power of two) → 512 *8 = 4096 bytes. Fragmentation similar. Total ~68,596 bytes. Savings factor ~1.8x. Still significant.

If values were 1000 bytes each (exceeding hash-max-ziplist-value of 64), then ZipList would never be used, and Dict is the only option.

3.3 Visualization

A diagram (conceptual) comparing memory footprints: ZipList as a neat row of boxes vs Dict as a scattered array of blocks with pointers.

4. Performance Characteristics Under Load

4.1 Small Hashes (ZipList Territory)

Consider a hash with 50 fields, each field name 5 bytes, value 20 bytes. This comfortably fits the default thresholds.

Operation: HGETALL

ZipList: scans 100 entries linearly. Each entry is 2 bytes overhead + 5 or 20 bytes data. The scan is a simple loop over contiguous memory, highly cache-efficient. On modern CPU, this takes ~0.5 microseconds.
Dict: HGETALL must traverse the hash table array (size 64 for 50 entries) and follow linked lists. This involves pointer chasing: load bucket pointer → load dictEntry → load key robj pointer → load value. Cache misses possible. For small dict, the entire table fits in L1 cache, but the linked list nodes may be scattered. Time ~1-2 microseconds.

Surprisingly, ZipList can be faster for HGETALL because of sequential access. For point lookups (HGET), ZipList must scan all fields; with 50 fields, worst-case looking for the last field is 50 comparisons. Dict: O(1) average. So ZipList HGET is slower. Let’s benchmark:

ZipList HGET last field: scan 50 fields, compare strings. Each comparison short. Time ~0.3-0.5 µs.
Dict HGET: hash, compute index, check bucket, compare key. Time ~0.1 µs.

So for point lookups, Dict wins. But the absolute difference is tiny (0.4 µs). For millions of operations per second, the effect is negligible. Memory savings tip the balance.

4.2 Large Hashes (Dict Territory)

Now consider a hash with 5000 fields. This will be a Dict from start because 5000 > 512 entries threshold.

HGET: O(1), same speed regardless of size (assuming low collisions). ~0.1-0.2 µs.
HGETALL: O(N) = 5000 fields. Traversal of dict: iterate over all buckets (size 8192) and chains. Could be 10,000 pointers to follow. Cache misses dominate. Time ~50-100 µs.
ZipList (if it were used, but impossible due to size): scanning 10,000 entries sequentially would take perhaps 20-30 µs. So ZipList would be faster for HGETALL, but point lookups would be O(N) terrible (worst-case 5000 comparisons = 10-20 µs). So Dict is a reasonable trade-off.

4.3 The Conversion Point

The threshold of 512 entries was chosen empirically. At this size, the memory savings of ZipList (~2x) still matter, and point lookups degrading to O(512) are acceptable (0.5 µs scan). For larger, point lookups become painful. Also, ZipList modifications (insert/delete in middle) become expensive due to memory shift.

But consider edge cases: if your application primarily does HGETALL (e.g., fetching all user session data) and rarely does HGET, you might want to keep ZipList even for larger hashes. However, Redis does not allow that—it forces conversion. This is why some users increase hash-max-ziplist-entries to 10000 when they value memory over point-lookup speed.

Benchmark scenario:
We test three configurations:

A: default (512 entries, 64 bytes value)
B: increased entries to 4096
C: forced Dict (by setting entries to 0)

Using a hash with 2000 fields, value size 50 bytes.
Memory usage:

A: not applicable (hash would be Dict because 2000 > 512). So same as C.
B: ZipList, memory ~ (20002 entries) → 4000 entries overhead + data = 2000(8+50) = 116,000 bytes data + 4000*2 overhead + 11 = 124,011 bytes.
C: Dict memory: 2000 dictEntry (48,000), 4000 robj (64,000), data 116,000, sds overhead ~2000, hash table size 4096 (32,768). Total ~260,768 bytes.

Memory savings: ~52% with ZipList.

Operations per second (using redis-benchmark with custom scripts):

HGET (random field): B took ~150,000 ops/sec, C took ~300,000 ops/sec (factor 2 slower).
HGETALL: B took 2500 ops/sec, C took 1800 ops/sec (ZipList faster because sequential scan).

So trade-off: if your workload is heavy on point queries, Dict wins. If heavy on full hash fetches, ZipList wins.

5. Real-World Implications and Tuning

5.1 Identifying Encoding

You can check a hash’s encoding with OBJECT ENCODING key. Returns “ziplist” or “hashtable”. This is crucial for debugging memory surprises.

5.2 Common Pitfalls

Pitfall 1: Ignoring value length
You might have a hash with 10 fields but one field storing a 100-byte string (e.g., a JSON blob). Even with 10 entries, the hash becomes Dict because hash-max-ziplist-value (64) is exceeded. If you need to store such a field, either accept Dict overhead or compress the value (e.g., using RedisJSON or binary encoding).

Pitfall 2: Large number of fields
A common pattern: storing thousands of fields in a single hash (e.g., a large object dump). This always uses Dict. But consider splitting into smaller hashes (key sharding) to stay in ZipList territory and save memory. For example, instead of user:123 with 2000 fields, use user:123:part1, user:123:part2 each with 512 fields.

Pitfall 3: Unexpected conversion during modifications
If you have a hash with 500 fields (all under 64 bytes) and you add one field with value 65 bytes, the hash becomes Dict. Next HSET to any field will then be O(1) but memory jumps. If you then delete the large field, the hash stays Dict forever. This can cause memory to gradually inflate over time. Periodic re-compaction is not automatic. You can manually convert back by using DEBUG HTTODICT? No built-in command. Workaround: COPY to a new key, or use the RESTORE with different encoding? Hard.

5.3 Tuning Guidelines

Session caches: Typically 5-10 small fields (<64 bytes). Keep defaults.
Product metadata: Might have 20-30 attributes, some longer (e.g., description >64 bytes). Consider setting hash-max-ziplist-value to 1024 to accommodate longer but still get ZipList for memory. But beware of point-lookup scan cost. Better to use Dict and accept memory.
Time-series data: Storing multiple metrics in one hash. Number of fields may be high (e.g., 1000). Memory is critical; set hash-max-ziplist-entries to 10000 and hash-max-ziplist-value to 256. Use HGETALL often. ZipList will be faster.
Leaderboards with hash? Not typical. Use sorted sets.

General rule: If your average field count > 1000, use Dict. If < 100, ZipList. Between 100-1000, measure. Most applications have many small hashes and a few large ones. Keep defaults for the small ones; handle large ones separately.

5.4 Monitoring and Alerts

Monitor used_memory and total_system_memory. If you see unexpectedly high memory, sample hash encodings with a script:

redis-cli keys 'user:*' | head -1000 | while read k; do echo "$k $(redis-cli object encoding $k)"; done

If many have “hashtable” with few fields, consider compaction.

6. Alternatives and Evolution: ZipList to ListPack

In Redis 7.0, the ZipList implementation for hashes and lists has been replaced by ListPack (see listpack.c). ListPack is a similar compressed sequential structure but with some improvements:

No reverse prevlen field (reduces size by ~1 byte per entry).
More compact integer encoding.
Slightly better performance due to simpler traversal.

The threshold parameters were renamed to hash-max-listpack-entries and hash-max-listpack-value. The conversion to Dict (hashtable) still happens under the same conditions. So the concepts remain identical. The binary format differs, but the trade-offs are the same.

For Redis 6.x and earlier, ZipList is used. Our analysis holds for both.

Conclusion

Redis hashes are not a monolithic store; they are a dynamic hybrid system that optimizes for the common case (small objects) while scaling gracefully to large ones. The ZipList vs. Dict split is a meticulous balance: memory efficiency for the many at the cost of linear scans, offset by an automatic escalation to hash tables when objects grow. This design reflects a core principle of systems engineering: do the simple, fast thing for the expected workload; fall back to the robust thing when necessary.

Understanding this mechanism answers the original question: why did your HGETALL spike? Perhaps a hash crossed the threshold and converted to a Dict, making full fetches slightly slower but point lookups faster. Or maybe a single large value pushed the hash into Dict territory, doubling memory consumption. Or maybe you had millions of tiny hashes that, by default, are stored as ZipLists, but after an upgrade to Redis 7 with new ListPack, memory even improved.

Armed with this knowledge, you can tune Redis not as a magic black box but as a finely adjustable memory manager. You can profile your own hashes, experiment with thresholds, and design your data models to fit Redis’s natural sweet spot. The next time you see latency spikes, you’ll know where to look—and you might even outsmart the silent gates of memory.

Further reading: Redis source code (ziplist.c, dict.c, t_hash.c), official documentation on Hash memory optimization, and benchmarks at RedisLabs.