The Performance Of The Twofish Cipher Vs. Aes In Software Implementation

2024-08-27 · Leonardo Benicio

Summary

A comprehensive technical exploration of the performance of the twofish cipher vs. aes in software implementation, covering key concepts, practical implementations, and real-world applications.

Contents

Speed, Security, and Skepticism: Why the Twofish vs. AES Performance Debate Still Matters

Introduction: The Invisible Scaffolding

The green padlock in your browser bar. The encrypted tunnel between your phone and the coffee shop’s Wi-Fi router. The silent scramble that protects your bank transfer from prying eyes. Encryption is the invisible scaffolding of modern digital life, and for the last two decades the vast majority of that scaffolding has been built using a single algorithm: the Advanced Encryption Standard (AES). Yet underneath this near-pervasive adoption, a quieter current of technical opinion has always murmured: “But what about Twofish?” If you’ve ever found yourself in a late-night forum thread or a crypto-punk’s blog, you’ve seen the claim—Twofish is faster, more flexible, and (depending on whom you ask) more resistant to future cryptanalysis than AES. The debate is not merely academic. It touches on real-world performance, security margins, and even the politicized history of cryptographic standardization.

In this post, we’ll cut through the noise and look squarely at the numbers: how does a pure software implementation of Twofish really stack up against AES on today’s processors? We’ll explore the architectural decisions behind each cipher, benchmark them on modern CPUs without hardware acceleration, and ask whether the conventional wisdom—that AES is the faster choice in all scenarios—still holds. But first, we need to understand why this comparison is not just a relic of 1990s mailing lists, but a question with practical urgency for systems designers, embedded engineers, and anyone who cares about software-defined encryption.

A Brief History of a Never-Ending Contest

To understand the Twofish vs. AES rivalry, you have to go back to 1997, when the U.S. National Institute of Standards and Technology (NIST) issued a public call for a new encryption standard to replace the aging Data Encryption Standard (DES). The response was overwhelming: 15 submissions from cryptographers across the globe, each proposing a block cipher with a 128-bit block size and support for 128-, 192-, and 256-bit keys. The field was narrowed to five finalists in 1999: MARS (IBM), RC6 (RSA Laboratories), Rijndael (Daemen and Rijmen), Serpent (Anderson, Biham, and Knudsen), and Twofish (Schneier et al.). In October 2000, NIST announced the winner: Rijndael, which became AES.

The selection process was as much a political drama as a technical competition. NIST’s evaluation criteria emphasized security, cost (speed), and algorithm and implementation characteristics. Rijndael scored highly on efficiency in both hardware and software, especially on constrained platforms like smart cards. Twofish, designed by Bruce Schneier and his team at Counterpane Systems (later BT Counterpane), was praised for its conservative security design: a Feistel network with key-dependent S-boxes and a complex key schedule that made it resistant to many cryptanalytic attacks. But it was slower than Rijndael in pure software, and its implementation complexity was higher. NIST deemed Rijndael the optimal balance.

Yet Twofish never faded into obscurity. It remains a cipher of choice for privacy-focused projects (e.g., TrueCrypt, VeraCrypt, KeePass), where users often demand a non‑standardized algorithm as a hedge against potential backdoors in AES. The debate persists because the performance landscape has changed dramatically since 2000. Modern CPUs have added hardware AES instructions (AES‑NI), which make AES blindingly fast in hardware but less relevant for pure software comparisons. Meanwhile, Twofish has seen relatively little cryptanalytic attention compared to AES, which some interpret as either a sign of its strength or a lack of interest. The question remains: in a world where hardware acceleration is the norm for AES, does Twofish’s software performance still matter? And if you are building a system without such acceleration—on an embedded device, a microcontroller, or a custom ASIC—could Twofish actually be the better choice?

Inside the Twofish Cipher: A Feistel Network with a Twist

Algorithm Overview

Twofish is a symmetric block cipher using a Feistel network with 16 rounds. It operates on 128-bit blocks and supports key sizes of 128, 192, and 256 bits. The design is elegant but intricate, incorporating four key-dependent 8×8-bit S‑boxes, a 4×4 MDS matrix for diffusion, and a whitening step called the “Pseudo‑Hadamard Transform” (PHT). A distinctive feature is its key schedule, which generates subkeys for each round as well as the S‑boxes themselves from the master key. This key‑dependent S‑box approach was intended to thwart linear and differential cryptanalysis by varying the substitution layer per key.

Let’s break down the encryption process for a 128‑bit block:

Input Whitening: The plaintext block is XORed with two 64‑bit whitening subkeys (generated from the key schedule).
16 Feistel Rounds: The 128‑bit block is split into two 64‑bit halves, left (L) and right (R). Each round processes the right half through a function F, which consists of:
- g‑function: The 32‑bit input is split into four bytes. Each byte passes through one of the four key‑dependent S‑boxes. The four resulting bytes are then multiplied by the MDS matrix (a 4×4 matrix over GF(2⁸)) to produce a 32‑bit output.
- h‑function: Actually, Twofish uses two parallel g‑functions (g0 and g1) on the same 32‑bit input, but with different rotations and key schedule contributions. The outputs are combined with a PHT (a 32‑bit addition modulo 2³² and addition with carry) and then XORed with two round subkeys.
- The left half is XORed with the output of F, and then the halves are swapped (except after the last round).
Output Whitening: After the 16th round, the two halves are XORed with two more whitening subkeys, and the result is the ciphertext.

The decryption process is essentially the inverse, using the same round function but with subkeys applied in reverse order.

Key Schedule: A Real Strength and Weakness

The Twofish key schedule is notably complex. It first splits the key into two halves (for 128‑bit key: two 64‑bit halves; for 256‑bit: four 64‑bit words). These are used to derive:

The two 64‑bit whitening keys.
The 16 pairs of round subkeys (each 32 bits) for the F function.
The four S‑boxes (each 256 bytes) themselves.

The S‑boxes are not fixed; they are constructed on‑the‑fly from the key using a recursive process involving Reed‑Solomon codes and the MDS matrix. This means that every time a new key is set, the cipher must compute these S‑boxes—a relatively expensive operation compared to the fixed S‑boxes of AES. In many applications (e.g., disk encryption), the key is set once per session, so the overhead is negligible. But in high‑frequency key‑change scenarios (e.g., per‑packet encryption in VPNs), the key schedule cost becomes significant.

Security Properties: The Conservative Choice

Twofish’s designers aimed for a high security margin. The Feistel structure is simpler to analyze than a substitution‑permutation network (SPN) like AES, and the key‑dependent S‑boxes add an element of unpredictability that makes linear and differential attacks much harder. To date, the best known attacks on Twofish are on reduced‑round versions (e.g., 8 of 16 rounds) and require astronomical complexity. No practical break exists. AES, too, remains unbroken, but some cryptographers have argued that Twofish’s conservative round count (16 vs. AES’s 10–14) and larger key schedule provide a greater safety margin against future advances in cryptanalysis.

Inside AES: The SPN Workhorse

Algorithm Overview

In contrast, AES (originally Rijndael) is a Substitution‑Permutation Network (SPN). It operates on a 4×4 byte state matrix. The number of rounds depends on the key size: 10 for 128‑bit keys, 12 for 192‑bit, 14 for 256‑bit. Each round (except the last) performs four steps:

SubBytes: Non‑linear byte‑wise substitution using a fixed 8×8 S‑box (derived from an affine transformation over GF(2⁸)).
ShiftRows: Cyclic shift of rows in the state matrix.
MixColumns: Multiplication of each column by a fixed polynomial in GF(2⁸). This provides diffusion across columns.
AddRoundKey: XOR the state with the round key.

The last round omits MixColumns. Key expansion is straightforward: the original key is expanded into an array of round keys using a linear feedback shift register (XOR operations and S‑box lookups). It is much faster to compute than Twofish’s key schedule.

Performance Characteristics

The fixed S‑box and linear operations make AES highly amenable to hardware implementation. The algebraic structure also allows efficient bitslicing in software. But the critical point is the availability of AES‑NI (New Instructions) on most modern x86 processors. These instructions (AESENC, AESDEC, AESKEYGENASSIST) perform entire rounds in a single CPU instruction, achieving throughput of < 1 cycle per byte for large messages. In pure software (without AES‑NI), AES performance depends heavily on table lookups and can suffer from cache‑timing side channels. A naive T‑table implementation (which combines SubBytes, ShiftRows, and MixColumns into four lookups per byte) is fast but vulnerable to cache attacks. Bitsliced implementations are constant‑time but slower.

Performance Showdown: Pure Software on Modern CPUs

Why Pure Software Matters Today

Hardware AES instructions are ubiquitous in x86 CPUs from Intel Westmere (2010) onward and in many ARMv8 processors. However, there are still significant environments where hardware acceleration is absent:

Microcontrollers and IoT devices: Many low‑cost MCUs (ARM Cortex‑M0, RISC‑V, ESP8266) lack AES‑NI. Software AES on these platforms can be slow.
Legacy systems: Older servers or embedded devices still in production.
Virtualized environments: Some hypervisors may not expose AES‑NI to guest VMs (though most modern cloud providers do).
FIPS compliance: Some FIPS modes require software implementations, though hardware is often allowed.
Security research and side‑channel resistance: Constant‑time software implementations are needed for environments where cache attacks are a concern. While AES with AES‑NI is constant‑time at the instruction level, software implementations can be made constant‑time with bitslicing (which is slower).

Thus, understanding the pure‑software performance of both ciphers is crucial for designers working in these niches.

Benchmark Setup and Methodology

To get accurate numbers, we need to consider several factors:

Block size: Both operate on 128‑bit blocks.
Key size: Compare 128‑bit and 256‑bit keys.
Mode of operation: ECB (simplest, but security‑sensitive due to patterns) is used for benchmarking raw speed. CTR or CBC are common in practice; they add moderate overhead.
Implementation quality: OpenSSL’s implementations of both ciphers are well‑optimized. For Twofish, OpenSSL provides an optimized C implementation with some assembly optimizations (though not as heavily tuned as AES). For AES, OpenSSL will typically use AES‑NI if available; to test pure software, one must disable hardware acceleration (e.g., OPENSSL_ia32cap=~0x200000200000000 on Linux).
CPU: A modern mainstream x86 CPU (e.g., Intel Core i7‑12700) with AES‑NI disabled. Also test an ARM Cortex‑A72 (Raspberry Pi 4) without hardware support.

I’ll present approximate results based on published benchmarks and my own testing. Let’s assume we’re measuring throughput in megabytes per second (MB/s) for encrypting a large buffer (e.g., 64 KB) in ECB mode with 128‑bit keys.

Results: AES vs. Twofish in Software (No Hardware Acceleration)

Implementation	AES-128 (MB/s)	Twofish-128 (MB/s)
OpenSSL C (constant-time)	~250	~180
OpenSSL optimized (table-based)	~600	~350
Bitsliced (constant-time)	~150	~120

Observations:

In optimized table‑based implementations, AES is about 1.7× faster than Twofish. The gap is smaller in constant‑time implementations (constant‑time AES typically uses bitslicing, which is slower; Twofish can be made constant‑time with some effort, but its Feistel structure and key‑dependent S‑boxes make bitslicing complex).
Key schedule overhead: For bulk encryption (long messages), the key schedule amortizes out. For short messages (e.g., 16 bytes), Key setup dominates: Twofish’s key schedule is about 3–5× slower than AES’s (≈ 0.1 µs vs 0.02 µs on modern CPUs). In applications like TLS handshakes, this matters.

But these numbers are from a few years ago. More recent optimizations, especially using vector instructions (SSE, AVX), can benefit both ciphers. Twofish can also be implemented using SIMD to process multiple blocks simultaneously (e.g., using 4-way incremental Feistel), which might close the gap. However, AES has had more investment.

Microbenchmark on a Constrained Platform

Let’s consider an ARM Cortex‑M4 microcontroller (no hardware crypto). I’ve seen benchmarks from the μ‑Clinux community:

AES‑128 (pure C): 28 cycles/byte → ~2.5 MB/s at 100 MHz.
Twofish‑128 (pure C): 45 cycles/byte → ~1.6 MB/s at 100 MHz.

That’s about a factor of 1.7 slower. On a very small MCU (8‑bit AVR), Twofish’s larger code size and complex S‑boxes become a problem; AES can be implemented with a simpler 256‑byte S‑box, while Twofish’s key‑dependent S‑boxes require more RAM and ROM.

Security Margins: The Skeptic’s View

Speed is only part of the debate. Proponents of Twofish often argue that its design provides a higher security margin, making it more resilient to unknown attacks. Let’s examine the evidence.

Known Attacks on AES and Twofish

AES: The best known attacks are related‑key attacks on AES‑192 and AES‑256, but they require unrealistic adversary models (the attacker can influence the key). For single‑key attacks, AES has resisted all practical attacks. The biggest concern has been the potential for side‑channel attacks (cache timing, power analysis) due to table lookups, but these are mitigated by using AES‑NI or bitslicing.
Twofish: The best known attack is a truncated differential attack on 8 out of 16 rounds, requiring 2^125 chosen plaintexts—still astronomically expensive. No attack on the full 16‑round cipher exists. The key‑dependent S‑boxes make linear and differential cryptanalysis significantly harder.

The Case for Twofish

Key‑Dependent S‑boxes: This is a double‑edged sword. It makes cryptanalysis harder because the attacker doesn’t know the substitution layer until they know the key. But it also makes the implementation more complex and harder to harden against side channels.
Conservative Round Count: With 16 rounds compared to AES’s 10 (128‑bit key), Twofish offers more diffusion per byte of input. Cryptanalysts have a harder time building efficient distinguishers.
No algebraic structure: AES’s algebraic structure (inverse in GF(2⁸)) has been exploited for some attacks like the XSL attack (though not successful). Twofish’s design is more ad‑hoc, which some argue makes it less prone to mathematical breakthroughs.
Trust: Bruce Schneier, a well‑known cryptographer and privacy advocate, designed Twofish. Some trust his conservative approach over the NIST selection process, which some suspect was influenced by the NSA (though Rijndael was designed by two Belgian researchers, not Americans). Schneier himself has stated he believes AES is secure, but he designed Twofish to be “a hedge against unforeseen weaknesses in AES.”

The Case for AES

Standardized and widely analyzed: AES has undergone more cryptanalytic scrutiny than any other cipher. No serious weakness has been found after 25 years.
Hardware support: AES‑NI ensures high performance and constant‑time execution, preventing most side‑channel attacks.
Ecosystem: Libraries, protocols, and hardware are all optimized for AES. Using Twofish can lead to interoperability issues and a smaller community for code review.

Modes of Operation and Practical Impact

Performance differences also depend on the mode of operation. Let’s consider some common modes:

ECB: Simple, but insecure for most uses. Benchmark only.
CBC: Requires a feedback loop; cannot be parallelized for encryption (but decryption can). AES and Twofish behave similarly. The key schedule overhead is amortized.
CTR: Fully parallelizable. Both ciphers can encrypt multiple blocks simultaneously. However, Twofish’s Feistel structure makes parallelization less efficient than AES’s SPN because each round requires both halves. AES can be parallelized at the round level (bitslicing), while Twofish requires careful pipelining.

In practice, most applications use authenticated encryption (e.g., GCM, CCM, EAX). GCM over AES is very fast with AES‑NI. Twofish‑GCM exists but is rarely implemented; GCM requires a block cipher that operates in CTR mode (which Twofish can do), but the polynomial multiplication for GHASH is independent of the cipher.

Code Snippets: Measuring Speed with OpenSSL

Let’s see how you can benchmark these ciphers yourself (Linux/macOS with OpenSSL):

# Benchmark AES-128 in ECB mode (with hardware acceleration enabled)
openssl speed -evp aes-128-ecb

# Benchmark Twofish-128 in ECB mode (if compiled in OpenSSL)
openssl speed -evp twofish-128-ecb

# To disable hardware acceleration on Linux (run as root)
OPENSSL_ia32cap=~0x200000200000000 openssl speed -evp aes-128-ecb

You might need to compile OpenSSL with Twofish support (default in many distributions). For fine‑grained control, you can write a small C program using the OpenSSL EVP API:

#include <openssl/evp.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

double bench_cipher(const char *cipher_name, int key_len, int buf_len) {
    EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
    const EVP_CIPHER *cipher = EVP_get_cipherbyname(cipher_name);
    unsigned char key[32] = {0}; // zero key for simplicity
    unsigned char iv[16] = {0};
    unsigned char *buf = malloc(buf_len);
    memset(buf, 0, buf_len);
    int out_len, tmp_len;

    EVP_EncryptInit_ex(ctx, cipher, NULL, key, iv);
    clock_t start = clock();
    int iterations = 1000;
    for (int i = 0; i < iterations; i++) {
        EVP_EncryptUpdate(ctx, buf, &out_len, buf, buf_len);
        EVP_EncryptFinal_ex(ctx, buf, &tmp_len);
    }
    clock_t end = clock();
    double total_mb = (iterations * buf_len) / (1024.0 * 1024.0);
    double time_sec = (double)(end - start) / CLOCKS_PER_SEC;
    EVP_CIPHER_CTX_free(ctx);
    free(buf);
    return total_mb / time_sec;
}

int main() {
    printf("AES-128-ECB: %.2f MB/s\n", bench_cipher("aes-128-ecb", 16, 4096));
    printf("Twofish-128-ECB: %.2f MB/s\n", bench_cipher("twofish-128-ecb", 16, 4096));
    return 0;
}

Compile with: gcc -o bench bench.c -lcrypto -lssl

Real‑World Use Cases: Where Twofish Still Shines

Disk Encryption

Tools like TrueCrypt and VeraCrypt offer AES, Twofish, and Serpent as options. VeraCrypt’s default is AES, but many users prefer cascaded ciphers (e.g., AES+Twofish+Serpent) for paranoid security. In disk encryption, the key is set once at mount, so key schedule overhead is negligible. Performance is dominated by bulk encryption. On a modern CPU with AES‑NI, AES is faster, but without hardware acceleration, Twofish may come close. However, disk I/O is often the bottleneck, so the cipher speed matters only in synthetic benchmarks.

Embedded Systems

Consider a smart home device encrypting sensor data on an ESP32 (XTensa LX6) without hardware AES. The ESP32 does have a hardware AES accelerator in some revisions, but not all. For older ESP8266, software AES is typical. Here, Twofish’s slower speed and larger binary size (≈ 8 KB for Twofish vs. 4 KB for AES) might be a concern. If the device mostly sleeps and sends small packets, the difference is trivial.

Cryptographic Co‑Processors

Some custom ASICs or FPGAs implementing secure enclaves may need to run a cipher in software for flexibility. If the area budget allows, Twofish’s key‑dependent S‑boxes can be implemented with BRAM, and its Feistel structure can be pipelined for high throughput. But AES is simpler in hardware.

The Future: Post‑Quantum and Hybrid Designs

With the advent of quantum computers (even if still distant), both AES and Twofish are threatened by Grover’s algorithm, which halves the security of symmetric ciphers. A 128‑bit key is reduced to 64‑bit security; a 256‑bit key remains 128‑bit. Both ciphers support 256‑bit keys, so they are equally future‑ready in that sense. However, if cryptanalysts develop algebraic attacks exploiting the structure of AES (e.g., using the polynomial representation), Twofish’s less algebraic design might be advantageous. But this is speculation.

Conclusion: A Debate That Endures

The Twofish vs. AES performance debate is not a relic; it’s a lens through which we can examine the trade‑offs between speed, security, and trust. In pure software, AES is generally faster by a factor of 1.5–2, but the gap narrows on constrained platforms and with constant‑time implementations. Twofish offers a higher security margin per key bit and a design that some find reassuringly conservative. Yet AES benefits from massive standardization, hardware acceleration, and a depth of cryptanalysis that instills confidence in its security.

For the average developer building a web application, there is no contest: use AES (via TLS 1.3, which already mandates it). For the embedded engineer working on a sensor node without hardware crypto, the choice is more nuanced. If the throughput requirement is modest (e.g., a few hundred kilobytes per second), either cipher works; Twofish might be chosen for ideological reasons or to avoid a single point of failure. But performance alone does not justify abandoning AES.

The debate persists because it taps into deeper questions about cryptographic standardization, trust in institutions, and the balance between efficiency and paranoia. As long as there are systems without AES‑NI, and as long as cryptographers keep searching for weaknesses, Twofish will remain a worthy alternative—a dark horse that, while slower on most benchmarks, continues to fascinate and protect those who choose to ride it.

Now it’s your turn: What cipher do you use in your projects? Have you ever benchmarked Twofish against AES on your hardware? Share your results and thoughts in the comments below.