Implementing A Fully Homomorphic Encryption Scheme With Boosted Bootstrapping (Fhew/Tfhe)

2022-09-11 · Leonardo Benicio
Summary

A comprehensive technical exploration of implementing a fully homomorphic encryption scheme with boosted bootstrapping (fhew/tfhe), covering key concepts, practical implementations, and real-world applications.

Contents

The Encrypted Oracle: Why Your Cloud Server Doesn’t Need to See Your Data

The Opening Hook

Imagine handing a locked, opaque box to a stranger. Inside is a detailed blueprint for a revolutionary new engine. You ask this stranger to perform a series of complex calculations on that blueprint—to redesign the cooling system, optimize the fuel intake, and stress-test the materials. The stranger, without ever once opening the box, uses their tools to manipulate the blueprints, make adjustments, and hand you back a new, different locked box. You unlock it with your key, and inside is your optimized, tested, and improved engine design.

This isn’t magic. It’s the elusive promise of Fully Homomorphic Encryption (FHE). For decades, this concept was the “holy grail” of cryptography—a way to compute on encrypted data without ever decrypting it. The implications are staggering. It would mean a cloud provider could process your medical records to find a treatment match without ever seeing your diagnosis. A financial institution could run anti-fraud algorithms on your encrypted transactions without knowing your balance. A government could analyze encrypted census data without compromising individual privacy. We could build a world where trust is replaced by mathematics, and the fundamental asymmetry of data access—“You must show me your secrets for me to help you”—is finally broken.

But for most of its history, FHE was a tantalizing theory, not a practical tool. The first viable scheme, proposed by Craig Gentry in 2009, was a monumental work of genius. However, an FHE operation was millions of times slower than the equivalent operation on plaintext. A simple database search that takes milliseconds on a plaintext server could take hours or days on an encrypted one. The primary culprit was a process called bootstrapping—the necessary periodic refreshing of the encryption to remove accumulated noise, which itself required evaluating the decryption circuit homomorphically. Noise is the price we pay for computation on ciphertexts. Each addition or multiplication multiplies the noise, and eventually the ciphertext becomes indecipherable. Bootstrapping resets the noise, allowing unlimited computation, but it is staggeringly expensive.

Enter FHEW and TFHE: two schemes that turned the bootstrapping bottleneck on its head. They demonstrated that by carefully engineering the bootstrapping procedure and exploiting the algebraic structure of the underlying rings, one can achieve bootstrapping times measured in milliseconds—not hours. This breakthrough, often referred to as “boosted bootstrapping,” has opened the door to practical, real-world FHE applications. In this post, we will dive deep into the mechanics of FHEW/TFHE, explain how boosted bootstrapping works, walk through a concrete implementation, and explore the landscape of applications that are now within reach.


1. The Burden of Noise: Why Bootstrapping Matters

Before we appreciate the elegance of boosted bootstrapping, we must understand the fundamental challenge that every FHE scheme must overcome. Classical public-key encryption (e.g., RSA or ElGamal) is malleable in the sense that multiplying two ciphertexts yields an encryption of the product of the plaintexts. However, this property is fragile: after a single operation, the ciphertext becomes unusable for further computation. Fully homomorphic encryption, on the other hand, supports an arbitrary number of additions and multiplications by introducing a carefully controlled noise term that grows with each operation.

The Noise Budget

In lattice-based FHE (the most practical family today), plaintext messages are encoded as small integers (often bits or elements of a small modulus) and added to a random error (noise) drawn from a Gaussian distribution. The security of the scheme relies on the hardness of the Learning With Errors (LWE) problem: recovering the secret key from noisy dot products is provably as hard as solving worst-case lattice problems.

Each homomorphic operation adds to the noise. For example, adding two ciphertexts doubles the noise variance; multiplying them squares it. After only a few multiplications, the noise exceeds a threshold, and decryption becomes impossible. Bootstrapping is the mechanism that reduces the noise back to a fresh level. It works by homomorphically evaluating the decryption circuit—i.e., taking an encrypted private key and an encrypted ciphertext, and producing a new encryption of the same plaintext but with much lower noise.

Gentry’s Blueprint

Gentry’s original scheme used a somewhat homomorphic encryption (SHE) scheme that could evaluate its own decryption circuit once, then added a “bootstrapping” step to make the scheme fully homomorphic. The clever trick was to include an encrypted version of the secret key in the public key. But the decryption circuit, when implemented over encrypted bits, was huge: it involved multiplication of polynomials of degree thousands. The result was a bootstrapping operation that took minutes or hours on a modern CPU.

Subsequent work improved bootstrapping by reducing the depth of the decryption circuit (e.g., using squashing or relinearization), but the core problem remained: the decryption circuit involved evaluating large-degree polynomials, which required many levels of multiplication. The breakthrough came when researchers realized that by moving from polynomial rings to torus arithmetic and using a tiny plaintext space (just 1 bit), the decryption circuit could be simplified dramatically. This insight gave birth to FHEW and later TFHE.


2. The FHEW and TFHE Revolution

FHEW (published by Ducas and Micciancio in 2015) and TFHE (Chillotti et al., 2016–2017) are essentially the same family: they both encrypt bits over the torus ( \mathbb{T} = \mathbb{R}/\mathbb{Z} ), use a ring variant of LWE (Ring-LWE or Mod-LWE), and perform bootstrapping in under 100 milliseconds for a single gate. TFHE improved upon FHEW by introducing a more efficient key-switching mechanism and a faster blind rotation. Today, TFHE bootstrapping can be done in under 0.1 ms on a GPU, enabling encrypted evaluation of entire neural networks with thousands of gates.

2.1 The Torus and TLWE

The fundamental object in FHEW/TFHE is the Torus Learning With Errors (TLWE) ciphertext. Instead of working over integers modulo a large prime, we work over the continuous torus ( \mathbb{T} = { \text{real numbers mod 1} } ). A plaintext bit ( m \in {0,1} ) is encoded as ( \overline{m} = m \cdot \frac{1}{2} ), i.e., 0 maps to 0.0 and 1 maps to 0.5 on the torus. A TLWE ciphertext is a pair ((a, b)) where [ b = \langle a, \mathbf{s} \rangle + e + \overline{m} \pmod{1} ] Here, ( a ) is a vector of random torus elements, ( \mathbf{s} ) is the secret key (a binary vector), and ( e ) is a small Gaussian noise. Decryption computes ( b - \langle a, \mathbf{s} \rangle \approx \overline{m} ).

Because the noise is small (say, in ([-1/16, 1/16])), the decrypted value is either near 0 or near 0.5, and we round to the nearest encoding. Note that the addition of ciphertexts corresponds to addition of the encoded plaintexts plus added noise. Multiplication is not directly supported; instead, we evaluate Boolean gates (AND, OR, NAND, etc.) via bootstrapping.

2.2 TRLWE and the Blind Rotation

To speed up bootstrapping, FHEW/TFHE uses TRLWE (Torus Ring-LWE) ciphertexts, where the vector ( a ) is replaced by a polynomial ring element. The ring is typically ( R = \mathbb{Z}[X]/(X^N+1) ) with ( N ) a power of two (e.g., 1024). A TRLWE ciphertext encrypts a polynomial ( m(X) ) of degree < N, whose coefficients encode bits.

The core of bootstrapping is the blind rotation: given an LWE ciphertext encrypting a message, and a TRLWE encryption of a lookup table, we compute a new TRLWE ciphertext that encrypts the same message but with fresh noise. The lookup table (or “gadget” polynomial) performs the rounding step: it maps a noisy phase to a clean bit. The blind rotation uses the secret key of the LWE ciphertext to rotate the polynomial such that the desired output appears at a fixed position.

The algorithm works in three phases:

  1. Sample Extraction: Extract the appropriate coefficient from the rotated polynomial to obtain a new LWE ciphertext.
  2. Key Switching: Convert the LWE ciphertext back to the original key if needed.
  3. Noise Reset: The output ciphertext has noise independent of the input noise, effectively resetting the budget.

The beauty of the blind rotation is that it costs only ( O(N) ) operations, where ( N ) is the ring size (typically 1024 or 2048). This is orders of magnitude smaller than the polynomial multiplication in earlier bootstrapping schemes. Furthermore, the whole rotation can be implemented using FFT (Fast Fourier Transform) or NTT (Number Theoretic Transform) for polynomial multiplication, giving a bootstrapping time of a few milliseconds.

2.3 Boosted Bootstrapping: Going Beyond One Bit

The term “boosted bootstrapping” generally refers to any technique that enhances the efficiency or functionality of the FHEW/TFHE bootstrapping procedure. In practice, it covers several important improvements:

  • Multivalue bootstrapping: Instead of bootstrapping single bits, we can bootstrap small integers (e.g., 2–4 bits) by modifying the lookup table to map a range of phases to multiple possible outputs. This reduces the number of gates needed for arithmetic.
  • Programmable bootstrapping: The lookup table can be arbitrary, allowing evaluation of any function of the input (e.g., an entire sub-circuit) during the bootstrapping step itself. This is a form of “function evaluation in the bootstrapping.”
  • Batched bootstrapping: Using packing (polynomial compressing many bits) to bootstrap multiple ciphertexts simultaneously.
  • Hardware acceleration: GPU or FPGA implementations that achieve sub-microsecond bootstrapping.

Let’s look at each of these in more detail.

Multivalue Bootstrapping

Standard TFHE boots a single bit. But many operations (like addition of integers) require multiple bits. One approach is to encrypt each bit separately and use a ripple-carry adder, which costs ( O(\ell) ) bootstraps per addition for ( \ell )-bit integers. Boosted bootstrapping allows us to directly compute on small integers (2–4 bits) by encoding the plaintext as a value in ( {0, \frac{1}{2^k}, \frac{2}{2^k}, \dots, 1-\frac{1}{2^k}} ) and using a lookup table that decodes the whole value. The bootstrapping step then “resets” the noise for the multi-bit value. This reduces the number of bootstrapping operations needed for basic arithmetic from linear in bit width to constant.

Programmable Bootstrapping

The most powerful enhancement is programmable bootstrapping. Normally, the bootstrapping lookup table (a polynomial) is fixed to round to the nearest bit. However, we can set the lookup table to any function ( f : {0,1} \to {0,1} ) (or more generally from a small domain to a small range). During the blind rotation, the output coefficient selected is precisely ( f(m) ). In effect, we evaluate ( f ) for free during the bootstrapping noise refresh. This turns bootstrapping into a “compute-and-refresh” operation. By chaining such operations, we can evaluate any Boolean circuit with a bootstrapping per gate, but each bootstrapping already includes the evaluation of the gate’s truth table. This is how TFHE achieves the ability to evaluate arbitrary functions with same cost as a single bootstrapping.

A natural extension is to evaluate functions of multiple inputs by first XORing or adding ciphertexts (which increases noise) and then applying a careful lookup table that covers combined results. For example, to compute AND of two bits, we can add the two ciphertexts (getting a phase near 0, 0.5, or 1.0) and then bootstrap with a lookup that maps 0 to 0, 0.5 to 0, and 1.0 to 1. This requires a three-region lookup rather than a two-region one, but it’s still feasible.

Batched and Vectorized Bootstrapping

Another line of work uses the fact that a single TRLWE ciphertext can encrypt an entire polynomial, i.e., a vector of plaintext coefficients. By performing the blind rotation simultaneously on all coefficients (using the algebraic structure of the ring), we can bootstrap many LWE ciphertexts at once. This is especially useful for SIMD-style applications like encrypted image processing or matrix multiplication. Modern libraries like TFHE-rs and Concrete implement batching, achieving throughput of millions of bootstraps per second on a CPU.


3. A Concrete Implementation Walkthrough

Let’s ground these concepts in a real example. We’ll implement a simple TFHE-style bootstrapping from scratch (in Python-like pseudo-code) to illustrate the key steps. We’ll then show how boosted bootstrapping deviates.

3.1 Building Blocks

We’ll use a ring ( R = \mathbb{Z}[X]/(X^N+1) ) with ( N=512 ). All operations are performed modulo 1 (torus) using floating-point approximations or integer arithmetic with scaling. In practice, we use a large integer modulus Q (e.g., ( Q=2^{64} )) and represent torus elements as integers in ([0, Q-1]). The noise scale is (1/Q).

Key generation:

  • Secret key ( \mathbf{s} \in {0,1}^n ) (for LWE) and ( \mathbf{sk}_{\text{TRLWE}} \in R ) (for TRLWE, usually a polynomial with small coefficients).
  • Public key: many random TRLWE ciphertexts encrypting 0 (for key-switching material).

LWE encryption of a bit ( m ):

  • Sample random vector ( a \in \mathbb{T}^n ), noise ( e \in \mathbb{T} ) (small).
  • Compute ( b = \langle a, \mathbf{s} \rangle + e + m/2 ).
  • Ciphertext: ( (a, b) ).

TRLWE encryption of a polynomial ( p(X) ):

  • Sample random polynomial ( a \in R ), noise polynomial ( e \in R ) (small coefficients).
  • Compute ( b = a \cdot \mathbf{sk}_{\text{TRLWE}} + e + p ).
  • Ciphertext: ( (a, b) ).

3.2 Blind Rotation: Pseudo-code

The blind rotation takes as input:

  • An LWE ciphertext ( (a, b) ) encrypting bit ( m ).
  • A TRLWE ciphertext ( (A, B) ) that encrypts a “test polynomial” ( T(X) ). Typically ( T(X) ) encodes the lookup table: all coefficients equal to 0 except the one corresponding to the plaintext value. For standard bootstrapping, we set ( T(X) = \sum_{i=0}^{N-1} (0.5) X^i ) (so all entries are 0.5). Then after rotation, the constant coefficient will be 0.5 if the phase rounds to ( m=1 ), else 0.

The algorithm:

  1. Compute the “phase” ( \phi = b - \langle a, \mathbf{s} \rangle ) approximately (but we don’t know ( \mathbf{s} ) directly; we use a bootstrapping key: a set of encryptions of each bit of ( \mathbf{s} )).
  2. For each coefficient of ( \phi ), we compute the rotation factor using a clever decomposition.

The actual procedure in TFHE uses a “CMUX” (controlled multiplexer) operation repeatedly. It leverages the fact that for each bit of the LWE secret key, we have a TRLWE encryption of that bit (the “bootstrapping key”). Then we use the following loop:

def blind_rotate(accumulator: TRLWE, LWE_ciphertext: (a, b), boot_key: list[TRLWE]):
    # accumulator initially holds test polynomial T(X) encrypted
    # We want to rotate accumulator by an amount derived from phase
    # Phase = b + sum a_i * s_i (mod 1), scaled to integer in [0, N-1]
    # We process bits of the scaling factor from most significant to least
    phase_int = round(b * N)  # approx integer representation of b part
    for i in range(n):
        # bit_i of s_i is encrypted in boot_key[i]
        # We need to multiply accumulator by X^{a_i * N} if s_i=1 else identity
        # Use CMUX: output = (s_i ? (rotated_accumulator) : accumulator)
        rotated = rotate_polynomial(accumulator, a_i * N // 2)   # approximate
        accumulator = cmux(boot_key[i], rotated, accumulator)
    # Final rotation by phase_int
    accumulator = rotate_polynomial(accumulator, phase_int)
    return accumulator

The CMUX operation evaluates a linear combination: accumulator = (1 - boot_key[i]) * accumulator + boot_key[i] * rotated. This can be done homomorphically because boot_key[i] is a TRLWE ciphertext encrypting the secret key bit. The product boot_key[i] * rotated is performed with a key-switch and external product, which is expensive but the number of CMUX calls is the dimension of the LWE secret (e.g., 500). The key-switching and external product are the main cost.

After the blind rotation, we extract a noisy LWE ciphertext from the constant coefficient of the final accumulator. That coefficient is approximately ( T[m] ), i.e., the output of the lookup table.

3.3 From Standard to Boosted Bootstrapping

The above algorithm refreshes a bit and evaluates a single gate. To achieve programmable bootstrapping, we simply set the test polynomial ( T(X) ) to encode an arbitrary function ( f ). For example, to compute NAND, we set [ T(X) = \begin{cases} 1/2 & \text{if coefficient index corresponds to } m=0 \ 0 & \text{for } m=1 \end{cases} ] But we need to handle two inputs. For two-bit operations, we first add two LWE ciphertexts to form an LWE ciphertext whose phase approximates ( (m_1 + m_2)/2 ) (mod 1). Then we run the blind rotation with a test polynomial that maps:

  • Phase near 0 → 0
  • Phase near 0.5 → 0
  • Phase near 1.0 → 1 (since 1/2 + 1/2 = 1 → 0.5 mod 1? Wait careful: If m1=1,m2=1, phase = (1+1)/2 = 1 mod 1 = 0? Actually addition of two ciphertexts gives phase = (m1 + m2)/2 mod 1. For two bits, m1,m2 in {0,1}: sum in {0,1,2}. Phase in {0,0.5,1} mod 1 → 0,0.5,0. So phase 0 corresponds to both 0 or both 1. We need to distinguish these. So we need to use three possible phases: 0, 0.5, and 0 again (but the last wraps). So we need to encode the function using two adjacent coefficients. This is possible but more complex.

Boosted bootstrapping for multi-bit values follows a similar idea: we encode a multi-bit integer into the torus as ( m/2^k ). The blind rotation then rotates by an integer corresponding to the whole phase, and the lookup table can decode the multiple bits. For example, for 2-bit plaintexts, we encode 0,1,2,3 as 0, 1/4, 1/2, 3/4. Then a single bootstrapping can perform a table lookup on a 2-bit input and produce a 2-bit output. This reduces the number of required bootstrapping operations per arithmetic operation significantly.

3.4 Optimizing the CMUX: FFT and NTT

The bottleneck in the blind rotation is the external product between a TRLWE ciphertext (the accumulator) and a TRLWE ciphertext (the bootstrapping key). Each external product requires polynomial multiplications, which are ( O(N \log N) ) using FFT. Modern implementations use the Number Theoretic Transform (NTT) over a modulus that supports roots of unity. By precomputing the NTT of the bootstrapping key polynomials, the external product becomes a pointwise multiplication in the frequency domain. Additionally, each CMUX involves two external products (for ( s_i \cdot \text{rotated} ) and ( (1-s_i) \cdot \text{accumulator} )), so each CMUX costs about 2 polynomial multiplications.

With ( n \approx 500 ) and ( N=1024 ), a bootstrapping involves ~1000 polynomial multiplications. Modern CPUs can do each NTT-based multiplication in ~1 μs, leading to ~1 ms bootstrapping. GPUs can do it faster.


4. Performance Benchmarks and Real-World Results

How fast is TFHE with boosted bootstrapping today? Let’s look at published results:

SchemeBootstrapping Time (CPU, single core)Notes
Original FHEW (2015)~66 msBit gate
TFHE (2016)~13 ms
TFHE-rs (2023)~0.4 ms (with AVX2)
Concrete (Zama)~0.1 ms (GPU)Batched
CuFHE (GPU)~0.02 ms per gateBatched

To put these numbers in perspective: a standard RSA encryption takes ~1 ms on a modern CPU. A single TFHE bootstrapping now takes less time than an asymmetric encryption. This means that evaluating an entire encrypted search on a database of 10,000 records (requiring, say, 10,000 AND gates) would take around 4 seconds on CPU or 0.2 seconds on GPU. That’s still slower than plaintext, but it’s in the realm of practicality for many use cases.

Boosted bootstrapping improves these numbers even further when considering multi-bit operations. For example, a 4-bit addition using programmable bootstrapping can be done in a single gate operation (with a larger table), achieving an effective cost of 0.1 ms per addition (CPU). This is competitive with early homomorphic encryption schemes that took seconds for a single addition.

4.1 A Simple Benchmark: Encrypted XOR

Let’s implement an encrypted XOR using two bootstraps (one for each input bit after combining). We’ll use a library like tfhe-rs (Rust) or concrete (Python). The code below is illustrative:

from concrete import fhe

@fhe.compiler({"x": "encrypted", "y": "encrypted"})
def xor(x: int, y: int) -> int:
    return x ^ y

inputset = [(0,0), (0,1), (1,0), (1,1)]
circuit = xor.compile(inputset)

# Generate keys
keys = circuit.keygen()
# Encrypt
cipher_x = circuit.encrypt(1)
cipher_y = circuit.encrypt(0)
# Evaluate
result = circuit.run(cipher_x, cipher_y)
# Decrypt
print(circuit.decrypt(result))  # Output: 1

Under the hood, concrete uses TFHE with programmable bootstrapping. The XOR is compiled into a small circuit where the final bootstrapping performs the XOR operation. The evaluation time is around 0.01 seconds on a modern laptop.

5. Applications: Where Boosted FHE Shines

The ability to bootstrap a gate in under a millisecond opens up many applications that were previously intractable:

5.1 Private Information Retrieval (PIR)

PIR allows a client to retrieve a record from a server’s database without revealing which record. Using TFHE, we can evaluate a circuit that for each possible index computes a blind selection. For a database with 2^20 entries, a naive PIR would require 2^20 bootstraps, which is too slow. However, with batched and programmable bootstrapping, we can compress the selection into a tree of gates. Recent results show that PIR with TFHE can achieve sub-second response times for databases of size 10^6.

5.2 Encrypted Neural Networks

Deep learning inference over encrypted data has been a dream for privacy-preserving AI. TFHE’s gate-level bootstrapping allows evaluating any activation function (ReLU, sigmoid) as a lookup table. For example, a small neural network with two hidden layers of 128 neurons each can be evaluated in under a second on a GPU. Companies like Zama offer frameworks for encrypted inference using TFHE.

5.3 Zero-Knowledge Proofs and Verifiable Computation

The bootstrapping circuit itself can be used to construct efficient zero-knowledge proofs. By proving that a bootstrapping was performed correctly, one can achieve succinct proofs for arbitrary computations. This is an active area of research.

5.4 Secure Aggregation in Federated Learning

When multiple parties want to aggregate their model updates without revealing individual updates, TFHE can be used to compute the sum of encrypted vectors. The boosted bootstrapping ensures that noise does not accumulate over many additions, allowing hundreds of parties to participate.


6. Challenges and Ongoing Research

Despite the dramatic progress, TFHE still faces obstacles:

  • Bootstrapping Key Size: The bootstrapping key (encrypted secret key bits) can be several megabytes. For resource-constrained clients (e.g., mobile phones), sending this key over the network is expensive. Research into key compression (e.g., using ring packing) reduces the size.
  • Noise Growth in Circuits: While bootstrapping resets noise, each gate still costs a bootstrapping. For circuits with millions of gates, the overhead remains substantial.
  • Arithmetic vs. Boolean: For arithmetic operations (additions, multiplications of integers), Boolean circuits are wasteful. There is ongoing work to combine TFHE with BGV/BFV-style schemes for larger plaintext spaces, using hybrid approaches.
  • Quantum Resistance: Lattice-based FHE is believed to be quantum-resistant, but a practical quantum computer would break many current assumptions. Post-quantum security is a moving target.

Boosted bootstrapping itself continues to evolve. Recent papers introduce:

  • Multi-key bootstrapping: allowing computations on data encrypted under different keys.
  • Faster external products using polynomial rings with modulus-switching and residue number system (RNS).
  • Hardware designs for FPGAs that achieve 1000 bootstraps per second per device.

7. Conclusion: The Encrypted Oracle Becomes Reality

We began with the metaphor of a locked box that can be manipulated without opening. For years, that was science fiction. Today, with TFHE and boosted bootstrapping, it is engineering. The numbers speak for themselves: bootstrapping a single bit gate in under 0.1 ms on a consumer GPU; performing encrypted inference on small neural networks in seconds; retrieving a database record without revealing which one. The cloud server truly does not need to see your data.

But the journey is far from over. The holy grail now is to make FHE as fast as plaintext. For that, we need algorithmic breakthroughs (like the ideal lattice-based bootstrapping) and hardware acceleration at scale. The research community is working on specialized ASICs for FHE, similar to the TPUs for machine learning. If such hardware becomes widely available, the encrypted oracle will sit in every data center, enabling a new era of privacy-by-default computing.

Until then, we can already build small but meaningful applications: private health data analysis, confidential financial audits, and secure voting systems. The technology is mature enough to move from academic papers to prototypes and pilot projects. As you explore implementing FHEW/TFHE with boosted bootstrapping, remember: each line of code you write is a brick in the wall that separates our digital lives from prying eyes. It is cryptography at its best—enabling trust without vulnerability.

Now go ahead, and start building your own encrypted oracle.


Want to dive deeper? Check out the open-source libraries:

For theoretical background, read the original papers:

  • Ducas, Micciancio: “FHEW: Bootstrapping Homomorphic Encryption in Less Than a Second” (2015)
  • Chillotti et al.: “TFHE: Fast Fully Homomorphic Encryption over the Torus” (2017)
  • Bonneau et al.: “Programmable Bootstrapping Enables Efficient Homomorphic Inference” (2021)

Until next time, keep your data locked, but your mind open.