ntt — negacyclic number theoretic transform

the NTT converts between coefficient form and evaluation form in R_q = F_p[x]/(x^n+1). multiplication in evaluation form is pointwise — n independent F_p multiplications instead of O(n²) convolution.

the problem

naive polynomial multiplication in R_q requires convolving two length-n coefficient vectors modulo x^n+1. cost: O(n²) F_p multiplications. at n = 1024, that is ~1 million operations.

the NTT reduces this to O(n log n) — roughly 3n total F_p multiplications for a full multiply (NTT + pointwise + INTT).

negacyclic vs standard NTT

the standard NTT computes evaluation at n-th roots of unity: the points ω^0, ω^1, ..., ω^{n-1} where ω^n = 1. this gives polynomial multiplication modulo x^n - 1 (cyclic convolution).

jali needs modular reduction by x^n + 1 (negacyclic convolution). the negacyclic NTT evaluates at the twisted points: ψ^1, ψ^3, ψ^5, ..., ψ^{2n-1} where ψ is a primitive 2n-th root of unity (ψ^{2n} = 1, ψ^n = -1).

the twisting trick

instead of implementing a separate negacyclic NTT from scratch, apply a pre-processing twist:

forward (coefficient → NTT):
  1. multiply coefficient a[i] by ψ^i        (twist)
  2. apply standard radix-2 NTT of length n  (butterfly network)

inverse (NTT → coefficient):
  1. apply standard radix-2 INTT of length n
  2. multiply coefficient a[i] by ψ^{-i}     (untwist)

the twist absorbs the x^n + 1 reduction into the root selection. after twisting, standard NTT butterflies produce the correct negacyclic result.

root of unity selection

Goldilocks p = 2^64 - 2^32 + 1 has multiplicative group of order p - 1 = 2^64 - 2^32 = 2^32 * (2^32 - 1).

the 2-adic valuation of p - 1 is 32: the largest power of 2 dividing p - 1 is 2^32. this means Goldilocks has a primitive 2^32-th root of unity.

g = generator of F_p*
ω_{2^k} = g^{(p-1)/2^k}         k-th root of unity (2^k-th)

for n = 1024:  ψ = ω_{2048} = g^{(p-1)/2048}     2n-th root
for n = 2048:  ψ = ω_{4096} = g^{(p-1)/4096}     2n-th root
for n = 4096:  ψ = ω_{8192} = g^{(p-1)/8192}     2n-th root

all three parameter sets require 2n | 2^32, which holds since 2n <= 8192 = 2^13 << 2^32. Goldilocks has abundant room for negacyclic NTT at these degrees.

twiddle table

precompute and store:

twist[i] = ψ^i                for i in 0..n       (twisting factors)
twiddle[j] = ω_n^{bit_rev(j)} for j in 0..n/2     (butterfly twiddle factors)
inv_twist[i] = ψ^{-i}         for i in 0..n       (untwisting factors)

total storage: 2.5n Goldilocks elements. at n = 4096 this is 80 KiB — fits in L1 cache.

cost model

operation              F_p muls    F_p adds    total (n = 1024)
─────────────────────  ────────    ────────    ─────────────────
twist (pre-multiply)   n           0           1024
NTT butterflies        n/2 log n   n log n     5120 + 10240
pointwise multiply     n           0           1024
INTT butterflies       n/2 log n   n log n     5120 + 10240
untwist + scale        n           0           1024

full ring multiply     ~3n muls                ~3072

the "3n muls" approximation counts the dominant cost. adds are cheaper and typically overlapped with muls in the pipeline.

in-place butterfly

the radix-2 DIT (decimation-in-time) butterfly:

butterfly(a, b, w):
  t = b * w
  (a, b) = (a + t, a - t)

one F_p multiply, two F_p adds. the entire NTT is n/2 * log₂(n) butterflies applied in bit-reversal order.

see also

  • ring — the RingElement type and operations that use NTT
  • encoding — serialization of NTT-form elements
  • automorphism — automorphisms are permutations in NTT form

Dimensions

ntt
Number Theoretic Transform: a discrete Fourier transform over a finite field, enabling fast polynomial multiplication in stark proof systems over the goldilocks field NTT replaces complex roots of unity with primitive roots in a prime field, achieving O(n log n) polynomial multiplication with exact…
nebu/specs/ntt
NTT specification the Number Theoretic Transform over the Goldilocks field. the algebraic engine behind STARK proving, polynomial arithmetic, and TFHE operations in cyber. roots of unity the multiplicative group F_p* has order p − 1 = 2³² × (2³² − 1). the factor 2³² provides a principal 2³²-th root…

Local Graph