batch inversion specification

invert N tower field elements simultaneously using Montgomery's trick. cost: 1 inversion + 3(N-1) multiplications. the algorithm applies identically at every tower level.

algorithm

batch_inv(a[0..N]):
  // phase 1: accumulate prefix products
  prefix[0] = a[0]
  for i in 1..N:
    prefix[i] = prefix[i-1] * a[i]

  // phase 2: invert the product of all elements
  inv_all = inv(prefix[N-1])                    // single inversion

  // phase 3: propagate inverses backward
  for i in (N-1) down to 1:
    result[i] = inv_all * prefix[i-1]           // a[i]^{-1}
    inv_all = inv_all * a[i]                    // update running inverse
  result[0] = inv_all

  return result

correctness

after phase 1: prefix[i] = a[0] * a[1] * ... * a[i].

after phase 2: inv_all = (a[0] * a[1] * ... * a[N-1])^{-1}.

in phase 3, at each step i (from N-1 down to 1):

result[i] = inv_all * prefix[i-1] = (prod_{j=0}^{N-1} a[j])^{-1} * (prod_{j=0}^{i-1} a[j]) = (prod_{j=i}^{N-1} a[j])^{-1} * (prod_{j=i+1}^{N-1} a[j])^{-1}... more precisely:
at the start of iteration i, inv_all = (prod_{j=0}^{i} a[j])^{-1} (after update)
result[i] = inv_all * prefix[i-1] = (prod_{j=0}^{i} a[j])^{-1} * (prod_{j=0}^{i-1} a[j]) = a[i]^{-1}

verification: a[i] * result[i] = 1 for all i.

cost analysis

operation	count
multiplications (phase 1)	N - 1
inversions (phase 2)	1
multiplications (phase 3)	2(N - 1)
total multiplications	3(N - 1)
total inversions	1

comparison with individual inversions

the cost of a single tower-recursive inversion at level k depends on the level. let I(k) denote this cost in terms of base-level operations. for F₂¹²⁸ (level 7), I(7) is substantial. batch inversion amortizes it:

N individual inversions: N * I(k)
batch inversion:         I(k) + 3(N-1) * M(k)

where M(k) is the multiplication cost at level k. since I(k) >> M(k) at every level, batch wins for N >= 2.

N	individual (units of I(k))	batch (units of M(k))	speedup (assuming I(k) = 50*M(k))
1	1 I(k) = 50 M(k)	1 I(k) = 50 M(k)	1x
10	500 M(k)	50 + 27 = 77 M(k)	6.5x
100	5000 M(k)	50 + 297 = 347 M(k)	14.4x
1000	50000 M(k)	50 + 2997 = 3047 M(k)	16.4x

binary field specifics

in characteristic 2, addition = subtraction = XOR. the algorithm is identical to the prime field version. the only difference is that the underlying mul and inv operations use tower arithmetic (Karatsuba and tower-recursive inversion) instead of modular arithmetic.

zero handling

if any a[i] = 0, the product prefix[N-1] = 0 and inv(0) is undefined. two strategies:

1. caller guarantees nonzero -- simplest. appropriate when inputs are known nonzero (e.g., evaluation domain points for FRI).

2. skip zeros -- replace zero elements with 1 in the prefix product:

batch_inv_safe(a[0..N]):
  is_zero[i] = (a[i] == 0)
  a'[i] = if is_zero[i] then ONE else a[i]
  result = batch_inv(a')
  for i in 0..N:
    if is_zero[i]: result[i] = ZERO
  return result

the definition 0^{-1} = 0 is non-standard but useful for batch operations where sparse zeros occur.

usage in binary proving

system	use case	typical N
binary PCS (zheng)	FRI evaluation denominators	2^16 - 2^20
binary STARK prover	constraint evaluation	2^18 - 2^24
polynomial interpolation	Lagrange basis denominators	varies

Dimensions

batch

nebu/specs/batch

batch inversion specification invert N field elements simultaneously using Montgomery's trick. cost: 1 inversion + 3(N−1) multiplications — amortized cost per element approaches 3 multiplications as N grows. algorithm correctness after phase 1: `prefix[i] = a[0] · a[1] · ... · a[i]`. after phase 2:…

kuro/specs/batch.md