inversion specification

computing a^{-1} in GF(2^n) for each tower level. two approaches: Fermat exponentiation and tower-recursive norm-based inversion.

Fermat's little theorem

in GF(2^n), every nonzero element satisfies a^{2^n - 1} = 1. therefore:

a^{-1} = a^{2^n - 2}

the exponent 2^n - 2 has a binary representation of n-1 ones followed by a zero:

2^n - 2 = (1111...10)_2    (n-1 ones, then 0)

naive square-and-multiply

inv(a):
  t = a
  for i in (n-2) down to 1:
    t = t²
    t = t * a
  t = t²                   // final squaring (bit 0 of exponent is 0)
  return t

cost: (n-1) squarings + (n-2) multiplications. for n = 128: 127 squarings + 126 multiplications.

addition chain optimization

the exponent 2^n - 2 = 2 * (2^{n-1} - 1). computing a^{2^k - 1} has a doubling structure:

a^{2^1 - 1}  = a
a^{2^2 - 1}  = (a^{2^1 - 1})^{2^1} * a^{2^1 - 1}
a^{2^4 - 1}  = (a^{2^2 - 1})^{2^2} * a^{2^2 - 1}
a^{2^8 - 1}  = (a^{2^4 - 1})^{2^4} * a^{2^4 - 1}
...
a^{2^{2^k} - 1} = (a^{2^{2^{k-1}} - 1})^{2^{2^{k-1}}} * a^{2^{2^{k-1}} - 1}

then a^{-1} = (a^{2^n - 1})^{...} adjusted. the chain for each tower level:

GF(2^2) -- F₂²

exponent = 2^2 - 2 = 2
a^{-1} = a²

cost: 1 squaring.

GF(2^4) -- F₂⁴

e1 = a                        // a^{2^1 - 1}
e2 = e1^{2^1} * e1            // a^{2^2 - 1} = a^3
a^{-1} = e2^{2^1} * e1        // a^{2^2 - 2} * a^0 ...
       = (a^3)^2 = a^6        // but 2^4 - 2 = 14 = 1110_2

full chain for 2^4 - 2 = 14:

e1 = a                         // a^1
e2 = e1^2 * e1                 // a^3
e4 = e2^4 * e2                 // a^15... no, use:
t  = a^2                       // a^2
t  = t * a                     // a^3
t  = t^4                       // a^12
t  = t * a^2                   // a^14

cost: 3 squarings + 2 multiplications.

GF(2^128) -- F₂¹²⁸

e1   = a                                // a^(2^1 - 1)
e2   = e1^(2^1) * e1                    // a^(2^2 - 1)       1 sq + 1 mul
e4   = e2^(2^2) * e2                    // a^(2^4 - 1)       2 sq + 1 mul
e8   = e4^(2^4) * e4                    // a^(2^8 - 1)       4 sq + 1 mul
e16  = e8^(2^8) * e8                    // a^(2^16 - 1)      8 sq + 1 mul
e32  = e16^(2^16) * e16                 // a^(2^32 - 1)     16 sq + 1 mul
e64  = e32^(2^32) * e32                 // a^(2^64 - 1)     32 sq + 1 mul
e128 = e64^(2^64) * e64                 // a^(2^128 - 1)    64 sq + 1 mul

a^{-1} = e128 * a^{-1}...

more precisely: a^{-1} = a^{2^128 - 2} = (a^{2^127 - 1})^2. so:

e127 = e64^(2^63) * ...       // build a^(2^127 - 1) from the chain
a^{-1} = e127^2               // one final squaring

the chain for 2^127 - 1:

e1   = a                               1 sq,  0 mul  (total: 0 sq, 0 mul)
e2   = e1^2 * e1                                      (total: 1 sq, 1 mul)
e4   = e2^(2^2) * e2                                  (total: 3 sq, 2 mul)
e8   = e4^(2^4) * e4                                  (total: 7 sq, 3 mul)
e16  = e8^(2^8) * e8                                  (total: 15 sq, 4 mul)
e32  = e16^(2^16) * e16                               (total: 31 sq, 5 mul)
e64  = e32^(2^32) * e32                               (total: 63 sq, 6 mul)
e127 = e64^(2^63) * e64^{...}                         (build from subchains)

total for 2^128 - 2: 127 squarings + 7 multiplications.

tower-recursive inversion

the tower structure enables a recursive formula that reduces inversion in F_{2^{2k}} to inversion in F_{2^k}:

inv(a) where a = a_lo + a_hi * x in F_{2^{2k}}:
  // compute norm: N(a) = a_lo * (a_lo + a_hi) + a_hi² * alpha
  d = a_lo * (a_lo + a_hi) + sq(a_hi) * alpha     // d in F_{2^k}
  d_inv = inv(d)                                    // recursive call

  // a^{-1} = (a_lo + a_hi + a_hi * x) * d^{-1}
  result_lo = (a_lo + a_hi) * d_inv
  result_hi = a_hi * d_inv
  return result_lo + result_hi * x

cost recurrence

let I(k) = cost of inversion in F_{2^{2^k}}, M(k) = cost of multiplication:

I(0) = 0                          (F₂: trivial)
I(k) = I(k-1) + 3*M(k-1) + S(k-1) + 2*A(k-1)

where S(k) is squaring cost and A(k) is addition cost (XOR, negligible).

level	field	muls in subfield	inv in subfield	total subfield ops
1	F₂²	3 F₂ muls	0	3 AND + 2 XOR
2	F₂⁴	3 F₂² muls	1 F₂² inv	3 F₂² mul + 1 F₂² inv
3	F₂⁸	3 F₂⁴ muls	1 F₂⁴ inv	3 F₂⁴ mul + 1 F₂⁴ inv
7	F₂¹²⁸	3 F₂⁶⁴ muls	1 F₂⁶⁴ inv	3 F₂⁶⁴ mul + 1 F₂⁶⁴ inv

the recursion bottoms out at F₂ where inv(1) = 1 (and inv(0) is undefined).

zero handling

zero has no inverse. the implementation must either:

assert/panic on zero input (debug builds)
return an arbitrary value and let the caller guarantee nonzero input (release builds)

see batch for inverting many elements with zero handling.

Dimensions

kuro/docs/explanation/inversion

inversion in binary fields inversion computes a⁻¹ such that a · a⁻¹ = 1. it is the most expensive single operation in any finite field — roughly 64× the cost of multiplication. understanding how inversion works in the binary tower reveals deep structure: Fermat's theorem, the Frobenius…

kuro/specs/inversion.md