binius PCS — binary-native polynomial commitment

a second PCS backend for zheng operating over F₂ tower fields. the IOP layer (SuperSpartan + sumcheck + HyperNova) stays field-generic. only the polynomial commitment scheme changes. hemera remains the only hash — used externally for Merkle commitment and Fiat-Shamir, never proved inside binary circuits.

motivation

nox bitwise patterns (xor, and, not, shl, lt) cost 32-64 STARK constraints each in F_p because bit decomposition is expensive in a prime field. in F₂, they cost 1 constraint each. the 32× gap is not a design flaw — it is the honest algebraic distance between F_p and F₂.

two workloads dominate the binary regime:

quantized AI inference: BitNet-style 1-bit models. matrix-vector multiply = XOR + popcount. a 4096×4096 layer = ~16M binary constraints (vs ~512M F_p constraints)
tri-kernel SpMV: quantized axon weights for π iteration over the cybergraph. each of 5-8 iterations is a massive binary workload

architecture

zheng
├── IOP:          SuperSpartan + sumcheck     (field-generic, shared)
├── Composition:  HyperNova folding           (field-generic, shared)
├── Hash:         hemera                      (one hash, universal)
├── PCS₁:         WHIR (Goldilocks)           (existing)
└── PCS₂:         Binius (F₂ tower)           (this proposal)

zheng exposes a PCS trait. WHIR and Binius both implement it:

trait PCS<F: Field> {
    fn commit(poly: &MultilinearPoly<F>) -> Commitment;
    fn open(poly: &MultilinearPoly<F>, point: &[F]) -> Opening;
    fn verify(commitment: &Commitment, point: &[F], value: F, proof: &Opening) -> bool;
}

Binius PCS specifics

binary tower fields

F₂ → F₂² → F₂⁴ → F₂⁸ → F₂¹⁶ → F₂³² → F₂⁶⁴ → F₂¹²⁸

each extension defined by irreducible polynomial over the previous level. tower structure enables:

128 F₂ elements packed in one u128 machine word
SIMD-native operations (64× data parallelism vs Goldilocks)
Karatsuba multiplication over tower levels

commitment

polynomial evaluations arranged in a matrix. rows packed into machine words. hemera Merkle tree over packed rows.

commit(poly over F₂):
  1. arrange evaluations in √n × √n matrix
  2. pack each row into machine words (128 F₂ elements per u128)
  3. Reed-Solomon extend rows over F₂¹²⁸
  4. build hemera Merkle tree over extended rows
  5. root = hemera hash (same as WHIR commitment)

hemera operates on bytes — it doesn't care that the bytes represent binary field elements. same hemera::hash_node, same tree primitives, same 32-byte root.

opening

Binius folding: each round halves the polynomial by combining rows using a random F₂¹²⁸ challenge.

open(poly, point):
  for each folding round:
    squeeze challenge from hemera Fiat-Shamir transcript
    fold polynomial (halve rows)
    commit folded polynomial (hemera Merkle)
  return final value + all round commitments + Merkle auth paths

verification

verify(commitment, point, value, proof):
  replay Fiat-Shamir (hemera sponge)
  check folding consistency (F₂ field arithmetic)
  check Merkle auth paths (hemera)
  check final evaluation

the verifier uses hemera for hashing and F₂ for field ops. no binary-native hash needed.

cross-algebra composition

a mixed nox program (field + bitwise) partitions into sub-traces:

nox<F_p> sub-trace → zheng<WHIR> proof
nox<F₂>  sub-trace → zheng<Binius> proof
                           ↓
            HyperNova fold into shared F_p accumulator
            cost: ~30 F_p field ops + 1 hemera hash per fold

universal CCS with selectors enables heterogeneous folding:

universal_ccs = {
  sel_Fp:  1 for Goldilocks rows, 0 otherwise
  sel_F2:  1 for binary rows, 0 otherwise
}

one accumulator covers both algebras. the decider runs once in F_p (hemera-native).

recursion boundary

binary proofs are NOT verified inside binary circuits. verification crosses to Goldilocks:

binary execution → binary proof (hemera Merkle + FS externally)
                        ↓
    verify in F_p circuit (hemera native, ~70K constraints with jets)
                        ↓
    fold into F_p accumulator

hemera inside F₂ circuit: ~142K binary constraints per hash (simulating Goldilocks mul in bits). unacceptable for recursion. hemera inside F_p circuit: ~736 constraints (hemera-2). the recursion boundary is Goldilocks.

kuro dependency

a new repo kuro (黒) provides F₂ tower arithmetic:

F₂ base field operations (XOR = add, AND = mul)
tower extension arithmetic (F₂² through F₂¹²⁸)
packed SIMD operations (128 elements per u128)
no hemera dependency, no nebu dependency
pure binary algebra

zheng depends on kuro for the Binius PCS backend. hemera remains the only hash throughout.

cost comparison

operation	F_p (WHIR)	F₂ (Binius)	ratio
AND	~32 constraints	1 constraint	32×
XOR	~32 constraints	1 constraint	32×
lt (comparison)	~64 constraints	~1 constraint	64×
field multiply	1 constraint	~64 constraints	0.016×
hemera hash	~736 constraints	N/A (external)	—

binary wins for bitwise. Goldilocks wins for arithmetic. the prover chooses based on workload.

open questions

packed trace layout: nox trace spec (16 registers × 2ⁿ rows) needs binary variant with 128-bit packing
cross-algebra value transfer: quantize/dequantize at F_p ↔ F₂ boundary — concrete constraint cost
optimal partition granularity: how large should binary sub-traces be before boundary overhead dominates
kuro SIMD strategy: AVX-512 (512 elements), AVX2 (256 elements), or portable u128

see algebra-polymorphism for nox instantiation model, zheng-2 for unified architecture, proof-horizons for composition strategy

Dimensions

binius-pcs

zheng/reference/props/binius-pcs.md