zheng-2
the next-generation proof architecture for cyber. two PCS backends (WHIR/Brakedown for Goldilocks, Binius for F₂) over one field-generic IOP (SuperSpartan + sumcheck + HyperNova). hemera is the only hash. 14 nox languages map to 2 provers. cross-algebra composition via universal CCS folding.
targets
zheng-1 (Whirlaway) zheng-2 + hemera-2 improvement source
proof size (128-bit): 157 KiB 1-5 KiB 30-150× algebraic-extraction → brakedown
verification: 1.0 ms 10-50 μs 20-100× gravity-commitment
recursive step: 70K constraints 30 field ops 2,300× folding-first
prover memory: O(N) O(√N) √N × tensor-compression
prover time: O(N log N) O(N) streaming log N × tensor-compression
proving latency: separate step zero ∞ × proof-carrying
GPU throughput: CPU only 45-100× on commodity 45-100× gpu-prover
binary workloads: 32-64× overhead native (1 constraint) 32-64× binius-pcs
FHE bootstrapping: catastrophic native (ring-aware) orders of mag ring-aware-fhe
light client: full chain replay 200 bytes, one verify ∞ × universal-accumulator
per-cyberlink (bbg): ~94K constraints ~3K constraints 30× algebraic-nmt
architecture
zheng-2
├── IOP layer (field-generic, shared)
│ ├── SuperSpartan CCS constraint system
│ ├── sumcheck exponential sum reduction
│ └── HyperNova folding + composition
│
├── PCS layer (field-specific, two backends)
│ ├── WHIR (Goldilocks) Reed-Solomon + Merkle (zheng-1 compatible)
│ ├── Brakedown (Goldilocks) expander-graph codes, Merkle-free (target)
│ └── Binius (F₂ tower) binary Reed-Solomon with packing
│
├── hash layer
│ └── hemera one hash, universal
│ Merkle + Fiat-Shamir for all PCS backends
│ never proved inside binary circuits
│
├── constraint encodings (per-language)
│ ├── generic AIR nox patterns 0-16
│ ├── ring-structured R_q operations (Wav, FHE)
│ └── binary F₂ patterns (Bt)
│
├── jet libraries (per-language)
│ ├── verifier jets hash, poly_eval, merkle_verify, fri_fold, ntt
│ ├── fhe_bootstrap jets ntt_batch, key_switch, gadget_decomp, noise_track
│ └── per-language jets 14 languages × dedicated jet sets
│
└── composition
└── universal accumulator folds ALL proof types into one constant-size object
two PCS backends
Goldilocks path
12 of 14 nox languages encode as F_p constraints:
| language | algebra | constraint type |
|---|---|---|
| Tri | F_{p^n} field tower | native field arithmetic |
| Tok | UTXO conservation | field arithmetic + hemera membership |
| Arc | category theory | tree traversal + hemera hashing |
| Seq | partial order | structural comparisons |
| Inf | Horn clauses | unification + resolution |
| Bel | g on Δ^n | fixed-point Bayes |
| Ren | G(p,q,r) shapes | fixed-point geometry |
| Dif | (M, g) manifolds | discretized derivatives |
| Sym | (M, ω) dynamics | Hamiltonian integration |
| Wav | R_q convolution | NTT multiply (ring-aware jets) |
| Ten | contraction (full-precision) | matrix multiply |
| Rs | Z/2^n words (arithmetic-heavy) | word arithmetic + range checks |
PCS options:
- WHIR (near-term): Reed-Solomon + hemera Merkle. proof size 60-157 KiB → 5-12 KiB with algebraic extraction
- Brakedown (target): expander-graph encoding, Merkle-free. proof size ~1-5 KiB. shifts bottleneck from hemera to nebu
binary path
2 of 14 languages encode as F₂ constraints:
| language | algebra | constraint type |
|---|---|---|
| Bt | F₂ tower | native binary (XOR=add, AND=mul) |
| Ten | contraction (quantized) | 1-4 bit matrix multiply |
PCS: Binius over F₂ tower fields with hemera Merkle commitment externally.
Rs (words) can go either way depending on workload — compiler decides.
why two, not one
| operation | F_p cost | F₂ cost | ratio |
|---|---|---|---|
| AND | ~32 constraints | 1 constraint | 32× |
| XOR | ~32 constraints | 1 constraint | 32× |
| lt | ~64 constraints | ~1 constraint | 64× |
| field multiply | 1 constraint | ~64 constraints | 0.016× |
binary wins 32-64× for bitwise. Goldilocks wins 64× for arithmetic. one PCS cannot serve both efficiently.
hemera: one hash
hemera is the ONLY hash in the entire stack. both PCS backends use it:
- WHIR: hemera Merkle tree over Goldilocks polynomial evaluations
- Brakedown: hemera not needed for commitment (field-ops only), used for Fiat-Shamir
- Binius: hemera Merkle tree over packed binary evaluations (bytes in, bytes out — field-agnostic)
- Fiat-Shamir: hemera sponge for all backends. challenges squeezed as bytes, interpreted as target field elements
recursion boundary: binary proofs are NEVER verified inside binary circuits. verification crosses to Goldilocks where hemera is native (~736 constraints per permutation with hemera-2). the recursion topology:
binary execution → binary proof (hemera external)
↓
verify in F_p circuit (hemera native, ~736 constraints)
↓
fold into universal accumulator (Goldilocks)
hemera-2 contribution
hemera-2 (32-byte output, 24 rounds, x⁻¹ partial rounds):
hemera-1 hemera-2 improvement
rounds per hash: 72 24 3×
constraints/perm: ~1,200 ~736 1.6×
fold steps per hash: 72 24 3×
output size: 64 bytes 32 bytes 2×
tree node cost: 2 permutations 1 permutation 2×
MPC depth: 192 40 5.4×
FHE noise: 192 sequential 40 sequential 5.4×
cross-algebra composition
universal CCS
a single CCS instance with algebra selectors:
universal_ccs = {
sel_Fp: 1 for Goldilocks rows, 0 otherwise
sel_F2: 1 for binary rows, 0 otherwise
sel_ring: 1 for ring-structured rows (Wav/FHE), 0 otherwise
}
HyperNova folding across algebras
fold(acc, goldilocks_instance) → acc' (~30 F_p ops + 1 hemera)
fold(acc', binary_instance) → acc'' (~30 F_p ops + 1 hemera)
fold(acc'', ring_instance) → acc''' (~30 F_p ops + 1 hemera)
one accumulator, all algebras
decider: one proof, 10-50 μs verification
boundary cost per cross-algebra fold: ~766 F_p constraints (30 field ops + 1 hemera-2 hash). for 32-layer quantized neural net inference: 32 × 766 = ~24.5K F_p constraints. negligible vs millions of binary execution constraints.
ring-aware FHE proving
with mudra choosing q = Goldilocks (goldilocks-fhe), R_q operations are native nebu NTT. the Wav language provides ring-structured CCS encodings and dedicated jets:
FHE bootstrapping phases:
gadget decomposition → Bt (Binius) binary bit-splitting
blind rotation → Wav (WHIR/ring) batched NTT multiply
key switching → Ten (WHIR) matrix-vector
modulus switching → Tri (WHIR) field rescaling
ring-aware jets:
ntt_batch: batch n polynomial multiplies (log(n) savings on commitment)
key_switch: automorphism as permutation argument (n/log(n) savings)
gadget_decomp: delegate binary decomposition to Binius
noise_track: running noise accumulator (check once at bootstrap boundary)
hemera under FHE: hemera-2's reduced multiplicative depth (40 vs 192) means 5.4× less noise for homomorphic hash evaluation.
proposals (integrated)
zheng-2 integrates 10 standalone proposals across four areas:
proof size
| proposal | target | mechanism |
|---|---|---|
| algebraic-extraction | 157 KiB → 5-12 KiB | batch algebraic opening replaces Merkle paths |
| brakedown-pcs | 157 KiB → 1-5 KiB | Merkle-free PCS via expander-graph codes |
| gravity-commitment | verification ∝ importance | mass-weighted polynomial encoding by π |
algebraic extraction is near-term (applies to WHIR). Brakedown supersedes it long-term by eliminating Merkle trees entirely. gravity commitment makes high-π particles cheaper to verify regardless of PCS backend.
composition and recursion
| proposal | target | mechanism |
|---|---|---|
| folding-first | 70K constraints → 30 field ops | HyperNova folding at every level |
| proof-carrying | zero proving latency | per-step folding during nox execution |
| universal-accumulator | one ~200 byte proof for everything | fold all proof types into one object |
folding-first reduces block proving 20× and epoch composition 1000×. proof-carrying eliminates the separate proving phase. universal accumulator extends folding to all obligations (signals + state + LogUp + DAS + VDF). light client: download 200 bytes, verify once, full chain confidence.
prover performance
| proposal | target | mechanism |
|---|---|---|
| tensor-compression | O(√N) memory, O(N) streaming | tensor decomposition of structured traces |
| gpu-prover | 45-100× throughput | full pipeline in VRAM, zero PCIe transfers |
tensor compression enables mobile proving. GPU prover handles foculus π computation and proof generation on one device.
binary and domain-specific
| proposal | target | mechanism |
|---|---|---|
| binius-pcs | 32-64× cheaper bitwise ops | F₂ tower native PCS with hemera external |
| ring-aware-fhe | native TFHE bootstrapping | ring-structured CCS + dedicated jets |
binius-pcs serves Bt and quantized Ten/Rs workloads (AI inference, tri-kernel SpMV). ring-aware-fhe proves FHE bootstrapping across 4 phases (gadget_decomp → blind_rotation → key_switch → mod_switch) using existing Wav/Bt/Ten/Tri languages.
dependency graph
nebu (Goldilocks F_p)──→ hemera (Poseidon2) ──→ zheng-2
│
kuro (F₂ towers) ──────────────────────────────→│
│
┌─────┴──────┐
│ │
WHIR/Brakedown Binius
(Goldilocks) (F₂)
│ │
└─────┬──────┘
│
SuperSpartan
sumcheck
HyperNova
│
universal CCS
universal accumulator
│
one hemera hash
language → prover mapping
14 languages → 14 algebras → 14 jet libraries → 2 fields → 2 PCS → 1 IOP → 1 hash
Goldilocks (WHIR/Brakedown): Tri, Tok, Arc, Seq, Inf, Bel, Ren, Dif, Sym, Wav, Rs*, Ten*
Binary (Binius): Bt, Ten*, Rs*
* = compiler decides based on workload
migration path
phase 1: algebraic extraction on WHIR (157 KiB → 5-12 KiB)
phase 2: folding-first composition (block proving 20×, epoch 1000×)
phase 3: Binius PCS backend + kuro (binary workloads 32-64× cheaper)
phase 4: proof-carrying computation (zero proving latency)
phase 5: ring-aware FHE jets (native FHE bootstrapping)
phase 6: Brakedown PCS replaces WHIR (Merkle-free, 1-5 KiB proofs)
phase 7: gravity commitment (power-law verification cost)
phase 8: tensor trace compression (O(√N) memory, mobile provers)
phase 9: universal accumulator (one proof for everything)
phase 10: GPU-native pipeline (45-100× throughput)
open questions
- Brakedown expander construction: which family works best for Goldilocks? concrete security parameter vs expansion tradeoff
- cross-algebra folding soundness: universal CCS with selectors — does folding heterogeneous instances preserve HyperNova security?
- Binius + hemera integration: concrete packed trace layout for F₂ with hemera Merkle externally
- ring-aware CCS formalization: R_q operations as structured CCS matrices — where exactly do the batching savings manifest?
- gravity weight stability: re-commitment when π changes significantly
- tensor rank of real traces: empirical validation on cyberlink, inference, and tri-kernel workloads
- GPU Fiat-Shamir: sequential hemera sponge on GPU — latency impact for long transcripts
see hemera-2 for hash upgrade, algebra-polymorphism for nox instantiation, goldilocks-fhe for FHE parameters