F₂ tower field arithmetic for binary proving. kuro is to F₂ what nebu is to Goldilocks.

the tower

F₂ → F₂² → F₂⁴ → F₂⁸ → F₂¹⁶ → F₂³² → F₂⁶⁴ → F₂¹²⁸
 1b    2b    4b    8b    16b     32b     64b     128b

each extension: $F_{2^{2k}} = F_{2^k}[x] / (x^2 + x + \alpha_k)$

at the top: F₂¹²⁸ = 128 F₂ elements packed in one u128 machine word. one XOR = 128 additions. one AND = 128 multiplications. SIMD-native.

why kuro exists

bitwise operations in Goldilocks (F_p) cost ~32 constraints each — bit decomposition is expensive in a prime field. in F₂, they cost 1 constraint. the 32× gap is the algebraic distance between F_p and F₂.

two workloads dominate the binary regime:

  • quantized AI inference: BitNet 1-bit models. matrix-vector multiply = XOR + popcount
  • tri-kernel SpMV: quantized axon weights for π iteration

kuro provides the field arithmetic. the binary PCS (zheng) provides the commitment. nox Bt instantiation (nox<F₂>) provides the execution. together: binary workloads at native cost.

dependency graph

nebu (Goldilocks)    kuro (F₂) ← this repo
       ↓                ↓
     hemera (hash, trust anchor)
       ↓
     nox (VM)
       ↓
     zheng (proofs, binary PCS uses kuro)
       ↓
     bbg (state, binary traces fold into Goldilocks accumulator)

discover all concepts


from subgraph kuro

kuro (黒)

F₂ tower field arithmetic for binary proving. kuro is to F₂ what nebu is to Goldilocks.

the tower

F₂ → F₂² → F₂⁴ → F₂⁸ → F₂¹⁶ → F₂³² → F₂⁶⁴ → F₂¹²⁸
 1b    2b    4b    8b    16b     32b     64b     128b

each extension: $F_{2^{2k}} = F_{2^k}[x] / (x^2 + x + \alpha_k)$

Wiedemann tower construction: $\alpha_k$ = product of all previous generators. at the top: F₂¹²⁸ = 128 F₂ elements packed in one u128 machine word.

operations

at every tower level:

operation description complexity
add XOR 1 instruction
mul tower Karatsuba 3 recursive muls at half-width
inv tower-recursive norm-based 1 sub-field inv + few muls
square Frobenius endomorphism linear in char 2
sqrt inverse Frobenius n-1 squarings
frobenius a^(2^k) k squarings
trace Tr(a) ∈ F₂ tower-recursive
norm N(a) to sub-field tower-recursive
exp square-and-multiply O(log e)

packed operations

128 parallel F₂ operations in one u128:

packed_add(a, b) = XOR      128 additions, 1 instruction
packed_mul(a, b) = AND      128 multiplications, 1 instruction
inner_product(a, b)          popcount(AND), 2 instructions — the BitNet kernel

why kuro exists

bitwise operations in Goldilocks (F_p) cost ~32 constraints each — bit decomposition is expensive in a prime field. in F₂, they cost 1 constraint. the 32× gap is the algebraic distance between F_p and F₂.

two workloads dominate the binary regime:

  • quantized AI inference: BitNet 1-bit models. matrix-vector multiply = XOR + popcount
  • tri-kernel SpMV: quantized axon weights for π iteration

structure

kuro/
├── rs/              core library (77 tests, zero deps, no_std)
│   ├── tower.rs     8 tower levels with full arithmetic
│   ├── packed.rs    Packed128 SIMD operations
│   ├── inv.rs       checked inversion
│   ├── batch.rs     Montgomery batch inversion
│   └── encoding.rs  bytes ↔ tower elements
├── wgsl/            GPU compute shaders
├── cli/             command-line tool
├── reference/       canonical specifications (9 docs)
└── docs/explanation/ educational articles (8 docs)

usage

cargo test -p kuro
cargo bench -p kuro
cargo run -p kuro-cli -- help
cargo run -p kuro-cli -- calc mul 0xDEADBEEF 0xCAFEBABE --level 32

companion repos

repo role
nebu Goldilocks field arithmetic (the prime field)
hemera hash function (trust anchor)
nox VM (Bt = nox<F₂, Z/2, external>)
zheng proof system (binary PCS uses kuro for F₂ ops)

license

cyber license: don't trust. don't fear. don't beg.

Pages in this namespace

Local Graph