folded sponge — fold multi-block absorption into accumulator

for long inputs requiring K sponge blocks, fold each permutation into the running accumulator instead of proving K independent permutations. 7× savings for K=10.

the observation

consecutive sponge absorption calls share state. the output of permutation N is the input of permutation N+1 (after absorbing the next block). this sequential dependency means K permutations are proved independently.

the construction

with proof-carrying computation (proof-carrying) and HyperNova folding (folding-first):

single block:  input → 24 rounds → output           (1 perm, ~736 constraints)

K-block sponge (current):
  K independent permutations, each ~736 constraints
  total: K × 736 constraints

K-block sponge (folded):
  fold each permutation into accumulator: K × ~30 field ops
  one decider at end: ~736 constraints
  total: 736 + 30K field ops

cost comparison

K = 1:     736 vs 766     (~1×, no savings)
K = 5:     3,680 vs 886   (4.2×)
K = 10:    7,360 vs 1,036  (7.1×)
K = 100:   73,600 vs 3,736 (19.7×)
K = 1000:  736,000 vs 30,736 (24×)

the savings grow with K. for large particles (content hashing) with many 56-byte blocks: substantial improvement.

example: 4 KB particle

4096 bytes / 56 bytes per block = 74 blocks

current:  74 × 736 = 54,464 constraints
folded:   736 + 74 × 30 = 2,956 constraints
savings:  18.4×

interaction with proof-carrying

in nox's proof-carrying execution, each hemera permutation during reduce() folds into the running accumulator automatically. the nox VM doesn't distinguish single-block from multi-block — every permutation is one fold step.

interaction with batched proving

folded sponge and batched proving (batched-proving) address different scenarios:

folded sponge: one long input (K blocks sequential)
batched proving: N independent short inputs (parallel)

they compose: a block with N particles, each K blocks long:

fold each particle's K blocks: K fold steps per particle
batch the N deciders: one sumcheck for all N
total: N × K × 30 fold ops + 736 + O(N) binding constraints

open questions

fold accumulator size: each fold step adds ~30 field ops to the accumulator. for K=1000: 30,000 field ops in the accumulator. is the accumulator polynomial degree manageable?
streaming compatibility: hemera already supports streaming absorption (absorb block, permute, absorb next). folding integrates naturally — each permute step folds

see inversion-sbox for per-permutation cost, proof-carrying for fold mechanism, batched-proving for multi-instance optimization

Dimensions

folded-sponge

hemera/reference/props/folded-sponge.md