erasure-coding.md

erasure coding over Goldilocks

Reed-Solomon erasure coding as a hemera module. same field, same NTT, same pipeline — encode data for availability, commit via hemera, verify via Lens.

motivation

hemera's mission: identity and authentication for content. erasure coding is the availability extension — content addressed by hemera hash, recoverable via hemera erasure encoding. one crate handles both "what is it?" (hash) and "can I get it back?" (erasure).

the math is already here. Reed-Solomon encoding is polynomial evaluation at distinct points over a finite field. hemera already lives in Goldilocks. nebu already provides NTT. RS encoding = NTT(data) + evaluate at extension points. no new field, no new arithmetic.

specification

interface

rs_encode(data: &[F_p], k: usize) → Vec<F_p>
  input:  k data symbols (Goldilocks field elements)
  output: n = 2k symbols (k data + k parity)
  method: interpret data as polynomial coefficients,
          evaluate at n distinct points via NTT

rs_decode(shares: &[(usize, F_p)], k: usize) → Result<Vec<F_p>, DecodeError>
  input:  any k-of-n symbols with their positions
  output: original k data symbols
  method: Lagrange interpolation (or equivalent via inverse NTT)
  error:  DecodeError if fewer than k shares provided

rs_verify_encoding(commitment: H, row_index: usize, cells: &[(usize, F_p)], k: usize) → bool
  verify that cells are consistent with claimed polynomial degree < k
  if degree(interpolated) ≥ k → bad encoding → fraud proof

2D grid layout (for DAS)

rs_encode_2d(data: &[F_p], sqrt_k: usize) → Grid2D
  arrange data into √k × √k grid
  RS-extend rows: √k → 2√k
  RS-extend columns: √k → 2√k
  result: 2√k × 2√k grid (4× original size)

rs_sample_verify(grid_commitment: H, row: usize, col: usize, value: F_p, proof: LensOpening) → bool
  verify single cell against grid commitment
  uses Lens opening (from zheng) — not hemera internal

fraud proof generation

rs_fraud_proof(row_commitment: H, cells: &[(usize, F_p)], k: usize) → Option<FraudProof>
  if cells.len() ≥ k+1:
    interpolate polynomial from k+1 points
    if degree(poly) ≥ k: encoding is invalid
    return FraudProof { cells, interpolated_poly, row_commitment }
  else: None (insufficient cells for fraud detection)

why hemera, not a separate crate

same field. RS over Goldilocks uses identical F_p arithmetic. no conversion.
same NTT. RS encoding is a domain evaluation via NTT — the same NTT nebu provides, the same NTT hemera's Poseidon2 permutation structure is designed around.
pipeline continuity. hemera already does: data → hash → tree → commitment. erasure extends to: data → RS encode → hash chunks → tree → commitment. the encode step inserts before the existing hash step.
scope coherence. hemera = "make data identifiable and available." hash = identifiable. erasure = available. same mission, same crate.

what stays outside hemera

DAS sampling protocol (which cells to sample, how many, confidence levels) → bbg
Lens opening proofs for grid cells → zheng
network-level chunk distribution → radio
fraud proof broadcasting → radio

hemera provides the codec. consumers provide the protocol.

cost model

operation	field ops	notes
RS encode (k symbols)	O(k log k)	one NTT
RS decode (k shares)	O(k log k)	one inverse NTT + interpolation
fraud proof verify	O(k)	polynomial evaluation at k+1 points
2D grid encode (k² data)	O(k² log k)	2k row NTTs + 2k column NTTs

dependencies

nebu: NTT over Goldilocks (evaluation and interpolation domains)
hemera (internal): hash chunks after RS encoding

open questions

systematic vs non-systematic encoding (systematic = first k symbols are original data, simpler but leaks structure)
optimal extension factor (2× is standard, 3× or 4× increases availability at cost of bandwidth)
interleaved coding (multiple polynomials share evaluation points — amortizes overhead)
integration with hemera tree: should RS-encoded chunks be leaves in the existing Merkle tree, or does erasure coding replace the tree for availability purposes?

Graph

Mentions

hemera/roadmap