- long-term integrity: 64-byte (512-bit) digests target a 256-bit post-quantum pre-image margin, because grover search only halves the exponent (roughly 2⁵¹² classical work becomes 2²⁵⁶ quantum work). this is intended to keep verification sound well past 2040 without needing a risky, late migration.
- migration risk: recomputing hashes later requires the original bytes. if some chunks are only available on third-party nodes, coordination and bandwidth become the bottleneck. starting at 64 bytes now avoids this systemic risk.
- negligible performance cost: blake3’s cost is dominated by message compression, not output size. moving from 32 to 64 bytes adds <5% cpu on large streams and only tens of nanoseconds on small items, while storage proofs roughly double but remain <2% of payload.
- privacy-by-default: publishing only anonymised node ids plus commitments to weights lets anyone audit topology without exposing personal balances or authorship by default. this resists “store now, decrypt later” and casual scraping.
- selective transparency: commitments are homomorphic. any stakeholder can later open chosen edges or identities and prove consistency with the canonical state, enabling targeted audits and disclosures without blanket deanonymisation.
- zk readiness: the ranking and audit logic runs inside zk circuits using poseidon2 (constraint-cheap), while storage and networking continue to use blake3 (hardware-fast). this separation keeps everyday ops fast and proofs practical.
- dedup and upgrade agility: fastcdc chunking plus content addressing preserves dedup across edits. multicodec tagging and auxiliary poseidon2 tags let us evolve circuit-level primitives without changing stored blake3 roots.
- operational simplicity: a single fixed digest width across all merkle nodes, proofs, and indices simplifies implementations, cache behavior, and protocol negotiation.
why these specific primitives
- blake3-xof-512 for storage: saturates modern simd, supports verified streaming, and scales linearly; choosing 64-byte output targets the 256-bit post-quantum pre-image margin at minimal cost.
- pedersen commitments for weights: perfectly hiding, additively homomorphic, mature libraries, no need to alter storage keys. later migration to kzg/ipa is possible via zk equivalence proofs.
- poseidon2 in circuits: field-friendly hash with low constraint count; we attach poseidon2 tags to blake3 nodes where zk needs to traverse merkle paths.
- anonymised node ids: poseidon2(pubkey ∥ salt) gives unlinkability by default and a clean path for voluntary identity disclosure.
- range proofs on weights: ensure values are non-negative and bounded, preventing pathological or adversarial inputs without revealing exact balances.
foundational choices
- content addressing: blake3-xof, fixed 64-byte digests for all merkle nodes and cids.
- chunking/dedup: variable-size fastcdc with target 32 KiB (min 4 KiB, max 256 KiB). each chunk is stored once and referenced by its 64-byte digest.
- storage encoding: bao-style verified streaming adapted to 64-byte digests.
- multicodec: register blake3-512-xof; cid strings in base32-lowercase.
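a minimal sketch of the chunk-then-address idea above, under loud assumptions: the chunker is a simplified gear rolling hash, not the real fastcdc algorithm (no normalised chunking), and `digest64` uses blake2b-512 from the python standard library as a stand-in for blake3-xof-512. the names `chunk`, `put` and `digest64` are hypothetical.

```python
import hashlib
import random

MIN_SIZE, TARGET_SIZE, MAX_SIZE = 4 * 1024, 32 * 1024, 256 * 1024
# 15 low bits: once past MIN_SIZE, a cut fires with probability ~1/32768 per byte.
MASK = TARGET_SIZE - 1

# Fixed pseudo-random gear table; seeded so chunk boundaries are reproducible.
_gear_rng = random.Random(0)
GEAR = [_gear_rng.getrandbits(64) for _ in range(256)]


def digest64(data: bytes) -> bytes:
    """64-byte content digest. Stand-in: BLAKE2b-512, NOT blake3-xof-512."""
    return hashlib.blake2b(data, digest_size=64).digest()


def chunk(data: bytes):
    """Yield content-defined chunks (simplified gear hash, not real FastCDC)."""
    start, n = 0, len(data)
    while start < n:
        end = min(start + MAX_SIZE, n)  # hard upper bound for this chunk
        cut, h = end, 0
        for i in range(start, end):
            h = ((h << 1) + GEAR[data[i]]) & 0xFFFFFFFFFFFFFFFF
            if i + 1 - start >= MIN_SIZE and (h & MASK) == 0:
                cut = i + 1
                break
        yield data[start:cut]
        start = cut


def put(store: dict, data: bytes) -> list:
    """Store each chunk once, keyed by its digest; return the ordered chunk keys."""
    keys = []
    for c in chunk(data):
        k = digest64(c)
        store.setdefault(k, c)  # an identical chunk is never stored twice
        keys.append(k)
    return keys
```

a file is then represented by its ordered list of chunk keys, and writing the same bytes again adds nothing to the store.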
confidentiality model
- public sees only an anonymised, weighted digraph: edges with commitments to weights and anonymised node identifiers.
- authorship and per-account contributions stay hidden by default.
- anyone can later reveal chosen subgraphs and prove they match the canonical state.
identities and node ids
- real identity key: ed25519/secp256k1 (off-chain).
- anonymised node id: poseidon2(pubkey ∥ salt) → field element.
- disclosure path: reveal salt + pubkey to bind a node id back to an identity when desired.
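the derive-then-disclose path above could look roughly like this. assumptions: poseidon2 is not in the python standard library, so `h_field` is a stand-in built from blake2b and reduced into an illustrative prime field (2²⁵⁵ − 19, chosen only because it is a well-known prime, not because any proof system here uses it); `new_node_id` and `verify_disclosure` are hypothetical names.

```python
import hashlib
import secrets

FIELD_P = 2**255 - 19  # illustrative field modulus only


def h_field(*parts: bytes) -> int:
    """Stand-in for poseidon2: length-prefixed BLAKE2b reduced into the field."""
    h = hashlib.blake2b(digest_size=64)
    for p in parts:
        h.update(len(p).to_bytes(4, "big"))  # length prefix avoids ambiguous joins
        h.update(p)
    return int.from_bytes(h.digest(), "big") % FIELD_P


def new_node_id(pubkey: bytes):
    """Return (anonymised node id, salt); the salt stays with the key owner."""
    salt = secrets.token_bytes(32)
    return h_field(pubkey, salt), salt


def verify_disclosure(node_id: int, pubkey: bytes, salt: bytes) -> bool:
    """Check a voluntary disclosure: does (pubkey, salt) bind to this node id?"""
    return h_field(pubkey, salt) == node_id
```

because the salt is fresh per id, the same pubkey can hold several node ids that cannot be linked until a salt is revealed.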
edge record (canonical on-chain/off-chain object)
- u′, v′: anonymised node ids (field elements).
- cₑ: pedersen commitment to total edge weight wₑ (additively homomorphic).
- πₑ: range proof that 0 ≤ wₑ < 2⁶⁴ (bulletproof or halo2 equivalent).
- leaf hash for storage: blake3-512(u′ ∥ v′ ∥ cₑ ∥ πₑ).
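one way to make the leaf hash above concrete, as a sketch only. assumptions: the text specifies plain concatenation u′ ∥ v′ ∥ cₑ ∥ πₑ; the 4-byte length prefixes used here are an added assumption so that variable-length fields cannot be confused with each other. blake2b-512 again stands in for blake3-512, and the `Edge` class and its field names are hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    src: int            # u′, anonymised node id as a field element
    dst: int            # v′, anonymised node id as a field element
    commit: bytes       # cₑ, serialised pedersen commitment
    range_proof: bytes  # πₑ, serialised range proof

    def encode(self) -> bytes:
        """Canonical byte encoding: each field prefixed with its length."""
        parts = [
            self.src.to_bytes(32, "big"),
            self.dst.to_bytes(32, "big"),
            self.commit,
            self.range_proof,
        ]
        return b"".join(len(p).to_bytes(4, "big") + p for p in parts)

    def leaf_hash(self) -> bytes:
        """64-byte leaf hash. Stand-in: BLAKE2b-512, NOT blake3-512."""
        return hashlib.blake2b(self.encode(), digest_size=64).digest()
```

with the prefixes, `Edge(1, 2, b"ab", b"c")` and `Edge(1, 2, b"a", b"bc")` hash to different leaves even though their raw concatenations are identical.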
staking/deposit flow (confidential)
- depositor chooses random rᵢ and computes cᵢ = g^{wᵢ} h^{rᵢ}.
- relayer verifies a pedersen-opening proof for cᵢ against policy.
- relayer updates cₑ ← cₑ · cᵢ and republishes the updated leaf + merkle path.
- no public link to the depositor is recorded; only anonymised node ids and commitments are visible.
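the deposit flow above, sketched with toy numbers. assumptions: the group is the order-1019 subgroup of the integers mod 2039, which is far too small to be secure and is used only so the arithmetic is easy to follow; a real deployment would use an elliptic-curve group with generators whose discrete-log relation is unknown. the relayer's opening-proof check is omitted, and `commit`, `deposit` and `relayer_update` are hypothetical names.

```python
import secrets

# Toy parameters, NOT secure: P = 2*Q + 1 with P and Q both prime.
P, Q = 2039, 1019
G, H = 4, 9  # both are squares mod P, so each generates the order-Q subgroup


def commit(w: int, r: int) -> int:
    """Pedersen commitment c = g^w * h^r (mod P)."""
    return (pow(G, w, P) * pow(H, r, P)) % P


def deposit(w: int):
    """Depositor side: pick a random blinding factor and commit to the weight."""
    r = secrets.randbelow(Q)
    return commit(w, r), r


def relayer_update(c_edge: int, c_i: int) -> int:
    """Relayer side: fold a deposit into the edge commitment, cₑ ← cₑ · cᵢ."""
    return (c_edge * c_i) % P
```

after deposits of 5 and 12 with blinding factors r₁ and r₂, the edge commitment equals `commit(17, (r1 + r2) % Q)`, which is the additive homomorphism the flow relies on.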
ranking with zk correctness
- inputs (public): anonymised adjacency list and per-edge commitments cₑ.
- inputs (private witness): openings of commitments (weights), salts for any node-ids needed inside the circuit.
- computation: run k power-iteration steps or chosen ranking algorithm using private weights.
- outputs: rank vector r (quantised) + poseidon2 root of r + zk proof π that r was computed exactly from the hidden weights and published topology.
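for orientation, this is the kind of plain computation the circuit would attest to, written outside any proof system. assumptions: the ranking algorithm is still an open decision, so a pagerank-style power iteration with damping `alpha` is used purely as an example; a circuit would use fixed-point field arithmetic rather than floats; `rank`, `alpha`, `k` and `scale` are hypothetical names and defaults.

```python
def rank(edges, n, k=20, alpha=0.85, scale=10**6):
    """Run k power-iteration steps over a weighted digraph.

    edges: list of (u, v, w) with node indices in range(n) and weights w >= 0.
    Returns the rank vector quantised to integers (rank * scale, rounded).
    """
    out = [0] * n  # total outgoing weight per node
    for u, _, w in edges:
        out[u] += w

    r = [1.0 / n] * n
    for _ in range(k):
        nxt = [(1 - alpha) / n] * n
        # Rank held by nodes with no outgoing weight is spread uniformly.
        dangling = sum(r[u] for u in range(n) if out[u] == 0)
        for u, v, w in edges:
            if w:  # w > 0 implies out[u] > 0, so the division is safe
                nxt[v] += alpha * r[u] * w / out[u]
        for v in range(n):
            nxt[v] += alpha * dangling / n
        r = nxt
    return [round(x * scale) for x in r]
```

in the real flow the weights would come from the private commitment openings, and only the quantised vector, its root and the proof would be published.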
selective disclosure of subgraphs
- reveal chosen edges by providing the pedersen openings (wᵢ, rᵢ) of their deposits; a verifier recomputes each cᵢ = g^{wᵢ} h^{rᵢ} and checks that ∏ cᵢ over all deposits on the edge equals the current cₑ.
- reveal authorship by disclosing the salt for affected node ids to map back to pubkeys.
- partial disclosure is zero-knowledge for everything not revealed; unopened edges remain perfectly hidden.
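an auditor's check for a disclosed edge might look like this. assumptions: the same toy group as a classroom example (order-1019 subgroup mod 2039, not secure), and the discloser opens every deposit that was folded into the edge; `audit_edge` is a hypothetical name.

```python
# Toy parameters, NOT secure: P = 2*Q + 1, with G and H of prime order Q.
P, Q = 2039, 1019
G, H = 4, 9


def commit(w: int, r: int) -> int:
    """Pedersen commitment c = g^w * h^r (mod P)."""
    return (pow(G, w, P) * pow(H, r, P)) % P


def audit_edge(c_edge: int, openings):
    """Recompute each deposit commitment and compare the product with cₑ.

    openings: list of (w_i, r_i), one per deposit on the edge.
    Returns (consistent, total_weight).
    """
    prod, total = 1, 0
    for w, r in openings:
        prod = (prod * commit(w, r)) % P
        total += w
    return prod == c_edge, total
```

a discloser who misreports one weight by even a single unit changes the product by a factor of g, so the check fails.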
hashing inside vs outside circuits
- outside (fast path): blake3-xof-512 for all content-addressed storage and merkle nodes.
- inside zk (constraint-minimal): poseidon2 for merkle steps and vector commitments.
- glue: store poseidon2 tags of blake3 nodes where circuits need to traverse the same paths cheaply.
security notes
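- integrity: 64-byte digests target a 256-bit post-quantum pre-image margin (grover search halves the exponent of a 512-bit digest). caveat: blake3’s internal chaining values are 256 bits, so extended xof output does not by itself raise the hash’s security level; confirm the achieved margin against the blake3 specification.
- hiding: pedersen commitments are perfectly hiding (and computationally binding under the discrete-log assumption); range proofs prevent invalid or extreme values.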
- anonymity: node ids derive from salted hashes; unlinkable until the salt is disclosed by the owner.
- downgrade resistance: clients that understand commitments/zk must reject content missing required fields once the flag-day is reached.
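the downgrade rule above reduces to a small acceptance check. assumptions: records are plain dicts with `commit` and `range_proof` keys, `now` and `flag_day` are comparable values such as block heights or timestamps (the text does not say which), and `verify_range_proof` is a caller-supplied function; all of these names are hypothetical.

```python
REQUIRED_FIELDS = ("commit", "range_proof")


def accept(record: dict, now: int, flag_day: int, verify_range_proof) -> bool:
    """Decide whether a client should accept an edge record.

    Before the flag-day, legacy records without commitments are tolerated.
    From the flag-day on, required fields must be present and the proof valid.
    """
    if now < flag_day:
        return True
    if any(not record.get(field) for field in REQUIRED_FIELDS):
        return False  # missing or empty field: treat as a downgrade attempt
    return bool(verify_range_proof(record["commit"], record["range_proof"]))
```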
performance and overhead
- hashing throughput: blake3 remains ~7–10 GB/s per core for large streams; 64-byte outputs add <5% cpu.
- metadata growth: +32 bytes per commitment and ~50 bytes per aggregated range proof per edge (implementation-dependent).
- indexes/outboards: double due to 64-byte digests; still <2% overhead relative to payload for typical chunk sizes.
- zk proving: dominated by poseidon2 hashing and linear algebra; practical for ~10⁵ edges per proof on today’s hardware; shard larger graphs across multiple proofs.
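a back-of-envelope check of the outboard figure, under a stated assumption: the outboard is a binary merkle tree whose leaves are whole chunks, so n leaves need n − 1 parent nodes and each parent stores two child digests. that gives roughly 2 × digest bytes of outboard per chunk. whether the adapted bao layout groups data this way is not specified above, and `outboard_overhead` is a hypothetical helper.

```python
DIGEST_BYTES = 64


def outboard_overhead(chunk_size: int, digest_bytes: int = DIGEST_BYTES) -> float:
    """Approximate outboard bytes per payload byte for a binary merkle tree.

    Assumes one leaf per chunk and two child digests stored per parent node.
    """
    return 2 * digest_bytes / chunk_size


for size in (4 * 1024, 8 * 1024, 32 * 1024, 256 * 1024):
    print(f"{size // 1024:>4} KiB chunks: {outboard_overhead(size):.3%}")
```

under this assumption the 32 KiB target gives about 0.39% and 8 KiB gives about 1.56%, while the 4 KiB minimum would give about 3.1%, so the <2% figure depends on chunk sizes clustering near the target rather than the minimum.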
interoperability and migration
- genesis uses only 64-byte digests; no dual-hash compatibility path required.
- if a new field-friendly hash supersedes poseidon2, add an auxiliary tag without touching blake3 roots.
- commitments can migrate (pedersen ↔ kzg) via a zk equivalence proof without re-indexing storage.
implementation checklist
- update hashing and bao to fixed 64-byte outputs.
- implement fastcdc chunker and content-addressed store keyed by 64-byte blake3 digests.
- define edge record schema and merkle layout.
- integrate pedersen commitments and range proofs for weights.
- build poseidon2 merkle helpers and rank circuit (groth16/plonk/stark as appropriate).
- define selective disclosure api (openings, salt reveal, audit routines).
- set a flag-day after which clients must enforce presence/validity of commitments and range proofs.
open questions / to decide
- exact ranking algorithm parameters (α, iterations k, damping, normalisation).
- choice of proof system (groth16 vs plonk vs stark) based on ecosystem and performance.
- range-proof system (bulletproofs, plonkish custom gates, or halo2 native) and aggregation strategy.
- privacy budget for node-id salts and rotation policy.
- governance policy for subgraph disclosures and audit procedures.
appendix: example edge object (informal)
edge {
  uid:         blake3-512(u′ ∥ v′ ∥ cₑ ∥ πₑ)
  from:        u′                  // poseidon2(pubkey ∥ salt)
  to:          v′                  // poseidon2(pubkey ∥ salt)
  commit:      cₑ                  // pedersen commitment to total weight
  range_proof: πₑ                  // non-negative, bounded
  tags:        [poseidon2(uid)]    // optional, for in-circuit traversal
}