capacity typing — type integration into the hash

hemera has two reserved capacity slots (state[14], state[15]). the protocol is frozen — permutation parameters, round constants, S-box are permanent. but capacity slots are protocol-level metadata, not cryptographic parameters. using reserved slots for type information is a compatible extension.

motivation

every particle in the cybergraph has a type. currently type is external metadata — stored alongside the hash, not inside it. this means two particles with identical content but different types produce identical hashes. type confusion is possible: the same bytes interpreted as "image" and "executable" share a CID.

integrating type into capacity makes type intrinsic to identity. same content, different type → different hash. type becomes part of what the content IS, not a label attached after.

proposal

use state[14] for a type tag:

state[14]     type_tag    content type identifier (u64, from a fixed registry)
state[15]     reserved    remains zero (future use)

type registry

a fixed set of type identifiers, frozen at genesis:

TYPE_RAW        = 0x00    untyped bytes (default, backward compatible)
TYPE_PARTICLE   = 0x01    cybergraph particle (content-addressed node)
TYPE_CYBERLINK  = 0x02    cyberlink record (7-tuple)
TYPE_SIGNAL     = 0x03    signal (batch of cyberlinks + proof)
TYPE_POLYNOMIAL = 0x04    polynomial commitment data
TYPE_PROOF      = 0x05    zheng proof
TYPE_KEY        = 0x06    public key / neuron identity
TYPE_NULLIFIER  = 0x07    spent record nullifier
TYPE_COMMITMENT = 0x08    private record commitment
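
as an illustration, the registry above can be modelled as a closed enumeration. this is a minimal sketch, not hemera's actual code: the names `TypeTag` and `parse_type_tag` are hypothetical, and only the numeric values come from the table.

```python
from enum import IntEnum


class TypeTag(IntEnum):
    """Type registry values as listed in this note (each fits in a u64)."""
    RAW = 0x00         # untyped bytes (default, backward compatible)
    PARTICLE = 0x01    # cybergraph particle
    CYBERLINK = 0x02   # cyberlink record
    SIGNAL = 0x03      # signal (batch of cyberlinks + proof)
    POLYNOMIAL = 0x04  # polynomial commitment data
    PROOF = 0x05       # zheng proof
    KEY = 0x06         # public key / neuron identity
    NULLIFIER = 0x07   # spent record nullifier
    COMMITMENT = 0x08  # private record commitment


def parse_type_tag(value: int) -> TypeTag:
    """Hypothetical helper: accept only tags present in the frozen registry."""
    try:
        return TypeTag(value)
    except ValueError:
        raise ValueError(f"unknown type tag: {value:#04x}") from None
```

rejecting unknown values at parse time mirrors the frozen-registry rule: a tag outside the table is an error, not a new type.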

backward compatibility

TYPE_RAW = 0x00 is the default. existing hashes (with state[14] = 0) are type-raw hashes. all current hemera outputs remain valid. the extension is purely additive.

API change

hash(bytes) → Hash                      // existing (type_tag = 0x00)
hash_typed(bytes, type_tag: u64) → Hash  // new: type-aware hash

hash(bytes) remains unchanged and is equivalent to hash_typed(bytes, TYPE_RAW). hash_typed(bytes, TYPE_PARTICLE) produces a different hash than hash_typed(bytes, TYPE_CYBERLINK) even for identical bytes. this is domain separation through capacity, the same mechanism hemera already uses for flags, counter, and domain tags.
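
the effect of the tag can be shown with a toy sponge. this is a sketch under loud assumptions: `_toy_permute` is a SHA-512 stand-in, NOT the hemera permutation; the rate is assumed to be state[0..7]; padding and the 32-byte output are invented for the example; and `hash_untyped` plays the role of hash(bytes). only the placement of the tag in state[14] follows this note.

```python
import hashlib
import struct

TYPE_RAW = 0x00
TYPE_PARTICLE = 0x01
TYPE_CYBERLINK = 0x02

STATE_WIDTH = 16    # state[0..15]
RATE_WORDS = 8      # assumption: state[0..7] is the rate
TYPE_TAG_SLOT = 14  # capacity slot proposed in this note


def _toy_permute(state):
    """Stand-in permutation (NOT hemera): SHA-512 over the serialized state."""
    raw = struct.pack("<16Q", *state)
    out = hashlib.sha512(raw + b"\x00").digest() + hashlib.sha512(raw + b"\x01").digest()
    return list(struct.unpack("<16Q", out))


def hash_typed(data: bytes, type_tag: int) -> bytes:
    """Toy typed hash: the tag sits in capacity before any input is absorbed."""
    if not 0 <= type_tag < 2**64:
        raise ValueError("type_tag must fit in a u64")
    state = [0] * STATE_WIDTH
    state[TYPE_TAG_SLOT] = type_tag
    block_len = RATE_WORDS * 8
    # Illustrative padding: a 0x01 byte, then zeros up to a full block.
    padded = data + b"\x01" + b"\x00" * (-(len(data) + 1) % block_len)
    for off in range(0, len(padded), block_len):
        words = struct.unpack("<8Q", padded[off:off + block_len])
        for i, word in enumerate(words):
            state[i] ^= word  # absorb into the rate only
        state = _toy_permute(state)
    return struct.pack("<4Q", *state[:4])


def hash_untyped(data: bytes) -> bytes:
    """Corresponds to hash(bytes): the tag defaults to TYPE_RAW."""
    return hash_typed(data, TYPE_RAW)
```

in this model hash_untyped(b"abc") equals hash_typed(b"abc", TYPE_RAW), while TYPE_PARTICLE and TYPE_CYBERLINK give unrelated digests for the same bytes, because the initial states already differ before the first permutation.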

consequences

type confusion prevention

a particle hash and a cyberlink hash are computationally infeasible to collide, even if the underlying bytes are identical: finding such a pair would mean finding a collision in hemera itself. the type tag enters the permutation through capacity, the same security mechanism that separates keyed from unkeyed hashes.

verifiable typing

given a hash and a claimed type, verification is: recompute hash_typed(content, claimed_type) and compare. if the hash matches, the type is authentic. type is cryptographically bound to content.
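
the check described above can be sketched as follows. the `hash_typed` here is a stand-in built on BLAKE2b personalisation, NOT hemera, and `verify_type` is a hypothetical name; only the recompute-and-compare logic comes from this note.

```python
import hashlib
import hmac

TYPE_PARTICLE = 0x01
TYPE_CYBERLINK = 0x02


def hash_typed(content: bytes, type_tag: int) -> bytes:
    """Stand-in typed hash (NOT hemera): BLAKE2b personalised with the tag."""
    person = type_tag.to_bytes(8, "little")
    return hashlib.blake2b(content, digest_size=32, person=person).digest()


def verify_type(content: bytes, claimed_type: int, expected: bytes) -> bool:
    """Recompute the typed hash and compare it with the given hash."""
    recomputed = hash_typed(content, claimed_type)
    return hmac.compare_digest(recomputed, expected)
```

the verifier needs the content itself: a hash alone does not reveal its type, it only confirms or rejects a claimed one.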

structural restriction

the type registry is finite and frozen at genesis. new types require a protocol-level decision. this is intentional — type inflation defeats the purpose. the registry should be small enough that every type is meaningful.

interaction with other capacity fields

capacity fields are orthogonal:

state[8]  = counter      (chunk position)
state[9]  = flags        (structural role: root, parent, chunk)
state[10] = msg_length   (finalization)
state[11] = domain_tag   (API mode: hash, keyed, derive_key)
state[12] = ns_min       (NMT namespace lower)
state[13] = ns_max       (NMT namespace upper)
state[14] = type_tag     (content type) ← NEW
state[15] = reserved     (zero)

a typed keyed hash of a Merkle leaf would have: state[9] = FLAG_CHUNK, state[11] = DOMAIN_KEYED, state[14] = TYPE_PARTICLE. all three are independent — type does not interfere with existing domain separation.
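
the layout can be written down as a small record to make the independence concrete. this is a sketch: the class `Capacity` is hypothetical, and the numeric values of FLAG_CHUNK and DOMAIN_KEYED are placeholders, since this note names those constants without defining them.

```python
from dataclasses import astuple, dataclass
from typing import List

# Placeholder values: named in this note but not defined by it.
FLAG_CHUNK = 0x01
DOMAIN_KEYED = 0x01
TYPE_PARTICLE = 0x01  # from the type registry

CAPACITY_START = 8    # capacity occupies state[8..15]


@dataclass(frozen=True)
class Capacity:
    counter: int = 0     # state[8]  chunk position
    flags: int = 0       # state[9]  structural role
    msg_length: int = 0  # state[10] finalization
    domain_tag: int = 0  # state[11] API mode
    ns_min: int = 0      # state[12] NMT namespace lower
    ns_max: int = 0      # state[13] NMT namespace upper
    type_tag: int = 0    # state[14] content type (new)
    reserved: int = 0    # state[15] must stay zero

    def to_state(self) -> List[int]:
        """Return a 16-word initial state: zeroed rate, then the capacity fields."""
        if self.reserved != 0:
            raise ValueError("state[15] is reserved and must be zero")
        return [0] * CAPACITY_START + list(astuple(self))
```

building the leaf from the example with and without type_tag=TYPE_PARTICLE changes state[14] and nothing else, which is what "orthogonal" means here.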

open questions

  • should the type registry be extensible post-genesis? (recommendation: no — frozen registry, same as hemera parameters)
  • should type_tag be a single u64 or split into (type_class: u32, type_variant: u32)? (simpler = better)
  • interaction with NMT: does type_tag affect namespace sorting? (recommendation: no — type is per-hash, namespace is per-tree)
  • should hash() eventually become hash_typed() with mandatory type? (recommendation: keep hash() as TYPE_RAW for backward compatibility, add hash_typed() as the typed entry point)
  • does typing add any cost? (expectation: no. the number of permutations is unchanged, the type tag is one more capacity value entering the existing permutation, and no additional constraints are needed)
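
for the (type_class: u32, type_variant: u32) question, one possible packing into the single u64 slot is sketched here. the bit layout (class in the high 32 bits, variant in the low 32 bits) and both function names are assumptions made for illustration; this note does not specify them.

```python
U32_MAX = 0xFFFF_FFFF


def pack_type_tag(type_class: int, type_variant: int) -> int:
    """Pack two u32 values into one u64 (assumed layout: class high, variant low)."""
    for name, value in (("type_class", type_class), ("type_variant", type_variant)):
        if not 0 <= value <= U32_MAX:
            raise ValueError(f"{name} does not fit in a u32: {value}")
    return (type_class << 32) | type_variant


def unpack_type_tag(tag: int) -> tuple:
    """Split a u64 tag back into (type_class, type_variant)."""
    if not 0 <= tag < 2**64:
        raise ValueError("tag does not fit in a u64")
    return tag >> 32, tag & U32_MAX
```

under this layout the flat registry values 0x00 to 0x08 decode as class 0 with the variant equal to the current value, so a later split would not change any existing tag.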
