noun encoding specification

version: 0.2 status: canonical

identity

every noun has an identity computed by the instantiated hash function H (see nouns.md structural hash). the nox protocol operates exclusively on hash identities.

in the canonical instantiation (nox<Goldilocks, Z/2^32, Hemera>), the identity is 32 bytes (4 × F_p elements). identity size is per-instantiation — it depends on H's output size. all concrete sizes below refer to the canonical instantiation.

identity = H(noun)                          32 bytes
computation_key = (H(object), H(formula))   64 bytes
computation_val = H(result)                 32 bytes

content-addressed storage

there is no reason for a prefix byte in a system where every value is looked up by its hash. nouns are stored and transmitted as content-addressed entries. no prefix bytes. no self-describing framing. the content byte length determines interpretation.

canonical (nox<Goldilocks, Z/2^32, Hemera>)

store: Identity (32 bytes) → Content

Content variants (determined by byte length):
   8 bytes     atom value (single field element, canonical LE)
  32 bytes     hash value (4 field elements, canonical LE)
  64 bytes     cell (H(left) ‖ H(right), two 32-byte identities)

three sizes: 2³, 2⁵, 2⁶. all powers of 2.

content sizes are per-instantiation. atom size = sizeof(F). hash size = sizeof(H output). cell size = 2 × hash size. in nox<F₂>, an atom is 1 bit. the storage model is the same — only the element widths change.

resolution

to materialize a noun from its identity:

resolve(id):
  content = store.get(id)
  match content.len():
    8   → atom(decode_field_element(content))
    32  → hash_atom(decode_4_field_elements(content))
    64  → cell(resolve(content[0..32]), resolve(content[32..64]))

atom type distinction

field atoms and word atoms store the same 8 bytes. the type distinction is a runtime property — which arithmetic algebra applies (modular vs bitwise). the structural hash distinguishes them (capacity[14] produces different identities for the same value with different types). the store does not need to know the type. the VM knows from execution context. the STARK proof verifies type correctness.

field element encoding (canonical: Goldilocks)

8 bytes, little-endian, canonical value in [0, p)
p = 2^64 - 2^32 + 1
values ≥ p are invalid

hash encoding (canonical: Hemera)

32 bytes = 4 × 8 bytes
each element is a canonical field element in [0, p), little-endian

canonical invariants

  1. field element values MUST be in [0, p)
  2. hash values: each of the 4 elements MUST be in [0, p)
  3. cell content is deterministic: left identity before right identity
  4. no trailing bytes after content
  5. content length MUST be exactly 8, 32, or 64 bytes

formula encoding

a formula is a noun of the form cell(tag, body) where tag is an atom (pattern number 0-16).

[0 a]        axis
[1 c]        quote
[2 [x y]]    compose
[3 [a b]]    cons
[4 [t [y n]]]  branch
[5 [a b]]    add
...
[15 a]       hash
[16 c]       hint

formulas are nouns. they are stored and resolved the same way as any other noun — by identity lookup, recursively.

Dimensions

encoding
nebu/reference/encoding
encoding specification how bytes map to field elements and back. this encoding is shared by every system that moves data across the byte↔field boundary. bytes to field elements input bytes are packed into field elements using 7-byte little-endian chunks. the maximum 7-byte value is 2⁵⁶ − 1 =…
hemera/reference/encoding
encoding specification Input Encoding (Bytes → Field Elements) Pack input bytes into 7-byte little-endian chunks. Each 7-byte chunk is zero-extended to 8 bytes and interpreted as a u64 in little-endian order, producing one Goldilocks field element. Why 7 bytes, not 8: The maximum 7-byte value is…

Local Graph