encoding specification

Input Encoding (Bytes → Field Elements)

Pack input bytes into 7-byte little-endian chunks. Each 7-byte chunk is zero-extended to 8 bytes and interpreted as a u64 in little-endian order, producing one Goldilocks field element.

bytes[0..7]   → element 0    (zero-extend to u64 LE)
bytes[7..14]  → element 1
bytes[14..21] → element 2
...
bytes[49..56] → element 7    (= one full rate block)

Why 7 bytes, not 8: The maximum 7-byte value is 2⁵⁶ − 1 = 0x00FF_FFFF_FFFF_FFFF. The Goldilocks prime is p = 0xFFFF_FFFF_0000_0001. Since 2⁵⁶ − 1 < p, every 7-byte value is a valid field element without reduction. No conditional splitting, no branching, no overflow handling. The encoding is a single u64::from_le_bytes with a zero high byte — branchless, constant-time, and injective.

At 8 bytes per element, approximately 1 in 2³² inputs would require splitting (when the value ≥ p), making encoding data-dependent and variable-length. The 7-byte encoding trades 12.5% rate reduction (56 vs 64 bytes per block) for unconditional simplicity. At planetary scale, branch-free encoding is worth one extra permutation per 8 rate blocks.

Rate block: 8 elements × 7 bytes = 56 input bytes per absorption. One permutation processes 56 bytes of content.

Output Encoding (Field Elements → Bytes)

Output uses the full canonical u64 representation: 8 bytes per element, little-endian. Output elements are guaranteed to be in [0, p) by the permutation — no reduction needed.

element 0 → bytes[0..8]     (u64 to LE bytes)
element 1 → bytes[8..16]
...
element 7 → bytes[56..64]   (= 64-byte hash output)

The asymmetry — 7 bytes in, 8 bytes out — is deliberate. Input encoding must be injective for collision resistance. Output encoding must preserve full field element fidelity for algebraic composability. These are different constraints with different optima.

Padding (Hemera: 0x01 ∥ 0x00*)

After all input bytes are buffered:

Append a single 0x01 byte (padding marker)
Pad with 0x00 bytes to fill the rate block (56 bytes total)
Encode the padded block as 8 field elements and absorb
Store total input byte count in state[10] (capacity length field)

The padding is rate-aligned: every message, regardless of length, ends with exactly one padded absorption. The 0x01 marker distinguishes message ∥ 0x00 from message — standard multi-rate padding adapted to the 7-byte element encoding.

Output Format

A Hemera hash is 64 bytes. Nothing more. No version prefix, no mode byte, no escape hatch. The raw output of 8 Goldilocks field elements in little-endian canonical form IS the particle address.

Hemera output = 8 × 8 bytes = 64 bytes (little-endian, canonical range [0, p))

If Hemera is ever broken, the entire graph rehashes. Storage proofs make this possible. Versioning headers do not save you — they waste bytes multiplied by 10²⁴ cyberlinks.

Dimensions

encoding

nebu/reference/encoding

encoding specification how bytes map to field elements and back. this encoding is shared by every system that moves data across the byte↔field boundary. bytes to field elements input bytes are packed into field elements using 7-byte little-endian chunks. the maximum 7-byte value is 2⁵⁶ − 1 =…

nox/reference/encoding

noun encoding specification version: 0.2 status: canonical identity every noun has an identity computed by the instantiated hash function H (see nouns.md structural hash). the nox protocol operates exclusively on hash identities. in the canonical instantiation (nox), the…

Linked References

hemera/reference/field

hemera/reference/encoding.md