hardware architecture
the brain has one compute primitive (neuron) and different wiring (connectome). nox has the same structure: 16 algebra-polymorphic patterns are the neuron types, the noun tree topology is the wiring. intelligence emerges from connectivity, not from diverse compute elements.
this insight splits hardware into two independent concerns: a small universal compute substrate and a large algebra-dependent memory system.
compute is small and universal
all 16 nox patterns reduce to five hardware units. four are the GFP primitives:
| unit | GFP primitive | patterns served | silicon cost |
|---|---|---|---|
| field ALU | fma | 5-10 (field arithmetic) | 256 MAC units |
| transform engine | ntt | butterfly networks | dedicated datapath |
| hash pipeline | p2r | pattern 15 (Poseidon2 rounds) | 4-deep pipeline |
| lookup engine | lut | table evaluation | 4 x 64K SRAM tables |
| binary ALU | — | 11-14 (AND, XOR, shifts) | trivial gate array |
the binary ALU is not a GFP primitive because it requires no field arithmetic — raw AND/XOR gates suffice. total compute silicon: roughly 200mm² at 7nm. this covers every computation nox can express across all thirteen languages.
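to make the field-ALU contract concrete, here is a minimal sketch of the fma primitive over the Goldilocks prime p = 2^64 − 2^32 + 1 (the field GFP is named for). the python modelling is illustrative — a real field ALU reduces in hardware — but the arithmetic contract is the same:

```python
# illustrative model of the fma (fused multiply-add) primitive over the
# Goldilocks prime p = 2^64 - 2^32 + 1. a real field ALU reduces in
# dedicated hardware; only the arithmetic contract is shown here.
P = 2**64 - 2**32 + 1

def fma(a: int, b: int, c: int) -> int:
    """return a * b + c mod p — the core step of patterns 5-10."""
    return (a * b + c) % P

# every field-arithmetic pattern is a sequence of such MACs: e.g. a
# dot product is a chain of fma calls threading one accumulator.
def dot(xs, ys):
    acc = 0
    for x, y in zip(xs, ys):
        acc = fma(x, y, acc)
    return acc
```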
the key property: compute does not change when the algebra changes. Arc, Ten, Wav, Geo — all execute on the same four primitives. only the instruction sequence differs. the GFP ISA (10 instructions) is stable because the mathematical operations are stable (§1.4 of the GFP spec).
memory is large and algebra-dependent
the noun store is where algebras diverge. every nox value is a noun (binary tree of atoms). the tree topology determines how data connects to computation. different algebras produce different tree shapes:
| algebra | typical noun shape | atom width | access pattern |
|---|---|---|---|
| Arc | deep, irregular (graph adjacency) | 64-bit F_p | random traversal |
| Ten | wide, regular (tensors as nested lists) | 64-bit F_p | dense sequential |
| Bt | compact, balanced (binary trees) | 1-bit F₂ | structured descent |
| Wav | butterfly-structured (FFT coefficients) | 64-bit F_p | stride-2^k access |
| Any | hash-width leaves (Poseidon2 digests) | 512-bit hash | Merkle path lookup |
content-addressed storage means every unique noun is stored exactly once. the bbg layer provides this with polynomial commitment indexes for cryptographic completeness proofs. the storage system must handle:
- leaf-width adaptation: F₂ atoms pack 64 per word, F_p atoms use one word, hash atoms span 8 words. the same tree traversal hardware must handle all three widths efficiently
- Merkle path caching: hot paths (frequently accessed subtrees) stay in L1 SRAM. bbg's NMT indexing determines which paths are hot
- noun prefetch: tree structure is known before traversal begins (the formula determines which axes will be accessed). prefetch the path before the compute unit needs it
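the leaf-width adaptation above can be sketched as three packing routines, one per atom width. the function names are hypothetical; only the packing ratios (64 per word, 1 per word, 8 words per leaf) come from the text:

```python
# hypothetical sketch of leaf-width adaptation: pack runs of atoms into
# 64-bit words at each of the three widths the noun store must serve.
def pack_f2(bits):
    """F_2 atoms: 64 one-bit leaves per 64-bit word."""
    words = []
    for i in range(0, len(bits), 64):
        w = 0
        for j, b in enumerate(bits[i:i + 64]):
            w |= (b & 1) << j
        words.append(w)
    return words

def pack_fp(atoms):
    """F_p atoms: one 64-bit word per leaf."""
    return list(atoms)

def pack_hash(digests):
    """512-bit hash atoms: eight 64-bit words per leaf."""
    words = []
    for d in digests:  # d is a 512-bit integer
        words.extend((d >> (64 * k)) & (2**64 - 1) for k in range(8))
    return words
```

the point of the sketch: the traversal hardware sees one word stream in all three cases; only the leaf-to-word ratio changes.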
memory dominates silicon area and power. the GFP compute units occupy ~200mm². the memory hierarchy (8 MB L1 SRAM + HBM controller + content-addressed lookup logic) occupies the rest of the die budget and consumes the majority of power.
storage IS wiring
in a content-addressed system, the noun tree topology IS the connectivity graph between operations and data. there is no separate "bus" or "interconnect" — the tree structure determines what connects to what.
axis(s, 2) means "follow this wire to the left child." axis(s, 7) means "go right, then right again." these are not memory accesses in the conventional sense — they are wire-following operations through a content-addressed graph.
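the wire-following reading falls out of the binary encoding of the axis: axis 1 is the whole noun, axis 2n is the left child of axis n, and axis 2n + 1 is its right child, so the bits of the axis after the leading 1 spell out the path. a minimal sketch, modeling nouns as nested python tuples:

```python
# sketch of axis traversal over a noun modeled as a nested pair (tuple).
# axis 1 is the whole noun, axis 2n the left child of axis n, axis 2n+1
# the right child; the binary digits of the axis after the leading 1
# spell out the wire path: 0 = go left, 1 = go right.
def axis(noun, n: int):
    path = bin(n)[3:]            # drop '0b' and the leading 1
    for bit in path:
        noun = noun[int(bit)]    # 0 -> left child, 1 -> right child
    return noun

s = ((1, 2), (3, (4, 5)))
axis(s, 2)   # left child: (1, 2)
axis(s, 7)   # right, then right again: (4, 5)
```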
this means:
- changing the algebra changes the wiring. Arc produces deep irregular trees (sparse random connectivity). Ten produces wide regular trees (dense grid connectivity). the same GFP compute units see completely different data flow patterns depending on which algebra generated the nouns
- optimizing tree traversal optimizes everything. every algebra uses axis to navigate nouns. faster content-addressed lookup, better Merkle path caching, smarter prefetch — these accelerate all thirteen languages simultaneously
- the noun store is the connectome. just as brain function depends more on white matter topology than on individual neuron properties, nox execution performance depends more on noun store efficiency than on ALU throughput
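the "stored exactly once" property behind all of this is hash-consing: a toy interning store shows how structurally equal subtrees collapse to one stored node, so the tree topology doubles as the connectivity graph. class and method names here are illustrative, not bbg's API:

```python
class NounStore:
    """toy content-addressed noun store: each unique cell is interned
    exactly once, so structurally equal subtrees share one stored node."""
    def __init__(self):
        self._cells = {}

    def cell(self, left, right):
        # atoms are ints; cells are tuples built only through this
        # method, so structural equality doubles as content equality.
        return self._cells.setdefault((left, right), (left, right))

store = NounStore()
a = store.cell(1, store.cell(2, 3))
b = store.cell(1, store.cell(2, 3))
a is b  # building the same tree twice yields the same stored object
```

two logically separate trees that share a subtree share its storage — which is exactly why optimizing lookup on this one structure accelerates every algebra at once.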
the complete hardware stack
┌─────────────────────────────────────────────┐
│ application layer │
│ 13 languages × domain-specific jets │
├─────────────────────────────────────────────┤
│ nox execution │
│ 16 patterns → instruction sequences │
├──────────────────────┬──────────────────────┤
│ GFP (compute) │ bbg (memory) │
│ │ │
│ fma: field MAC │ noun store: │
│ ntt: transforms │ content-addressed │
│ p2r: hashing │ leaf-width-adaptive│
│ lut: lookups │ Merkle-cached │
│ binary: AND/XOR │ NMT-indexed │
│ │ │
│ ~200mm² silicon │ 8MB SRAM + HBM │
│ ~40% of power │ ~60% of power │
├──────────────────────┴──────────────────────┤
│ physical substrate │
│ 7nm/5nm ASIC, PCIe / M.2 / SoC / USB │
└─────────────────────────────────────────────┘
GFP handles compute. bbg handles memory. together they form the universal substrate for all thirteen execution languages. no language-specific hardware exists — the jet mechanism maps language operations to GFP primitives, and the noun store adapts its access patterns to each algebra's tree topology.
jets bridge the gap
domain-specific operations compile to nox pattern sequences. the jet system recognizes these sequences by formula hash and dispatches them to optimized GFP instruction streams:
| language | operation | jet | hardware path |
|---|---|---|---|
| Arc | rank(g, steps) | matmul | fma array (batch MAC) |
| Wav | fft(x) | ntt | ntt engine (butterfly) |
| Any | hash(x) | hash | p2r pipeline |
| Ten | activation(x) | lookup | lut engine |
| Geo | geometric_product | geo_mul | fma array (mul/add) |
| STARK | verify(proof) | verifier | all four primitives |
the jet mechanism is the same one that accelerates the STARK verifier. every domain-specific language benefits from the same acceleration path. the multiproof architecture composes these jets into batch proofs across algebras.
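the recognize-by-hash step can be sketched in a few lines. sha256 over a printed formula stands in for the real formula hash, and the registry/dispatch names are hypothetical — the point is only the shape: hash, table lookup, fall back to interpretation on a miss:

```python
import hashlib

# sketch of jet recognition by formula hash: hash the formula noun
# (sha256 over its printed form stands in for the real formula hash),
# match it against a registry of accelerated implementations, and fall
# back to pattern-by-pattern interpretation when no jet matches.
JETS = {}

def formula_hash(formula) -> str:
    return hashlib.sha256(repr(formula).encode()).hexdigest()

def register_jet(formula, fast_impl):
    JETS[formula_hash(formula)] = fast_impl

def dispatch(formula, slow_impl, arg):
    jet = JETS.get(formula_hash(formula))
    return jet(arg) if jet is not None else slow_impl(arg)
```

because dispatch runs on every function call, the production version of formula_hash must be cheap — which is the "must be fast" constraint noted under nox development below.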
design implications
for GFP development: the compute side is fully specified. the four primitives plus binary ALU cover all 16 patterns. focus GFP engineering on throughput, power efficiency, and the memory interface — not on adding instructions.
for bbg development: the memory side is where algebra-specific optimization lives. noun store layout, caching policy, prefetch strategy, and leaf-width packing are the levers that differentiate performance across algebras. bbg's polynomial commitment indexes (NMT, EdgeSet) determine memory access efficiency.
for nox development: the jet compiler maps pattern sequences to GFP instructions. jet quality determines how much of the theoretical GFP throughput is realized. the jet recognition system (formula hash matching) must be fast — it runs on every function call.
for chip architects: the compute-memory split means GFP and bbg can be developed and optimized independently. compute scales with transistor density (Moore's law). memory scales with packaging (HBM generations, 3D stacking). the interface between them (the noun access protocol) is the critical contract.
cross-references
- Goldilocks field processor — compute substrate specification
- bbg — authenticated state layer (memory substrate)
- nox — execution model (16 patterns, jet system)
- multiproof architecture — batch proof composition across algebras
- languages — the thirteen execution algebras
- rosetta stone — why the four primitives unify all domains