my namespace: particles I created, my neuron record ~1 KB - 1 GB my neighborhood: shards I query frequently (cached) ~1 GB - 100 GB proof cache: Lens openings from remote shards (LRU) ~100 MB BBG_root: global commitment 32 bytes
a neuron on a phone: ~1 GB total. stores its own data, caches neighbors, verifies everything else via 32-byte root.
a validator: stores multiple shards. more shards = more responsibility = more reward (proof-of-storage incentive from DAS + φ*-weighted replication).
## within a shard: the local data structure
each shard is a self-contained polynomial sub-state. the choice of local data structure depends on ONE variable: does the shard fit in RAM?
### if shard fits in RAM: flat array + hash index
CID → compact_index: HashMap<[u8;32], u32> 40 ns lookup state[dimension][idx]: &[FieldElement] 10 ns access dirty: BitVec O(1) mark
read: 50 ns write: 60 ns + WAL append commit: O(|dirty|) field ops for polynomial delta
no tree. no B-tree. no LSM. direct memory access. the simplest possible structure because polynomial commitment handles authentication externally.
persistence: mmap the arrays. WAL for crash recovery. fsync once per block.
### if shard exceeds RAM: B+ tree on SSD
fanout: 4 KB page / 48 bytes per particle entry ≈ 83 children per node depth: log₈₃(10¹¹) = 5.7 → 6 levels at nation-scale shard top 3-4 levels: pinned in RAM (~400 MB) leaf reads: 2 SSD reads per lookup = 20 μs
random read: 20 μs (warm cache, 2 SSD reads) batch write: sort dirty entries → sequential merge → 100 μs for 1000 entries range scan: sequential leaf traversal → SSD bandwidth limited (~3 GB/s)
B+ tree is the right structure for "too big for RAM, random access needed" since 1970. nothing has improved on it for this access pattern.
### archival (full history, HDD or network): sorted log + NMT
here is where NMT returns — not for authentication but for DISK ACCESS OPTIMIZATION:
historical state: append-only log sorted by namespace NMT index: sorted namespace ranges → file offsets
advantage: namespace query = sequential scan of sorted region HDD sequential: 200 MB/s HDD random: 0.1 MB/s (8 ms seek) ratio: 2000×
NMT's sorted invariant = optimal disk layout for namespace queries
NMT on archival tier is a STORAGE INDEX, not a TRUST MECHANISM. the polynomial commitment authenticates. the NMT organizes bytes on slow disks for sequential access. two different jobs.
## the complete per-node architecture
┌──────────────────────────────────────────────┐ │ POLYNOMIAL LAYER (authentication) │ │ │ │ BBG_root = compose(C₁, C₂, ..., C_S) │ │ 32 bytes. O(1) verify. scale-invariant. │ │ │ │ my_shard_commitment: 32 bytes │ │ update: O(|dirty|) field ops per block │ │ cross-shard verify: O(1) Lens opening │ └──────────────────┬───────────────────────────┘ │ ┌──────────────────▼───────────────────────────┐ │ HOT STATE (current, in RAM) │ │ │ │ flat array + HashMap index │ │ my shard's current entries │ │ read: 50 ns, write: 60 ns │ │ fits: up to 64 GB per shard │ │ │ │ persistence: mmap + WAL (fsync per block) │ └──────────────────┬───────────────────────────┘ │ ┌──────────────────▼───────────────────────────┐ │ WARM STATE (recent history, SSD) │ │ │ │ B+ tree with RAM-cached top levels │ │ historical queries: "state at time T" │ │ read: 20 μs, range scan: SSD bandwidth │ └──────────────────┬───────────────────────────┘ │ ┌──────────────────▼───────────────────────────┐ │ COLD STATE (full history, HDD/network) │ │ │ │ sorted log + NMT index for disk layout │ │ NMT = storage optimization, not trust │ │ namespace scan: sequential 200 MB/s │ │ for: archival, research, deep replay │ └──────────────────────────────────────────────┘
## the hardware evolution mapping
| storage tier | era | optimal local structure | authentication | notes |
|---|---|---|---|---|
| HDD | 1970s-2010s | B-tree / NMT (minimize seeks) | Merkle tree (tree serves dual role: index + auth) | tree tax paid once |
| SSD | 2010s-2020s | B+ tree with RAM cache | Merkle or polynomial | tree for index, polynomial for auth |
| RAM | 2020s | flat array | polynomial (32 bytes) | tree unnecessary — direct addressing |
| persistent memory (CXL) | 2030s | flat array, no WAL | polynomial (32 bytes) | even WAL unnecessary |
| GFP + RAM | target | flat array, commitment in hardware | polynomial (field ops in silicon) | data structure disappears |
the trend: as storage gets faster, data structures get simpler. trees are compensating mechanisms for slow storage. when access is O(1) (RAM), the tree adds cost without benefit. the polynomial commitment provides authentication without requiring any particular data layout.
GFP is the endgame: field arithmetic in silicon + flat array in RAM. the data structure literally disappears. what remains: memory addresses and polynomial evaluations. no nodes, no pointers, no pages, no seeks. bytes and math.
## what NMT actually is (reframed)
NMT was designed as an authentication structure (completeness proofs via sorted invariant). in polynomial state, authentication is handled by lens.
but NMT's sorted namespace property has a second life: optimal disk layout for cold storage. sorted data = sequential reads = fast on slow disks.
| role | polynomial state | NMT |
|---|---|---|
| authentication (trust) | primary: Lens opening, O(1) | unnecessary |
| hot storage index (RAM) | flat array | unnecessary |
| warm storage index (SSD) | B+ tree | unnecessary (B+ tree is better for random access) |
| cold storage layout (HDD) | not designed for disk | useful: sorted namespace = sequential scan |
| DAS chunk organization | lens-based algebraic DAS | useful: namespace-sorted chunks |
NMT survives in the architecture — not where it was designed to be (authentication), but where its sorted property happens to match the hardware (cold storage, DAS chunk layout).
## implications for BBG implementation
bbg/src/store.rs should implement three storage backends behind one trait:
```rust
trait ShardStore {
fn get(&self, dimension: u8, key: &[u8; 32]) -> Option<&[FieldElement]>;
fn put(&mut self, dimension: u8, key: &[u8; 32], value: &[FieldElement]);
fn dirty_entries(&self) -> impl Iterator<Item = (u8, [u8; 32], &[FieldElement])>;
fn commit(&mut self) -> [u8; 32]; // returns shard sub-commitment
}
// implementations:
struct FlatArrayStore { ... } // RAM: mmap + HashMap + BitVec
struct BPlusTreeStore { ... } // SSD: B+ tree with RAM cache
struct ArchivalStore { ... } // HDD: sorted log + NMT layout index
the polynomial commitment logic is ABOVE the store — it consumes dirty entries and produces the 32-byte sub-commitment. the store doesn't know about polynomials. the polynomial doesn't know about storage. clean separation.
cross-shard composition happens at the network layer — above both store and polynomial. each node: store → polynomial → network → global BBG_root.
see algebraic state commitments for the polynomial commitment mechanism. see BBG for the state specification. see NMT for namespace Merkle trees (now reframed as cold storage optimization). see DAS for data availability sampling. see Goldilocks field processor for the hardware endgame. see cyber/research/provable consensus for why O(1) state reads matter