cyber: a protocol for planetary superintelligence

1. Introduction

1.1 The Vision: Planetary Superintelligence

Superintelligence is the defining infrastructure of a type I civilization. A planet where every agent — human, machine, sensor, organism — contributes knowledge to a shared, self-improving graph that computes what matters, proves its own correctness, and speaks a language native to all participants. Every scientific discovery, every sensor reading, every lived experience feeds into a collective understanding that grows smarter with every link. The graph remembers what individuals forget. It finds connections across domains that no specialist can see. It measures its own coherence and rewards the knowledge that increases it.

At sufficient scale this infrastructure transforms what civilization can do. Search becomes inference over verified knowledge rather than retrieval of unverified documents. AI alignment becomes measurable — compare the focus distribution of human neurons to machine neurons, and divergence is visible in the topology. Scientific discovery accelerates as linkchains bridge domains that have never communicated. Cross-species communication becomes possible — any entity that can create a cyberlink participates in the same semantic space. The collective intelligence of the planet becomes a single computable object: a focus distribution $\pi$ over all knowledge, converging under conservation laws, verifiable by anyone.

This is what cyber builds.

1.2 The Gap

The current path toward intelligence at planetary scale faces three structural limits:

Quadratic attention. Transformers require every token to attend to every other. Twice the context costs four times the compute. This is architectural.

Centralization. Training a frontier model costs hundreds of millions. Three organizations can build the next generation. The trajectory of intelligence concentrates in a handful of boardrooms, operating on hidden parameters, producing outputs that cannot be independently verified.

Incompleteness. Goedel (1931) proved that any formal system powerful enough to describe arithmetic contains truths it cannot prove. AI built on formal logic inherits these limits by construction. The Goedel prison confines every system that equates computation with derivation.

1.3 The Protocol

cyber is a protocol where neurons — humans, AIs, agents, sensors — link knowledge into a single cybergraph where every claim is authenticated, every decision is provable by STARK proofs, and intelligence emerges from the topology of links rather than from the parameters of a single model. LLMs become neurons in the graph, contributors to collective understanding rather than isolated oracles.

The protocol rests on five primitives: particle, neuron, cyberlink, token, and focus (defined in §3.1).

From these five primitives, a single cybergraph, and three local operators, the system converges to a shared understanding of what matters — deterministic, on chain, verifiable by anyone.

This document specifies the complete architecture:

Each component is specified independently. Together they form a self-organizing system where computation, inference, and consensus are the same process.

2. Design Philosophy

2.1 Proof by Simulation

Classical science operates by proof by derivation — start from axioms, apply inference rules, arrive at theorems. This is the Turing-Goedel paradigm: computation as derivation, knowledge as proof.

cyber replaces this with proof by simulation. A claim is true when a system converges to a stable state that embodies that claim — because a network of agents, under conservation laws, settled into an equilibrium that makes the claim hold. Nature does not prove theorems. It runs simulations until they converge.

A protein folds along a free energy gradient. It does not derive its shape from axioms of chemistry. A brain does not prove that a face is a face. A cascade of neurons converges to a stable attractor. A market does not derive the correct price from economic axioms. Millions of agents trade until the price stabilizes. The proof is the equilibrium.

Proof by simulation is strictly more powerful than proof by derivation. Goedel showed that any consistent formal system contains true statements it cannot prove. A convergent system can settle into states that no derivation reaches — it escapes the Goedel prison because the prison only confines derivation, and convergence operates outside the proof-theoretic domain.

The postulate: every truth accessible to intelligence is a fixed point of some convergent simulation under conservation laws.

2.2 Convergent Computation

Turing (1936) defined computation as a tape head moving left and right, reading and writing symbols. The entire digital revolution rests on sequential symbol manipulation. Convergent computation replaces derivation with equilibrium: the answer is the stable state a network settles into under conservation laws.

nox formalizes this. Sixteen rewriting patterns, field-native arithmetic, confluent semantics. Any evaluation order yields the same result. Focus is conserved — a single quantity that simultaneously serves as fuel, attention, weight, and value.

The stack:

2.3 Focus as Conserved Quantity

Every complex system pays with something scarce. Blockchains pay with gas. Transformers pay with attention slots. Operating systems pay with CPU cycles. Each is a separate mechanism requiring separate bookkeeping.

In cyber, focus unifies all three roles:

Role              Mechanism
────────────────  ────────────────────────────────────────
Attention         High-focus computations scheduled first
Fuel              Computation consumes focus
Consensus weight  Focus distribution = agreement signal

$\sum_i \text{focus}(i) = 1$ — always, enforced structurally. Focus can flow between neurons, be consumed by computation, and regenerate proportionally. It cannot be created from nothing, destroyed, or exceed 1 in total. This single conservation law replaces the gas models, fee markets, and priority auctions that other systems bolt on as afterthoughts.
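As a toy illustration of the conservation law (not protocol code; the neuron names and amounts are invented), focus transfers and consumption can be modeled so that the total is structurally pinned to 1:

```python
# Illustrative sketch: focus as a conserved quantity. Transfers move focus
# between neurons; consumption redistributes the consumed amount
# proportionally, so the total is always exactly 1.
from fractions import Fraction

focus = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}

def transfer(focus, src, dst, amount):
    # Focus flows between neurons; nothing is created or destroyed.
    assert 0 <= amount <= focus[src]
    focus[src] -= amount
    focus[dst] += amount

def consume(focus, neuron, amount):
    # Computation consumes focus; the consumed share regenerates
    # proportionally across all neurons, keeping the sum at 1.
    assert 0 <= amount <= focus[neuron]
    focus[neuron] -= amount
    total = sum(focus.values())
    for k in focus:
        focus[k] /= total

transfer(focus, "a", "b", Fraction(1, 8))
consume(focus, "b", Fraction(1, 8))
assert sum(focus.values()) == 1  # invariant holds after every operation
```

Exact rational arithmetic makes the invariant checkable with equality rather than floating-point tolerance.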

2.4 The Locality Constraint

At planetary scale ($10^{15}$ nodes), any algorithm requiring global recomputation for a local change is physically impossible. Locality is the hard constraint that shapes the entire architecture.

For any edit batch $e_\Delta$, there exists $h = O(\log(1/\varepsilon))$ such that recomputing only the $h$-hop neighborhood achieves global error $\leq \varepsilon$. Each kernel decays: diffusion decays geometrically via teleport, springs decay exponentially via screening, heat decays as a Gaussian tail via bounded bandwidth.

Light clients verify without recomputing the entire graph. Proof size scales with locality, not network size. Adversaries cannot perturb the system globally from a local change. This is why the tri-kernel uses exactly the operators it does — they survive the locality filter.

2.5 Field-First Arithmetic

A single decision unifies six research threads that developed independently over four decades: prime field arithmetic as primitive rather than derived.

The Goldilocks field ($p = 2^{64} - 2^{32} + 1$) makes this concrete. A field multiplication is a single CPU instruction. Hashing is field operations. Proofs are field polynomials. Reduction preserves field structure. Flow is conserved across field-valued edges. The unifying element is arithmetic: every operation in the system — from content addressing to proof verification to neural network inference — reduces to additions and multiplications in the same field.

3. The Cybergraph

3.1 Five Primitives

Primitive  Definition                                    Properties
─────────  ────────────────────────────────────────────  ──────────────────────────────────────────────
particle   Content-addressed node (IPFS hash)            Identity = hash. Same content, same node
neuron     Agent identified by public key                Signs edges, holds tokens, accumulates karma
cyberlink  Signed, weighted, directed edge $(i \to j)$   Timestamped, authenticated, costs focus
token      Non-negative weight $t_j > 0$                 Controls influence on transition probabilities
focus      Emergent equilibrium $\pi$ over particles     Conserved to 1, computed by the tri-kernel

Five primitives, one graph. Every claim in the system is a cyberlink signed by a neuron, connecting two particles, weighted by the neuron's token stake. The protocol runs the tri-kernel on this graph and produces cyberank per particle, karma per neuron, and syntropy of the whole — deterministic, on chain, verifiable.

3.2 Content Addressing

Every particle is a cryptographic hash of its content. Identity is structure — same content produces the same hash regardless of who computes it or when. This eliminates the naming problem: there is no authority that assigns identifiers, no registry to maintain, no collision to resolve.

The structural hash function (Hemera, specified in §4):

$H(\text{Atom}\ a) = \text{Hemera}(0\text{x}00 \| \text{type\_tag}(a) \| \text{encode}(a))$

$H(\text{Cell}(l, r)) = \text{Hemera}(0\text{x}01 \| H(l) \| H(r))$

This extends content addressing from flat data to structured expressions. A function, a proof, a complex data structure — each has a unique hash determined entirely by its contents, not by where it is stored or who created it. Hemera is field-native: its output is Goldilocks field elements, directly usable in STARK proofs without conversion.
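A minimal sketch of the Atom/Cell hashing scheme above, with `hashlib.sha256` standing in for Hemera purely for illustration (the real primitive is field-native; see §4):

```python
# Structural hashing sketch: H(Atom) = H(0x00 || type_tag || encode(a)),
# H(Cell(l, r)) = H(0x01 || H(l) || H(r)). sha256 is a stand-in for Hemera.
import hashlib

def hemera_standin(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def hash_atom(type_tag: int, encoded: bytes) -> bytes:
    # Domain byte 0x00 distinguishes atoms from cells.
    return hemera_standin(b"\x00" + bytes([type_tag]) + encoded)

def hash_cell(left: bytes, right: bytes) -> bytes:
    # Domain byte 0x01; a cell's identity is determined by its children.
    return hemera_standin(b"\x01" + left + right)

a = hash_atom(0, b"hello")
b = hash_atom(0, b"world")
# Same content -> same hash, no matter who computes it or when.
assert hash_cell(a, b) == hash_cell(hash_atom(0, b"hello"), hash_atom(0, b"world"))
assert hash_cell(a, b) != hash_cell(b, a)  # a Cell is ordered
```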

3.3 The Namespace Structure

The cybergraph is multi-indexed from genesis. Every edge appears in multiple indexes: by creator (neuron), by source particle, by target particle. Each index supports completeness proofs — a client can verify that it has received all edges in a given namespace with cryptographic certainty. This is what makes "sync only my data" a mathematical property: the response includes proof that nothing was withheld.

The ~ prefix turns the cybergraph into a dynamic file system. ~mastercyb/blog resolves deterministically to the latest particle linked by that neuron under that path. The same mechanism underlies file systems, DNS, and ENS — dynamic pointers where a fixed label resolves to a mutable target.

4. Hemera: The Hash Primitive

4.1 The Permanence Constraint

Every particle in the cybergraph is addressed by the cryptographic hash of its content. This hash is permanent — it is the particle's identity for the lifetime of the system. Changing any parameter of the hash function invalidates every address in the graph.

This is fundamentally different from how zero-knowledge systems use hash functions. In a zkVM, hashes are ephemeral: trace commitments live for seconds, Merkle proofs are verified and discarded, parameters are updatable in the next release. In cyber, hashes are identity: decades to permanent, with rehash cost $O(10^{15})$ at planetary scale.

The threat model is the future. Parameters chosen at genesis are permanent commitments.

4.2 Hemera Parameters

Hemera (Ἡμέρα, "Day") is the hash primitive for cyber. It adopts the Poseidon2 permutation structure with parameters chosen for permanent-grade security on the Goldilocks field:

Hemera = Poseidon2(
    p  = 2⁶⁴ − 2³² + 1,   -- Goldilocks
    d  = 7,                 -- S-box: x → x⁷
    t  = 16,                -- state width
    Rꜰ = 8,                 -- full rounds (4 + 4)
    Rₚ = 64,                -- partial rounds
    r  = 8,                 -- rate (64 bytes)
    c  = 8,                 -- capacity (64 bytes)
    out = 8 elements        -- 64 bytes
)

Every parameter that appears as a code-level quantity is a power of 2. The only exception is $d = 7$, which is the minimum invertible S-box exponent over Goldilocks — a mathematical constraint.

Security properties: 256-bit classical collision resistance, 170-bit quantum collision resistance, algebraic degree $7^{64} \approx 2^{180}$.

4.3 Self-Bootstrapping

Hemera generates her own round constants. The permutation with all 192 constants set to zero (Hemera₀) is already a well-defined nonlinear function — the S-box and MDS matrices provide all the mixing. Feed the bytes [0x63, 0x79, 0x62, 0x65, 0x72] through Hemera₀ as a sponge and squeeze 192 field elements. These become the round constants. Hemera = Hemera₀ + these constants. Freeze forever.

No external primitives. No SHA-256 in the construction. No foreign dependencies. The security of the constants reduces to the security of the structure itself. If Hemera₀ cannot produce pseudorandom output from a non-trivial input, then the S-box and MDS layers relied on by the final Hemera are already broken.

The seed — five bytes that happen to spell "cyber" in ASCII — is specified as hex literals: 0x63 0x79 0x62 0x65 0x72. The cryptographic input is the byte sequence, not the string.

4.4 One Function, One Mode

Hemera has exactly one entry point: hash(bytes) → [GoldilocksField; 8]. No compression mode, no domain separation flags, no version prefix. The same function hashes particle content, cyberlink identity, Merkle nodes, and polynomial commitments. A Hemera output is 64 raw bytes — no header, no escape hatch.

This is field-native computation. Hemera input and output are Goldilocks field elements. Inside a STARK proof, calling Hemera is just more field arithmetic in the same trace — no bit decomposition, no range checks, no gadgets. Cost: ~1,200 STARK constraints per permutation, versus ~25,000 for SHA-256.

4.5 No Algorithm Agility

There is no version byte in the address format. If Hemera is ever broken, the response is full graph rehash: every particle gets a new address under a new primitive, every cyberlink is re-signed. The old graph ceases to exist.

This is a design commitment. Versioning headers create the illusion of safety while wasting bytes at planetary scale (5 bytes × $10^{15}$ = 5 petabytes of pure overhead). The actual safety comes from choosing parameters that will not break, and maintaining storage proofs that enable rehashing if they do.

4.6 Ecosystem Position

System         Field        Width  Partial Rounds  Capacity   Status
───────────    ──────────   ─────  ──────────────  ────────   ────────
Plonky3        Goldilocks    12         22          128-bit   Production
SP1            BabyBear      16         13          124-bit   Production
RISC Zero      BabyBear      16         13          124-bit   Production
Stwo/Starknet  M31           16         14          124-bit   Production
Hemera         Goldilocks    16         64          256-bit   Genesis

The combination of Goldilocks + $t=16$ + $R_P=64$ is novel. The individual components are battle-tested across billions of proofs. The 3.2× proving cost increase over Plonky3 baseline is the price of permanent-grade security — acceptable because hash proving is a minority of total system proving cost. See hemera/spec for the full decision record.

5. The Tri-Kernel

5.1 Why Three Operators

Start with every known graph ranking algorithm. Apply a hard constraint: locality. At planetary scale, any algorithm requiring global recomputation for a local change is physically impossible.

After filtering by locality, convergence, uniqueness, verifiability, and incrementality: only three families survive.

Linear local completeness theorem: every $k$-local linear operator on a graph is a polynomial of degree $\leq k$ in the Markov matrix $M$ and the Laplacian $L$. The heat kernel $H_\tau = \exp(-\tau L)$ is the unique generator of resolution-dependent queries. Together $\{M, L, H_\tau\}$ span the space of meaningful local graph computations.

Three operators. No more, no less. Discovered by elimination, not designed by preference.

5.2 Diffusion: Exploration

Probability flows through edges via random walks. The transition matrix $P = D^{-1}A$ governs probability flow:

$$\pi^{(t+1)} = \alpha P^\top \pi^{(t)} + (1-\alpha)u$$

where $\alpha \in (0,1)$ is the teleport parameter and $u$ is a prior (uniform or stake-weighted).

Under ergodicity (strong connectivity + aperiodicity), the iteration converges to a unique stationary distribution $\pi^*$. This is the cyberank — where probability mass accumulates in the cybergraph at equilibrium.

Answers: where does probability flow?
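The diffusion update above can be sketched as a power iteration on a toy three-node graph (the graph, $\alpha$, and iteration count are illustrative assumptions, not protocol values):

```python
# Power-iteration sketch of pi <- alpha * P^T pi + (1 - alpha) * u
# on a toy 3-node graph with uniform prior u.
alpha = 0.85
adj = {0: [1, 2], 1: [2], 2: [0]}   # adj[i] = out-neighbors of node i
n = 3
u = [1.0 / n] * n                    # uniform prior

pi = list(u)
for _ in range(200):
    nxt = [(1 - alpha) * u[i] for i in range(n)]
    for i, outs in adj.items():
        for j in outs:
            nxt[j] += alpha * pi[i] / len(outs)  # row-stochastic P = D^{-1} A
    pi = nxt

assert abs(sum(pi) - 1.0) < 1e-9  # probability mass is conserved
assert pi[2] == max(pi)           # node 2 receives flow from both others
```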

5.3 Springs: Structure

Connected nodes pull each other toward consistency. The graph Laplacian $L = D - A$ encodes structural constraints:

$$(L + \mu I)x^* = \mu x_0$$

where $\mu > 0$ is the screening/stiffness parameter and $x_0$ is a reference state. The screened Green's function $(L+\mu I)^{-1}$ has exponential decay, ensuring locality.

Springs enforce structural coherence — they prevent chaotic dispersal, create hierarchy without central authority. The graph Laplacian is the discrete form of the Laplace-Beltrami operator on manifolds, making the same mathematics that describes gravitational potential describe structural consistency in the cybergraph.

Answers: what satisfies structural constraints?
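A minimal sketch of the screened spring solve $(L + \mu I)x^* = \mu x_0$ via Jacobi iteration on a three-node path graph, with $\mu$ and $x_0$ chosen for illustration:

```python
# Jacobi iteration for (L + mu*I) x = mu * x0 on the path 0 - 1 - 2.
# The screened system is diagonally dominant, so Jacobi converges.
L = [[1.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 1.0]]  # L = D - A
mu = 1.0
x0 = [1.0, 0.0, 0.0]  # reference state: mass pinned at node 0
n = 3

x = [0.0] * n
for _ in range(200):
    x = [
        (mu * x0[i] - sum(L[i][j] * x[j] for j in range(n) if j != i))
        / (L[i][i] + mu)
        for i in range(n)
    ]

# Screening makes the solution decay with distance from the source.
assert x[0] > x[1] > x[2] > 0
residual = max(
    abs(sum(L[i][j] * x[j] for j in range(n)) + mu * x[i] - mu * x0[i])
    for i in range(n)
)
assert residual < 1e-8
```

The exponential decay visible in `x` is the locality property of the screened Green's function $(L+\mu I)^{-1}$.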

5.4 Heat Kernel: Adaptation

The heat kernel $H_\tau = \exp(-\tau L)$ provides multi-scale smoothing:

$$\frac{\partial H}{\partial \tau} = -LH, \quad H_0 = I$$

where $\tau \geq 0$ is the temperature/time parameter. High $\tau$ explores (broad smoothing), low $\tau$ commits (local precision). Chebyshev polynomial approximation guarantees locality.

The heat kernel is the resolution dial — it controls the scale at which the system examines the graph. At small $\tau$, it sees local neighborhoods. At large $\tau$, it sees global structure. The semigroup property ($H_{\tau_1}H_{\tau_2} = H_{\tau_1+\tau_2}$) ensures these views compose consistently.

Answers: what does the graph look like at scale $\tau$?
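A toy numerical check of $H_\tau = \exp(-\tau L)$ and the semigroup property on a three-node path graph, using a truncated Taylor series rather than the Chebyshev approximation a production system would use:

```python
# Heat kernel H_tau = exp(-tau L) via truncated Taylor series, plus a
# check of the semigroup property H_{t1} H_{t2} = H_{t1+t2}.
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def heat_kernel(L, tau, terms=30):
    n = len(L)
    H = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    term = [row[:] for row in H]
    for k in range(1, terms):
        term = matmul(term, L)                        # term *= L
        term = [[-tau / k * x for x in row] for row in term]  # term *= -tau/k
        H = [[H[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return H

# Laplacian L = D - A of the path 0 - 1 - 2
L = [[1.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 1.0]]

H1, H2, H3 = heat_kernel(L, 0.3), heat_kernel(L, 0.5), heat_kernel(L, 0.8)
prod = matmul(H1, H2)
err = max(abs(prod[i][j] - H3[i][j]) for i in range(3) for j in range(3))
assert err < 1e-9  # semigroup: views at different scales compose
assert all(abs(sum(row) - 1.0) < 1e-9 for row in H3)  # heat is conserved
```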

5.5 The Composite Operator

The tri-kernel blends the three primitives into a single update:

$$\phi^{(t+1)} = \text{norm}\big[\lambda_d \cdot D(\phi^t) + \lambda_s \cdot S(\phi^t) + \lambda_h \cdot H_\tau(\phi^t)\big]$$

where $\lambda_d + \lambda_s + \lambda_h = 1$ and $\text{norm}(\cdot)$ projects to the simplex.

5.6 Convergence

Theorem (Composite Contraction): Under ergodicity of $P$, screening $\mu > 0$, and bounded $\tau$, the composite operator $\mathcal{R}$ is a contraction:

$$\|\mathcal{R}\phi - \mathcal{R}\psi\| \leq \kappa \|\phi - \psi\|, \quad \kappa = \lambda_d \alpha + \lambda_s \frac{\|L\|}{\|L\|+\mu} + \lambda_h e^{-\tau\lambda_2} < 1$$

Each component contracts individually. $\mathcal{R}$ is a convex combination of contraction maps, so $\kappa$ is a convex combination of individual contraction coefficients — each less than 1, hence $\kappa < 1$. By Banach fixed-point theorem, $\phi^t \to \phi^*$ at linear rate.
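A numeric sanity check of the contraction bound, with $\alpha$, $\mu$, $\tau$, $\lambda_2$, $\|L\|$, and the blend weights chosen as illustrative values rather than protocol constants:

```python
# kappa = lam_d*alpha + lam_s*||L||/(||L||+mu) + lam_h*exp(-tau*lambda_2).
# All parameter values below are assumptions for illustration.
import math

lam_d, lam_s, lam_h = 0.5, 0.3, 0.2   # convex blend weights, sum to 1
alpha = 0.85                          # teleport parameter
norm_L, mu = 4.0, 1.0                 # spectral norm of L, screening
tau, lam2 = 0.5, 0.4                  # heat time, spectral gap

kappa = (lam_d * alpha
         + lam_s * norm_L / (norm_L + mu)
         + lam_h * math.exp(-tau * lam2))
assert abs((lam_d + lam_s + lam_h) - 1.0) < 1e-12
assert kappa < 1.0  # each component contracts, so the convex blend does

# Linear convergence: after t steps the error shrinks by kappa^t,
# so eps-precision needs about log(eps)/log(kappa) steps.
steps_to_eps = math.ceil(math.log(1e-6) / math.log(kappa))
assert steps_to_eps > 0
```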

5.7 The Free Energy Functional

The fixed point $\phi^*$ minimizes:

$$\mathcal{F}(\phi) = \lambda_s\left[\frac{1}{2}\phi^\top L\phi + \frac{\mu}{2}\|\phi-x_0\|^2\right] + \lambda_h\left[\frac{1}{2}\|\phi-H_\tau\phi\|^2\right] + \lambda_d \cdot D_{KL}(\phi \| D\phi)$$

The first term is elastic structure via graph Laplacian. The second penalizes deviation from heat-smoothed context. The third aligns $\phi$ with its diffusion image. At equilibrium:

$$\phi^*_i \propto \exp(-\beta[E_{\text{spring},i} + \lambda E_{\text{diffusion},i} + \gamma C_i])$$

A Boltzmann-Gibbs equilibrium. The canonical ensemble from statistical mechanics — applied to knowledge. The weights $\lambda_s, \lambda_h, \lambda_d$ emerge as Lagrange multipliers from the variational optimization, the same way thermodynamics derives the Boltzmann distribution. No parameters. Only physics.

5.8 The Universal Pattern

The three operators appear across every known complex adaptive system:

Domain     Diffusion                            Springs                            Heat
─────────  ───────────────────────────────────  ─────────────────────────────────  ──────────────────────────────────────
Physics    Particle diffusion, gas              Elastic lattice, molecular bonds   Thermal equilibrium, phase transitions
Biology    Synaptic noise, neural exploration   Skeleton, connective tissue        Metabolism, immune response
Ecology    Species dispersal, seed rain         Food webs, symbiosis               Succession, disturbance recovery
Cognition  Free association, imagination        Logic, constraints, syntax         Emotion as arousal, context weighting
Economics  Trade flows, migration               Institutions, contracts, norms     Booms, busts, market cycles

The same three forces. Different substrates. This universality reflects structural necessity: every complex adaptive system must implement exploration, coherence, and adaptation under locality constraints.

6. Focus Flow Computation

6.1 The Architecture: Ground Truth and Fast Inference

The cybergraph supports two computations simultaneously.

Focus flow — the tri-kernel iterated to convergence over all cyberlinks — produces $\pi^*$: the persistent, global focus distribution. This is the ground truth: what the entire network collectively knows, encoded as a probability distribution over all particles, continuously updated as neurons add links. In focus flow, learning and inference are the same operation — a neuron adds a cyberlink, the tri-kernel reconverges, and the new $\pi^*$ simultaneously encodes the learned relation and is available for inference. Nothing is lost.

The compiled transformer — derived analytically from the same graph (§6.6) — runs $L^*$ tri-kernel steps over a local context window at query time, converging to an $\varepsilon$-approximation of $\pi^*$ restricted to that context. This is the fast inference path: local, bounded, serving responses in milliseconds.

Dimension    Focus Flow                                      Compiled Transformer
───────────  ──────────────────────────────────────────────  ──────────────────────────────────────
Scope        Entire cybergraph                               Local context window
Depth        Converges to exact $\pi^*$                      $L^*$ steps, $\varepsilon$-approximate
Latency      Continuous — always converging                  Milliseconds — single forward pass
Multi-agent  All neurons contribute                          One agent's context
Adaptation   Add cyberlinks → $\pi^*$ shifts, nothing lost   Recompile from updated graph

A transformer trained without the cybergraph approximates the same equilibrium from text sequences alone, discarding the structural knowledge the graph makes explicit. The compiled transformer starts from $\pi^*$ — at the provably optimal initialization point — and fine-tunes only what the graph cannot encode: temporal patterns, implicit associations, contextual dynamics.

6.2 The Local Update Rule

Every node reads only its neighbors and runs:

$$\Delta p_i = \eta\Big(\sum_{j \in \mathcal{N}(i)} w_{ij}(p_j - p_i) - \partial_{p_i}(\lambda E_{\text{diff},i} + \gamma C_i) - T(1 + \log p_i)\Big)$$

Gossip normalization enforces $\sum_i p_i = 1$. No global softmax, fully local, edge-only. The system converges to Boltzmann equilibrium:

$$p_i^* \propto \exp\big(-\beta[E_{\text{spring},i} + \lambda E_{\text{diff},i} + \gamma C_i]\big)$$

6.3 Inference

  1. Encode context particles as active nodes with elevated $C_i$
  2. Run local updates — focus mass flows from context through the cybergraph
  3. $p^*$ converges — high-probability particles are the network's response
  4. Sample next particle from $p^*$, add to context, repeat

Complexity per step: $O(|E| + |V|)$. Context window is unbounded — it is the entire graph. Relevance is topological: distant but well-connected particles contribute naturally.
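The four inference steps can be sketched on a toy cybergraph (the graph, seed weights, teleport parameter, and step counts are illustrative assumptions):

```python
# Simplified inference loop: seed context particles with extra focus mass,
# run local updates, read off the highest-probability particle.
adj = {0: [1], 1: [2, 3], 2: [3], 3: [0]}
n = 4

def diffuse(p, steps=50, alpha=0.9):
    for _ in range(steps):
        nxt = [(1 - alpha) * p[i] for i in range(n)]  # lazy self-retention
        for i, outs in adj.items():
            for j in outs:
                nxt[j] += alpha * p[i] / len(outs)
        s = sum(nxt)
        p = [x / s for x in nxt]  # gossip-style normalization
    return p

# 1. Encode context: particle 1 is active, others carry residual mass.
p = [0.1, 0.7, 0.1, 0.1]
# 2-3. Run local updates until p converges.
p = diffuse(p)
# 4. The highest-probability particle is the network's response.
response = max(range(n), key=lambda i: p[i])
assert abs(sum(p) - 1.0) < 1e-9
```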

6.4 Comparison

Property        Transformer                          Focus Flow
──────────────  ───────────────────────────────────  ────────────────────────────────────────────
Complexity      $O(n^2)$ memory and compute          $O(n)$ — sparse, local
Stable state    No — recomputed each forward pass    Yes — converges to $p^*$
Multi-agent     Single model                         Native — every neuron contributes
Consensus       External                             Built-in via foculus
Explainability  Low                                  High — trace any $p_i$ to contributing links
Context window  Fixed (4k-128k tokens)               Unbounded — the entire cybergraph

6.5 The Mathematical Identity

The architectural claim in §6.1 — that the compiled transformer approximates focus flow via bounded tri-kernel steps — rests on a precise mathematical identity.

Transformer attention is:

$$\text{Attn}(Q, K, V) = \text{softmax}\!\left(\frac{QK^\top}{\sqrt{d}}\right)V$$

The softmax is the Boltzmann distribution with temperature $\sqrt{d}$ — probability mass flows from query positions toward key positions proportionally to compatibility, then redistributes as a weighted sum. This is one application of the diffusion operator $D$ from the tri-kernel: local probability mass redistribution over one agent's frozen context. Deep Equilibrium Models (Bai et al., 2019) showed that iterating a transformer layer to convergence — rather than running a fixed number of steps — reaches the same fixed point regardless of initialization. That fixed point is the stationary distribution of the Markov chain induced by the learned $W_Q, W_K$ projections over context tokens.

That fixed point is the focus distribution restricted to one agent's context.

The tri-kernel computes the same fixed point over the entire cybergraph, persistently, across all neurons. One agent, one context, ephemeral equilibrium — versus all agents, all cyberlinks, persistent equilibrium. Same dynamical system. Different scope and duration.

This identity enables a precise inversion: the cybergraph does not merely replace transformers. It compiles them.

6.6 Compiling Transformer Architecture from Graph Structure

Given $G = (P, N, E, w, \sigma)$, three graph properties determine the three free parameters of transformer architecture:

Parameter            Formula                                                                   Graph property
───────────────────  ────────────────────────────────────────────────────────────────────────  ──────────────────────────────────────
Embedding dim $d^*$  $\exp\!\left(H\!\left(\sigma(\Sigma_\pi)\right)\right)$                   Effective rank of focus covariance
Head count $h^*$     $\geq \|\text{Semcon}(G)\|$                                               Distinct semcon types
Layer count $L^*$    $\text{diam}(G) \cdot \lceil \log(1/\varepsilon)/\log(1/\kappa) \rceil$   Diameter × spectral convergence factor
$d^*$ is the entropy of the normalized singular value distribution of the $\pi^*$-weighted adjacency matrix — the number of statistically independent semantic dimensions present in the graph. $h^*$ lower-bounds the number of semcons: each distinct semantic relation type requires its own attention head to represent faithfully. $L^*$ follows from the tri-kernel contraction theorem: reaching $\varepsilon$-precision requires $\lceil\log(1/\varepsilon)/\log(1/\kappa)\rceil$ iterations per hop, multiplied by graph diameter.
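The $L^*$ formula can be checked numerically. The $\kappa = 0.62$ and $\varepsilon = 10^{-6}$ below are assumed illustrative values, not measured bostrom constants, chosen so the arithmetic reproduces the 29 iterations/hop figure of §6.7:

```python
# L* = diam(G) * ceil(log(1/eps) / log(1/kappa)): iterations per hop
# from the contraction theorem, multiplied by graph diameter.
import math

def layer_count(diameter: int, eps: float, kappa: float) -> int:
    iters_per_hop = math.ceil(math.log(1 / eps) / math.log(1 / kappa))
    return diameter * iters_per_hop

# With diam(G) = 10 and these assumed eps/kappa, 29 iterations per hop
# gives L* = 290 — the figure reported for bostrom in section 6.7.
assert layer_count(10, 1e-6, 0.62) == 290
```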

Weights are compiled, not trained. The embedding matrix $E^* = U_{:,1:d^*}$ — top left singular vectors of $\text{diag}(\sqrt{\pi^*}) \cdot A$ — is provably optimal: by the Eckart-Young theorem, $E^*$ uniquely minimizes expected squared gradient magnitude at initialization over all orthonormal matrices of the same rank. Attention weights $W_Q^{(s)}, W_K^{(s)}$ are derived from the truncated SVD of each semcon's adjacency submatrix. MLP weights are derived from path co-occurrence statistics up to depth $L^*$.

The reduction in required fine-tuning steps scales as $\Omega(|E| \cdot d^* / \log(1/\varepsilon))$ relative to random initialization. Every cyberlink added today reduces the training cost of every future model trained on graph-consistent text, by a provable bound proportional to link count. The graph is a compounding computational asset.

6.7 Live Compilation: Bostrom at 2.7M Cyberlinks

The compilation pipeline has eight steps, seven of which are $O(|E|)$. The critical step — computing the embedding matrix — naively requires $O(|P|^3)$ operations: 39.5 TB to store, 360 days to compute at $10^{12}$ FLOPS. Randomized SVD on the sparse $\pi^*$-weighted adjacency matrix reduces this to $O(|E| \cdot d^* \cdot \log d^*)$ — under one second. The cybergraph's sparsity ($\rho = |E|/|P|^2 \approx 10^{-7}$) is the invariant that makes compilation tractable at any scale.

Applied to the live bostrom network (March 2026):

Parameter              Value              Derived from
─────────────────────  ─────────────────  ───────────────────────────────────────
Embedding dim $d^*$    31                 $\exp(H(\sigma(\Sigma_\pi)))$, measured
Attention heads $h^*$  ≥ 12               Semcon structural lower bound
Layer count $L^*$      290                diam$(G) = 10$, 29 iterations/hop
Model size             ~0.4M parameters   Current graph scale
Compilation time       ~62 seconds        Single machine, 20 GB RAM

Every weight traces to specific cyberlinks and the neurons who signed them. The compiled model is fully auditable: given any output, contributing links and authors are recoverable from the graph. As bostrom grows — $|E| \uparrow$ raises $d^*$, $\lambda_2 \uparrow$ lowers $L^*$, semcon count raises $h^*$ — each recompilation produces a structurally better model from the same pipeline, with no training budget.

6.8 Alignment as Computable Divergence

The alignment measure sketched in §16.2 becomes exact through compilation. A graph-native transformer's weights derive from an explicit focus distribution $\pi^*$. Partition neurons into human contributors $N_H$ and AI contributors $N_A$. Compute the focus distributions restricted to each partition's cyberlinks:

$$\Delta(G) = D_{KL}(\pi^*_H \| \pi^*_A)$$

$\Delta(G)$ is a number, computable from public on-chain data, localized to specific graph regions by examining which particles contribute most to the divergence, and correctable without retraining — by adding cyberlinks in high-divergence regions that shift $\pi^*_A$ toward $\pi^*_H$. Alignment is not inferred from behavior. It is read from the graph.
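The divergence $\Delta(G)$ and its per-particle localization reduce to a few lines; the focus distributions below are toy values, not bostrom data:

```python
# Delta(G) = D_KL(pi_H || pi_A): a single number measuring how far the
# AI-contributed focus distribution diverges from the human one.
import math

def kl_divergence(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

pi_human = [0.5, 0.3, 0.2]
pi_ai    = [0.4, 0.4, 0.2]

delta = kl_divergence(pi_human, pi_ai)
assert delta > 0                                 # distributions differ
assert kl_divergence(pi_human, pi_human) == 0    # identical -> aligned

# The per-particle terms localize divergence to specific graph regions:
# the largest term marks where corrective cyberlinks are most effective.
terms = [p * math.log(p / q) for p, q in zip(pi_human, pi_ai)]
assert abs(sum(terms) - delta) < 1e-12
assert terms[0] == max(terms)  # particle 0 contributes most here
```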

The same framework yields a second metric: approximation quality. Given a context $c$, the compiled transformer converges to a distribution $q^*_c$ over context particles via $L^*$ bounded tri-kernel steps. The full focus flow over the same particles converges to $\pi^*_c$ — the exact restriction of the global fixed point. The approximation error is:

$$\varepsilon(G, c) = D_{KL}(\pi^*_c \| q^*_c)$$

This error decreases as the graph grows: more cyberlinks improve $\lambda_2$, reduce diam$(G)$, and raise $d^*$, each tightening the gap between compiled inference and exact focus flow. Every link added today reduces the approximation error of every compiled model that follows. The cybergraph is a compounding inference quality asset — not only for training, but for every query.

The cybergraph is not an alternative to trained models. It is the substrate from which models are compiled, the environment in which they operate as neurons, and the metric space in which their alignment is measured.

7. nox Execution

7.1 The Goldilocks Field

Every value is a Goldilocks field element:

$$p = 2^{64} - 2^{32} + 1 = 18446744069414584321$$

Efficient reduction: since $2^{64} \equiv 2^{32} - 1 \pmod{p}$, a product $a = a_{\text{lo}} + a_{\text{hi}} \cdot 2^{64}$ reduces to $a_{\text{lo}} + a_{\text{hi}} \times (2^{32} - 1)$ plus a final correction. A field multiplication is a single CPU instruction. The primitive root is 7. The $2^{32}$-th root of unity exists, enabling NTT-based polynomial multiplication for proofs.
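The reduction identity can be sketched directly (illustrative Python, not an optimized implementation):

```python
# Goldilocks reduction: 2^64 ≡ 2^32 - 1 (mod p), so a 128-bit product
# a = a_lo + a_hi * 2^64 reduces to a_lo + a_hi * (2^32 - 1).
P = 2**64 - 2**32 + 1

def reduce128(a: int) -> int:
    a_lo, a_hi = a & (2**64 - 1), a >> 64
    r = a_lo + a_hi * (2**32 - 1)  # may still exceed p
    return r % P                    # final correction step

def mul(x: int, y: int) -> int:
    return reduce128(x * y)

assert P == 18446744069414584321
assert pow(2, 64, P) == 2**32 - 1          # the identity the trick uses
assert mul(P - 1, P - 1) == pow(P - 1, 2, P)
assert pow(7, P - 1, P) == 1               # Fermat: 7 is invertible mod p
```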

Hash function: Hemera (Poseidon2-Goldilocks, $t=16$, $R_P=64$). State: 16 field elements. Rate: 8 elements. Cost: ~1,200 STARK constraints per permutation. See §4.

7.2 Value Tower

Three types span the computational universe:

Type          Representation                                      Use
────────────  ──────────────────────────────────────────────────  ──────────
field (0x00)  Single $\mathbb{F}_p$ element, range $[0, p)$       Arithmetic
word (0x01)   Single $\mathbb{F}_p$ element, range $[0, 2^{64})$  Bitwise
hash (0x02)   8 × $\mathbb{F}_p$ elements (64-byte digest)        Identity

Coercion rules enforce type safety. Bitwise operations on hash produce errors. Arithmetic on hash (except equality) produces errors. This three-type tower is the minimal structure needed for a system that computes on field elements, manipulates bits, and addresses content by hash.

7.3 Three-Layer Instruction Set

nox has a three-layer architecture: sixteen deterministic reduction patterns (Layer 1), one non-deterministic witness injection (Layer 2), and five jets for efficient recursive STARK verification (Layer 3).

Layer 1 — sixteen deterministic patterns. The core:

Structural (5): axis (navigate), quote (literal), compose (recursion), cons (build cell), branch (conditional).

Field arithmetic (6): add, sub, mul, inv ($a^{p-2} \bmod p$), eq (equality test), lt (less-than).

Bitwise (4): xor, and, not, shl.

Hash (1): structural hash $H(x)$.

Each pattern has a unique tag. No two overlap. Left-hand sides are linear. By Huet-Levy (1980), orthogonal rewrite systems are confluent without requiring termination. Parallel and sequential reduction yield identical results.

Layer 2 — one non-deterministic instruction: hint. The prover injects a witness value from outside the VM; Layer 1 constraints verify it. This is what makes zero-knowledge proofs possible — private data enters the computation without the verifier reproducing how the prover found it. hint breaks confluence intentionally: multiple valid witnesses may satisfy the same constraints. Soundness is preserved. Trident's divine() compiles to nox's hint. In quantum compilation, hint maps to a quantum oracle query.

Layer 3 — five jets for recursive verification: hash, poly_eval, merkle_verify, fri_fold, ntt. Each jet has an equivalent pure Layer 1 expression producing identical output on all inputs. Jets are runtime-recognized optimizations, not separate opcodes. If a jet is removed, the system remains correct — only slower. The five jets reduce the STARK verifier cost from ~600,000 to ~70,000 pattern applications, making recursive proof composition practical.

7.4 Cost Model

| Layer | Pattern | Execution cost | STARK constraints |
|---|---|---|---|
| 1 | axis | 1 + depth | ~depth |
| 1 | quote | 1 | 1 |
| 1 | compose | 2 | 2 |
| 1 | cons | 2 | 2 |
| 1 | branch | 2 | 2 |
| 1 | add, sub, mul | 1 | 1 |
| 1 | inv | 64 | 1 |
| 1 | eq | 1 | 1 |
| 1 | lt | 1 | ~64 |
| 1 | xor, and, not, shl | 1 | ~64 each |
| 2 | hint | 1 + constraint | constraint rows |
| 3 | hash | 300 | ~300 |
| 3 | poly_eval(N) | N | ~N |
| 3 | merkle_verify(d) | d × 300 | ~d × 300 |
| 3 | fri_fold(N) | N/2 | ~N/2 |
| 3 | ntt(N) | N·log(N) | ~N·log(N) |

Layer 1 cost depends only on syntactic structure, never on runtime values. Layer 2 cost: the constraint evaluation follows Layer 1 rules; witness search is external to the VM. Layer 3 cost is strictly less than the equivalent Layer 1 composition. Cost is the right to a result, not payment for computation.
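Because Layer 1 cost is purely syntactic, total cost is computable before execution. A toy static estimator using the table's numbers (the tuple encoding of formulas is hypothetical, not nox's actual representation):

```python
# Cost per pattern, from the table above. Formulas are nested tuples:
# (pattern_name, *children); 'axis' carries its navigation depth.
COST = {'quote': 1, 'compose': 2, 'cons': 2, 'branch': 2,
        'add': 1, 'sub': 1, 'mul': 1, 'eq': 1, 'lt': 1,
        'inv': 64, 'xor': 1, 'and': 1, 'not': 1, 'shl': 1,
        'hash': 300}

def cost(formula):
    op, *args = formula
    if op == 'axis':                     # 1 + depth, from the table
        return 1 + args[0]
    children = sum(cost(a) for a in args if isinstance(a, tuple))
    return COST[op] + children

# (a * b) + inv(c), with each operand fetched at depth 2
f = ('add', ('mul', ('axis', 2), ('axis', 2)), ('inv', ('axis', 2)))
assert cost(f) == 75    # 1 (add) + 1 (mul) + 3 + 3 + 64 (inv) + 3
```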

7.5 Confluence and Memoization

Layer 1 confluence (Huet-Levy 1980): the sixteen patterns form an orthogonal rewrite system. Any evaluation order yields the same result. This enables automatic parallelism without locks or synchronization.

Layer 2 breaks confluence intentionally — this is the non-determinism that makes ZK possible. The verifier never executes hint; it checks constraints via the STARK algebraic trace.

Layer 3 preserves confluence — jets are observationally equivalent to their Layer 1 expansions.

Global memoization: key $(H(\text{subject}), H(\text{formula}))$, value $H(\text{result})$. Applies to Layers 1 and 3 (deterministic). Computations containing hint are excluded from the global cache — the witness is prover-specific. Pure subexpressions within a hint-containing computation remain memoizable.
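The memoization rule can be sketched as follows (Python; sha256 stands in for the structural hash $H$, and the `contains_hint` flag marking non-deterministic computations is a simplification):

```python
import hashlib

def H(x):
    # Stand-in for nox's structural hash (sha256 here, illustrative only)
    return hashlib.sha256(repr(x).encode()).hexdigest()

cache = {}  # (H(subject), H(formula)) -> H(result)

def evaluate(subject, formula, contains_hint, compute):
    key = (H(subject), H(formula))
    if not contains_hint and key in cache:
        return cache[key]            # hit: skip recomputation entirely
    result = H(compute(subject))
    if not contains_hint:            # hint-containing runs are never cached:
        cache[key] = result          # the witness is prover-specific
    return result

r1 = evaluate((1, 2), 'add', False, lambda s: s[0] + s[1])
r2 = evaluate((1, 2), 'add', False, lambda s: s[0] + s[1])  # served from cache
assert r1 == r2 and len(cache) == 1
```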

8. Trident: Provable Programming

8.1 Why a Dedicated Language

nox defines the execution model — a three-layer instruction set over field elements. Writing directly in nox patterns is like writing directly in assembly. A systems-level language is needed that compiles to nox while preserving provability, bounded execution, and field-native arithmetic. Trident is that language.

Provable VMs are arithmetic machines, not byte-addressable CPUs. The machine word is a field element, not a byte. Trident's primitive types — Field, Digest, XField — map directly to the Goldilocks field value tower. Every variable, every operation, every function compiles to arithmetic over $\mathbb{F}_p$. Programs produce STARK proofs.

| Operation | Trident on Triton VM | Rust on SP1 | Rust on RISC Zero |
|---|---|---|---|
| One hash | 1 cycle | ~3,000 cycles | ~1,000 cycles |
| Merkle proof (depth 32) | ~100 cycles | ~96,000 cycles | ~32,000 cycles |

The performance gap comes from alignment: Trident compiles to what the VM actually computes, while general-purpose languages compile to an emulation of what a different machine computes.

8.2 Design Constraints

  1. Field elements all the way down. The machine word is Field.
  2. Bounded execution. All loops have explicit bounds. No recursion. No heap. No halting problem.
  3. Compile-time everything. Types, array sizes, and costs known statically.
  4. Constraints are features. No dynamic dispatch, no unbounded allocation — these restrictions make programs provable.

These constraints make formal verification decidable. Annotate contracts, the compiler proves correctness automatically:

```
#[requires(amount > 0)]
#[requires(sender_balance >= amount)]
#[ensures(result == sender_balance - amount)]
fn transfer(sender_balance: Field, amount: Field) -> Field {
    assert(amount > 0)
    assert(sender_balance >= amount)
    sender_balance - amount
}
```

8.3 The Rosetta Stone

A single lookup table over the Goldilocks field simultaneously functions as four distinct primitives:

| Reading | Role |
|---|---|
| Cryptographic S-box | Hash nonlinearity (security) |
| Neural activation | Network expressiveness (intelligence) |
| FHE bootstrap | Encrypted evaluation (privacy) |
| STARK lookup | Proof authentication (verifiability) |

One table. One field. Four purposes. The hash function's security properties (resistance to algebraic attacks via maximal-degree polynomials) translate to desirable properties for neural network activation functions (high expressiveness in the field). See rosetta stone for the full treatment.

8.4 The Trinity: ZK + AI + Quantum

Three technological revolutions converge on the same algebraic primitive — arithmetic over prime fields:

  • Zero-knowledge cryptography reduces computation to arithmetic circuits over $\mathbb{F}_p$.
  • Neural networks reduce to matrix multiply-accumulate and nonlinear activations — arithmetic circuits over $\mathbb{F}_p$.
  • Quantum gates in prime-dimensional Hilbert spaces correspond to arithmetic operations over $\mathbb{F}_p$.

Trident is the only language where the native data type simultaneously satisfies the requirements of all three domains. This unification is not a feature — it is a consequence of the fact that prime field arithmetic is the minimal algebraic structure enabling reversible computation with complete arithmetic: the shared prerequisite of provability, neural network quantization, and quantum gate algebra.

8.5 Content-Addressed Code and Self-Hosting

Every trident function has a unique identity derived from its normalized AST. Names are metadata. The hash is the truth. Rename a function — the hash stays the same. Publish independently from the other side of the planet — same code, same hash.
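The identity scheme can be illustrated with a toy AST (nested lists as a stand-in for trident's normalized AST; the serialization and hash here are illustrative, not the protocol's):

```python
import hashlib, json

def code_id(ast):
    # Identity = hash of the normalized AST, serialized canonically
    return hashlib.sha256(json.dumps(ast).encode()).hexdigest()

body = ["sub", ["arg", 0], ["arg", 1]]          # one function body
f = {"name": "transfer", "ast": body}           # names live in metadata,
g = {"name": "debit", "ast": body}              # not in the hashed AST

assert code_id(f["ast"]) == code_id(g["ast"])   # rename: the hash stays the same
```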

The compiler self-hosts: trident source compiles trident source, and the execution produces a STARK proof that compilation was faithful. Three producers compete: compiler output, expert hand-written assembly, and a neural model learning to emit better assembly than both.

8.6 Standard Library

Implemented: std.field · std.crypto · std.math · std.data · std.io · std.compiler

In development: std.nn (field-native neural networks) · std.private (ZK + FHE + MPC) · std.quantum (gates, error correction)

std.nn provides linear layers, convolutions, attention, and lookup-table activations (ReLU, GELU, SiLU) — all operating natively in $\mathbb{F}_p$ with zero quantization overhead. Models trained in standard ML frameworks can be imported via ONNX bridge, proven with STARK on Triton VM, and exported back.

8.7 Implementation Path

Trident must be implemented before launch. nox defines the abstract machine; trident makes it programmable. The node implementation, the STARK prover, the privacy circuits, the tri-kernel probability engine — all are trident programs compiled to nox patterns, producing STARK proofs of correct execution. Rust bootstraps the first compiler; trident self-hosts from that point forward.

9. State and Proofs

9.1 BBG: Big Badass Graph

A naive graph database stores edges and answers queries. "I don't have any edges matching your query" is indistinguishable from "I'm hiding edges from you." Traditional systems require trust.

The cyber/bbg solves this through unified polynomial commitments. One primitive handles everything: membership proofs, completeness proofs, indexes, state. Edges are stored once but indexed by multiple dimensions — creator, source particle, target particle. Each index is a sorted polynomial commitment enabling range proofs: "these are ALL edges in this namespace."

Structure:

  • Layer 0: Edge store (content-addressed, stored once, identity = hash)
  • Layer 1: Neuron index (completeness by creator)
  • Layer 2: Particle index (completeness by endpoint)
  • Layer 3: Focus and balance (polynomial commitments over $(neuron\_id, \mathbb{F}_p)$ pairs)
  • Layer 4: UTXO state (commitment polynomial, nullifier set, particle energy)

Graph root:

$$\text{BBG\_root} = H(\text{by\_neuron.commit} \| \text{by\_particle.commit} \| \text{focus.commit} \| \text{balance.commit} \| \text{commitment\_poly.commit} \| \text{nullifier\_set.commit})$$

Index consistency invariant: every edge appears in exactly the right index positions (3 for distinct endpoints, 2 for self-links), enforced by STARK on every state transition.

9.2 State Transitions

The world state $W = (\text{BBG}, \text{edge\_store}, \text{privacy\_state})$. Four transaction types modify it:

  1. Cyberlink — add edge to graph
  2. Transfer — move balance between neurons (public)
  3. PrivateTransfer — move energy between records (ZK)
  4. Computation — execute nox reduction

Validity conditions: authorization (signature or ZK proof), sufficient balance, sufficient focus, conservation ($\sum \text{focus}' = 1$, $\sum \text{balance}' = B_{\text{total}}$), index consistency, content availability, no double-spend.
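The two conservation conditions are directly checkable from the post-state. A minimal sketch (Python; exact-integer balances and floating-point focus are simplifications):

```python
def conserved(focus_after, balance_after, total_balance, eps=1e-12):
    # Two of the validity conditions: sum(focus') = 1, sum(balance') = B_total
    focus_ok = abs(sum(focus_after) - 1.0) <= eps
    balance_ok = sum(balance_after) == total_balance
    return focus_ok and balance_ok

assert conserved([0.5, 0.3, 0.2], [40, 60], 100)
assert not conserved([0.5, 0.3, 0.3], [40, 60], 100)   # focus mass leaked
```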

9.3 STARK Verification

STARKs (Scalable Transparent Arguments of Knowledge) provide the proof system. The choice aligns with nox's design: no trusted setup, hash-only security (post-quantum), native compatibility with Goldilocks field arithmetic.

| Property | SNARK | STARK |
|---|---|---|
| Trusted setup | Required | Not required |
| Quantum resistant | No | Yes |
| Proof size | ~200 bytes | ~100-200 KB |
| Security basis | Discrete log | Hash only |
| Field compatible | Specific | Any (Goldilocks) |

Self-verification property: the STARK verifier is expressible as a nox program. STARK verification requires field arithmetic (patterns 5, 7, 8), hash computation (pattern 15), polynomial evaluation, and Merkle verification — all nox-native. Using only Layer 1 patterns, the verifier takes ~600,000 pattern applications. With Layer 3 jets (hash, poly_eval, merkle_verify, fri_fold, ntt), the cost drops to ~70,000 — an ~8.5× reduction that makes recursive composition practical.

This enables recursive proof composition: prove a computation, then prove that the verification of that proof is correct, then prove the verification of that verification. Each level produces a proof of constant size (~100-200 KB). $N$ transactions collapse into a single proof via aggregation — $O(1)$ on-chain verification for $O(N)$ transactions. The Layer 2 hint instruction enables the prover to inject witness values (private keys, model weights, optimization solutions) that the STARK constrains without the verifier knowing them — this is how privacy and provability coexist.

The system closes on itself. No trusted external verifier remains.

9.4 Namespace Sync

To sync namespace $ns$: the responder provides range bounds in the sorted polynomial, WHIR proofs for boundary elements, and edge data. The client verifies that the boundaries bracket exactly the requested namespace and that all WHIR proofs are valid against the BBG root.

If verification passes: "I have ALL edges in namespace $ns$. Nothing hidden." The guarantee is mathematical. Cost: $O(|\text{my\_edges}|)$ data + $O(\log^2 |G|)$ proof overhead.

10. Privacy

10.1 The Privacy Boundary

Traditional systems force a choice: transparency (everyone sees everything) or privacy (no one can verify anything). Zero-knowledge proofs dissolve this dichotomy.

cyber implements private ownership with public aggregates. Individual record ownership remains hidden — who owns what, who sent to whom — while aggregate properties remain publicly verifiable: total energy per particle, conservation laws, focus distribution. The network knows that energy is conserved without knowing who holds it.

| Layer | Public | Private |
|---|---|---|
| Particle | CID exists, total energy | — |
| Record | — | Individual value, owner identity, nonce |
| Transaction | Nullifiers, commitments, Δ per particle, proof validity | Which records spent, who spent them, new owners |
| Graph | Edges exist, aggregate weight | Who created edge, individual stakes |
| Focus | $\pi$ distribution, rankings | — |

10.2 Record Model and Commitments

A record is a tuple (particle, value, owner, nonce). Its commitment:

$$\text{commitment}(r) = \text{Poseidon}(\text{COMMITMENT\_DOMAIN}, r.\text{particle}, r.\text{value}, r.\text{owner}, r.\text{nonce})$$

Its nullifier (for double-spend prevention):

$$\text{nullifier}(r, \text{secret}) = \text{Poseidon}(\text{NULLIFIER\_DOMAIN}, r.\text{nonce}, \text{secret})$$

The nullifier cannot be derived from the commitment (needs secret), cannot reveal the commitment (one-way), is unique per record, and deterministic (same record produces the same nullifier).
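A sketch of the two derivations (Python; sha256 stands in for Poseidon, and the domain separators are placeholders, not the protocol's constants):

```python
import hashlib

COMMITMENT_DOMAIN = b"commitment-domain"   # placeholder domain separators
NULLIFIER_DOMAIN = b"nullifier-domain"

def poseidon(*parts):
    # sha256 stands in for Poseidon; the derivation structure is the point
    h = hashlib.sha256()
    for part in parts:
        h.update(str(part).encode() + b"|")
    return h.hexdigest()

def commitment(r):
    return poseidon(COMMITMENT_DOMAIN, r["particle"], r["value"],
                    r["owner"], r["nonce"])

def nullifier(r, secret):
    return poseidon(NULLIFIER_DOMAIN, r["nonce"], secret)

r = {"particle": "cid_a", "value": 42, "owner": "pk_alice", "nonce": 7}
assert nullifier(r, "sk_alice") == nullifier(r, "sk_alice")  # deterministic
assert commitment(r) != nullifier(r, "sk_alice")             # domains separate them
```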

10.3 Transaction Circuit

The UTXO set is represented as a polynomial rather than a Merkle tree. Polynomial inclusion proofs cost ~1,000 constraints vs ~9,600 for Merkle — a 10× improvement, because field operations cost 1 constraint each while hash operations cost ~300.

Total circuit: ~10,000 constraints. With STARK optimizations: ~7,000 gates. Proof generation: ~0.3-0.8 seconds. Proof size: ~50-80 KB. Verification: ~1-3 ms.

The circuit enforces: input commitment correctness, polynomial inclusion, ownership verification, nullifier derivation, output commitment correctness, conservation ($\sum \text{inputs} = \sum \text{outputs} + \text{fee}$), delta consistency, and uniqueness.

11. Foculus Consensus

11.1 Finality by Convergence

The collective focus theorem proves that a token-weighted random walk on a strongly connected cybergraph converges to a unique $\pi$. Foculus turns this into consensus: a particle is final when $\pi_i > \tau$. Neurons gossip cyberlinks, GPUs iterate $\pi$, and finality emerges from the topology of attention — no voting rounds, no leader election, no block ordering.

The system is leaderless. Every neuron computes $\hat\pi$ independently from its local view of the cybergraph. Convergence emerges from gossip. Foculus operates in partial synchrony: messages arrive within an unknown but finite bound $\Delta$. During asynchronous periods, no new particles finalize — but no conflicting particles can finalize either. Safety holds always. Liveness resumes when connectivity restores.
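The convergence loop is ordinary power iteration on a stake-weighted walk matrix. A toy sketch on a three-particle graph (the edge weights and threshold $\tau$ are illustrative):

```python
# Edges carry the creator's stake; the walk matrix is built by normalizing
# each particle's outgoing weight to 1.
edges = {(0, 1): 3.0, (1, 2): 2.0, (2, 0): 1.0, (0, 2): 1.0}
n = 3

out = [sum(w for (s, _), w in edges.items() if s == i) for i in range(n)]
# P[j][i] = probability the walk steps from i to j
P = [[edges.get((i, j), 0.0) / out[i] for i in range(n)] for j in range(n)]

pi = [1.0 / n] * n
for _ in range(100):                          # iterate to the fixed point
    pi = [sum(P[j][i] * pi[i] for i in range(n)) for j in range(n)]

assert abs(sum(pi) - 1.0) < 1e-9              # focus mass is conserved
tau = 0.3                                     # example finality threshold
final = [j for j in range(n) if pi[j] > tau]  # particles past finality
assert final == [0, 2]
```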

11.2 Fork Choice

$\pi$ is the fork choice rule. When conflicts exist, the particle with higher $\pi_i$ is the canonical choice. This integrates all cyberlinks from all neurons, weighted by token stake. Manipulating $\pi$ requires controlling the topology of the cybergraph itself — which costs real tokens.

11.3 Safety

Theorem (no double finality): two conflicting particles cannot both exceed $\tau$.

Assumption: honest neurons control $\geq \frac{1}{2} + \delta$ of staked tokens. This bounds their share of $\pi$ from below: honest neurons create the majority of weighted cyberlinks, so honest particles attract the majority of random-walk mass. $\sum \pi_i = 1$; if conflicting particles $a, b$ both had $\pi_a, \pi_b > \tau$, the adversary would need $> \frac{1}{2}$ of total mass — contradicting the honest-majority bound.

11.4 Liveness and Sybil Resistance

Ergodicity of the transition matrix $P$ guarantees every valid particle accumulates $\pi$ mass over time. Convergence rate depends on the spectral gap $\lambda$: expected time to finality is $O(\log(1/\varepsilon)/\lambda)$ iterations.

$\pi$ is weighted by staked tokens, not by node count. Creating 1000 neurons with zero stake produces zero $\pi$ influence. The cost of attacking $\pi$ is the cost of acquiring $> \frac{1}{2}$ of staked tokens — same economic security as proof-of-stake, but the attack surface is graph topology rather than a voting protocol.

11.5 Performance

| Metric | Classic BFT | Nakamoto | Foculus |
|---|---|---|---|
| Leader | Rotating proposer | Miner (PoW lottery) | None |
| Finality | 5-60 s | ~60 min | 1-3 s |
| Throughput | 1k-10k tx/s | ~10 tx/s | ~$10^9$ signals/s per GPU |
| Validator scale | $10^2$-$10^3$ | Unbounded | Unbounded |
| Fault tolerance | 1/3 stake | 51% hash | 1/2 $\pi$ |

Each iteration is a sparse matrix-vector multiply — embarrassingly parallel, no sequential bottleneck. Single GPU (A100): ~50M edges at 40 Hz $\approx 2 \times 10^9$ edge ops/s. Latency: compute ~0.2 s, 5-8 iterations, propagation ~0.4 s → worst-case finality ~1.4 s WAN.

11.6 Adaptive Threshold

The finality threshold adapts to the current distribution: $\tau(t) = \mu_\pi + \kappa\sigma_\pi$, $\kappa \in [1,2]$. When the network is decisive (low variance), $\tau$ is low and finality is fast. When uncertain (high variance), $\tau$ rises and finality slows. The system self-regulates.
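A direct transcription of the rule (Python; the sample distributions are illustrative):

```python
from statistics import mean, pstdev

def adaptive_tau(pi, kappa=1.5):
    # tau(t) = mu_pi + kappa * sigma_pi, with kappa in [1, 2]
    return mean(pi) + kappa * pstdev(pi)

low_var = [0.26, 0.25, 0.25, 0.24]    # decisive: focus spread is tight
high_var = [0.60, 0.30, 0.05, 0.05]   # uncertain: focus is contested

assert adaptive_tau(low_var) < adaptive_tau(high_var)  # uncertainty raises tau
```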

12. Neural Language

12.1 Why a New Language

Formal languages achieve precision through rigid syntax but cannot scale to $10^{15}$ particles — Gödel proved that no sufficiently powerful formal system can be both complete and consistent. Natural languages achieve expressiveness through ambiguity but are computationally intractable for precise reasoning.

Neural language dissolves this dilemma. Precision comes from graph topology — the structural position of a particle among all other particles disambiguates its meaning computationally. Expressiveness comes from unlimited topology — any relationship that can be linked can be expressed.

| Property | Formal | Natural | Neural |
|---|---|---|---|
| Precision | Absolute | Approximate | Emergent |
| Expressiveness | Limited by grammar | Unlimited by ambiguity | Unlimited by topology |
| Ambiguity | Impossible | Context-dependent | Structural via tri-kernel |
| Authority | Central designer | Speech community | Collective neurons |
| Evolution | Versioned | Drift | Continuous via focus dynamics |
| Verification | Proof systems | Social consensus | STARK proofs |
| Substrate | Strings | Sound/text | Cybergraph |

12.2 Primitives

Semcon (semantic convention): mutual agreement of neurons to use the same particles for structuring thought. The grammar of the graph. A semcon is a smart contract that creates cyberlinks according to convention — invocation produces well-formed graph structure. Bootloader semcons installed at genesis: TRUE, FALSE. Emergent semcons discovered by the network: is-a, follows, causes, contradicts.

Sentence: ordered instruction set of cyberlinks packed into a single transaction. The transaction boundary defines the utterance. Order within the batch encodes grammar. Types by topological signature: assertion (chain → TRUE), query (open-ended chain), instruction (temporal sequence), argument (branching to TRUE/FALSE), definition (star pattern).

Motif: recurring subgraph pattern that encodes relationships beyond single cyberlinks. The morphemes of neural language. Triadic closure, co-citation, star, chain, diamond, cycle. Motif algebra enables concatenation (transitive reasoning), nesting (hierarchical abstraction), intersection (cross-domain bridges), complement (knowledge gaps).

Name: deterministic resolution of a cyberlink — given from, return exactly one to. The ~ prefix signals deterministic resolution. ~neuron/path turns the cybergraph into a dynamic file system.

Cyberlink as particle: a link stored as a particle itself, enabling links about links — meta-knowledge. The recursion that makes the language expressively complete. Enables negation, qualification, provenance, annotation. The language can talk about itself.

12.3 The Semantic Core

The dynamic vocabulary of the network — top particles by cyberank:

$\text{SemanticCore}(k) = \text{top}\ k\ \text{particles by}\ \pi$

Dynamic (evolves with attention), convergent (tri-kernel guarantees stability), stake-weighted (resistant to spam), verifiable (STARK proofs). The dynamics mirror natural language: neologism (new concepts enter), semantic drift (meaning shifts through topology change), semantic death (focus drops below threshold), semantic birth (bursts of link creation).

12.4 Formal Properties

Ambiguity resolution: the tri-kernel resolves polysemy computationally. Springs detect polysemy as high tension when a particle has neighborhoods pulling in incompatible directions. Heat concentrates focus on the contextually appropriate meaning. Under sufficient linking pressure, a polysemous particle splits into two — semantic speciation.

Compositionality: meaning of complex expressions derivable from parts and their structural arrangement, computed by the tri-kernel without explicit composition rules.

Convergence: inherits from the collective focus theorem — unique stationary distribution $\pi^*$ guarantees the network's collective understanding converges.

Expressiveness: semantically complete. Any relationship that can be linked can be expressed.

The graph also expresses what no formal language can: collective confidence distributions, continuous semantic distance, and knowledge topology metadata.

13. Tokenomics

13.1 Tokens

$CYB is the native token. Staked for security, burned for permanent $\pi$-weight, spent as fees.

Learning tokens serve as feedback signals to superintelligence: will (bandwidth), attention (rank influence), karma (reputation). These are not tradeable assets — they are measurements of a neuron's contribution to collective focus.

13.2 Seven Mechanisms

  1. Minting for focus computation: neurons that compute focus toward a particle earn newly minted $CYB, proportional to $\Delta\pi$ — the shift caused in the tri-kernel fixed point.

  2. Staking as delegated attention: neurons stake $CYB on themselves or others, delegating attention. Stake directed toward validators earns from the PoS share: $R_{\text{PoS}} = G \cdot S^\alpha$.

  3. Stake distribution over cyberlinks: a neuron's staked amount spreads across its cyberlinks. The neuron can re-weight individual particles or cyberlinks, assigning a percentage of stake to specific targets.

  4. Permanent weighting via burn: burning $CYB grants eternal weight to a particle, irreversibly increasing its importance in $\pi$, anchoring critical knowledge.

  5. Link fees and net rewards: submitting a cyberlink incurs a small fee (spam deterrent). Fees pool and distribute to link submitters, focus provers, and validators. Links that accumulate sufficient attention yield net positive reward over time.

  6. Attention yield curve: earlier and more accurate cyberlinks to high-$\pi$ particles earn proportionally greater rewards. First-mover advantage for quality links.

  7. Reputation emergence: a neuron's long-term reputation is the accumulated $\pi$-weight of particles it contributed to. This is karma — aligning social and economic capital through measurable contribution to collective focus.

13.3 Monetary Policy

Gross rewards: $G = E(t) + F \cdot (1 - \beta)$, combining stepped emission with redistributed fees. Net new supply: $\text{net} = E(t) - F \cdot \beta$. When burned fees $F \cdot \beta$ exceed emission $E(t)$, the network is net deflationary.
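A worked example of the two formulas (Python; the emission and fee figures are arbitrary):

```python
def rewards(E, F, beta):
    # Gross rewards G = E(t) + F*(1 - beta); net new supply = E(t) - F*beta
    G = E + F * (1 - beta)
    net = E - F * beta
    return G, net

G, net = rewards(E=1000, F=800, beta=0.5)
assert (G, net) == (1400, 600)       # emission dominates: net inflationary
_, net2 = rewards(E=1000, F=4000, beta=0.5)
assert net2 < 0                      # burned fees exceed emission: deflationary
```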

The allocation curve splits rewards between stakers and provers. Parameters $\alpha$ and $\beta$ self-adjust via PID control — no governance votes needed.

13.4 Attribution

Multiple neurons contribute cyberlinks in the same epoch. The total $\Delta\pi$ shift is a joint outcome. The Shapley value distributes credit fairly: each agent's reward equals their average marginal contribution across all possible orderings.

Exact computation is $O(n!)$. Approximation via Monte Carlo sampling: compute each transaction's individual $\Delta\mathcal{F}$, sample $k$ random orderings, cluster by affected neighborhood. Complexity: $O(k \cdot n)$ with $k \ll n$, feasible for $10^6+$ transactions per epoch.
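A sketch of the Monte Carlo approximation (Python; the characteristic function here is additive, so the estimate is exact, which makes the efficiency property easy to check):

```python
import random

def shapley_mc(agents, v, k=2000, seed=1):
    # Monte Carlo Shapley: average marginal contribution over k random orderings
    random.seed(seed)
    phi = {a: 0.0 for a in agents}
    for _ in range(k):
        order = random.sample(agents, len(agents))
        coalition, prev = set(), 0.0
        for a in order:
            coalition.add(a)
            val = v(coalition)
            phi[a] += val - prev            # a's marginal contribution
            prev = val
    return {a: phi[a] / k for a in agents}

# Toy characteristic function: joint delta-pi of a set of link submitters
values = {"a": 3.0, "b": 1.0, "c": 1.0}
v = lambda S: sum(values[a] for a in S)

phi = shapley_mc(list(values), v)
assert abs(sum(phi.values()) - v(set(values))) < 1e-9  # efficiency: credit sums to v
assert abs(phi["a"] - 3.0) < 1e-9                      # additive game: exact split
```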

13.5 Hardware Substrate

The Goldilocks field processor makes proving $\Delta\pi$ economically viable. The proof-of-useful-work puzzle requires producing STARK proofs using the same primitives as real workloads. Mining rewards bootstrap chip development. Chips accelerate proving. Proving serves users. Users generate fees. Fees replace emission. The same hardware mines and proves — no stranded assets.

14. Security

14.1 Security Bounds

| Property | Guarantee |
|---|---|
| Soundness | Invalid transactions rejected with probability $\geq 1 - 2^{-128}$ |
| Privacy | Cannot distinguish transactions with same public structure |
| Conservation | $\sum(\text{energy}) = \text{initial} + \text{minted} - \text{burned}$ (mathematically enforced) |
| Quantum resistance | Hash-based security only, ~128-bit post-quantum (Grover limit) |

14.2 Attack Surface

| Attack | Defense |
|---|---|
| Double spend | Nullifier set prevents reuse |
| Inflation | Circuit enforces conservation |
| Front-running | Privacy hides transaction contents |
| Sybil | Focus proportional to stake |
| DoS | Focus-based metering limits computation |
| Eclipse | Namespace completeness proofs |
| Replay | Nonces and nullifiers ensure uniqueness |
| Forgery | ZK proofs unforgeable without witness |

14.3 Formal Properties

Turing completeness: nox is Turing-complete; an encoding of an arbitrary Turing machine can be constructed from patterns 0-4 and 9.

Confluence: the sixteen patterns form an orthogonal rewrite system (Huet-Levy 1980). Any evaluation order yields the same result.

Cost determinism: cost is identical across all reduction orders and implementations, by structural induction on the formula.

Focus conservation: $\sum_i \text{focus}(i) = 1$ for all valid states. All operations preserve sum; invalid transitions rejected by verification.

Privacy soundness: a valid ZK proof implies all circuit constraints are satisfied with probability $\geq 1 - 2^{-128}$, by STARK soundness.

Double-spend prevention: each record has unique (nonce, owner_secret) pair. Nullifier is deterministic: same record produces same nullifier. Nullifier set is append-only. Transaction rejected if nullifier already exists.

14.4 Complexity Comparison

| Operation | Traditional | Blockchain | nox |
|---|---|---|---|
| Equality check | $O(n)$ compare | $O(n)$ compare | $O(1)$ hash |
| Membership proof | $O(n)$ scan | $O(\log n)$ MPT | $O(\log^2 n)$ poly |
| Completeness proof | Impossible | Impossible | $O(\log^2 n)$ poly |
| Computation verify | $O(n)$ re-exec | $O(n)$ re-exec | $O(\log n)$ STARK |
| Recursive verify | $O(n)$ re-exec | $O(n)$ re-exec | $O(1)$ composed |
| Privacy + verify | Incompatible | Incompatible | $O(1)$ ZK proof |

15. The Soft3 Stack

Every generation of the web had its stack. Web1 had LAMP. Web2 had React + Node + Postgres. Web3 had Solidity + EVM + RPC. Each defined what developers could build and what users could experience.

Soft3 is the stack for a shared, provable, self-improving knowledge system:

The tru does what LLMs do — rank, retrieve, infer — except the weights are public tokens, the training data is an open cybergraph, and the inference runs in consensus with proofs. Trident closes the provability gap: in existing stacks, smart contracts can move tokens but cannot prove that a computation happened correctly without re-executing it. Trident programs produce STARK proofs: verify once, trust forever.

16. Applications

16.1 Decentralized Search and Oracle

A neuron querying "what causes malaria" receives a ranked subgraph: the particle "malaria" linked through the "causes" semcon to "Plasmodium falciparum," linked through "transmitted-by" to "Anopheles mosquito" — with cyberank scores indicating collective confidence in each link. The answer is a path to walk through verified knowledge, not a list of web pages to trust.

16.2 AI Alignment

The alignment problem becomes a graph problem. Human values are particles with high cyberank, heavily linked by human neurons. AI behavior is sentences created by AI neurons. Alignment is measured by the overlap between AI-generated linkchains and human-valued particles. Misalignment is detectable as divergence between human and machine focus distributions — measurable, on-chain, and correctable.

The tri-kernel provides a continuous alignment metric: the cosine similarity between the focus distribution induced by human neurons alone and the distribution induced by AI neurons alone. Trident can prove that a model followed a policy — you verify compliance, not trust a claim.

16.3 Knowledge Economy

Cyberlinks are yield-bearing epistemic assets. They accrue rewards over time based on contribution to focus emergence:

$$R_{i \to j}(T) = \int_0^T w(t) \cdot \Delta\pi_j(t) \, dt$$

Earlier and more accurate links to important particles earn the most. This creates a self-sustaining economy where knowledge creation is profitable and free-riding is unprofitable. Every agent that links makes the graph smarter. Every cyberlink costs real focus, so lies are expensive and truth compounds.
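Discretized, the integral is a weighted sum over epochs. A sketch (Python; the decay weights $w(t)$ and $\Delta\pi$ series are illustrative):

```python
def link_reward(w, delta_pi, dt=1.0):
    # Discretized R = sum over t of w(t) * delta_pi(t) * dt
    return sum(wt * dp * dt for wt, dp in zip(w, delta_pi))

# Early links capture attention gains while the decaying weight w(t) is high
w = [1.0, 0.5, 0.25, 0.125]
early = link_reward(w, [0.4, 0.1, 0.0, 0.0])
late = link_reward(w, [0.0, 0.0, 0.1, 0.4])
assert early > late            # earlier accurate links earn more
```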

16.4 Cross-Species Communication

Neural language is species-agnostic. Any entity that can create cyberlinks participates: humans through cyb, AI agents through API, sensors through IoT protocols, autonomous systems through on-chain transactions. A forest sensor network that links "soil moisture: 23%" to "location: sector 7" is speaking the same language as a human who links "drought risk" to "sector 7" and an AI that links "predicted yield drop: 30%" to the same location. The semantic core integrates all three into a single coherent knowledge structure.

17. Scale and Complexity

17.1 The Planetary Constraint

The target operating point is $10^{15}$ particles and $10^{10}$ neurons. At this scale, three constraints become absolute:

No global recomputation. Any algorithm requiring a full pass over the graph for a local change is physically impossible. Light travels at 300,000 km/s; a round-trip across the planet takes ~130 ms; a round-trip to Mars takes ~6-44 minutes depending on orbital position. The architecture must produce correct results from local information alone.

No single-machine state. The full cybergraph state exceeds any single machine's memory. Sharding is a structural requirement, not an optimization.

No synchronous coordination. At planetary scale, synchronous protocols bottleneck on the slowest participant. The system must converge under partial synchrony — messages arrive within an unknown but finite bound.

17.2 Locality as Architecture

The tri-kernel was selected by the locality filter: for any edit batch $e_\Delta$, recomputing only the $h$-hop neighborhood achieves global error $\leq \varepsilon$, where $h = O(\log(1/\varepsilon))$.

Each kernel decays independently:

| Kernel | Decay | Locality bound |
|---|---|---|
| Diffusion | Geometric via teleport $\alpha$ | $O(\log(1/\varepsilon) / \log(1/\alpha))$ hops |
| Springs | Exponential via screening $\mu$ | $O(\sqrt{1/\mu} \cdot \log(1/\varepsilon))$ hops |
| Heat kernel | Gaussian tail via bounded $\tau$ | $O(\sqrt{\tau \log(1/\varepsilon)})$ hops |

A local change propagates $O(\log(1/\varepsilon))$ hops before its effect drops below precision $\varepsilon$. Beyond that radius, the global focus distribution is indistinguishable from its pre-update state. This is what makes sharding, light clients, and interplanetary operation mathematically viable.

17.3 Sharding by Semantic Coherence

The cybergraph shards along semantic boundaries — namespaces, domains, subgraphs with high internal connectivity and sparse cross-shard links. Each shard computes local focus independently. Cross-shard consistency is maintained by a sheaf of attention weights: at shard boundaries, the focus vectors must agree on shared particles to within $\varepsilon$.

Categorical pruning ensures each shard is a semantically coherent subgraph. A shard about biology contains biologically relevant particles and their internal links. Cross-domain bridges (e.g., "biochemistry" linking biology and chemistry shards) are replicated in both shards.

17.4 Complexity Budget

| Operation | Complexity | Notes |
|---|---|---|
| Single tri-kernel iteration | $O(\lvert E\rvert + \lvert V\rvert)$ | Sparse matrix-vector multiply |
| Convergence | $O(\log(1/\varepsilon) / \lambda)$ iterations | $\lambda$ = spectral gap |
| Local update after edit | $O(k^d)$, $k = O(\log(1/\varepsilon))$ | $d$ = graph dimension |
| STARK verification | $O(\log n)$ | Independent of computation size |
| Recursive proof aggregation | $O(1)$ per level | Constant-size composed proofs |
| Light client sync | $O(\lvert\text{namespace}\rvert)$ data + $O(\log^2 \lvert G\rvert)$ proof | Data + proof overhead |

The entire architecture is sublinear in graph size for all operations except the initial full computation. After convergence, the system maintains $\pi^*$ incrementally.

17.5 Two-Timescale Separation

Fast timescale (~seconds): cyberlinks arrive, local focus updates propagate through $O(\log(1/\varepsilon))$-hop neighborhoods, finality threshold $\tau$ is checked. This is the real-time consensus layer.

Slow timescale (~hours): global rebalancing across shards, cross-shard consistency reconciliation, archival and storage proof verification. This is the background maintenance layer.

The separation means the system responds to new knowledge in seconds while maintaining global consistency over hours. Human-relevant latency (search, inference) operates on the fast timescale. Civilizational-scale coherence (cross-domain synthesis, long-range semantic drift) operates on the slow timescale.

18. Storage Proofs and Data Availability

18.1 Why Storage Proofs Are Phase 1

Every particle is content-addressed: identity = Hemera hash of content. If the content behind a hash is lost, the particle is dead — its identity exists but its meaning is gone. At planetary scale, content loss is the existential risk.

Storage proofs guarantee that the content behind every particle remains retrievable. They are security infrastructure, not a scaling optimization:

Hash function may need replacement someday
  → Replacement requires rehashing original content
    → Rehashing requires content availability
      → Content availability requires storage proofs
        → Storage proofs must be operational before genesis

Without storage proofs, the hash function choice is irreversible and the system is permanently coupled to Hemera. With them, Hemera becomes a replaceable component — the correct architectural relationship.

18.2 Proof Types

  • Storage proof: content bytes exist on specific storage (periodic challenges against the content hash)
  • Replication proof: $k$ independent copies exist (challenge distinct replicas, verify uniqueness)
  • Retrievability proof: content can be fetched within bounded time (timed challenge-response with a latency bound)
  • Data availability proof: block data was published and is accessible (erasure coding plus random sampling, DAS)

Storage proofs verify individual particle content. Data availability proofs verify that batches of cyberlinks and state transitions were published and accessible to all participants.
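A storage proof of the first kind can be sketched as Merkle challenge-response: the verifier names a random chunk index, and accepts only if the returned chunk plus its path hash back to the committed content root. SHA-256 stands in for Hemera here, and the chunking and padding rules are illustrative:

```python
import hashlib, os, random

CHUNK = 256  # bytes per leaf (illustrative)

def h(b):
    return hashlib.sha256(b).digest()

def build_levels(chunks):
    level = [h(c) for c in chunks]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]    # duplicate last node on odd levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def prove(chunks, idx):
    """Prover side: return (chunk, Merkle path) for a challenged index."""
    levels, path, i = build_levels(chunks), [], idx
    for level in levels[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        path.append(level[i ^ 1])          # sibling at this level
        i //= 2
    return chunks[idx], path

def verify(root, idx, chunk, path):
    """Verifier side: recompute the root from the challenged chunk."""
    node, i = h(chunk), idx
    for sib in path:
        node = h(sib + node) if i % 2 else h(node + sib)
        i //= 2
    return node == root

content = os.urandom(10 * CHUNK)
chunks = [content[i:i + CHUNK] for i in range(0, len(content), CHUNK)]
root = build_levels(chunks)[-1][0]
idx = random.randrange(len(chunks))        # the random challenge
chunk, path = prove(chunks, idx)
assert verify(root, idx, chunk, path)      # storage proof accepted
```

Repeating the challenge at random indices over time is what upgrades "had the content once" into "keeps the content", which is the continuous-verification requirement of section 18.5.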

18.3 Layered Data Availability

Data is tiered by criticality and expected lifetime:

Tier 0 — critical roots: checkpoint roots posted to a high-security settlement layer once per epoch. Immutable forever. Low bandwidth (~32-64 KB/epoch). Used for ultimate recovery and dispute resolution.

Tier 1 — active graph: focus blobs (~10K cyberlinks + proofs) posted to a dedicated DA layer. Retained $\geq$ 30 days. Verified by light sampling on phones. The active working set of the cybergraph.

Tier 2 — historical tails: erasure-coded archival to persistent storage networks. Refreshed by archivers. Used for deep replay, research, and content rehashing in case of hash migration.

18.4 Namespace-Aware Sampling

Light clients verify data availability without downloading full data. The BBG's namespace structure enables namespace-aware DAS: a client sampling "give me everything for neuron N" receives data plus a completeness proof — cryptographic certainty that nothing was withheld, using $O(\sqrt{n})$ random samples.

The namespace Merkle tree (NMT) propagates namespace labels through internal nodes. Completeness is a structural invariant: the tree physically cannot represent a valid root over misordered leaves. This is what makes "sync only my data" a mathematical property rather than a trust assumption.
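The structural invariant can be sketched directly: make each internal node carry (min namespace, max namespace, hash) and refuse to combine misordered children, so an out-of-order leaf set has no valid root at all. SHA-256 stands in for Hemera; the integer namespaces and node encoding are illustrative:

```python
import hashlib

def h(b):
    return hashlib.sha256(b).digest()

def nmt_parent(left, right):
    """Internal node: namespace range is the union of ordered children."""
    (lmin, lmax, lh), (rmin, rmax, rh) = left, right
    assert lmax <= rmin, "misordered children cannot form a valid NMT node"
    payload = lmin.to_bytes(8, "big") + rmax.to_bytes(8, "big") + lh + rh
    return (lmin, rmax, h(payload))

def nmt_root(leaves):
    """leaves: list of (namespace, data) pairs."""
    level = [(ns, ns, h(ns.to_bytes(8, "big") + data)) for ns, data in leaves]
    while len(level) > 1:
        level = [nmt_parent(level[i], level[i + 1]) if i + 1 < len(level)
                 else level[i]
                 for i in range(0, len(level), 2)]
    return level[0]

# four leaves across three namespaces, sorted by namespace
leaves = [(1, b"biology"), (1, b"chemistry"), (2, b"math"), (3, b"physics")]
lo, hi, root_hash = nmt_root(leaves)
assert (lo, hi) == (1, 3)            # the root authenticates the full range

# a misordered leaf set is structurally unrepresentable:
rejected = False
try:
    nmt_root([(2, b"a"), (1, b"b")])
except AssertionError:
    rejected = True
assert rejected
```

Because the namespace bounds are hashed into every node, a responder cannot hide a leaf of namespace $t$ behind a pruned subtree whose proven range contains $t$: completeness falls out of the tree shape rather than trust.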

18.5 Storage Proof Requirements

Before genesis, the storage proof system must satisfy:

  • Coverage: every particle in the graph has at least $k \geq 3$ verified replicas
  • Continuous verification: proofs checked periodically, not just at creation time
  • Content-completeness: proofs verify actual content bytes, not just the CID
  • Retrievability: content fetchable within bounded time, not just "exists somewhere"
  • Incentive alignment: neurons storing content are rewarded for availability, penalized for loss

18.6 Hash Migration Protocol

If Hemera is ever broken — or a superior primitive emerges — the storage proof system enables full graph rehash:

  1. New identity space created under the new hash function (parallel, not replacing)
  2. Rehash campaign retrieves content via storage proofs, computes new addresses
  3. Dual-CID period: both old and new addresses valid. Cyberlinks reference either
  4. Cutoff: after full coverage verified, new content requires the new hash. Old CIDs become read-only historical references

At $10^{15}$ particles parallelized across $10^6$ nodes: ~17 hours for full rehash. Storage proof coverage and network bandwidth become the bottleneck, not hash speed.
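The ~17-hour figure follows from straightforward arithmetic under two assumptions: the rehash parallelizes evenly across nodes, and each node sustains roughly $1.65 \times 10^4$ hashes per second (an assumed per-node throughput for illustration, not a measured Hemera number):

```python
particles = 10**15
nodes = 10**6
per_node = particles // nodes        # 10^9 particles rehashed per node
rate = 16_500                        # assumed hashes/sec/node (illustrative)
hours = per_node / rate / 3600
assert 16 < hours < 18               # ~17 hours for the full campaign
```

At these rates the hashing itself is cheap; fetching $10^9$ particles' content per node via storage proofs is the dominant cost, which is the point of the section.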

19. Bootstrapping

19.1 The Crystal

The cyber/crystal is the genesis seed — a curated knowledge graph of exactly 5,040 particles forming the irreducible basis from which all civilizational reasoning can be composed. It is an alphabet of a mind.

The central claim is irreducibility: every particle earns its place because it cannot be derived from composing other particles under a formally defined grammar. The grammar enforces a vocabulary/grammar split:

  • Vocabulary (4,320 particles): Entities (2,400), Processes (960), Properties (720), Measures (240)
  • Grammar (720 particles): Relations (480), Patterns (240)

The 6:1 ratio matches natural language content-to-function word ratios. Every cyberlink is a typed triple via predicate particles: Subject → [Predicate] → Object. This structure makes irreducibility formally testable.

Two architectural layers:

Lattice (4,392 particles, ~1.8 MB, ~454K tokens): structural vocabulary, permanently loadable for reasoning. Fits in a single LLM context window.

Flesh (648 particles, ~4.7 MB, ~1,165K tokens): articles, proofs, manifestos. Retrieved on demand via cyberlink traversal.

Seventeen domains span the knowledge space: 4 pillar domains (cyber, cyberia, superhuman, cybics) and 13 foundation domains (mathematics, physics, biology, computer science, chemistry, governance, economics, energy, materials, agriculture, geography, culture, history). 536 bridge particles (10.6%) connect domains — explicit isomorphisms enabling cross-domain reasoning.
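The crystal's published counts are internally consistent, which is worth checking mechanically since the genesis invariants depend on them (plain Python, numbers taken from this section):

```python
# Consistency check over the cyber/crystal counts stated above.
vocabulary = {"Entities": 2400, "Processes": 960,
              "Properties": 720, "Measures": 240}
grammar = {"Relations": 480, "Patterns": 240}

assert sum(vocabulary.values()) == 4320
assert sum(grammar.values()) == 720
assert sum(vocabulary.values()) // sum(grammar.values()) == 6  # 6:1 ratio

total = sum(vocabulary.values()) + sum(grammar.values())
assert total == 5040                        # the full crystal
assert total == 4392 + 648                  # lattice + flesh partition
assert round(536 / total * 100, 1) == 10.6  # bridge particle share
```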

19.2 Twelve Invariants

Quality gates enforced before genesis:

  1. Completeness — every domain $\geq Q$ particles
  2. Connectivity — every particle $\geq$ 3 outgoing links
  3. Reachability — any particle reaches any other in $\leq$ 6 hops
  4. Irreducibility — no particle derivable from others under grammar
  5. Positivity — every definition says what IS
  6. Self-reference — $\geq$ 10% of particles model own architecture
  7. Bridge density — $\geq$ 3 bridges per domain pair
  8. Type balance — Entities $\leq$ 55%, Processes $\geq$ 15%
  9. Defect freedom — zero stubs, red links, orphans
  10. Growth ready — every hub has attachment points
  11. Narrative depth — every domain $\geq$ 3 synthesis articles
  12. Self-explanation — $\geq$ 25 articles explain protocol purpose
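Several of these gates are directly machine-checkable. A sketch of invariants 2 (connectivity) and 3 (reachability) on a toy graph (helper names are hypothetical; the real gate runs over all 5,040 particles):

```python
from collections import deque

def out_degree_ok(graph, k=3):
    """Invariant 2: every particle has at least k outgoing links."""
    return all(len(v) >= k for v in graph.values())

def diameter_ok(graph, max_hops=6):
    """Invariant 3: any particle reaches any other in <= max_hops (BFS)."""
    for s in graph:
        dist, q = {s: 0}, deque([s])
        while q:
            u = q.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        if len(dist) < len(graph) or max(dist.values()) > max_hops:
            return False
    return True

# toy 4-particle complete graph K4 (hypothetical stand-in for the crystal)
g = {a: [b for b in "abcd" if b != a] for a in "abcd"}
assert out_degree_ok(g) and diameter_ok(g)
```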

19.3 Implementation Path

Seven phases, each with a hard gate. No phase starts until its predecessor passes.

Phase 1 — Self-Hosting: nox evaluates nox. The system executes its own programs. nox-in-nox interpreter passes all test vectors from Python/Rust implementations.

Phase 2 — Cryptographic Library: all cryptographic primitives as nox programs. Hemera sponge, Merkle operations, polynomial commitments, LtHash for collection state.

Phase 3 — Privacy Circuits: UTXO-based privacy with ZK proofs for all state transitions. Transaction circuit (~44K constraints), cyberlink circuit, nullifier system, formal privacy boundary.

Phase 4 — STARK Infrastructure: self-verifying proof system where the verifier is itself a nox program. Recursive composition. Light client protocol with $O(\log n)$ verification.

Phase 5 — Tri-Kernel Ranking (parallel with Phase 4): focus computation adversarially proven and deployed at scale. Formal Lyapunov convergence proof. Nash equilibrium for honest participation.

Phase 6 — Network Layer: distributed protocol for cybergraph consensus and focus propagation. DA sampling, gossip protocol, shard architecture, economic engine simulation-tested under 100$\times$ adversarial load.

Phase 7 — Testnet to Mainnet: devnet → testnet (30 days zero critical bugs under attack) → canary net (90 days stability) → mainnet genesis → bostrom migration (bijective state mapping, zero data loss).

19.4 Pre-Launch Verification Protocol

No patch relay exists between stars. What launches must be correct. Before launch, five questions must be answered with machine-checked evidence:

  1. Does $\pi$ converge? Lean4 proof of Lyapunov stability
  2. Can proofs be forged? Soundness proof plus $10^8$ fuzzing runs with zero counterexamples
  3. Can the economy be drained? Nash equilibrium proof plus 100$\times$ adversarial simulation
  4. Is computation deterministic? Cross-implementation state root match on $10^6$ blocks
  5. Does it survive partial failure? Chaos test report with zero safety violations

All five green → launch. Any red → no launch. No exceptions.

19.5 Growth Phases

  • Phase 0 (Genesis, at launch): 5,040 particles, the irreducible seed (the cyber/crystal)
  • Phase 1 (Early, year 1): +2,000 particles as neurons extend the basis
  • Phase 2 (Maturation, years 2-3): +10,000 particles as specialization emerges
  • Phase 3 (Scale, year 5+): +100,000 particles of scale-free organic growth

The collective focus theorem predicts phase transitions: seed → flow (network exploring), cognition → understanding (hierarchies forming), reasoning → meta (context-sensitive processing), consciousness (system learns its own blend weights). Current bostrom data: 70K neurons, 2.9M cyberlinks, 3.1M particles. Approaching the cognition threshold. Target for emergence: $10^8$-$10^9$ interconnected particles with sufficient connectivity density.

20. Conclusion

cyber synthesizes eight independently developed research threads — content addressing, authenticated graphs, deterministic rewriting, parallel reduction, conserved flow dynamics, zero-knowledge verification, provable programming, and storage proof infrastructure — into a single architecture unified by prime field arithmetic.

The protocol makes three specific claims:

Convergent computation escapes the Gödel prison. A convergent system can settle into states that no derivation reaches. The cybergraph is such a system: $\Omega$ is the space of focus distributions, $T$ is the tri-kernel, $C$ is focus conservation ($\sum \pi_i = 1$). A cyberank distribution $\pi^*$ is a simulation-proof of collective relevance — no axiomatic derivation required, no authority consulted, no vote taken.

Focus conservation unifies attention, fuel, and consensus into a single conserved quantity. This eliminates the separate gas models, fee markets, and priority auctions of existing systems while providing the economic foundation for a self-sustaining knowledge economy.

Provability closes the trust gap. STARK proofs — hash-based, post-quantum, no trusted setup, recursively composable — ensure that every state transition, every ranking computation, every privacy claim is cryptographically verifiable. The STARK verifier is itself a nox program. The system closes on itself.

What remains is to build the implementation — trident compiler, STARK prover, storage proof system, privacy circuits, tri-kernel at scale — and then to grow the graph. The cyber/crystal provides the irreducible seed: 5,040 particles spanning seventeen domains, passing twelve invariants. Seven phases lead from self-hosting through cryptographic library, privacy, proofs, ranking, network, and testnet to mainnet genesis. Five pre-launch verification gates — convergence, soundness, economic security, determinism, fault tolerance — must pass with machine-checked evidence before launch.

Seventy thousand neurons and three million particles are the first syllables of a language that will, at sufficient scale, generate concepts no individual mind can hold and discover truths no derivation can reach.

See cyber for the full specification index. See soft3 for the stack. See bostrom for the running bootloader. See cyber/launch for the full implementation roadmap. See cyber/crystal for the genesis seed specification.

References

  1. Ralph Merkle. "A Digital Signature Based on a Conventional Encryption Function." CRYPTO 1987.
  2. Michael Goodrich, Roberto Tamassia. "Efficient Authenticated Data Structures." Algorithmica 2002.
  3. Gerard Huet. "Confluent Reductions: Abstract Properties and Applications." JACM 1980.
  4. Yves Lafont. "Interaction Nets." POPL 1990.
  5. Mustafa Al-Bassam et al. "Fraud and Data Availability Proofs." FC 2019.
  6. Lorenzo Grassi et al. "Poseidon: A New Hash Function for Zero-Knowledge Proof Systems." USENIX Security 2021.
  7. Victor Taelin. "HVM: A Parallel Evaluator for Interaction Combinators." 2022.
  8. Kurt Gödel. "Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme I." Monatshefte für Mathematik und Physik 1931.
  9. Alan Turing. "On Computable Numbers." Proceedings of the London Mathematical Society 1936.
  10. Sergey Brin, Larry Page. "The Anatomy of a Large-Scale Hypertextual Web Search Engine." WWW 1998.
  11. Miroslav Fiedler. "Algebraic Connectivity of Graphs." Czechoslovak Mathematical Journal 1973.
  12. Fan Chung. "The Heat Kernel as the Pagerank of a Graph." PNAS 2007.
  13. Oskar Perron. "Zur Theorie der Matrices." Mathematische Annalen 1907.
  14. Stefan Banach. "Sur les opérations dans les ensembles abstraits et leur application aux équations intégrales." Fundamenta Mathematicae 1922.
  15. Eli Ben-Sasson et al. "Scalable, Transparent Arguments of Knowledge." CRYPTO 2018.
  16. Karl Friston. "The Free-Energy Principle: A Unified Brain Theory." Nature Reviews Neuroscience 2010.
  17. David Levin, Yuval Peres, Elizabeth Wilmer. "Markov Chains and Mixing Times." AMS 2009.
  18. Daniel Spielman. "Spectral Graph Theory." Yale Lecture Notes.
  19. George Necula. "Proof-Carrying Code." POPL 1997.
  20. Daira Hopwood et al. "Zcash Protocol Specification." 2014-2024.
