research/cybergraph model architecture.md

---
tags: cyber, research, article
crystal-type: pattern
crystal-domain: cyber
status: draft
---
# cybergraph as generative model

## abstract

the cybergraph is not a database that a model queries. it IS the model. the tri-kernel focus distribution $\phi^*$ is the "weights" — a stationary distribution over particles that encodes what the network collectively considers relevant. a query is a context vector that biases the focus field. the response is the equilibrium of the biased field. generation is iterative: each produced particle shifts the context, the field re-equilibrates, and the next particle emerges from the new $\phi^*$.

this is not a new design. the components are already specified: [[nox]] (VM), [[zheng]] (proofs), [[Hemera]] (hash), [[BBG]] (state), tri-kernel (focus computation), [[structural-sync]] (convergence). what follows describes how they compose into a generative model and what this model can do that transformers cannot. the stack itself is not yet built (see the honest assessment below).

## the tri-kernel IS the attention mechanism

transformers compute attention as:



$$\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d}}\right) V$$



the tri-kernel computes focus as the stationary distribution of three coupled operators on the cybergraph:



$$\phi^* = \text{fixed-point}(\alpha \cdot D + \beta \cdot S + \gamma \cdot H)$$



where:
- $D$ = diffusion kernel (random walk on the graph — structural proximity)
- $S$ = springs kernel (SpringRank — hierarchical ordering)
- $H$ = heat kernel ($e^{-tL}$ — multi-scale smoothing)
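a minimal sketch of the fixed-point computation, under assumptions the source does not state: each kernel is given as a column-stochastic matrix, the weights sum to 1, and the fixed point is found by plain power iteration. the names `blend` and `fixed_point` and the toy 3-particle matrices are hypothetical, chosen only for illustration.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def blend(kernels: List[Matrix], weights: List[float]) -> Matrix:
    """Weighted sum of same-sized square matrices (alpha*D + beta*S + gamma*H)."""
    n = len(kernels[0])
    return [
        [sum(w * k[i][j] for k, w in zip(kernels, weights)) for j in range(n)]
        for i in range(n)
    ]


def fixed_point(m: Matrix, tol: float = 1e-12, max_iter: int = 10_000) -> Vector:
    """Power iteration: repeat phi <- M phi until phi stops changing."""
    n = len(m)
    phi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        nxt = [sum(m[i][j] * phi[j] for j in range(n)) for i in range(n)]
        total = sum(nxt)
        nxt = [x / total for x in nxt]  # renormalise to absorb rounding drift
        if max(abs(a - b) for a, b in zip(nxt, phi)) < tol:
            return nxt
        phi = nxt
    return phi


# Toy kernels over 3 particles; every column sums to 1 (illustrative numbers only).
D = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]  # symmetric random walk
S = [[0.6, 0.6, 0.6], [0.3, 0.3, 0.3], [0.1, 0.1, 0.1]]  # fixed ranking 0 > 1 > 2
H = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]  # local smoothing

M = blend([D, S, H], [0.5, 0.3, 0.2])
phi_star = fixed_point(M)
```

in this toy case the diffusion and heat stand-ins are symmetric, so the ordering of `phi_star` comes entirely from the springs stand-in: particle 0 receives the most focus and particle 2 the least.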

both produce a probability distribution over elements. both are differentiable. the difference:

| property | transformer attention | tri-kernel focus |
|---|---|---|
| domain | fixed token vocabulary | open particle set (content-addressed) |
| context window | bounded (2K–128K tokens) | unbounded (entire graph, O(log n) access) |
| update cost | O(n²) per layer per sequence (O(n) per new token with a KV cache) | O(edges) per iteration, converges in 1-3s |
| memory | parameters frozen after training | graph evolves continuously (new cyberlinks) |
| training signal | loss on next-token prediction | focus expenditure by [[neurons]] (economic) |
| provability | opaque matrix multiply | every step provable via zheng |
| explainability | attention weights (post-hoc) | $\phi^*$ decomposition (D, S, H contributions visible) |

the tri-kernel is not "like" attention. it IS attention — computed over a knowledge graph instead of a token sequence, with economics instead of gradient descent as the training signal.
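the decomposition named in the explainability row can be written out under one assumption not stated in the source: that the fixed point satisfies $\phi^* = M\phi^*$ with $M = \alpha D + \beta S + \gamma H$. each particle's focus then splits into three readable terms:

$$\phi^*_i = \alpha\,(D\phi^*)_i + \beta\,(S\phi^*)_i + \gamma\,(H\phi^*)_i$$

the three terms are the diffusion, springs, and heat contributions to particle $i$, which is what makes the attribution inspectable per particle.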

## how generation works

### query as context bias

a query $q$ is a set of particles with activation weights. the tri-kernel runs with a context potential $C(q)$ that biases the focus field:



$$\phi^*_q = \text{fixed-point}(\alpha D + \beta S + \gamma H + \delta C(q))$$



the biased equilibrium $\phi^*_q$ concentrates on particles relevant to the query. the top-$k$ particles by $\phi^*_q$ are the response.
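a sketch of the biased equilibrium and top-$k$ selection. the source does not define $C(q)$, so this assumes a rank-one operator that sends mass to the normalised query weights (the restart term of personalised PageRank), and it rescales the base operator by $(1 - \delta)$ so columns still sum to 1. `context_operator`, `biased_focus`, `top_k`, and the 4-particle ring are hypothetical.

```python
from typing import Dict, List

Matrix = List[List[float]]


def context_operator(query: Dict[int, float], n: int) -> Matrix:
    """Assumed form of C(q): every column is the normalised query weights."""
    total = sum(query.values())
    c = [query.get(i, 0.0) / total for i in range(n)]
    return [[c[i]] * n for i in range(n)]


def biased_focus(base: Matrix, query: Dict[int, float], delta: float,
                 iters: int = 200) -> List[float]:
    """Iterate phi <- ((1 - delta) * base + delta * C(q)) phi from uniform."""
    n = len(base)
    cq = context_operator(query, n)
    m = [[(1 - delta) * base[i][j] + delta * cq[i][j] for j in range(n)]
         for i in range(n)]
    phi = [1.0 / n] * n
    for _ in range(iters):
        phi = [sum(m[i][j] * phi[j] for j in range(n)) for i in range(n)]
    return phi


def top_k(phi: List[float], k: int) -> List[int]:
    """Indices of the k particles with the highest focus."""
    return sorted(range(len(phi)), key=lambda i: phi[i], reverse=True)[:k]


# Lazy random walk on a 4-particle ring, standing in for alpha*D + beta*S + gamma*H.
base = [
    [0.50, 0.25, 0.00, 0.25],
    [0.25, 0.50, 0.25, 0.00],
    [0.00, 0.25, 0.50, 0.25],
    [0.25, 0.00, 0.25, 0.50],
]
phi_q = biased_focus(base, query={2: 1.0}, delta=0.3)
response = top_k(phi_q, k=1)
```

without the query the ring is symmetric and focus is uniform; with the query on particle 2, focus peaks there and decays with graph distance.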

### autoregressive generation

generating a sequence of particles (analogous to token generation):

- step 1: input context → bias focus field → $\phi^*_q$
- step 2: sample particle $p_1$ from $\phi^*_q$
- step 3: add $p_1$ to context → re-bias → $\phi^*_{q \cup \{p_1\}}$
- step 4: sample $p_2$ from updated field
- ...
- step n: sequence $(p_1, p_2, \ldots, p_n)$ generated by iterated equilibration


each step is a nox computation that produces a zheng proof via proof-carrying. the generated sequence is self-proving: anyone can verify it was produced by correctly running the tri-kernel on the stated context.
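the loop itself can be sketched independently of how the field is computed. here the biased tri-kernel is passed in as a callable, and `toy_focus` is a hypothetical stand-in, not the real computation. adding a sampled particle to the context with weight 1.0 is also an assumption; proof generation is omitted.

```python
import random
from typing import Callable, Dict, List, Sequence

# Maps a context (particle index -> activation weight) to a focus distribution.
FocusFn = Callable[[Dict[int, float]], Sequence[float]]


def generate(focus_fn: FocusFn, context: Dict[int, float], steps: int,
             rng: random.Random) -> List[int]:
    """Iterated equilibration: re-bias, sample, extend the context, repeat."""
    context = dict(context)  # work on a copy so the caller's context is untouched
    sequence: List[int] = []
    for _ in range(steps):
        phi = focus_fn(context)  # equilibrium of the field under current context
        particle = rng.choices(range(len(phi)), weights=phi, k=1)[0]
        sequence.append(particle)
        # Assumed update rule: each emitted particle gains one unit of activation.
        context[particle] = context.get(particle, 0.0) + 1.0
    return sequence


def toy_focus(context: Dict[int, float]) -> List[float]:
    """Stand-in field over 5 particles: uniform prior plus context mass."""
    raw = [1.0 + context.get(i, 0.0) for i in range(5)]
    total = sum(raw)
    return [x / total for x in raw]


sequence = generate(toy_focus, {0: 1.0}, steps=4, rng=random.Random(0))
```

passing the generator a seeded `random.Random` keeps a run reproducible, which matters if the sampled sequence is meant to be re-derived by a verifier.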

### temperature and creativity

the tri-kernel has a natural temperature parameter $T$ (from the free energy formulation):



$$\phi^*_i \propto \exp\left(-\frac{E_i}{T}\right)$$



- $T \to 0$: deterministic — always pick the highest-$\phi^*$ particle (argmax)
- $T = 1$: standard sampling — faithful to the collective focus distribution
- $T > 1$: creative — explore low-probability particles

this follows from the exponential optimality under constraint. the same exponential that governs Boltzmann distributions, softmax, and collective focus governs generation temperature.
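a small worked sketch of the temperature rule. it assumes the energy of a particle is $E_i = -\log \phi^*_i$, so that $T = 1$ reproduces the original distribution and the reweighting reduces to $\phi_i^{1/T}$. the function name `apply_temperature` is hypothetical.

```python
import math
from typing import List


def apply_temperature(phi: List[float], T: float) -> List[float]:
    """Reweight phi as exp(-E_i / T) with E_i = -log(phi_i), then normalise."""
    if T <= 0:
        raise ValueError("T must be positive; use argmax for the T -> 0 limit")
    # log(phi_i) / T equals -E_i / T; zero-probability particles stay at zero.
    logits = [math.log(p) / T if p > 0 else float("-inf") for p in phi]
    peak = max(logits)
    weights = [math.exp(l - peak) for l in logits]  # shift by max for stability
    total = sum(weights)
    return [w / total for w in weights]


phi = [0.5, 0.3, 0.2]
cold = apply_temperature(phi, 0.1)   # nearly all mass on the top particle
same = apply_temperature(phi, 1.0)   # unchanged
hot = apply_temperature(phi, 10.0)   # close to uniform
```

lowering $T$ sharpens the distribution toward argmax, while raising it flattens the distribution toward uniform, matching the three regimes listed above.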

## what the cybergraph model does that transformers cannot

### 1. open vocabulary without retraining

a transformer's vocabulary is fixed at training time. adding a new concept requires fine-tuning or prompt engineering.

the cybergraph's "vocabulary" is the set of all particles — content-addressed and open. a new particle enters the graph when a neuron creates a cyberlink to it. no retraining. no fine-tuning. the new particle immediately participates in $\phi^*$ computation via its connections.

- transformer: new concept → fine-tune or RAG → stale after training cutoff
- cybergraph: new concept → cyberlink → immediately in $\phi^*$ → always current
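a sketch of why content addressing makes the vocabulary open: a particle's identity is the hash of its content, so linking to unseen content is all it takes to add it. SHA-256 is used here only as a stand-in for Hemera, and the `Graph` class and its `cyberlink` method are hypothetical.

```python
import hashlib
from typing import Set, Tuple


def particle_id(content: bytes) -> str:
    """Content address of a particle (SHA-256 as a stand-in for Hemera)."""
    return hashlib.sha256(content).hexdigest()


class Graph:
    """Toy cybergraph: a set of particles and a set of signed links."""

    def __init__(self) -> None:
        self.particles: Set[str] = set()
        self.links: Set[Tuple[str, str, str]] = set()

    def cyberlink(self, neuron: str, src: bytes, dst: bytes) -> Tuple[str, str]:
        """Link src -> dst; unseen content becomes a particle as a side effect."""
        a, b = particle_id(src), particle_id(dst)
        self.particles.update((a, b))
        self.links.add((neuron, a, b))
        return a, b


g = Graph()
g.cyberlink("alice", b"cat", b"mammal")
before = len(g.particles)
_, new_id = g.cyberlink("bob", b"cat", b"a concept coined today")
after = len(g.particles)
```

the second link reuses the existing `cat` particle and adds exactly one new particle, with no registry or vocabulary file to update.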


### 2. provable inference

every generation step is a nox computation with proof-carrying:

- input: context particles + query
- compute: tri-kernel iteration (nox)
- output: response particles + zheng proof
- verification: 10-50 μs (one decider call)
- proves: "the tri-kernel was run correctly on this context"


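a standard transformer deployment gives the caller no proof that inference ran correctly; checking it means re-running the model. the cybergraph model produces a proof of correct inference as a byproduct of the computation, with no separate proving pass (the folded overhead is ~30 field ops per nox step).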

for quantised models running on Binius (F₂ native), taking a BitNet 1-bit 4096×4096 layer:

- naive F_p: ~2.7B constraints
- binary jets: ~1.84M constraints (1,400× speedup)
- proof: carried during execution (zero latency)
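the quoted factor can be checked from the two constraint counts. the ratio comes out slightly above the rounded figure, and it measures constraint-count reduction; reading it as a prover speedup assumes cost is roughly linear in constraints, which the source does not state.

$$\frac{2.7 \times 10^{9}}{1.84 \times 10^{6}} \approx 1.47 \times 10^{3}$$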


### 3. continuous learning from economic signal

transformers learn from gradient descent on labelled data or human feedback (RLHF). the cybergraph model learns from focus — an economic signal where neurons stake tokens on cyberlinks.

- transformer training: curate dataset → compute gradients → update weights → deploy
- cybergraph learning: neuron creates cyberlink → spends focus → graph updates → $\phi^*$ shifts


every cyberlink is a training sample. every focus expenditure is a gradient signal. the "model" (the graph + $\phi^*$) updates continuously without batch training, gradient computation, or model deployment.

the quality of learning follows the collective focus theorem: attention across competing stimuli distributes exponentially by quality. neurons with good judgement earn karma; neurons with poor judgement waste focus. the selection pressure is economic, not algorithmic.
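a sketch of how focus expenditure could act as the training signal. the aggregation rule is an assumption (the source does not specify one): an edge's weight is the sum of focus spent on it by all neurons, and outgoing weights are normalised into transition probabilities. `accumulate` and `transition_probs` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (neuron, source particle, target particle, focus spent)
Link = Tuple[str, str, str, float]


def accumulate(links: List[Link]) -> Dict[Tuple[str, str], float]:
    """Sum the focus spent by all neurons on each (src, dst) edge."""
    weights: Dict[Tuple[str, str], float] = defaultdict(float)
    for _neuron, src, dst, focus in links:
        weights[(src, dst)] += focus
    return dict(weights)


def transition_probs(weights: Dict[Tuple[str, str], float],
                     src: str) -> Dict[str, float]:
    """Normalise the outgoing edge weights of src into probabilities."""
    out = {dst: w for (s, dst), w in weights.items() if s == src}
    total = sum(out.values())
    return {dst: w / total for dst, w in out.items()} if total else {}


links: List[Link] = [
    ("alice", "cat", "mammal", 3.0),
    ("bob", "cat", "mammal", 1.0),
    ("bob", "cat", "meme", 1.0),
]
before = transition_probs(accumulate(links), "cat")

# One more cyberlink is one more training sample: the walk shifts immediately.
links.append(("carol", "cat", "meme", 5.0))
after = transition_probs(accumulate(links), "cat")
```

in this toy run the probability of stepping from `cat` to `meme` rises from 0.2 to 0.6 after a single new link, with no batch step in between.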

### 4. multi-agent generation

a transformer generates from a single model. the cybergraph generates from a collective:

- transformer: one model, one owner, one perspective
- cybergraph: N neurons, each with focus, each contributing edges, $\phi^*$ = collective intelligence


multiple neurons can simultaneously contribute to the same query response. their cyberlinks compete and compose via the tri-kernel. the result is not majority vote — it is the equilibrium of a physical system where stake determines influence.

this is foculus applied to generation: $\phi^*$ convergence as consensus on what to say next.

### 5. scale-invariant architecture

the tri-kernel operates on whatever subgraph is relevant. a query about cats touches cat-related particles. a query about quantum mechanics touches physics particles. the computation scales with the LOCAL graph, not the GLOBAL graph.

- transformer: all parameters active for every query (O(params) regardless of topic)
- cybergraph: only relevant subgraph active (O(relevant_edges) per query)


this follows from BBG's law 1 (bounded locality): operation cost is proportional to what it touches, not total graph size. a graph with $10^{15}$ particles still answers a cat query in $O(\text{cat-related edges})$ time.
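a sketch of bounded locality on a toy graph. it assumes "relevant" means within a fixed number of hops of the query particles, which is one possible reading and not the source's definition. `local_subgraph` is a hypothetical helper that also counts the edges it touches.

```python
from collections import deque
from typing import Deque, Dict, Iterable, List, Set, Tuple


def local_subgraph(adj: Dict[str, List[str]], seeds: Iterable[str],
                   hops: int) -> Tuple[Set[str], int]:
    """Breadth-first expansion from the seeds, up to `hops` steps away.

    Returns the visited particles and the number of edges touched.
    """
    visited: Set[str] = set(seeds)
    frontier: Deque[Tuple[str, int]] = deque((s, 0) for s in visited)
    edges_touched = 0
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue  # do not expand past the hop limit
        for nxt in adj.get(node, []):
            edges_touched += 1
            if nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, depth + 1))
    return visited, edges_touched


# Two disconnected topics; the physics cluster could be arbitrarily large.
adj = {
    "cat": ["kitten", "mammal"],
    "kitten": ["cat"],
    "mammal": ["cat"],
    "quark": ["gluon"],
    "gluon": ["quark"],
}
particles, touched = local_subgraph(adj, ["cat"], hops=2)
```

the cat query touches four edges and never reads the physics cluster, so growing that cluster leaves the cost of the query unchanged.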

## the computational pipeline

in order:

- query arrives (context particles)
- tri-kernel biased by context (nox computation, proof-carrying)
- $\phi^*_q$ equilibrium (focus distribution)
- sample / argmax from $\phi^*_q$ (particle selection)
- response particle + zheng proof (verifiable output)
- optional: iterate (autoregressive generation)
- full response + accumulator (self-proving sequence)

cost per generation step:

- tri-kernel iteration: O(relevant_edges) field ops
- proof overhead: ~30 field ops per nox step (folded)
- verification: 10-50 μs (one decider)


## connection to BBG state

the graph that the model runs on IS the BBG authenticated state:

- particles.root → the "vocabulary" (all content-addressed nodes)
- axons_out.root → the "embeddings" (outgoing edge structure per particle)
- axons_in.root → the "reverse embeddings" (incoming structure)
- neurons.root → the "trainers" (agents with focus budgets)


with algebraic NMT, the entire state is a polynomial commitment. a query is a polynomial opening. the response is verifiable against BBG's 32-byte root commitment.

"what does the collective think about particle P?" = evaluate BBG_poly(particles, P, t_now) = one Lens opening, ~200 bytes proof, 10-50 μs verification


## comparison

| dimension | GPT-class transformer | cybergraph generative model |
|---|---|---|
| vocabulary | fixed (50K-200K tokens) | open (all particles, content-addressed) |
| context | bounded window | unbounded graph (O(log n) access) |
| attention | softmax over tokens | tri-kernel over particles |
| training | offline gradient descent | continuous economic signal (focus) |
| update | retrain or fine-tune | new cyberlinks enter φ* immediately |
| inference proof | not native (needs a separate, costly proving pass) | zero-cost (proof-carrying) |
| explainability | opaque | decomposable (D + S + H contributions) |
| multi-agent | no | native (N neurons, foculus consensus) |
| privacy | model is public or private | individual links private, aggregate public |
| scale | O(params) per query | O(relevant_edges) per query |
| temperature | softmax τ | Boltzmann T (same math, economic grounding) |
| quantised inference | 32-64× overhead (F_p) | native binary (Binius, 1,400× for BitNet) |
| light client | download full model | 240-byte checkpoint, verify everything |

## honest assessment

the comparison above is not fair. transformers are production systems generating human-quality text at scale. the cybergraph model is a specification built on a stack that doesn't exist yet.

what IS fair:

| claim | basis | confidence |
|---|---|---|
| open vocabulary | content-addressed particles | high — architectural property |
| continuous learning | focus economics | high — but quality depends on neuron behaviour |
| provable inference | proof-carrying nox | high — if nox + zheng are built |
| O(relevant_edges) | BBG law 1 (bounded locality) | high — but tri-kernel iteration count may vary |
| 1,400× quantised inference | Binius binary jets | medium-high — jets designed, unimplemented |
| multi-agent generation | foculus φ* convergence | medium — convergence proven, generation quality unknown |
| better than GPT | unknown | low — no empirical comparison possible yet |

the cybergraph model will not replace transformers for text generation any time soon. what it offers is something transformers cannot: provable, continuously-updating, multi-agent, privacy-preserving generation on a knowledge graph. whether that produces useful output is an empirical question that can only be answered after the stack is built.

## what to build first

- tri-kernel on a real graph: compute $\phi^*$ on a $10^4$-particle test graph
- biased generation: add context potential, sample sequences
- proof-carrying inference: prove tri-kernel steps via nox + zheng
- compare with RAG baseline: same graph, same queries, transformer+RAG vs tri-kernel
- evaluate: relevance, diversity, explainability, proof cost


see tri-kernel architecture for the focus computation, BBG for the state layer, proof-carrying for zero-cost proofs, collective focus theorem for why the distribution is optimal, universal law for the exponential temperature connection, zheng for the proof system, foculus for multi-agent convergence, and structural-sync for the sync infrastructure.
