cybergraph as generative model

abstract

the cybergraph is not a database that a model queries. it IS the model. the tri-kernel focus distribution $\pi^*$ is the "weights" — a stationary distribution over particles that encodes what the network collectively considers relevant. a query is a context vector that biases the focus field. the response is the equilibrium of the biased field. generation is iterative: each produced particle shifts the context, the field re-equilibrates, and the next particle emerges from the new $\pi^*$.

this is not a proposal from nothing. the components are already specified: nox (VM), zheng (proofs), Hemera (hash), BBG (state), tri-kernel (focus computation), structural-sync (convergence). what follows describes how they compose into a generative model and what this model can do that transformers cannot.

the tri-kernel IS the attention mechanism

transformers compute attention as:

$$\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d}}\right) V$$

the tri-kernel computes focus as the stationary distribution of three coupled operators on the cybergraph:

$$\pi^* = \text{fixed-point}(\alpha \cdot D + \beta \cdot S + \gamma \cdot H)$$

where:

  • $D$ = diffusion kernel (random walk on the graph — structural proximity)
  • $S$ = springs kernel (SpringRank — hierarchical ordering)
  • $H$ = heat kernel ($e^{-tL}$ — multi-scale smoothing)
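a minimal power-iteration sketch of this fixed point, with stand-in column-stochastic matrices for $D$, $S$, $H$ (constructing the actual kernels is out of scope here):

```python
import numpy as np

def focus_fixed_point(D, S, H, alpha=0.4, beta=0.3, gamma=0.3,
                      tol=1e-10, max_iter=1000):
    """stationary distribution of the combined tri-kernel operator.

    D, S, H: column-stochastic n x n matrices (stand-ins for the
    diffusion, springs, and heat kernels). with alpha+beta+gamma = 1
    the convex combination is itself column-stochastic.
    """
    K = alpha * D + beta * S + gamma * H
    pi = np.full(K.shape[0], 1.0 / K.shape[0])   # uniform start
    for _ in range(max_iter):
        nxt = K @ pi
        nxt /= nxt.sum()                         # guard against numeric drift
        if np.abs(nxt - pi).sum() < tol:
            return nxt
        pi = nxt
    return pi
```

for a positive operator the Perron–Frobenius fixed point is unique, so the iteration converges regardless of the starting distribution.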

both produce a probability distribution over elements, and both are differentiable. the differences:

| property | transformer attention | tri-kernel focus |
|---|---|---|
| domain | fixed token vocabulary | open particle set (content-addressed) |
| context window | bounded (2K–128K tokens) | unbounded (entire graph, O(log n) access) |
| update cost | O(n²) per layer per token | O(edges) per iteration, converges in 1–3 s |
| memory | parameters frozen after training | graph evolves continuously (new cyberlinks) |
| training signal | loss on next-token prediction | focus expenditure by neurons (economic) |
| provability | opaque matrix multiply | every step provable via zheng |
| explainability | attention weights (post-hoc) | $\pi^*$ decomposition (D, S, H contributions visible) |

the tri-kernel is not "like" attention. it IS attention — computed over a knowledge graph instead of a token sequence, with economics instead of gradient descent as the training signal.

how generation works

query as context bias

a query $q$ is a set of particles with activation weights. the tri-kernel runs with a context potential $C(q)$ that biases the focus field:

$$\pi^*_q = \text{fixed-point}(\alpha D + \beta S + \gamma H + \delta C(q))$$

the biased equilibrium $\pi^*_q$ concentrates on particles relevant to the query. the top-$k$ particles by $\pi^*_q$ are the response.
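a sketch of the biased equilibrium, modelling the context potential $\delta C(q)$ as a PageRank-style teleport toward the query particles (the real potential's form is an assumption here):

```python
import numpy as np

def biased_focus(K, context, delta=0.2, tol=1e-10, max_iter=1000):
    """pi*_q: equilibrium of the focus field under a context bias.

    K: combined tri-kernel operator (column-stochastic, stand-in).
    context: {particle index: activation weight} — the query q.
    the bias is a rank-one teleport toward the query particles,
    a simplifying assumption for C(q).
    """
    n = K.shape[0]
    c = np.zeros(n)
    for idx, w in context.items():
        c[idx] = w
    c /= c.sum()
    pi = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        nxt = (1 - delta) * (K @ pi) + delta * c   # biased operator
        nxt /= nxt.sum()
        if np.abs(nxt - pi).sum() < tol:
            return nxt
        pi = nxt
    return pi

def respond(pi_q, k=3):
    """top-k particles by biased focus — the response."""
    return np.argsort(pi_q)[::-1][:k]
```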

autoregressive generation

generating a sequence of particles (analogous to token generation):

step 1:  input context → bias focus field → π*_q
step 2:  sample particle p₁ from π*_q
step 3:  add p₁ to context → re-bias → π*_{q ∪ {p₁}}
step 4:  sample p₂ from updated field
...
step n:  sequence (p₁, p₂, ..., pₙ) generated by iterated equilibration

each step is a nox computation that produces a zheng proof via proof-carrying. the generated sequence is self-proving: anyone can verify it was produced by correctly running the tri-kernel on the stated context.
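the loop above can be sketched as follows, with proof generation omitted; the operator `K`, the teleport-style bias, and the log-space temperature adjustment are all simplifying assumptions:

```python
import numpy as np

def generate(K, query, n_steps=4, delta=0.3, T=1.0, seed=0):
    """autoregressive generation by iterated equilibration (sketch).

    K: combined tri-kernel operator (column-stochastic, strictly
    positive — a stand-in assumption). query: initial context
    particle indices. each produced particle is folded back into
    the context before the field re-equilibrates.
    """
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    context, sequence = list(query), []
    for _ in range(n_steps):
        c = np.zeros(n)
        c[context] = 1.0
        c /= c.sum()
        pi = np.full(n, 1.0 / n)
        for _ in range(500):                 # equilibrate the biased field
            pi = (1 - delta) * (K @ pi) + delta * c
            pi /= pi.sum()
        logits = np.log(pi) / T              # temperature-adjusted sampling
        p = np.exp(logits - logits.max())
        p /= p.sum()
        particle = int(rng.choice(n, p=p))
        sequence.append(particle)
        context.append(particle)             # context shifts for the next step
    return sequence
```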

temperature and creativity

the tri-kernel has a natural temperature parameter $T$ (from the free energy formulation):

$$\pi^*_i \propto \exp\left(-\frac{E_i}{T}\right)$$

  • $T \to 0$: deterministic — always pick the highest-$\pi^*$ particle (argmax)
  • $T = 1$: standard sampling — faithful to the collective focus distribution
  • $T > 1$: creative — explore low-probability particles

this follows from the exponential optimality under constraint. the same exponential that governs Boltzmann distributions, softmax, and collective focus governs generation temperature.
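in code, the temperature knob is just the Boltzmann exponential over particle energies (the energies below are purely illustrative):

```python
import numpy as np

def focus_at_temperature(energies, T):
    """pi*_i proportional to exp(-E_i / T) — the same exponential
    as Boltzmann distributions and softmax."""
    z = np.exp(-np.asarray(energies, dtype=float) / T)
    return z / z.sum()
```

as $T \to 0$ the mass collapses onto the lowest-energy (highest-focus) particle; as $T$ grows the distribution flattens toward uniform.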

what the cybergraph model does that transformers cannot

1. open vocabulary without retraining

a transformer's vocabulary is fixed at training time. adding a new concept requires fine-tuning or prompt engineering.

the cybergraph's "vocabulary" is the set of all particles — content-addressed and open. a new particle enters the graph when a neuron creates a cyberlink to it. no retraining. no fine-tuning. the new particle immediately participates in $\pi^*$ computation via its connections.

transformer:  new concept → fine-tune or RAG → stale after training cutoff
cybergraph:   new concept → cyberlink → immediately in π* → always current
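a toy illustration of the architectural point: grow the adjacency by one particle, re-run a plain diffusion kernel, and the newcomer already carries focus mass. the lazy walk and undirected links are simplifying assumptions:

```python
import numpy as np

def add_particle(adj, neighbors):
    """append a new content-addressed particle with cyberlinks to
    `neighbors`. no retraining: the grown adjacency feeds straight
    into the next pi* computation."""
    n = adj.shape[0]
    grown = np.zeros((n + 1, n + 1))
    grown[:n, :n] = adj
    for j in neighbors:
        grown[n, j] = grown[j, n] = 1.0   # undirected link (simplification)
    return grown

def diffusion_pi(adj, iters=500):
    """stationary distribution of a lazy random walk (the D kernel alone)."""
    n = adj.shape[0]
    K = 0.5 * np.eye(n) + 0.5 * adj / adj.sum(axis=0)  # lazy => aperiodic
    pi = np.full(n, 1.0 / n)
    for _ in range(iters):
        pi = K @ pi
        pi /= pi.sum()
    return pi
```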

2. provable inference

every generation step is a nox computation with proof-carrying:

input:   context particles + query
compute: tri-kernel iteration (nox<Goldilocks>)
output:  response particles + zheng proof

verification: 10-50 μs (one decider call)
proves:       "the tri-kernel was run correctly on this context"

a transformer cannot prove its inference was correct without re-running it. the cybergraph model produces a proof of correct inference as a byproduct of the computation — zero additional cost.

for quantized models running on Binius (F₂ native):

quantized inference (BitNet 1-bit, 4096×4096 layer):
  naive F_p:     ~2.7B constraints
  binary jets:   ~1.84M constraints (1,400× speedup)
  proof:         carried during execution (zero latency)

3. continuous learning from economic signal

transformers learn from gradient descent on labelled data or human feedback (RLHF). the cybergraph model learns from focus — an economic signal where neurons stake tokens on cyberlinks.

transformer training:  curate dataset → compute gradients → update weights → deploy
cybergraph learning:   neuron creates cyberlink → spends focus → graph updates → π* shifts

every cyberlink is a training sample. every focus expenditure is a gradient signal. the "model" (the graph + $\pi^*$) updates continuously without batch training, gradient computation, or model deployment.

the quality of learning follows the collective focus theorem: attention across competing stimuli distributes exponentially by quality. neurons with good judgement earn karma; neurons with poor judgement waste focus. the selection pressure is economic, not algorithmic.
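a minimal sketch of the economic update, with hypothetical names (`cyberlink`, a `neuron` dict, dict-of-dicts edge weights) standing in for the real BBG state:

```python
def cyberlink(weights, neuron, src, dst, stake):
    """one 'training sample': a neuron stakes focus on an edge.

    weights: {src: {dst: weight}} — stand-in for the graph state.
    the focus spent is the training signal; there is no batch step,
    gradient computation, or redeployment.
    """
    if stake > neuron["focus"]:
        raise ValueError("insufficient focus")   # selection pressure is economic
    neuron["focus"] -= stake
    weights.setdefault(src, {})
    weights[src][dst] = weights[src].get(dst, 0.0) + stake
```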

4. multi-agent generation

a transformer generates from a single model. the cybergraph generates from a collective:

transformer:  one model, one owner, one perspective
cybergraph:   N neurons, each with focus, each contributing edges, π* = collective intelligence

multiple neurons can simultaneously contribute to the same query response. their cyberlinks compete and compose via the tri-kernel. the result is not majority vote — it is the equilibrium of a physical system where stake determines influence.

this is foculus applied to generation: $\pi^*$ convergence as consensus on what to say next.

5. scale-invariant architecture

the tri-kernel operates on whatever subgraph is relevant. a query about cats touches cat-related particles. a query about quantum mechanics touches physics particles. the computation scales with the LOCAL graph, not the GLOBAL graph.

transformer:  all parameters active for every query (O(params) regardless of topic)
cybergraph:   only relevant subgraph active (O(relevant_edges) per query)

this follows from BBG's law 1 (bounded locality): operation cost is proportional to what it touches, not total graph size. a graph with $10^{15}$ particles still answers a cat query in $O(\text{cat-related edges})$ time.
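bounded locality in miniature: BFS out a fixed radius from the query particles and hand only that subgraph to the tri-kernel (the fixed radius cut-off is an illustrative assumption):

```python
from collections import deque

def relevant_subgraph(adj_list, seeds, radius=2):
    """the set of particles within `radius` hops of the query — the only
    part of the graph the computation touches, regardless of total size."""
    seen = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth == radius:
            continue                       # stop expanding at the radius
        for nb in adj_list.get(node, ()):
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, depth + 1))
    return seen
```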

the computational pipeline

query arrives                                    (context particles)
  ↓
tri-kernel biased by context                     (nox computation)
  ↓ proof-carrying
π*_q equilibrium                                 (focus distribution)
  ↓
sample / argmax from π*_q                        (particle selection)
  ↓
response particle + zheng proof                  (verifiable output)
  ↓
optional: iterate (autoregressive generation)
  ↓
full response + accumulator                      (self-proving sequence)

cost per generation step:
  tri-kernel iteration:    O(relevant_edges) field ops
  proof overhead:          ~30 field ops per nox step (folded)
  verification:            10-50 μs (one decider)

connection to BBG state

the graph that the model runs on IS the BBG authenticated state:

particles.root     → the "vocabulary" (all content-addressed nodes)
axons_out.root     → the "embeddings" (outgoing edge structure per particle)
axons_in.root      → the "reverse embeddings" (incoming structure)
neurons.root       → the "trainers" (agents with focus budgets)

with algebraic NMT, the entire state is a polynomial commitment. a query is a polynomial opening. the response is verifiable against BBG's 32-byte root commitment.

"what does the collective think about particle P?"
= evaluate BBG_poly(particles, P, t_now)
= one Lens opening, ~200 bytes proof, 10-50 μs verification

comparison

| dimension | GPT-class transformer | cybergraph generative model |
|---|---|---|
| vocabulary | fixed (50K–200K tokens) | open (all particles, content-addressed) |
| context | bounded window | unbounded graph (O(log n) access) |
| attention | softmax over tokens | tri-kernel over particles |
| training | offline gradient descent | continuous economic signal (focus) |
| update | retrain or fine-tune | new cyberlinks enter π* immediately |
| inference proof | impossible | zero-cost (proof-carrying) |
| explainability | opaque | decomposable (D + S + H contributions) |
| multi-agent | no | native (N neurons, foculus consensus) |
| privacy | model is public or private | individual links private, aggregate public |
| scale | O(params) per query | O(relevant_edges) per query |
| temperature | softmax τ | Boltzmann T (same math, economic grounding) |
| quantised inference | 32–64× overhead (F_p) | native binary (Binius, 1,400× for BitNet) |
| light client | download full model | 240-byte checkpoint, verify everything |

honest assessment

the comparison above is not fair. transformers are production systems generating human-quality text at scale. the cybergraph model is a specification built on a stack that doesn't exist yet.

what IS fair:

| claim | basis | confidence |
|---|---|---|
| open vocabulary | content-addressed particles | high — architectural property |
| continuous learning | focus economics | high — but quality depends on neuron behaviour |
| provable inference | proof-carrying nox | high — if nox + zheng are built |
| O(relevant_edges) | BBG law 1 (bounded locality) | high — but tri-kernel iteration count may vary |
| 1,400× quantised inference | Binius binary jets | medium-high — jets designed, unimplemented |
| multi-agent generation | foculus π* convergence | medium — convergence proven, generation quality unknown |
| better than GPT | unknown | low — no empirical comparison possible yet |

the cybergraph model will not replace transformers for text generation any time soon. what it offers is something transformers cannot: provable, continuously-updating, multi-agent, privacy-preserving generation on a knowledge graph. whether that produces useful output is an empirical question that can only be answered after the stack is built.

what to build first

1. tri-kernel on a real graph     compute π* on a 10^4-particle test graph
2. biased generation              add context potential, sample sequences
3. proof-carrying inference       prove tri-kernel steps via nox + zheng
4. compare with RAG baseline      same graph, same queries, transformer+RAG vs tri-kernel
5. evaluate                       relevance, diversity, explainability, proof cost

see also:

  • tri-kernel architecture — the focus computation
  • BBG — the state layer
  • proof-carrying — zero-cost proofs
  • collective focus theorem — why the distribution is optimal
  • universal law — the exponential temperature connection
  • zheng — the proof system
  • foculus — multi-agent convergence
  • structural-sync — the sync infrastructure

[diagram: cybergraph model architecture — dimensions, local graph]