- --
- step 1: input context → bias focus field → $\phi^*_q$
- step 2: sample particle $p_1$ from $\phi^*_q$
- step 3: add $p_1$ to context → re-bias → $\phi^*_{q \cup \{p_1\}}$
- step 4: sample $p_2$ from the updated field
- ...
- step n: sequence $(p_1, p_2, \ldots, p_n)$ generated by iterated equilibration

each step is a nox computation that produces a zheng proof via proof-carrying. the generated sequence is self-proving: anyone can verify it was produced by correctly running the tri-kernel on the stated context.
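
the loop above can be sketched in a few lines. this is a toy under stated assumptions, not the tri-kernel: `equilibrate` is a hypothetical stand-in that runs plain diffusion with restart on the context, the graph data is invented, and already-used particles are excluded from sampling (a simplification the text does not specify). no proofs are produced.

```python
import random

# Hypothetical toy graph: particle -> list of (neighbour, weight).
GRAPH = {
    "cat": [("mammal", 1.0), ("pet", 2.0)],
    "pet": [("dog", 1.0), ("cat", 1.0)],
    "dog": [("mammal", 1.0), ("pet", 1.0)],
    "mammal": [("animal", 1.0)],
    "animal": [("cat", 0.5), ("dog", 0.5)],
}

def equilibrate(graph, context, restart=0.3, iters=100):
    """Stand-in for the tri-kernel (assumption): diffusion with restart.

    Returns a probability distribution over particles, biased toward
    the context particles.
    """
    nodes = list(graph)
    bias = {n: (1.0 / len(context) if n in context else 0.0) for n in nodes}
    phi = dict(bias)
    for _ in range(iters):
        nxt = {n: restart * bias[n] for n in nodes}
        for src, edges in graph.items():
            total = sum(w for _, w in edges)
            for dst, w in edges:
                nxt[dst] += (1.0 - restart) * phi[src] * w / total
        phi = nxt
    return phi

def generate(graph, query, n, rng):
    """Iterated equilibration: re-bias on the growing context, then sample."""
    context = list(query)
    sequence = []
    for _ in range(n):
        phi = equilibrate(graph, set(context))
        # Assumption: skip particles already in the context.
        candidates = [p for p in phi if p not in context]
        if not candidates:
            break
        weights = [phi[p] for p in candidates]
        picked = rng.choices(candidates, weights=weights, k=1)[0]
        sequence.append(picked)
        context.append(picked)
    return sequence

sequence = generate(GRAPH, ["cat"], 3, random.Random(0))
print(sequence)
```

each pass recomputes the whole distribution from the enlarged context, which is the difference between this scheme and sampling n particles from a single fixed $\phi^*_q$.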
### temperature and creativity
the tri-kernel has a natural temperature parameter $T$ (from the free energy formulation):
$$\phi^*_i \propto \exp\left(-\frac{E_i}{T}\right)$$
- $T \to 0$: deterministic — always pick the highest-$\phi^*$ particle (argmax)
- $T = 1$: standard sampling — faithful to the collective focus distribution
- $T > 1$: creative — explore low-probability particles
this follows from the exponential optimality under constraint. the same exponential that governs Boltzmann distributions, softmax, and collective focus governs generation temperature.
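
a minimal sketch of the three regimes, assuming the focus distribution is given at $T = 1$ so that energies can be recovered as $E_i = -\log \phi^*_i$. the function name `retemper` and the example distribution are hypothetical.

```python
import math

def retemper(phi, T):
    """Rescale a distribution phi (taken to be the T = 1 case) to temperature T.

    With phi_i ∝ exp(-E_i), the temperature-T distribution is
    ∝ exp(-E_i / T) = phi_i ** (1 / T). Zero-probability entries are dropped.
    """
    if T <= 0:
        raise ValueError("T must be positive; use argmax for the T -> 0 limit")
    # Work in log space and subtract the max for numerical stability.
    logs = {k: math.log(v) / T for k, v in phi.items() if v > 0}
    top = max(logs.values())
    unnorm = {k: math.exp(v - top) for k, v in logs.items()}
    z = sum(unnorm.values())
    return {k: v / z for k, v in unnorm.items()}

phi = {"a": 0.6, "b": 0.3, "c": 0.1}
cold = retemper(phi, 0.05)   # nearly all mass on "a"
same = retemper(phi, 1.0)    # unchanged
hot = retemper(phi, 4.0)     # flatter: low-probability particles gain mass
```

lowering $T$ sharpens the distribution toward the argmax, and raising it flattens the distribution toward uniform, which is the same behaviour as softmax temperature.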
## what the cybergraph model does that transformers cannot
### 1. open vocabulary without retraining
a transformer's vocabulary is fixed at training time. adding a new concept requires fine-tuning or prompt engineering.
the cybergraph's "vocabulary" is the set of all particles — content-addressed and open. a new particle enters the graph when a neuron creates a cyberlink to it. no retraining. no fine-tuning. the new particle immediately participates in $\phi^*$ computation via its connections.
- transformer: new concept → fine-tune or RAG → stale after training cutoff
- cybergraph: new concept → cyberlink → immediately in $\phi^*$ → always current

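
a sketch of why the vocabulary is open: if a particle's identifier is a hash of its content, any new content already has a valid identifier, and the first cyberlink pointing at it puts it in the graph. SHA-256 and the `ToyGraph` class are assumptions for illustration; the text does not name the hash or the storage layout.

```python
import hashlib

def particle_id(content: bytes) -> str:
    # Assumption: SHA-256 stands in for the cybergraph's actual hash.
    return hashlib.sha256(content).hexdigest()

class ToyGraph:
    """Hypothetical in-memory graph keyed by content hashes."""

    def __init__(self):
        self.links = {}  # from_id -> {to_id: accumulated weight}

    def particles(self):
        seen = set(self.links)
        for targets in self.links.values():
            seen.update(targets)
        return seen

    def cyberlink(self, src: bytes, dst: bytes, weight: float = 1.0):
        a, b = particle_id(src), particle_id(dst)
        targets = self.links.setdefault(a, {})
        targets[b] = targets.get(b, 0.0) + weight
        return a, b

g = ToyGraph()
g.cyberlink(b"cat", b"mammal")
count_before = len(g.particles())

# A concept nobody has linked before: no registry update, no retraining.
_, new_id = g.cyberlink(b"cat", b"a concept coined today")
count_after = len(g.particles())
```

there is no tokenizer table to extend: the identifier space is fixed by the hash function, and membership in the graph is decided by links alone.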
### 2. provable inference
every generation step is a nox computation with proof-carrying:
- input: context particles + query
- compute: tri-kernel iteration (nox)
- verification: 10-50 μs (one decider call)
- proves: "the tri-kernel was run correctly on this context"

a transformer cannot prove its inference was correct without re-running it. the cybergraph model produces a proof of correct inference as a byproduct of the computation, so no separate proving pass is needed (the folded overhead is roughly 30 field ops per nox step).
for quantized models running on Binius (F₂ native):
quantized inference (BitNet 1-bit, 4096×4096 layer):
- naive F_p: ~2.7B constraints
- binary jets: ~1.84M constraints (1,400× speedup)
- proof: carried during execution (zero latency)
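
the quoted figures can be sanity-checked with simple arithmetic. the sketch below only divides the numbers stated above; the per-weight breakdown is derived here and is not a claim from the jet design.

```python
rows, cols = 4096, 4096
weights = rows * cols                 # one-bit weights in the layer

naive_constraints = 2.7e9             # quoted above (approximate)
binary_constraints = 1.84e6           # quoted above (approximate)

ratio = naive_constraints / binary_constraints
per_weight_naive = naive_constraints / weights
weights_per_binary_constraint = weights / binary_constraints

print(f"weights: {weights:,}")
print(f"constraint ratio: {ratio:,.0f}x")
print(f"naive constraints per weight: {per_weight_naive:.1f}")
print(f"weights per binary constraint: {weights_per_binary_constraint:.1f}")
```

the ratio comes out near 1,470, in line with the rounded 1,400× figure. on these numbers the naive encoding spends about 161 constraints per weight, while the binary encoding covers about 9 weights per constraint.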
### 3. continuous learning from economic signal
transformers learn from gradient descent on labelled data or human feedback (RLHF). the cybergraph model learns from focus — an economic signal where neurons stake tokens on cyberlinks.
- transformer training: curate dataset → compute gradients → update weights → deploy
- cybergraph learning: neuron creates cyberlink → spends focus → graph updates → $\phi^*$ shifts

every cyberlink is a training sample. every focus expenditure is a gradient signal. the "model" (the graph + $\phi^*$) updates continuously without batch training, gradient computation, or model deployment.
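
a toy illustration of "graph updates → $\phi^*$ shifts". the stationary distribution here is computed with a damped random walk (PageRank-style), which is an assumption standing in for the tri-kernel; the graph, weights, and function name are hypothetical.

```python
def stationary(weights, damping=0.85, iters=200):
    """Damped random-walk stationary distribution over a weighted edge dict.

    `weights` maps (source, target) -> focus-weighted edge weight.
    """
    nodes = sorted({n for s, t in weights for n in (s, t)})
    out = {n: 0.0 for n in nodes}
    for (s, _), w in weights.items():
        out[s] += w
    phi = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        nxt = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for (s, t), w in weights.items():
            nxt[t] += damping * phi[s] * w / out[s]
        # Nodes without outgoing edges spread their mass uniformly.
        dangling = sum(phi[n] for n in nodes if out[n] == 0.0)
        for n in nodes:
            nxt[n] += damping * dangling / len(nodes)
        phi = nxt
    return phi

weights = {("q", "a"): 1.0, ("q", "b"): 1.0, ("a", "q"): 1.0, ("b", "q"): 1.0}
before = stationary(weights)

# A neuron spends focus on the q -> b cyberlink: one in-place update.
weights[("q", "b")] += 3.0
after = stationary(weights)
```

the only "training step" is the increment to one edge weight. the distribution is recomputed from the updated graph, with no dataset, gradient, or deployment stage in between.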
the quality of learning follows the collective focus theorem: attention across competing stimuli distributes exponentially by quality. neurons with good judgement earn karma; neurons with poor judgement waste focus. the selection pressure is economic, not algorithmic.
### 4. multi-agent generation
a transformer generates from a single model. the cybergraph generates from a collective:
- transformer: one model, one owner, one perspective
- cybergraph: N neurons, each with focus, each contributing edges, $\phi^*$ = collective intelligence

multiple neurons can simultaneously contribute to the same query response. their cyberlinks compete and compose via the tri-kernel. the result is not majority vote — it is the equilibrium of a physical system where stake determines influence.
this is foculus applied to generation: $\phi^*$ convergence as consensus on what to say next.
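
the contrast with majority vote can be shown with a deliberately reduced example. it aggregates stake in a single step and ignores how links compose through the tri-kernel equilibrium, so it illustrates only the "stake determines influence" part. neuron names, stakes, and targets are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical cyberlinks: (neuron, stake, proposed next particle).
links = [
    ("n1", 1.0, "A"),
    ("n2", 1.0, "A"),
    ("n3", 5.0, "B"),
]

def majority_vote(links):
    """One neuron, one vote: the most frequently proposed particle wins."""
    return Counter(target for _, _, target in links).most_common(1)[0][0]

def stake_weighted(links):
    """Normalised stake mass per proposed particle."""
    mass = defaultdict(float)
    for _, stake, target in links:
        mass[target] += stake
    total = sum(mass.values())
    return {target: m / total for target, m in mass.items()}

vote_winner = majority_vote(links)
focus = stake_weighted(links)
stake_winner = max(focus, key=focus.get)
```

two of three neurons propose `A`, so the head count picks `A`, while the stake-weighted distribution puts 5/7 of the mass on `B`.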
### 5. scale-invariant architecture
the tri-kernel operates on whatever subgraph is relevant. a query about cats touches cat-related particles. a query about quantum mechanics touches physics particles. the computation scales with the LOCAL graph, not the GLOBAL graph.
- transformer: all parameters active for every query (O(params) regardless of topic)
- cybergraph: only relevant subgraph active (O(relevant_edges) per query)

this follows from BBG's law 1 (bounded locality): operation cost is proportional to what it touches, not total graph size. a graph with $10^{15}$ particles still answers a cat query in $O(\text{cat-related edges})$ time.
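
a sketch of the locality claim: a traversal that starts at the context particles and stops after a fixed number of hops touches the same number of edges no matter how large the rest of the graph is. the hop cutoff is an assumption made for illustration; the text does not say how the relevant subgraph is delimited.

```python
from collections import deque

def relevant_subgraph(adj, seeds, max_hops):
    """Breadth-first expansion from the context particles.

    Returns (visited particles, number of edges touched).
    """
    seen = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    edges_touched = 0
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop budget
        for nxt in adj.get(node, ()):
            edges_touched += 1
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return seen, edges_touched

# Small "cat" neighbourhood plus a large unrelated region (hypothetical data).
adj = {"cat": ["pet", "mammal"], "pet": ["cat"], "mammal": ["animal"], "animal": []}
for i in range(100_000):
    adj[f"x{i}"] = [f"x{i + 1}"]

nodes, touched = relevant_subgraph(adj, ["cat"], max_hops=2)
print(len(adj), len(nodes), touched)
```

the graph holds over 100,000 particles, yet the query visits four of them and touches four edges. growing the unrelated region changes neither number.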
## the computational pipeline
- query arrives (context particles)
- tri-kernel biased by context (nox computation)
- proof-carrying $\phi^*_q$ equilibrium (focus distribution)
- sample / argmax from $\phi^*_q$ (particle selection)
- response particle + zheng proof (verifiable output)
- optional: iterate (autoregressive generation)
- full response + accumulator (self-proving sequence)

cost per generation step:
- tri-kernel iteration: O(relevant_edges) field ops
- proof overhead: ~30 field ops per nox step (folded)
- verification: 10-50 μs (one decider)
## connection to BBG state
the graph that the model runs on IS the BBG authenticated state:
- `particles.root` → the "vocabulary" (all content-addressed nodes)
- `axons_out.root` → the "embeddings" (outgoing edge structure per particle)
- `axons_in.root` → the "reverse embeddings" (incoming structure)
- `neurons.root` → the "trainers" (agents with focus budgets)

with algebraic NMT, the entire state is a polynomial commitment. a query is a polynomial opening. the response is verifiable against BBG's 32-byte root commitment.
"what does the collective think about particle P?" = evaluate `BBG_poly(particles, P, t_now)` = one Lens opening, with a ~200-byte proof and 10-50 μs verification.
## comparison
| dimension | GPT-class transformer | cybergraph generative model |
|---|---|---|
| vocabulary | fixed (50K-200K tokens) | open (all particles, content-addressed) |
| context | bounded window | unbounded graph (O(log n) access) |
| attention | softmax over tokens | tri-kernel over particles |
| training | offline gradient descent | continuous economic signal (focus) |
| update | retrain or fine-tune | new cyberlinks enter φ* immediately |
| inference proof | impossible | zero-cost (proof-carrying) |
| explainability | opaque | decomposable (D + S + H contributions) |
| multi-agent | no | native (N neurons, foculus consensus) |
| privacy | model is public or private | individual links private, aggregate public |
| scale | O(params) per query | O(relevant_edges) per query |
| temperature | softmax τ | Boltzmann T (same math, economic grounding) |
| quantised inference | 32-64× overhead (F_p) | native binary (Binius, 1,400× for BitNet) |
| light client | download full model | 240-byte checkpoint, verify everything |
## honest assessment
the comparison above is not fair. transformers are production systems generating human-quality text at scale. the cybergraph model is a specification built on a stack that doesn't exist yet.
what IS fair:
| claim | basis | confidence |
|---|---|---|
| open vocabulary | content-addressed particles | high — architectural property |
| continuous learning | focus economics | high — but quality depends on neuron behaviour |
| provable inference | proof-carrying nox | high — if nox + zheng are built |
| O(relevant_edges) | BBG law 1 (bounded locality) | high — but tri-kernel iteration count may vary |
| 1,400× quantised inference | Binius binary jets | medium-high — jets designed, unimplemented |
| multi-agent generation | foculus φ* convergence | medium — convergence proven, generation quality unknown |
| better than GPT | unknown | low — no empirical comparison possible yet |
the cybergraph model will not replace transformers for text generation any time soon. what it offers is something transformers cannot: provable, continuously-updating, multi-agent, privacy-preserving generation on a knowledge graph. whether that produces useful output is an empirical question that can only be answered after the stack is built.
## what to build first
- tri-kernel on a real graph: compute $\phi^*$ on a $10^4$-particle test graph
- biased generation: add context potential, sample sequences
- proof-carrying inference: prove tri-kernel steps via nox + zheng
- compare with RAG baseline: same graph, same queries, transformer+RAG vs tri-kernel
- evaluate: relevance, diversity, explainability, proof cost
see tri-kernel architecture for the focus computation, BBG for the state layer, proof-carrying for zero-cost proofs, collective focus theorem for why the distribution is optimal, universal law for the exponential temperature connection, zheng for the proof system, foculus for multi-agent convergence, and structural-sync for the sync infrastructure.