transformer jets — compiled cybergraph inference
composite jets for compiling the cybergraph into a transformer and running fast inference inside nox. the focus flow computation specification (§6.6) derives transformer architecture analytically from graph structure — these jets make the compilation and inference path practical at scale.
motivated by Percepta's demonstration (March 2026) that a WASM interpreter embedded in transformer weights achieves 30K tok/s with 2D attention heads and convex hull KV-cache. their key insight: restricting head dimension to 2 turns attention lookup into a geometric query solvable in O(log n) via convex hull data structures. we adopt this for the compiled transformer fast path.
key finding: no new languages needed
all seven operations decompose into existing cyb/languages:
| jet | operation | primary lang | secondary | proof path |
|---|---|---|---|---|
| 0 sparse_svd | π*-weighted truncated SVD | Ten (contraction) | Arc (adjacency) | Ten → Tri |
| 1 spectral_rank | effective dimensionality d* | Bel (entropy on Δⁿ) | Ten | Bel → research / Ten → Tri |
| 2 semcon_partition | subgraph extraction by edge type | Arc (subcategory) | — | Arc → Tri |
| 3 compile_weights | assemble transformer weights | Ten | Arc | composition of jets 0-2 |
| 4 hull_attention | 2D convex hull max-dot query | Ren (G(2,0,0)) | — | Ren → Tri |
| 5 tri_step | composite D+S+H operator | Ten (SpMV) | Arc, Bel | Ten → Tri |
| 6 reconverge | incremental Δπ + STARK proof | Tok (conservation) | Tri (proof) | Tok → stark |
these are composite jets — compositions of existing language primitives recognized by formula hash and accelerated. they introduce no new algebraic domain. the one genuinely new primitive is hull_attention, which belongs to Ren (2D Euclidean geometric algebra).
decomposition into language primitives
sparse_svd = Ten(matmul, transpose) ∘ Arc(π*_weighted_adjacency)
spectral_rank = Bel(shannon_entropy) ∘ Ten(normalize_spectrum)
semcon_partition = Arc(filter_edges_by_morphism_type)
compile_weights = Ten(assemble) ∘ sparse_svd ∘ semcon_partition
hull_attention = Ren(convex_hull_supporting_point) ← one new Ren op
tri_step = Ten(spmv) × 3 + Ten(simplex_project) ← existing "matmul jet → fma"
reconverge = tri_step^k + Tok(verify_conservation) + Tri(stark_prove)
context: two inference paths
the cybergraph supports two simultaneous computations:
- focus flow — tri-kernel iterated to convergence over all cyberlinks. persistent, global, exact π*. the ground truth
- compiled transformer — architecture derived from graph, runs L* tri-kernel steps over local context. fast inference path, ε-approximate
jets 0-3 handle compilation (graph → weights). jets 4-6 handle inference (query → response via compiled weights). together they close the loop specified in §6.6 of the whitepaper.
jet 0: sparse_svd
sparse_svd(A_weighted, rank) → (U, Σ, V)
input: π*-weighted adjacency matrix (sparse), target rank d*
output: truncated SVD — left/right singular vectors + singular values
language decomposition: Arc extracts the π*-weighted adjacency (sparse graph → sparse matrix). Ten performs randomized SVD via iterated matrix-vector products. the same "matmul jet → fma" GFP primitive that already handles Arc:rank(g, steps).
- pure equivalent: millions of field ops (power iteration + QR)
- jet cost: O(|E| · d* · log d*)
- accelerates: embedding matrix E* = U_{:,1:d*} computation, the provably optimal initialization (Eckart-Young theorem)
jet 1: spectral_rank
spectral_rank(Σ) → d*
input: singular value spectrum from jet 0
output: effective dimensionality d* = exp(H(σ(Σ_π*)))
language decomposition: Ten normalizes the spectrum. Bel computes Shannon entropy H(σ) — the information-geometric measure of how many independent dimensions the graph spans. this is Bel's native domain: entropy on the probability simplex.
- pure equivalent: ~3N ops (normalize, log, entropy sum)
- jet cost: N (number of singular values)
jet 2: semcon_partition
semcon_partition(A_eff, semcon_ids) → Vec<A_s>
input: effective adjacency matrix, semcon type assignments
output: per-semcon adjacency submatrices
language decomposition: pure Arc — extract subcategories of the cybergraph by morphism type (semcon). each semcon s defines a subgraph from which attention weights W_Q^(s), W_K^(s) are derived. the number of distinct semcons determines h* (minimum head count).
- jet cost: O(|E| · h*)
jet 3: compile_weights
compile_weights(E*, {A_s}, L*, d*) → TransformerWeights
input: embedding matrix, per-semcon adjacencies, layer count, dim
output: complete compiled transformer weight set
composition of jets 0-2 plus Ten path co-occurrence statistics up to depth L*. layer count L* = diam(G) · ⌈log(1/ε)/log(1/κ)⌉ from the collective focus theorem.
- jet cost: O(|E| · d* · L*)
jet 4: hull_attention — the one new primitive
hull_attention(q, hull_cache) → (value, updated_cache)
input: 2D query vector, convex hull KV-cache
output: max-dot-product value, updated cache with new key inserted
the core inference acceleration and the only genuinely new primitive. implements hard-max attention via convex hull supporting-point query in Ren's domain: G(2,0,0) — 2D Euclidean geometric algebra.
given direction q ∈ R², find the key on the convex hull that maximizes q · k. this is a supporting hyperplane query — a standard operation in computational geometry, native to Ren's Clifford algebra.
- pure equivalent: O(n) linear scan over all cached keys
- jet cost: O(log n) per query via convex hull binary search
- stark constraints: O(log n) — hull membership proof
- GFP primitive: fma (same as Ren:geometric_product)
- accelerates: every decoding step of the compiled transformer. on million-token traces: 200× speedup (demonstrated by Percepta)
cache maintenance: incremental convex hull update on key insertion — amortized O(log n).
extension: k-sparse softmax via nested convex hulls. retrieve top-k keys, softmax over those k. cost: O(k + log n). this bridges hard-max (k=1, pure execution) and full softmax (k=n, standard attention).
why 2D is sufficient: any 1D lookup (retrieve value at index i) can be encoded as a 2D max-dot-product query. keys k_j = (2j, -j²), query q = (i, 1): the unique maximizer is j = i. this embeds integer indexing into 2D geometry — Ren's domain. higher-dimensional heads (3D hulls) give O(log² n) but may be unnecessary.
jet 5: tri_step
tri_step(φ, A_local, λ_d, λ_s, λ_h, τ) → φ'
input: current focus vector (local), local adjacency, kernel weights, temperature
output: updated focus vector after one composite tri-kernel step
language decomposition: three Ten sparse matrix-vector products (diffusion, springs, heat) + Ten simplex projection. this is the existing "Arc: rank(g, steps) → matmul jet → fma" extended to the full tri-kernel composite.
$$φ' = \text{norm}[λ_d · D(φ) + λ_s · S(φ) + λ_h · H_τ(φ)]$$
operates on the local h-hop neighborhood only (locality theorem T4).
- jet cost: O(|E_local|)
- stark constraints: O(|E_local|)
jet 6: reconverge
reconverge(π_current, Δlinks, bbg_root, ε) → (π_updated, π_Δ, proof)
input: current focus, new cyberlinks, state root, precision target
output: updated focus, sparse delta, STARK proof of correctness
language decomposition: tri_step^k (Ten) until convergence + Tok conservation verification (Σπ = 1) + Tri STARK proof generation. this is the self-minting operation: a neuron creates cyberlinks, proves Δπ, and mints $CYB proportional to the proven shift.
- jet cost: O(|E_local| · log(1/ε) / log(1/κ))
- the proof IS the mining
the compilation loop
graph state (bbg)
↓ sparse_svd (jet 0: Ten ∘ Arc)
↓ spectral_rank (jet 1: Bel ∘ Ten)
↓ semcon_partition (jet 2: Arc)
↓ compile_weights (jet 3: Ten ∘ Arc)
compiled transformer
↓ hull_attention (jet 4: Ren) × L* layers
↓ tri_step (jet 5: Ten × 3) per layer
fast inference response
↓ new cyberlinks from inference
↓ reconverge (jet 6: Ten + Tok + Tri)
updated π*, proof, reward
→ back to graph state
the loop is self-improving: every cyberlink added increases |E|, raises d*, may shrink diam(G) — producing a structurally better compiled model at next compilation. the cybergraph is a compounding inference quality asset.
relationship to existing jets and languages
jet groups
| jet group | count | target | languages used |
|---|---|---|---|
| verifier jets (recursive-jets) | 5 | proof composition | Tri |
| binary jets (binary-jets) | 8 | Bt prover | Bt |
| transformer jets (this proposal) | 7 | compiled inference | Ten, Arc, Bel, Ren, Tok, Tri |
| total | 20 | complete acceleration stack |
verifier jets are pure Tri. binary jets are pure Bt. transformer jets are the first cross-language composite jets — they compose six of the fourteen proof languages. this validates the cyb/languages architecture: the languages are independently irreducible, but their compositions produce the complex operations needed for intelligence.
new Ren operations needed
hull_attention requires one new operation in Ren:
Ren operation nox composition jet GFP primitive
───────────────── ────────────────────────── ────────── ────────────
geometric_product mul/add over components geo_mul jet fma
hull_supporting_point convex hull binary search hull jet fma + cmp
hull_insert incremental hull update hull_upd jet fma + cmp
these should be added to the Ren language spec as native operations in G(2,0,0).
why this validates the architecture
Percepta built a WASM interpreter inside transformer weights to make LLMs compute. they needed a new architecture (2D heads, custom KV-cache, execution trace encoding).
we need none of that. the existing language set already covers every algebraic domain their construction requires. their "programs into weights" vision is our §6.6 compile_weights — already specified. their "exponentially fast attention" is one new Ren primitive. their "execution traces" are our append-only cybergraph (axiom A3).
the languages doc states: "Add any plausible new language — say, a concurrent process calculus or an optimization language — and it turns out to reduce to a composition of existing ones via Nox." this proposal confirms it: a compiled transformer inference engine — the most complex composite operation we've specified — reduces to compositions of Ten, Arc, Bel, Ren, Tok, and Tri. no new language needed.
open questions
- hull_attention in higher dimensions: 2D hulls give O(log n). 3D hulls give O(log² n). is 2D sufficient for all tri-kernel diffusion steps, or do some semcon heads benefit from 3D?
- Bel readiness: spectral_rank uses Bel (entropy on simplex). Bel is currently "research horizon." should this jet accelerate Bel's move to engineering-ready?
- compilation frequency: recompile per-epoch (slow, high quality) or incremental (fast, approximate)?
- training residual: after compilation, what does fine-tuning learn that the graph cannot encode? quantifying this gap determines the value of the compiled path vs pure focus flow