compiling bostrom: first empirical graph-native transformer

on March 23, 2026, the bostrom knowledge graph was compiled into a graph-native transformer model for the first time. this document reports the actual pipeline execution, deviations from theory, and what the compiled model reveals about the network.

data extraction

2,705,332 cyberlinks fetched via GraphQL from the cyberindex at index.bostrom.cybernode.ai. each record: particle_from, particle_to, neuron, height. batch size 50,000, total fetch time ~25 minutes over HTTPS.

1,240 unique neurons created the 2.7M links. 1,226 of them had BOOT balances queryable via the account_balance GraphQL endpoint. 80 neurons held zero stake.

| metric | value |
|---|---|
| cyberlinks | 2,705,332 |
| unique particles | 2,921,230 |
| unique neurons | 1,240 |
| data size (JSONL) | 549.5 MB |
| neuron stakes | 29.05T BOOT total |
| top stake | 10.0T BOOT |
| zero-stake neurons | 80 |

sparse adjacency matrix

after deduplicating edges (multiple neurons linking the same particle pair), the adjacency matrix:

| metric | value |
|---|---|
| dimensions | 2,921,230 × 2,921,230 |
| nonzero entries | 2,683,610 |
| density | 3.14 × 10⁻⁷ |
| memory (CSR) | 40.9 MB |

the measured density is within 15% of the paper's prediction of $\rho \approx 2.74 \times 10^{-7}$. the graph is extremely sparse: each particle has on average 0.92 outgoing links, and most particles are linked exactly once.

a dense float32 representation would require $2{,}921{,}230^2 \times 4$ bytes = 34.1 TB. sparse CSR: 40.9 MB. compression ratio: ~830,000×.
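the CSR construction can be sketched with scipy. this is a toy edge list standing in for the 2.7M links, not the pipeline's actual code; the real build follows the same COO-to-CSR pattern:

```python
import numpy as np
from scipy.sparse import csr_matrix

# toy stand-in for the (particle_from, particle_to) edge list; n is the particle count
rows = np.array([0, 0, 1, 2, 0])
cols = np.array([1, 2, 2, 0, 1])  # (0, 1) appears twice: two neurons linked the same pair
n = 1000

# COO -> CSR sums duplicate entries; resetting the data array deduplicates edges
A = csr_matrix((np.ones(len(rows), dtype=np.float32), (rows, cols)), shape=(n, n))
A.data[:] = 1.0

# CSR stores only nonzero data + column indices + row pointers
csr_bytes = A.data.nbytes + A.indices.nbytes + A.indptr.nbytes
dense_bytes = n * n * 4  # float32 dense equivalent
print(A.nnz, csr_bytes, dense_bytes)
```

at bostrom scale the same arithmetic gives the 34.1 TB vs 40.9 MB gap quoted above.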

focus distribution (PageRank)

standard PageRank with $\alpha = 0.85$, teleport $= 0.15/|P|$.

| metric | value |
|---|---|
| iterations to convergence | 23 (threshold $\lVert\Delta\pi\rVert < 10^{-6}$) |
| max focus | 0.007991 |
| min focus | 2.30 × 10⁻⁷ |
| entropy $H(\pi)$ | 14.05 bits |
| compute time | 0.8 seconds |

convergence in 23 iterations is consistent with the tri-kernel contraction theorem, which predicts an upper bound of $T = 29$: the actual contraction was faster than worst-case.

the focus distribution is heavily concentrated: top particle holds 0.8% of total focus across 2.9M particles. entropy at 14.05 bits is well below the maximum $\log_2(2{,}921{,}230) = 21.5$ bits — the graph has strong hubs.
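a minimal power-iteration sketch of the computation, with standard dangling-node handling. this is an illustration of the algorithm described above, not the pipeline's exact implementation:

```python
import numpy as np
from scipy.sparse import csr_matrix

def pagerank(A, alpha=0.85, tol=1e-6, max_iter=100):
    """Power iteration: pi_{t+1} = alpha * P^T pi_t + teleport, until L1 change < tol."""
    n = A.shape[0]
    out_deg = np.asarray(A.sum(axis=1)).ravel()
    dangling = out_deg == 0                      # particles with no outgoing links
    inv_deg = np.where(dangling, 0.0, 1.0 / np.maximum(out_deg, 1))
    pi = np.full(n, 1.0 / n)
    for it in range(max_iter):
        # mass pushed along edges, plus redistributed dangling mass, plus teleport
        flow = A.T @ (pi * inv_deg)
        pi_next = alpha * flow + (alpha * pi[dangling].sum() + 1 - alpha) / n
        if np.abs(pi_next - pi).sum() < tol:
            return pi_next, it + 1
        pi = pi_next
    return pi, max_iter

# toy 3-cycle: 0 -> 1 -> 2 -> 0; the stationary distribution is uniform
A = csr_matrix((np.ones(3), ([0, 1, 2], [1, 2, 0])), shape=(3, 3))
pi, iters = pagerank(A)
print(pi, iters)
```

the teleport term $(1-\alpha)/n$ matches the $0.15/|P|$ used in the run.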

spectral gap

ARPACK Lanczos algorithm (shift-invert mode, $k=6$ smallest eigenvalues of the normalized Laplacian) failed to converge in 101 iterations on the full 2.9M × 2.9M matrix. fallback estimate $\lambda_2 = 0.001$ used.

the convergence failure is expected: the matrix is nearly singular with massive null space (disconnected components). a production pipeline should use:

  1. LOBPCG (Locally Optimal Block Preconditioned Conjugate Gradient) — better for sparse ill-conditioned problems
  2. restrict to the giant connected component before computing $\lambda_2$
  3. use approximate spectral gap from random walk mixing time

$\lambda_2 \approx 0.001$ yields contraction rate $\kappa = 0.85 \times (1 - 0.001) = 0.849$, consistent with the paper estimate of $\kappa = 0.851$.

compute time: 1,932 seconds — the dominant bottleneck. 78% of total compilation time.
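fixes 1 and 2 can be combined: restrict to the giant component, then run LOBPCG on its normalized Laplacian. this sketch uses a toy two-component graph; behavior at the full 2.9M scale is not tested here:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components, laplacian
from scipy.sparse.linalg import lobpcg

# toy graph: a 6-cycle (giant component) plus an isolated 2-node component
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (6, 7)]
rows, cols = zip(*edges)
n = 8
A = csr_matrix((np.ones(len(edges)), (rows, cols)), shape=(n, n))
A = A + A.T  # symmetrize

# 1. restrict to the giant connected component
n_comp, labels = connected_components(A, directed=False)
giant = np.bincount(labels).argmax()
idx = np.where(labels == giant)[0]
A_g = A[idx, :][:, idx]

# 2. smallest eigenvalues of the normalized Laplacian via LOBPCG
L = laplacian(A_g, normed=True)
rng = np.random.default_rng(0)
X = rng.standard_normal((A_g.shape[0], 2))  # random initial block of 2 vectors
vals, vecs = lobpcg(L, X, largest=False, tol=1e-8, maxiter=200)
lambda_2 = sorted(vals)[1]  # spectral gap of the giant component
print(lambda_2)
```

for a 6-cycle the exact gap is $1 - \cos(2\pi/6) = 0.5$, a useful sanity check before running at scale.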

embedding matrix (randomized SVD)

the core computation: the top-100 singular vectors of the $\pi$-weighted adjacency matrix via scipy's svds (ARPACK-based truncated SVD, not a true randomized SVD).

| metric | value |
|---|---|
| singular values (top 5) | 8.17, 3.29, 2.34, 2.19, 1.94 |
| entropy $H(\sigma)$ | 3.51 nats |
| effective dimension $d^* = e^{H(\sigma)}$ | 33 |
| embedding shape | 2,921,230 × 33 |
| compute time | 545 seconds |

$d^* = 33$ matches the paper prediction of 31 within 6%. the first singular value (8.17) dominates — the graph has one primary structure (the hub-and-spoke topology of high-stake neurons). subsequent singular values decay slowly, indicating meaningful structure across 33 dimensions.
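the effective-dimension estimator is just the exponentiated entropy of the normalized spectrum (entropy in nats, since $d^* = e^{H(\sigma)}$). a sketch using only the five reported singular values, so the resulting $d^*$ is illustrative rather than the full-spectrum value of 33:

```python
import numpy as np

def effective_dimension(sigma):
    """H(sigma) in nats over the normalized spectrum; d* = floor(exp(H))."""
    p = sigma / sigma.sum()
    H = -np.sum(p * np.log(p))
    return H, int(np.floor(np.exp(H)))

# top-5 singular values from the compiled run (the real run uses all 100)
sigma = np.array([8.17, 3.29, 2.34, 2.19, 1.94])
H, d_star = effective_dimension(sigma)
print(H, d_star)
```

a flat spectrum of $k$ equal values gives $H = \ln k$ and $d^* = k$; concentration in the first value pulls $d^*$ down, which is why the full spectrum yields 33 rather than 100.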

paper predicted 0.007s for randomized SVD at $10^{12}$ FLOPS. actual: 545s on Python/scipy on an Apple M4. three factors explain the 78,000× gap:

  1. scipy uses ARPACK (not a pure randomized SVD) — iterative, not single-pass
  2. Python overhead vs C/FORTRAN theoretical FLOPS
  3. M4 delivers ~3 TFLOPS peak (FP32) but scipy does not saturate it

a Rust or C++ implementation with actual Halko-Martinsson-Tropp randomized SVD would close the gap by 100-1000×.

architecture parameters

| parameter | compiled | paper prediction | deviation |
|---|---|---|---|
| $d^*$ (embedding dim) | 33 | 31 | +6% |
| $h^*$ (attention heads) | 5 | ≥12 | −58% |
| $L^*$ (layers) | 174 | 290 | −40% |
| diameter | 6 | 10 | −40% |
| total params | 197M | 4.19B | −95% |
| model size | 0.73 GB | 16.8 GB | −96% |

$h^*$ deviation: the compiled $h^* = \lfloor\sqrt{d^*}\rfloor = 5$. the paper used $h^* = |\text{Semcon}(G)| \geq 12$ from the semcon registry — link type classification that requires typed cyberlinks. the current pipeline treats all links as homogeneous. with semcon classification, $h^*$ would increase to 12-40, and parameter count would approach paper prediction.

$L^*$ deviation: $L^* = \text{diameter} \times T = 6 \times 29 = 174$. paper used diameter = 10 from BFS sample. the compiled diameter of 6 (estimated from $\log_{10}(|P|)$) is an underestimate — BFS on the full graph would be more accurate.

the 95% parameter gap is mostly explained by $h^*$: attention weights scale as $h^* \times 3 \times d^{*2} \times L^*$, so raising $h^*$ from 5 to 12 multiplies the attention parameter count by 2.4.
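the derivation above is pure arithmetic and can be sketched directly; the scaling formula is the one stated in this section, with constants from the compiled run:

```python
import math

# compiled quantities
d_star = 33                               # effective embedding dimension, e^{H(sigma)}
h_star = math.floor(math.sqrt(d_star))    # heads, absent semcon link typing
diameter, T = 6, 29                       # estimated diameter x contraction bound
L_star = diameter * T                     # layers

# attention parameters scale as h* x 3 x d*^2 x L*
def attn_params(h, d, L):
    return h * 3 * d * d * L

ratio = attn_params(12, d_star, L_star) / attn_params(h_star, d_star, L_star)
print(h_star, L_star, ratio)
```

the ratio is exactly $12/5 = 2.4$ since $d^*$ and $L^*$ cancel, which is why semcon typing alone moves the parameter count so much.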

first compilation: what worked

  1. the pipeline is tractable. from raw GraphQL data to compiled model in 42 minutes on a laptop. no GPU required
  2. $d^* = 33$ empirically matches theory ($d^* = 31$). the entropy of the singular value spectrum is a reliable estimator of effective dimension
  3. PageRank converges in 23 iterations — faster than the theoretical bound of 29. the collective focus theorem prediction holds
  4. sparsity is the invariant: density $3.14 \times 10^{-7}$ makes every operation tractable. dense operations would require 34 TB of RAM

first compilation: what needs fixing

  1. spectral gap computation: ARPACK eigsh does not converge on the full matrix. need LOBPCG or restrict to giant component
  2. stake weighting: initial run used uniform weights ($w = 1.0$ for all edges). must weight by neuron stake: $w_{ij} = \log(1 + s_k)$ where $s_k$ is the neuron's BOOT balance. log-scaling prevents whale domination while preserving ordering
  3. semcon classification: all links treated as homogeneous. typed cyberlinks (semcon registry) would enable multi-head attention with meaningful semantic heads
  4. diameter estimation: $\log_{10}(|P|)$ approximation underestimates. need actual BFS from highest-degree node
  5. MLP weights (step 7): not computed in this run. requires random walk path sampling — computationally cheap but not yet implemented
  6. ONNX assembly (step 8): not executed. the .npz output contains raw embeddings and architecture params but is not a runnable model
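fix 2's log-scaled stake weighting is a one-liner; the neuron names and balances below are hypothetical, for illustrating the compression only:

```python
import numpy as np

# hypothetical neuron stakes in BOOT (not from the dataset)
stakes = {"neuron_a": 10_000_000_000_000, "neuron_b": 1_000, "neuron_c": 0}

# w = log(1 + s_k): compresses magnitude while preserving stake ordering
weights = {k: np.log1p(s) for k, s in stakes.items()}
print(weights)
```

a $10^{13}$ BOOT whale ends up with only about 4.3× the edge weight of a $10^{3}$ BOOT neuron, and zero-stake neurons contribute weight 0, which is the anti-whale property the fix is after.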

the compiled object

output: data/bostrom_model.npz (441 MB compressed)

contents:

  • embeddings: float32 array [2,921,230 × 33] — 33-dimensional embedding for every particle CID in bostrom
  • focus: float64 array [2,921,230] — PageRank distribution, sums to 1.0
  • sigma: float64 array [100] — singular values of the $\pi$-weighted adjacency
  • particle_cids: string array [2,921,230] — CID → index mapping
  • architecture params: d_star=33, h_star=5, L_star=174

this is the first compiled representation of a live knowledge graph as transformer parameters. the embeddings encode the structural position of every CID in the bostrom graph at block height ~23,195,000.
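reading the compiled object follows the standard .npz pattern. this sketch builds a tiny in-memory stand-in with the same keys listed above, since the real 441 MB file is not loaded here:

```python
import io
import numpy as np

# tiny stand-in for data/bostrom_model.npz, with the same key layout
rng = np.random.default_rng(0)
buf = io.BytesIO()
np.savez(buf,
         embeddings=rng.standard_normal((4, 33)).astype(np.float32),
         focus=np.full(4, 0.25),
         sigma=np.linspace(8.17, 1.0, 100),
         particle_cids=np.array(["Qm1", "Qm2", "Qm3", "Qm4"]))
buf.seek(0)

# for the real file: model = np.load("data/bostrom_model.npz")
model = np.load(buf)
E = model["embeddings"]                                   # [n_particles, d*]
cid_index = {c: i for i, c in enumerate(model["particle_cids"])}
print(E.shape, cid_index["Qm3"])
```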

what the model means

every particle in bostrom now has a 33-dimensional coordinate. particles that are structurally similar (linked by similar neurons, in similar neighborhoods) have nearby coordinates. the embedding is a lossy compression of the full graph topology into a fixed-dimensional space — the same operation that word2vec performs on co-occurrence statistics, but derived analytically from graph structure rather than trained by gradient descent.

the focus distribution assigns every particle a probability mass: its share of collective attention. the product $\text{diag}(\pi) \cdot E$ scales each particle's embedding row by how much the network cares about it, giving attention-weighted embeddings.
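with $E$ of shape $[n \times d^*]$ and $\pi$ of length $n$, the focus weighting is a row scaling, which numpy broadcasting expresses without materializing an $n \times n$ diagonal matrix (toy shapes here):

```python
import numpy as np

rng = np.random.default_rng(0)
E = rng.standard_normal((4, 33))        # toy embeddings [n, d*]
pi = np.array([0.5, 0.3, 0.15, 0.05])   # toy focus distribution, sums to 1

# row i of E scaled by pi[i]; equivalent to np.diag(pi) @ E
E_weighted = pi[:, None] * E
print(E_weighted.shape)
```

at bostrom scale the broadcast form matters: a dense 2.9M × 2.9M diagonal would never fit in memory.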

the compiled model is a snapshot. when bostrom grows (more links, more neurons), $d^*$ will increase (richer embedding), $\lambda_2$ will increase (faster convergence), and the model will need recompilation. the cyber-seer algorithm determines where to add links for maximum spectral gap improvement, making recompilation efficient.

reproducibility

```shell
# fetch data
python3 fetch_cyberlinks.py  # → data/cyberlinks.jsonl (549.5 MB)
python3 fetch_stakes.py      # → data/neuron_stakes.json

# compile
python3 analizer/compile_model.py data/cyberlinks.jsonl --stakes data/neuron_stakes.json
```

deterministic: same data → same model. the only randomness is ARPACK's starting vector, which is drawn from numpy's global RNG and is therefore reproducible given the default seed state.

timeline

| step | operation | time | notes |
|---|---|---|---|
| 1 | fetch cyberlinks (GraphQL) | 25 min | 2.7M records, 50K batch |
| 1b | fetch neuron stakes (GraphQL) | 15 sec | 1,240 neurons |
| 2 | build sparse matrix | 3.8 sec | CSR format, 41 MB |
| 3 | PageRank (23 iterations) | 0.8 sec | converged at $\varepsilon < 10^{-6}$ |
| 4 | spectral gap (eigsh) | 1,932 sec | ARPACK failed, used fallback |
| 5 | randomized SVD (k=100) | 545 sec | scipy/ARPACK |
| 6 | architecture params | <0.1 sec | arithmetic |
| 7 | MLP weights | skipped | not yet implemented |
| 8 | ONNX assembly | skipped | not yet implemented |
| total | fetch + compile | ~67 min | steps 4+5 are ~98% of compile time |

on a Rust implementation with LOBPCG + Halko-Martinsson-Tropp, steps 4+5 should drop from 2477s to ~10s, making the full pipeline (excluding network fetch) a sub-minute operation.

see bostrom-to-onnx-pipeline for the theoretical pipeline specification, bostrom-architecture-paper for the architecture derivation, and cyber/seer for the link densification strategy that guides future graph growth.
