PageRank iterations: 23 (converged at ε < 10⁻⁶)
diff sequence (last 5):
  d₁₉ = 4.83e-05
  d₂₀ = 3.57e-05
  d₂₁ = 2.64e-05
  d₂₂ = 1.96e-05
  d₂₃ = 7.91e-07 (below threshold, stopped)
contraction ratios:
  r₂₀ = 0.739
  r₂₁ = 0.740
  r₂₂ = 0.742
κ̂ = median = 0.740
λ̂₂ = 1 - 0.740/0.85 = 0.129
this differs sharply from the paper's estimate of $\lambda_2 \approx 0.0015$, which was computed from a 50,000-link sample using the Lanczos algorithm. the full-graph observation yields $\lambda_2 \approx 0.13$, roughly 87 times larger (nearly two orders of magnitude).
## what this means
a spectral gap of 0.13 instead of 0.0015 changes every downstream parameter:
| parameter | paper ($\lambda_2 = 0.0015$) | observed ($\lambda_2 = 0.13$) |
|---|---|---|
| contraction $\kappa$ | 0.851 | 0.740 |
| convergence iterations | 29 | 17 |
| foculus finality | slow | fast |
| $L^*$ (transformer layers) | 290 | 102 |
| model size (at $h^*=12$) | 16.8 GB | 5.9 GB |
the network converges faster than predicted. the compiled transformer needs fewer layers. the model is smaller and more efficient.
## why the sample underestimated
the 50,000-link sample used in the architecture paper was not a random sample of the full graph. it was a contiguous subset, likely the earliest links, which form a denser subgraph than the full network. the spectral gap of a subgraph need not equal the spectral gap of the full graph, so an estimate from such a sample does not transfer.
more importantly, the sample was analyzed with the Lanczos algorithm, which failed to converge there as well; on the smaller matrix it at least returned an estimate before timing out, but that estimate came from an unconverged run. the convergence rate method applied to the same sample would likely have produced a more accurate estimate.
## the deeper point
the spectral gap is not a number to compute. it is a behavior to observe.
every system that converges has a convergence rate. that rate fixes the spectral gap: $\kappa = \alpha(1 - \lambda_2)$, so knowing one gives the other. computing eigenvalues to find the convergence rate is like measuring the speed of a car by disassembling the engine and calculating the theoretical RPM. you could just watch the car and measure how fast it goes.
the cybergraph converges every block. the contraction rate $\kappa$ is observable in production, continuously, at zero cost. no eigensolver needed. no matrix decomposition. just the ratio of successive focus updates.
this extends beyond compilation:
- foculus validators can monitor $\kappa$ in real-time as a health metric
- the cyber-seer densification algorithm targets $\lambda_2$ improvement — now measurable without Lanczos
- bostrom block explorers can display the live spectral gap as a network vital sign
- the tri-kernel contraction theorem can be verified empirically every block
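a minimal sketch of what such a monitor could look like, assuming the validator can see the focus vector after each update. the class name `ContractionMonitor`, its `update` method, and the choice of the L1 norm for $d_t$ are illustrative assumptions, not part of any foculus or bostrom API.

```python
from collections import deque
from statistics import median


class ContractionMonitor:
    """Tracks kappa as the median ratio of successive focus diffs.

    Hypothetical helper for illustration; alpha is the damping factor.
    """

    def __init__(self, alpha=0.85, window=5):
        self.alpha = alpha
        self.ratios = deque(maxlen=window)  # keeps only the most recent ratios
        self._prev_focus = None
        self._prev_diff = None

    def update(self, focus):
        """Feed the latest focus vector; return the current kappa estimate (or None)."""
        if self._prev_focus is not None:
            # assumption: d_t is the L1 distance between successive focus vectors
            diff = sum(abs(a - b) for a, b in zip(focus, self._prev_focus))
            if self._prev_diff:  # skips None and exact zero
                self.ratios.append(diff / self._prev_diff)
            self._prev_diff = diff
        self._prev_focus = list(focus)
        return self.kappa

    @property
    def kappa(self):
        return median(self.ratios) if self.ratios else None

    @property
    def spectral_gap(self):
        k = self.kappa
        return None if k is None else 1.0 - k / self.alpha


# synthetic check: an error term that shrinks by 0.74 per step
monitor = ContractionMonitor()
stationary = [0.5, 0.3, 0.2]
for t in range(8):
    err = 0.1 * 0.74 ** t
    monitor.update([stationary[0] + err, stationary[1] - err, stationary[2]])
print(monitor.kappa, monitor.spectral_gap)
```

the monitor needs three focus vectors before it can report a ratio, and the bounded `deque` keeps memory constant however long it runs.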
## formalization
let $P = \alpha M^\top + (1-\alpha) \frac{1}{n} \mathbf{1}\mathbf{1}^\top$ be the PageRank operator where $M = D^{-1}A$.
the eigenvalues of $P$ are $\{1, \alpha\mu_2, \alpha\mu_3, \ldots\}$ where $\mu_i$ are the eigenvalues of $M^\top$ sorted by magnitude. the spectral gap of the normalized Laplacian relates to $\mu_2$ by $\lambda_2 = 1 - |\mu_2|$.
the convergence rate is:
$$\kappa = |\alpha\mu_2| = \alpha(1 - \lambda_2) \quad \text{(when } \mu_2 \text{ is real and positive)}$$
when $\mu_2$ is complex (possible in directed graphs), $\kappa = \alpha|\mu_2|$ and the convergence oscillates. the ratio $r_t = d_t / d_{t-1}$ then no longer settles to a single value: it oscillates around $\kappa$. the median filter damps this oscillation.
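the relation $\kappa = \alpha|\mu_2|$ can be checked on a toy graph where $\mu_2$ is known in closed form. the sketch below uses the complete graph $K_5$, whose random-walk matrix has every non-principal eigenvalue equal to $-1/(n-1)$, so the expected ratio is $0.85 \times 0.25 = 0.2125$. the function name `pagerank_diffs`, the L1 norm, and the assumption of no dangling nodes are choices made for this illustration.

```python
def pagerank_diffs(adj, alpha=0.85, start=None, tol=1e-6, max_iter=100):
    """Power iteration x <- alpha * M^T x + (1 - alpha)/n, with M = D^-1 A.

    Returns the final vector and the list of L1 diffs between iterates.
    Assumes every node has at least one outgoing link (no dangling nodes).
    """
    n = len(adj)
    out_deg = [sum(row) for row in adj]
    x = list(start) if start is not None else [1.0 / n] * n
    diffs = []
    for _ in range(max_iter):
        x_new = [(1.0 - alpha) / n] * n  # teleportation term
        for i in range(n):
            for j in range(n):
                if adj[i][j]:
                    x_new[j] += alpha * x[i] * adj[i][j] / out_deg[i]
        d = sum(abs(a - b) for a, b in zip(x_new, x))
        diffs.append(d)
        x = x_new
        if d < tol:
            break
    return x, diffs


n = 5
adj = [[0 if i == j else 1 for j in range(n)] for i in range(n)]  # K5, no self-loops
start = [0.6, 0.1, 0.1, 0.1, 0.1]  # non-uniform, so there is an error to contract
focus, diffs = pagerank_diffs(adj, start=start)

ratios = [curr / prev for prev, curr in zip(diffs, diffs[1:])]
kappa_expected = 0.85 * (1.0 / (n - 1))  # alpha * |mu_2|
print(ratios[-1], kappa_expected)
```

on $K_5$ every ratio equals $\kappa$ from the first step because all non-principal eigenvalues coincide. on a real graph the early ratios mix several eigenvalues and only the later ones approach $\alpha|\mu_2|$.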
for the cybergraph (directed, weighted by stake):
$$\hat\kappa = \text{median}\left(\frac{d_t}{d_{t-1}}\right)_{t \in [T-k, T]} \quad \text{where } k = 5$$
$$\hat\lambda_2 = 1 - \frac{\hat\kappa}{\alpha}$$
the estimate improves with more iterations. at $T = 23$ iterations (bostrom), the ratios $r_{20}$ to $r_{22}$ agree to within 0.003, about two significant figures.
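the estimator above fits in a few lines. this is a sketch under stated assumptions: the function name `estimate_spectral_gap` is hypothetical, the window takes at most $k + 1$ ratios (reading $[T-k, T]$ as inclusive), and the input is the logged diff sequence with the final below-threshold step left out, as in the log.

```python
from statistics import median


def estimate_spectral_gap(diffs, alpha=0.85, k=5):
    """Return (kappa_hat, lambda2_hat) from a sequence of successive diffs d_t."""
    if len(diffs) < 2:
        raise ValueError("need at least two diffs to form a ratio")
    # r_t = d_t / d_{t-1}; skip any step whose previous diff is zero
    ratios = [curr / prev for prev, curr in zip(diffs, diffs[1:]) if prev > 0]
    if not ratios:
        raise ValueError("no usable ratios")
    window = ratios[-(k + 1):]  # most recent ratios, t in [T-k, T]
    kappa_hat = median(window)
    lambda2_hat = 1.0 - kappa_hat / alpha
    return kappa_hat, lambda2_hat


# d19 .. d22 from the bostrom log (values as printed, three significant figures)
logged = [4.83e-05, 3.57e-05, 2.64e-05, 1.96e-05]
kappa_hat, lambda2_hat = estimate_spectral_gap(logged)
print(kappa_hat, lambda2_hat)  # roughly 0.74 and 0.13
```

because the logged diffs are rounded, the third digit of the result can differ slightly from the figures in the log.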
## the observation principle
eigensolvers compute $\lambda_2$ from the structure of the matrix. the convergence rate method observes $\lambda_2$ from the behavior of the system. the distinction is between analysis and measurement.
for a live network, measurement is superior:
- it works at any scale (no matrix size limit)
- it accounts for the actual graph structure (not an approximation)
- it runs continuously (not as a batch computation)
- it costs nothing (the computation is already happening)
the spectral gap of bostrom is not a number in a paper. it is a heartbeat — observable, live, continuous. every block that computes focus also computes the spectral gap, whether anyone is watching or not.
see spectral gap for the mathematical definition. see tri-kernel architecture for how $\lambda_2$ determines the composite contraction. see seer for how $\lambda_2$ guides link densification. see bostrom-to-onnx-pipeline for the full compilation pipeline. see bostrom compilation report for the empirical compilation results.