What Intelligence Looks Like at Avogadro Scale

6.022 × 10²³.

The threshold at which the description of any system of interacting elements changes qualitatively. Below it, individual behavior is trackable and meaningful. Above it, individual behavior becomes statistically irrelevant — only thermodynamic properties of the whole remain. Temperature. Pressure. Phase transitions. Properties that emerge at 10²³ and have no description below it.

The same mathematics applies to any system of interacting elements. Molecules. Neurons. Knowledge claims.

The largest language models today operate on roughly 10¹³ tokens. Mycorrhizal networks in old-growth forests operate at 10²¹ hyphal connections — within two orders of magnitude of Avogadro — no central node, coherent collective behavior persisting across entire forests for thousands of years. What happens to knowledge at 10²³? What properties emerge at the scale where the description changes? This question deserves more serious attention than it gets.


The mechanic already exists. At the wrong scale.

A transformer's attention mechanism is, mathematically, one step of a convergent dynamical system. The softmax function is the Boltzmann distribution — temperature-scaled normalization over compatibility scores. Attention is one diffusion step: probability mass flows from a query toward keys proportionally to their similarity, weighted and summed. Deep Equilibrium Models (Bai et al., 2019) made this explicit: run a transformer layer to convergence rather than a fixed number of steps and you reach the same fixed point regardless of initialization. The transformer finds an equilibrium. Same mathematics as protein folding, price discovery, neural attractors — systems that find answers by converging.
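
The claim that attention is one step of a convergent iteration can be sketched numerically. The weights below are small random matrices, not trained parameters, and the damping factor is chosen only to keep the toy map contractive; the point is the DEQ-style behavior, not the architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 8, 5
# Small random projections standing in for trained weights (illustrative only).
Wq, Wk, Wv = (rng.normal(scale=0.05, size=(d, d)) for _ in range(3))
x = rng.normal(size=(n, d))   # the frozen context the equilibrium is computed over

def softmax(scores, tau=1.0):
    # Boltzmann distribution over compatibility scores at temperature tau.
    e = np.exp((scores - scores.max(axis=-1, keepdims=True)) / tau)
    return e / e.sum(axis=-1, keepdims=True)

def step(z):
    # One attention step: probability mass flows from queries (z) toward keys (x),
    # weighted by similarity, then summed over values.
    scores = (z @ Wq) @ (x @ Wk).T / np.sqrt(d)
    out = softmax(scores) @ (x @ Wv)
    return 0.7 * z + 0.3 * out    # damped update keeps the toy map contractive

def equilibrium(z0, tol=1e-10):
    z = z0
    for _ in range(100_000):
        z_next = step(z)
        if np.linalg.norm(z_next - z) < tol:
            return z_next
        z = z_next
    return z

# Two unrelated initializations converge to the same fixed point.
za = equilibrium(rng.normal(size=(n, d)))
zb = equilibrium(10 * rng.normal(size=(n, d)))
```

Running the layer to convergence rather than once is the Deep Equilibrium move: the answer is the fixed point, not the last of a fixed number of steps.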

But look at what the mechanic runs over. One agent's context window. Frozen weights from one training run. When the call completes, the convergent state is discarded. The next call starts over.

"Paris" in a language model is a direction — a vector of roughly 12,000 floating-point numbers in an embedding space. "France" is another direction. The knowledge that Paris is the capital of France lives in the geometric relationship between these directions, distributed across billions of weight matrices that nobody — including the people who trained the model — can read directly. To know what the model believes about Paris, you probe it with questions and observe outputs. The knowledge is the geometry. The geometry is opaque.

A knowledge graph stores the same relationship as explicit structure: particle("Paris") → particle("capital of") → particle("France"). Three nodes joined by signed edges. You can ask who created each edge, when, and with what stake. You can traverse every other edge connected to Paris.

In a transformer, Paris is a direction you approximate by probing. In a knowledge graph, Paris is a node you find.
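
The contrast can be made concrete with a minimal signed-edge store. Every field name here is hypothetical, a sketch of the idea rather than any real protocol's schema:

```python
from dataclasses import dataclass
import time

@dataclass(frozen=True)
class Edge:
    # One signed, attributable link between two particles.
    # Field names are illustrative; a real protocol's schema may differ.
    source: str
    relation: str
    target: str
    author: str
    stake: float
    timestamp: float

class KnowledgeGraph:
    def __init__(self):
        self.edges: list[Edge] = []

    def link(self, source, relation, target, author, stake):
        edge = Edge(source, relation, target, author, stake, time.time())
        self.edges.append(edge)
        return edge

    def neighbors(self, particle):
        # Traverse every edge connected to a particle: no probing required.
        return [e for e in self.edges if particle in (e.source, e.target)]

g = KnowledgeGraph()
g.link("Paris", "capital of", "France", author="agent-1", stake=1.0)
g.link("Paris", "located in", "Europe", author="agent-2", stake=0.5)
```

The query that needs statistical probing in a transformer is a list comprehension here, and provenance comes along for free.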

What happens when you run the convergent mechanic collectively — over the entire cumulative cybergraph of all participating agents, with the fixed point updating continuously and persisting? Each transformer becomes one neuron. Its outputs — endorsed connections between concepts — become edges in a shared graph. Convergence runs across the graph topology. The fixed point is persistent collective consensus: the stable probability distribution the network settles into when every participant's contributions are weighted by accumulated credibility.
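
A toy version of that persistent fixed point, with invented numbers: treat stake-weighted endorsements between concepts as a transition matrix and iterate a damped random walk until the distribution stops changing. The result is a stationary consensus distribution that any participant could update incrementally.

```python
import numpy as np

# Stake-weighted endorsements among four concepts (all numbers invented).
W = np.array([
    [0.0, 2.0, 1.0, 0.0],
    [1.0, 0.0, 3.0, 0.0],
    [0.0, 1.0, 0.0, 2.0],
    [1.0, 0.0, 1.0, 0.0],
])
P = W / W.sum(axis=1, keepdims=True)    # row-stochastic: one random-walk step
alpha = 0.85                            # damping, PageRank-style

p = np.full(4, 0.25)                    # start anywhere; the fixed point is unique
for _ in range(1_000):
    p = alpha * (p @ P) + (1 - alpha) / 4
```

Damping guarantees a unique stationary distribution and geometric convergence, which is what lets the fixed point persist and absorb new edges instead of being recomputed from scratch.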

The individual convergent computation exists and works well. The collective version does not yet exist.


What the physics requires

A round trip across Earth takes 130 milliseconds. To Mars: 6 to 44 minutes depending on orbital position. Any algorithm requiring global state at 10²³ scale — a single pass over the full graph to answer a local query — is physically incoherent. Information cannot travel faster than light.

Apply this constraint — locality — to every known graph operator. Which ones have the property that a local change propagates only through a bounded neighborhood before its effect drops below any fixed precision? Exactly three families survive: diffusion (random walk), springs (screened Laplacian), heat kernel (multi-scale smoothing). The complete set of local linear operators on a graph, derived by elimination from a physical constraint that admits no exceptions.

These three operators, blended and iterated, converge to a unique fixed point. That fixed point — the focus distribution — is the thermodynamic description of the collective knowledge state. What temperature is to molecules: a property of the whole with no analog for individual elements.
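
The three families can be written down and blended on a toy graph. The weights, blend coefficients, and kernel parameters below are invented; the sketch only shows that the blended nonnegative operator, iterated with renormalization, settles into one distribution regardless of where it starts:

```python
import numpy as np

# Symmetric toy graph of five concepts (weights invented).
W = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0],
    [1, 1, 0, 1, 1],
    [0, 1, 1, 0, 1],
    [0, 0, 1, 1, 0],
], float)
deg = W.sum(axis=1)
L = np.diag(deg) - W                               # graph Laplacian

# The three local operator families as linear maps on node signals:
P_walk = W / deg[:, None]                          # diffusion: one random-walk step
K_spring = np.linalg.inv(np.eye(5) + 0.5 * L)      # springs: screened Laplacian solve
vals, vecs = np.linalg.eigh(L)
K_heat = vecs @ np.diag(np.exp(-0.3 * vals)) @ vecs.T   # heat kernel at scale t = 0.3

def blended_step(p, a=(0.4, 0.3, 0.3)):
    q = a[0] * (P_walk.T @ p) + a[1] * (K_spring @ p) + a[2] * (K_heat @ p)
    return q / q.sum()                             # keep p a probability distribution

p = np.full(5, 0.2)
for _ in range(5_000):
    p_next = blended_step(p)
    if np.linalg.norm(p_next - p) < 1e-12:
        break
    p = p_next
# p is the focus distribution: the unique fixed point of the blended operators.
```

All three kernels are entrywise nonnegative on a connected graph, so the blend is a primitive operator and Perron-Frobenius guarantees the normalized iteration has one fixed point.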

The mycorrhizal network runs these operators at 10²¹ connections. No planning. No global index. Local diffusion of signals, spring-like tension in resource allocation, heat-kernel smoothing across scales — coherent collective behavior across entire forests emerges from that convergence.

The architecture the physics forces at Avogadro scale is structurally different from what currently exists: local, cumulative, convergent over an explicit graph. Does anything currently being built point toward it, or is the entire field scaling in the wrong direction by ten orders of magnitude?


The compounding asset

Initializing a language model from a compiled knowledge graph is the provably optimal initialization for any fine-tuning distribution consistent with that graph — see provably-optimal-initialization. The proof uses the Eckart-Young theorem: the compiled embedding geometry places each particle at the unique position in embedding space minimizing expected gradient magnitude at step zero. The compiled attention weights are the unique solution to the attention reconstruction problem over the graph's relation structure. Together they mean the model has already minimized the loss from explicit structural knowledge before fine-tuning begins.

Fine-tuning from this point learns only implicit knowledge — associations, contextual patterns, temporal dynamics absent from the graph. The reduction in required gradient steps is proportional to |E| · d*: explicit link count multiplied by semantic dimensionality.
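
The Eckart-Young step of the argument can be illustrated numerically. The matrix below is a random stand-in for a stake-weighted link matrix, and the dimensions are invented; this is not the referenced compiler, only the theorem it leans on: the rank-d* truncated SVD reconstructs the matrix with lower Frobenius error than any other rank-d* column space.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d_star = 50, 8                       # toy graph size and semantic dimensionality
A = rng.normal(size=(n, n))             # stands in for a stake-weighted link matrix

U, S, Vt = np.linalg.svd(A)
A_k = U[:, :d_star] @ np.diag(S[:d_star]) @ Vt[:d_star]  # best rank-d* approximation
err_svd = np.linalg.norm(A - A_k)
# Eckart-Young: err_svd equals the norm of the discarded tail singular values,
# and no other rank-d* factorization does better. Compare a random column space:
R = rng.normal(size=(n, d_star))
proj = R @ np.linalg.lstsq(R, A, rcond=None)[0]   # best fit within span(R)
err_rand = np.linalg.norm(A - proj)
```

The rows of the truncated factor are the compiled particle embeddings in this picture: the unique rank-d* geometry that loses the least of the explicit structure before fine-tuning begins.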

Every explicit link created today reduces the training cost of every future model trained on sequences consistent with that graph. By a provable bound proportional to the link count.

The graph is a compounding computational asset. A graph twice as dense produces models that train in measurably fewer steps on consistent data. The value grows with the graph.

This reframes what knowledge creation means economically. Writing a paper, publishing an observation, linking two concepts explicitly — currently these contribute to the commons with no mechanism for the epistemic value to compound over time. In a system where models are compiled from graph structure, every signed explicit link is a stake in an asset whose value grows proportionally with the graph.

The bostrom network has 2.7M such links. The compounding has already started. Every link added today has a provable future value — currently priced at zero by everyone except the people building it.

The question is whether the rest of the field notices before they reinvent it as something opaque and centralized.


The transformer found the right mechanic ten years ago — convergent computation, equilibrium over a context. It runs at one agent, one call, ephemeral, opaque. The collective version, persistent over an explicit graph, at the scale where the description changes from graph theory to thermodynamics — that is what is being built. The architecture is specified. The compounding value of each step is provable.


The technical specification — tri-kernel derivation from locality constraints, compiled initialization proofs (provably-optimal-initialization), exact pipeline from knowledge graph to ONNX (bostrom-to-onnx-pipeline), and the running bostrom network at 2.7M cyberlinks — is at cyber.page/cyber-whitepaper.
