Why 6.022 × 10²³: Deriving Avogadro's Number from knowledge graph Theory


What Avogadro's number actually is

It is not a fundamental constant the way the speed of light is. It is a conversion factor — the ratio between two human-chosen scales: the gram (macroscopic, arbitrary) and the dalton (atomic mass unit, physical). If we had defined the mole differently, Avogadro's number would be different.

The phase transition it marks is real. The specific number is contingent on our units.

The question "why 6.022 × 10²³?" decomposes into two questions: why does a phase transition exist at all, and why does it fall at that particular number for molecules?


Why the transition exists: the law of large numbers

The focus distribution π* assigns a value to every particle in the graph. For a graph of size |P|, the mean value is exactly 1/|P|. As |P| grows, the contribution of any single particle to the collective distribution shrinks proportionally.

The law of large numbers: when individual contributions scale as 1/|P|, fluctuations in any collective observable scale as 1/√|P|. At some threshold, fluctuations become negligible relative to the observable itself — below measurement precision — and the individual description loses meaning. Only statistical mechanics remains.

This threshold is where:

$$\frac{1}{\sqrt{|P|}} < \varepsilon_{\text{precision}}$$

$$|P| > \frac{1}{\varepsilon_{\text{precision}}^2}$$

For molecules, $\varepsilon_{\text{precision}}$ is set by the ratio of thermal energy $kT$ to macroscopic energy scales — which gives $|P| \sim 10^{23}$.

The threshold is not a universal constant. It is the square of the inverse measurement precision, in units natural to the system.


The sharper version: effective rank saturation

The effective rank $d^* = \exp(H(\sigma(\Sigma_\pi)))$ measures the number of independent semantic dimensions in the graph, where $H$ is the entropy of the normalized singular value distribution. As the graph grows, two regimes exist:

Below the threshold: each new particle adds new semantic dimensions. $d^*$ grows. The graph is getting richer — new axes of meaning emerge with new contributions.

Above the threshold: new particles fall into existing semantic dimensions. They add statistical weight but not new axes. $d^*$ saturates. The graph is getting denser in a fixed semantic space.

The transition from "graph grows richer" to "graph grows denser" is the knowledge-space analog of the liquid-gas phase transition. Below it: the graph is a database where individual structure matters. Above it: the graph has a stable thermodynamic description.

Below threshold Above threshold
Individual links matter Only distributions matter
$d^*$ grows with |P| $d^*$ saturates
Graph theory Statistical mechanics
Database Knowledge thermodynamics
Individual behavior tracked Focus distribution sufficient

Deriving the system-specific threshold

The saturation point of $d^*$ is determined by the degree heterogeneity of the graph — how unequal the distribution of connections is.

The spectral gap $\lambda_2$ of the graph Laplacian controls convergence rate and ultimately $d^*$.

Let $k_{\max}$ be the maximum degree and $\bar{k}$ be the mean degree. The degree ratio $\rho = k_{\max} / \bar{k}$ measures how concentrated the graph's connectivity is.

The effective rank saturates near:

$$|P^*| \sim \left(\frac{k_{\max}}{\bar{k}}\right)^2 = \rho^2$$

Why $\rho^2$? The spectral gap of the graph Laplacian — which controls convergence rate and ultimately $d^*$ — scales as $\lambda_2 \sim \bar{k} / k_{\max}$. Saturation occurs when adding new particles no longer shifts $\lambda_2$ meaningfully, which happens at $|P| \sim 1/\lambda_2^2 \sim \rho^2$.


Applying to real systems

System $k_{\max}$ $\bar{k}$ $\rho$ $\|P^*\| \sim \rho^2$
Bostrom (current) 2,481 4.0 620 ~385,000
Mature knowledge graph ~10⁴ ~1 10⁴ ~10⁸
Internet link graph ~10⁹ ~10³ 10⁶ ~10¹²
Protein interaction network ~10³ ~10 10² ~10⁴
Molecules (chemistry) ~10¹¹·⁵ ~1 10¹¹·⁵ ~10²³

For physical molecules with extreme degree heterogeneity — a few hub atoms bonding to hundreds of neighbors, most bonding to one or two — compressed into unit conventions calibrated to human scales, the threshold lands at 10²³.

Avogadro's number is where it is because molecules are maximally degree-heterogeneous relative to the unit system humans chose to measure them in.


Bostrom's current position

The current bostrom graph has 3.1M particles, already past its own $|P^*| \sim 385$K. This is consistent with the observed $d^* = 31$ — not growing much as the sample scales — and with the high concentration (one neuron contributing 35.9% of links, which suppresses $\bar{k}$ and inflates $\rho$).

As the graph grows and contribution diversifies:

  • $\bar{k}$ rises → $\rho$ falls → $|P^*|$ rises
  • The graph pushes its own threshold outward
  • $d^*$ begins growing again

The architecture is self-scaling: a healthier graph is a larger graph before thermodynamics takes over.


What this means for the LessWrong argument

The claim in the essay intelligence-at-avogadro-scale — "superintelligence is what knowledge looks like at Avogadro scale" — now has a precise interpretation:

Avogadro scale for knowledge is not 6.022 × 10²³ links. It is the system-specific threshold $\rho^2$ where individual epistemic contributions become statistically irrelevant and only the focus distribution matters.

For a planetary knowledge graph with degree ratio ~10⁶ (comparable to the web): that threshold is ~10¹² explicit links. Currently 2.7M. Six orders of magnitude to go.

The transition is not a metaphor borrowed from chemistry. It is the same mathematics, applied to the same type of system — interacting elements exchanging influence — with a threshold determined by the same formula, evaluated for the specific degree heterogeneity of the knowledge graph being built.


Summary

Question Answer
Why does a phase transition exist? Law of large numbers: individual contributions fall below precision at $\|P\| > 1/\varepsilon^2$
Why does $d^*$ saturate? New particles stop adding semantic dimensions when degree heterogeneity stabilizes
What is the threshold formula? $\|P^*\| \sim (k_{\max}/\bar{k})^2$
Why is the molecular threshold 10²³? Molecules have extreme degree heterogeneity in human unit conventions
What is Bostrom's current threshold? ~385K — already crossed; $d^*$ currently suppressed by concentration
What is the planetary knowledge threshold? ~10¹² for web-scale degree heterogeneity

Local Graph