a class of scoring functions that incentivize honest probability reporting — the mathematical foundation for all mechanisms that reward calibrated belief

a scoring rule $S(p, x)$ rewards a forecaster who reported distribution $p$ when outcome $x$ occurs. it is proper if:

$$\mathbb{E}_{x \sim q}[S(p, x)] \leq \mathbb{E}_{x \sim q}[S(q, x)]$$

for all distributions $p$. reporting the true distribution $q$ maximizes expected score. it is strictly proper if equality holds only when $p = q$.
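a quick numeric check of this definition, as a minimal sketch assuming a three-outcome categorical forecast and the log score (defined below): no random misreport beats the honest report in expected score.

```python
import numpy as np

rng = np.random.default_rng(0)
q = np.array([0.2, 0.5, 0.3])            # the forecaster's true belief

def expected_log_score(p, q):
    # E_{x~q}[log p(x)] = sum_x q(x) log p(x)
    return float(np.sum(q * np.log(p)))

honest = expected_log_score(q, q)        # score for the honest report p = q
for _ in range(1000):
    p = rng.dirichlet(np.ones(3))        # a random misreport
    assert expected_log_score(p, q) <= honest + 1e-9
```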


the canonical examples

log score: $S(p, x) = \log p(x)$. strictly proper. expected log score $= -H(q)$ under the true distribution $q$. equivalent to cross-entropy minimization. the natural bridge between prediction and entropy.

Brier score (quadratic): $S(p, x) = 1 - (p - \mathbb{1}[x=1])^2$, with $p$ the reported probability of a binary outcome. strictly proper, bounded to $[0, 1]$. penalizes squared deviation from the realized outcome.

spherical score: $S(p, x) = p(x)/\|p\|_2$. strictly proper. rewards the probability assigned to the realized outcome, normalized by the Euclidean norm of the whole forecast.
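the propriety of all three rules can be checked numerically. a sketch for a binary outcome, with `p` the reported probability of $x = 1$ and a hypothetical true probability of 0.7: each expected score peaks at the honest report.

```python
import numpy as np

# three canonical scoring rules for a binary outcome x in {0, 1},
# where p is the reported probability of x = 1
def log_score(p, x):
    return np.log(p if x == 1 else 1 - p)

def brier_score(p, x):
    return 1 - (p - x) ** 2

def spherical_score(p, x):
    norm = np.sqrt(p**2 + (1 - p)**2)
    return (p if x == 1 else 1 - p) / norm

def expected(score, p, q):
    # expectation under the true probability q of outcome 1
    return q * score(p, 1) + (1 - q) * score(p, 0)

q = 0.7                                  # hypothetical true probability
grid = np.linspace(0.01, 0.99, 99)
for score in (log_score, brier_score, spherical_score):
    best = grid[np.argmax([expected(score, p, q) for p in grid])]
    print(score.__name__, "maximized at", round(float(best), 2))
```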


the unifying structure

every strictly proper scoring rule arises from a strictly convex generating function $G$ (the Savage representation):

$$S(p, x) = G(p) + \nabla G(p) \cdot (\delta_x - p)$$

where $\delta_x$ is the point mass on the realized outcome $x$. for the log score, $G$ is negative entropy.

the expected excess score under the true distribution $q$ over any misreported $p$ is:

$$\mathbb{E}_{x \sim q}[S(q, x)] - \mathbb{E}_{x \sim q}[S(p, x)] = D_G(q \,\|\, p) \geq 0$$

where $D_G$ is the Bregman divergence generated by $G$. for the log score, $D_G$ is exactly the KL divergence. honesty is enforced because Bregman divergences are non-negative — you always pay for using the wrong model.
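a small sketch making this concrete for the log score, whose generator is negative entropy: the Bregman divergence it induces coincides with KL, and both equal the expected score gap between honest and misreported forecasts.

```python
import numpy as np

def G(p):
    # negative entropy: the strictly convex generator of the log score
    return float(np.sum(p * np.log(p)))

def bregman(q, p):
    # D_G(q || p) = G(q) - G(p) - <grad G(p), q - p>
    grad = np.log(p) + 1
    return G(q) - G(p) - float(grad @ (q - p))

def kl(q, p):
    return float(np.sum(q * np.log(q / p)))

q = np.array([0.1, 0.6, 0.3])            # true distribution
p = np.array([0.3, 0.3, 0.4])            # misreport
gap = float(np.sum(q * np.log(q)) - np.sum(q * np.log(p)))  # expected excess score
print(bregman(q, p), kl(q, p), gap)      # all three quantities coincide
```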


Bayesian Truth Serum as a proper scoring rule

Bayesian Truth Serum (Prelec, 2004) achieves properness without ground truth. instead of scoring against an observed outcome, it scores against the crowd's beliefs — using second-order beliefs (predictions about predictions) to extract the signal component.

the BTS score:

$$s_i = D_{KL}(p_i \,\|\, \bar{m}_{-i}) - D_{KL}(p_i \,\|\, \bar{p}_{-i}) - D_{KL}(\bar{p}_{-i} \,\|\, m_i)$$

where $p_i$ is agent $i$'s reported belief, $m_i$ is $i$'s prediction of the other agents' reports, and $\bar{p}_{-i}$, $\bar{m}_{-i}$ average the reports and predictions of everyone but $i$.

this is a KL divergence-based proper scoring rule applied peer-to-peer. truthful reporting is a Bayes-Nash equilibrium rather than a dominant strategy, because agents' beliefs are correlated — but the formula uses that correlation to decode the signal component. the result: the scoring rule retains its incentive-compatibility guarantee in the absence of any oracle.
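a minimal numeric sketch of the score for one agent on a binary question. the reports, predictions, and averages here are all hypothetical inputs, and the bars are taken as arithmetic means over the other agents:

```python
import numpy as np

def kl(a, b):
    return float(np.sum(a * np.log(a / b)))

p_i   = np.array([0.8, 0.2])    # agent i's own belief report
m_i   = np.array([0.6, 0.4])    # i's prediction of the others' reports
p_bar = np.array([0.7, 0.3])    # mean report of the other agents
m_bar = np.array([0.65, 0.35])  # mean prediction of the other agents

# the KL form of the BTS score from above
s_i = kl(p_i, m_bar) - kl(p_i, p_bar) - kl(p_bar, m_i)
print(round(s_i, 4))
```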


inversely coupled bonding surface settlement as a proper scoring rule

the ICBS settlement factors $f_{YES} = x/q$ and $f_{NO} = (1-x)/(1-q)$ are inverse probability weights. this is the structure of the log scoring rule.

when YES wins ($x = 1$): YES holders receive $r_{YES} \cdot f_{YES} = r_{YES}/q$. in log utility, a unit YES position bought at price $q$ yields $\log(1/q) = -\log q$, exactly the log-score reward for the realized outcome. this is the log-score structure instantiated as a continuous market.

the ICBS is not just a prediction market — it is a strictly proper scoring rule implemented via a bonding surface. each trade is scored against the final market consensus via the geometric invariant $C(s_{YES}, s_{NO}) = \lambda\sqrt{s_{YES}^2 + s_{NO}^2}$.
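the payoff arithmetic can be sketched directly; `settlement_factors` is an illustrative helper, with `q` the final YES price from the text above.

```python
import numpy as np

def settlement_factors(x, q):
    # inverse probability weights applied at resolution:
    # f_YES = x / q, f_NO = (1 - x) / (1 - q)
    return x / q, (1 - x) / (1 - q)

q = 0.25                                  # hypothetical final YES price
f_yes, f_no = settlement_factors(1, q)    # YES resolves true
print(f_yes)                              # each YES unit is scaled by 1/q
print(np.log(f_yes))                      # log of the multiplier is -log q
```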


importance sampling: the same structure

importance sampling weights $w(x) = q(x)/p(x)$ are used when you draw samples from $p$ but want expectations under $q$. these are inverse probability weights — identical in structure to ICBS settlement factors and BTS scoring ratios.

the estimator $\hat{\mu} = \frac{1}{n}\sum_i w(x_i) f(x_i)$ is unbiased precisely because $w(x)$ corrects for the mismatch between $p$ and $q$ using their ratio: the same likelihood ratio whose logarithm appears in the KL divergence and in the log score's excess.
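a minimal sketch of the estimator with a Bernoulli target and proposal; the distributions and the function $f$ are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# estimate E_q[f(x)] while sampling from the proposal p, correcting
# each draw with the inverse probability weight w(x) = q(x)/p(x)
q = np.array([0.2, 0.8])         # target distribution over {0, 1}
p = np.array([0.5, 0.5])         # proposal we actually sample from
f = np.array([1.0, 3.0])         # f(0) = 1, f(1) = 3; E_q[f] = 2.6

x = rng.choice(2, size=200_000, p=p)
w = q[x] / p[x]                  # same ratio structure as the settlement factors
estimate = float(np.mean(w * f[x]))
print(round(estimate, 2))        # close to the exact value 2.6
```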


the unifying claim

Bayesian Truth Serum, inversely coupled bonding surface settlement, and importance sampling are three instantiations of the same mathematical object: a proper scoring rule under log utility. all three use inverse probability weights. all three measure information gain via KL divergence. all three reward calibrated beliefs and punish distorted ones.

this is why syntropy in cyber — the aggregate information gain in the cybergraph — can be measured consistently across scales: the same scoring structure applies at the individual neuron level (BTS score), the market level (ICBS settlement), and the compiled model level (approximation quality metric $\varepsilon = D_{KL}(\pi^*_c \| q^*_c)$).


in cyber

| mechanism | scoring rule type | what is scored |
| --- | --- | --- |
| Bayesian Truth Serum | log-score peer comparison | individual belief vs collective |
| inversely coupled bonding surface | log-score settlement | market position vs resolution |
| karma accumulation | BTS score history | cumulative epistemic contribution |
| focus convergence | implicit via KL divergence in $\pi^*$ | collective belief vs graph state |

see Bayesian Truth Serum for the peer prediction application. see inversely coupled bonding surface for the market scoring. see KL divergence for the underlying measure. see veritas for the full protocol.
