Roadmap
Trident exists to write CORE — Conserved Observable Reduction Equilibrium, a self-verifying substrate for planetary collective intelligence. 16 reduction patterns, field-first arithmetic, BBG state, focus dynamics — all written in Trident, all provable.
Kelvin versioning: versions count down toward 0K (frozen forever). Lower layers freeze first.
512K released 2026-02-26. Hot, not production ready.
Developer preview and request for comment.
cargo install trident-lang · GitHub
Three targets before the 256K release:
- Self-hosting — compiler compiles itself in Trident
- Atlas — on-chain package registry live
- Revolution demos — small proven inference, FHE circuit compilation, quantum circuit simulation
Layer      Current   First Release
───────────────────────────────────────
CORE       256K      64K
vm spec    32K       16K
language   64K       32K
TIR        64K       64K
compiler   32K       32K
std.*      128K      64K
os.*       128K      64K
tooling    64K       32K
AI         256K      128K
Privacy    256K      128K
Quantum    256K      128K
256K — primitives land
- [ ] CORE 16 patterns implemented in Trident (reference evaluator)
- [x] AI Tensor operations (dot, matmul, relu, dense) — DONE: std.nn.tensor
- [x] AI Neural TIR-TASM optimizer (91K params, evolutionary training, speculative compilation)
- [x] Privacy Polynomial ops and NTT for FHE — DONE: std.private.poly
- [x] Quantum Quantum gate set (H, X, Y, Z, S, T, CNOT, CZ, SWAP) — DONE: std.quantum.gates
- [ ] tooling Amazing CLI
- [ ] tooling Integration tests and formal verification
- [ ] tooling Beautiful website
- [ ] tooling Complete benchmark coverage
- [ ] CORE Hemera hash migration
128K — the machine assembles
CORE Poseidon + Merkle as CORE programs, BBG prototype
TIR Lowering works for stack, register, and tree targets
compiler ✓ All 6 stages + pipeline rewritten in .tri (9,195 LOC)
lexer (824) → parser (2,723) → typecheck (1,502) →
codegen (1,979) → optimize (733) → lower (1,121) →
pipeline (313)
std.* std.token, std.coin, std.card shipped
os.* os.neptune.* complete, Atlas on-chain registry live
AI Small model inference compiles to provable Trident
Privacy Trident programs compile to FHE circuits
Quantum Quantum circuit simulation backend
64K — proof of concept
CORE Transaction circuit, STARK verifier as CORE program
language Indexed assignment (arr[i] = val, s.field = val)
TIR ✓ TIR builder, optimizer, lowerer self-hosted in .tri
compiler ✓ All stages self-hosted — wire lower when core/warrior ready
std.* 23 std.skill.* shipped
os.* 3+ OS namespaces operational
tooling Web playground: compile .tri in browser
AI On-chain model registry — verified accuracy, no trust
Privacy Encrypted smart contracts — execute without revealing state
Quantum Hybrid programs: classical control + quantum subroutines
32K — first release
Compiler compiles itself. Atlas live. Revolution demos ship.
CORE Self-verifying: CORE proves its own execution
vm spec Intrinsic set stable: no new vm.* builtins
language Protocols: compile-time structural typing, grammar frozen
TIR TIROp set stable (5+ OS, 1 VM per type prove op set complete)
compiler ✓ Pipeline in Trident — needs lower wiring + self-compilation
std.* #[requires]/#[ensures] contracts on all public functions
os.* Per-OS namespace governance established
AI Proven training: gradient computation inside proof
Privacy FHE + ZK: prove correctness of encrypted computation
Quantum Quantum error correction in std.quantum
16K — the industries fall
CORE Recursive composition — proofs verify proofs
vm spec Triton backend emission proven correct
language Type system finalized — no new type rules
TIR Per-function benchmarks < 1.2x, optimization passes land
compiler Each compilation produces a proof certificate (self-proving)
std.* std.crypto.* formally verified (poseidon, merkle, ecdsa)
os.* os.neptune.* frozen
tooling GPU proving, ZK coprocessor integrations
AI GPT-class proven inference (billion+ parameters)
Privacy Multi-party FHE: N parties compute, none sees others' data
Quantum Real hardware backends (IBM, Google, IonQ)
8K — proven everything
CORE Focus dynamics live — collective intelligence emerges
vm spec 3+ backends passing conformance suite
language Every language feature has a formal soundness proof
TIR Stack effect contracts proven for all ops
compiler Incremental proving (per-module proofs, composed)
std.* All modules verified — every public function proven
os.* All active OS namespaces frozen
tooling FPGA proving backend
AI Federated learning with proven aggregation
Privacy Practical performance: <10x overhead vs plaintext
Quantum Quantum advantage: problems classical can't touch
4K — hardware era
CORE BBG formally verified, all state transitions proven
vm spec TerrainConfig / StackBackend / CostModel traits frozen
language Protocol system proven sound (composability without dispatch)
TIR Every lowering path formally verified
compiler Proof verified on-chain, src/ deleted
std.* Public APIs frozen, no new exports
os.* Cross-OS portability proven (same .tri runs on any OS)
tooling Tool chain self-hosts (trident builds trident tooling)
AI Autonomous agents that prove every decision they make
Privacy Hardware-accelerated FHE (FPGA/ASIC)
Quantum Post-quantum crypto native (lattice-based std.crypto)
2K — last mile
CORE 16 patterns proven correct, conservation laws verified
vm spec Every intrinsic has a formal cost proof
language Formal semantics published
TIR TIR-to-target roundtrip proven equivalent
compiler Compiler proves its own correctness
std.* Cross-module composition proofs complete
os.* Every OS binding formally verified
tooling ASIC proving backend
AI Any model, any size — proving scales linearly
Privacy Any Trident program runs encrypted by default
Quantum Quantum-classical proofs: STARK verifies quantum computation
0K
CORE sealed The substrate verifies itself.
vm spec sealed Intelligence without trust.
language sealed Privacy without permission.
TIR sealed Computation without limits.
compiler sealed
std.* sealed Write once, prove anywhere.
os.* sealed
tooling sealed
Techniques: Everything That Makes a Proof-Native Language Unprecedented
Scope: trident → TIR → TASM → Triton VM → STARK. The language, the compiler, the prover. Each section identifies techniques no existing programming language implements, because no existing programming language operates over a prime field with an algebraic proof system as its native execution target.
Part I: Algebraic Compilation — The Unbounded Opportunity
No general-purpose compiler has ever exploited the algebraic structure of its execution domain, because general-purpose execution domains (x86, ARM, WASM) have no algebra worth exploiting. Trident operates over the Goldilocks field $p = 2^{64} - 2^{32} + 1$. This field has extraordinary mathematical properties that a sufficiently smart compiler can weaponize.
1.1 Goldilocks Field Properties the Compiler Can Exploit
The Goldilocks prime has structural field properties no other execution substrate offers:
The Golden Ratio Identity: $\varphi = 2^{32}$ satisfies $\varphi^2 = \varphi - 1 \pmod{p}$, which means $2^{64} \equiv 2^{32} - 1 \pmod{p}$. Any 128-bit product reduces to 64 bits with one shift and one subtraction. The compiler can replace modular reduction with bit manipulation.
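A minimal Python sketch of that reduction (illustrative, not compiler output; `reduce128` is a hypothetical helper name):

```python
P = 2**64 - 2**32 + 1  # the Goldilocks prime

def reduce128(x):
    """Reduce x < 2^128 modulo P using 2^64 ≡ 2^32 - 1 (mod P)."""
    lo, hi = x & (2**64 - 1), x >> 64
    r = lo + hi * (2**32 - 1)        # first application: r < 2^97
    lo, hi = r & (2**64 - 1), r >> 64
    r = lo + hi * (2**32 - 1)        # second application: r fits in ~65 bits
    return r % P                     # final conditional subtractions
```

In hardware or TASM the two applications are shifts and subtracts; the trailing `% P` stands in for at most a couple of conditional subtractions.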
Roots of Unity Hierarchy: $2^{32}$ is a 6th root of unity. $2$ is a 192nd root of unity. $2^{24}$ is an 8th root of unity. Multiplication by any of these is a bit shift — zero multiplications, zero Processor table rows for the multiply.
Root of Unity Ladder:
ω₆ = 2³² (multiplication = 32-bit shift)
ω₁₂ = 2¹⁶ (multiplication = 16-bit shift)
ω₁₉₂ = 2 (multiplication = 1-bit shift)
ω₃₈₄ = √2 = 2²⁴ - 2⁷² (two shifts + subtraction)
Algebraic Square Roots: $\sqrt{2} = 2^{24} - 2^{72} \pmod{p}$ and $\sqrt{3} = 2^{16} - 2^{80} \pmod{p}$. These are multiplication-free: only shifts and subtracts. The compiler can replace sqrt(2) * x with (x << 24) - (x << 72) modulo $p$.
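These identities are cheap to check numerically. The following Python snippet (illustrative, not part of the toolchain) confirms the square roots and the root-of-unity orders claimed above:

```python
P = 2**64 - 2**32 + 1

# shift-and-subtract square roots
sqrt2 = (2**24 - 2**72) % P
sqrt3 = (2**16 - 2**80) % P
assert sqrt2 * sqrt2 % P == 2
assert sqrt3 * sqrt3 % P == 3

# root-of-unity ladder: 2^32 has order 6, 2^24 has order 8, 2 has order 192
assert pow(2**32, 6, P) == 1 and pow(2**32, 3, P) != 1
assert pow(2**24, 8, P) == 1 and pow(2**24, 4, P) != 1
assert pow(2, 192, P) == 1 and pow(2, 96, P) != 1
```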
Smallest Generator: $g = 7$ generates the entire multiplicative group. Every non-zero element is $7^k$ for some $k$. The compiler can use discrete log tables for small constants.
Large 2-Adic Subgroup: $p - 1 = 2^{32} \times (2^{32} - 1)$. Number-theoretic transforms (NTTs) of sizes up to $2^{32}$ are supported natively. The compiler can replace convolutions with NTTs for any power-of-2 length up to 4 billion.
Quadratic Extension: $\mathbb{F}_{p^2}$ via irreducible $x^2 - 7$. Complex-like arithmetic with elements $(a_0, a_1)$ where multiplication uses only 3 base-field multiplies (Karatsuba).
1.2 Algebraic Simplification Passes
These are compiler optimization passes that exploit field-theoretic identities. No existing compiler has them because no existing compiler targets a prime field.
Pass 1: Fermat Reduction
Identity: $a^{p-1} \equiv 1$ for $a \neq 0$.
Consequence: Any exponent $k$ can be reduced modulo $p - 1$. The compiler detects pow(x, k) and replaces $k$ with $k \bmod (p-1)$.
Example: pow(x, p) → x (Frobenius is the identity in the base field). pow(x, 2*p - 1) → x (since $2p - 1 \equiv 1 \pmod{p-1}$). pow(x, p + 1) → x * x (one multiply, since $p + 1 \equiv 2 \pmod{p-1}$).
Deep case: pow(x, k) where $k > p-1$ is common in iterative algorithms. The naive loop runs $k$ times. After Fermat reduction it runs $k \bmod (p-1)$ times. For astronomically large $k$ (e.g., from recursive formulas), this can reduce millions of iterations to a handful.
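A sketch of the rewrite in Python (`pow_fermat` is an illustrative name; the guard for zero is needed because the identity only holds for nonzero bases):

```python
P = 2**64 - 2**32 + 1

def pow_fermat(x, k):
    """x^k mod P with the exponent reduced mod P-1 (Fermat's little theorem)."""
    x %= P
    if x == 0:
        return 0 if k else 1   # 0^k = 0 for k > 0; 0^0 taken as 1
    return pow(x, k % (P - 1), P)
```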
Pass 2: Inversion via Exponentiation
Identity: $a^{-1} \equiv a^{p-2} \pmod{p}$.
Consequence: The compiler can replace invert(x) with pow(x, p-2). But more importantly, it can choose which representation is cheaper based on the AET profile. Explicit inversion uses the invert instruction (Processor table). Exponentiation uses repeated squaring (also Processor table, but different instruction mix). The compiler picks the variant that balances tables better.
Optimization: For $p - 2 = 2^{64} - 2^{32} - 1$, the addition chain for exponentiation has only ~95 multiplications (using the special structure of $p-2$). The compiler can hardcode this optimal chain rather than generic binary exponentiation.
Pass 3: Strength Reduction via Root-of-Unity Shifts
Identity: Multiplication by $2^k$ (for small $k$) is a shift, not a multiply.
Consequence: The compiler detects x * C where $C$ is a power of 2 and replaces it with a shift. But it goes further — for constants that are sums of few powers of 2, it decomposes: x * 5 → (x << 2) + x (one shift + one add, saving a multiply instruction).
Goldilocks-specific: Because $2^{64} \equiv 2^{32} - 1 \pmod{p}$, the reduction after a shift of ≥64 bits is itself just a shift and subtract. So even "large" shifts are cheap.
Hamming weight optimization: For constant $C$ with Hamming weight $w$, multiplication costs $w - 1$ additions and some shifts. If $w < 6$, this is cheaper than a general multiply. The compiler computes the Hamming weight of every constant and chooses the cheaper path.
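A Python sketch of the decomposition (illustrative; the real pass would emit shift and add instructions):

```python
P = 2**64 - 2**32 + 1

def mul_by_const(x, c):
    """x * c via shifted adds: popcount(c) - 1 additions, e.g. x*5 = (x << 2) + x."""
    acc = 0
    for bit in range(c.bit_length()):
        if (c >> bit) & 1:
            acc += x << bit   # shift instead of multiply
    return acc % P            # single reduction at the end
```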
Pass 4: Batch Inversion (Montgomery's Trick)
Identity: Given $k$ elements to invert, compute all inverses using 1 inversion + $3(k-1)$ multiplications instead of $k$ inversions.
Algorithm:
prefix[0] = a[0]
for i in 1..k: prefix[i] = prefix[i-1] * a[i]
inv_all = invert(prefix[k-1])  // SINGLE inversion
for i in (k-1)..1:
    result[i] = prefix[i-1] * inv_all
    inv_all = inv_all * a[i]
result[0] = inv_all
Compiler optimization: The compiler detects multiple invert() calls in the same scope and automatically batches them using Montgomery's trick. This is invisible to the programmer — they write let y = invert(x) multiple times, and the compiler rewrites into batch form.
AET impact: Inversions are expensive (many Processor rows for the addition chain). Batch inversion replaces $k$ expensive operations with 1 expensive + $3(k-1)$ cheap. For $k = 10$, this is ~10× cheaper.
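The trick fits in a dozen lines. A Python sketch (using Fermat exponentiation for the one real inversion; `batch_invert` is an illustrative name):

```python
P = 2**64 - 2**32 + 1

def batch_invert(a):
    """Invert every element of `a` with 1 inversion + 3(k-1) multiplications."""
    k = len(a)
    prefix = [a[0]] + [0] * (k - 1)
    for i in range(1, k):                    # running products
        prefix[i] = prefix[i - 1] * a[i] % P
    inv = pow(prefix[-1], P - 2, P)          # the SINGLE inversion
    out = [0] * k
    for i in range(k - 1, 0, -1):            # peel inverses back off
        out[i] = prefix[i - 1] * inv % P
        inv = inv * a[i] % P
    out[0] = inv
    return out
```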
Pass 5: Quadratic Residue Short-Circuit
Identity: $a^{(p-1)/2} = 1$ iff nonzero $a$ is a quadratic residue (has a square root).
Consequence: When the compiler sees sqrt(x), it can first compute pow(x, (p-1)/2) to check if the root exists. If the value is statically known (constant propagation), the compiler resolves at compile time whether sqrt will succeed — no runtime check needed.
For conditionals: Code like if has_sqrt(x) { ... } can be partially evaluated when x is a known constant or derived from known values.
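Euler's criterion is three lines of Python (illustrative). Note that $7$, being a generator of the multiplicative group, is necessarily a non-residue:

```python
P = 2**64 - 2**32 + 1

def is_square(a):
    """Euler's criterion: nonzero a is a quadratic residue iff a^((P-1)/2) == 1."""
    a %= P
    return a == 0 or pow(a, (P - 1) // 2, P) == 1
```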
Pass 6: Polynomial Identity Collapse
Schwartz-Zippel: Two distinct polynomials of degree $\leq d$ over a field of size $p$ agree at a uniformly random point with probability $\leq d/p$. For Goldilocks, $d/p < 2^{-32}$ for any polynomial of degree $< 2^{32}$.
Compiler use: When the optimizer needs to check if two expressions compute the same value for all inputs, it evaluates both at a random point. If they agree, they're equal with probability $> 1 - 2^{-32}$. This gives probabilistic equivalence checking that's vastly cheaper than symbolic proof.
Application in optimization: The compiler can speculatively simplify complex field expressions, verify the simplification probabilistically via Schwartz-Zippel, and only fall back to formal verification if the check fails.
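A probabilistic equivalence checker is a few lines of Python (`probably_equal` is an illustrative name; repeated trials drive the false-positive probability toward zero):

```python
import random

P = 2**64 - 2**32 + 1

def probably_equal(f, g, trials=4):
    """Schwartz-Zippel check: distinct degree-d polynomials agree at a
    random point with probability <= d/P, so agreement at several random
    points is overwhelming evidence of identity."""
    return all(f(r) == g(r) for r in (random.randrange(P) for _ in range(trials)))

# distributivity: x*(x+3) and x*x + 3*x are identical over the field
lhs = lambda x: x * (x + 3) % P
rhs = lambda x: (x * x + 3 * x) % P
```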
Pass 7: NTT Auto-Vectorization
Identity: Convolution of two sequences in the field can be computed via NTT (number-theoretic transform) in $O(n \log n)$ instead of $O(n^2)$.
Compiler detection: The compiler identifies nested loops of the form for i: for j: result[i+j] += a[i] * b[j] and replaces with NTT-based convolution. This is the field-arithmetic analogue of auto-vectorization in CPU compilers.
Goldilocks advantage: for small transform sizes the NTT roots of unity are themselves powers of 2 ($8 = 2^3$ has order 64, so every power-of-2 size up to 64 uses power-of-2 roots), making those butterfly operations shift + add instead of multiply + add. The NTT is unusually cheap in Goldilocks.
AET impact: NTT replaces $n^2$ multiplications with $n \log n$ multiplications where each "multiplication" by a root of unity is actually a shift. For $n = 256$, this is $65536 \to 2048$ multiplies — 32× reduction in Processor table height.
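For intuition, here is a size-8 cyclic convolution via NTT in Python, using $\omega = 2^{24}$ (the 8th root of unity above). The transform is written naively in $O(n^2)$ for clarity; a production version would use the shift-structured butterflies:

```python
P = 2**64 - 2**32 + 1
W8 = 2**24  # primitive 8th root of unity in Goldilocks

def ntt8(a, w):
    """Naive length-8 NTT: F[i] = sum_j a[j] * w^(i*j)."""
    return [sum(a[j] * pow(w, i * j, P) for j in range(8)) % P for i in range(8)]

def cyclic_convolve8(a, b):
    """Convolution theorem: pointwise multiply in the NTT image, invert."""
    fc = [x * y % P for x, y in zip(ntt8(a, W8), ntt8(b, W8))]
    inv8 = pow(8, P - 2, P)
    w_inv = pow(W8, P - 2, P)
    return [c * inv8 % P for c in ntt8(fc, w_inv)]
```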
Pass 8: Multi-Exponentiation Fusion
Identity (Shamir's trick): Computing $a^x \cdot b^y$ is cheaper than computing $a^x$ and $b^y$ separately. Instead of two addition chains (2 × ~64 squarings + ~32 multiplies each), use one joint chain (~64 squarings + ~48 multiplies total).
Compiler detection: Multiple pow() calls whose results are multiplied together. The compiler fuses them into a single multi-exponentiation.
Extension (Pippenger): For $k$ simultaneous exponentiations $\prod g_i^{e_i}$, Pippenger's algorithm reduces cost from $O(k \cdot \log p)$ to $O(k \cdot \log p / \log k)$. The compiler applies Pippenger automatically when $k > 4$.
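Shamir's trick is a single interleaved square-and-multiply loop. A Python sketch (`shamir_pow` is an illustrative name):

```python
P = 2**64 - 2**32 + 1

def shamir_pow(a, x, b, y):
    """a^x * b^y mod P with one shared squaring chain (Shamir's trick)."""
    r = 1
    for bit in range(max(x.bit_length(), y.bit_length()) - 1, -1, -1):
        r = r * r % P          # squaring shared by both exponents
        if (x >> bit) & 1:
            r = r * a % P
        if (y >> bit) & 1:
            r = r * b % P
    return r
```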
Pass 9: Vanishing Polynomial Optimization
Identity: For a known evaluation domain $D = \{d_1, \ldots, d_n\}$, the vanishing polynomial $Z_D(x) = \prod (x - d_i)$ has special structure. When $D$ is a coset of a multiplicative subgroup (common in STARKs), $Z_D$ can be computed in $O(\log n)$ instead of $O(n)$.
Compiler use: When the program evaluates a polynomial over a structured domain, the compiler detects the structure and uses the fast vanishing polynomial. This is critical for WHIR-related computations in the STARK prover itself.
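Concretely, for a coset $D = s \cdot \langle \omega_n \rangle$ of an order-$n$ subgroup, the coset elements are exactly the roots of $x^n = s^n$, so $Z_D(x) = x^n - s^n$ — a few squarings instead of $n$ multiplies. A Python check against the product definition (using the order-8 subgroup generated by $2^{24}$):

```python
P = 2**64 - 2**32 + 1
W8 = 2**24  # generator of an order-8 multiplicative subgroup

def vanishing_eval(x, n, shift):
    """Z_D(x) for the coset D = shift * <omega_n>: just x^n - shift^n."""
    return (pow(x, n, P) - pow(shift, n, P)) % P

def vanishing_eval_naive(x, n, omega, shift):
    """O(n) product definition, for comparison."""
    z = 1
    for i in range(n):
        z = z * (x - shift * pow(omega, i, P)) % P
    return z
```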
Pass 10: Lagrange Basis Caching
Identity: Lagrange basis polynomials over roots-of-unity domains can be precomputed and reused. For domain $\{\omega^0, \omega^1, \ldots, \omega^{n-1}\}$, the basis polynomials have closed-form expressions involving the vanishing polynomial.
Compiler optimization: The compiler detects polynomial interpolation over power-of-2 domains and replaces the naive $O(n^2)$ Lagrange interpolation with NTT-based $O(n \log n)$ interpolation, caching basis polynomials when the domain is reused.
Pass 11: Extension Field Strength Reduction
Identity: In $\mathbb{F}_{p^2}$ (quadratic extension), multiplication requires 3 base-field multiplies (Karatsuba). But multiplication by a base-field element requires only 2 multiplies (no cross term).
Compiler detection: When one operand of an extension field multiplication is known to be in the base field, the compiler generates the cheaper 2-multiply variant. More generally, multiplication by $(a_0, 0)$ is just $(a_0 \cdot b_0, a_0 \cdot b_1)$.
Squaring optimization: Squaring in $\mathbb{F}_{p^2}$ requires only 2 base-field multiplies (instead of 3 for general multiply), using $(a + b)(a - b) = a^2 - b^2$. The compiler detects x * x in extension field code and generates the squaring-specific circuit.
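A Python sketch of both variants over $\mathbb{F}_p[x]/(x^2 - 7)$. The squaring formula shown is one 2-multiply rearrangement; the compiler might pick a different but equivalent one:

```python
P = 2**64 - 2**32 + 1

def fp2_mul(a, b):
    """(a0 + a1*x)(b0 + b1*x) mod (x^2 - 7): Karatsuba, 3 base-field multiplies."""
    m0 = a[0] * b[0] % P
    m1 = a[1] * b[1] % P
    m2 = (a[0] + a[1]) * (b[0] + b[1]) % P
    return ((m0 + 7 * m1) % P, (m2 - m0 - m1) % P)

def fp2_sqr(a):
    """Squaring with only 2 base-field multiplies."""
    m0 = a[0] * a[1] % P                                   # cross term / 2
    c0 = ((a[0] + a[1]) * (a[0] + 7 * a[1]) - 8 * m0) % P  # a0^2 + 7*a1^2
    return (c0, 2 * m0 % P)
```

Note the base-field special case falls out directly: `fp2_mul((c, 0), b)` costs two multiplies with no cross term.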
Pass 12: Constant Expression Evaluation
Standard: Evaluate constant expressions at compile time.
Field-specific twist: This includes field operations — 3 * 5 → 15, invert(7) → the actual field element $7^{-1} \bmod p$, pow(2, 32) → the actual value. No general compiler does constant folding over finite field arithmetic because no general compiler operates over finite fields.
Recursive: Constant propagation through function calls. If a function is called with all-constant arguments, the compiler evaluates it entirely at compile time and replaces the call with the result (a single field element).
AET impact: Every constant-folded operation is one fewer Processor table row. For programs with many known constants (common in cryptographic circuits), this can eliminate 30-50% of the trace.
Pass 13: Addition Chain Optimization for Known Exponents
Problem: Computing $x^k$ for known constant $k$ via repeated squaring uses $\lfloor \log_2 k \rfloor$ squarings + $\text{popcount}(k) - 1$ multiplications. But optimal addition chains can do better.
Compiler optimization: For each constant exponent encountered, the compiler searches for the shortest addition chain. For small exponents ($k < 2^{15}$), optimal chains are known and can be looked up in a table. For larger exponents, heuristic search (binary method, windowed method, or Brauer chains) finds near-optimal chains.
Goldilocks-specific: The exponent $p - 2$ (used for inversion) has a known optimal chain exploiting the structure $2^{64} - 2^{32} - 1$. The compiler hardcodes this chain rather than using generic binary exponentiation — saving ~15 multiplications per inversion.
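To see why the structure matters: $p - 2 = 2^{64} - 2^{32} - 1$ has 63 ones in its binary expansion, so generic binary exponentiation costs 63 squarings + 62 multiplies — a chain exploiting the runs of ones does substantially better. A cost helper in Python (illustrative):

```python
P = 2**64 - 2**32 + 1

def binary_chain_cost(k):
    """(squarings, multiplies) for generic left-to-right binary exponentiation."""
    return k.bit_length() - 1, bin(k).count("1") - 1
```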
Pass 14: Dead Field Operation Elimination
Standard dead code elimination, but field-aware: A computed value that's never used is dead. But in field arithmetic, a value might be used only for its side effects on the STARK trace. The compiler must distinguish:
- Truly dead (result unused, no proof dependency) → eliminate
- Proof-relevant (result unused in program logic but needed for STARK constraints) → keep
- Partially dead (some components used, others not) → partial elimination
Pass 15: Algebraic Common Subexpression Elimination (CSE)
Standard CSE, but over the field: a * b + c * d and c * d + a * b are the same (commutativity of addition). The compiler normalizes field expressions into canonical form (e.g., sorted by operand hash) before CSE, catching equivalences that syntactic CSE misses.
Deeper: $a * (b + c)$ and $a * b + a * c$ are the same (distributivity). The compiler can factor or distribute based on which form produces fewer operations. Factoring reduces multiplies but increases dependency depth; distributing increases multiplies but enables parallelism. The compiler chooses based on AET impact.
1.3 Supercompilation for Proof Machines
Supercompilation — Turchin's technique of driving, folding, and generalizing program states — has never been applied to algebraic virtual machines. The combination is uniquely powerful.
What Supercompilation Does for Trident
Driving: The supercompiler symbolically executes the program, propagating known values and constraints. For Trident, this means propagating field elements and field identities through the code. A loop that iterates over a known-size array with known values can be fully unrolled and every intermediate field expression constant-folded.
Folding: When the supercompiler encounters a state it's seen before (up to generalization), it creates a recursive call instead of continuing to unfold. For Trident, this detects when iterative field computations have reached a fixed point — the compiler can replace an iterative convergence loop with a closed-form expression.
Key win — Loop-to-closed-form: Consider a loop computing $x_{n+1} = a \cdot x_n + b$ for $n$ iterations with known $a, b$. The supercompiler recognizes this as a linear recurrence and replaces it with the closed form $x_n = a^n \cdot x_0 + b \cdot (a^n - 1) / (a - 1)$. This collapses $n$ Processor table rows into ~$\log n$ (for the exponentiation).
For Goldilocks specifically: Since $2$ is a 192nd root of unity and roots of unity have special multiplicative structure, recurrences involving powers of 2 often have extremely compact closed forms that the supercompiler can discover.
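The loop-to-closed-form rewrite is easy to state in Python (a sketch; assumes $a \neq 1$ so that $a - 1$ is invertible):

```python
P = 2**64 - 2**32 + 1

def affine_closed_form(a, b, x0, n):
    """x_{k+1} = a*x_k + b  collapses to  x_n = a^n*x0 + b*(a^n - 1)/(a - 1)."""
    an = pow(a, n, P)
    return (an * x0 + b * (an - 1) * pow(a - 1, P - 2, P)) % P

def affine_loop(a, b, x0, n):
    """The O(n) loop the supercompiler would eliminate."""
    x = x0
    for _ in range(n):
        x = (a * x + b) % P
    return x
```

The loop costs $n$ Processor rows; the closed form costs one exponentiation (~$\log n$ squarings) plus a handful of multiplies.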
Supercompilation + Algebraic Passes = Multiplicative Gains
The algebraic passes (§1.2) perform local optimizations. Supercompilation performs global optimizations. Combined:
- Supercompiler unfolds a loop and discovers a recurrence
- Algebraic pass replaces the recurrence with a closed-form field expression
- Constant folder evaluates the closed-form if arguments are known
- Dead code eliminator removes the now-unused loop variables
Each pass multiplies the benefit of the previous. A loop of 1000 iterations might collapse to 10 multiplications → 7 multiplications (addition chain optimization) → 4 multiplications (multi-exponentiation fusion) → 2 shift operations (if the base is a power of 2).
Partial Evaluation as First-Class Operation
fn generic_hash<const ROUNDS: Field>(input: Field) -> Field {
    let mut state = input;
    for _ in 0..ROUNDS {
        state = state * state + ROUND_CONSTANT;
    }
    state
}
// Compile-time specialization:
let hash_5 = specialize(generic_hash, ROUNDS = 5);
// hash_5 is fully unrolled, constant-folded, algebraically optimized
// Its TASM is a straight-line sequence of ~10 instructions
The specialize keyword triggers supercompilation at compile time. The result is a new function with all constants inlined and all algebraic identities applied. No runtime overhead. The specialized function's STARK proof is minimal.
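Outside Trident, the idea can be mimicked in a few lines of Python — generate straight-line source for a fixed round count, then compile it. This illustrates specialization only, not the Trident toolchain; `specialize_hash` is a hypothetical name:

```python
P = 2**64 - 2**32 + 1

def specialize_hash(rounds, round_constant):
    """Emit a fully unrolled, constant-folded variant of the round loop."""
    lines = ["def hash_spec(s):", "    s %= P"]
    for _ in range(rounds):                               # unrolled at "compile time"
        lines.append(f"    s = (s * s + {round_constant}) % P")
    lines.append("    return s")
    ns = {"P": P}
    exec("\n".join(lines), ns)                            # "compile" the specialized code
    return ns["hash_spec"]
```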
Part II: Type System Innovations
2.1 Proof-Cost Types
fn transfer(a: Account, b: Account, amount: Field) -> Result<(), Error>
    cost [processor: 800..1200, hash: 50..100, ram: 200..400]
{
    // implementation
}
The type system tracks which AET tables a function touches and how many rows it adds to each. The compiler statically verifies that the implementation's cost falls within the declared bounds. If it exceeds them, compilation fails with a cost-violation error.
Implications:
- Developers see proof cost at the type level, before running anything
- API contracts include cost guarantees — a library function promises its proof cost
- Cost bounds compose: calling f then g has cost cost(f) + cost(g) per table
- The compiler rejects optimizations that violate declared cost bounds (even if they improve total cost, they might shift cost between tables in ways that break composition)
2.2 Table-Aware Types
type HashFree<T> = T where tables_touched(T) ∩ {Hash, Cascade, Lookup} = ∅
type ArithOnly<T> = T where tables_touched(T) ⊆ {Processor, OpStack}
Functions annotated with table constraints can be scheduled independently. Two functions touching disjoint table sets can be interleaved without interference — enabling parallel proving of independent program segments.
2.3 Linear Types for Cryptographic Values
type Nonce = Linear<Field>; // must be consumed exactly once
type Witness = Affine<Field>; // must be consumed at most once
fn use_nonce(n: Nonce) -> Commitment {
    // After this call, `n` is consumed. Cannot be reused.
    commit(n)
}
// Compile error: nonce used twice
let n = fresh_nonce();
let c1 = use_nonce(n);
let c2 = use_nonce(n); // ERROR: `n` already consumed
The type system prevents cryptographic misuse — double-spending nonces, reusing randomness, leaking secret witnesses — at compile time. Rust's borrow checker for cryptographic hygiene.
2.4 Refinement Types over Field Arithmetic
type Positive = { x: Field | x > 0 && x < p/2 };
type Probability = { x: Field | x >= 0 && x <= SCALE_FACTOR };
type NonZero = { x: Field | x != 0 };
fn safe_divide(a: Field, b: NonZero) -> Field {
    a * invert(b) // guaranteed safe — b cannot be zero
}
Refinements compile to STARK constraints. The proof of execution automatically includes the proof that all refinement predicates were satisfied. No separate verification step.
2.5 Dependent Types over Field Values
type Vector<const N: Field> = [Field; N];
type Matrix<const R: Field, const C: Field> = [Vector<C>; R];
fn matmul<const M: Field, const N: Field, const P: Field>(
    a: Matrix<M, N>,
    b: Matrix<N, P>
) -> Matrix<M, P> {
    // N must match — checked at compile time
    // All dimensions are field elements — native to the proof system
}
Dimension mismatches are compile-time errors. No runtime bounds checking needed, no wasted Processor rows on bounds checks.
Part III: Built-In Formal Verification
3.1 Specifications as Trident Code
fn sqrt_approx(x: Field) -> Field
    requires x < p/2
    ensures |result * result - x| < EPSILON
{
    // Padé approximant initial guess
    let mut y = x * INITIAL_GUESS;
    // Newton-Raphson iterations
    for _ in 0..3 {
        y = (y + x * invert(y)) * INV_2;
    }
    y
}
requires and ensures clauses compile to additional STARK constraints. The program's execution proof IS the formal verification compliance proof. One proof, two purposes. No separate verification tool.
3.2 Invariant-Carrying Loops
fn sum_array(arr: [Field; N]) -> Field
    invariant acc == sum(arr[0..i]) // maintained every iteration
{
    let mut acc: Field = 0;
    for i in 0..N {
        acc = acc + arr[i];
    }
    acc
}
Loop invariants become inductive STARK constraints. The prover checks the invariant at every iteration as part of the execution trace. If the invariant ever fails, the STARK proof is invalid.
3.3 Termination Proofs as Compilation Artifacts
Trident already requires bounded loops. The compiler goes further: it generates a proof of termination for every program. Not just "this program halts" but "this program halts in exactly $N$ steps for input $x$."
The termination proof is embedded in the STARK proof: the Processor table has exactly $N$ rows, and the proof commits to this fact. Deterministic cost is not just a feature — it's a proven property.
Part IV: Cryptographic Primitives as Language Features
4.1 Zero-Knowledge Proofs as a Type Modifier
zk fn secret_transfer(
    amount: Private<Field>,
    sender_balance: Private<Field>,
    receiver_balance: Private<Field>
) -> Public<(Commitment, Commitment)> {
    assert!(sender_balance >= amount);
    let new_sender = sender_balance - amount;
    let new_receiver = receiver_balance + amount;
    (commit(new_sender), commit(new_receiver))
}
Private<T> values are witness inputs — never revealed in the proof. Public<T> values are public outputs — included in the proof. The compiler automatically generates the witness/public-input split. The developer never manually constructs circuits.
4.2 Commitment Schemes as Syntax
let c = commit(value); // Tip5 hash under the hood
let (v, proof) = reveal(c); // Opening proof
assert!(verify(c, v, proof)); // Verification
// Batch optimization — compiler merges into one sponge:
let (c1, c2, c3) = commit_batch(v1, v2, v3);
Not library calls — language primitives. The compiler can optimize across commitment boundaries (e.g., batching multiple commitments into one Tip5 hash sponge absorption, reducing Hash table rows).
4.3 Merkle Membership Proofs as Iterators
for (leaf, auth_path) in merkle_tree.verified_walk(root) {
    // Inside this loop:
    // - `leaf` is STARK-proven to be in the tree with the given root
    // - `auth_path` is the authentication path
    // - The merkle_step instructions are generated automatically
    process(leaf);
}
The compiler generates merkle_step TASM instructions and manages the authentication path. The developer iterates; the compiler proves.
Part V: Self-Hosting and Bootstrapping
5.1 Trident Compiler Written in Trident
The endgame: trident.trd — a single Trident source file that, when compiled and executed on Triton VM, takes a Trident source as input and produces TASM as output. The execution produces a STARK proof.
Implications:
- The compiler's correctness is not argued — it's proven anew every time it runs
- Any compiler bug produces an invalid proof (the STARK catches it)
- You don't trust the developer (verify the proof of execution)
- You don't trust the compiler (verify the proof of compilation)
- You don't trust the optimizer (verify the proof of optimization)
- Mathematics, all the way down
Bootstrapping sequence:
- Write initial Trident compiler in Rust (trusted, hand-audited)
- Write Trident compiler in Trident
- Compile (2) using (1) → produces TASM + STARK proof
- Run the compiled Trident compiler to compile itself → produces new TASM + new STARK proof
- Verify that outputs of (3) and (4) are identical (fixed point)
- If fixed point reached: the compiler is self-consistent. Trust only the STARK verifier.
5.2 Self-Verifying Compiler Optimization
From the Self-Optimizing Compilation paper: the neural optimizer, the verifier, and the training loop are all Trident programs compiled by the system they improve. This creates a convergent fixed point where the compiler can no longer improve its own compilation.
The compiler optimizes its own code. The optimizer is itself compiled code. The optimization of the optimizer's compilation is verified by a STARK proof generated by the same system. Turtles all the way down — but with proofs at every level.
Part VI: Novel Execution Models
6.1 Lazy Proving
defer_proof {
    let x = expensive_computation_1();
    let y = expensive_computation_2();
    let z = x + y;
}
// One STARK proof for the entire block, not three separate proofs
// Amortizes fixed proving overhead (WHIR commitment, grinding)
The developer controls proof granularity. Fine-grained proofs for high-security operations, batched proofs for throughput-critical paths.
6.2 Incremental Proving
let proof_v1 = prove(program_v1, input);
// Program changes slightly (one branch modified)
let proof_v2 = prove_delta(proof_v1, diff(program_v1, program_v2), input);
// Only re-proves affected AET rows, reuses unaffected WHIR layers
When a program changes slightly, the STARK proof can be incrementally updated. Only the affected table rows are recomputed; unaffected WHIR layers are reused. The developer controls this via prove_delta.
6.3 Speculative Execution with Proof Rollback
speculate {
    // Aggressively optimized path (may be incorrect for edge cases)
    let result = fast_but_risky_algorithm(input);
} fallback {
    // Conservative path (always correct)
    let result = slow_but_safe_algorithm(input);
}
// Runtime: try speculative path, generate proof, verify
// If proof fails: rollback, execute fallback, prove that
Enables aggressive compiler optimizations that are usually correct. The proof system catches the rare failures. No correctness sacrifice.
Part VII: Interoperability
7.1 Proof-Carrying Data
Trident programs distributed as (TASM + STARK_proof) bundles. The recipient doesn't trust the sender — they verify the proof. Like signed code, but with mathematical guarantees instead of identity-based trust.
my_library.trd → compile → my_library.tasm + my_library.stark_proof
↓
distribute
↓
recipient verifies proof
(no re-execution needed)
7.2 Cross-VM Proof Composition
A Trident program proven on Triton VM produces a proof. That proof can be verified inside a Miden VM program (or SP1, or OpenVM). The Miden program produces its own proof. Recursive proof composition across heterogeneous VMs.
// On Triton VM:
let result = compute_something(input);
let proof = current_proof(); // intrinsic: the proof of this execution
// On Miden VM:
let verified = verify_triton_proof(proof, expected_output);
assert!(verified);
// This Miden execution + its proof now transitively proves the Triton computation
7.3 Foreign Function Proofs
extern verified fn external_hash(input: Field) -> Field
with proof: StarkProof;
// Calling this function:
// 1. Executes the foreign function (Rust, C, whatever)
// 2. Receives the result + a STARK proof
// 3. Verifies the proof inside the Trident execution
// 4. The Trident proof transitively covers the foreign call
Part VIII: Developer Experience Innovations
8.1 Proof-Cost Highlighting in IDE
Every line of Trident code is colored by its contribution to STARK proving cost:
fn example(x: Field) -> Field {
let a = x * x; // 🟢 1 Processor row
let b = hash(a); // 🔴 ~200 Hash rows (bottleneck!)
let c = a + b; // 🟢 1 Processor row
let d = invert(c); // 🟡 ~95 Processor rows
d
}
// Total cost: dominated by hash() — compiler suggests batching
The IDE shows which table is the bottleneck, which lines contribute most, and which optimizations would help.
8.2 Automatic Proof-Cost CI/CD
# trident-ci.yml
proof_cost_limits:
transfer_circuit:
processor: 2048
hash: 512
ram: 1024
max_total: 4096
# CI rejects PRs that exceed these limits
# Metric is proof cost, not execution time
8.3 Interactive Proof Explorer
A developer tool that visualizes the STARK proof:
- See which AET tables are populated and how full each is
- Click on a Processor table row → see the source line that generated it
- Highlight "hot zones" where table height approaches a power-of-2 cliff
- Simulate the effect of code changes on the proof profile before compiling
8.4 Trident REPL with Proof Feedback
trident> let x = 42;
x = 42 [cost: 1 row, tables: Processor]
trident> let y = hash(x);
y = 0x3f2a... [cost: 198 rows, tables: Processor+Hash]
trident> let z = invert(y);
z = 0x7b1c... [cost: 97 rows, tables: Processor]
trident> :proof_summary
Total trace: 296 rows, pads to 512
Bottleneck: Hash (198/296 = 67%)
Suggested: batch hash operations to amortize
Every expression in the REPL shows its proof cost immediately. The developer builds intuition for cost by experimenting interactively.
Neural Network Techniques
The Core Thesis
Trident compiles to arithmetic over the Goldilocks field ($p = 2^{64} - 2^{32} + 1$). This field is not a generic algebraic structure — it has deep internal symmetries: a multiplicative group of order $2^{32}(2^{32} - 1)$, primitive $2^{32}$th roots of unity, subgroups with cheap inversion, Frobenius-like shortcuts in extension field embeddings, and an unbounded hierarchy of polynomial identities.
Every one of these symmetries is a potential compiler optimization. Human algebraists find them one at a time, by studying theory and having insights. A neural network can search the space systematically, discovering identities no human would find, at a rate no human can match.
This changes the nature of compilation. A traditional compiler has a fixed set of optimization passes written by engineers. A neural-augmented Trident compiler has a growing set of algebra identities discovered by exploration. The compiler improves forever — every new identity reduces proving cost for every program that matches the pattern, past and future.
There is no bottom. Every branch of algebra offers new passes. The deeper the network explores, the more it finds. And every discovery compounds.
Priority 0 — The Algebraic Identity Explorer
0.1 The Unbounded Optimization Surface
The field has layers of algebraic structure, each offering optimization opportunities:
Layer 0 — Arithmetic identities: $x + 0 = x$, $x \cdot 1 = x$, $x \cdot 0 = 0$. Trivial. Any compiler finds these.
Layer 1 — Goldilocks-specific constants: $2^{32}$ has special properties because $2^{64} \equiv 2^{32} - 1 \pmod{p}$, so multiplication by $2^{32}$ reduces to a shift plus a small correction. Multiplication by $(p-1)/2$ is equivalent to conditional negation. These properties are specific to $p = 2^{64} - 2^{32} + 1$ and do not exist in any other field.
Layer 2 — Subgroup structure: The multiplicative group has a subgroup of order $2^{32}$. Elements in this subgroup have cheap inversions — the Fermat-style inversion exponent drops to $2^{32} - 1$ (the subgroup order minus one) instead of $p - 2$. Exponentiation chains for these elements are dramatically shorter.
Layer 3 — Roots of unity: The field has primitive $2^{32}$th roots of unity. Polynomial evaluations at these roots decompose into butterfly networks (Fourier-transform structure). A sequence of 8 multiplications might reduce to 3 if the constants are roots of unity. This applies to every program that touches polynomial arithmetic — which includes the STARK prover itself.
Layer 4 — Extension field shortcuts: Triton VM's hash function Tip5 operates on extension field elements embedded in the base field. Algebraic relationships in the extension (Frobenius endomorphism, norm maps, trace maps) create shortcuts invisible at the base field level. The NN discovers that certain hash-related instruction sequences have cheaper equivalents without understanding the theory behind them.
Layer 5 — Algebraic geometry: Programs computing polynomial evaluations, interpolations, and multi-point evaluations have structure related to the field's geometry. Evaluation at structured point sets shares intermediate values. Interpolation through subgroup cosets has closed-form shortcuts. This layer is essentially unbounded — every new polynomial identity is a new compiler pass.
Layer N — Unknown: The NN explores beyond named mathematical territory. Identities that emerge from the interaction of Triton VM's specific instruction set with the Goldilocks field's specific structure. No textbook covers this intersection. The NN maps it empirically.
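The Layer 1 claim is easy to check numerically. Since $p = 2^{64} - 2^{32} + 1$, we have $2^{64} \equiv 2^{32} - 1$ and $2^{96} \equiv -1 \pmod p$, so the high half of a 128-bit product folds back in with shifts and additions. A Python sketch (`mul_mod` is illustrative, not the production reduction):

```python
import random

P = 2**64 - 2**32 + 1

# The two congruences behind the shift trick:
assert pow(2, 64, P) == 2**32 - 1
assert pow(2, 96, P) == P - 1          # i.e. 2^96 ≡ -1 (mod p)

def mul_mod(a, b):
    """Reduce a 128-bit product via the congruences above (sketch)."""
    t = a * b
    lo, hi = t & (2**64 - 1), t >> 64
    hi_lo, hi_hi = hi & (2**32 - 1), hi >> 32
    # t = lo + hi_lo * 2^64 + hi_hi * 2^96
    return (lo + hi_lo * (2**32 - 1) - hi_hi) % P

for _ in range(1000):
    a, b = random.randrange(P), random.randrange(P)
    assert mul_mod(a, b) == a * b % P
```

A production implementation would avoid the final `% P` with a conditional subtraction; the algebra is the same.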
0.2 Discovery Architecture
┌────────────────────────────────────────────────────────────────┐
│ ALGEBRAIC IDENTITY EXPLORER │
├────────────────────────────────────────────────────────────────┤
│ │
│ PROPOSER (GFlowNet) │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Input: │ │
│ │ - Known identity database (patterns + embeddings) │ │
│ │ - Instruction vocabulary (~44 TASM ops) │ │
│ │ - Frequency data from real program corpus │ │
│ │ │ │
│ │ Output: │ │
│ │ - Candidate pair (sequence_A, sequence_B) │ │
│ │ - Each sequence: 2-12 TASM instructions + operands │ │
│ │ │ │
│ │ Reward: identity_found × usefulness_score │ │
│ │ Diversity: GFlowNet samples proportional to reward │ │
│ │ Exploration bonus: novel instruction combinations │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ VALIDATOR (brute-force field execution) │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Stage 1: Execute both sequences on 10,000 random inputs │ │
│ │ If ANY output differs → reject (not identity) │ │
│ │ │ │
│ │ Stage 2: Execute on 10,000,000 inputs (high confidence) │ │
│ │ Probability of false positive: < 10^{-7} │ │
│ │ │ │
│ │ Stage 3: Symbolic execution → algebraic proof │ │
│ │ Express both sequences as polynomial maps │ │
│ │ Verify polynomial equality via Schwartz-Zippel │ │
│ │ Or: exhaustive verification for small domains │ │
│ │ │ │
│ │ Stage 4 (optional): STARK proof of equivalence │ │
│ │ The identity itself becomes a proven theorem │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ USEFULNESS SCORER │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Scan corpus of existing Trident programs │ │
│ │ Count: how often does sequence_A appear? (frequency) │ │
│ │ Measure: cost(sequence_A) - cost(sequence_B) (savings) │ │
│ │ Check: which AET tables benefit? (table_impact) │ │
│ │ │ │
│ │ score = frequency × savings × table_criticality │ │
│ │ │ │
│ │ table_criticality = extra weight if the savings hit │ │
│ │ the table that is currently the bottleneck (tallest) │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ RULE DATABASE │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Each rule: │ │
│ │ pattern: TASM instruction sequence (LHS) │ │
│ │ replacement: TASM instruction sequence (RHS) │ │
│ │ cost_savings: measured AET reduction │ │
│ │ confidence: validation stage reached (1-4) │ │
│ │ frequency: occurrences in corpus │ │
│ │ layer: algebraic depth (0-5+) │ │
│ │ discovery_date: when found │ │
│ │ composable_with: list of non-conflicting rules │ │
│ │ │ │
│ │ Applied deterministically before neural compiler runs │ │
│ │ Sorted by (frequency × savings) descending │ │
│ │ Conflict resolution: longest match first │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ FEEDBACK TO PROPOSER │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Successful identity → positive reward │ │
│ │ Near-miss (99.9%) → shaped reward (close) │ │
│ │ Redundant (known) → zero reward (stop re-finding) │ │
│ │ Novel structure → exploration bonus │ │
│ │ Compositional → bonus if A∘B reveals new C │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │
└────────────────────────────────────────────────────────────────┘
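Stage 1 of the validator is just probabilistic testing over the field. A minimal sketch, with a toy stack interpreter standing in for Triton VM (the instruction vocabulary here is illustrative):

```python
import random

P = 2**64 - 2**32 + 1

def run(seq, x):
    """Tiny stack interpreter for a toy TASM-like vocabulary."""
    stack = [x]
    for op, *args in seq:
        if op == "push": stack.append(args[0] % P)
        elif op == "add": b, a = stack.pop(), stack.pop(); stack.append((a + b) % P)
        elif op == "mul": b, a = stack.pop(), stack.pop(); stack.append((a * b) % P)
        elif op == "pop": stack.pop()
    return stack[-1]

def maybe_identity(seq_a, seq_b, trials=10_000):
    """Reject on any disagreement; agreement on all trials is only evidence."""
    for _ in range(trials):
        x = random.randrange(P)
        if run(seq_a, x) != run(seq_b, x):
            return False
    return True

# Layer-0 identity: `push 0; add` is a no-op.
assert maybe_identity([("push", 0), ("add",)], [])
```

Stages 2-4 raise confidence from statistical to symbolic to proven; this stage exists only to reject non-identities cheaply.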
0.3 The Compounding Flywheel
Every discovered identity triggers four compounding effects:
1. Direct savings. All programs containing the pattern get cheaper to prove. Retroactively — recompile old programs, get smaller proofs for free.
2. Training enrichment. Each identity is a new training example for the proposer. The GFlowNet learns the shape of identities — "what do valid rewrites look like in this field?" — and proposes higher-quality candidates.
3. Compositional explosion. Identity A transforms sequence X→Y. Identity B transforms sequence Y→Z. Composing them gives X→Z, which might be a deeper identity invisible at either layer alone. The rule database enables automated composition search: for every pair of rules where output of A matches input of B, test the composed rule.
4. Corpus shift. As rules are applied to real programs, the program corpus changes. New instruction patterns emerge that didn't exist before. The proposer explores these new patterns, potentially finding second-order identities that only exist because the first-order ones were applied.
Monotonic improvement. The rule database only grows. Rules are never removed (only demoted if usefulness drops). The compiler can only get better. Over months, the database accumulates thousands of rules, representing a collective algebraic knowledge base that no single mathematician could hold in their head.
0.4 Self-Referential Closure
The identity explorer is itself a Trident program. It compiles to TASM. Its own compilation benefits from the identities it discovers. The explorer optimizes its own execution.
The fixed point: when the explorer can no longer improve its own compilation cost, it has extracted the maximum algebraic efficiency reachable by its architecture. This fixed point represents a lower bound on the extractable efficiency of the Goldilocks field for Triton VM's instruction set — a convergent computation.
A larger explorer (more parameters, longer search horizon) might find deeper identities and reach a lower fixed point. The hierarchy of fixed points, indexed by explorer capacity, converges to the theoretical minimum proving cost for each program — the algebraic Shannon limit of the field.
0.5 Estimated Yield by Layer
| Layer | Example | Frequency | Savings per match | Cumulative impact |
|---|---|---|---|---|
| 0 | push 0; add → ∅ | Very high | 1-2 rows | 5-10% |
| 1 | push 2^32; mul → shift trick | High | 3-10 rows | 10-20% |
| 2 | Cheap inversion for subgroup elements | Medium | 10-50 rows | 20-30% |
| 3 | NTT butterfly for root-of-unity constants | Medium | 50-200 rows | 30-50% |
| 4 | Hash function internal shortcuts | Low | 100-500 rows | 40-60% |
| 5+ | Polynomial arithmetic restructuring | Low | 200-1000+ rows | 50-70%+ |
Note: savings compound multiplicatively across layers. A program that benefits from Layer 1 + Layer 3 + Layer 4 optimizations could see 3-5× proving cost reduction.
0.6 Implementation
Phase A — Brute-force baseline (1 week): Write a random sequence pair generator. Execute on random inputs. Collect identities. No NN yet — just exhaustive search over short sequences (length 2-4). This builds the initial rule database and establishes the validation pipeline.
Phase B — Guided search (2 weeks): Train a small NN (MLP, ~10K params) to predict "is this pair likely to be an identity?" from instruction sequence features. Use it to filter random proposals. 10× speedup over brute force.
Phase C — GFlowNet proposer (3 weeks): Replace the MLP filter with a GFlowNet that actively constructs candidate pairs. Reward is identity_found × usefulness. Diversity ensures exploration of all algebraic layers, not just the easiest.
Phase D — Compositional search (2 weeks): Automated composition of known rules. For every pair (A, B) where A's RHS overlaps with B's LHS, test A∘B as a new rule. Depth-limited to avoid combinatorial explosion (max composition depth 3).
Phase E — Continuous operation (ongoing): The explorer runs 24/7. Every week, the rule database grows. Every month, recompile the full program corpus with the expanded database. Measure cumulative savings. Publish the rule database as an open artifact.
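Phase D's composition step can be sketched directly. A toy model where rules are (lhs, rhs) sequence pairs and only exact chaining is attempted (real composition would also match overlapping windows):

```python
def compose_rules(rules, max_depth=3):
    """Chain rule pairs whose patterns connect; depth-limited as in Phase D.
    rules: list of (lhs, rhs) tuples of instruction sequences."""
    known = set(rules)
    frontier = list(rules)
    for _ in range(max_depth - 1):
        new = []
        for lhs_a, rhs_a in frontier:
            for lhs_b, rhs_b in rules:
                if rhs_a == lhs_b and (lhs_a, rhs_b) not in known:
                    new.append((lhs_a, rhs_b))   # candidate: still needs validation
                    known.add((lhs_a, rhs_b))
        frontier = new
    return known - set(rules)

# Three chainable toy rules yield the transitive candidates a→c, b→d, a→d.
rules = [(("a",), ("b",)), (("b",), ("c",)), (("c",), ("d",))]
assert (("a",), ("c",)) in compose_rules(rules)
```

Every candidate produced here would still pass through the §0.2 validator before entering the rule database.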
Priority 1 — Foundation Layer
1.1 Field-Native Neural Network Library (nn.trd)
What: A Trident library implementing neural network primitives — linear algebra (matrix multiply, dot product), layer normalization, activation functions — all in Goldilocks field arithmetic.
Why: Every other technique on this list either IS a neural network running in Trident, or benefits from having one. Including the identity explorer's proposer.
Architecture:
nn.trd
├── field_signed.trd — signed integer convention (x > p/2 → negative)
├── field_fixed.trd — fixed-point arithmetic via scaling factor
├── linalg.trd — matmul, dot product, vector ops
├── activations.trd — GELU (polynomial), ReLU (conditional), tanh (Padé)
├── layers.trd — linear layer, layer norm, residual connection
├── loss.trd — MSE, cross-entropy (field approximations)
└── inference.trd — forward pass orchestration
Key constraints:
- No negative numbers natively → signed representation via convention
- No fractions natively → fixed-point with configurable scale factor (e.g., $2^{16}$)
- No floating point → all activations are polynomial or rational approximations
- Every operation is a field op → every inference produces a valid Triton VM trace
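The signed and fixed-point conventions can be sketched in a few lines (scale factor $2^{16}$, as suggested above; the helper names are illustrative, not nn.trd's API):

```python
P = 2**64 - 2**32 + 1
SCALE = 2**16

def to_field(x: float) -> int:
    """Encode a signed fixed-point value as a field element."""
    return round(x * SCALE) % P

def from_field(a: int) -> float:
    """Decode: elements above p/2 represent negative values by convention."""
    signed = a if a <= P // 2 else a - P
    return signed / SCALE

def fadd(a, b):
    return (a + b) % P

def fmul(a, b):
    # Multiplying two scaled values doubles the scale; rescale once.
    prod = (a * b) % P
    signed = prod if prod <= P // 2 else prod - P
    return (signed // SCALE) % P

x, y = to_field(1.5), to_field(-2.25)
assert abs(from_field(fadd(x, y)) - (-0.75)) < 1e-4
assert abs(from_field(fmul(x, y)) - (-3.375)) < 1e-3
```

Inside Trident the same conventions compile to pure field ops, so every inference leaves a valid Triton VM trace.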
Size: ~500 lines of Trident. A 3-layer MLP with 64-wide hidden layers compiles to ~2,000 TASM instructions.
The result: A neural network whose every inference is a STARK-proven computation, compiled from .trd source. The identity explorer (§0) benefits from this immediately — its own NN inference is provable.
Tier: 0+1 only. No hash/sponge or recursive verification needed for NN inference.
1.2 Field-Native Evolutionary Training
What: Train neural networks entirely within Goldilocks field arithmetic using evolutionary optimization.
Why: Gradient descent in field arithmetic requires finite-difference approximation (noisy). Evolution sidesteps this. Crossover = conditional copy. Mutation = random field element replacement. Both are pure Tier 0 ops.
Algorithm:
POPULATION: N = 16 weight vectors (each ≤91K field elements)
MEMORY: 16 × 728 KB = 11.6 MB total
FOR each generation:
FOR each individual:
FOR each training example:
output = inference(individual.weights, input)
individual.fitness += score(output, expected)
SORT by fitness (descending)
survivors = top 25%
FOR i in 0..N:
parent_a, parent_b = random_choice(survivors)
child = uniform_crossover(parent_a, parent_b)
child = mutate(child, rate=0.01)
new_population[i] = child
Performance: ~2.5M field ops per generation. On M4 Pro Metal: ~50μs per generation. 1,000 generations in 50ms.
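A runnable toy of the loop above (tiny dimensions, and fitting a fixed target vector in place of a real task; all values remain field elements):

```python
import random

P = 2**64 - 2**32 + 1
N, DIM = 16, 8
target = [random.randrange(P) for _ in range(DIM)]

def fitness(w):
    """Negative L1 distance to target, using centered field differences."""
    d = 0
    for a, b in zip(w, target):
        diff = (a - b) % P
        d += min(diff, P - diff)          # symmetric |a - b| in the field
    return -d

def crossover(a, b):
    return [x if random.random() < 0.5 else y for x, y in zip(a, b)]

def mutate(w, rate=0.05):
    return [random.randrange(P) if random.random() < rate else x for x in w]

pop = [[random.randrange(P) for _ in range(DIM)] for _ in range(N)]
history = []
for _ in range(200):
    pop.sort(key=fitness, reverse=True)
    history.append(fitness(pop[0]))
    survivors = pop[: N // 4]              # top 25%
    children = [mutate(crossover(*random.sample(survivors, 2)))
                for _ in range(N - 1)]
    pop = [survivors[0]] + children        # elitism: best individual survives
assert history[-1] >= history[0]           # elite fitness never decreases
```

The elitism line is the only deviation from the pseudocode above; it makes best-so-far fitness monotone, which is convenient when every generation must be proven.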
Hybrid path (optional):
- Phase 1: Fixed-point finite-difference gradients for cold start (~1M steps, ~5s)
- Phase 2: Evolutionary refinement for cliff-jumping (continuous, finds strategies gradients miss)
Deliverable: A self-contained Trident program that trains a neural classifier, proves every training step, and outputs verified weights.
1.3 Predictive Trace Analysis
What: A small NN (trained via §1.2) that predicts the 9 AET table heights from TIR features, before compilation and execution.
Why: Every compilation optimization depends on understanding the cost landscape. This provides it cheaply.
Input: TIR graph features — node count, operation type histogram, max nesting depth, branch count, loop bound sum, memory access count. ~32 field elements.
Output: 9 field elements — predicted height of each AET table.
Model: Single hidden layer, 64 units. ~6K parameters. Trains in seconds.
Training data: Compile and execute N programs, record actual table heights. Start with ~1,000, scale to ~100,000.
Uses:
- Fast cost estimation during compilation (no execution needed)
- Identifying proof-expensive programs before committing
- Cost-aware TIR optimization — reject transformations that increase predicted cost
- Training signal for the neural compiler (§2.1)
- Table criticality weights for the identity explorer's usefulness scorer (§0.2)
Accuracy target: Predict the bottleneck table correctly >90% of the time. Exact height within 20%.
Priority 2 — Compiler Intelligence
2.1 Differentiable STARK Cost Surrogate
What: A learned, differentiable approximation of the STARK proving cost. Feed it a TASM sequence, get a predicted proving time.
Why: The actual cost function — $\text{cost}(S) = 2^{\lceil \log_2(\max_t H_t(S)) \rceil}$ — is non-differentiable (power-of-2 ceiling, max over tables). A smooth surrogate enables gradient-based optimization.
Architecture: 1D CNN over TASM instruction sequences. Input: instruction IDs + operands, padded to 128. Output: scalar cost. ~15K parameters.
Training: (TASM_sequence, actual_proving_time) pairs from real proving runs.
Key insight: Doesn't need absolute accuracy. Needs correct ranking — "Is TASM A cheaper than TASM B?" Pairwise ranking accuracy >95% suffices.
Enables: Backpropagation through cost → gradient-guided compilation. Combines with evolution: gradients for smooth landscape, evolution for cliff-jumping.
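The exact cost function the surrogate approximates is simple to state but discontinuous; a minimal sketch of the cliff behavior (`proving_cost` is a hypothetical helper, heights in rows):

```python
import math

def proving_cost(table_heights):
    """Exact cost: the tallest AET table, padded to the next power of two."""
    tallest = max(table_heights)
    return 1 << math.ceil(math.log2(tallest))

assert proving_cost([296, 198]) == 512   # cf. the REPL example: 296 pads to 512
assert proving_cost([512]) == 512        # exactly at a power of two: no padding
assert proving_cost([513]) == 1024       # one extra row doubles proving cost
```

The 513-vs-512 jump is exactly the non-differentiability the surrogate smooths over; ranking, not absolute accuracy, is what gradient guidance needs.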
2.2 Learned Instruction Scheduling
What: Reorder TASM instructions to minimize AET table heights while respecting data dependencies. Only permutes — never modifies instructions.
Why: Correctness guaranteed by construction (dependency-respecting permutations). No verifier needed. Pure upside.
Architecture: graph neural network on the TASM dependency DAG. Outputs priority score per instruction. Schedule greedily by priority.
The GNN sees: instruction type, current stack depth, which AET tables each instruction touches, distance to dependency predecessors/successors.
The GNN learns:
- Cluster hash instructions to avoid interleaving with arithmetic (minimize Hash table padding)
- Front-load RAM operations for pipeline filling
- Delay U32 operations until after main computation (avoid U32 table domination)
Training: For each program, generate 1,000 random valid schedules, measure actual table heights, train GNN to predict height-minimizing ordering.
Model: ~20K parameters. Runs in microseconds.
Safety: The scheduler only outputs dependency-respecting permutations. Enforced by algorithm (topological sort with learned priorities), not by model.
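The safety argument — validity by construction, learned scores only break ties — is visible in a minimal sketch, where a plain priority vector stands in for the GNN output:

```python
import heapq

def schedule(n, deps, priority):
    """Topological sort with learned priorities.
    deps: (before, after) edges; priority: per-instruction scores, higher first."""
    indeg = [0] * n
    succ = [[] for _ in range(n)]
    for a, b in deps:
        succ[a].append(b)
        indeg[b] += 1
    ready = [(-priority[i], i) for i in range(n) if indeg[i] == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        _, i = heapq.heappop(ready)
        order.append(i)
        for j in succ[i]:
            indeg[j] -= 1
            if indeg[j] == 0:
                heapq.heappush(ready, (-priority[j], j))
    return order  # always a valid topological order, whatever the scores

# Instructions 0 and 1 must precede 2; scores only order the ready set.
assert schedule(4, [(0, 2), (1, 2)], [0.1, 0.9, 0.5, 0.8])[0] == 1
```

Even adversarially bad scores can only produce a slow schedule, never an invalid one.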
2.3 Multi-Objective Compiler Ensemble
What: 8-16 specialist TIR→TASM optimizers, each biased toward minimizing a different AET table. Meta-selector picks the winner per program.
Why: Cliff-discontinuity cost function means the bottleneck table shifts between programs. No single model dominates.
Specialist fitness functions:
specialist_0: -max(H_all_tables) # minimize overall max
specialist_1: -H_processor # minimize Processor
specialist_2: -H_hash # minimize Hash
specialist_3: -H_ram # minimize RAM
specialist_4: -(H_processor + H_hash) # minimize joint bottleneck
specialist_5: -(max_H - second_max_H) # maximize balance
...
Meta-selector: Run all specialists, use trace predictor (§1.3) or cost surrogate (§2.1) to pick winner without proving all candidates.
Memory: 16 specialists × 728 KB = 11.6 MB. L2 cache. Latency: 16 × 50μs ≈ 800μs. Still <1ms. Negligible vs. proving (seconds).
2.4 Learned Peephole Patterns
What: 1D convolutional network scanning TASM instruction windows (size 3-8), detecting sequences where local substitution reduces cost.
Why: The evolutionary compiler discovers patterns implicitly. Peephole extraction makes them explicit, composable, transferable.
Architecture: 1D Conv (kernel 5) → ReLU → 1D Conv (kernel 3) → classifier. Input: sliding window. Output: (should_rewrite, replacement_pattern_id).
Training pipeline:
- Run evolutionary compiler on 10,000 programs
- Diff naive lowering vs. evolved output
- Extract per-window changes
- Train conv net to detect replaceable windows and predict replacements
Relationship to Identity Explorer (§0): The explorer discovers algebraically valid identities from field theory. The peephole learner discovers compiler-specific patterns from evolutionary search. They feed the same rule database. The explorer finds "this IS equivalent." The peephole learner finds "this SHOULD be rewritten." Together they produce rules that are both valid AND useful.
Composability: Rules become deterministic passes applied before the neural compiler. Over time, the deterministic rules handle easy cases; the neural compiler focuses on hard ones.
2.5 Neural Decompilation (TASM → TIR)
What: Given optimized TASM, reconstruct a plausible TIR representation.
Why:
- Learn from hand-written TASM (expert knowledge → training data)
- Cross-pollinate optimizations between programs
- Sanity-check neural compiler output via round-trip
- Enable TIR → TASM → TIR' → TASM' equivalence testing
Architecture: Sequence-to-graph model. Attention encoder, autoregressive graph decoder. ~50K parameters.
Training data: Every compilation (TIR → TASM) generates a free training pair. 100,000 compilations → 100,000 pairs.
Correctness: Decompiled TIR is a suggestion. Verify by recompiling and checking equivalence.
Priority 3 — Proving Acceleration
3.1 NN-Guided STARK Prover Configuration
What: RL agent selecting STARK prover parameters per-program to minimize proving time. Proof remains valid regardless — agent only affects speed.
Why: Multiple tunable parameters where optimal settings are program-dependent. Current heuristics leave performance on the table.
Configurable parameters:
| Parameter | Current | Learned |
|---|---|---|
| FRI folding factor | Fixed (e.g., 8) | Per-program adaptive |
| Grinding bits | Fixed (e.g., 20) | Trade proof size vs. time |
| Blowup factor | Fixed (e.g., 4) | Soundness-cost tradeoff |
| Evaluation domain offset | Default coset | Numerically-tuned |
| Memory layout | Sequential | Cache-optimized |
| Parallelism strategy | Default threading | Structure-aware |
Agent: MLP. Input: program features + AET heights + hardware specs. Output: config vector. ~10K parameters.
Training: For each program, run prover with K random configs, record (features, config, time) triples, train agent to predict time-minimizing config.
Safety: Cannot compromise soundness. All configs produce valid proofs or prover rejects and falls back to defaults.
Expected impact: 10-30% proving time reduction. Compounds across all programs.
3.2 Neural Proof Compression
What: Predictor anticipating redundant STARK proof elements, transmitting only the unpredicted delta.
Why: STARK proofs are hundreds of KB. If predictor achieves 80% accuracy, ~5× compression.
Architecture: Small autoregressive model over proof elements. Conditions on program, public inputs, previous elements.
Verification: Verifier runs identical predictor. Uses predictions where correct, transmitted values where not. Full proof is still checked — compression is transport-layer only.
Compatibility: Works with any STARK system. No prover/verifier logic changes.
Requirement: Predictor must be deterministic and version-matched on both sides. Ship as a Trident program (naturally).
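The transport-layer scheme fits in a few lines. A toy sketch with a trivial "repeat the last element" predictor standing in for the learned model:

```python
def compress(elements, predict):
    """Transmit only the elements the shared predictor gets wrong."""
    delta = []
    for i, e in enumerate(elements):
        if predict(elements[:i]) != e:
            delta.append((i, e))
    return delta

def decompress(n, delta, predict):
    """Rebuild the full sequence: predictions where correct, fixes where not."""
    out, fixes = [], dict(delta)
    for i in range(n):
        out.append(fixes.get(i, predict(out)))
    return out

# Toy predictor: guess that the previous element repeats.
predict = lambda prefix: prefix[-1] if prefix else 0
data = [0, 0, 0, 7, 7, 7, 7, 3]
delta = compress(data, predict)
assert decompress(len(data), delta, predict) == data
assert len(delta) == 2   # only the two 'surprising' elements are sent
```

Because reconstruction is exact, the verifier still checks the full proof; only bytes on the wire are saved.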
3.3 Neural Theorem Prover for TASM Equivalence
What: Given two TASM sequences, find a chain of valid rewrite steps transforming one into the other.
Why: A structural proof of equivalence holds for ALL inputs, not just tested ones — genuine formal verification. Each proof also generates reusable optimization knowledge.
Rewrite rules (enumerable, finite):
| Rule | From | To | Condition |
|---|---|---|---|
| Dead push | push X; pop | ∅ | X unused |
| Constant fold | push X; push Y; add | push (X+Y) | Constants |
| Identity elim | push 0; add | ∅ | Always |
| Swap cancel | swap K; swap K | ∅ | Always |
| Commutativity | push X; push Y; add | push Y; push X; add | Always |
| Reorder | A; B | B; A | No dependency |
| Explorer rules | from §0 database | from §0 database | Proven valid |
Key: The identity explorer (§0) feeds rewrite rules directly into the NTP's rule vocabulary. As the explorer discovers deeper algebraic identities, the NTP can prove equivalences across wider transformations. The NTP and the explorer form a mutual amplification loop.
Architecture: Embed TASM sequences via GNN on dependency DAG. Predict rewrite rule to apply at each step. Search guided by embedding distance to target.
Byproduct: Every successful rewrite path IS a new peephole pattern (§2.4). The NTP feeds the peephole database. The peephole database speeds up the NTP. Flywheel.
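A minimal model of the search: breadth-first search over single-rule rewrites stands in for the learned, embedding-guided search (the rule set here is a two-rule subset, purely illustrative):

```python
from collections import deque

RULES = [
    (("push0", "add"), ()),        # identity elimination
    (("swapK", "swapK"), ()),      # swap cancellation
]

def rewrites(seq):
    """All sequences reachable from seq by one rule application."""
    for lhs, rhs in RULES:
        k = len(lhs)
        for i in range(len(seq) - k + 1):
            if seq[i:i + k] == lhs:
                yield seq[:i] + rhs + seq[i + k:]

def find_path(a, b, max_steps=5):
    """BFS for a chain of valid rewrites from a to b; None if not found."""
    frontier, seen = deque([(a, [a])]), {a}
    while frontier:
        seq, path = frontier.popleft()
        if seq == b:
            return path           # each hop is itself a proven-valid rewrite
        if len(path) > max_steps:
            continue
        for nxt in rewrites(seq):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [nxt]))
    return None

a = ("mul", "push0", "add", "swapK", "swapK")
assert find_path(a, ("mul",)) is not None
```

The learned component replaces BFS's blind frontier expansion with rule selection guided by embedding distance to the target; the returned path is the reusable peephole pattern.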
Priority 4 — Developer Experience
4.1 Neural Type Inference
What: Predict type annotations for Trident programs, choosing types that minimize TASM cost.
Why: Different type representations (bool as field element vs. bit, integer width) produce different AET profiles. The "right" type minimizes proof cost.
Architecture: Tree-LSTM on Trident AST. Two-headed output: (type_prediction, expected_cost_delta).
Interface: LSP-style autocomplete for types. Red underline if type choice is >2× more expensive than optimal.
4.2 Incremental Recompilation via Neural Diff
What: Predict which TIR nodes are affected by a source edit, recompile only those subgraphs.
Why: Full recompilation is slow for large programs. Incremental enables interactive development.
Architecture: GNN on TIR dependency graph. Input: (old_TIR, edit_location, new_fragment). Output: per-node affected/stable classification.
Correctness: Conservative — if uncertain, mark as affected. Bias toward recall >99.9%.
Integration: trident watch mode. Save → predict diff → incremental compile → incremental prove. Target: <100ms for single-line edits.
4.3 Fuzzing-Guided Program Synthesis
What: Given input/output examples as field element pairs, synthesize a Trident program satisfying the spec. Triton VM proves correctness.
Why: Specification-first workflow. Write tests, let the network generate code, prove it correct.
Architecture: Seq2seq. Encoder: set of (input, output) pairs → fixed representation. Decoder: autoregressive TIR ops (vocab 54, max length 32). ~40K parameters.
Verification: Compile synthesized program, execute on spec inputs. Match → STARK proof. No match → generate more candidates (beam search K=16).
Feedback: Failed candidates → negative training signal. Successes → added to training set. Synthesizer improves as corpus grows.
Priority 5 — Adversarial Hardening
5.1 Adversarial Program Generation
What: NN generating Trident programs designed to defeat the neural compiler — programs where optimization yields no improvement.
Why: Finds compiler blind spots. Satisfies "100× adversarial load" quality gate. Automated and continuous.
GAN-like loop:
Each epoch:
1. Adversary generates 100 programs
2. Neural compiler optimizes each
3. Measure improvement ratio
4. Adversary reward = programs where compiler failed
5. Compiler trains on adversarial programs
6. Repeat
Convergence: Adversary and compiler reach equilibrium when adversary can't find programs the compiler handles poorly. Equilibrium IS the quality gate.
Feeds the explorer: Adversarial programs that resist optimization often contain instruction patterns where no known algebraic identity helps. These patterns become priority targets for the identity explorer (§0) — "find an identity that covers THIS pattern."
5.2 Equivalence Checker Stress Testing
What: Generate "almost equivalent" TASM pairs — agree on 99.99% of inputs, differ on edge cases. Test whether verification catches the discrepancy.
Why: STARK verifier is sound in theory. Implementation bugs exist. Stress test the pipeline.
Generation: Take correct TASM, apply single mutation (change one operand, swap two instructions with subtle dependency). Test equivalence checker on (original, mutant) pairs.
Target: Zero false positives (mutant declared equivalent). Track false negatives (equivalent pairs declared different) as optimization opportunities.
Priority 6 — Cross-Target Transfer
6.1 Transfer Learning Between Proof Backends
What: When Trident adds new targets (Miden VM, SP1, OpenVM), transfer compiler knowledge from Triton VM.
Why: TIR-level patterns generalize across backends. Only the lowering changes.
Architecture: Split neural compiler into shared TIR encoder + per-target backend decoder.
Transfer: Train on Triton VM → freeze encoder → train only decoder for new backend → fine-tune if needed.
Data efficiency: New backend needs ~10% of Triton VM's training data.
Identity explorer transfer: Layer 0-1 identities (arithmetic) transfer directly. Layer 2+ identities are field-specific but architecture-dependent — need re-validation per backend. The validation pipeline from §0.2 handles this automatically.
Dependency Graph
┌──────────────────────────────────────────┐
│ [0] ALGEBRAIC IDENTITY EXPLORER │
│ (runs continuously, feeds everything) │
└──────┬──────────────┬────────────────────┘
│ │
│ ├──→ [2.4 Peephole Patterns]
│ ├──→ [3.3 NTP Equivalence] ──→ [2.4]
│ └──→ [5.1 Adversarial Gen] ──→ [0]
│
▼
[1.1 nn.trd] ──→ [1.2 Evolutionary Training]
│
├──→ [1.3 Trace Predictor]
│ │
│ ├──→ [2.1 Cost Surrogate]
│ │ │
│ │ └──→ [2.3 Compiler Ensemble]
│ │
│ └──→ [3.1 Prover Config Agent]
│
├──→ [2.2 Instruction Scheduling]
│
├──→ [2.5 Neural Decompilation]
│
├──→ [4.1 Type Inference]
│
├──→ [4.3 Program Synthesis]
│ │
│ └──→ [5.1 Adversarial Gen]
│
└──→ [6.1 Transfer Learning]
[3.2 Proof Compression] ← needs proof corpus, independent of compiler
[4.2 Incremental Recompile] ← needs TIR graph only, independent
[5.2 Equivalence Stress Test] ← needs TASM mutation + verifier, independent
Implementation Timeline
| Phase | Items | Effort | What you learn |
|---|---|---|---|
| Phase 0a: Explorer Bootstrap | §0 Phase A-B (brute-force + NN filter) | 3 weeks | Field structure empirically, validation pipeline |
| Phase 0b: NN Foundation | §1.1, §1.2 | 2 weeks | Field-native NN, evolutionary training, world-first provable NN |
| Phase 1: Cost Intelligence | §1.3, §2.1 | 2 weeks | AET cost landscape, differentiable optimization |
| Phase 2: Compiler Core | §2.2, §2.3, §2.4 | 3 weeks | Scheduling, ensembles, peephole extraction |
| Phase 3: Explorer Maturity | §0 Phase C-D (GFlowNet + composition) | 3 weeks | Deep algebraic identities, compositional search |
| Phase 4: Proving | §3.1, §3.2 | 3 weeks | Prover internals, proof structure |
| Phase 5: Developer UX | §4.1, §4.2, §4.3 | 4 weeks | Language ergonomics, synthesis |
| Phase 6: Hardening | §5.1, §5.2, §3.3 | 3 weeks | Adversarial robustness, formal equivalence |
| Phase 7: Multi-Target | §6.1, §2.5 | 2 weeks | Backend abstraction, transfer |
| Continuous | §0 Phase E (explorer 24/7) | Ongoing | Unbounded improvement |
Total structured work: ~25 weeks. Phase 0a + 0b alone produce two publishable results: a provable neural network and an automated algebraic identity discoverer.
The explorer never stops. After Phase 3, it runs continuously. Every week the rule database grows. Every month, cumulative proving cost drops. The gap between Trident and every other proof-system language widens — because no other language has a neural network mining its own field theory for optimizations.
The Endgame
A traditional compiler is a fixed artifact. Engineers write passes, ship a version, write more passes, ship another version. The passes are finite. The compiler converges to a local optimum defined by the engineers' knowledge.
A Trident compiler with the identity explorer is an open-ended learning system. The algebraic structure of the Goldilocks field is finite but astronomically large. The explorer maps it continuously. The rule database grows monotonically. The compiler improves without human intervention.
Every program ever written in Trident gets cheaper to prove over time — not because the hardware improved, but because the compiler discovered a new algebraic identity that applies to a pattern in that program.
This is the moat. Not features. Not performance benchmarks at a point in time. A compiler that gets better every day, powered by the bottomless well of algebraic structure in the field it was born to compute over.
The deeper you go into the field theory, the more optimizations you find. There is no bottom.
purpose. link. energy.
Part X: Mathematical Foundations
10.1 Categorical Semantics for the Type System
Trident's types form a category. Functions are morphisms. The compiler is a functor from the Trident category to the TASM category. Proving that this functor preserves equivalences gives compiler correctness as a mathematical theorem, not a test suite.
Concrete application: If two Trident programs are equivalent (produce the same output for all inputs), the compiled TASM must also be equivalent. The categorical framework makes this a proven property of the compiler rather than a hoped-for empirical observation.
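The commuting-square claim can be made concrete with a toy model. This is plain Python, not Trident's actual IR or TASM: a two-operation expression language, a naive stack target, and a check that compiling then running equals direct evaluation at every sample point. The names (`evaluate`, `compile_expr`, `run`) are illustrative, not Trident APIs.

```python
# Toy sketch of "compiler as functor": the square
# compile-then-run == evaluate must commute for every input.
P = 2**64 - 2**32 + 1  # Goldilocks prime

def evaluate(expr, x):
    """Denotational semantics of the toy source language."""
    op = expr[0]
    if op == "x":   return x % P
    if op == "lit": return expr[1] % P
    if op == "add": return (evaluate(expr[1], x) + evaluate(expr[2], x)) % P
    if op == "mul": return (evaluate(expr[1], x) * evaluate(expr[2], x)) % P

def compile_expr(expr):
    """The map on morphisms: source expression -> stack program."""
    op = expr[0]
    if op == "x":   return [("push_x",)]
    if op == "lit": return [("push", expr[1] % P)]
    return compile_expr(expr[1]) + compile_expr(expr[2]) + [(op,)]

def run(program, x):
    """Operational semantics of the toy stack machine."""
    stack = []
    for ins in program:
        if ins[0] == "push_x":
            stack.append(x % P)
        elif ins[0] == "push":
            stack.append(ins[1])
        else:
            b, a = stack.pop(), stack.pop()
            stack.append((a + b) % P if ins[0] == "add" else (a * b) % P)
    return stack[0]

# (x + 3) * x: semantics preserved at several points, including wraparound.
e = ("mul", ("add", ("x",), ("lit", 3)), ("x",))
for x in (0, 1, 7, P - 1):
    assert run(compile_expr(e), x) == evaluate(e, x)
```

The categorical framework upgrades this pointwise check into a theorem quantified over all expressions and all inputs.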
10.2 Galois Theory for Extension Fields
When Trident operates over extension fields ($\mathbb{F}_{p^2}$, $\mathbb{F}_{p^4}$), the Galois group structure enables automatic optimization of extension field arithmetic. Frobenius automorphisms ($x \mapsto x^p$) act as the identity on the base field and cost only a few negations or constant multiplications in small extensions, versus a full exponentiation if computed naively. The compiler can use Galois theory to find the cheapest way to compute extension field operations.
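The Frobenius claim is easy to verify on a small model. The sketch below uses $p = 7$ and $\mathbb{F}_{49} = \mathbb{F}_7[u]/(u^2 - 3)$ (3 is a quadratic non-residue mod 7), chosen purely for illustration: the $p$-th power map, which naively costs a full exponentiation, collapses to a single negation of the $u$-coordinate.

```python
# Minimal sketch: Frobenius in F_{p^2} is conjugation a + b*u -> a - b*u,
# because u^p = d^((p-1)/2) * u = -u when d is a non-residue.
p, d = 7, 3  # F_49 = F_7[u]/(u^2 - d)

def mul(x, y):
    """(x0 + x1*u) * (y0 + y1*u) with u^2 = d."""
    x0, x1 = x
    y0, y1 = y
    return ((x0 * y0 + d * x1 * y1) % p, (x0 * y1 + x1 * y0) % p)

def pow_ext(x, n):
    """Square-and-multiply exponentiation in F_{p^2} (the naive cost)."""
    result = (1, 0)
    while n:
        if n & 1:
            result = mul(result, x)
        x = mul(x, x)
        n >>= 1
    return result

def frobenius(x):
    """x -> x^p computed as one negation instead of an exponentiation."""
    a, b = x
    return (a, (-b) % p)

# The p-th power map agrees with conjugation on every element of F_49.
for a in range(p):
    for b in range(p):
        assert pow_ext((a, b), p) == frobenius((a, b))
```

The same collapse happens in $\mathbb{F}_{p^2}$ over Goldilocks; only the modulus changes.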
10.3 Algebraic Geometry for Constraint Systems
STARK constraints define an algebraic variety over the Goldilocks field. The geometry of this variety determines the proof's structure. The compiler can use algebraic geometry to:
- Detect redundant constraints (constraints implied by others → remove for free)
- Find the minimal constraint set that defines the same variety
- Identify singular points that could cause prover issues
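For linear constraints, "implied by the others" reduces to linear dependence, which Gaussian elimination over the field detects exactly. Below is a minimal Python sketch under that simplifying assumption; general polynomial constraint systems need Gröbner-basis machinery, which this does not attempt. The function name `redundant_rows` is illustrative, not a compiler API.

```python
# Sketch: detect linear constraints implied by earlier ones, via
# row reduction mod p. A row that reduces to zero adds no new
# equation to the variety and can be dropped for free.
P = 2**64 - 2**32 + 1  # Goldilocks prime

def redundant_rows(constraints):
    """Return indices of constraints linearly implied by earlier ones."""
    pivots = []       # independent rows kept so far
    redundant = []
    for idx, row in enumerate(constraints):
        row = [c % P for c in row]
        for prow in pivots:
            lead = next(i for i, c in enumerate(prow) if c)
            if row[lead]:
                # Eliminate this row's entry at the pivot column.
                factor = row[lead] * pow(prow[lead], P - 2, P) % P
                row = [(r - factor * q) % P for r, q in zip(row, prow)]
        if any(row):
            pivots.append(row)
        else:
            redundant.append(idx)
    return redundant

# Third constraint is the sum of the first two: geometrically free.
cs = [[1, 2, 0], [0, 1, 5], [1, 3, 5]]
assert redundant_rows(cs) == [2]
```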
Implementation Priority
| Priority | Cluster | Items | Effort | Impact |
|---|---|---|---|---|
| P0 | Algebraic Passes | Constant folding, Fermat reduction, strength reduction, batch inversion, dead elimination, algebraic CSE | 4 weeks | 20-40% proof cost reduction for free |
| P1 | Type System | Proof-cost types, refinement types, linear types for crypto | 6 weeks | Catches entire classes of bugs at compile time |
| P2 | Formal Verification | Requires/ensures, invariant-carrying loops | 4 weeks | Proof of execution = proof of correctness |
| P3 | Supercompilation | Driving + folding over field arithmetic, loop-to-closed-form | 6 weeks | Orders of magnitude reduction for iterative code |
| P4 | Neural Compilation | nn.trd, evolutionary training, trace predictor, cost surrogate | 8 weeks | Self-improving compiler |
| P5 | Cryptographic Primitives | Private/Public types, commitment syntax, Merkle iterators | 4 weeks | 10× developer productivity for ZK code |
| P6 | Developer Tooling | Cost highlighting, REPL, proof explorer, CI/CD | 6 weeks | Adoption accelerator |
| P7 | Self-Hosting | Trident compiler in Trident, bootstrap verification | 8 weeks | Ultimate trust elimination |
| P8 | Execution Models | Lazy proving, incremental proving, speculative execution | 6 weeks | Throughput optimization |
| P9 | Interoperability | Proof-carrying code, cross-VM composition, foreign proofs | 6 weeks | Ecosystem integration |
| P10 | Math Foundations | Categorical semantics, Galois optimization, algebraic geometry | Ongoing | Theoretical correctness backbone |
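As one illustration of the P0 cluster, the batch-inversion pass rests on Montgomery's trick: $n$ field inversions for the price of one Fermat exponentiation plus about $3(n-1)$ multiplications. A minimal Python sketch over the Goldilocks prime, illustrative only and not the compiler pass itself:

```python
# Montgomery's batch-inversion trick: accumulate prefix products,
# invert the total once, then peel inverses off in reverse.
P = 2**64 - 2**32 + 1  # Goldilocks prime

def batch_invert(xs):
    """Invert every element of xs (all nonzero) with one exponentiation."""
    prefix = [1]
    for x in xs:                       # prefix[i] = xs[0] * ... * xs[i-1]
        prefix.append(prefix[-1] * x % P)
    inv = pow(prefix[-1], P - 2, P)    # single Fermat inversion
    out = [0] * len(xs)
    for i in reversed(range(len(xs))):
        out[i] = inv * prefix[i] % P   # inverse of xs[i]
        inv = inv * xs[i] % P          # now inverse of xs[0..i-1] product
    return out

xs = [3, 5, 7, 12345]
assert all(x * y % P == 1 for x, y in zip(xs, batch_invert(xs)))
```

In a proof system, where each inversion is a nontrivial constraint, amortizing $n$ inversions into one is a direct proof-cost win, which is why it sits in P0.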
The Algebraic Advantage — Why This Cannot Be Replicated
Every technique in this document exploits a single fact: Trident operates over a mathematically structured execution domain. The Goldilocks field has algebraic identities, group structure, roots of unity, extension field theory, and Galois symmetries. No general-purpose language has any of this.
A C compiler cannot apply Fermat's little theorem because C integers don't live in a field.
A Rust compiler cannot batch inversions because Rust values don't have multiplicative inverses.
A Python interpreter cannot NTT-vectorize because Python lists aren't sequences over a prime field with roots of unity.
A Solidity compiler cannot supercompile over field arithmetic because the EVM's 256-bit integers lack the structure of a proper prime field.
Trident can do ALL of these things because it was designed, from genesis, for computation over $\mathbb{F}_p$. The algebraic structure isn't bolted on — it's the foundation. Every optimization in this document is a consequence of that single design decision.
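The NTT point is directly checkable: the Goldilocks multiplicative group has order $2^{64} - 2^{32} = 2^{32}(2^{32} - 1)$, so power-of-two roots of unity exist up to order $2^{32}$. The sketch below is a naive $O(n^2)$ transform, purely to exhibit the structure; it assumes 7 as a multiplicative generator (the value used by common Goldilocks implementations — substitute any generator if that assumption is wrong).

```python
# Naive NTT round trip over Goldilocks, witnessing the roots of unity
# that make field-native vectorization possible.
P = 2**64 - 2**32 + 1

def ntt(a, omega):
    """O(n^2) number-theoretic transform; enough to show the structure."""
    n = len(a)
    return [sum(a[j] * pow(omega, i * j, P) for j in range(n)) % P
            for i in range(n)]

n = 8
# Assumption: 7 generates the multiplicative group of Goldilocks.
omega = pow(7, (P - 1) // n, P)          # primitive n-th root of unity
assert pow(omega, n, P) == 1 and pow(omega, n // 2, P) != 1

a = [1, 2, 3, 4, 0, 0, 0, 0]
fwd = ntt(a, omega)
inv_n = pow(n, P - 2, P)
back = [x * inv_n % P for x in ntt(fwd, pow(omega, P - 2, P))]
assert back == a                          # inverse transform recovers input
```

No such $\omega$ exists for machine integers mod $2^{64}$, which is exactly why a conventional compiler cannot perform this rewrite.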
This is the unbounded opportunity. The deeper you go into the field theory, the more optimizations you find. There is no bottom — every branch of algebra offers new compiler passes that no other language can implement. And every new pass reduces proving cost, which compounds across every program ever written in Trident.
The language that proves itself. The compiler that optimizes itself. The proof system that verifies itself. Not by trust, but by algebra.
purpose. link. energy.