⚡ Optimization Guide

Strategies for reducing the proving cost of Trident programs.

Note: This guide focuses on the Triton VM target. Other backends have different cost profiles.

📊 Understanding Cost

Triton VM proves computation correctness using six execution tables. The proving cost is determined by the tallest table, padded to the next power of two. Reducing the tallest table has the most impact; reducing a table that is already shorter than the tallest has no effect on proving cost. See How STARK Proofs Work Section 11 for the exact proving time formula.

The Six Tables

Table	Grows With	Typical Driver
Processor	Every instruction executed	Loop iterations, function calls
Hash	Every `hash` instruction	Tip5 calls (6 rows per hash)
U32	Range checks, bitwise ops	`as_u32`, `split`, `&`, `^`, `log2`, `pow`
Op Stack	Stack underflow handling	Deep variable access, large structs
RAM	Memory read/write	Array indexing, struct field access, spilling
Jump Stack	`call` and `return`	Function calls, if/else branches

Reading a Cost Report

trident build main.tri --costs

Output shows each function's cost across all tables. The dominant table is the one that determines the padded height. Focus optimization efforts there.

Tracking Costs Over Time

Save a baseline and compare after changes:

trident build main.tri --save-costs before.json
# ... make changes ...
trident build main.tri --compare before.json

The comparison shows which functions got cheaper or more expensive.

⚡ Optimization Strategies

1. Reduce Hash Table Cost

The Hash table is often the tallest because each hash / call adds 6 rows.

Strategies

Batch hashing: Each tip5 call costs 6 Hash Table rows regardless of how many of its 10 inputs you actually use. Batching 3 single-value hashes into 1 call saves 12 hash rows. Pack up to 10 field elements into a single tip5 call:

// Expensive: 3 hash calls = 18 hash rows
let h1: Digest = tip5(a, 0, 0, 0, 0, 0, 0, 0, 0, 0)
let h2: Digest = tip5(b, 0, 0, 0, 0, 0, 0, 0, 0, 0)
let h3: Digest = tip5(c, 0, 0, 0, 0, 0, 0, 0, 0, 0)

// Cheaper: 1 hash call = 6 hash rows
let h: Digest = tip5(a, b, c, 0, 0, 0, 0, 0, 0, 0)

Use sponge for streaming: For more than 10 elements, use the sponge API instead of multiple tip5 calls:

sponge_init()
sponge_absorb(e0, e1, e2, e3, e4, e5, e6, e7, e8, e9)
sponge_absorb(e10, e11, e12, e13, e14, e15, e16, e17, e18, e19)
let d: Digest = sponge_squeeze()

Reduce Merkle tree depth: Each level costs 6 hash rows. Depth-3 = 18 hash rows per proof. Depth-4 = 24. Depth-20 = 120. If you're near a power-of-2 boundary, even one extra level can double proving cost.

2. Reduce Processor Table Cost

The Processor table grows with every instruction. Loops are the main contributor.

Strategies

Minimize loop body size: Move invariant computations outside the loop:

// Before: constant recomputed each iteration
for i in 0..100 bounded 100 {
    let threshold: Field = compute_threshold()  // same every time
    process(i, threshold)
}

// After: compute once
let threshold: Field = compute_threshold()
for i in 0..100 bounded 100 {
    process(i, threshold)
}

Reduce iteration count: If possible, restructure to need fewer iterations.
Use tighter bounds: The bounded value determines the worst-case unrolling. Set it as tight as possible.

3. Reduce U32 Table Cost

U32 operations (range checks, bitwise ops) are relatively expensive.

Strategies

Stay in Field when possible: If you don't need range-checked 32-bit arithmetic, use Field instead of U32. Field operations use the Processor table (cheap) rather than the U32 table.

// Expensive: U32 operations
let a: U32 = as_u32(pub_read())
let b: U32 = as_u32(pub_read())
let sum: U32 = a + b

// Cheaper (if range check not needed): Field operations
let a: Field = pub_read()
let b: Field = pub_read()
let sum: Field = a + b

Minimize as_u32 conversions: Each as_u32 call uses split which costs U32 table rows. Convert once and reuse.
Avoid unnecessary split: The /% (divmod) operator implicitly uses split. If you only need the quotient, you still pay for both.

4. Reduce Op Stack and RAM Table Cost

These grow with deep variable access and memory operations.

Strategies

Keep hot variables shallow: Variables accessed frequently should be declared close to their use. The compiler uses LRU-based spilling -- frequently accessed variables stay on the stack, infrequent ones spill to RAM.
Minimize struct size: Larger structs consume more stack depth per access. If a struct has fields you rarely use, consider splitting it.
Prefer stack over RAM: Direct stack operations (dup, swap) are cheaper than RAM read/write. The compiler manages this automatically, but keeping your function's live variable count under 16 field elements avoids spilling entirely.

5. Reduce Jump Stack Cost

Every function call adds 2 rows (call + return) to the Jump Stack table. Every if/else branch also uses calls internally.

Strategies

Inline small functions: If a function is called in a tight loop and has a small body, consider inlining it manually. The compiler does not perform automatic inlining.
Reduce branching in loops: Each if/else inside a loop adds jump stack overhead per iteration.

// More jump stack overhead (2 calls per iteration)
for i in 0..100 bounded 100 {
    if condition {
        do_a()
    } else {
        do_b()
    }
}

💡 Compiler Hints

The compiler provides optimization hints with --hints:

Hint	Meaning	Action
H0001	Hash-dominated cost	Batch hash inputs, reduce Merkle depth
H0002	Large array access pattern	Consider RAM-based access or smaller arrays
H0003	Deep function call chain	Inline hot functions
H0004	Stack boundary warning	Reduce live variables or struct sizes

📝 Per-Line Cost Annotations

Use --annotate to see which lines contribute most:

trident build main.tri --annotate

Each line shows its cost contribution in compact form: cc (clock cycles), hash, u32. Lines with no cost are unmarked. Focus on the lines with the highest numbers.

🔥 Hotspot Analysis

Use --hotspots to see the top 5 most expensive functions:

trident build main.tri --hotspots

This immediately shows where to focus optimization efforts.

📋 Common Patterns

Merkle Proof Verification

Merkle proofs are hash-heavy. Each level adds 6 hash rows + U32 table rows. Use the shallowest tree that fits your data:

Depth	Hash Rows	Padded Height Impact
1	6	Negligible
3	18	Low (Neptune kernel default)
10	60	Moderate
20	120	High (can dominate)
32	192	Very high

// Depth 3: 3 hash calls = 18 hash rows (Neptune default)
std.crypto.merkle.verify3(leaf, root, idx)

// Depth 1: 1 hash call = 6 hash rows
std.crypto.merkle.verify1(leaf, root, idx)

Token Operations

For coins, the main cost drivers are:

Leaf hashing (computing Merkle leaves)
Merkle authentication (proving leaf membership)
State updates (new leaf computation)

Minimize the number of Merkle operations per transaction.

Digest Comparison

Comparing digests requires comparing all 5 field elements:

// Use the std.debug.assert.digest builtin (5 equality checks)
std.debug.assert.digest(actual, expected)

This is cheaper than manual element-by-element comparison because it uses native assert instructions.

RAM-Aware Hashing

When absorbing data that's already in RAM, prefer sponge_absorb_mem over reading to the stack first:

// Expensive: 10 RAM reads + sponge_absorb = 10 cc + 10 RAM rows + 6 hash rows
let a: Field = vm.io.mem.read(addr)
// ... read 9 more values ...
vm.crypto.hash.sponge_absorb(a, b, c, d, e, f, g, h, i, j)

// Cheaper: sponge_absorb_mem = 1 cc + 10 RAM rows + 6 hash rows
vm.crypto.hash.sponge_absorb_mem(addr)

Both cost the same 6 Hash Table rows, but sponge_absorb_mem saves ~10 processor cycles by avoiding individual read_mem instructions.

📋 Summary

Identify the dominant table with --costs
Find the expensive functions with --hotspots
Locate expensive lines with --annotate
Apply targeted optimizations for the dominant table
Verify improvement with --compare

The goal is not to minimize every table, but to bring the tallest table down. Once two tables are similar height, optimize the one that is now tallest.

🔗 See Also

Compiling a Program -- Build pipeline, --costs / --hotspots flags
Generating Proofs -- How proving cost translates to proof generation time
IR Reference -- TIR operations and lowering paths

Dimensions

optimization

math/optimization

Finding the best solution from a set of feasible alternatives by minimizing or maximizing an objective function. convex optimization:: global optimum is guaranteed; the tractable case non-convex optimization:: multiple local optima; requires heuristics or global search gradient descent::…

trident/docs/guides/optimization.md

⚡ Optimization Guide

📊 Understanding Cost

The Six Tables

Reading a Cost Report

Tracking Costs Over Time

⚡ Optimization Strategies

1. Reduce Hash Table Cost

Strategies

2. Reduce Processor Table Cost

Strategies

3. Reduce U32 Table Cost

Strategies

4. Reduce Op Stack and RAM Table Cost

Strategies

5. Reduce Jump Stack Cost

Strategies

💡 Compiler Hints

📝 Per-Line Cost Annotations

🔥 Hotspot Analysis

📋 Common Patterns

Merkle Proof Verification

Token Operations

Digest Comparison

RAM-Aware Hashing

📋 Summary

🔗 See Also

Dimensions

Local Graph