⚡ Optimization Guide
Strategies for reducing the proving cost of Trident programs.
Note: This guide focuses on the Triton VM target. Other backends have different cost profiles.
📊 Understanding Cost
Triton VM proves computation correctness using six execution tables. The proving cost is determined by the tallest table, padded to the next power of two. Reducing the tallest table has the most impact; reducing a table that is already shorter than the tallest has no effect on proving cost. See How STARK Proofs Work Section 11 for the exact proving time formula.
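As a hedged sketch of that cost model (illustrative only, not the exact Triton VM formula), the padded height depends solely on the tallest table:

```python
# Illustrative sketch of the "tallest table, padded to the next power
# of two" cost model described above -- not the exact proving formula.
def padded_height(table_heights):
    tallest = max(table_heights)
    return 1 << (tallest - 1).bit_length()  # next power of two

heights = {"processor": 900, "hash": 1200, "u32": 300}
print(padded_height(heights.values()))  # 2048 (driven by the Hash table)

# Shrinking a non-dominant table does not change the padded height:
heights["u32"] = 50
print(padded_height(heights.values()))  # still 2048
```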
The Six Tables
| Table | Grows With | Typical Driver |
|---|---|---|
| Processor | Every instruction executed | Loop iterations, function calls |
| Hash | Every hash instruction | Tip5 calls (6 rows per hash) |
| U32 | Range checks, bitwise ops | `as_u32`, `split`, `&`, `^`, `log2`, `pow` |
| Op Stack | Stack underflow handling | Deep variable access, large structs |
| RAM | Memory read/write | Array indexing, struct field access, spilling |
| Jump Stack | `call` and `return` | Function calls, `if`/`else` branches |
Reading a Cost Report
trident build main.tri --costs
Output shows each function's cost across all tables. The dominant table is the one that determines the padded height. Focus optimization efforts there.
Tracking Costs Over Time
Save a baseline and compare after changes:
trident build main.tri --save-costs before.json
# ... make changes ...
trident build main.tri --compare before.json
The comparison shows which functions got cheaper or more expensive.
⚡ Optimization Strategies
1. Reduce Hash Table Cost
The Hash table is often the tallest because each `hash` call adds 6 rows.
Strategies
- Batch hashing: Each `tip5` call costs 6 Hash Table rows regardless of how many of its 10 inputs you actually use. Batching 3 single-value hashes into 1 call saves 12 hash rows. Pack up to 10 field elements into a single `tip5` call:
// Expensive: 3 hash calls = 18 hash rows
let h1: Digest = tip5(a, 0, 0, 0, 0, 0, 0, 0, 0, 0)
let h2: Digest = tip5(b, 0, 0, 0, 0, 0, 0, 0, 0, 0)
let h3: Digest = tip5(c, 0, 0, 0, 0, 0, 0, 0, 0, 0)
// Cheaper: 1 hash call = 6 hash rows
let h: Digest = tip5(a, b, c, 0, 0, 0, 0, 0, 0, 0)
- Use sponge for streaming: For more than 10 elements, use the sponge API instead of multiple `tip5` calls:
sponge_init()
sponge_absorb(e0, e1, e2, e3, e4, e5, e6, e7, e8, e9)
sponge_absorb(e10, e11, e12, e13, e14, e15, e16, e17, e18, e19)
let d: Digest = sponge_squeeze()
- Reduce Merkle tree depth: Each level costs 6 hash rows. Depth-3 = 18 hash rows per proof. Depth-4 = 24. Depth-20 = 120. If you're near a power-of-2 boundary, even one extra level can double proving cost.
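To see why one extra level matters near a boundary, here is a hedged back-of-envelope calculation (illustrative proof counts; 6 rows per level as stated above):

```python
ROWS_PER_LEVEL = 6  # Hash Table rows per Merkle level (see above)

def merkle_hash_rows(depth, num_proofs):
    return ROWS_PER_LEVEL * depth * num_proofs

def padded(n):
    return 1 << (n - 1).bit_length()  # next power of two

# 85 depth-2 proofs: 1020 hash rows, padded to 1024.
print(padded(merkle_hash_rows(2, 85)))  # 1024
# The same 85 proofs one level deeper: 1530 rows, padded to 2048.
print(padded(merkle_hash_rows(3, 85)))  # 2048
```

If the Hash table is dominant, that single extra level doubles the padded height.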
2. Reduce Processor Table Cost
The Processor table grows with every instruction. Loops are the main contributor.
Strategies
- Minimize loop body size: Move invariant computations outside the loop:
// Before: constant recomputed each iteration
for i in 0..100 bounded 100 {
let threshold: Field = compute_threshold() // same every time
process(i, threshold)
}
// After: compute once
let threshold: Field = compute_threshold()
for i in 0..100 bounded 100 {
process(i, threshold)
}
- Reduce iteration count: If possible, restructure the computation to need fewer iterations.
- Use tighter bounds: The `bounded` value determines the worst-case unrolling. Set it as tight as possible.
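For example (a sketch in the Trident loop syntax used above, with a hypothetical `process` function and upper limit `n`):

```
// Loose: the compiler budgets for 1000 iterations even if
// n never exceeds 64 -- the bound drives worst-case cost.
for i in 0..n bounded 1000 {
    process(i)
}

// Tight: the bound matches the real maximum.
for i in 0..n bounded 64 {
    process(i)
}
```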
3. Reduce U32 Table Cost
U32 operations (range checks, bitwise ops) are relatively expensive.
Strategies
- Stay in Field when possible: If you don't need range-checked 32-bit arithmetic, use `Field` instead of `U32`. Field operations use the Processor table (cheap) rather than the U32 table.
// Expensive: U32 operations
let a: U32 = as_u32(pub_read())
let b: U32 = as_u32(pub_read())
let sum: U32 = a + b
// Cheaper (if range check not needed): Field operations
let a: Field = pub_read()
let b: Field = pub_read()
let sum: Field = a + b
- Minimize `as_u32` conversions: Each `as_u32` call uses `split`, which costs U32 table rows. Convert once and reuse.
- Avoid unnecessary `split`: The `/` and `%` (divmod) operations implicitly use `split`. If you only need the quotient, you still pay for both.
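A sketch of the convert-once pattern (hypothetical variable names, same builtins as above):

```
// Costly: two as_u32 calls on the same value pay split twice
let x: Field = pub_read()
let a: U32 = as_u32(x)
let b: U32 = as_u32(x)  // second split, same result

// Cheaper: convert once and reuse
let u: U32 = as_u32(x)
```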
4. Reduce Op Stack and RAM Table Cost
These grow with deep variable access and memory operations.
Strategies
- Keep hot variables shallow: Variables accessed frequently should be declared close to their use. The compiler uses LRU-based spilling -- frequently accessed variables stay on the stack, infrequent ones spill to RAM.
- Minimize struct size: Larger structs consume more stack depth per access. If a struct has fields you rarely use, consider splitting it.
- Prefer stack over RAM: Direct stack operations (`dup`, `swap`) are cheaper than RAM reads/writes. The compiler manages this automatically, but keeping your function's live variable count under 16 field elements avoids spilling entirely.
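As an illustration of the struct-splitting advice (Trident struct syntax assumed here; field names are hypothetical), keep the hot fields in a small struct so each access pushes few elements:

```
// Assumed syntax: a 2-element hot struct stays cheap to access...
struct AccountHot {
    balance: Field,
    nonce: Field,
}

// ...while rarely-touched fields live in a separate struct and
// only cost stack depth when actually used.
struct AccountCold {
    created_at: Field,
    label: Field,
}
```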
5. Reduce Jump Stack Cost
Every function call adds 2 rows (call + return) to the Jump Stack table. Every if/else branch also uses calls internally.
Strategies
- Inline small functions: If a function is called in a tight loop and has a small body, consider inlining it manually. The compiler does not perform automatic inlining.
- Reduce branching in loops: Each `if`/`else` inside a loop adds jump stack overhead per iteration.
// More jump stack overhead (2 calls per iteration)
for i in 0..100 bounded 100 {
if condition {
do_a()
} else {
do_b()
}
}
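When the condition does not depend on the loop variable, one rewrite is to hoist the branch so the jump-stack cost is paid once rather than per iteration (same `condition`, `do_a`, and `do_b` as above):

```
// Less jump stack overhead: branch once, then loop
if condition {
    for i in 0..100 bounded 100 {
        do_a()
    }
} else {
    for i in 0..100 bounded 100 {
        do_b()
    }
}
```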
💡 Compiler Hints
The compiler provides optimization hints with `--hints`:
| Hint | Meaning | Action |
|---|---|---|
| H0001 | Hash-dominated cost | Batch hash inputs, reduce Merkle depth |
| H0002 | Large array access pattern | Consider RAM-based access or smaller arrays |
| H0003 | Deep function call chain | Inline hot functions |
| H0004 | Stack boundary warning | Reduce live variables or struct sizes |
📝 Per-Line Cost Annotations
Use `--annotate` to see which lines contribute most:
trident build main.tri --annotate
Each line shows its cost contribution in compact form: cc (clock cycles), hash, u32. Lines with no cost are unmarked. Focus on the lines with the highest numbers.
🔥 Hotspot Analysis
Use `--hotspots` to see the top 5 most expensive functions:
trident build main.tri --hotspots
This immediately shows where to focus optimization efforts.
📋 Common Patterns
Merkle Proof Verification
Merkle proofs are hash-heavy. Each level adds 6 hash rows + U32 table rows. Use the shallowest tree that fits your data:
| Depth | Hash Rows | Padded Height Impact |
|---|---|---|
| 1 | 6 | Negligible |
| 3 | 18 | Low (Neptune kernel default) |
| 10 | 60 | Moderate |
| 20 | 120 | High (can dominate) |
| 32 | 192 | Very high |
// Depth 3: 3 hash calls = 18 hash rows (Neptune default)
std.crypto.merkle.verify3(leaf, root, idx)
// Depth 1: 1 hash call = 6 hash rows
std.crypto.merkle.verify1(leaf, root, idx)
Token Operations
For coins, the main cost drivers are:
- Leaf hashing (computing Merkle leaves)
- Merkle authentication (proving leaf membership)
- State updates (new leaf computation)
Minimize the number of Merkle operations per transaction.
Digest Comparison
Comparing digests requires comparing all 5 field elements:
// Use the std.debug.assert.digest builtin (5 equality checks)
std.debug.assert.digest(actual, expected)
This is cheaper than manual element-by-element comparison because it uses native assert instructions.
RAM-Aware Hashing
When absorbing data that's already in RAM, prefer sponge_absorb_mem over reading to the stack first:
// Expensive: 10 RAM reads + sponge_absorb = 10 cc + 10 RAM rows + 6 hash rows
let a: Field = vm.io.mem.read(addr)
// ... read 9 more values ...
vm.crypto.hash.sponge_absorb(a, b, c, d, e, f, g, h, i, j)
// Cheaper: sponge_absorb_mem = 1 cc + 10 RAM rows + 6 hash rows
vm.crypto.hash.sponge_absorb_mem(addr)
Both cost the same 6 Hash Table rows, but sponge_absorb_mem saves ~10 processor cycles by avoiding individual read_mem instructions.
📋 Summary
- Identify the dominant table with `--costs`
- Find the expensive functions with `--hotspots`
- Locate expensive lines with `--annotate`
- Apply targeted optimizations for the dominant table
- Verify improvement with `--compare`
The goal is not to minimize every table, but to bring the tallest table down. Once two tables are similar height, optimize the one that is now tallest.
🔗 See Also
- Compiling a Program -- Build pipeline, `--costs`/`--hotspots` flags
- Generating Proofs -- How proving cost translates to proof generation time
- IR Reference -- TIR operations and lowering paths