UniversalHash v4: A Democratic Proof-of-Work Algorithm
Version 1.0 — January 2026
Abstract: UniversalHash v4 is a novel proof-of-work algorithm designed to minimize the performance gap between consumer devices (smartphones, laptops) and specialized hardware (servers, GPUs, ASICs). By combining memory-hard techniques with parallel chain constraints, hardware-accelerated cryptographic primitives, and sequential dependencies, UniversalHash achieves a phone-to-desktop ratio of approximately 1:3-5, compared to 1:50-100+ in traditional PoW algorithms. This paper presents the complete design rationale, comparative analysis with existing CPU-oriented algorithms, security considerations, and reference implementations for both provers and verifiers.
Table of Contents
- Introduction
- Problem Statement
- Design Goals
- Comparative Analysis
- Algorithm Specification
- Security Analysis
- Performance Projections
- Economic Model: Self-Regulating Proof Density
- Bostrom Integration: Pool-less Epoch Mining
- Interplanetary Consensus: Adaptive Epochs
- Implementation: Verifier (Rust)
- Implementation: Prover (Multi-Platform)
- Integration Guidelines
- Future Work
- Conclusion
1. Introduction
The centralization of cryptocurrency mining represents one of the most significant threats to blockchain decentralization. Despite Bitcoin's original vision of "one CPU, one vote," modern mining is dominated by specialized hardware that excludes ordinary participants. This concentration of hash power creates systemic risks: 51% attacks become economically feasible, geographic centralization around cheap electricity introduces geopolitical vulnerabilities, and the barrier to participation undermines the democratic foundations of decentralized systems.
UniversalHash v4 addresses these challenges by designing a proof-of-work algorithm from first principles with democratic participation as the primary objective. Rather than optimizing for raw throughput or energy efficiency in isolation, we optimize for accessibility — ensuring that the billions of smartphones and consumer devices worldwide can meaningfully contribute to network security.
1.1 Key Contributions
- Parallel Chain Architecture: A 4-chain structure that naturally caps parallelism advantages while fitting perfectly on 4-core budget phones
- Triple Primitive Rotation: Mandatory efficient implementation of AES, SHA-256, and Blake3 raises ASIC development costs
- Memory-Optimized Parameters: 2MB total scratchpad (512KB × 4 chains) fits in phone L3 cache while preventing GPU memory coalescing
- Sequential Dependency Chains: Each round depends on the previous, defeating massive parallelism
- Hardware Acceleration Utilization: Leverages AES-NI, SHA extensions, and ARM crypto instructions present in 95%+ of post-2015 devices
2. Problem Statement
2.1 The Hashrate Gap
Current proof-of-work algorithms exhibit extreme performance disparities between device classes:
| Comparison | SHA-256 (Bitcoin) | Ethash (Historical) | RandomX |
|---|---|---|---|
| Phone vs Desktop | 1:10,000+ | 1:500+ | 1:50-100 |
| Phone vs GPU | 1:1,000,000+ | 1:200+ | 1:5-10 |
| Phone vs ASIC | 1:10,000,000,000+ | N/A | N/A |
These ratios effectively exclude consumer devices from meaningful participation. A single ASIC farm produces more hash power than all smartphones on Earth combined.
2.2 Economic Barriers
Beyond raw performance, economic factors compound centralization:
- Capital Requirements: ASIC miners cost MATH_PLACEHOLDER_015,000+ per unit
- Electricity Costs: Industrial operations access $0.02-0.05/kWh rates unavailable to consumers
- Technical Expertise: Operating mining farms requires specialized knowledge
- Geographic Concentration: Mining gravitates toward regions with cheap power and favorable regulations
2.3 Previous Attempts
Several algorithms have attempted to address these issues:
CryptoNight (Monero, pre-2019): 2MB scratchpad with random reads/writes. Ultimately defeated by ASICs due to predictable memory access patterns.
RandomX (Monero, 2019-present): Virtual machine executing random programs. Highly effective against ASICs but requires 2GB memory, excluding phones and resource-constrained devices.
VerusHash 2.1 (Verus): Haraka512-based, optimized for AES-NI. Excellent CPU/GPU parity but not designed for mobile devices.
XelisHash v3 (Xelis, 2024): ChaCha8 + Blake3 with "parallelism tax." Achieves ~1:1 CPU/GPU ratio but borderline for phone L3 cache (544KB scratchpad).
3. Design Goals
UniversalHash v4 was designed with the following prioritized objectives:
3.1 Primary Goals
-
Democratic Accessibility (Phone:Desktop ≤ 1:5)
- Any smartphone manufactured after 2018 should meaningfully participate
- Budget devices (MATH_PLACEHOLDER_15,000)
-
ASIC Resistance (Custom silicon advantage < 3×)
- Economic infeasibility of ASIC development
- Natural adaptation through algorithm parameters if ASICs emerge
-
GPU Neutralization (GPU advantage < 5×)
- Sequential dependencies that prevent massive parallelism
- Random memory access patterns hostile to coalesced reads
3.2 Secondary Goals
-
Implementation Simplicity
- Core algorithm expressible in < 50 lines of code
- Easy auditing and verification
- Portable to WASM, ARM, x86 without exotic dependencies
-
Verification Efficiency
- Block validation in < 5ms on consumer hardware
- Light clients can verify without full scratchpad computation
-
Future-Proofing
- Adjustable parameters without hard forks
- Graceful degradation if hardware landscape changes
4. Comparative Analysis
4.1 CPU-Oriented PoW Algorithms (2025-2026 Landscape)
| Algorithm | Memory | Primitives | Phone Support | CPU/GPU Ratio | ASIC Status |
|---|---|---|---|---|---|
| RandomX | 2080 MB (fast) / 256 MB (light) | VM + AES + Blake2b + Argon2 | ❌ No (memory) | ~1:1 | None (6+ years) |
| VerusHash 2.1 | ~4 KB | Haraka512 (AES-based) | ⚠️ Limited | ~1:0.3 (CPU wins) | FPGA defeated by 2.1 |
| XelisHash v3 | 544 KB | ChaCha8 + Blake3 + AES | ⚠️ Borderline | ~1:1 | None |
| Yescrypt R32 | Variable (8-32 MB) | Scrypt-based | ❌ No | ~1:1 | None |
| GhostRider | 2 MB | 15 algorithms rotating | ⚠️ Limited | ~1:2 | None |
| UniversalHash v4 | 2 MB (4×512KB) | AES + SHA-256 + Blake3 | ✅ Yes | ~1:4-5 | Target: None |
4.2 Detailed Algorithm Comparison
RandomX
Strengths:
- Gold standard for ASIC resistance (no efficient ASIC after 6+ years)
- Virtual machine makes optimization extremely difficult
- Audited by 4 independent security teams
- Successfully deployed on Monero since November 2019
Weaknesses:
- 2GB memory requirement excludes phones (256MB light mode is 4-6× slower)
- Complex implementation (~15,000 lines of C++)
- JIT compilation required for competitive performance
- 32-bit devices cannot run efficiently
ASIC Resistance Mechanism: Random program execution with data-dependent branches, floating-point operations, and memory-hard SuperscalarHash.
VerusHash 2.1
Strengths:
- CPU actually faster than GPU (unique achievement)
- Simple construction based on well-studied Haraka512
- Successfully defeated FPGA attack via algorithm revision
- Hybrid PoW/PoS reduces pure hashrate dependence
Weaknesses:
- Limited adoption and ecosystem
- Relies heavily on AES-NI availability
- Not optimized for mobile devices
- Smaller scratchpad may be vulnerable to future attacks
ASIC Resistance Mechanism: Extreme reliance on AES-NI acceleration; custom silicon would need to match Intel/AMD AES performance.
XelisHash v3
Strengths:
- Elegant "parallelism tax" — memory scales with thread count
- CPU/GPU parity (~1:1 ratio achieved)
- Production-tested on live network (December 2024 hard fork)
- Simpler than RandomX
Weaknesses:
- 544KB scratchpad is borderline for budget phone L3 cache
- Not designed with phones as primary target
- Branching logic adds implementation complexity
- Relatively new, less battle-tested
ASIC Resistance Mechanism: Memory bandwidth wall that scales with parallelism; 16-way branching hurts fixed-function hardware.
4.3 UniversalHash v4 Design Decisions
Based on this analysis, UniversalHash v4 makes the following design choices:
| Decision | Rationale |
|---|---|
| 2MB total memory (4×512KB) | Fits all phone L3 caches; larger than XelisHash for margin |
| 4 parallel chains | Matches budget phone core count; caps server advantage |
| AES + SHA-256 + Blake3 | All have hardware acceleration; Blake3 especially ASIC-hostile |
| Sequential inner loop | Prevents GPU parallelism within single hash |
| Simple XOR merge | Auditable finalization; no complex reduction function |
5. Algorithm Specification
5.1 Parameters
CHAINS = 4 // Parallel computation chains
SCRATCHPAD_KB = 512 // Per-chain scratchpad size (512 KB)
TOTAL_MEMORY = CHAINS × SCRATCHPAD_KB × 1024 // 2 MB total
ROUNDS = 12288 // Iterations per chain
BLOCK_SIZE = 64 // Bytes per memory block (512 bits)
NUM_BLOCKS = SCRATCHPAD_KB × 1024 / BLOCK_SIZE // 8192 blocks per chain
5.2 High-Level Structure
UniversalHash_v4(header, nonce) → hash256:
1. Generate 4 independent seeds from (header, nonce)
2. Initialize 4 scratchpads (512 KB each) using AES expansion
3. Execute 4 parallel mixing chains (12288 rounds each)
4. Merge chain states via XOR
5. Finalize with SHA-256(Blake3(merged_state))
5.3 Detailed Specification
5.3.1 Seed Generation
For each chain c ∈ [0, 3]:
golden_ratio = 0x9E3779B97F4A7C15 // Fibonacci hashing constant
seed[c] = Blake3_256(header || nonce ⊕ (c × golden_ratio))
The golden ratio constant ensures statistically independent seeds even for sequential nonces.
5.3.2 Scratchpad Initialization
Each chain's scratchpad is filled using AES-based expansion:
For chain c:
key = AES_KeyExpand(seed[c][0:16])
state = seed[c][16:32]
For i = 0 to NUM_BLOCKS - 1:
state = AES_4Rounds(state, key)
scratchpad[c][i × 64 : (i+1) × 64] = state || AES_4Rounds(state, key)
This leverages AES-NI on x86 and ARM Crypto Extensions for near-memory-bandwidth-limited initialization.
5.3.3 Chain Mixing (Core Loop)
For chain c (parallelizable across chains):
state = seed[c]
primitive = (nonce + c) mod 3
For i = 0 to ROUNDS - 1:
// Unpredictable address computation
mixed = state[0:8] ⊕ state[8:16] ⊕ rotl64(i, 13) ⊕ (i × 0x517cc1b727220a95)
addr = (mixed mod NUM_BLOCKS) × BLOCK_SIZE
// Memory read (creates latency dependency)
block = scratchpad[c][addr : addr + 64]
// Rotate primitive selection
primitive = (primitive + 1) mod 3
// Apply selected cryptographic primitive
if primitive == 0:
state = AES_Compress(state, block) // 4 AES rounds
else if primitive == 1:
state = SHA256_Compress(state, block) // SHA-256 compression function
else:
state = Blake3_Compress(state, block) // Blake3 compression (7 rounds)
// Write-back (read-after-write dependency)
scratchpad[c][addr : addr + 32] = state[0:32]
5.3.4 State Merge and Finalization
combined = states[0]
For c = 1 to 3:
combined = combined ⊕ states[c]
result = Blake3_256(SHA256_256(combined))
The double-hash finalization provides defense in depth against potential weaknesses in either primitive.
5.4 Primitive Specifications
AES_Compress
Input: 256-bit state, 512-bit block
Process:
state_low = state[0:128]
state_high = state[128:256]
For round = 0 to 3:
state_low = AESENC(state_low, block[round × 128 : (round+1) × 128])
state_high = AESENC(state_high, block[(round+2) × 128 : (round+3) × 128 mod 512])
Output: state_low || state_high
SHA256_Compress
Input: 256-bit state, 512-bit block
Process: Standard SHA-256 compression function
Output: 256-bit compressed state
Blake3_Compress
Input: 256-bit state, 512-bit block
Process: Blake3 compression function (7 rounds, flags=0)
Output: 256-bit compressed state
6. Security Analysis
6.1 ASIC Resistance
Multi-Primitive Requirement: An efficient ASIC must implement optimized circuits for:
- AES encryption (well-understood, ~0.1mm² in modern process)
- SHA-256 compression (well-understood, ~0.05mm²)
- Blake3 compression (less common, ~0.15mm² estimated)
The rotation between primitives prevents specialization. An ASIC optimized for only one primitive would spend 66% of cycles underutilized.
Memory Latency Wall: With 12,288 rounds per chain and random addressing:
- Each round requires a memory read with unpredictable address
- Minimum latency: ~40-80ns for DRAM access
- Theoretical minimum time: 12,288 × 4 chains × 50ns = ~2.5ms per hash
Adding more memory channels provides diminishing returns; the sequential dependency chain is the bottleneck.
Economic Analysis:
- Custom ASIC development: $5-20M
- Expected advantage over CPU: <3× (latency-bound)
- Break-even hashrate: Economically infeasible for reasonable network sizes
6.2 GPU Resistance
Sequential Dependencies: Each round's address computation depends on the previous round's state:
mixed[i] = f(state[i-1]) // Cannot precompute
addr[i] = g(mixed[i]) // Cannot precompute
state[i] = h(state[i-1], scratchpad[addr[i]]) // Must wait
This creates a latency chain that limits effective GPU parallelism to 4 (one per chain).
Random Memory Access: GPU memory systems are optimized for coalesced access (adjacent threads reading adjacent addresses). UniversalHash's random addressing pattern causes:
- Cache thrashing in L1/L2
- Uncoalesced global memory reads
- Memory bandwidth utilization <10% of theoretical peak
Parallel Chain Limit: With only 4 chains, a GPU with 10,000+ cores can only execute 4 concurrent hash attempts per SM. The massive parallelism advantage is eliminated.
6.3 Cryptographic Security
Pre-image Resistance: Finding an input that produces a specific output requires inverting Blake3(SHA256(state)), which inherits the security of both primitives (~256-bit security).
Collision Resistance: The double-hash finalization provides collision resistance from both SHA-256 (~128-bit birthday bound) and Blake3.
Grinding Resistance: The multi-chain XOR merge ensures all chains must be computed; an attacker cannot selectively compute favorable chains.
6.4 Known Attack Vectors
Time-Memory Tradeoff: The 2MB scratchpad is small enough that precomputation attacks are theoretically possible but impractical:
- Storing all possible scratchpad states: 2^256 × 2MB = infeasible
- Partial precomputation: Benefits limited by write-back operation that modifies scratchpad
Side-Channel Attacks: The deterministic primitive rotation (0→1→2→0...) is timing-predictable. However:
- All three primitives have constant-time implementations available
- Rotation pattern is public and identical for all miners
7. Performance Projections
7.1 Estimated Hashrates
Based on analysis of hardware capabilities and algorithm characteristics:
| Device Class | Cores Used | Est. Hashrate | Power | Efficiency (H/J) | Ratio |
|---|---|---|---|---|---|
| Budget Android (2019-2022) | 4 | 600-1,000 H/s | 3-5W | 180-250 | 1× |
| Mid-range Android (2023+) | 6 | 900-1,400 H/s | 4-7W | 200-280 | 1.3-1.5× |
| Flagship Android (2024+) | 8 | 1,200-1,800 H/s | 5-9W | 200-300 | 1.5-2× |
| iPhone 14-16 | 6 | 1,400-2,200 H/s | 5-10W | 220-320 | 1.8-2.5× |
| MacBook Air M2-M4 | 8 | 1,800-3,000 H/s | 10-20W | 150-200 | 2.5-3.5× |
| Gaming Desktop (Ryzen 7/9) | 8 | 2,200-3,800 H/s | 60-100W | 30-50 | 3-4.5× |
| High-end Desktop (i9/Ryzen 9) | 16→8 | 3,000-4,500 H/s | 80-140W | 30-40 | 4-5× |
| Server (EPYC/Xeon) | 32→8 | 4,000-7,000 H/s | 200-350W | 15-25 | 5-8× |
| RTX 4090 | 4 (limited) | 2,500-4,000 H/s | 300-400W | 8-12 | 3.5-5× |
Key Observations:
- Phone vs Desktop: 1:3-5 (target achieved)
- Phone vs Server: 1:5-8 (capped by parallel chain limit)
- Phone vs GPU: 1:3.5-5 (GPU advantage nearly eliminated)
- Per-watt efficiency leader: Smartphones and Apple Silicon
- Per-dollar efficiency leader: Budget Android devices
7.2 Comparison with Other Algorithms
| Algorithm | Phone:Desktop | Phone:GPU | Phone:Server |
|---|---|---|---|
| SHA-256 | 1:10,000+ | 1:1,000,000+ | 1:10,000,000+ |
| RandomX | 1:50-100 | 1:5-10 | 1:200+ |
| XelisHash v3 | 1:15-20 | 1:1 | 1:40-60 |
| VerusHash 2.1 | 1:20-30 | 1:0.3 | 1:50-80 |
| UniversalHash v4 | 1:3-5 | 1:3.5-5 | 1:5-8 |
7.3 Network Security Implications
Assuming a network with 1 billion potential participants:
| Device Distribution | Avg Hashrate | Total Contribution |
|---|---|---|
| 800M budget phones | 800 H/s | 640 GH/s (65%) |
| 150M mid/flagship phones | 1,400 H/s | 210 GH/s (21%) |
| 30M laptops | 2,000 H/s | 60 GH/s (6%) |
| 15M desktops | 3,500 H/s | 52.5 GH/s (5%) |
| 5M gaming PCs | 3,600 H/s | 18 GH/s (2%) |
| 50K servers | 6,000 H/s | 0.3 GH/s (<1%) |
| 100K GPUs | 3,500 H/s | 0.35 GH/s (<1%) |
| Total | ~981 GH/s ≈ 1 TH/s |
In this model, consumer phones contribute 86% of network hashrate despite having lower individual performance, achieving true democratization. Servers and GPUs combined contribute less than 1%.
Realistic Active Mining Scenario:
Not all devices mine continuously. Assuming 1% active participation:
| Active Miners | Network Hashrate |
|---|---|
| 8M budget phones | 6.4 GH/s |
| 1.5M mid/flagship | 2.1 GH/s |
| 300K laptops | 0.6 GH/s |
| 150K desktops | 0.53 GH/s |
| Total Active | ~10 GH/s |
8. Economic Model: Self-Regulating Proof Density
A critical innovation in UniversalHash v4 is its self-regulating economic model. Rather than relying on difficulty adjustment alone to control proof submission rates, we introduce a gas-deducted reward system that creates natural market equilibrium for proof density.
8.1 Core Economics
Net Reward Formula:
net_reward = (epoch_reward × proof_difficulty / total_difficulty) - gas_cost
A rational miner only submits a proof when:
proof_difficulty > gas_cost × total_difficulty / epoch_reward
This creates a dynamic density threshold that emerges from market forces:
┌─────────────────────────────────────────────────────────────────────┐
│ EQUILIBRIUM DYNAMICS │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ High congestion → High gas → Only dense proofs profitable │
│ ↑ ↓ │
│ │ Fewer submissions │
│ │ ↓ │
│ More submissions ←── Low gas ←── Lower congestion │
│ │
└─────────────────────────────────────────────────────────────────────┘
8.2 Permissionless Entry: Deferred Gas Model
The key insight: a valid PoW proof is its own authentication. No signature needed, no prior token balance needed.
// Special transaction type - no upfront payment required
// Proof is self-authenticating
New Miner Journey (Zero Tokens Required):
1. Download app, generate keypair
2. Start mining (no registration, no staking, no tokens)
3. Find hash that beats current min_profitable_difficulty
4. Submit PoWProofTx (no gas payment upfront)
5. Proof included in block (validators know it's economically viable)
6. At epoch end: receive (reward - gas) directly minted to address
7. Now have tokens! Can continue mining or use network
8.3 Epoch Settlement
8.4 Anti-Spam: Minimum Profitable Difficulty
Validators need a rule to reject unprofitable proofs (prevents mempool spam):
8.5 EIP-1559 Style Gas Market
8.6 Equilibrium Analysis
Simulation: How the market finds equilibrium (10-minute epoch, 10M miners):
| Gas Price | Min Profitable Diff | Proofs/Miner/Epoch | Total Proofs | Tx/s |
|---|---|---|---|---|
| 0.0001 | 1× target | ~480 | 4.8B | 8M ❌ |
| 0.001 | 10× target | ~48 | 480M | 800K ❌ |
| 0.01 | 100× target | ~4.8 | 48M | 80K ❌ |
| 0.1 | 1000× target | ~0.48 | 4.8M | 8K ⚠️ |
| 1.0 | 10000× target | ~0.048 | 480K | ~800 ✅ |
The market naturally converges to ~500-1000 tx/s when gas price adjusts to make only dense proofs profitable.
8.7 Economic Attack Resistance
| Attack | Mechanism | Defense |
|---|---|---|
| Spam minimum-difficulty proofs | Flood mempool with cheap proofs | Gas price rises → min_profitable_difficulty rises → spam unprofitable |
| Self-dealing (submit to self as validator) | Validator includes own low-diff proofs | Still pays gas to network; other validators can reject |
| Difficulty gaming | Withhold proofs to manipulate next epoch | Proofs timestamped; late submission rejected |
| Sybil proofs | Split hashrate across addresses | No advantage; same total difficulty, same rewards |
8.8 Summary: Self-Regulating Economics
| Property | Mechanism |
|---|---|
| Permissionless entry | Proof = authentication; gas deducted from rewards |
| Spam resistance | Economic threshold rejects unprofitable proofs |
| Load balancing | EIP-1559 adjusts gas → adjusts proof density |
| Fair distribution | Difficulty-weighted rewards; no pool needed |
| Attack resistance | Economic incentives align with honest behavior |
9. Bostrom Integration: Pool-less Epoch Mining
UniversalHash v4 is designed for integration with Bostrom's high-performance PoS consensus. Rather than traditional PoW block production, we propose an epoch-based proof submission model that leverages Bostrom's existing infrastructure.
9.1 Architecture Overview
┌─────────────────────────────────────────────────────────────────┐
│ BOSTROM PoS LAYER │
│ (5s blocks, 10k tx/s) │
├─────────────────────────────────────────────────────────────────┤
│ Epoch N │ Epoch N+1 │ Epoch N+2 │ ... │
│ (10 min) │ (10 min) │ (10 min) │ │
├─────────────────┴─────────────────┴─────────────────┴───────────┤
│ PoW PROOF SUBMISSIONS │
│ (Miners submit proofs as transactions) │
└─────────────────────────────────────────────────────────────────┘
Key Insight: PoW doesn't require separate blocks. Miners submit PoW proofs as regular Bostrom transactions, and rewards are distributed at epoch boundaries.
9.2 Pool-less Mining Model
Traditional pools exist because:
- Variance reduction (small miners find blocks infrequently)
- Payout smoothing
- Infrastructure (stratum servers, etc.)
Why pools aren't needed here:
| Factor | Traditional PoW | Bostrom PoW |
|---|---|---|
| Block production | Miners produce blocks | PoS produces blocks |
| Proof submission | Must win race | Submit anytime in epoch |
| Variance | Winner-take-all per block | Proportional per epoch |
| Infrastructure | Need pool servers | Use existing PoS chain |
9.3 Proof Submission Protocol
Miner Workflow:
1. Fetch current epoch parameters (seed, difficulty, gas price)
2. Compute UniversalHash proofs locally
3. When proof meets min_profitable_difficulty: submit as Bostrom tx
4. Continue mining until epoch ends
5. At epoch boundary: receive (reward - gas) minted to address
Proof Transaction Structure (Self-Authenticating):
Why No Signature?
hash = UniversalHash(epoch_seed || miner_address || timestamp || nonce)
If attacker changes miner_address:
→ hash changes → no longer below difficulty → proof invalid
The PoW itself authenticates the miner_address.
This enables permissionless mining with zero prior tokens.
Verification (on-chain):
9.4 Epoch Parameters
Recommended Configuration:
| Parameter | Value | Rationale |
|---|---|---|
| Epoch duration | 10 min (120 PoS blocks) | Fast feedback loop, good UX |
| Target proofs/epoch | 300,000 - 500,000 | ~500 tx/s proof load |
| Difficulty adjustment | Per-epoch | Track active hashrate |
| Proof size | ~120 bytes | Minimal (no signature needed) |
Difficulty Targeting:
target_proofs_per_epoch = 300,000
actual_proofs_last_epoch = measured
new_difficulty = old_difficulty × (actual / target)
max_adjustment = 4× per epoch
Note: The gas market (Section 8) provides additional self-regulation on top of difficulty adjustment.
9.5 Transaction Load Analysis
Scenario: 10M Active Miners with Self-Regulating Economics
With the gas-deducted reward model (Section 8), the market naturally finds equilibrium:
Miners: 10,000,000
Epoch: 10 min = 600 seconds
Gas equilibrium: ~0.03 proofs/miner/epoch (only dense proofs profitable)
Proof submissions: 300K / 600s = 500 tx/s
This leaves 9,500 tx/s for cyberlinks and other operations.
| Scenario | Total Proofs/Epoch | Proof Tx/s | Remaining Capacity |
|---|---|---|---|
| Low gas (early network) | 1M | 1,667 | 8,333 tx/s |
| Equilibrium | 300K | 500 | 9,500 tx/s |
| High congestion | 100K | 167 | 9,833 tx/s |
The gas market automatically throttles proof density when network is congested.
9.6 PoW DAG for Fast Sync (Kaspa-inspired)
For blockchain synchronization without frequent blocks, we can use a lightweight DAG structure:
Epoch N Seed ────┬────────────────────────────────────────┐
│ │
▼ ▼
┌─────────┐ ┌─────────┐ ┌─────────┐
│ Proof 1 │────▶│ Proof 2 │────▶│ Proof 3 │──▶ ...
│ (refs 0)│ │ (refs 1)│ │ (refs 2)│
└─────────┘ └─────────┘ └─────────┘
│ │ │
└──────────────┴──────────────┘
│
▼
Epoch N+1 Seed
DAG Proof Structure:
Benefits:
- Fast sync: New nodes download DAG proofs, verify PoW chain
- Ordering: DAG provides partial ordering within epoch
- No separate PoW blocks: Everything lives on Bostrom chain
- Parallel verification: Multiple proofs can reference same parents
DAG Parameters:
| Parameter | Value | Rationale |
|---|---|---|
| Max parents | 3 | Limit verification cost |
| Parent selection | Random from recent 100 | Prevent centralization |
| DAG depth per epoch | ~1000 layers | ~3.6s average between layers |
9.7 Reward Distribution
Per-Epoch Distribution:
epoch_reward = base_emission × epoch_duration
For each miner m with valid proofs:
work_share[m] = sum(difficulty_of_proofs[m]) / total_difficulty
reward[m] = epoch_reward × work_share[m]
Difficulty-Weighted Shares:
Rather than counting proofs, weight by actual difficulty beaten:
9.8 Optimal Epoch Duration Analysis
For Bostrom integration, "block time" is actually "epoch time". With self-regulating economics (Section 8), the gas market naturally controls proof density regardless of epoch length:
| Epoch Duration | Proofs/Epoch | Tx Load | UX Quality | Mars Compatible |
|---|---|---|---|---|
| 10 minutes | ~300K | ~500 tx/s | Excellent | ❌ (max distance) |
| 20 minutes | ~300K | ~250 tx/s | Good | ⚠️ (partial) |
| 30 minutes | ~300K | ~167 tx/s | Moderate | ✅ (27% window) |
| 45 minutes | ~300K | ~111 tx/s | Okay | ✅ (51% window) |
| 1 hour | ~300K | ~83 tx/s | Slow | ✅ (63% window) |
Note: Proof count is approximately constant due to gas-market equilibrium. Longer epochs just spread the same proofs over more time.
Recommendation: 10-minute epochs (Earth default) provide:
- Fast feedback loop for miners
- ~500 tx/s leaves 9,500 tx/s for other operations
- Self-regulating via gas market (no manual tuning)
- Can extend to 45-60 minutes when Mars participation required (see Section 10)
9.9 Sync Without Pools
Traditional sync problem:
- Light clients need to verify PoW chain
- Without pools, many small miners = slow block times = slow sync
Bostrom solution:
- PoS chain is canonical (fast finality)
- PoW proofs are just transactions
- Sync PoS chain normally
- PoW history is fully embedded
Fast sync protocol:
1. Sync Bostrom PoS chain (existing fast sync)
2. PoW proofs are already included in blocks
3. Optionally verify PoW DAG for specific epochs
4. No separate PoW sync needed!
9.10 Summary: Bostrom PoW Design
| Aspect | Design Choice |
|---|---|
| Consensus | Bostrom PoS (unchanged) |
| PoW role | Fair distribution + device verification |
| Block production | None (proofs are transactions) |
| Pooling | Not needed (proportional rewards) |
| Epoch duration | 10 minutes (Earth default) |
| Target proof rate | ~500 tx/s (~300K proofs/epoch) |
| Gas model | Deferred (deducted from rewards) |
| DAG structure | Optional, for ordering within epoch |
| Difficulty adjustment | Per-epoch, targeting proof count |
| Rewards | Proportional to difficulty-weighted work minus gas |
10. Interplanetary Consensus: Adaptive Epochs
As humanity expands beyond Earth, consensus systems must account for light-speed communication delays. UniversalHash v4 includes provisions for interplanetary mining from genesis.
10.1 The Earth-Mars Latency Problem
| Mars Position | One-Way Delay | Round-Trip |
|---|---|---|
| Closest approach | ~3 min | ~6 min |
| Average distance | ~12.5 min | ~25 min |
| Maximum (opposition) | ~22 min | ~44 min |
Critical insight: For a miner to participate in an epoch, they need:
- Receive epoch parameters (or compute them deterministically)
- Mine proofs
- Submit proofs before epoch ends (plus grace period)
10.2 Why Fixed 10-Minute Epochs Fail for Mars
Epoch length: 10 min = 600 seconds
Mars max latency: 22 min = 1320 seconds (one way)
Even with deterministic epoch seeds (Mars can start immediately):
Useful mining window = epoch_length - submission_latency
= 600 - 1320 = -720s ❌ IMPOSSIBLE
Mars literally cannot participate in 10-minute epochs at maximum distance.
10.3 Epoch Length vs Mars Mining Window
Assuming deterministic epoch seeds (best case, Mars can start immediately):
| Epoch Length | Mars Window (max latency) | Mars:Earth Ratio | Earth UX |
|---|---|---|---|
| 10 min | ❌ Negative | 0% | Excellent |
| 30 min | 8 min | 27% | Good |
| 45 min | 23 min | 51% | Okay |
| 1 hour | 38 min | 63% | Moderate |
| 2 hours | 98 min | 82% | Poor |
10.4 Astronomical Epoch Duration
Since Earth-Mars distance is deterministically computable from orbital mechanics, epochs can adapt automatically:
// JPL ephemeris simplified - production uses SPICE kernels
Epoch schedule over 26-month Mars synodic period:
┌─────────────────────────────────────────────────────────────────┐
│ CLOSE │ RECEDING │ FAR │ APPROACHING │
│ ~3 months │ ~5 months │ ~10 months │ ~8 months │
├─────────────────────────────────────────────────────────────────┤
│ 10 min epochs│ 20 min epochs│ 45-60 min │ 20 min epochs │
│ Fast UX │ Moderate │ Slow but fair│ Moderate │
└─────────────────────────────────────────────────────────────────┘
10.5 Deterministic Epoch Seeds for Zero-Wait Mining
For Mars to start mining immediately without waiting for Earth signals:
// WRONG: Requires waiting for previous epoch results
// RIGHT: Fully deterministic from genesis
With deterministic seeds, Mars can pre-compute future epoch seeds and start mining immediately when each epoch begins.
10.6 Grace Period for Deep Space
Even with adaptive epochs, edge cases exist. Add a grace window for late arrivals:
10.7 Phased Deployment
| Phase | Epoch Duration | Features |
|---|---|---|
| Phase 1 (Earth-only) | 10 min fixed | Standard deployment |
| Phase 2 (Mars prep) | 10 min + deterministic seeds | Enable zero-wait mining |
| Phase 3 (Mars live) | Adaptive (10-60 min) | Astronomical computation on-chain |
| Phase 4 (Outer solar system) | Extended (1-6 hours) | Jupiter, Saturn colony support |
10.8 Summary: Interplanetary Design
| Parameter | Value | Mechanism |
|---|---|---|
| Base epoch (Earth-only) | 10 minutes | Fast UX, low variance |
| Adaptive epoch | 10-60 minutes | Based on Earth-Mars distance |
| Epoch seed | Deterministic | Zero-wait mining for all locations |
| Grace period | +max_latency | Accept late proofs with valid timestamps |
| Epoch computation | On-chain astronomical model | Deterministic, no coordination |
11. Implementation: Verifier (Rust)
The verifier implementation prioritizes correctness, readability, and auditability over raw performance.
//! UniversalHash v4 Reference Implementation - Verifier
//!
//! This implementation is optimized for correctness and clarity.
//! Production deployments should use hardware-accelerated primitives.
use ;
use blake3;
use Aes128;
use ;
/// Algorithm parameters
pub const CHAINS: usize = 4;
pub const SCRATCHPAD_KB: usize = 512;
pub const SCRATCHPAD_SIZE: usize = SCRATCHPAD_KB * 1024;
pub const ROUNDS: usize = 12288;
pub const BLOCK_SIZE: usize = 64;
pub const NUM_BLOCKS: usize = SCRATCHPAD_SIZE / BLOCK_SIZE;
/// Golden ratio constant for seed diversification
const GOLDEN_RATIO: u64 = 0x9E3779B97F4A7C15;
/// Multiplicative constant for address mixing
const MIX_CONSTANT: u64 = 0x517cc1b727220a95;
/// 256-bit state representation
/// Block header structure
/// Main hash function
/// Generate independent seed for chain c
/// Initialize scratchpad using AES expansion
/// Execute the mixing chain for a single chain
/// AES-based compression function
/// SHA-256 compression function
/// Blake3 compression function
/// Final hash computation
/// Verify a proof-of-work solution
12. Implementation: Prover (Multi-Platform)
9.1 C Reference Implementation (Portable)
/**
* UniversalHash v4 - Portable C Implementation
*
* Compile with: gcc -O3 -march=native -o prover prover.c -lcrypto -lblake3
*
* For production: use platform-specific intrinsics (see below)
*/
typedef struct state256_t;
typedef struct block_header_t;
// Rotate left
static inline uint64_t
// Generate seed for chain
static void
// Initialize scratchpad
static void
// AES compress
static void
// SHA-256 compress
static void
// Blake3 compress
static void
// Execute chain
static void
// Finalize
static void
// Main hash function
void
// Mining function
int
9.2 ARM-Optimized Implementation (Mobile/Apple Silicon)
/**
* UniversalHash v4 - ARM NEON + Crypto Extensions
*
* Optimized for: iPhone, Android (ARMv8+), Apple Silicon
* Compile with: clang -O3 -march=armv8-a+crypto -o prover_arm prover_arm.c
*/
// Hardware-accelerated AES round
static inline uint8x16_t
// Hardware-accelerated SHA-256
static inline void
// Optimized scratchpad initialization using NEON AES
static void
9.3 x86-64 Optimized Implementation (Intel/AMD)
/**
* UniversalHash v4 - x86-64 with AES-NI and SHA Extensions
*
* Compile with: gcc -O3 -march=native -maes -msha -o prover_x86 prover_x86.c
*/
// AES-NI round
static inline __m128i
// SHA-NI acceleration
static inline void
// Optimized scratchpad initialization
static void
9.4 WebAssembly Implementation (Browser Mining)
/**
* UniversalHash v4 - WebAssembly Implementation
*
* For browser-based mining via WASM
*/
// AssemblyScript version (compiles to WASM)
// Compile with: asc prover.ts -o prover.wasm -O3
const CHAINS: u32 = 4;
const SCRATCHPAD_SIZE: u32 = 512 * 1024;
const ROUNDS: u32 = 12288;
const BLOCK_SIZE: u32 = 64;
const NUM_BLOCKS: u32 = SCRATCHPAD_SIZE / BLOCK_SIZE;
// Memory layout: 4 scratchpads + working memory
// Total: 4 * 512KB = 2MB + overhead
@inline
: u64
: void
// Execute chains
const states = <StaticArray<u64>>;
// Merge states
const combined = <u64>;
// Finalize and store result
;
}
: StaticArray<u64>
// Export for JavaScript
: u64
13. Integration Guidelines
10.1 Blockchain Integration
Difficulty Adjustment:
new_difficulty = old_difficulty × (target_time / actual_time)
target_time = 60 seconds (recommended)
adjustment_period = 720 blocks (~12 hours)
max_adjustment = 4× per period
Block Header Structure:
struct BlockHeader {
version: u32,
previous_hash: [u8; 32],
merkle_root: [u8; 32],
timestamp: u64,
difficulty_bits: u32,
nonce: u64, // Incremented by miners
extra_nonce: [u8; 8] // Extended search space
}
Validation:
10.2 Mining Pool Protocol
Stratum-compatible work distribution:
Share submission:
10.3 Light Client Verification
For resource-constrained verifiers, provide pre-computed scratchpad commitments:
scratchpad_commitment = Blake3(scratchpad[0..2MB])
Light clients can verify:
- Block hash is valid (full computation)
- OR trust scratchpad commitment from full nodes (SPV-style)
14. Future Work
11.1 Parameter Tuning
Post-launch monitoring should track:
- Actual device hashrate distribution
- ASIC/FPGA development attempts
- GPU optimization progress
- Network decentralization metrics
Parameters adjustable via soft fork:
ROUNDS: Increase if ASICs gain advantageSCRATCHPAD_KB: Increase if GPU optimization succeedsCHAINS: Adjust for new device landscapes
11.2 Hybrid PoW/PoS
UniversalHash is designed to complement stake-based systems:
effective_weight = hash_weight × (1 + stake_multiplier × normalized_stake)
This rewards both computational contribution and economic commitment, further democratizing consensus.
11.3 Useful Work Integration
Future versions may incorporate:
- Federated learning gradient verification
- Scientific computation proofs
- Storage proofs (Proof of Space-Time hybrid)
15. Conclusion
UniversalHash v4 represents a practical approach to democratic proof-of-work mining. By carefully balancing memory requirements, parallel chain constraints, and cryptographic primitive selection, we achieve unprecedented device accessibility without sacrificing security.
Key Achievements:
- Phone:Desktop ratio of 1:3-5 (vs 1:50-100 in RandomX)
- GPU advantage reduced to 3.5-5× (vs 10-100× in traditional algorithms)
- Server advantage capped at 5-8× (vs 1000× in SHA-256)
- Simple, auditable implementation (<50 lines core logic)
- Production-ready with multi-platform support
The billions of smartphones worldwide represent an untapped resource for decentralized consensus. UniversalHash v4 provides the algorithmic foundation to harness this potential, returning proof-of-work to its original vision of democratic participation.
References
- Tevador et al., "RandomX: A Proof-of-Work Algorithm Designed for General-Purpose CPUs," 2019
- Toutonghi, M., "VerusHash: Phase I zk-SNARK Privacy," Verus Foundation, 2019
- XELIS Project, "XelisHash v3 Specification," 2024
- Dryja, T., "Efficient Authenticated Dictionaries with Skip Lists and Commutative Hashing," MIT DCI, 2017
- Biryukov, A., Khovratovich, D., "Argon2: the memory-hard function for password hashing," 2015
- O'Connor et al., "Blake3: One Function, Fast Everywhere," 2020
Appendix A: Test Vectors
Header: 0x0000...0000 (80 zero bytes)
Nonce: 0
Expected Hash: [implementation-specific, to be computed]
Header: 0x0100...0000 (first byte = 1)
Nonce: 12345
Expected Hash: [implementation-specific, to be computed]
Appendix B: License
This specification and reference implementations are released under the MIT License.
MIT License
Copyright (c) 2026 UniversalHash Contributors
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.