Complete Feature Taxonomy of hash Functions

A compact reference to every capability a hash function can provide.

1. Classical Security Properties

Property	Definition
Preimage resistance	Given h, finding m such that H(m) = h requires brute force
Second preimage resistance	Given m₁, finding m₂ ≠ m₁ with H(m₁) = H(m₂) requires brute force
Collision resistance	Finding any m₁ ≠ m₂ with H(m₁) = H(m₂) requires brute force
Random oracle behavior	Output indistinguishable from random for any input
Length extension resistance	Given H(m), computing H(m‖suffix) requires knowing m
Near-collision resistance	Finding inputs whose hashes differ in few bits requires brute force
Multi-target resistance	Security holds constant regardless of number of targets

These properties form the foundation of cryptography. See cryptographic proofs for how they compose into larger arguments.

2. Performance Properties

Property	What it means	Examples
Hardware throughput	Raw GB/s on commodity CPUs	BLAKE3 ~2 GB/s, SHA-256 ~500 MB/s
SIMD acceleration	Exploits vector instructions (AVX2/512, NEON)	BLAKE3, SHA-NI
Parallelizability	Splits work across cores for single input	BLAKE3 (tree mode), ParallelHash
Latency	Time-to-first-hash for small inputs	Matters for interactive apps
Constant-time	Free of timing side channels (no secret-dependent branches)	Critical for key derivation
Small message efficiency	Fast for inputs < 1KB	Some sponge constructions have high initialization cost

3. Structural / Compositional Properties

Property	Definition	Who has it
Tree hashing	Built-in Merkle tree mode for parallel hashing of large inputs	BLAKE3, KangarooTwelve, ParallelHash
Incremental hashing	Update hash when input changes without full rehash	Homomorphic hashes (LtHash, AdHash)
Streaming	Process input in chunks without buffering entire input	All sponge/Merkle-Damgård constructions
Verified streaming	Verify data integrity chunk-by-chunk as it arrives (Bao)	BLAKE3 (native via Bao), must be built for others
Slice proofs	Prove integrity of arbitrary byte range without full content	BLAKE3/Bao, any tree-hash with proof extraction
Extendable output (XOF)	Produce arbitrary-length output	SHAKE128/256, BLAKE3, KangarooTwelve
Sponge construction	Absorb-then-squeeze paradigm, yields both hash and XOF	SHA-3/Keccak, Poseidon, Rescue, Tip5
Compression function	Fixed-input-length primitive, used inside Merkle-Damgård or standalone	Poseidon2, Anemoi/Jive
Domain separation	Provably different outputs for different use cases from same primitive	BLAKE3 (key derivation, MAC, hash all from same core)
Duplex construction	Interleave absorb/squeeze for online authenticated encryption	Keccak duplex, Xoodyak

See hash path accumulator for how these compositional properties enable accumulator constructions.

4. Algebraic / Proof-System Properties

The new frontier — properties that matter for zero knowledge proofs, MPC, and FHE.

Property	Definition	Who has it
Arithmetization-friendly	Low multiplicative complexity when expressed as arithmetic circuit over 𝔽ₚ	Poseidon/2, Rescue, Griffin, Anemoi, Tip5, Monolith, MiMC
R1CS-efficient	Few constraints in Rank-1 Constraint Systems (Groth16)	Poseidon, Anemoi (~2× better than Poseidon)
Plonk-efficient	Few constraints in Plonkish arithmetization	Poseidon2, Anemoi (21-35% better than Poseidon)
AIR/STARK-efficient	Low trace width and degree in Algebraic Intermediate Representation	Tip5, Monolith, RPO (Rescue Prime Optimized)
Lookup-compatible	Uses lookup tables for nonlinearity (requires lookup arguments in prover)	Tip5, Monolith, Reinforced Concrete
Field-native	Operates natively over a specific prime field	All AO hashes; field choice matters (Goldilocks field, BN254, BLS12-381)
Low multiplicative depth	Few sequential multiplications (critical for MPC and FHE)	Poseidon2, MiMC
MPC-friendly	Efficient in multi-party computation protocols	Poseidon2, LowMC, PASTA
FHE-friendly	Efficient under Goldilocks homomorphic encryption and other FHE schemes	LowMC, PRINCE, SIMON (low AND-depth)
Recursive-proof friendly	Cheap enough to verify inside itself for proof-carrying data composition	Tip5 (designed specifically for this), Poseidon2

See cyber/stark for STARK verification in the cyber protocol. incrementally verifiable computation relies on recursive-proof friendly hashes.

S-box Design Strategies (determines security/efficiency tradeoff)

Strategy	How it works	Algebraic degree	Examples
Low-degree power map	x → x^d (d=3,5,7)	Low per round, accumulates	Poseidon, Rescue, Griffin
Split-and-lookup	Decompose field element → small S-box per chunk → reassemble	High (~p) from lookup	Tip5, Monolith, Reinforced Concrete
Feistel/Lai-Massey	Structured permutation over field pairs	Depends on round function	Neptune (hash), Anemoi/Flystel
Legendre symbol	x → Legendre(x) ∈ {-1,0,1}	High-degree map, cheap to evaluate	Grendel

Security Tradeoffs in AO Hashes

Attack class	What it exploits	Strong against it	Weak against it
Groebner basis	Low-degree polynomial representation	Lookup-based (Tip5, Monolith)	Low-degree power maps (Poseidon)
Resultant attacks	Structured polynomial systems	—	Poseidon, Griffin, Neptune (CRYPTO 2025 improvements)
CICO problem	Constrained-Input-Constrained-Output on sponge	High-degree S-boxes	Recent breaks on some low-degree constructions
Statistical/differential	Probability propagation through rounds	Power maps	Lookup-based (need careful design)
Subspace trails	Linear subspace propagation through partial rounds	Full-round S-box layers	Partial-round designs (Hades strategy)

5. Content Addressing Properties

Property	Definition	Who has it
Deterministic identity	Same bytes → same hash, always	All raw-byte hashes (distinct from CIDv0/UnixFS)
Deduplication	Identify identical content by hash equality	Universal property
Self-certifying names	Hash IS the unforgeable name of content	ipfs CIDs, BLAKE3 content addresses
Multihash/multicodec	Self-describing hash format (includes algorithm ID + length)	CIDv1 ecosystem, extensible
Content integrity	Verify content matches its hash	Universal
Content availability proofs	Prove you store content without revealing it	Algebraic hashes + STARK (Filecoin)
Provable replication	Prove unique copy exists on specific storage	Filecoin PoRep (slow encoding via VDF-like construction)

In cyber, every particle is content-addressed. See hash function selection for how the protocol chose its hash. data availability strategy covers availability in the cyber context.

6. Homomorphic Properties

Property	Definition	Examples
Additive homomorphism	H(A∪B) = H(A) + H(B) for set hashing	LtHash, AdHash, ECMH, MuHash, HexaMorphHash
Incremental updates	H(S ∪ {x}) = H(S) + H(x), H(S \ {x}) = H(S) - H(x)	LtHash (Facebook), HexaMorphHash
Lattice-based	Security from Short Integer Solution (SIS) problem, post-quantum	HexaMorphHash
Elliptic-curve-based	Security from ECDLP, compact digests	ECMH, MuHash
Multiset hashing	Order-independent hash of collections	All homomorphic hashes above

Use case: databases integrity where elements are frequently added/removed without rehashing everything.

7. Similarity / Fuzzy Hashing Properties

Property	Definition	Examples
Locality-sensitive (LSH)	Similar inputs → same bucket with high probability	MinHash, SimHash, p-stable LSH
Locality-preserving	Preserves neighborhood structure in reduced dimension	LPH, data-dependent methods
Perceptual hashing	Visually similar images → similar hashes	pHash, dHash, DinoHash (adversarially robust, DINOV2-based)
Fuzzy hashing	Similar files → similar hashes (edit-distance aware)	ssdeep, TLSH, sdhash
Minhash	Estimate Jaccard similarity between sets	Used in deduplication, plagiarism detection

Critical difference: cryptography hashes maximize avalanche (1-bit change → 50% bits flip). Similarity hashes minimize it for similar inputs.

8. Time-Based Properties

Property	Definition	Examples
Verifiable Delay Function (VDF)	Requires sequential time T to compute, O(log T) to verify	Wesolowski, Pietrzak (RSA groups)
Time-lock puzzle	Encrypt message recoverable only after T sequential steps	RSA time-lock (Rivest-Shamir-Wagner)
Proof of Sequential Work	Prove T sequential hash iterations were performed	Cohen-Pietrzak (Chia blockchain)
Iterated hashing	H(H(H(...H(x)...))) as proof of elapsed time	SHA-256 chains (Bitcoin-adjacent)
Slow encoding	Transform data such that encoding takes time T, decoding is fast	Filecoin PoRep

VDFs and proof of work both rely on sequential computation. See consensus algorithms for how these properties compose into consensus mechanisms.

9. Quantum Resistance Properties

Property	Status
Grover's algorithm	Halves effective security bits: 256-bit hash → 128-bit quantum security
Hash-based signature	Post-quantum secure (SPHINCS+, XMSS, LMS) — rely only on hash preimage resistance
Lattice-based hash	Post-quantum from SIS/LWE hardness assumptions
VDF quantum vulnerability	RSA/DLP-based VDFs broken by Shor's algorithm; hash-based VDFs survive
AO hash quantum status	Algebraic hashes over large fields: Grover applies to preimage, algebraic structure may introduce additional quantum attack surfaces (under-studied)

10. Protocol-Level Properties

Properties that emerge when hashes are composed into larger data structures:

Property	Definition	Relevant hash feature
Merkle tree	O(log n) proof of membership in set	Any hash; algebraic hashes give cheap in-circuit verification
Merkle Mountain Range (MMR)	Append-only accumulator with O(log n) proofs	Any hash; on-chain MMR needs algebraic hash
Sparse Merkle tree	Prove membership AND non-membership	Any hash; expensive in ZK without algebraic hash
Vector commitment	Commit to vector, open at any position	Poseidon-based, or polynomial commitments
Accumulator	Compact representation of set with membership proofs	RSA accumulator, hash-based (Merkle)
Hash chain	Sequential composition for authentication/signature	MPC-friendly chains need AO hashes
Key derivation (KDF)	Derive keys from shared secret	HKDF (HMAC-based), Argon2
Password hashing	Intentionally slow, memory-hard	Argon2, bcrypt, scrypt
Commitment scheme	H(m‖r) commits to m without revealing it	Any collision-resistant hash
Random oracle	Idealized hash modeling for proof techniques	Instantiated by SHA-256, BLAKE, etc.

See authenticated_graphs for how hash-based structures enable verifiable graph data structures. The cybergraph uses these properties to maintain provable state.

11. The Landscape: Every Named Hash Function Family

Classical (bit-oriented, hardware-optimized)

SHA-256, SHA-3/Keccak, BLAKE2/3, MD5†, RIPEMD-160

AO: Pure Power Map (no lookups)

MiMC, GMiMC, Poseidon, Poseidon2, Neptune, Rescue/RPO, Griffin, Anemoi, Ciminion

AO: Lookup-Based (require lookup arguments)

Reinforced Concrete, Tip5, Monolith, Skyscraper

AO: Legendre/Other

Grendel (Legendre symbol), RAIN

FHE-Friendly

LowMC, PRINCE, SIMON/SPECK, PASTA, Kreyvium

Homomorphic Set Hashing

LtHash, AdHash, MuHash, ECMH, HexaMorphHash

Similarity/Fuzzy

MinHash, SimHash, ssdeep, TLSH, pHash, DinoHash

Password/KDF

Argon2, bcrypt, scrypt, PBKDF2, HKDF, Balloon

VDF-Adjacent

Iterated SHA-256, Sloth, MinRoot, MiMC-based VDFs

See universal hash for a democratic proof of work algorithms approach.

12. Decision Matrix: Which Features Matter For What

Building...	Must-have features	Nice-to-have	Irrelevant
Content-addressed storage	Deterministic identity, fast throughput, verified streaming	Slice proofs, tree hashing	Algebraic, homomorphic
ZK proof system	Arithmetization-friendly, field-native, low constraint count	Lookup compatibility, recursive-proof friendly	Throughput, verified streaming
Provable storage network	Algebraic + content addressing, Merkle proofs in-circuit	Slow encoding (PoRep), availability proofs	Similarity, password hardness
Blockchain consensus	Collision resistance, VDF/PoSW capability	Post-quantum	Algebraic, FHE
MPC protocol	Low multiplicative depth, MPC-friendly	Post-quantum	Throughput, content addressing
Encrypted computation (FHE)	Low AND-depth, FHE-friendly	—	Most other properties
databases integrity	Incremental/homomorphic, fast updates	Post-quantum (HexaMorphHash)	Algebraic, ZK
Document deduplication	Fast throughput, similarity hashing	Fuzzy hashing	Everything else
knowledge graph with proofs	ALL OF: deterministic identity + algebraic + tree hashing + in-circuit Merkle + content addressing	Homomorphic, recursive proofs, VDF	Similarity, password

13. The Impossibility Constraints

Every hash function optimizes for a subset of axes. The fundamental tensions:

Speed vs. Algebraic Efficiency — Bit-oriented operations (rotations, XOR) are fast on CPUs but catastrophically expensive in arithmetic circuits. Algebraic operations (field multiplication, power maps) are cheap in circuits but 10-100× slower on CPUs.
Portability vs. Optimization — Lookup-based hashes (Tip5, Monolith) are faster in proof systems that support lookups, but incompatible with systems that lack lookup arguments. Power-map hashes (Poseidon2) work everywhere but are slower.
Cryptanalysis Maturity vs. Innovation — Newer designs (Tip5, Monolith) may have better theoretical properties but less real-world cryptanalysis. Older designs (SHA-256, even Poseidon) have more known attacks — which paradoxically means better-understood security margins.
Field Choice Locks Ecosystem — An algebraic hash over Goldilocks field is cheap in Plonky2/Miden/Triton VM but expensive in BN254-based SNARKs (Groth16/Ethereum). And vice versa. The field IS the ecosystem choice. See Goldilocks field processor for dedicated hardware.
Similarity vs. Security — Locality-sensitive hashing requires collisions by design. Cryptographic hashing requires collision resistance. These are mathematically opposed goals.

Every hash function is a point in this multidimensional feature space. The art is knowing which dimensions matter for your system. For how cyber navigates this space, see data structure for superintelligence.

Linked References

crypto/hashing

crypto/hash/features.md