honeycrisp
bare-metal Rust drivers for every compute unit on Apple Silicon. experimental, API unstable.
Apple gives you Accelerate — a black box that picks algorithms for you, hides the hardware, and decides what's fast enough. honeycrisp gives you the hardware itself. every NEON lane, every AMX tile, every Metal dispatch, every ANE program — yours to control, yours to schedule, yours to push past what the framework authors thought you'd need.
the focus is workloads Apple never optimized Accelerate for: LLM inference, zero-knowledge proving, and real-time rendering. hand-written NEON and AMX assembly eliminates the abstraction tax — 1.2–10× faster than Accelerate across elementwise, SGEMM, crypto, and media workloads. full benchmark table below.
honeycrisp is also the most complete open-source documentation of Apple Silicon's undocumented hardware. AMX instruction encoding, ANE MIL bytecode format, IOSurface internals, PMU counter access — everything Apple ships without docs, reverse-engineered and captured in Rust code and specs/ files. if you want to understand what the chip actually does, start here.
the hardware
Apple Silicon has three compute units sharing unified memory. each speaks a different protocol: NEON intrinsics and .word-encoded AMX instructions for CPU, Metal framework for GPU, MIL bytecode for ANE. four crates — a shared memory foundation and one driver per compute unit.
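the AMX path is the strangest of the three: there are no compiler intrinsics, so instructions are emitted as raw `.word`s. a sketch of the encoding as reverse-engineered by the community (the `0x00201000 | (op << 5) | Xr` layout follows dougallj's AMX notes; the helper name and op numbers here are illustrative, not honeycrisp's API):

```rust
// AMX has no compiler intrinsics; instructions are raw 32-bit words in an
// undocumented aarch64 opcode space. per community reverse engineering the
// encoding is 0x00201000 | (op << 5) | Xr, where `op` selects the AMX
// operation and Xr names the GPR holding the operand descriptor.
fn amx_word(op: u32, xr: u32) -> u32 {
    assert!(op < 32 && xr < 32);
    0x0020_1000 | (op << 5) | xr
}
```

a driver emits these words through inline `asm!` blocks, which is why the CPU crate needs no Apple framework at all.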
architecture
```
unimem                  memory: IOSurface, arena, pool (no internal deps)
  ↑
acpu                    driver: CPU/AMX compute (NEON, AMX inline asm)
  ↑           ↑
rane        aruminium   (both depend on unimem + acpu)
ANE hardware  Metal GPU

  ↑ drivers — raw hardware access, no model knowledge
──────────────────────────────────────────────────────
  ↓ runtimes — model graphs, scheduling, inference

cyb/llm                 runtime: graph IR, jets, scheduling, model loading
```

drivers expose raw capabilities. runtimes compose them.
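the split can be sketched with two hypothetical types (not honeycrisp's real traits, purely an illustration of the layering): a driver runs one raw op on one device; a runtime owns the graph and decides which driver executes each node.

```rust
// hypothetical sketch of the driver/runtime split — names are illustrative.
struct Op { name: &'static str }

trait Driver {
    // a driver exposes one raw capability: dispatch a single op.
    fn dispatch(&mut self, op: &Op) -> Result<(), String>;
}

struct Runtime<D: Driver> { driver: D, graph: Vec<Op> }

impl<D: Driver> Runtime<D> {
    // the runtime walks its graph and feeds ops to the driver;
    // scheduling policy (which device, what order) lives at this layer.
    fn run(&mut self) -> Result<(), String> {
        for op in &self.graph {
            self.driver.dispatch(op)?;
        }
        Ok(())
    }
}
```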
build
requires macOS on Apple Silicon (aarch64-apple-darwin). `cargo build --release` builds all four crates.
benchmark
80 operations across 16 categories, compared against Apple Accelerate, CommonCrypto, and scalar baselines. representative results on M1 Pro (8P+2E):
| category | acpu ops | baseline | highlight |
|---|---|---|---|
| elementwise f32 | exp, log, tanh, sigmoid, gelu, silu | Apple vvexpf/vvlogf/vvtanhf | 1.25–1.55× faster |
| reductions f32 | sum, dot, length, max, min | Apple vDSP/cblas | parity to 1.88× |
| SGEMM f32 | 32×32 → 4096×4096 | Apple cblas_sgemm | 1.01–10× (small sizes dominate) |
| AI inference | FFN 4K, llama FFN, attention, softmax | Apple cblas + vDSP chain | pipeline ops 1.1–1.4× |
| media | blend, clamp, RGB↔YUV, histogram, resize | Apple vDSP, scalar | 1.3–8× at 1080p |
| crypto | SHA-256, AES-128, PMULL | CommonCrypto | SHA 7×, PMULL 70×+ |
| ZK Goldilocks | field mul, inv, Poseidon2, NTT | nebu pure Rust | 1.1–2× |
| memory BW | STREAM copy/scale/add/triad | M1 Pro reference | parity (95+ GB/s copy) |
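the elementwise wins come from doing the range reduction and polynomial evaluation by hand. a scalar sketch of the standard exp trick (Taylor coefficients for clarity; a production kernel uses minimax coefficients and runs 4 NEON lanes at a time — this function is illustrative, not acpu's implementation):

```rust
// e^x via range reduction: split x = k·ln2 + r so |r| ≤ ln2/2, evaluate a
// short polynomial for e^r, then scale by 2^k by writing the float's
// exponent bits directly — no libm call anywhere.
fn fast_exp(x: f32) -> f32 {
    use std::f32::consts::{LN_2, LOG2_E};
    let k = (x * LOG2_E).round();
    let r = x - k * LN_2;
    // degree-4 Taylor polynomial for e^r on [-ln2/2, ln2/2]
    let p = 1.0 + r * (1.0 + r * (0.5 + r * (1.0 / 6.0 + r / 24.0)));
    // 2^k as raw exponent bits (valid for the moderate k this sketch needs)
    let scale = f32::from_bits(((k as i32 + 127) << 23) as u32);
    p * scale
}
```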
full table: `cargo run --release -p acpu --example bench_summary`
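the ZK rows operate in the Goldilocks field, p = 2^64 − 2^32 + 1. a minimal reference multiply via u128 widening (a hand-tuned kernel replaces the `%` with branchless reduction exploiting 2^64 ≡ 2^32 − 1 (mod p); this sketch is for checking results, not speed):

```rust
// Goldilocks prime: 2^64 - 2^32 + 1
const P: u64 = 0xFFFF_FFFF_0000_0001;

// reference field multiply: widen to u128, reduce with a plain modulo.
fn gl_mul(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) % (P as u128)) as u64
}
```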
crates
| crate | what | crates.io |
|---|---|---|
| unimem | zero-copy memory — IOSurface pinned buffers, Tape bump allocator, Grid tensor pool | crates.io |
| acpu | CPU/AMX compute — NEON vector, AMX matrix, crypto, ZK field arithmetic, PMU | crates.io |
| aruminium | Metal GPU — shader compile, pipeline, compute dispatch, pre-resolved IMP | crates.io |
| rane | Apple Neural Engine — MIL compile, SRAM load, hardware dispatch | crates.io |
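unimem's Tape is a bump allocator: allocation is a pointer bump, and everything is freed at once by rewinding. a hypothetical sketch of the idea (names and layout illustrative, not the crate's real API):

```rust
// bump ("tape") allocator sketch: O(1) aligned alloc, O(1) bulk free.
struct Tape { buf: Vec<u8>, head: usize }

impl Tape {
    fn new(cap: usize) -> Self { Tape { buf: vec![0u8; cap], head: 0 } }

    // round head up to `align`, hand out the slice, bump the head.
    fn alloc(&mut self, len: usize, align: usize) -> Option<&mut [u8]> {
        debug_assert!(align.is_power_of_two());
        let start = (self.head + align - 1) & !(align - 1);
        if start + len > self.buf.len() { return None; }
        self.head = start + len;
        Some(&mut self.buf[start..start + len])
    }

    // free everything at once: rewind the head.
    fn reset(&mut self) { self.head = 0; }
}
```

no per-allocation bookkeeping means no fragmentation and no free lists, which is why this shape suits per-token or per-frame scratch memory.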
```rust
// unimem — one allocation, every device sees it
let block = unimem::Block::open(n * 4)?;

// acpu — AMX matrix multiply, NEON softmax
acpu::matmul_f32(a.as_f32(), b.as_f32(), block.as_f32_mut(), m, n, k);
acpu::vector::softmax(block.as_f32_mut());

// aruminium — Metal GPU compute
let gpu = aruminium::Gpu::open()?;
let buf = gpu.wrap(&block)?; // zero-copy: MTLBuffer over same physical pages

// rane — ANE hardware dispatch
let program = rane::mil::matmul(64, 64, 64);
let mut model = rane::Program::compile(&program, &[])?;
model.load()?;
model.run(&input, &output)?;
```
license
cyber license: don't trust. don't fear. don't beg.