# rane

every Mac has a neural engine. no one lets you use it.
rane is a pure Rust driver for Apple Neural Engine — the 15.8 TOPS accelerator sitting idle inside every Apple Silicon chip. no ObjC compiler. no Swift. no CoreML. no Python. no dependencies.
Apple provides zero public API for ANE. CoreML decides if and when your model runs on the neural engine — you have no control and no visibility. rane bypasses CoreML entirely: three private frameworks loaded via dlopen, MIL bytecode format reverse-engineered from CoreML model bundles, weight blob headers decoded from binary inspection. specs/ and docs/explanation/ capture everything discovered about ANE internals — the compilation pipeline, the XPC protocol to aned, the IOKit driver interface, the SRAM upload lifecycle.
experimental. API unstable.
compile a MIL program, load it into ANE SRAM, dispatch, read the output:

```rust
use rane::{Program, Buffer};

let program = rane::mil::matmul(64, 64, 64);
let mut model = Program::compile(&program, &[]).unwrap();
model.load().unwrap();

let input = Buffer::new(program.input_size()).unwrap();
let output = Buffer::new(program.output_size()).unwrap();
model.run(&input, &output).unwrap();
```
requires macOS + Apple Silicon.
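a CPU reference is useful for validating what the ANE hands back. a minimal sketch in plain Rust, assuming a row-major (m × k) × (k × n) layout; `matmul_ref` is a hypothetical helper, not part of the rane API, and the actual ANE tensor layout may differ:

```rust
// hypothetical CPU reference (not part of rane): row-major
// (m x k) * (k x n) -> (m x n), f32 throughout for simplicity.
fn matmul_ref(a: &[f32], b: &[f32], m: usize, k: usize, n: usize) -> Vec<f32> {
    assert_eq!(a.len(), m * k);
    assert_eq!(b.len(), k * n);
    let mut c = vec![0.0f32; m * n];
    for i in 0..m {
        for j in 0..n {
            let mut acc = 0.0f32;
            for p in 0..k {
                acc += a[i * k + p] * b[p * n + j];
            }
            c[i * n + j] = acc;
        }
    }
    c
}

fn main() {
    // [[1,2],[3,4]] * [[5,6],[7,8]] = [[19,22],[43,50]]
    let c = matmul_ref(&[1.0, 2.0, 3.0, 4.0], &[5.0, 6.0, 7.0, 8.0], 2, 2, 2);
    assert_eq!(c, vec![19.0, 22.0, 43.0, 50.0]);
}
```

compare this against the ANE output after converting fp16 back to f32, and expect fp16-level tolerance rather than exact equality.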
## numbers
M1 Pro:

| operation | time |
|---|---|
| surface create (1 KB) | 0.07 ms |
| MIL compile (64×64) | 18 ms |
| compile + load + unload | 23 ms |
| dispatch overhead | 0.24 ms |

matmul throughput scaling:

| shape | time | throughput |
|---|---|---|
| 64×64×64 | 143 µs | 0.004 TFLOPS |
| 256×256×256 | 161 µs | 0.209 TFLOPS |
| 1024×1024×256 | 332 µs | 1.619 TFLOPS |
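the throughput figures follow from the flop count of a matmul: 2·M·N·K flops (one multiply and one add per output element per reduction step) divided by the measured time. a quick sanity check of the arithmetic, with `tflops` a throwaway helper:

```rust
/// flops for an (m x k) * (k x n) matmul: one multiply plus one add
/// per output element per k step = 2*m*n*k, divided by seconds.
fn tflops(m: u64, n: u64, k: u64, micros: f64) -> f64 {
    (2 * m * n * k) as f64 / (micros * 1e-6) / 1e12
}

fn main() {
    // agrees with the reported figures to within rounding of the times
    assert!((tflops(64, 64, 64, 143.0) - 0.004).abs() < 0.001);
    assert!((tflops(256, 256, 256, 161.0) - 0.209).abs() < 0.002);
    assert!((tflops(1024, 1024, 256, 332.0) - 1.619).abs() < 0.005);
}
```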
vs CoreML predict path: ~10-20x lower dispatch overhead. rane skips MLFeatureProvider, NSDictionary wrapping, MLMultiArray extraction. goes straight from IOSurface → ANE → IOSurface.
## zero-copy memory
ANE surfaces can use unimem::Block — IOSurface-backed pinned memory shared with CPU (acpu) and GPU (aruminium):

```rust
use rane::{Block, Program};

let block = Block::open(size)?;
// fill block with fp16 data via block.as_u16_mut()
unsafe { program.run_direct(block.handle(), out.handle())? };
// ANE reads/writes the same physical pages — zero copies
```

one allocation. three devices. no copies.
## why this exists
Apple ships a 15.8 TOPS neural accelerator in every M-series chip. the only official way to use it is CoreML — a framework that decides when (and whether) your model runs on ANE. no direct access. no documentation. no public API.
rane reaches ANE hardware through the same private ObjC classes that CoreML uses internally. it calls objc_msgSend directly from Rust — no ObjC compiler needed, objc_msgSend is just a C function. three private frameworks loaded via dlopen. weight data passes through IOSurfaces — kernel-managed shared memory, zero copies.
## the stack
```text
your Rust code
  → rane crate (objc_msgSend to libobjc)
  → AppleNeuralEngine.framework (dlopen)
  → MIL text → ANE bytecode
  → XPC to aned daemon
  → IOKit H11ANEIn driver
  → ANE hardware
```
three barriers. three bypasses:
| barrier | how rane gets through |
|---|---|
| IOKit entitlement | XPC to aned (the daemon that has the entitlement) |
| private frameworks | dlopen at runtime, class names via ObjC runtime |
| undocumented MIL format | reverse-engineered from CoreML model bundles |
## api
```rust
// compile MIL to ANE bytecode
Program::compile(program: &Source, weights: &[(&str, &[u8])]) -> Result<Self>

// upload bytecode to ANE SRAM
model.load() -> Result<()>

// dispatch on hardware (synchronous)
model.run(input: &Buffer, output: &Buffer) -> Result<()>

// free SRAM (also automatic on drop)
model.unload() -> Result<()>

// shared-memory tensor buffer (IOSurface)
Buffer::new(bytes) -> Result<Self>
Buffer::with_shape(channels, spatial) -> Result<Self>
surface.read(|&[u16]|)
surface.write(|&mut [u16]|)

// MIL program builder
rane::mil::matmul(ic, oc, seq) -> Source

// fp16 conversion (NEON-accelerated)
rane::f32_to_fp16(f32) -> u16
rane::fp16_to_f32(u16) -> f32
rane::cast_f32_f16(&mut [u16], &[f32]) // bulk NEON
rane::cast_f16_f32(&mut [f32], &[u16]) // bulk NEON
```
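the crate's converters use NEON; the encoding itself is plain IEEE 754 binary16. a portable scalar sketch of that encoding (simplified: round-half-up instead of round-to-nearest-even, which differs only on exact ties; this is a reference, not the crate's implementation):

```rust
/// f32 -> binary16 bits (1 sign bit, 5 exponent bits w/ bias 15, 10 mantissa bits).
fn f32_to_fp16(x: f32) -> u16 {
    let bits = x.to_bits();
    let sign = ((bits >> 16) & 0x8000) as u16;
    let exp = ((bits >> 23) & 0xff) as i32;
    let mant = bits & 0x007f_ffff;
    if exp == 0xff {
        // inf / NaN (keep a quiet-NaN payload bit)
        let payload = if mant != 0 { 0x200 } else { 0 };
        return sign | 0x7c00 | payload;
    }
    let e = exp - 127 + 15; // rebias exponent
    if e >= 31 {
        return sign | 0x7c00; // overflow -> signed infinity
    }
    if e <= 0 {
        if e < -10 {
            return sign; // underflow -> signed zero
        }
        // subnormal half: restore the hidden bit, shift into place
        let m = mant | 0x0080_0000;
        let shift = (14 - e) as u32;
        let half = (m >> shift) as u16;
        let round = ((m >> (shift - 1)) & 1) as u16;
        return sign | (half + round);
    }
    let half = (((e as u32) << 10) | (mant >> 13)) as u16;
    let round = ((mant >> 12) & 1) as u16;
    (sign | half).wrapping_add(round)
}

/// binary16 bits -> f32.
fn fp16_to_f32(h: u16) -> f32 {
    let sign = if h & 0x8000 != 0 { -1.0f32 } else { 1.0 };
    let exp = ((h >> 10) & 0x1f) as i32;
    let mant = (h & 0x3ff) as f32;
    match exp {
        0 => sign * mant * (-24.0f32).exp2(), // subnormal / zero
        31 if mant == 0.0 => sign * f32::INFINITY,
        31 => f32::NAN,
        e => sign * (1.0 + mant / 1024.0) * ((e - 15) as f32).exp2(),
    }
}

fn main() {
    assert_eq!(f32_to_fp16(1.0), 0x3c00);
    assert_eq!(fp16_to_f32(f32_to_fp16(-2.5)), -2.5);
    assert_eq!(f32_to_fp16(1e9), 0x7c00); // saturates to +inf
}
```

on non-tie inputs a roundtrip through these should match the crate's converters, assuming rane follows standard binary16 semantics.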
## build

## structure
two crates. the core library has zero external dependencies.

```text
src/              rane crate — zero deps, system frameworks only
  lib.rs          Program, Buffer, Source, AneError
  model.rs        compile / load / run / unload
  surface.rs      IOSurface wrapper, NEON fp16
  ffi.rs          IOKit, CoreFoundation, IOSurface, libobjc
  mil/mod.rs      MIL program builder (matmul, header, footer)
probe/            rane_probe — 7-level reverse engineering tool
examples/
  matmul.rs       64x64 matmul on ANE hardware
benches/          separate crate with bench + compare binaries
tests/
  integration.rs  13 tests: surface, fp16, MIL, model lifecycle
specs/
  api.md          API specification with Apple ObjC mapping
docs/
  tutorials/      first-dispatch — step-by-step first ANE program
  guides/         surface-lifecycle — create, read, write, release
  explanation/    how-ane-works — three barriers, three bypasses
```
model-specific code (transformer kernels, CPU ops, training, inference, tools) lives in cyb/llm where rane serves as the ANE backend driver.
## contributing
we welcome pull requests. especially:
- hardware — tested only on M1 Pro. M2/M3/M4 reports are gold
- MIL operations — more verified ops expand what rane can compile
- performance — faster compile/load/dispatch cycles
- safety — soundness fixes in the FFI layer
## license
cyber license: don't trust. don't fear. don't beg.