Skip to content

Fe VM Benchmark Results

Fe version: 1.0 (rxi/fe, 879 LOC) Compiler: Apple clang, -O2 Related: ADR-0004 (VM Selection), ADR-0001 (perf budgets)


Four benchmarks validate the three risks flagged in ADR-0004 and the performance budgets from ADR-0001:

  1. Closure Arena — Creates closures in an 8 KB arena (the ADR-0001 working SRAM budget). Measures max closure count, arena reset safety, allocation latency, and representative cart workload fit.

  2. Dispatch Latency — Invokes three handler variants (simple conditional, complex with FFI calls, string operations) 1000 times each. Measures p50/p95/p99/p100 latency per invocation.

  3. Procgen Latency — Generates ICE Breaker-style network topologies (8 and 16 nodes) and runs per-frame tick updates (100 frames). Measures against the 20 fps frame budget and the 500 ms “feels instant” generation budget.

  4. Flash Footprint — Links Fe runtime + 20 representative FFI stubs. Measures .text + .rodata via size command.

All benchmarks use the KN86 test framework (kn86_test.h) and fail on budget violation. CTest integration: ctest -R bench.


MetricMeasuredBudgetStatus
Closures in 8 KB arena15≥ 10PASS
Arena reset safetyClean (no crash/corruption)Must not crashPASS
Closures after resetCorrect valuesMust workPASS
Allocation latency (1000 objects)0.004 ms (4 ns/alloc)< 100 msPASS
Cart workload (16 KB arena)20/20 handlers, 10/10 strings≥ 15 handlers, ≥ 5 stringsPASS

Key finding: An 8 KB arena supports 15 concurrent closures. This is sufficient for a typical cartridge (5 cell types x 2-3 handlers = 10-15 closures). A 16 KB arena (the ADR-0004 recommended “typical cart allocation”) comfortably fits a full cartridge with 20 handlers + 10 string constants with room to spare.

Handler variantp50p95p99p100Budget (p99/p100)Status
Simple (conditional + FFI)< 1 us1 us1-2 us1-4 us5000 / 10000 usPASS
Complex (nested + arithmetic + FFI)< 1 us1 us2 us2 us5000 / 10000 usPASS
String (GC pressure + FFI)< 1 us1 us2 us4 us5000 / 10000 usPASS

Key finding: Handler dispatch latency is 3 orders of magnitude below the ADR-0001 budget on desktop. Pi Zero 2 W (Cortex-A53 @ 1 GHz) is roughly 3-5x slower than the Apple Silicon bench; handlers will still clear the 5 ms ceiling by a wide margin.

Procgen Latency (ADR-0004 Risk #2, procgen variant)

Section titled “Procgen Latency (ADR-0004 Risk #2, procgen variant)”
WorkloadMeasuredBudgetStatus
8-node network generation3 us50,000 us (20 fps frame)PASS
16-node network generation4 us500,000 us (instant)PASS
Per-frame tick (8 nodes, 100 frames)5 us max50,000 us (20 fps frame)PASS

Key finding: Procgen is extremely fast. The 20 fps frame budget has >10,000x headroom on desktop. Pi Zero 2 W has orders of magnitude of margin remaining.

ComponentSizeBudgetStatus
__text (code)16,492 bytes
__cstring (string literals)1,435 bytes
__const (constants)336 bytes
Total .text + .rodata18,263 bytes (~18 KB)49,152 bytes (48 KB)PASS

Key finding: Fe runtime + 20 FFI stubs uses 18 KB — 37% of the 48 KB budget. This leaves 30 KB for additional FFI bindings (the full 54-primitive ADR-0005 surface), the cartridge loader, and future expansion. The budget is generous.

Note: This is a macOS/Apple Silicon measurement. Cortex-A53 code size on the Pi Zero 2 W will differ slightly (AArch64 instruction encoding vs x86-64), but there is no flash budget to fit — the device has a full SD filesystem.


Status: PENDING — Benchmarks have not yet been re-run on a Pi Zero 2 W. The Fe VM is pure portable C and requires no platform port; the same benchmark binaries will run under Linux on the device once bring-up time is available.

Desktop-to-Pi Zero scaling estimate (rough):

  • Latency: Pi Zero 2 W @ 1 GHz Cortex-A53 vs desktop @ ~3 GHz Apple Silicon = roughly 3-5x slower for this workload (integer-heavy, cache-friendly). Desktop p99 of 2 us → estimated ~10 us on Pi Zero. Still well under 5 ms budget.
  • RAM: Identical arena sizes apply; the 16 KB cart arena is trivial on 512 MB.
  • Storage: The flash-footprint measurement above (18 KB) is academic on a Pi Zero — the device has gigabytes of SD storage. It still matters for verifying the Fe runtime stays compact.

All three flagged risks have been validated:

  1. Closure arena semantics (Risk #1): CONFIRMED SAFE. Fe’s mark-sweep GC within the arena works correctly. Arena reset via fe_close() + fe_open() cleanly invalidates old state. Closures captured before reset do not dangle — the entire context is discarded. The policy recommendation: reset arenas at mission boundaries (already the intended design). 8 KB supports 15 closures; 16 KB supports a full cartridge workload.

  2. Procgen latency (Risk #2): CONFIRMED WITHIN BUDGET. Both network generation and per-frame tick updates complete in single-digit microseconds on desktop. Even with a 5x desktop-to-Pi Zero scaling factor, all operations stay well under the 50 ms frame budget. No need to push procgen primitives back into C.

  3. Hot-reload (Risk #3): DEFERRED. Per Josh’s nEmacs-slips decision, hot-reload is post-launch. Not benchmarked. Arena reset semantics validate that the mechanism would work (close + reopen context with new source).

  • Fe lacks > and >= operators. Only < and <= are built-in. FFI should expose > and >= as convenience builtins or document the (< b a) pattern. Minor ergonomic issue, not a performance concern.
  • Fe’s let is single-binding, unlike Scheme. Cart authors need to use (= var val) for multiple sequential bindings or nest let forms. Document in the cartridge authoring guide.
  • Flash budget has 30 KB headroom. Sufficient for the full 54-primitive FFI surface (ADR-0005) plus cartridge loader code.

BenchmarkBudgetMeasuredVerdict
Closure arena (8 KB)≥ 10 closures15 closuresPASS
Cart workload (16 KB)≥ 15 handlers20 handlersPASS
Dispatch p99≤ 5,000 us1-2 usPASS
Dispatch p100≤ 10,000 us1-4 usPASS
Procgen frame≤ 50,000 us3-5 usPASS
Procgen total≤ 500,000 us3-4 usPASS
Flash footprint≤ 48 KB~18 KBPASS