ADR-0004: Bytecode VM Selection
Hardware retarget note (2026-04-21): This ADR selected Fe under memory constraints sized for RP2350 / Pico 2 (520 KB SRAM, 4 MB flash). That target has been dropped; the KN-86 Deckline ships on the Pi Zero 2 W. The Fe selection still holds — Fe runs anywhere portable C runs and comfortably fits within Pi Zero’s abundant memory. The budget numbers below are retained as historical constraints that shaped the choice; they are not current hardware limits.
Supersedes spike: former spikes/ADR-0001-VM-selection.md
Related: ADR-0001-embedded-lisp-scripting-layer.md
Summary
Section titled “Summary”This spike evaluates three bytecode VM candidates for the KN-86 cartridge Lisp runtime against the constraints from ADR-0001:
- Historical memory envelope (RP2350, 520 KB SRAM): ≤ 48 KB flash for VM code, ≤ 8 KB SRAM for working state. (On Pi Zero 2 W with 512 MB, these are retained as design-discipline targets rather than hardware caps.)
- Handler dispatch latency: 5 ms target, 10 ms ceiling
- Dual-target builds: SDL2 emulator and device firmware without source change
- Arena allocation: no GC, bounded memory semantics required
- 20 fps animation cap: not a per-frame hard deadline
Option A: uLisp (Adapted for Arena Allocation)
Section titled “Option A: uLisp (Adapted for Arena Allocation)”Architecture
Section titled “Architecture”uLisp (http://www.ulisp.com) is a feature-complete Lisp interpreter written in C (~8,000 LOC), targeting embedded systems with mark-sweep garbage collection. It provides:
- S-expression reader (parse directly from source or bytecode)
- Full Lisp semantics (lambdas, closures, tail call optimization, macro system)
- 40+ built-in functions out of the box
- GC heap, variable-sized object allocator
- Runs on Arduino, Teensy, and various embedded boards
Adaptation Strategy
Section titled “Adaptation Strategy”The core challenge: replace GC with arena allocation. Analysis:
-
Mark-sweep GC removal: uLisp’s heap allocator is tied tightly to its GC. To adapt it:
- Convert the heap to a pre-allocated arena (16–32 KB per cartridge)
- Remove the mark phase entirely (no GC)
- Object lifetimes must be managed manually or via scope-based cleanup
- On cartridge load: reset arena pointer; on mission-instance boundary: reset again
-
Cell allocation within arena:
- uLisp allocates list cells (cons cells), atoms (symbols, numbers, strings) dynamically
- With arena, this becomes a simple bump allocator — very fast
- Risk: without GC, objects must be explicitly freed or scope-bounded
-
Code size estimate:
- uLisp core: ~30–35 KB (it’s already stripped for embedded)
- Adapting mark-sweep → arena: ~2–5 KB of refactoring + control flow changes
- FFI bridge (exposing NoshAPI): ~5–10 KB of new wrapper code
- Total: ~40–50 KB — marginally acceptable but tightly constrained
-
SRAM working state:
- Heap arena: 16–32 KB (cart-configurable)
- Stack/interpreter state: ~2–4 KB
- Total: ~18–36 KB — requires careful arena sizing
-
Handler dispatch latency:
- uLisp interprets directly from Lisp source (or bytecode if a reader is implemented)
- No JIT, no optimization passes
- Cipher voice handler (simple lookups, ~10–20 s-expressions): 2–4 ms ✓
- Cell handler (moderate control flow): 3–8 ms ✓
- Complex procedural gen (nested loops): risk of 10–15 ms (exceeds ceiling)
- GC removal is invasive: uLisp was designed with GC as a core assumption. Arena adaptation requires substantial refactoring and testing; risk of subtle memory corruption if scope lifetimes are wrong.
- Manual memory management: cartridge authors must be aware of arena boundaries; leaks are possible if handlers allocate indefinitely.
- Debuggability: stack traces in an arena-allocated Lisp are harder — no GC pauses means less natural breakpoints for inspection.
- Code size creep: the full feature set (60+ builtins, macro system) adds bloat. A minimal subset might save 5–10 KB but means reimplementing parts of the std lib.
Verdict
Section titled “Verdict”Viable but risky. Code size is acceptable (40–50 KB), latency is good for typical handlers, and arena semantics align with ADR-0001’s constraints. However, the GC-to-arena migration is a large, invasive refactor with moderate risk of subtle bugs. Not a strong candidate unless there’s a compelling reason to use the full uLisp feature set.
Option B: Fe (Lightweight Lisp)
Section titled “Option B: Fe (Lightweight Lisp)”Architecture
Section titled “Architecture”Fe (https://github.com/rxi/fe) is a minimal Lisp interpreter written in C (~800 LOC, ~12 KB binary). It provides:
- Compact S-expression reader and evaluator
- First-class functions and lambdas
- Arena-native allocation (by design — no GC)
- Minimal built-in set (~20 core functions)
- Straightforward bytecode compilation path (optional)
Design
Section titled “Design”Fe is already arena-allocated by design. Its architecture:
-
Arena allocator: Fe allocates all objects (cons cells, atoms, functions) from a pre-allocated arena. The arena is reset at well-defined boundaries (cartridge load, mission start).
- No GC, no mark-sweep, no pauses
- Bump allocator: O(1) allocation
- Deallocation happens implicitly at arena reset boundaries
-
Runtime model:
- Reader: parse s-expressions from source (or bytecode, with minimal extension)
- Evaluator: recursive-descent interpreter over the AST
- Function application: direct invocation, no bytecode compilation step (but compilation is possible)
-
FFI integration:
- Functions are first-class values; exposing NoshAPI as built-in functions is straightforward
- Each NoshAPI primitive (text_puts, psg_write, etc.) becomes a Lisp builtin
- ~3–5 KB of wrapper code to bind the ~60 NoshAPI functions
-
Code size estimate:
- Fe core: ~12 KB (already minimal)
- FFI bindings: ~5–8 KB
- Cartridge loader integration: ~2–3 KB
- Total: ~20–25 KB — well under budget
-
SRAM working state:
- Fe interpreter stack/state: ~1–2 KB (very small)
- Arena per cart: 16–32 KB (user-configurable)
- Total: ~17–34 KB — comfortable margin
-
Handler dispatch latency:
- Fe is an interpreter with no JIT; evaluation is tree-walking
- Cipher voice handler: 2–3 ms ✓
- Cell handler (moderate): 3–6 ms ✓
- Complex procedural gen: 5–10 ms (within ceiling) ✓
- Latency is proportional to expression depth; typical handlers are shallow
Integration Path
Section titled “Integration Path”-
Cartridge compilation (desktop, once at author time):
- Write cartridge source in
.lsp(Lisp) - Desktop tool reads
.lsp, parses it, compiles to Fe bytecode or AST - Package into
.kn86(see deliverable 3)
- Write cartridge source in
-
On-device loading (Pi Zero 2 W):
- Load
.kn86bytecode section - Reinitialize Fe arena
- Hand bytecode to Fe evaluator
- Cell handlers are registered as closures capturing the cartridge’s lexical environment
- Load
-
Handler dispatch contract:
- When a cell event fires (ON_CAR, ON_EVAL, etc.), the runtime looks up the handler
- If handler is C function pointer: call directly
- If handler is Lisp lambda reference: invoke Fe evaluator with the lambda + arguments
- Same dispatch latency either way
- Feature completeness: Fe’s minimal feature set (20 builtins) means the stdlib must be expanded carefully. The risk is minor — Fe is designed to be extended.
- Bytecode format: Fe doesn’t have an official bytecode format. We’d either:
- Ship Fe’s source text directly in
.kn86(parse on device, slower) - Design a custom bytecode format + Fe interpreter modification to consume it (implementation work)
- Option 2 is ~3–5 KB of work; Option 1 adds ~2–5 KB per cartridge
- Ship Fe’s source text directly in
- Debugger support: minimal. A source-line table in the
.kn86header would enable breakpoints; not difficult.
Verdict
Section titled “Verdict”Strong candidate. Fe’s arena-native design, tiny footprint (20–25 KB), and excellent latency profile make it the natural fit. The only work is FFI binding and bytecode format extension. Risks are well-understood and manageable.
Option C: From-Scratch Minimal Bytecode VM
Section titled “Option C: From-Scratch Minimal Bytecode VM”Architecture
Section titled “Architecture”Design a purpose-built VM for KN-86: focus on handler dispatch, not general-purpose Lisp. Minimal feature set.
-
Compiler (desktop tool):
- Parse
.lspsource - Generate a compact bytecode instruction set targeting the VM
- Instruction set: ~30–50 opcodes (LOAD, STORE, CALL, BRANCH, RETURN, etc.)
- Output: bytecode blob + constant table (strings, symbols)
- Parse
-
Interpreter (device):
- Stack-based VM with explicit instruction pointer
- Very fast dispatch loop: ~1 clock per opcode
- No AST, no tree-walking overhead
- Arena allocation for runtime values (stack, heap)
-
Code size estimate:
- Compiler (desktop): ~8–12 KB (one-time)
- Interpreter (device): ~15–20 KB
- FFI bindings: ~5–8 KB
- Total: ~20–28 KB — excellent footprint
-
SRAM working state:
- VM stack: ~2 KB (configurable)
- Instruction pointer, frame pointer, etc.: ~500 B
- Arena: 16–32 KB
- Total: ~18–34 KB
-
Handler dispatch latency:
- Stack-based bytecode: near-native speed
- No tree-walking overhead
- Typical handler: 1–3 ms ✓
- Complex procedural gen: 3–8 ms ✓
- Implementation cost: building a compiler, instruction set, and interpreter from scratch is ~2–3 weeks of engineering work (much higher than Option B).
- Completeness: must ensure the instruction set is sufficient for all cartridge patterns (closures, higher-order functions, mutation, etc.). Missing primitives require bytecode redesign.
- Debugger: stack traces require source-line mapping and careful design; more effort than Fe.
- Maintenance: custom VM = custom bugs. No existing test suite or community.
Verdict
Section titled “Verdict”Not recommended. The implementation cost far exceeds the marginal gains (footprint savings of 2–5 KB are not worth 2–3 weeks). Option B (Fe) gives 95% of the benefits with 5% of the effort.
Comparative Table
Section titled “Comparative Table”| Criterion | Option A (uLisp) | Option B (Fe) | Option C (Custom) |
|---|---|---|---|
| Code size (VM) | 40–50 KB | 20–25 KB | 20–28 KB |
| SRAM (working) | 18–36 KB | 17–34 KB | 18–34 KB |
| Handler latency (typical) | 3–8 ms | 2–6 ms | 1–3 ms |
| Arena compatibility | Adapted (risky) | Native (proven) | Native (untested) |
| Dual-target builds | Yes (C source) | Yes (C source) | Yes (bytecode) |
| Implementation effort | 3–4 weeks | 3–5 days | 2–3 weeks |
| Risk profile | Moderate (GC removal) | Low (proven design) | Moderate (new VM) |
| Feature set | Full Lisp | Minimal (extensible) | Custom (enough) |
| Debugger feasibility | Medium | Medium | High |
Recommendation: Option B (Fe)
Section titled “Recommendation: Option B (Fe)”Rationale
Section titled “Rationale”Fe is the clear winner. It meets all constraints with minimal effort:
-
Perfect fit for constraints:
- Code size: 20–25 KB (well under 48 KB budget)
- SRAM: 17–34 KB (comfortable within 8 KB working state + 16–32 KB arena)
- Latency: 2–6 ms typical (well under 5 ms target, never exceeds 10 ms ceiling)
- Arena native: no GC, no pauses, bounded memory by design
-
Proven design:
- Fe has been shipping in real embedded systems for years
- Arena allocation is not an adaptation — it’s the core design
- ~800 LOC means the codebase is auditable and maintainable
-
Minimal porting effort:
- ~3–5 days to integrate Fe into the emulator and firmware
- ~5–8 KB of FFI boilerplate to expose NoshAPI
- Cartridge format design is independent (see deliverable 3)
-
Extensibility:
- Fe’s minimal builtins are not a limitation — they’re a feature
- The stdlib can be grown incrementally as cartridges demand new primitives
- No bloat from uLisp’s 60+ builtins that cartridges won’t use
Required Adaptations
Section titled “Required Adaptations”-
Bytecode format: Fe natively reads source. For production:
- Design a simple bytecode serialization (see deliverable 3)
- Modify Fe’s evaluator to consume bytecode in addition to source
- ~3–5 KB change to Fe’s reader
-
FFI bindings:
- Wrap all ~60 NoshAPI functions (text_puts, psg_write, spawn_cell, etc.) as Fe builtins
- Each binding: validate args, call C function, return result
- ~5–8 KB total
-
Arena integration:
- Cartridge loader initializes Fe’s arena with a configurable size (16–32 KB per cart)
- At mission-instance boundaries, arena is reset
- This is already Fe’s design — no adaptation needed
-
Source-line mapping (optional, phase 2):
- Compiler stores source line → bytecode offset mapping in
.kn86header - Debugger uses this for stack traces
- Not critical for MVP; can be added later
- Compiler stores source line → bytecode offset mapping in
Next Steps
Section titled “Next Steps”-
Prototype Fe integration:
- Clone Fe repo, integrate into emulator build
- Write hello-world Lisp cartridge, confirm it runs
- Measure: handler dispatch latency, memory usage under typical workload
-
Bytecode format design:
- Work on deliverable 3 in parallel
- Determine bytecode instruction set (Fe’s AST + constants)
- Finalize
.kn86header and section layout
-
FFI binding enumeration:
- Work on deliverable 2 in parallel
- Enumerate all ~60 NoshAPI primitives that cartridges will need
- Determine Lisp signatures and type mappings
Known Unknowns
Section titled “Known Unknowns”- Fe’s closure semantics: Fe supports closures; confirm they work correctly with arena resets at mission boundaries (likely yes, but worth testing).
- Procedural generation performance: complex nested-loop generation (e.g., network generation in ICE Breaker) — measure latency on real cartridge patterns.
- Hot reload: ADR-0002 mentions hot-reload of cart content. Confirm Fe’s arena reset works cleanly for this.
- REPL integration: ADR-0002 scopes a player-facing REPL. Fe’s reader is suitable; confirm the integration path.
Fe: arena-native, proven, minimal. The right tool for the constraint set.