ADR-0004: Bytecode VM Selection
Hardware retarget note (2026-04-21): This ADR selected Fe under memory constraints sized for RP2350 / Pico 2 (520 KB SRAM, 4 MB flash). That target has been dropped; the KN-86 Deckline ships on the Pi Zero 2 W. The Fe selection still holds — Fe runs anywhere portable C runs and comfortably fits within Pi Zero’s abundant memory. The budget numbers below are retained as historical constraints that shaped the choice; they are not current hardware limits.
Supersedes spike: former spikes/ADR-0001-VM-selection.md
Related: ADR-0001-embedded-lisp-scripting-layer.md
Summary
This spike evaluates three bytecode VM candidates for the KN-86 cartridge Lisp runtime against the constraints from ADR-0001:
- Historical memory envelope (RP2350, 520 KB SRAM): ≤ 48 KB flash for VM code, ≤ 8 KB SRAM for working state. (On Pi Zero 2 W with 512 MB, these are retained as design-discipline targets rather than hardware caps.)
- Handler dispatch latency: 5 ms target, 10 ms ceiling
- Dual-target builds: SDL3 emulator and device firmware without source change
- Arena allocation: no GC, bounded memory semantics required
- 20 fps animation cap: not a per-frame hard deadline
Option A: uLisp (Adapted for Arena Allocation)
Architecture
uLisp (http://www.ulisp.com) is a feature-complete Lisp interpreter written in C (~8,000 LOC), targeting embedded systems with mark-sweep garbage collection. It provides:
- S-expression reader (parse directly from source or bytecode)
- Full Lisp semantics (lambdas, closures, tail call optimization, macro system)
- 40+ built-in functions out of the box
- GC heap, variable-sized object allocator
- Runs on Arduino, Teensy, and various embedded boards
Adaptation Strategy
The core challenge: replace GC with arena allocation. Analysis:
- Mark-sweep GC removal: uLisp’s heap allocator is tied tightly to its GC. To adapt it:
  - Convert the heap to a pre-allocated arena (~~16–32 KB per cartridge~~ 256 KB per cartridge per Amendment 2026-04-27 / GWP-233)
  - Remove the mark phase entirely (no GC)
  - Object lifetimes must be managed manually or via scope-based cleanup
  - On cartridge load: reset arena pointer; on mission-instance boundary: reset again
- Cell allocation within arena:
  - uLisp allocates list cells (cons cells) and atoms (symbols, numbers, strings) dynamically
  - With an arena, this becomes a simple bump allocator, which is very fast
  - Risk: without GC, objects must be explicitly freed or scope-bounded
- Code size estimate:
  - uLisp core: ~30–35 KB (it’s already stripped for embedded)
  - Adapting mark-sweep → arena: ~2–5 KB of refactoring + control flow changes
  - FFI bridge (exposing NoshAPI): ~5–10 KB of new wrapper code
  - Total: ~40–50 KB, marginally acceptable against the 48 KB budget but tightly constrained
- SRAM working state:
  - Heap arena: ~~16–32 KB~~ 256 KB (cart-configurable; default per Amendment 2026-04-27 / GWP-233)
  - Stack/interpreter state: ~2–4 KB
  - Total: ~18–36 KB (computed with the original 16–32 KB arena); requires careful arena sizing
- Handler dispatch latency:
  - uLisp interprets directly from Lisp source (or bytecode if a reader is implemented)
  - No JIT, no optimization passes
  - Cipher voice handler (simple lookups, ~10–20 s-expressions): 2–4 ms ✓
  - Cell handler (moderate control flow): 3–8 ms ✓
  - Complex procedural gen (nested loops): risk of 10–15 ms (exceeds ceiling)
Risks
- GC removal is invasive: uLisp was designed with GC as a core assumption. Arena adaptation requires substantial refactoring and testing; risk of subtle memory corruption if scope lifetimes are wrong.
- Manual memory management: cartridge authors must be aware of arena boundaries; leaks are possible if handlers allocate indefinitely.
- Debuggability: stack traces in an arena-allocated Lisp are harder to obtain, and without GC pauses there are fewer natural breakpoints for inspection.
- Code size creep: the full feature set (60+ builtins, macro system) adds bloat. A minimal subset might save 5–10 KB but means reimplementing parts of the standard library.
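The arena model this option would retrofit onto uLisp can be sketched in a few lines. This is a minimal illustration, assuming a caller-provided buffer that is already suitably aligned; the `Arena` type and the `arena_*` names are hypothetical and are not taken from uLisp or the nOSh runtime.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical arena type for illustration; not taken from uLisp or nOSh. */
typedef struct {
    uint8_t *base;   /* start of the caller-provided buffer */
    size_t   size;   /* total bytes available */
    size_t   used;   /* bump pointer: bytes handed out so far */
} Arena;

static void arena_init(Arena *a, void *buf, size_t size) {
    assert(buf != NULL);
    a->base = (uint8_t *)buf;
    a->size = size;
    a->used = 0;
}

/* O(1) allocation: round the bump pointer up, then advance it.
 * Returns NULL when the arena is exhausted; there is no per-object free.
 * Offsets are aligned relative to base, so base itself must be aligned. */
static void *arena_alloc(Arena *a, size_t n) {
    const size_t align = sizeof(void *);
    size_t start = (a->used + align - 1) & ~(align - 1);
    if (start > a->size || n > a->size - start) {
        return NULL;
    }
    a->used = start + n;
    return a->base + start;
}

/* Called at cartridge load and at each mission-instance boundary:
 * every object allocated since the last reset is dropped at once. */
static void arena_reset(Arena *a) {
    a->used = 0;
}
```

The leak risk listed above follows directly from this shape: nothing is reclaimed between resets, so a handler that allocates on every tick will eventually make `arena_alloc` return NULL.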
Verdict
Viable but risky. Code size is marginal (40–50 KB against the 48 KB budget), latency is good for typical handlers, and arena semantics align with ADR-0001’s constraints. However, the GC-to-arena migration is a large, invasive refactor with moderate risk of subtle bugs. Not a strong candidate unless there’s a compelling reason to use the full uLisp feature set.
Option B: Fe (Lightweight Lisp)
Architecture
Fe (https://github.com/rxi/fe) is a minimal Lisp interpreter written in C (~800 LOC, ~12 KB binary). It provides:
- Compact S-expression reader and evaluator
- First-class functions and lambdas
- Arena-native allocation (by design — no GC)
- Minimal built-in set (~20 core functions)
- Straightforward bytecode compilation path (optional)
Design
Fe is already arena-allocated by design. Its architecture:
- Arena allocator: Fe allocates all objects (cons cells, atoms, functions) from a pre-allocated arena. The arena is reset at well-defined boundaries (cartridge load, mission start).
  - No GC, no mark-sweep, no pauses (as assessed at spike time; the Amendment Log’s memory model reconciliation records that Fe is mark-sweep GC over a fixed object pool)
  - Bump allocator: O(1) allocation
  - Deallocation happens implicitly at arena reset boundaries
- Runtime model:
  - Reader: parse s-expressions from source (or bytecode, with minimal extension)
  - Evaluator: recursive tree-walking interpreter over the AST
  - Function application: direct invocation, no bytecode compilation step (but compilation is possible)
- FFI integration:
  - Functions are first-class values; exposing NoshAPI as built-in functions is straightforward
  - Each NoshAPI primitive (text_puts, psg_write, etc.) becomes a Lisp builtin
  - ~5–8 KB of wrapper code to bind the ~60 NoshAPI functions
- Code size estimate:
  - Fe core: ~12 KB (already minimal)
  - FFI bindings: ~5–8 KB
  - Cartridge loader integration: ~2–3 KB
  - Total: ~20–25 KB, well under budget
- SRAM working state:
  - Fe interpreter stack/state: ~1–2 KB (very small)
  - Arena per cart: ~~16–32 KB~~ 256 KB (user-configurable; default per Amendment 2026-04-27 / GWP-233)
  - Total: ~17–34 KB (computed with the original 16–32 KB arena); comfortable margin
- Handler dispatch latency:
  - Fe is an interpreter with no JIT; evaluation is tree-walking
  - Cipher voice handler: 2–3 ms ✓
  - Cell handler (moderate): 3–6 ms ✓
  - Complex procedural gen: 5–10 ms (within ceiling) ✓
  - Latency is proportional to expression depth; typical handlers are shallow
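To make "tree-walking" concrete, here is a minimal sketch of a recursive evaluator over a toy expression tree. The `Node` struct and `eval` function are hypothetical; Fe’s real objects are cons cells and its evaluator handles far more than arithmetic. The `visited` counter is only there to show that work scales with the number of nodes walked.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical AST node; Fe's real objects are cons cells, not this struct. */
typedef enum { NODE_NUM, NODE_ADD, NODE_MUL } NodeKind;

typedef struct Node {
    NodeKind kind;
    double value;               /* used when kind == NODE_NUM */
    const struct Node *lhs;     /* operands for NODE_ADD / NODE_MUL */
    const struct Node *rhs;
} Node;

/* Recursive tree walk: one C call per node, so the cost of a handler
 * grows with the number and nesting depth of its expressions. */
static double eval(const Node *n, unsigned *visited) {
    assert(n != NULL);
    ++*visited;
    switch (n->kind) {
    case NODE_NUM: return n->value;
    case NODE_ADD: return eval(n->lhs, visited) + eval(n->rhs, visited);
    case NODE_MUL: return eval(n->lhs, visited) * eval(n->rhs, visited);
    }
    return 0.0; /* unreachable for well-formed trees */
}
```

Evaluating the tree for `(+ 2 (* 3 4))` visits five nodes, one per number and operator, which is why shallow handlers stay cheap.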
Integration Path
- Cartridge compilation (desktop, once at author time):
  - Write cartridge source in .lsp (Lisp)
  - Desktop tool reads .lsp, parses it, ~~compiles to Fe bytecode or AST~~ bundles/validates it; no bytecode is produced and Fe evaluates source (see Amendment 2026-06-14)
  - Package into .kn86 (see deliverable 3)
- On-device loading (Pi Zero 2 W):
  - Load the cart’s Lisp source from the ~~.kn86 bytecode section~~ .kn86 container
  - Reinitialize Fe arena
  - ~~Hand bytecode to Fe evaluator~~ Parse the source and evaluate it with the Fe tree-walker (no bytecode; see Amendment 2026-06-14)
  - Cell handlers are registered as closures capturing the cartridge’s lexical environment
- Handler dispatch contract:
  - When a cell event fires (ON_CAR, ON_EVAL, etc.), the runtime looks up the handler
  - If handler is C function pointer: call directly
  - If handler is Lisp lambda reference: invoke Fe evaluator with the lambda + arguments
  - Dispatch overhead (lookup plus call) is the same either way; only the handler’s own execution time differs
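The handler dispatch contract can be sketched as a tagged union with two call paths. Every name below (`Handler`, `LispLambda`, `LispApply`, `dispatch`, and the demo functions) is hypothetical; the real nOSh handler types are not shown in this ADR, and `demo_apply` merely stands in for invoking the Fe evaluator.

```c
#include <assert.h>
#include <stddef.h>

/* All names here are hypothetical; the real nOSh handler types are not
 * shown in this ADR. */
typedef int (*CHandler)(int arg);

typedef struct {
    int lambda_id;              /* stand-in for a reference to a Lisp closure */
} LispLambda;

typedef enum { HANDLER_NONE, HANDLER_C, HANDLER_LISP } HandlerKind;

typedef struct {
    HandlerKind kind;
    union {
        CHandler   c_fn;
        LispLambda lisp;
    } as;
} Handler;

/* Stand-in for "invoke the Fe evaluator with the lambda + arguments". */
typedef int (*LispApply)(LispLambda fn, int arg);

/* One lookup result, two call paths. Returns -1 when no handler is set. */
static int dispatch(const Handler *h, LispApply apply, int arg) {
    assert(h != NULL);
    switch (h->kind) {
    case HANDLER_C:    return h->as.c_fn(arg);
    case HANDLER_LISP: return apply(h->as.lisp, arg);
    case HANDLER_NONE: break;
    }
    return -1;
}

/* Demo handlers used only to exercise both paths. */
static int demo_c_handler(int arg) { return arg + 1; }
static int demo_apply(LispLambda fn, int arg) { return fn.lambda_id * 100 + arg; }
```

Keeping both kinds behind one `dispatch` entry point is what lets a cell be re-implemented in C or Lisp without the event source knowing the difference.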
Risks
- Feature completeness: Fe’s minimal feature set (~20 builtins) means the stdlib must be expanded carefully. The risk is minor, since Fe is designed to be extended.
- Bytecode format: Fe doesn’t have an official bytecode format. We’d either:
  1. Ship Fe’s source text directly in .kn86 (parse on device, slower)
  2. Design a custom bytecode format + Fe interpreter modification to consume it (implementation work)

  Option 2 is ~3–5 KB of added code; Option 1 adds ~2–5 KB per cartridge
- Debugger support: minimal. A source-line table in the .kn86 header would enable breakpoints; not difficult.
Verdict
Strong candidate. Fe’s arena-native design, tiny footprint (20–25 KB), and excellent latency profile make it the natural fit. The only work is FFI binding and bytecode format extension. Risks are well-understood and manageable.
Option C: From-Scratch Minimal Bytecode VM
Architecture
Design a purpose-built VM for KN-86 that focuses on handler dispatch rather than general-purpose Lisp, with a minimal feature set.
- Compiler (desktop tool):
  - Parse .lsp source
  - Generate compact bytecode targeting the VM
  - Instruction set: ~30–50 opcodes (LOAD, STORE, CALL, BRANCH, RETURN, etc.)
  - Output: bytecode blob + constant table (strings, symbols)
- Interpreter (device):
  - Stack-based VM with explicit instruction pointer
  - Very fast dispatch loop: a few clock cycles of dispatch overhead per opcode
  - No AST, no tree-walking overhead
  - Arena allocation for runtime values (stack, heap)
- Code size estimate:
  - Compiler (desktop): ~8–12 KB (one-time; not counted against the device budget)
  - Interpreter (device): ~15–20 KB
  - FFI bindings: ~5–8 KB
  - Total (on device): ~20–28 KB, an excellent footprint
- SRAM working state:
  - VM stack: ~2 KB (configurable)
  - Instruction pointer, frame pointer, etc.: ~500 B
  - Arena: ~~16–32 KB~~ 256 KB (per Amendment 2026-04-27 / GWP-233)
  - Total: ~18–34 KB (computed with the original 16–32 KB arena)
- Handler dispatch latency:
  - Stack-based bytecode: low per-instruction overhead
  - No tree-walking overhead
  - Typical handler: 1–3 ms ✓
  - Complex procedural gen: 3–8 ms ✓
Risks
- Implementation cost: building a compiler, instruction set, and interpreter from scratch is ~2–3 weeks of engineering work (much higher than Option B).
- Completeness: must ensure the instruction set is sufficient for all cartridge patterns (closures, higher-order functions, mutation, etc.). Missing primitives require bytecode redesign.
- Debugger: stack traces require source-line mapping and careful design; more effort than Fe.
- Maintenance: custom VM = custom bugs. No existing test suite or community.
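For a sense of what the device-side interpreter in this option would look like, here is a minimal sketch of a stack-based dispatch loop with an explicit instruction pointer. The four opcodes and their one-byte operand encoding are invented for illustration and are far smaller than the ~30–50 opcodes estimated above; arithmetic overflow is not checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical four-opcode instruction set; Option C estimated ~30-50. */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_RETURN };

#define VM_STACK_MAX 16

/* Runs `code` until OP_RETURN. Returns 0 and stores the top of stack in
 * *result on success, or -1 on stack overflow/underflow, a bad opcode,
 * or running off the end of the code. */
static int vm_run(const uint8_t *code, size_t len, int32_t *result) {
    int32_t stack[VM_STACK_MAX];
    size_t sp = 0;      /* number of values on the stack */
    size_t ip = 0;      /* explicit instruction pointer */

    assert(code != NULL && result != NULL);
    while (ip < len) {
        switch (code[ip++]) {
        case OP_PUSH:   /* operand: one signed byte following the opcode */
            if (ip >= len || sp == VM_STACK_MAX) return -1;
            stack[sp++] = (int8_t)code[ip++];
            break;
        case OP_ADD:
            if (sp < 2) return -1;
            stack[sp - 2] = stack[sp - 2] + stack[sp - 1];
            --sp;
            break;
        case OP_MUL:
            if (sp < 2) return -1;
            stack[sp - 2] = stack[sp - 2] * stack[sp - 1];
            --sp;
            break;
        case OP_RETURN:
            if (sp < 1) return -1;
            *result = stack[sp - 1];
            return 0;
        default:
            return -1;
        }
    }
    return -1;
}
```

Even this toy shows where the completeness risk comes from: closures, higher-order functions, and mutation each need opcodes and runtime structures that a desktop compiler must also learn to emit.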
Verdict
Not recommended. The implementation cost far exceeds the marginal gains: roughly 1–3 ms of typical handler latency, with no footprint advantage (20–28 KB versus 20–25 KB for Fe), is not worth 2–3 weeks. Option B (Fe) gives most of the benefits for 3–5 days of integration work.
Comparative Table
| Criterion | Option A (uLisp) | Option B (Fe) | Option C (Custom) |
|---|---|---|---|
| Code size (VM) | 40–50 KB | 20–25 KB | 20–28 KB |
| SRAM (working) | 18–36 KB | 17–34 KB | 18–34 KB |
| Handler latency (typical) | 3–8 ms | 2–6 ms | 1–3 ms |
| Arena compatibility | Adapted (risky) | Native (proven) | Native (untested) |
| Dual-target builds | Yes (C source) | Yes (C source) | Yes (bytecode) |
| Implementation effort | 3–4 weeks | 3–5 days | 2–3 weeks |
| Risk profile | Moderate (GC removal) | Low (proven design) | Moderate (new VM) |
| Feature set | Full Lisp | Minimal (extensible) | Custom (enough) |
| Debugger feasibility | Medium | Medium | High |
Recommendation: Option B (Fe)
Rationale
Fe is the clear winner. It meets all constraints with minimal effort:
- Perfect fit for constraints:
  - Code size: 20–25 KB (well under 48 KB budget)
  - SRAM: 17–34 KB (1–2 KB of interpreter state, well within the 8 KB working-state budget, plus the ~~16–32 KB~~ 256 KB arena per Amendment 2026-04-27)
  - Latency: 2–6 ms typical (at or near the 5 ms target); 5–10 ms for complex procedural generation (within the 10 ms ceiling)
  - Arena native: no GC, no pauses, bounded memory by design
- Proven design:
  - Fe has been shipping in real embedded systems for years
  - Arena allocation is not an adaptation; it’s the core design
  - ~800 LOC means the codebase is auditable and maintainable
- Minimal porting effort:
  - ~3–5 days to integrate Fe into the emulator and firmware
  - ~5–8 KB of FFI boilerplate to expose NoshAPI
  - Cartridge format design is independent (see deliverable 3)
- Extensibility:
  - Fe’s minimal builtins are not a limitation; they’re a feature
  - The stdlib can be grown incrementally as cartridges demand new primitives
  - No bloat from uLisp’s 60+ builtins that cartridges won’t use
Required Adaptations
- Bytecode format: [DEFERRED 2026-06-14: not implemented; design parked. See the Amendment Log and kec-lisp/docs/bytecode-vm.md.] Fe natively reads source. For production:
  - Design a simple bytecode serialization (see deliverable 3)
  - Modify Fe’s evaluator to consume bytecode in addition to source
  - ~3–5 KB change to Fe’s reader
- FFI bindings:
  - Wrap all ~60 NoshAPI functions (text_puts, psg_write, spawn_cell, etc.) as Fe builtins
  - Each binding: validate args, call C function, return result
  - ~5–8 KB total
- Arena integration:
  - Cartridge loader initializes Fe’s arena with a configurable size (~~16–32 KB~~ 256 KB default per Amendment 2026-04-27 / GWP-233)
  - At mission-instance boundaries, arena is reset
  - This is already Fe’s design, so no adaptation is needed
- Source-line mapping (optional, phase 2):
  - Compiler stores source line → bytecode offset mapping in .kn86 header
  - Debugger uses this for stack traces
  - Not critical for MVP; can be added later
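The "validate args, call C function, return result" shape of an FFI binding can be sketched without committing to any real API. Everything below is hypothetical: the `Value` and `Binding` types stand in for Fe’s object API, `demo_psg_write` stands in for the NoshAPI primitive, and the register ranges (0–15, 0–255) are assumed purely so the validation step has something to check.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical value and binding types for illustration only; the real
 * bindings would use Fe's object API and the NoshAPI headers. */
typedef enum { VAL_NIL, VAL_NUMBER, VAL_ERROR } ValKind;

typedef struct {
    ValKind kind;
    double  number;
} Value;

typedef Value (*Builtin)(const Value *args, int argc);

typedef struct {
    const char *name;   /* symbol the cartridge calls */
    Builtin     fn;     /* C wrapper behind it */
} Binding;

/* Stand-in for a device primitive; records the last register write. */
static int g_last_reg = -1, g_last_val = -1;
static void demo_psg_write(int reg, int val) { g_last_reg = reg; g_last_val = val; }

/* The three steps from the text: validate args, call C, return result. */
static Value builtin_psg_write(const Value *args, int argc) {
    Value out = { VAL_ERROR, 0.0 };
    if (argc != 2 || args[0].kind != VAL_NUMBER || args[1].kind != VAL_NUMBER) {
        return out;                      /* wrong arity or type */
    }
    if (args[0].number < 0 || args[0].number > 15 ||
        args[1].number < 0 || args[1].number > 255) {
        return out;                      /* outside the assumed ranges */
    }
    demo_psg_write((int)args[0].number, (int)args[1].number);
    out.kind = VAL_NIL;
    return out;
}

static const Binding k_bindings[] = {
    { "psg_write", builtin_psg_write },
};

/* Linear lookup by name; returns NULL for an unknown primitive. */
static Builtin find_builtin(const char *name) {
    assert(name != NULL);
    for (size_t i = 0; i < sizeof k_bindings / sizeof k_bindings[0]; ++i) {
        if (strcmp(k_bindings[i].name, name) == 0) return k_bindings[i].fn;
    }
    return NULL;
}
```

With roughly 60 primitives following this pattern, most of the estimated 5–8 KB would be repetitive validation code, which is a reasonable candidate for table-driven or generated wrappers.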
Next Steps
- Prototype Fe integration:
  - Clone Fe repo, integrate into emulator build
  - Write hello-world Lisp cartridge, confirm it runs
  - Measure: handler dispatch latency, memory usage under typical workload
- Bytecode format design:
  - Work on deliverable 3 in parallel
  - Determine bytecode instruction set (Fe’s AST + constants)
  - Finalize .kn86 header and section layout
- FFI binding enumeration:
  - Work on deliverable 2 in parallel
  - Enumerate all ~60 NoshAPI primitives that cartridges will need
  - Determine Lisp signatures and type mappings
Known Unknowns
- Fe’s closure semantics: Fe supports closures; confirm they work correctly with arena resets at mission boundaries (likely yes, but worth testing).
- Procedural generation performance: complex nested-loop generation (e.g., network generation in ICE Breaker) — measure latency on real cartridge patterns. (Closed 2026-04-27 by GWP-233 — see Amendment Log; arena pressure measured for all four launch carts.)
- Hot reload: ADR-0002 mentions hot-reload of cart content. Confirm Fe’s arena reset works cleanly for this.
- REPL integration: ADR-0002 scopes a player-facing REPL. Fe’s reader is suitable; confirm the integration path.
Amendment Log
2026-04-27 — Per-cart arena default ratified at 256 KB (GWP-233)
Status effect: Accepted (unchanged). Amendment pattern follows the ADR-0006 / ADR-0005 precedent (2026-04-22 / 2026-04-24): new **Amended:** header line + this log section + struck-through value updates in the body. No change to Decision (Fe is still selected) or Options Considered.
Background. ADR-0010 (the initial ICE Breaker reference sketch) named “16–32 KB per cart” as a Pico-class memory target. ADR-0004 inherited that band in six places. Wave 4 silently bumped the in-source constant to 256 KB when icebreaker.lsp landed (kn86-emulator/src/cartridge.c KN86_CART_FE_ARENA_SIZE) on the bet that a 440-line Lisp cart would not fit in 32 KB. That bump was correct — but it was a guess, not a measurement, and it left the ADR text out of sync. GWP-190’s third acceptance criterion called for an empirical pass to validate the new default; this amendment is that follow-up.
Methodology. New bench harness kn86-emulator/bench/bench_fe_arena_scale.c loads each launch cart through the production cartridge_load_v2_with_arena path at five arena sizes and drives a 16-tick CAR/CDR/INFO/BACK navigation flow that exercises the cart’s primary nav loop and triggers handler-driven allocation. Peak arena pressure is sampled via fe_arena_stats() (new public introspection, also added in this PR) after each tick, retaining the running max across the flow.
Bench output is structured for the BASELINE.md table; see kn86-emulator/bench/BASELINE.md for the canonical run.
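The sampling scheme in the methodology (probe after every tick, keep the running maximum, convert slots to bytes) can be sketched as follows. The `SlotProbe` callback is a hypothetical stand-in for fe_arena_stats(), whose actual signature is not shown in this ADR, and `replay_probe` just replays a fixed series for demonstration.

```c
#include <assert.h>
#include <stddef.h>

#define BENCH_TICKS 16          /* length of the navigation flow in the text */
#define SLOT_BYTES  16          /* bytes per slot, as stated for the table */

/* Hypothetical probe: returns the number of live slots after tick `t`.
 * In the real harness this role is played by fe_arena_stats(), whose
 * signature is not shown in this ADR. */
typedef size_t (*SlotProbe)(int tick, void *ctx);

/* Samples after every tick and keeps the running maximum. */
static size_t peak_slots(SlotProbe probe, void *ctx) {
    size_t peak = 0;
    assert(probe != NULL);
    for (int t = 0; t < BENCH_TICKS; ++t) {
        size_t now = probe(t, ctx);
        if (now > peak) peak = now;
    }
    return peak;
}

/* Demo probe: replays a fixed series supplied through ctx. */
static size_t replay_probe(int tick, void *ctx) {
    const size_t *series = (const size_t *)ctx;
    return series[tick];
}
```

A peak of 3,415 slots, for instance, corresponds to 3,415 × 16 = 54,640 bytes, which is the icebreaker figure reported in the table.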
Measurements (Apple M4 Pro / macOS / Debug, 2026-04-27). Each row is the per-cart peak slot count and byte-equivalent peak (peak_slots × 16 B/slot) at the listed arena size. FAIL means Fe panicked with out of memory during cart-init or the input flow.
| Cart | 32 KB | 64 KB | 128 KB | 256 KB | 512 KB |
|---|---|---|---|---|---|
| icebreaker | FAIL | 54,640 B | 54,640 B | 54,640 B | 54,640 B |
| neongrid | FAIL | 59,472 B | 70,816 B | 70,816 B | 70,816 B |
| depthcharge | FAIL | FAIL | 126,240 B | 126,240 B | 126,240 B |
| blackledger | FAIL | FAIL | 84,640 B | 84,640 B | 84,640 B |
Worst-case observation. Depthcharge at 126,240 bytes (≈ 123 KB) is the worst peak across the matrix. At the in-source 256 KB default, that leaves 135,904 bytes (~51.8%) of arena headroom, just above the 50% safety margin the bench harness asserts as its pass criterion.
Floor observation. No cart loads at 32 KB. depthcharge and blackledger require ≥ 128 KB. icebreaker and neongrid load at 64 KB but with thin margin. Minimum viable arena across the four launch carts: 128 KB.
Stable-peak observation. Each cart’s peak is essentially flat across the 128 KB / 256 KB / 512 KB columns (icebreaker is even flat from 64 KB upward). This is Fe’s mark-sweep GC working as intended: the arena reaches a steady-state working set determined by the cart’s cell registry + handler closure captures + mid-mission live objects, not by arena size. Bigger arenas don’t push the peak higher; they just mean the GC fires less often.
New default: 256 KB (no change to in-source value). The in-source KN86_CART_FE_ARENA_SIZE already sits at 256 KB. The measurement justifies that value rather than overturning it:
- Worst-case headroom. 256 KB ÷ 126,240 B (depthcharge peak) = 2.08× — squarely meets the bench’s 2× safety multiplier.
- Pi Zero 2 W RAM envelope. 512 MB total, ≈ 50–150 MB consumed by Linux + the nOSh runtime + display framebuffer + audio buffers + idle daemons (verify against bring-up notes; current estimate). Per-cart 256 KB is < 0.1% of the post-baseline envelope. Arena scaling up to 1 MB is plausible; staying at 256 KB leaves headroom for Universal Deck State, the runtime REPL/nEmacs context, and future concurrent-cart features without renegotiating the arena.
- Future-proofing. A 4× scale-up to 1 MB would still be < 0.5% of the post-baseline RAM. The chosen 256 KB is the lower bound that satisfies the safety margin; doubling or quadrupling later if a cart genuinely needs it is a one-constant change. The hard ceiling is “Pi Zero 2 W RAM minus baseline minus runtime working set” (roughly 300–400 MB), which is about three orders of magnitude (roughly 1,200–1,600×) above the current default.
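The headroom and safety-multiplier figures quoted in this amendment can be reproduced with a few lines of arithmetic. The helper names below are hypothetical and are not part of the bench harness; they only restate the calculation.

```c
#include <assert.h>
#include <stddef.h>

#define KIB 1024u

/* Bytes left in the arena once the measured peak is resident. */
static size_t headroom_bytes(size_t arena, size_t peak) {
    assert(arena > 0 && peak <= arena);
    return arena - peak;
}

/* Headroom as a fraction of the arena (0.0 to 1.0). */
static double headroom_fraction(size_t arena, size_t peak) {
    return (double)headroom_bytes(arena, peak) / (double)arena;
}

/* How many times the peak fits in the arena. */
static double safety_multiplier(size_t arena, size_t peak) {
    assert(peak > 0);
    return (double)arena / (double)peak;
}
```

For depthcharge, `headroom_bytes(256 * KIB, 126240)` is 135,904, the fraction is about 0.518, and the multiplier is about 2.08. The same helpers give Black Ledger (84,640 B) a utilization of about 32%.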
Black Ledger scope decision (closes design-pack open question #2). Black Ledger’s published depth is 1 case × 1 account × 3 transactions and runs at 84,640 B peak inside the 256 KB arena (32% utilization). The cart has substantial headroom to scale up toward its original C-cart target of 4×3-8×8-50. A follow-on task should bump MAX-CASES / MAX-ACCOUNTS / MAX-TXNS in kn86-emulator/carts/blackledger.lsp toward the design target, re-run this bench, and confirm peak stays under 50% utilization. Out of scope for this amendment.
Production hardening side-effect. The bench required Fe to fail recoverably when the arena exhausts. Fe’s default fe_error() calls exit(EXIT_FAILURE), which would have torn down the bench process at the first sub-default arena. kn86-emulator/src/cartridge.c now installs a setjmp-based error handler in cartridge_load_v2_impl so a cart that panics during eval / cart-init reports the error through Row 24’s status buffer + stderr and unloads cleanly. This is also a real production hardening — a malformed cart no longer kills the emulator.
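The recovery pattern described here, replacing a process-terminating error path with a setjmp/longjmp unwind back to the loader, can be sketched as below. All names are hypothetical: the real cartridge_load_v2_impl and its error handler are not shown in this ADR, and `demo_eval_cart` only simulates an out-of-memory panic.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical loader state; the real cartridge_load_v2_impl is not shown
 * in this ADR. */
static jmp_buf g_load_recover;
static char    g_last_error[64];

/* Installed in place of an exit()-ing error handler: record the message,
 * then unwind to the loader instead of terminating the process. */
static void on_interp_error(const char *msg) {
    strncpy(g_last_error, msg, sizeof g_last_error - 1);
    g_last_error[sizeof g_last_error - 1] = '\0';
    longjmp(g_load_recover, 1);
}

/* Stand-in for evaluating a cart: fails when the arena is too small. */
static void demo_eval_cart(size_t arena_size, size_t needed) {
    if (needed > arena_size) {
        on_interp_error("out of memory");
    }
}

/* Returns 0 on success, -1 if the cart panicked during load. */
static int load_cart(size_t arena_size, size_t needed) {
    assert(arena_size > 0);
    if (setjmp(g_load_recover) != 0) {
        /* Arrived here via longjmp: report and unload cleanly. */
        return -1;
    }
    g_last_error[0] = '\0';
    demo_eval_cart(arena_size, needed);
    return 0;
}
```

One caveat of this technique in general: any resources acquired between `setjmp` and the `longjmp` are not released automatically, so the recovery branch has to own the cleanup.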
Plumbing additions (this PR).
| File | Change |
|---|---|
| kn86-emulator/vendor/fe/fe.h + fe.c | New fe_arena_stats() + fe_object_size() introspection. |
| kn86-emulator/src/cartridge.h + cartridge.c | New cartridge_load_v2_with_arena() overload + cartridge_get_default_arena_size() accessor. Internal refactor: cartridge_load_v2() thunks through a shared cartridge_load_v2_impl(). setjmp recovery added. |
| kn86-emulator/bench/bench_fe_arena_scale.c | New bench harness, the canonical measurement vehicle. |
| kn86-emulator/CMakeLists.txt | Wires the new bench into the benchmarks target. |
| kn86-emulator/bench/BASELINE.md | Adds the per-cart × per-arena measurement table. |
Doc-sweep follow-up. ADR-0005 §“Tier 1: All-Carts Primitives” notes “Max string length in cartridge: limited by arena size; typically 256–1024 bytes per string.” The new 256 KB default doesn’t change that practical advice — strings are still bounded by working-set fragmentation, not raw arena size — but the cross-reference is updated in this PR for Spec Hygiene Rule 3 compliance. ADR-0010’s “16–32 KB” framing is a frozen historical artifact and is not revised; Wave 4’s commit message + this amendment supersede it.
Pi Zero 2 W validation. Pending — bench will be re-run on the prototype at Stage 1c bring-up and BASELINE.md will gain a second platform column. Cortex-A53 @ 1 GHz is ~3–5x slower for integer/cache-friendly workloads vs Apple Silicon, but arena measurements are byte-counts, not timings, so the per-cart peak rows here will transfer unchanged. The 50% safety margin assertion holds either way.
Authority trail. Sprint 4 design pack docs/plans/sprints/2026-04-27-sprint4-gwp-233-design.md (the canonical brief). Provisional decisions on amendment target (ADR-0004, not ADR-0005), worst-case-cart treatment (Black Ledger at published depth, scale-up deferred), and new-default landing zone (256 KB) ratified by Josh in the GWP-233 task brief.
2026-06-14 — Runtime is tree-walking; bytecode/AOT path deferred (clarification)
Status effect: Accepted (unchanged). Fe remains the selected runtime. This amendment corrects framing/terminology and records a deferral; it does not re-open Options Considered or change the Decision.
What was conflated. This ADR was written as a “bytecode VM selection” spike and kept that title, but the winning option (Fe) evaluates by walking the cons-cell AST directly — it is a tree-walking interpreter with no compiler, no bytecode, and no instruction set. The ADR’s own Option B analysis already says so (“Fe is an interpreter with no JIT; evaluation is tree-walking”). The standalone KEC Lisp implementation (kec-lisp repo) confirms it: kernel/fe.c is rxi/fe 1.0, a recursive eval(); kec build is a source bundler, not a compiler.
What was never built. The “Integration Path” and “Required Adaptations” sections describe a desktop step that “compiles to Fe bytecode,” an on-device “.kn86 bytecode section,” and a modified Fe evaluator that “consumes bytecode.” None of that was implemented. Cartridges carry Lisp source (.kn86), which the device parses on load. There is no Fe bytecode format. The affected lines are struck through in the body above.
Decision recorded (2026-06-14): keep the tree-walker; defer the bytecode VM. For the current prototyping phase — a terminal-based game system on the Pi Zero 2 W — the tree-walker is the right substrate: maximal malleability (new primitive = a cfunc; new special form = one eval() switch arm), no compiler/VM/GC-rooting surface to maintain while the language and FFI are still in flux, and ample headroom on a 1 GHz A53 driving an 80×25 text grid. Hot computations are handled by the FFI escape hatch (push the one hot thing into C), not by making the whole language fast.
The deferred design is parked, not lost. A complete, implementation-ready design for an in-memory bytecode VM — plus an analysis of the analyzing-interpreter and AOT alternatives, with concrete revisit triggers — lives at kec-lisp/docs/bytecode-vm.md. Sequencing note: in-memory bytecode → AOT is an additive path (shared compiler + VM), so if bytecode is ever built, go straight to the in-memory VM; the analyzing interpreter is a detour.
Revisit triggers (data, not vibe). Reopen when on-device profiling shows the interpreter loop itself (not a cfunc-able hot spot) eating the frame budget; or cart load-time (on-device parsing) becomes a felt delay; or RAM/footprint near ship favors dropping the on-device parser (AOT); or untrusted third-party carts require opcode-level verification/metering.
Doc-sweep follow-up (Spec Hygiene Rule 3). Sweep the downstream “compiles ahead of time to Fe bytecode” / “Fe bytecode VM” phrasings → “tree-walking Fe interpreter.” The kn86-docs corpus sweep (ADR-0001/0006 amendments, runtime docs, grammars, SDK, cartridge docs, marketing/influence essays) is executed in GWP-526 alongside this amendment family. The kinoshita/CLAUDE.md “Runtime stack” note + “Fe bytecode VM” references are a separate same-repo edit (kinoshita repo, not kn86-docs). The kec-lisp repo’s own CLAUDE.md already describes Fe accurately and needs no change.
2026-06-14 — Language extracted to a standalone repo; embedding + memory facts (clarification)
Status effect: Accepted (unchanged). Fe remains the selected runtime. Companion to the tree-walking amendment above; records the 2026-06-14 language split and the embedding/memory facts that follow from it.
KEC Lisp is now a standalone language repo. The Fe runtime selected here was extracted to github.com/Kinoshita-Electronics-Consortium/kec-lisp — the Fe kernel (vendored rxi/fe), KEC Core (stdlib authored in Lisp), portable host primitives, the embedding API (kec.h), and the kec CLI. The KN-86 firmware vendors that language and registers the NoshAPI device primitives (ADR-0005) onto each Fe context via kec_bind_fe. The “VM selection” decision is unchanged; it is now satisfied by a vendored library rather than an in-tree interpreter. The in/out boundary is specified at the KEC Lisp site.
Embedding entry point (no-malloc). The runtime opens contexts via kec_open_with_arena(buf, size, profile) — a no-malloc entry point (added for the device, GWP-502) that runs Fe on a caller-supplied static/stack buffer, returns NULL cleanly if the buffer is too small to load Core, and never frees the caller’s buffer. The nOSh embedding uses this rather than the malloc-ing kec_open. One kec_State = one Fe context + one arena.
Capability tiers = profiles + binding-set. The standalone language ships two profiles — KEC_PROFILE_SANDBOX and KEC_PROFILE_FULL (FULL adds load/slurp/args/exit). The KN-86’s per-context permission tiers (cart vs mission vs system-render, ADR-0005) are these profiles plus which device primitives are bound into the context — capability is the binding-set, enforced at context creation, not by runtime checks.
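The idea that capability is the binding-set, fixed at context creation, can be illustrated with a small sketch. The tier flags, the `Context` struct, and the tier assigned to each primitive are all assumptions made for illustration (and `fb_blit` is an invented name); ADR-0005 defines the real tiers and primitives.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical tiers and primitive names; ADR-0005 defines the real ones. */
enum { TIER_CART = 1u << 0, TIER_MISSION = 1u << 1, TIER_SYSTEM = 1u << 2 };

typedef struct {
    const char *name;
    unsigned    tiers;      /* bitmask of tiers allowed to see this primitive */
} Primitive;

static const Primitive k_primitives[] = {
    { "text_puts",  TIER_CART | TIER_MISSION | TIER_SYSTEM },
    { "spawn_cell", TIER_MISSION | TIER_SYSTEM },
    { "fb_blit",    TIER_SYSTEM },   /* invented name, for illustration */
};

#define MAX_BOUND 8

typedef struct {
    const char *bound[MAX_BOUND];
    size_t      count;
} Context;

/* Binding happens once, at context creation. A primitive that is not
 * bound simply does not exist in the context, so no per-call check is
 * needed afterwards. */
static void context_create(Context *ctx, unsigned tier) {
    assert(ctx != NULL);
    ctx->count = 0;
    for (size_t i = 0; i < sizeof k_primitives / sizeof k_primitives[0]; ++i) {
        if ((k_primitives[i].tiers & tier) && ctx->count < MAX_BOUND) {
            ctx->bound[ctx->count++] = k_primitives[i].name;
        }
    }
}

/* Returns 1 if the named primitive was bound into this context. */
static int context_has(const Context *ctx, const char *name) {
    for (size_t i = 0; i < ctx->count; ++i) {
        if (strcmp(ctx->bound[i], name) == 0) return 1;
    }
    return 0;
}
```

In this sketch a cart-tier context cannot even name `spawn_cell`, which is the property the text describes: the restriction is structural rather than a check performed on each call.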
Memory model reconciliation. Fe is mark-sweep GC over a fixed object pool carved from the arena — it reclaims dead objects but does not grow the pool, so the 256 KB per-cart default measured above (steady-state working set, not arena size) stands. The GC root stack is GCSTACKSIZE, now compile-time configurable: default 256 (sized for the device), raised to 8192 on the desktop build for recursive-code headroom. Core’s list/sequence functions are written iteratively so a long list won’t exhaust that root stack. (Top-level let binding globally is another KEC kernel delta vs upstream Fe — see the CHANGELOG.)
Fe: arena-native, proven, minimal. The right tool for the constraint set.