ADR-0004: Bytecode VM Selection

Hardware retarget note (2026-04-21): This ADR selected Fe under memory constraints sized for RP2350 / Pico 2 (520 KB SRAM, 4 MB flash). That target has been dropped; the KN-86 Deckline ships on the Pi Zero 2 W. The Fe selection still holds: Fe runs anywhere portable C runs and fits comfortably within the Pi Zero 2 W’s 512 MB of RAM. The budget numbers below are retained as historical constraints that shaped the choice; they are not current hardware limits.

Supersedes spike: former spikes/ADR-0001-VM-selection.md
Related: ADR-0001-embedded-lisp-scripting-layer.md

Summary

This spike evaluates three bytecode VM candidates for the KN-86 cartridge Lisp runtime against the constraints from ADR-0001:

Historical memory envelope (RP2350, 520 KB SRAM): ≤ 48 KB flash for VM code, ≤ 8 KB SRAM for working state. (On Pi Zero 2 W with 512 MB, these are retained as design-discipline targets rather than hardware caps.)
Handler dispatch latency: 5 ms target, 10 ms ceiling
Dual-target builds: SDL3 emulator and device firmware without source change
Arena allocation: no GC, bounded memory semantics required
20 fps animation cap: not a per-frame hard deadline

Option A: uLisp (Adapted for Arena Allocation)

Architecture

uLisp (http://www.ulisp.com) is a feature-complete Lisp interpreter written in C (~8,000 LOC), targeting embedded systems with mark-sweep garbage collection. It provides:

S-expression reader (parse directly from source or bytecode)
Full Lisp semantics (lambdas, closures, tail call optimization, macro system)
60+ built-in functions out of the box
GC heap, variable-sized object allocator
Runs on Arduino, Teensy, and various embedded boards

Adaptation Strategy

The core challenge: replace GC with arena allocation. Analysis:

Mark-sweep GC removal: uLisp’s heap allocator is tied tightly to its GC. To adapt it:
- Convert the heap to a pre-allocated arena (~~16–32 KB per cartridge~~ 256 KB per cartridge per Amendment 2026-04-27 / GWP-233)
- Remove the mark phase entirely (no GC)
- Object lifetimes must be managed manually or via scope-based cleanup
- On cartridge load: reset arena pointer; on mission-instance boundary: reset again
Cell allocation within arena:
- uLisp allocates list cells (cons cells), atoms (symbols, numbers, strings) dynamically
- With arena, this becomes a simple bump allocator — very fast
- Risk: without GC, objects must be explicitly freed or scope-bounded
Code size estimate:
- uLisp core: ~30–35 KB (it’s already stripped for embedded)
- Adapting mark-sweep → arena: ~2–5 KB of refactoring + control flow changes
- FFI bridge (exposing NoshAPI): ~5–10 KB of new wrapper code
- Total: ~40–50 KB, which sits at or slightly over the 48 KB flash budget at the upper end; marginally acceptable but tightly constrained
SRAM working state:
- Heap arena: ~~16–32 KB~~ 256 KB (cart-configurable; default per Amendment 2026-04-27 / GWP-233)
- Stack/interpreter state: ~2–4 KB
- Total: ~18–36 KB at the original 16–32 KB arena size; requires careful arena sizing
Handler dispatch latency:
- uLisp interprets directly from Lisp source (or bytecode if a reader is implemented)
- No JIT, no optimization passes
- Cipher voice handler (simple lookups, ~10–20 s-expressions): 2–4 ms ✓
- Cell handler (moderate control flow): 3–8 ms ✓
- Complex procedural gen (nested loops): risk of 10–15 ms (exceeds ceiling)
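The bump allocation described under "Cell allocation within arena" can be sketched as follows. This is a minimal illustration, not uLisp's or Fe's allocator: the `Arena` type and the `arena_init`, `arena_alloc`, and `arena_reset` names are hypothetical, and the backing buffer is assumed to be suitably aligned by the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical arena type, for illustration only. */
typedef struct {
    uint8_t *base;   /* caller-supplied backing buffer (assumed aligned) */
    size_t   size;   /* total bytes available */
    size_t   used;   /* bump pointer: bytes handed out so far */
} Arena;

static void arena_init(Arena *a, void *buf, size_t size) {
    a->base = (uint8_t *)buf;
    a->size = size;
    a->used = 0;
}

/* O(1) allocation: round the bump pointer up to `align` (a power of two),
   then advance it. Returns NULL when the arena is exhausted. */
static void *arena_alloc(Arena *a, size_t n, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    size_t start = (a->used + (align - 1)) & ~(align - 1);
    if (start > a->size || n > a->size - start) return NULL;
    a->used = start + n;
    return a->base + start;
}

/* Releases every object at once, e.g. on cartridge load or at a
   mission-instance boundary. Individual objects are never freed. */
static void arena_reset(Arena *a) {
    a->used = 0;
}
```

The trade-off named in the risk bullet is visible here: there is no per-object free, so anything allocated by a handler stays allocated until the next `arena_reset`.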

Risks

GC removal is invasive: uLisp was designed with GC as a core assumption. Arena adaptation requires substantial refactoring and testing; risk of subtle memory corruption if scope lifetimes are wrong.
Manual memory management: cartridge authors must be aware of arena boundaries; leaks are possible if handlers allocate indefinitely.
Debuggability: stack traces in an arena-allocated Lisp are harder to obtain; with no GC pauses there are fewer natural breakpoints for inspection.
Code size creep: the full feature set (60+ builtins, macro system) adds bloat. A minimal subset might save 5–10 KB but means reimplementing parts of the std lib.

Verdict

Viable but risky. Code size is acceptable (40–50 KB), latency is good for typical handlers, and arena semantics align with ADR-0001’s constraints. However, the GC-to-arena migration is a large, invasive refactor with moderate risk of subtle bugs. Not a strong candidate unless there’s a compelling reason to use the full uLisp feature set.

Option B: Fe (Lightweight Lisp)

Architecture

Fe (https://github.com/rxi/fe) is a minimal Lisp interpreter written in C (~800 LOC, ~12 KB binary). It provides:

Compact S-expression reader and evaluator
First-class functions and lambdas
Arena-native allocation (by design — no GC)
Minimal built-in set (~20 core functions)
Straightforward bytecode compilation path (optional)

Design

Fe is already arena-allocated by design. Its architecture:

Arena allocator: Fe allocates all objects (cons cells, atoms, functions) from a pre-allocated arena. The arena is reset at well-defined boundaries (cartridge load, mission start).
- No GC, no mark-sweep, no pauses
- Bump allocator: O(1) allocation
- Deallocation happens implicitly at arena reset boundaries
Runtime model:
- Reader: parse s-expressions from source (or bytecode, with minimal extension)
- Evaluator: recursive-descent interpreter over the AST
- Function application: direct invocation, no bytecode compilation step (but compilation is possible)
FFI integration:
- Functions are first-class values; exposing NoshAPI as built-in functions is straightforward
- Each NoshAPI primitive (text_puts, psg_write, etc.) becomes a Lisp builtin
- ~5–8 KB of wrapper code to bind the ~60 NoshAPI functions
Code size estimate:
- Fe core: ~12 KB (already minimal)
- FFI bindings: ~5–8 KB
- Cartridge loader integration: ~2–3 KB
- Total: ~20–25 KB — well under budget
SRAM working state:
- Fe interpreter stack/state: ~1–2 KB (very small)
- Arena per cart: ~~16–32 KB~~ 256 KB (user-configurable; default per Amendment 2026-04-27 / GWP-233)
- Total: ~17–34 KB at the original 16–32 KB arena size, a comfortable margin
Handler dispatch latency:
- Fe is an interpreter with no JIT; evaluation is tree-walking
- Cipher voice handler: 2–3 ms ✓
- Cell handler (moderate): 3–6 ms ✓
- Complex procedural gen: 5–10 ms (within ceiling) ✓
- Latency is proportional to expression depth; typical handlers are shallow

Integration Path

Cartridge compilation (desktop, once at author time):
- Write cartridge source in .lsp (Lisp)
- Desktop tool reads .lsp, parses it, ~~compiles to Fe bytecode or AST~~ bundles/validates it — no bytecode is produced; Fe evaluates source (see Amendment 2026-06-14)
- Package into .kn86 (see deliverable 3)
On-device loading (Pi Zero 2 W):
- Load ~~.kn86 bytecode section~~ the cart’s Lisp source from the .kn86 container
- Reinitialize Fe arena
- ~~Hand bytecode to Fe evaluator~~ Parse the source and evaluate it with the Fe tree-walker (no bytecode; see Amendment 2026-06-14)
- Cell handlers are registered as closures capturing the cartridge’s lexical environment
Handler dispatch contract:
- When a cell event fires (ON_CAR, ON_EVAL, etc.), the runtime looks up the handler
- If handler is C function pointer: call directly
- If handler is Lisp lambda reference: invoke Fe evaluator with the lambda + arguments
- The same latency budget (5 ms target, 10 ms ceiling) applies either way
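The handler dispatch contract above can be sketched as a tagged union. All names here (`Handler`, `dispatch_event`, `LispApplyFn`, and the demo functions) are hypothetical and do not come from the nOSh runtime; the Lisp side is represented by a callback standing in for "invoke the Fe evaluator with the lambda and arguments".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical handler representation, for illustration only. */
typedef enum { HANDLER_NONE, HANDLER_C, HANDLER_LISP } HandlerKind;

typedef int (*CHandlerFn)(int event_arg);
/* Stand-in for calling the Fe evaluator on a lambda reference. */
typedef int (*LispApplyFn)(void *lambda_ref, int event_arg);

typedef struct {
    HandlerKind kind;
    union {
        CHandlerFn c_fn;      /* valid when kind == HANDLER_C */
        void      *lisp_ref;  /* valid when kind == HANDLER_LISP */
    } as;
} Handler;

/* Returns the handler's result, or -1 if no handler is registered. */
static int dispatch_event(const Handler *h, LispApplyFn lisp_apply, int event_arg) {
    assert(h != NULL);
    switch (h->kind) {
    case HANDLER_C:
        return h->as.c_fn(event_arg);                 /* call directly */
    case HANDLER_LISP:
        return lisp_apply(h->as.lisp_ref, event_arg); /* go through the evaluator */
    default:
        return -1;
    }
}

/* Demo stand-ins so the sketch can be exercised without Fe. */
static int demo_c_handler(int event_arg) { return event_arg + 1; }
static int demo_lisp_apply(void *lambda_ref, int event_arg) {
    return *(int *)lambda_ref * event_arg;
}
```

The caller sees one entry point regardless of handler kind, which is what lets a cart mix C and Lisp handlers behind the same event table.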

Risks

Feature completeness: Fe’s minimal feature set (20 builtins) means the stdlib must be expanded carefully. The risk is minor — Fe is designed to be extended.
Bytecode format: Fe doesn’t have an official bytecode format. We’d either:
- Ship Fe’s source text directly in .kn86 (parse on device, slower)
- Design a custom bytecode format + Fe interpreter modification to consume it (implementation work)
- The custom bytecode format is ~3–5 KB of implementation work; shipping source text adds ~2–5 KB per cartridge
Debugger support: minimal. A source-line table in the .kn86 header would enable breakpoints; not difficult.

Verdict

Strong candidate. Fe’s arena-native design, tiny footprint (20–25 KB), and excellent latency profile make it the natural fit. The only work is FFI binding and bytecode format extension. Risks are well-understood and manageable.

Option C: From-Scratch Minimal Bytecode VM

Architecture

Design a purpose-built VM for KN-86: focus on handler dispatch, not general-purpose Lisp. Minimal feature set.

Compiler (desktop tool):
- Parse .lsp source
- Generate a compact bytecode instruction set targeting the VM
- Instruction set: ~30–50 opcodes (LOAD, STORE, CALL, BRANCH, RETURN, etc.)
- Output: bytecode blob + constant table (strings, symbols)
Interpreter (device):
- Stack-based VM with explicit instruction pointer
- Very fast dispatch loop: on the order of a few cycles of dispatch overhead per opcode (an estimate, not a measurement)
- No AST, no tree-walking overhead
- Arena allocation for runtime values (stack, heap)
Code size estimate:
- Compiler (desktop): ~8–12 KB (one-time)
- Interpreter (device): ~15–20 KB
- FFI bindings: ~5–8 KB
- Total: ~20–28 KB — excellent footprint
SRAM working state:
- VM stack: ~2 KB (configurable)
- Instruction pointer, frame pointer, etc.: ~500 B
- Arena: ~~16–32 KB~~ 256 KB (per Amendment 2026-04-27 / GWP-233)
- Total: ~18–34 KB at the original 16–32 KB arena size
Handler dispatch latency:
- Stack-based bytecode: near-native speed
- No tree-walking overhead
- Typical handler: 1–3 ms ✓
- Complex procedural gen: 3–8 ms ✓

Risks

Implementation cost: building a compiler, instruction set, and interpreter from scratch is ~2–3 weeks of engineering work (much higher than Option B).
Completeness: must ensure the instruction set is sufficient for all cartridge patterns (closures, higher-order functions, mutation, etc.). Missing primitives require bytecode redesign.
Debugger: stack traces require source-line mapping and careful design; more effort than Fe.
Maintenance: custom VM = custom bugs. No existing test suite or community.

Verdict

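Not recommended. The implementation cost far exceeds the marginal gains: the estimates above show no footprint advantage over Fe (20–28 KB versus 20–25 KB), and reducing typical handler latency from 2–6 ms to 1–3 ms is not worth 2–3 weeks. Option B (Fe) gives most of the benefit for a fraction of the effort (3–5 days versus 2–3 weeks).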

Comparative Table

Criterion	Option A (uLisp)	Option B (Fe)	Option C (Custom)
Code size (VM)	40–50 KB	20–25 KB	20–28 KB
SRAM (working)	18–36 KB	17–34 KB	18–34 KB
Handler latency (typical)	3–8 ms	2–6 ms	1–3 ms
Arena compatibility	Adapted (risky)	Native (proven)	Native (untested)
Dual-target builds	Yes (C source)	Yes (C source)	Yes (bytecode)
Implementation effort	3–4 weeks	3–5 days	2–3 weeks
Risk profile	Moderate (GC removal)	Low (proven design)	Moderate (new VM)
Feature set	Full Lisp	Minimal (extensible)	Custom (enough)
Debugger feasibility	Medium	Medium	High

Recommendation: Option B (Fe)

Rationale

Fe is the clear winner. It meets all constraints with minimal effort:

Perfect fit for constraints:
- Code size: 20–25 KB (well under 48 KB budget)
- SRAM: 17–34 KB (comfortable within 8 KB working state + ~~16–32 KB~~ 256 KB arena per Amendment 2026-04-27)
- Latency: 2–6 ms typical (at or near the 5 ms target, with the upper end slightly over it; complex procedural generation is estimated at 5–10 ms, at or under the 10 ms ceiling)
- Arena native: no GC, no pauses, bounded memory by design
Proven design:
- Fe has been shipping in real embedded systems for years
- Arena allocation is not an adaptation — it’s the core design
- ~800 LOC means the codebase is auditable and maintainable
Minimal porting effort:
- ~3–5 days to integrate Fe into the emulator and firmware
- ~5–8 KB of FFI boilerplate to expose NoshAPI
- Cartridge format design is independent (see deliverable 3)
Extensibility:
- Fe’s minimal builtins are not a limitation — they’re a feature
- The stdlib can be grown incrementally as cartridges demand new primitives
- No bloat from uLisp’s 60+ builtins that cartridges won’t use

Required Adaptations

Bytecode format: [DEFERRED 2026-06-14 — not implemented; design parked. See the Amendment Log and kec-lisp/docs/bytecode-vm.md.] Fe natively reads source. For production:
- Design a simple bytecode serialization (see deliverable 3)
- Modify Fe’s evaluator to consume bytecode in addition to source
- ~3–5 KB change to Fe’s reader
FFI bindings:
- Wrap all ~60 NoshAPI functions (text_puts, psg_write, spawn_cell, etc.) as Fe builtins
- Each binding: validate args, call C function, return result
- ~5–8 KB total
Arena integration:
- Cartridge loader initializes Fe’s arena with a configurable size (~~16–32 KB~~ 256 KB default per Amendment 2026-04-27 / GWP-233)
- At mission-instance boundaries, arena is reset
- This is already Fe’s design — no adaptation needed
Source-line mapping (optional, phase 2):
- Compiler stores source line → bytecode offset mapping in .kn86 header
- Debugger uses this for stack traces
- Not critical for MVP; can be added later
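The binding shape listed under "FFI bindings" (validate args, call the C function, return the result) might look like the sketch below. It deliberately avoids Fe's real types: `Value`, `bind_text_puts`, and `demo_text_puts` are hypothetical, and the `(col, row, string)` signature for the text primitive is an assumption made for illustration, not the NoshAPI definition.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical boxed value; Fe's real object type differs. */
typedef enum { VAL_NIL, VAL_NUMBER, VAL_STRING, VAL_ERROR } ValKind;
typedef struct { ValKind kind; double num; const char *str; } Value;

/* Stand-in for a NoshAPI primitive: pretends to draw text and
   returns the number of characters written. */
static int demo_text_puts(int col, int row, const char *s) {
    (void)col; (void)row;
    return (int)strlen(s);
}

static Value make_error(const char *msg) { Value v = { VAL_ERROR, 0.0, msg }; return v; }
static Value make_number(double n)       { Value v = { VAL_NUMBER, n, NULL }; return v; }

/* Binding shape: (1) validate arity and types, (2) call C, (3) box the result. */
static Value bind_text_puts(const Value *args, size_t argc) {
    if (argc != 3)
        return make_error("text-puts: expected 3 arguments");
    if (args[0].kind != VAL_NUMBER || args[1].kind != VAL_NUMBER)
        return make_error("text-puts: col and row must be numbers");
    if (args[2].kind != VAL_STRING || args[2].str == NULL)
        return make_error("text-puts: third argument must be a string");

    int written = demo_text_puts((int)args[0].num, (int)args[1].num, args[2].str);
    return make_number((double)written);
}
```

With roughly 60 primitives following the same three-step shape, most of the binding code is repetitive validation, which is consistent with the small size estimate for this layer.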

Next Steps

Prototype Fe integration:
- Clone Fe repo, integrate into emulator build
- Write hello-world Lisp cartridge, confirm it runs
- Measure: handler dispatch latency, memory usage under typical workload
Bytecode format design:
- Work on deliverable 3 in parallel
- Determine bytecode instruction set (Fe’s AST + constants)
- Finalize .kn86 header and section layout
FFI binding enumeration:
- Work on deliverable 2 in parallel
- Enumerate all ~60 NoshAPI primitives that cartridges will need
- Determine Lisp signatures and type mappings

Known Unknowns

Fe’s closure semantics: Fe supports closures; confirm they work correctly with arena resets at mission boundaries (likely yes, but worth testing).
Procedural generation performance: complex nested-loop generation (e.g., network generation in ICE Breaker) — measure latency on real cartridge patterns. (Closed 2026-04-27 by GWP-233 — see Amendment Log; arena pressure measured for all four launch carts.)
Hot reload: ADR-0002 mentions hot-reload of cart content. Confirm Fe’s arena reset works cleanly for this.
REPL integration: ADR-0002 scopes a player-facing REPL. Fe’s reader is suitable; confirm the integration path.

Amendment Log

2026-04-27 — Per-cart arena default ratified at 256 KB (GWP-233)

Status effect: Accepted (unchanged). Amendment pattern follows the ADR-0006 / ADR-0005 precedent (2026-04-22 / 2026-04-24): new **Amended:** header line + this log section + struck-through value updates in the body. No change to Decision (Fe is still selected) or Options Considered.

Background. ADR-0010 (the initial ICE Breaker reference sketch) named “16–32 KB per cart” as a Pico-class memory target. ADR-0004 inherited that band in six places. Wave 4 silently bumped the in-source constant to 256 KB when icebreaker.lsp landed (kn86-emulator/src/cartridge.c KN86_CART_FE_ARENA_SIZE) on the bet that a 440-line Lisp cart would not fit in 32 KB. That bump was correct — but it was a guess, not a measurement, and it left the ADR text out of sync. GWP-190’s third acceptance criterion called for an empirical pass to validate the new default; this amendment is that follow-up.

Methodology. New bench harness kn86-emulator/bench/bench_fe_arena_scale.c loads each launch cart through the production cartridge_load_v2_with_arena path at five arena sizes and drives a 16-tick CAR/CDR/INFO/BACK navigation flow that exercises the cart’s primary nav loop and triggers handler-driven allocation. Peak arena pressure is sampled via fe_arena_stats() (new public introspection, also added in this PR) after each tick, retaining the running max across the flow.
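The sampling loop in the methodology can be sketched as below. This is not the code in bench_fe_arena_scale.c: the `SampleSlotsFn` callback is a hypothetical stand-in for reading `fe_arena_stats()` after each tick, and the demo sampler returns synthetic numbers. The 16 B/slot conversion is the one used for the byte-equivalent figures in the measurements.

```c
#include <stddef.h>

/* Hypothetical sampler: returns in-use slots after the given tick. */
typedef size_t (*SampleSlotsFn)(void *cart, int tick);

enum { FLOW_TICKS = 16, BYTES_PER_SLOT = 16 };

/* Drive the flow one tick at a time and retain the running max. */
static size_t peak_slots_over_flow(void *cart, SampleSlotsFn sample) {
    size_t peak = 0;
    for (int tick = 0; tick < FLOW_TICKS; tick++) {
        size_t now = sample(cart, tick);
        if (now > peak) peak = now;
    }
    return peak;
}

/* Byte-equivalent peak: peak_slots x 16 B/slot. */
static size_t peak_bytes(size_t peak_slots) {
    return peak_slots * BYTES_PER_SLOT;
}

/* Demo sampler with a synthetic rise-and-fall profile (not real cart data). */
static size_t demo_sample(void *cart, int tick) {
    (void)cart;
    return (size_t)(tick < 8 ? 100 + 10 * tick : 170 - 5 * (tick - 8));
}
```

Retaining the maximum rather than the final sample matters because a cart's allocation can rise mid-flow and fall back before the last tick.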

Bench output is structured for the BASELINE.md table; see kn86-emulator/bench/BASELINE.md for the canonical run.

Measurements (Apple M4 Pro / macOS / Debug, 2026-04-27). Each row is the per-cart byte-equivalent peak (peak_slots × 16 B/slot) at the listed arena size. FAIL means Fe panicked with an out-of-memory error during cart-init or the input flow.

Cart	32 KB	64 KB	128 KB	256 KB	512 KB
icebreaker	FAIL	54,640 B	54,640 B	54,640 B	54,640 B
neongrid	FAIL	59,472 B	70,816 B	70,816 B	70,816 B
depthcharge	FAIL	FAIL	126,240 B	126,240 B	126,240 B
blackledger	FAIL	FAIL	84,640 B	84,640 B	84,640 B

Worst-case observation. Depthcharge at 126,240 bytes (≈ 123 KB) is the worst peak across the matrix. At the in-source 256 KB default (262,144 bytes), that leaves 135,904 bytes (~51.8%) of arena headroom, just above the 50% safety margin the bench harness asserts as its pass criterion.
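The headroom arithmetic can be checked with a few lines of C. The function names are hypothetical, and the 50% threshold is encoded as the text describes it rather than copied from the bench source.

```c
#include <stddef.h>

/* Bytes of arena left unused at the observed peak. */
static size_t headroom_bytes(size_t arena_bytes, size_t peak_bytes) {
    return arena_bytes > peak_bytes ? arena_bytes - peak_bytes : 0;
}

/* Headroom as a fraction of the arena (0.0 to 1.0). */
static double headroom_fraction(size_t arena_bytes, size_t peak_bytes) {
    return (double)headroom_bytes(arena_bytes, peak_bytes) / (double)arena_bytes;
}

/* Pass criterion as described: at least 50% of the arena unused at peak. */
static int meets_margin(size_t arena_bytes, size_t peak_bytes) {
    return headroom_fraction(arena_bytes, peak_bytes) >= 0.5;
}
```

For depthcharge, `headroom_bytes(262144, 126240)` is 135,904 and the fraction is about 0.518, so the 256 KB arena passes. At 128 KB (131,072 bytes) the same peak leaves only 4,832 bytes, about 3.7%: the cart loads but is far below the margin.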

Floor observation. No cart loads at 32 KB. depthcharge and blackledger require ≥ 128 KB. icebreaker and neongrid load at 64 KB but with thin margin (roughly 17% and 9% headroom, respectively). Minimum viable arena across the four launch carts: 128 KB.

Stable-peak observation. Each cart’s peak is essentially flat across the 128 KB / 256 KB / 512 KB columns (icebreaker is even flat from 64 KB upward). This is Fe’s mark-sweep GC working as intended: the arena reaches a steady-state working set determined by the cart’s cell registry + handler closure captures + mid-mission live objects, not by arena size. Bigger arenas don’t push the peak higher; they just mean the GC fires less often.

New default: 256 KB (no change to in-source value). The in-source KN86_CART_FE_ARENA_SIZE already sits at 256 KB. The measurement justifies that value rather than overturning it:

Worst-case headroom. 262,144 B (256 KB) ÷ 126,240 B (depthcharge peak) ≈ 2.08×, which narrowly clears the bench’s 2× safety multiplier.
Pi Zero 2 W RAM envelope. 512 MB total, ≈ 50–150 MB consumed by Linux + the nOSh runtime + display framebuffer + audio buffers + idle daemons (verify against bring-up notes; current estimate). Per-cart 256 KB is < 0.1% of the post-baseline envelope. Arena scaling up to 1 MB is plausible; staying at 256 KB leaves headroom for Universal Deck State, the runtime REPL/nEmacs context, and future concurrent-cart features without renegotiating the arena.
Future-proofing. A 4× scale-up to 1 MB would still be < 0.5% of the post-baseline RAM. The chosen 256 KB is the smallest tested size that satisfies the safety margin; doubling or quadrupling later if a cart genuinely needs it is a one-constant change. The hard ceiling is “Pi Zero 2 W RAM minus baseline minus runtime working set” (roughly 300–400 MB), which is about three orders of magnitude (roughly 1,200–1,600×) above the current default.

Black Ledger scope decision (closes design-pack open question #2). Black Ledger’s published depth is 1 case × 1 account × 3 transactions and runs at 84,640 B peak inside the 256 KB arena (32% utilization). The cart has substantial headroom to scale up toward its original C-cart target of 4×3-8×8-50. A follow-on task should bump MAX-CASES / MAX-ACCOUNTS / MAX-TXNS in kn86-emulator/carts/blackledger.lsp toward the design target, re-run this bench, and confirm peak stays under 50% utilization. Out of scope for this amendment.

Production hardening side-effect. The bench required Fe to fail recoverably when the arena exhausts. Fe’s default fe_error() calls exit(EXIT_FAILURE), which would have torn down the bench process at the first sub-default arena. kn86-emulator/src/cartridge.c now installs a setjmp-based error handler in cartridge_load_v2_impl so a cart that panics during eval / cart-init reports the error through Row 24’s status buffer + stderr and unloads cleanly. This is also a real production hardening — a malformed cart no longer kills the emulator.
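The recovery pattern described above can be sketched with standard `setjmp`/`longjmp`. This is not the code in cartridge.c: every name below is hypothetical, the "cart" is simulated by a size check, and how the error hook is actually installed into Fe is not shown.

```c
#include <setjmp.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical loader state, for illustration only. */
static jmp_buf     g_cart_recover;
static const char *g_cart_error;

/* Error hook: record the message and unwind instead of calling exit(). */
static void on_cart_error(const char *msg) {
    g_cart_error = msg;
    longjmp(g_cart_recover, 1);
}

/* Stand-in for evaluating a cart; "panics" when the arena is too small. */
static void demo_eval_cart(size_t arena_bytes, size_t needed_bytes) {
    if (needed_bytes > arena_bytes) on_cart_error("out of memory");
}

/* Returns 0 on success, -1 if the cart panicked during load. */
static int load_cart_recoverably(size_t arena_bytes, size_t needed_bytes) {
    g_cart_error = NULL;
    if (setjmp(g_cart_recover) != 0) {
        /* Reached via longjmp: report and let the caller unload the cart. */
        fprintf(stderr, "cart load failed: %s\n", g_cart_error);
        return -1;
    }
    demo_eval_cart(arena_bytes, needed_bytes);
    return 0;
}
```

Because the loader returns an error code instead of terminating, a harness can try a too-small arena, record FAIL, and continue with the next size in the same process.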

Plumbing additions (this PR).

File	Change
`kn86-emulator/vendor/fe/fe.h` + `fe.c`	New `fe_arena_stats()` + `fe_object_size()` introspection.
`kn86-emulator/src/cartridge.h` + `cartridge.c`	New `cartridge_load_v2_with_arena()` overload + `cartridge_get_default_arena_size()` accessor. Internal refactor: `cartridge_load_v2()` thunks through a shared `cartridge_load_v2_impl()`. setjmp recovery added.
`kn86-emulator/bench/bench_fe_arena_scale.c`	New bench harness — the canonical measurement vehicle.
`kn86-emulator/CMakeLists.txt`	Wires the new bench into the `benchmarks` target.
`kn86-emulator/bench/BASELINE.md`	Adds the per-cart × per-arena measurement table.

Doc-sweep follow-up. ADR-0005 §“Tier 1: All-Carts Primitives” notes “Max string length in cartridge: limited by arena size; typically 256–1024 bytes per string.” The new 256 KB default doesn’t change that practical advice — strings are still bounded by working-set fragmentation, not raw arena size — but the cross-reference is updated in this PR for Spec Hygiene Rule 3 compliance. ADR-0010’s “16–32 KB” framing is a frozen historical artifact and is not revised; Wave 4’s commit message + this amendment supersede it.

Pi Zero 2 W validation. Pending: the bench will be re-run on the prototype at Stage 1c bring-up and BASELINE.md will gain a second platform column. Cortex-A53 @ 1 GHz is ~3–5x slower for integer/cache-friendly workloads vs Apple Silicon, but arena measurements are byte counts, not timings, so the per-cart peak rows here should transfer unchanged, provided the device build keeps the same 16 B/slot object size (to be confirmed at bring-up, since slot size may depend on the target’s pointer width). The 50% safety margin assertion is expected to hold either way.

Authority trail. Sprint 4 design pack docs/plans/sprints/2026-04-27-sprint4-gwp-233-design.md (the canonical brief). Provisional decisions on amendment target (ADR-0004, not ADR-0005), worst-case-cart treatment (Black Ledger at published depth, scale-up deferred), and new-default landing zone (256 KB) ratified by Josh in the GWP-233 task brief.

2026-06-14 — Runtime is tree-walking; bytecode/AOT path deferred (clarification)

Status effect: Accepted (unchanged). Fe remains the selected runtime. This amendment corrects framing/terminology and records a deferral; it does not re-open Options Considered or change the Decision.

What was conflated. This ADR was written as a “bytecode VM selection” spike and kept that title, but the winning option (Fe) evaluates by walking the cons-cell AST directly — it is a tree-walking interpreter with no compiler, no bytecode, and no instruction set. The ADR’s own Option B analysis already says so (“Fe is an interpreter with no JIT; evaluation is tree-walking”). The standalone KEC Lisp implementation (kec-lisp repo) confirms it: kernel/fe.c is rxi/fe 1.0, a recursive eval(); kec build is a source bundler, not a compiler.

What was never built. The “Integration Path” and “Required Adaptations” sections describe a desktop step that “compiles to Fe bytecode,” an on-device “.kn86 bytecode section,” and a modified Fe evaluator that “consumes bytecode.” None of that was implemented. Cartridges carry Lisp source (.kn86), which the device parses on load. There is no Fe bytecode format. The affected lines are struck through in the body above.

Decision recorded (2026-06-14): keep the tree-walker; defer the bytecode VM. For the current prototyping phase — a terminal-based game system on the Pi Zero 2 W — the tree-walker is the right substrate: maximal malleability (new primitive = a cfunc; new special form = one eval() switch arm), no compiler/VM/GC-rooting surface to maintain while the language and FFI are still in flux, and ample headroom on a 1 GHz A53 driving an 80×25 text grid. Hot computations are handled by the FFI escape hatch (push the one hot thing into C), not by making the whole language fast.

The deferred design is parked, not lost. A complete, implementation-ready design for an in-memory bytecode VM — plus an analysis of the analyzing-interpreter and AOT alternatives, with concrete revisit triggers — lives at kec-lisp/docs/bytecode-vm.md. Sequencing note: in-memory bytecode → AOT is an additive path (shared compiler + VM), so if bytecode is ever built, go straight to the in-memory VM; the analyzing interpreter is a detour.

Revisit triggers (data, not vibe). Reopen when on-device profiling shows the interpreter loop itself (not a cfunc-able hot spot) eating the frame budget; or cart load-time (on-device parsing) becomes a felt delay; or RAM/footprint near ship favors dropping the on-device parser (AOT); or untrusted third-party carts require opcode-level verification/metering.

Doc-sweep follow-up (Spec Hygiene Rule 3). Sweep the downstream “compiles ahead of time to Fe bytecode” / “Fe bytecode VM” phrasings → “tree-walking Fe interpreter.” The kn86-docs corpus sweep (ADR-0001/0006 amendments, runtime docs, grammars, SDK, cartridge docs, marketing/influence essays) is executed in GWP-526 alongside this amendment family. The kinoshita/CLAUDE.md “Runtime stack” note + “Fe bytecode VM” references are a separate same-repo edit (kinoshita repo, not kn86-docs). The kec-lisp repo’s own CLAUDE.md already describes Fe accurately and needs no change.

2026-06-14 — Language extracted to a standalone repo; embedding + memory facts (clarification)

Status effect: Accepted (unchanged). Fe remains the selected runtime. Companion to the tree-walking amendment above; records the 2026-06-14 language split and the embedding/memory facts that follow from it.

KEC Lisp is now a standalone language repo. The Fe runtime selected here was extracted to github.com/Kinoshita-Electronics-Consortium/kec-lisp — the Fe kernel (vendored rxi/fe), KEC Core (stdlib authored in Lisp), portable host primitives, the embedding API (kec.h), and the kec CLI. The KN-86 firmware vendors that language and registers the NoshAPI device primitives (ADR-0005) onto each Fe context via kec_bind_fe. The “VM selection” decision is unchanged; it is now satisfied by a vendored library rather than an in-tree interpreter. The in/out boundary is specified at the KEC Lisp site.

Embedding entry point (no-malloc). The runtime opens contexts via kec_open_with_arena(buf, size, profile) — a no-malloc entry point (added for the device, GWP-502) that runs Fe on a caller-supplied static/stack buffer, returns NULL cleanly if the buffer is too small to load Core, and never frees the caller’s buffer. The nOSh embedding uses this rather than the malloc-ing kec_open. One kec_State = one Fe context + one arena.

Capability tiers = profiles + binding-set. The standalone language ships two profiles — KEC_PROFILE_SANDBOX and KEC_PROFILE_FULL (FULL adds load/slurp/args/exit). The KN-86’s per-context permission tiers (cart vs mission vs system-render, ADR-0005) are these profiles plus which device primitives are bound into the context — capability is the binding-set, enforced at context creation, not by runtime checks.
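The idea that capability is the binding-set, fixed at context creation, can be sketched as follows. The tier names, the primitive table, and the tier assignments are all invented for illustration (ADR-0005 defines the real ones), and `Context` is a stand-in for a Fe context, not the kec.h type.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical tiers, expressed as bit flags. */
typedef enum { TIER_CART = 1, TIER_MISSION = 2, TIER_SYSTEM = 4 } Tier;

typedef struct { const char *name; unsigned allowed_tiers; } Primitive;

/* Invented table: which tiers may see which primitive. */
static const Primitive k_primitives[] = {
    { "text-puts",  TIER_CART | TIER_MISSION | TIER_SYSTEM },
    { "spawn-cell", TIER_MISSION | TIER_SYSTEM },
    { "fb-blit",    TIER_SYSTEM },   /* invented name */
};

enum { MAX_BOUND = 8 };
typedef struct { const char *bound[MAX_BOUND]; size_t count; } Context;

/* Bind only the primitives the tier permits; nothing is checked later. */
static void context_create(Context *ctx, Tier tier) {
    ctx->count = 0;
    for (size_t i = 0; i < sizeof k_primitives / sizeof k_primitives[0]; i++) {
        if ((k_primitives[i].allowed_tiers & (unsigned)tier) && ctx->count < MAX_BOUND)
            ctx->bound[ctx->count++] = k_primitives[i].name;
    }
}

/* A name that was never bound simply does not resolve. */
static int context_resolves(const Context *ctx, const char *name) {
    for (size_t i = 0; i < ctx->count; i++)
        if (strcmp(ctx->bound[i], name) == 0) return 1;
    return 0;
}
```

The design consequence is that a cart-tier context cannot call a system-tier primitive even by accident: the symbol is absent from its environment, so there is no permission check to get wrong at call time.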

Memory model reconciliation. Fe is mark-sweep GC over a fixed object pool carved from the arena — it reclaims dead objects but does not grow the pool, so the 256 KB per-cart default measured above (steady-state working set, not arena size) stands. The GC root stack is GCSTACKSIZE, now compile-time configurable: default 256 (sized for the device), raised to 8192 on the desktop build for recursive-code headroom. Core’s list/sequence functions are written iteratively so a long list won’t exhaust that root stack. (Top-level let binding globally is another KEC kernel delta vs upstream Fe — see the CHANGELOG.)
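A compile-time configurable root-stack size of the kind described above is commonly expressed with an `#ifndef` guard. The sketch below assumes that pattern; the `GcRootStack` type and function names are hypothetical, and the overflow behavior is simplified to a return code where a real runtime would raise an error.

```c
#include <stddef.h>

/* Compile-time override, e.g. -DGCSTACKSIZE=8192 for a desktop build. */
#ifndef GCSTACKSIZE
#define GCSTACKSIZE 256
#endif

/* Hypothetical root stack: objects listed here survive a collection. */
typedef struct {
    void  *roots[GCSTACKSIZE];
    size_t count;
} GcRootStack;

/* Returns 0 on success, -1 when the root stack is full. */
static int gc_push_root(GcRootStack *s, void *obj) {
    if (s->count == GCSTACKSIZE) return -1;
    s->roots[s->count++] = obj;
    return 0;
}

/* Restore the stack to a saved depth once temporaries are no longer needed. */
static void gc_restore(GcRootStack *s, size_t saved_count) {
    s->count = saved_count;
}
```

Deep recursion pushes roots faster than it restores them, which is why iterative list functions keep the stack depth bounded regardless of list length.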

Fe: arena-native, proven, minimal. The right tool for the constraint set.