Sprint 4 Design Pack — GWP-233

Story narrative

GWP-190 (PR #86) widened the Lisp bridge with 7 Wave B primitives and closed ADR-0005 Known Unknown #1 (lfsr-shuffle array surface). The 4th AC bullet on GWP-190 — “run scaling experiments, document the new default in ADR-0005 or a follow-up ADR” — was not addressed in the merged code. This task is the follow-up.

Important framing correction: the task body cites “current 32 KB default” — that’s stale. As of Wave 4, kn86-emulator/src/cartridge.c:59 defines KN86_CART_FE_ARENA_SIZE = 256 KB. The bump happened with icebreaker.lsp landing (per the in-source comment: “Wave 4’s first Lisp cart at ~440 lines pushes past 32 KB; symbol interning, compiled form trees, closure captures, and the cell-field sidecar alist all eat Fe object budget”). So the question isn’t “32 KB → ?” but rather “is 256 KB right? what’s the actual headroom under representative cart workloads?” The measurement is still needed; the framing just shifted.

The stakes are real: Black Ledger procedural-generation depth was scaled back to 1 case / 1 account / 3 transactions vs the original C cart’s target of 4 × 3-8 × 8-50, on the assumption that arena pressure forced the scale-back. With 256 KB now (8× the original baseline), it’s not clear whether Black Ledger can actually run at full depth, or whether some other constraint (Fe object cost per cell, GC-less arena fragmentation, FFI marshaling overhead) is the real bottleneck. An empirical pass that measures peak Fe heap and procedural-depth ceiling at multiple arena sizes answers both: “what’s the right default?” and “does Black Ledger have headroom to scale up to its design target?”

The task asks for ADR-0005 amendment. ADR-0005 is the FFI surface — it’s not really the right home for an arena-size decision. ADR-0004 (vm-selection) is the canonical home for the arena spec (it currently says “16-32 KB cart-configurable” in five places). Recommend amending ADR-0004 with the Wave 4 bump retroactively and the new measurement-driven default, plus a smaller cross-reference in ADR-0005 noting that string-length implications scale with arena. A new ADR (ADR-0023?) is overkill — this is a value tweak with measurement justification, not a new architectural decision.

Player-facing semantics with worked example

Cart authors don’t see arena size directly; they see two downstream effects:

“Out of memory” errors when a cart’s runtime state exceeds the arena. Currently presented as an Fe panic / runtime abort. With proper arena defaults, this never happens for a normal mission playthrough. Author scenario: “I add a 5th account to my Black Ledger forensic case and the cart suddenly crashes with arena exhausted — what do I do?” The answer should be “you don’t see this; the arena is sized so the published cart designs all fit comfortably, and the documentation tells you the per-cart budget so you can plan.”
String / collection size implicit ceilings (per ADR-0005 L242: “Max string length in cartridge: limited by arena size; typically 256–1024 bytes per string”). A cart author writing a long mission brief or a large procedurally-generated scoreboard hits the arena ceiling indirectly. Documented behavior should match measured behavior.

Worked example: Black Ledger at full design depth.

The original C-cart Black Ledger generated 4 cases × 3-8 accounts/case × 8-50 transactions/account. At the geometric midpoint, that’s 4 × 5 × 25 = 500 transaction objects, plus 4 × 5 = 20 account objects, plus 4 case objects, plus the ledger root. In Lisp on Fe, each “object” is a list with N fields (transaction: amount, source, dest, timestamp, tags, audit-flag — 6 fields ≈ 6 cons cells + 6 symbols). Conservatively: 500 × 12 cells × 16 bytes/cell = ~96 KB just for transactions. Add accounts (20 × 8 cells × 16 = ~2.5 KB), cases (4 × 6 × 16 = ~0.4 KB), root (~0.1 KB). Total ≈ 100 KB peak Fe heap for full-depth Black Ledger. Plus the cart’s executable bytecode (probably ~30-50 KB for a 440-line cart), symbol table, closure captures, and per-mission-instance scratch. Realistic peak ≈ 150-200 KB. That’s in the same order of magnitude as the current 256 KB arena, which means Black Ledger at full design depth is plausible but not safe — one mission-instance bug (a runaway list-build, an unintended recursion, an uncleared let binding holding a long list alive) and the arena overflows mid-mission with a player visible crash.

The measurement: bench Black Ledger at scaled depths (current 1×1×3, then 2×3×10, 3×5×25, 4×5×25, 4×8×50) at arena sizes 32 KB, 64 KB, 128 KB, 256 KB, 512 KB. Map “depth that runs without arena overflow” to “minimum arena size for that depth.” Pick the new default at “supports full design depth + 50% safety margin” (per the example, that’s ~300-400 KB; round up to 512 KB? or stay at 256 KB and constrain Black Ledger’s design target?). The decision is Josh’s; the data informs it.

Acceptance criteria expanded (≥6 testable items)

Bench harness: new kn86-emulator/tests/bench_fe_arena_scale.c (or extend an existing bench_fe_*.c if one exists) that, parameterized by arena size at fe_open(), runs the Black Ledger procedural generator at scaled depths and measures: (a) peak Fe heap usage via fe_used() or equivalent introspection; (b) does the run complete or arena-overflow; (c) wall time per run (informational). The harness scripts a matrix: arena ∈ {32, 64, 128, 256, 512} KB × depth ∈ {1×1×3, 2×3×10, 3×5×25, 4×5×25, 4×8×50}.
Same matrix run for NeonGrid and Depthcharge generators, to validate that the chosen default doesn’t over-fit Black Ledger. NeonGrid generates patrol grids (LFSR-seeded sentry patterns); Depthcharge generates sonar contact tables (similar shape). Their peak heap profiles will differ — capture them.
Pi Zero 2 W RAM headroom check: the Pi Zero 2 W has 512 MB. Linux + nOSh runtime + display framebuffer + audio buffers + idle-state daemons consume some baseline (~50-150 MB depending on configuration; verify against Pi Zero bring-up notes). The remaining envelope is the practical ceiling on per-cart arena × concurrent-cart count. Document the headroom calculation in the ADR-0004 amendment so the next person resizing the arena understands the upper bound.
New default chosen with three justifications written into the ADR-0004 amendment: (a) the measured “smallest arena that supports the full design depth of the worst-case cart” + 50% safety margin; (b) the Pi Zero 2 W RAM-envelope headroom (the new default doesn’t squeeze concurrent-cart capacity); (c) one paragraph of “future-proof room” — at what arena size do we hit a hard limit (Pi Zero 2 W RAM minus baseline) and how far away are we from it. Recommended new default landing zone: 256 KB to 1 MB depending on measurement.
docs/adr/ADR-0004-vm-selection.md amended with an Amendment Log entry (following the ADR-0006 / ADR-0005 amendment pattern) covering: the Wave 4 retroactive bump from 32 KB → 256 KB; the GWP-233 measurement run; the new chosen default; and a measurement table (arena size × cart × depth × peak heap × completed). The five “16-32 KB” mentions in ADR-0004 body get retired with strikethrough notation per the existing amendment style. No new ADR.
docs/adr/ADR-0005-ffi-surface.md cross-reference: the L242 string-length note (“typically 256–1024 bytes per string”) gets a one-line update with the new arena default reference. Spec Hygiene Rule 3 sweep — grep for any other “32 KB” / “32-KB” arena mentions across docs/ and update.
kn86-emulator/src/cartridge.c:59 updated if the new default differs from 256 KB. The in-source comment block (L43-58) is rewritten to cite ADR-0004 amendment as the new authoritative source instead of “Wave 4’s first Lisp cart at 440 lines.”
Black Ledger scope decision documented: if the measurement reveals the cart can run at full design depth (4×8×50) within the new arena, file a follow-on task to scale blackledger.lsp to its original target. If the cart can only safely hit a smaller depth (e.g., 3×5×25), document that as the official scope and remove the “scaled back” framing from the cart’s design notes.

Cross-references (cart specs that consume + ADRs that constrain)

ADR-0004 (vm-selection): canonical home for arena spec; primary amendment target.
ADR-0005 (ffi-surface): consumer of arena size for string-length budget; secondary cross-reference.
ADR-0010 (icebreaker-lisp-sketch): cited in the cartridge.c comment as the source of the 32 KB baseline; the amendment supersedes that figure.
docs/software/cartridges/modules/black-ledger.md (cart design bible): consumer of arena scope decision; update if scope-back is finalized or if scale-up is unblocked.
docs/software/cartridges/modules/neon-grid.md, docs/software/cartridges/modules/depthcharge.md: same shape; potential consumers if the measurement pass surfaces depth ceilings for them.
kn86-emulator/src/cartridge.c: the KN86_CART_FE_ARENA_SIZE define. Pi Zero 2 W bring-up notes (when they exist) for RAM headroom validation.

Edge cases (≥3)

Arena fragmentation under arena-allocator semantics. Fe is bump-allocated and resets at cart load — no GC, no real fragmentation in the traditional sense. But: long-running mission instances that build and discard intermediate lists will permanently consume arena space until the next cart load (because nothing frees mid-mission). Measurement should distinguish “peak heap mid-mission” from “peak heap at mission end” — the former is the load-bearing constraint.
Per-cart vs global arena confusion. cartridge.c:60 is the per-cart arena (g_cart_fe_arena). main.c has a separate g_fe_arena for the runtime/REPL Lisp context. The measurement is for the cart arena only; runtime arena is on a different sizing track. Make sure the bench harness only measures the cart arena.
Concurrent-cart scenario doesn’t exist yet, but capacity-plan it. Hot Swap (per ADR-0019) means at most one cart loaded at a time, and the cart arena is reset at unload — so the per-cart arena × N is not load-bearing. But Universal Deck State + nOSh runtime objects + Cipher voice memory are always-resident; the new default should leave clean headroom for those even at the upper measurement bound.
Bench harness reproducibility. LFSR-seeded procedural generation is deterministic given a seed, but Fe heap usage may vary slightly run-to-run if cell ordering changes between recompiles. Pin the bench seed and the Fe build version in the bench output so a future engineer can reproduce.
256 KB → larger size impact on Pi Zero 2 W boot time / process working set. A 1 MB cart arena doubled across two cart slots (if we ever support that) is 2 MB always-allocated even when carts are unused. Marginal on 512 MB, but worth noting in the headroom paragraph.

Engineering hand-off notes

Files owned: kn86-emulator/tests/bench_fe_arena_scale.c (new bench harness), maybe kn86-emulator/tests/CMakeLists.txt to register it. kn86-emulator/src/cartridge.c:59 (the constant) if the default changes.
Files added-to: docs/adr/ADR-0004-vm-selection.md (Amendment Log entry — substantial, ~50 lines including measurement table). docs/adr/ADR-0005-ffi-surface.md (single-line cross-reference). Black Ledger / NeonGrid / Depthcharge cart design bibles only if scope decisions change.
Files NOT touched: any ADR Decision section (only Amendment Log additions per the existing pattern). No FFI surface changes. No cart code changes (those are follow-on tasks if scope-up is unblocked).
Expected PR size: ~300 lines new bench code, ~80 lines ADR-0004 amendment, ~5 lines ADR-0005 cross-ref, ~10 lines cartridge.c constant + comment update. Single C engineer, ~1.5 days with measurement-and-write-up time honestly accounted for.
Test strategy: the bench is the measurement. No unit tests — bench output IS the validation. PR description includes the measurement table verbatim so reviewers can sanity-check the chosen default.
Dispatch shape: single C engineer, additive. Sequential within the agent’s queue — measurement before ADR write-up. Independent of every other Sprint 4 candidate. Can run in parallel with anything else.
Watch for: ADR-0004’s “16-32 KB” appears five times in body prose; the amendment must strikethrough or note-supersede each one (per the ADR-0006 / ADR-0005 amendment patterns) rather than overwriting in place. Spec Hygiene Rule 3 — every place the spec was wrong is itself a stale-reference candidate.

Open questions for Josh

Amendment target: ADR-0004 (recommended) or ADR-0005 (per task body) or new ADR-0023? Recommendation: ADR-0004 amendment. ADR-0005 is the wrong home (FFI surface, not VM config); a new ADR is overkill (this is a value tweak with empirical justification, not a new architectural decision).
Black Ledger scope decision: scale-up or freeze-at-current? Driven by measurement. If the new arena supports full design depth (4×8×50) safely, recommend filing a follow-on task to scale the cart back up. If it doesn’t, freeze the scaled-back scope and document it as official.
New-default landing zone preference. Measurement will recommend; Josh confirms. Bands: stay at 256 KB; bump to 512 KB; bump to 1 MB. Each has different headroom-vs-future-proofing tradeoffs.