Panic Recovery — nOSh Runtime Cart Isolation

Status: Initial surface, GWP-320 (2026-04-27).
Module: kn86-emulator/src/panic.[ch]
Tests: kn86-emulator/tests/test_panic_handler.c

The panic-recovery subsystem isolates cartridge faults from the host nOSh runtime. A fault inside cart code — a segfault from a wild pointer in a C-handler cart, an arena exhaustion in Fe, or an explicit (panic) call from cart Lisp — unloads the offending cart, captures a panic record, posts a CIPHER-LINE Row 4 notice, writes a log file, and returns the operator to bare-deck Terminal. The host process never dies.

Reference: ADR-0004 (Fe arena discipline), ADR-0006 (cart format), ADR-0015 (CIPHER-LINE), CLAUDE.md Canonical Hardware Specification.

| # | Source | Detection | Trigger path |
|---|--------|-----------|--------------|
| 1 | SIGSEGV / SIGFPE / SIGBUS | sigaction with SA_SIGINFO + SA_NODEFER | signal handler -> siglongjmp |
| 2 | Fe runtime error | fe_error() invoked anywhere in cart eval | panic_fe_error_trap -> panic_trigger |
| 3 | Cart self-halt | Cart Lisp calls (panic "reason") | lisp_panic_prim -> fe_error -> trap |

Fe arena exhaustion reduces to source #2: when the arena is full, Fe’s allocator calls fe_error("out of memory"), which the trap catches.

The signal handler is restricted to the POSIX async-signal-safe API. It does no allocation, no printf, no library calls except raise(), sigaction(), and siglongjmp(). The “reason” string for a signal is a static literal copied byte-by-byte into a preallocated buffer (g_panic_reason_buf) using a hand-rolled loop. Everything else — log IO, OLED writes, cart unload — runs after siglongjmp returns control to ordinary code.

When no panic guard is armed, the signal handler reinstalls SIG_DFL and re-raises the signal. This preserves debugger behavior and core dumps for genuine host bugs. The guard is only armed inside protected cart-invocation regions.
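The handler behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the contents of panic.c: every identifier (g_env, g_armed, g_reason, on_fault, run_guarded, and so on) is a hypothetical stand-in, and the fallback reason string for non-SIGSEGV signals is invented for the example. The SIG_DFL action is built ahead of time so the handler itself only calls sigaction(), raise(), and siglongjmp().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; the real panic.c symbols may differ. */
static sigjmp_buf g_env;                    /* jump target set by the guard */
static volatile sig_atomic_t g_armed = 0;   /* non-zero inside a protected region */
static char g_reason[64];                   /* preallocated reason buffer */
static struct sigaction g_default_action;   /* SIG_DFL, prepared outside the handler */

/* Hand-rolled byte copy so the handler needs no string library calls. */
static void copy_reason(const char *src) {
    size_t i = 0;
    while (i + 1 < sizeof g_reason && src[i] != '\0') {
        g_reason[i] = src[i];
        i++;
    }
    g_reason[i] = '\0';
}

static void on_fault(int signo, siginfo_t *info, void *uctx) {
    (void)info;
    (void)uctx;
    if (!g_armed) {
        /* Host bug: restore the default action and re-raise so a debugger
         * or core dump sees the original signal. SA_NODEFER lets the
         * re-raised signal be delivered immediately. */
        sigaction(signo, &g_default_action, NULL);
        raise(signo);
        return;
    }
    g_armed = 0;
    copy_reason(signo == SIGSEGV ? "SIGSEGV in cart code (memory fault)"
                                 : "signal in cart code");
    siglongjmp(g_env, 1);
}

static int install_fault_handlers(void) {
    static const int signals[] = { SIGSEGV, SIGFPE, SIGBUS };
    struct sigaction sa;
    size_t i;

    memset(&g_default_action, 0, sizeof g_default_action);
    g_default_action.sa_handler = SIG_DFL;
    sigemptyset(&g_default_action.sa_mask);

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO | SA_NODEFER;
    sigemptyset(&sa.sa_mask);

    for (i = 0; i < sizeof signals / sizeof signals[0]; i++) {
        if (sigaction(signals[i], &sa, NULL) != 0) return -1;
    }
    return 0;
}

/* Returns 0 if the region completed, 1 if a fault unwound it. */
static int run_guarded(int signo_to_raise) {
    assert(!g_armed);                /* one protected scope at a time */
    if (sigsetjmp(g_env, 1) == 0) {
        g_armed = 1;
        raise(signo_to_raise);       /* stands in for a faulting cart call */
        g_armed = 0;
        return 0;
    }
    return 1;
}
```

In this sketch, calling install_fault_handlers() once and then run_guarded(SIGSEGV) returns 1 with g_reason holding the SIGSEGV literal, while a signal raised with the guard disarmed falls through to the default action.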

#include "panic.h"

PanicRecord rec;
if (panic_guard_enter("phase-tick", &rec) == 0) {
    /* Protected region: any signal, fe_error, or (panic) in here
     * unwinds back to this call site, where panic_guard_enter
     * returns a second time with a non-zero value. */
    invoke_cart_lisp_handler(...);
    panic_guard_leave();
} else {
    /* Recovery: rec is populated, the cart is already unloaded, the
     * log is written, and OLED Row 4 is set. Return to bare-deck. */
    log_recovery(rec);
}

The guard is single-shot and not nestable. nOSh runs one event-loop thread; only one cart-invocation scope is protected at a time.
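One way such a single-shot, non-nestable guard could be structured is sketched below. All names (g_guard, GUARD_ENTER, guard_try_arm, guard_trigger) are hypothetical, and returning -1 for a rejected nested entry is an assumption; the source does not say what the real guard does in that case. The sigsetjmp call sits in a macro so that it executes in the caller's stack frame, which must still be live when the jump lands.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical guard state; not the real panic.c layout. */
static struct {
    sigjmp_buf  env;
    int         armed;
    const char *label;
    const char *reason;
} g_guard;

/* Arms the guard, or fails with -1 if one is already active. */
static int guard_try_arm(const char *label) {
    if (g_guard.armed) return -1;
    g_guard.armed = 1;
    g_guard.label = label;
    g_guard.reason = NULL;
    return 0;
}

/* Runs on the second return from sigsetjmp: disarm, report the panic. */
static int guard_recover(void) { g_guard.armed = 0; return 1; }

static void guard_leave(void) {
    assert(g_guard.armed);           /* leave must pair with a live enter */
    g_guard.armed = 0;
}

/* Unwinds to the active guard, or returns -1 if none is armed. */
static int guard_trigger(const char *reason) {
    if (g_guard.armed) {
        g_guard.reason = reason;
        siglongjmp(g_guard.env, 1);  /* does not return */
    }
    return -1;
}

/* 0 = armed, 1 = recovered from a panic, -1 = nested entry rejected. */
#define GUARD_ENTER(label)                                   \
    (guard_try_arm(label) != 0                               \
         ? -1                                                \
         : (sigsetjmp(g_guard.env, 1) == 0 ? 0 : guard_recover()))

static int demo_fault(void) {
    if (GUARD_ENTER("phase-tick") == 0) {
        guard_trigger("cart fault");  /* stands in for a failing cart call */
        guard_leave();
        return 0;
    }
    return 1;
}

static int demo_nested(void) {
    int inner;
    if (GUARD_ENTER("phase-tick") != 0) return 1;
    inner = GUARD_ENTER("inner");     /* rejected: a guard is already armed */
    guard_leave();
    return inner;
}
```

Because the guard disarms itself on recovery, a second trigger without a fresh enter has no jump target; here guard_trigger simply reports -1 in that situation.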

The runtime calls (in main.c):

panic_init(true); /* installs SIGSEGV/SIGFPE/SIGBUS */
panic_bind_state(&g_state);
panic_set_unload_hook(cartridge_unload_module);
panic_set_oled_hook(nosh_oled_set_row);

Per-cart, the cart loader installs panic_fe_error_trap on the cart’s fe_Context:

fe_handlers(cart_ctx)->error = panic_fe_error_trap;

The Fe-error trap routes through panic_trigger, so cart-load failures (the existing cart_load_error_handler path in cartridge.c) and runtime cart faults share one recovery flow.
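The funnelling described here can be illustrated without the interpreter itself. The sketch below uses a mock context (mock_ctx, mock_raise_error) in place of fe_Context and fe_error, and hypothetical names for the trigger and trap; it only shows the shape of the control flow. The trap must not return to its caller, since upstream fe terminates the process when an error handler returns, so it unwinds to the guard instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <string.h>

typedef enum { SRC_NONE, SRC_SIGNAL, SRC_FE_ERROR } panic_source;

/* Hypothetical shared state for the single recovery flow. */
static sigjmp_buf   g_env;
static panic_source g_source = SRC_NONE;
static char         g_reason[64];

/* Single funnel: record source and reason, then unwind to the guard. */
static void trigger(panic_source src, const char *reason) {
    assert(reason != NULL);
    g_source = src;
    strncpy(g_reason, reason, sizeof g_reason - 1);
    g_reason[sizeof g_reason - 1] = '\0';
    siglongjmp(g_env, 1);
}

/* Mock interpreter: an error callback that receives context and message. */
typedef struct mock_ctx mock_ctx;
typedef void (*mock_error_fn)(mock_ctx *ctx, const char *msg);
struct mock_ctx { mock_error_fn error; };

static void mock_raise_error(mock_ctx *ctx, const char *msg) {
    if (ctx->error) ctx->error(ctx, msg);
}

/* The trap: every interpreter error becomes a panic. */
static void error_trap(mock_ctx *ctx, const char *msg) {
    (void)ctx;
    trigger(SRC_FE_ERROR, msg);      /* never returns to the interpreter */
}

/* Returns 0 on a clean eval, 1 if an error unwound it. */
static int run_cart_eval(mock_ctx *ctx, const char *failure) {
    if (sigsetjmp(g_env, 1) == 0) {
        if (failure) mock_raise_error(ctx, failure);
        return 0;
    }
    return 1;
}
```

With the trap installed on the mock context, an "out of memory" error and an explicit cart halt would both arrive at the same trigger, which is the property the real design relies on.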

  1. Signal handler / Fe trap / (panic) calls panic_trigger.
  2. panic_trigger records source + reason in async-safe buffers, then siglongjmps to the active guard.
  3. The guard’s panic_guard_enter_impl sees a non-zero return code, disarms the guard, and assembles a PanicRecord with cart_id, phase index, phase label, source, reason, and timestamp.
  4. The unload hook is invoked (production: cartridge_unload_module).
  5. The log is written to <log_dir>/<timestamp_ms>.log.
  6. The OLED hook is invoked with (row=3, "CART HALTED <SOURCE>").
  7. The macro returns 1 to the caller, which can resume bare-deck.
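Steps 4 through 7 can be sketched as a small function that runs once control is back in ordinary code. The record layout, the hook signatures, and the names finish_recovery, demo_unload, and demo_oled are assumptions made for illustration; the real PanicRecord and hook types live in panic.h and may differ. The log write (step 5) is left out to keep the example short.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record shape, based on the fields named in step 3. */
typedef struct {
    uint32_t    cart_id;
    int         phase_index;
    const char *phase_label;
    const char *source;          /* e.g. "SIGSEGV" */
    const char *reason;
    uint64_t    timestamp_ms;
} RecordSketch;

/* Assumed hook signatures. */
typedef void (*unload_hook_fn)(void);
typedef void (*oled_hook_fn)(int row, const char *text);

static unload_hook_fn g_unload_hook;
static oled_hook_fn   g_oled_hook;

static int finish_recovery(const RecordSketch *rec) {
    char notice[32];
    assert(rec != NULL && rec->source != NULL);

    if (g_unload_hook) g_unload_hook();                 /* step 4 */
    /* step 5 (log write) omitted in this sketch */
    snprintf(notice, sizeof notice, "CART HALTED %s", rec->source);
    if (g_oled_hook) g_oled_hook(3, notice);            /* step 6 */
    return 1;                                           /* step 7 */
}

/* Recording hooks so the call order can be observed. */
static int  g_seq, g_unload_seq, g_oled_seq, g_oled_row;
static char g_oled_text[32];

static void demo_unload(void) { g_unload_seq = ++g_seq; }

static void demo_oled(int row, const char *text) {
    g_oled_seq = ++g_seq;
    g_oled_row = row;
    snprintf(g_oled_text, sizeof g_oled_text, "%s", text);
}
```

Keeping the hooks as function pointers is what lets a test substitute recording stubs for the production unload and OLED functions.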

Example log file:

kn86-panic v1
timestamp_ms=1714209033412
source=SIGSEGV
cart_id=0x1B72FE94
phase_index=2
phase_label=phase-tick
reason=SIGSEGV in cart code (memory fault)
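
A formatter that produces the layout shown above might look like the following. It is a sketch under assumptions: the struct and function names are hypothetical, the trailing newline and the zero-padded eight-digit cart_id are inferred from the one sample, and rejecting values that contain newlines is a choice made here to keep the key=value lines intact.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record shape; the real PanicRecord may differ. */
typedef struct {
    uint32_t    cart_id;
    int         phase_index;
    const char *phase_label;
    const char *source;
    const char *reason;
    uint64_t    timestamp_ms;
} LogRecordSketch;

/* Writes the log text into dst. Returns the length written, or -1 if
 * the buffer is too small or a field would break the line format. */
static int format_panic_log(char *dst, size_t cap, const LogRecordSketch *rec) {
    int n;
    assert(dst != NULL && rec != NULL);

    if (strchr(rec->phase_label, '\n') || strchr(rec->reason, '\n')) {
        return -1;
    }
    n = snprintf(dst, cap,
                 "kn86-panic v1\n"
                 "timestamp_ms=%" PRIu64 "\n"
                 "source=%s\n"
                 "cart_id=0x%08" PRIX32 "\n"
                 "phase_index=%d\n"
                 "phase_label=%s\n"
                 "reason=%s\n",
                 rec->timestamp_ms, rec->source, rec->cart_id,
                 rec->phase_index, rec->phase_label, rec->reason);
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}
```

Formatting into a caller-supplied buffer keeps this step free of allocation, which suits code that runs right after a fault.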

Default log directory is $HOME/.kn86/panics. Override with panic_set_log_dir(). The directory is created lazily via a mkdir -p walk.
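
A lazy "mkdir -p walk" plus the <log_dir>/<timestamp_ms>.log path construction could be sketched like this. The function names mkdir_p and build_log_path, the 512-byte path limit, and the final directory check are illustrative assumptions, not the module's actual code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Creates path and any missing parents. Returns 0 on success, -1 on error. */
static int mkdir_p(const char *path, mode_t mode) {
    char buf[512];
    struct stat st;
    size_t len;
    char *p;

    assert(path != NULL);
    len = strlen(path);
    if (len == 0 || len >= sizeof buf) return -1;
    memcpy(buf, path, len + 1);

    /* Cut the path at each '/', create that prefix, then restore it. */
    for (p = buf + 1; *p != '\0'; p++) {
        if (*p != '/') continue;
        *p = '\0';
        if (mkdir(buf, mode) != 0 && errno != EEXIST) return -1;
        *p = '/';
    }
    if (mkdir(buf, mode) != 0 && errno != EEXIST) return -1;

    /* EEXIST also fires for a regular file, so confirm it is a directory. */
    return (stat(buf, &st) == 0 && S_ISDIR(st.st_mode)) ? 0 : -1;
}

/* Builds "<log_dir>/<timestamp_ms>.log". Returns 0, or -1 on truncation. */
static int build_log_path(char *dst, size_t cap, const char *log_dir,
                          uint64_t timestamp_ms) {
    int n = snprintf(dst, cap, "%s/%" PRIu64 ".log", log_dir, timestamp_ms);
    return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}
```

Calling mkdir_p only when a panic is about to be logged means a deck that never panics never creates the directory.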

CIPHER voice is OLED-exclusive (CLAUDE.md Spec Hygiene Rule 6, ADR-0015). Row 4 (OLED_ROW_CONTEXTUAL) is the contextual slot — status / timer / mission meta. A cart-halt notice is exactly that kind of transient contextual content, so the panic notice writes there:

CART HALTED SIGSEGV

The notice is plain ASCII so it does not enter the CIPHER glyph domain. The cipher style guide (PR #178) governs CIPHER voice content; “CART HALTED” is firmware status, not CIPHER speech.

(panic "self-test invariant failed")

Halts the cart cleanly. The call routes through fe_error, so it fires the same recovery flow as arena exhaustion. The message argument is converted to a string via fe_tostring and stored in the panic record’s reason field.

This is the first panic surface. Out of scope for v0.1:

  • Crash analytics aggregation / device telemetry.
  • Persistent panic-log rotation. Today every panic emits a fresh file; a long-running deck eventually accumulates them.
  • Panic-on-stuck-cart (watchdog timer). A cart that runs forever inside a single Lisp form bypasses the panic surface entirely; the existing fe_set_instr_budget (GWP-248) covers some of this in scripted-mission contexts but not in cell handlers.
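  • Stack-overflow detection. A SIGSEGV caused by stack overflow leaves the handler no usable stack to run on unless an alternate signal stack is configured; today we treat all SIGSEGVs the same. Installing an alternate stack with sigaltstack() and SA_ONSTACK could harden this.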
  • Multi-cart isolation. We support repeated panics across cart cycles, but only one cart is loaded at a time. When/if multiple carts coexist (Sysop-mode link cable, ADR-0023 fallout, etc.) the guard model needs revisiting.

These can land as separate work items once the bring-up surface is stable.