Panic Recovery — nOSh Runtime Cart Isolation
Status: Initial surface, GWP-320 (2026-04-27).
Module: kn86-emulator/src/panic.[ch]
Tests: kn86-emulator/tests/test_panic_handler.c
The panic-recovery subsystem isolates cartridge faults from the host
nOSh runtime. A fault inside cart code — a segfault from a wild
pointer in a C-handler cart, an arena exhaustion in Fe, or an
explicit (panic) call from cart Lisp — unloads the offending cart,
captures a panic record, posts a CIPHER-LINE Row 4 notice, writes a
log file, and returns the operator to bare-deck Terminal. The host
process never dies.
Reference: ADR-0004 (Fe arena discipline), ADR-0006 (cart format), ADR-0015 (CIPHER-LINE), CLAUDE.md Canonical Hardware Specification.
Three Fault Sources
| Source | Detection | Trigger path |
|---|---|---|
| SIGSEGV / SIGFPE / SIGBUS | sigaction with SA_SIGINFO + SA_NODEFER | signal handler -> siglongjmp |
| Fe runtime error | fe_error() invoked anywhere in cart eval | panic_fe_error_trap -> panic_trigger |
| Cart self-halt | Cart Lisp calls (panic "reason") | lisp_panic_prim -> fe_error -> trap |
Fe arena exhaustion reduces to source #2: when the arena is full,
Fe’s allocator calls fe_error("out of memory"), which the trap
catches.
Async-Signal-Safety
The signal handler is restricted to the POSIX async-signal-safe API.
It does no allocation, no printf, no library calls except
raise(), sigaction(), and siglongjmp(). The “reason” string for
a signal is a static literal copied byte-by-byte into a
preallocated buffer (g_panic_reason_buf) using a hand-rolled
loop. Everything else — log IO, OLED writes, cart unload — runs
after siglongjmp returns control to ordinary code.
When no panic guard is armed, the signal handler reinstalls
SIG_DFL and re-raises the signal. This preserves debugger
behavior and core dumps for genuine host bugs. The guard is only
armed inside protected cart-invocation regions.
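
The handler rules above can be sketched in a few dozen lines. This is a minimal illustration, not the code in panic.c: every `demo_` name is hypothetical, the SIGFPE and SIGBUS reason strings are placeholder wording, and `raise()` stands in for a real cart fault. It also calls `sigemptyset()`, which is async-signal-safe but is one call more than the list above.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>

static sigjmp_buf g_demo_env;               /* jump target set by the guard */
static volatile sig_atomic_t g_demo_armed;  /* 1 only inside a protected region */
static char g_demo_reason_buf[64];          /* stand-in for g_panic_reason_buf */

/* Byte-by-byte copy so the handler never calls libc string code. */
static void demo_copy_reason(const char *src) {
    size_t i = 0;
    while (src[i] != '\0' && i + 1 < sizeof g_demo_reason_buf) {
        g_demo_reason_buf[i] = src[i];
        i++;
    }
    g_demo_reason_buf[i] = '\0';
}

/* Static literals only; all but the SIGSEGV text are hypothetical wording. */
static const char *demo_reason_for(int signo) {
    switch (signo) {
    case SIGSEGV: return "SIGSEGV in cart code (memory fault)";
    case SIGFPE:  return "SIGFPE in cart code";
    case SIGBUS:  return "SIGBUS in cart code";
    default:      return "signal in cart code";
    }
}

static void demo_on_fault(int signo, siginfo_t *info, void *uctx) {
    (void)info;
    (void)uctx;
    if (!g_demo_armed) {
        /* No guard: restore the default disposition and re-raise so the
         * debugger or core dump sees the original signal. */
        struct sigaction dfl;
        dfl.sa_handler = SIG_DFL;
        dfl.sa_flags = 0;
        sigemptyset(&dfl.sa_mask);
        sigaction(signo, &dfl, NULL);
        raise(signo);
        return;
    }
    demo_copy_reason(demo_reason_for(signo));
    g_demo_armed = 0;
    siglongjmp(g_demo_env, 1);  /* all heavy work happens after the jump */
}

int demo_install_handlers(void) {
    static const int signals[] = { SIGSEGV, SIGFPE, SIGBUS };
    struct sigaction sa;
    sa.sa_sigaction = demo_on_fault;
    sa.sa_flags = SA_SIGINFO | SA_NODEFER;
    sigemptyset(&sa.sa_mask);
    for (size_t i = 0; i < sizeof signals / sizeof signals[0]; i++) {
        if (sigaction(signals[i], &sa, NULL) != 0) return -1;
    }
    return 0;
}

/* Returns 1 if the simulated fault was caught and unwound, 0 if the
 * region completed normally, -1 if the handlers could not be installed. */
int demo_fault_recovery(int signo) {
    if (demo_install_handlers() != 0) return -1;
    if (sigsetjmp(g_demo_env, 1) == 0) {
        g_demo_armed = 1;
        raise(signo);       /* stands in for a faulting cart handler */
        g_demo_armed = 0;
        return 0;
    }
    return 1;
}
```

Passing 1 as the second argument to `sigsetjmp` saves the signal mask, so `siglongjmp` restores it when unwinding out of the handler.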
Guard API
```c
#include "panic.h"

PanicRecord rec;
if (panic_guard_enter("phase-tick", &rec) == 0) {
    /* protected region: any signal / fe_error / (panic) here unwinds
     * back to this call site with a non-zero return on the second
     * call. */
    invoke_cart_lisp_handler(...);
    panic_guard_leave();
} else {
    /* recovery: rec is populated, cart already unloaded, log
     * written, OLED row 4 set. Return to bare-deck. */
    log_recovery(rec);
}
```

The guard is single-shot and not nestable. nOSh runs one event-loop thread; only one cart-invocation scope is protected at a time.
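
The single-shot, non-nestable behavior can be illustrated with a small stand-alone sketch. It is not the real panic.h: `DemoRecord`, `demo_trigger`, and `demo_run_guarded` are hypothetical names, and the sketch wraps the protected call in one function (rather than an enter/leave pair) so that `sigsetjmp` runs in a frame that is still live when the jump arrives. `demo_trigger` models the non-signal sources (Fe error trap, cart `(panic)`).

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for PanicRecord; the real fields may differ. */
typedef struct {
    const char *phase_label;
    char reason[64];
} DemoRecord;

static sigjmp_buf g_demo_env;
static int g_demo_armed = 0;
static char g_demo_reason[64];

/* Models panic_trigger for non-signal sources: record, then unwind. */
void demo_trigger(const char *reason) {
    if (!g_demo_armed) return;  /* sketch only: ignored when unguarded */
    strncpy(g_demo_reason, reason, sizeof g_demo_reason - 1);
    g_demo_reason[sizeof g_demo_reason - 1] = '\0';
    siglongjmp(g_demo_env, 1);
}

/* Returns 0 on clean completion, 1 after an unwind (rec is filled in),
 * -1 if a guard is already armed (nesting is refused). */
int demo_run_guarded(const char *label, void (*handler)(void),
                     DemoRecord *rec) {
    if (g_demo_armed) return -1;
    g_demo_armed = 1;
    if (sigsetjmp(g_demo_env, 1) == 0) {
        handler();              /* protected region */
        g_demo_armed = 0;       /* equivalent of panic_guard_leave() */
        return 0;
    }
    /* Second return from sigsetjmp: disarm and assemble the record. */
    g_demo_armed = 0;
    rec->phase_label = label;
    memcpy(rec->reason, g_demo_reason, sizeof rec->reason);
    return 1;
}

/* Example handlers used to exercise the guard. */
static int g_demo_nested_result = 0;

static void demo_ok_handler(void) { /* completes normally */ }

static void demo_bad_handler(void) {
    demo_trigger("self-test invariant failed");
}

static void demo_nesting_handler(void) {
    DemoRecord inner;
    g_demo_nested_result = demo_run_guarded("inner", demo_ok_handler, &inner);
}
```

Running `demo_bad_handler` under the guard returns 1 with the reason copied into the record, while `demo_nesting_handler` shows the inner guard being refused with -1.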
Boot Wiring
The runtime calls (in main.c):

```c
panic_init(true); /* installs SIGSEGV/SIGFPE/SIGBUS */
panic_bind_state(&g_state);
panic_set_unload_hook(cartridge_unload_module);
panic_set_oled_hook(nosh_oled_set_row);
```

Per-cart, the cart loader installs panic_fe_error_trap on the
cart’s fe_Context:

```c
fe_handlers(cart_ctx)->error = panic_fe_error_trap;
```

The Fe-error trap routes through panic_trigger, so cart-load
failures (the existing cart_load_error_handler path in
cartridge.c) and runtime cart faults share one recovery flow.
Recovery Sequence
- Signal handler / Fe trap / (panic) calls panic_trigger. panic_trigger records source + reason in async-safe buffers, then siglongjmps to the active guard.
- The guard’s panic_guard_enter_impl sees a non-zero return code, disarms the guard, and assembles a PanicRecord with cart_id, phase index, phase label, source, reason, and timestamp.
- The unload hook is invoked (production: cartridge_unload_module).
- The log is written to <log_dir>/<timestamp_ms>.log.
- The OLED hook is invoked with (row=3, "CART HALTED <SOURCE>").
- The macro returns 1 to the caller, which can resume bare-deck.
Log Format
```
kn86-panic v1
timestamp_ms=1714209033412
source=SIGSEGV
cart_id=0x1B72FE94
phase_index=2
phase_label=phase-tick
reason=SIGSEGV in cart code (memory fault)
```

Default log directory is $HOME/.kn86/panics. Override with
panic_set_log_dir(). The directory is created lazily via a
mkdir -p walk.
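
A minimal sketch of the lazy directory walk and the log layout shown above follows. The function names, the 512-byte path limit, and the 0755 mode are assumptions for illustration; the real writer in panic.c may differ.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Create every component of path, like `mkdir -p`. Returns 0 on success. */
int demo_mkdir_p(const char *path) {
    char buf[512];                       /* assumed maximum path length */
    size_t len = strlen(path);
    if (len == 0 || len >= sizeof buf) return -1;
    memcpy(buf, path, len + 1);
    for (char *p = buf + 1; *p != '\0'; p++) {
        if (*p != '/') continue;
        *p = '\0';                       /* temporarily cut at this component */
        if (mkdir(buf, 0755) != 0 && errno != EEXIST) return -1;
        *p = '/';
    }
    if (mkdir(buf, 0755) != 0 && errno != EEXIST) return -1;
    return 0;
}

/* Write one panic record as <log_dir>/<timestamp_ms>.log.
 * The resulting path is returned in out_path. Returns 0 on success. */
int demo_write_panic_log(const char *log_dir, long long timestamp_ms,
                         const char *source, uint32_t cart_id,
                         int phase_index, const char *phase_label,
                         const char *reason,
                         char *out_path, size_t out_size) {
    if (demo_mkdir_p(log_dir) != 0) return -1;
    int n = snprintf(out_path, out_size, "%s/%lld.log", log_dir, timestamp_ms);
    if (n < 0 || (size_t)n >= out_size) return -1;

    FILE *f = fopen(out_path, "w");
    if (f == NULL) return -1;
    int rc = fprintf(f,
                     "kn86-panic v1\n"
                     "timestamp_ms=%lld\n"
                     "source=%s\n"
                     "cart_id=0x%08" PRIX32 "\n"
                     "phase_index=%d\n"
                     "phase_label=%s\n"
                     "reason=%s\n",
                     timestamp_ms, source, cart_id,
                     phase_index, phase_label, reason);
    if (fclose(f) != 0 || rc < 0) return -1;
    return 0;
}
```

Because the file name is the millisecond timestamp, two panics in the same millisecond would overwrite each other in this sketch; whether the real writer guards against that is not stated here.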
CIPHER-LINE Notice
CIPHER voice is OLED-exclusive (CLAUDE.md Spec Hygiene Rule 6,
ADR-0015). Row 4 (OLED_ROW_CONTEXTUAL) is the contextual slot —
status / timer / mission meta. A cart-halt notice is exactly that
kind of transient contextual content, so the panic notice writes
there:
```
CART HALTED SIGSEGV
```

The notice is plain ASCII so it does not enter the CIPHER glyph domain. The cipher style guide (PR #178) governs CIPHER voice content; “CART HALTED” is firmware status, not CIPHER speech.
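
As a rough sketch, the notice could be built and posted through a hook like this. The hook signature, the `demo_` names, and the 32-character row width are assumptions; only the row index 3 and the "CART HALTED <SOURCE>" text come from this page.

```c
#include <stdio.h>
#include <string.h>

#define DEMO_OLED_ROW_CONTEXTUAL 3   /* Row 4, zero-based */
#define DEMO_OLED_ROW_CHARS 32       /* assumed width, not from the spec */

typedef void (*demo_oled_hook_fn)(int row, const char *text);
static demo_oled_hook_fn g_demo_oled_hook = NULL;

void demo_set_oled_hook(demo_oled_hook_fn fn) { g_demo_oled_hook = fn; }

/* Build "CART HALTED <SOURCE>", reject anything outside printable ASCII
 * or longer than one row, then post it. Returns 0 on success, -1 if the
 * notice was rejected. */
int demo_post_halt_notice(const char *source) {
    char text[DEMO_OLED_ROW_CHARS + 1];
    int n = snprintf(text, sizeof text, "CART HALTED %s", source);
    if (n < 0 || (size_t)n >= sizeof text) return -1;
    for (int i = 0; i < n; i++) {
        if (text[i] < 0x20 || text[i] > 0x7E) return -1;
    }
    if (g_demo_oled_hook != NULL) {
        g_demo_oled_hook(DEMO_OLED_ROW_CONTEXTUAL, text);
    }
    return 0;
}

/* Capture hook for trying the sketch without real OLED hardware. */
static int g_demo_last_row = -1;
static char g_demo_last_text[DEMO_OLED_ROW_CHARS + 1];

static void demo_capture_hook(int row, const char *text) {
    g_demo_last_row = row;
    strncpy(g_demo_last_text, text, sizeof g_demo_last_text - 1);
    g_demo_last_text[sizeof g_demo_last_text - 1] = '\0';
}
```

Routing the write through a settable hook keeps the notice logic testable on a host build, which matches the hook-based wiring used at boot.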
NoshAPI: (panic msg)
```lisp
(panic "self-test invariant failed")
```

Halts the cart cleanly. Routes through fe_error, so the same
recovery flow as arena exhaustion fires. msg is converted to a
string via fe_tostring and stored in the panic record’s
reason field.
Limitations / Followups
This is the first panic surface. Out of scope for v0.1:
- Crash analytics aggregation / device telemetry.
- Persistent panic-log rotation. Today every panic emits a fresh file; a long-running deck eventually accumulates them.
- Panic-on-stuck-cart (watchdog timer). A cart that runs forever inside a single Lisp form bypasses the panic surface entirely; the existing fe_set_instr_budget (GWP-248) covers some of this in scripted-mission contexts but not in cell handlers.
- Stack-overflow detection. SIGSEGV from stack overflow may land in an alternate stack region; today we treat all SIGSEGVs the same. A sigaltstack config could harden this.
- Multi-cart isolation. We support repeated panics across cart cycles, but only one cart is loaded at a time. When/if multiple carts coexist (Sysop-mode link cable, ADR-0023 fallout, etc.) the guard model needs revisiting.
These can land as separate work items once the bring-up surface is stable.
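
For the stack-overflow followup, one possible shape of the sigaltstack hardening is sketched below. Nothing here exists in the codebase yet: the `demo_` names and the 64 KiB stack size are assumptions, and the handler only restores the default action rather than unwinding to a guard. The point is the `SA_ONSTACK` flag, which lets the SIGSEGV handler run on a dedicated stack when the faulting thread's own stack is exhausted.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <stddef.h>

#define DEMO_ALTSTACK_BYTES (64 * 1024)  /* assumed size; tune per target */
static unsigned char g_demo_altstack[DEMO_ALTSTACK_BYTES];

/* Placeholder handler: a real one would record the fault and siglongjmp
 * to the armed guard. Here it just falls back to the default action. */
static void demo_on_segv(int signo, siginfo_t *info, void *uctx) {
    (void)info;
    (void)uctx;
    signal(signo, SIG_DFL);
    raise(signo);
}

/* Register the alternate stack, then install a SIGSEGV handler that
 * runs on it. Returns 0 on success, -1 on failure. */
int demo_install_altstack(void) {
    stack_t ss;
    ss.ss_sp = g_demo_altstack;
    ss.ss_size = sizeof g_demo_altstack;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) != 0) return -1;

    struct sigaction sa;
    sa.sa_sigaction = demo_on_segv;
    sa.sa_flags = SA_SIGINFO | SA_NODEFER | SA_ONSTACK;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}

/* Returns 1 if our buffer is the currently registered alternate stack. */
int demo_altstack_active(void) {
    stack_t cur;
    if (sigaltstack(NULL, &cur) != 0) return 0;
    return cur.ss_sp == (void *)g_demo_altstack
        && (cur.ss_flags & SS_DISABLE) == 0;
}
```

The alternate stack is per-thread, so this sketch relies on the single event-loop thread described in the Guard API section.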