ADR-0045: Background-system substrate — scheduler, deferred ticks, and power-off catch-up
Context
nOSh subsystems can already talk to each other: `runtime/src/nosh_event_bus.{h,c}` is a live, in-process pub/sub bus with 64 static subscription slots, fixed 32-byte payloads, no malloc, single-threaded fan-out in registration order, and error-isolated handlers. Every `deck_set_*()` UDS mutation publishes; CIPHER and the board generator subscribe. The “how do parts notify each other” problem is solved.
What does not exist is a way for a subsystem to run in the background — to advance on a clock regardless of which screen is active, and to survive the device powering off. Concretely:
- The tick order is hardcoded. Every per-frame updater (CIPHER, attract, REPL toast, tutorial wiring) is hand-wired, in a fixed sequence, inside the host’s `frame_step()` (`hosts/emulator/src/main.c`). There is no registry; a new nOSh background system cannot exist without editing the host loop.
- There is no deferred firing. Nothing can say “do X once at T+30s” or “advance Y every N seconds.” Cadence is implicitly 60 fps (whatever the frame loop runs at) or nothing.
- There is no power-off catch-up. State that should evolve while the deck is in a bag (faction heat, world drift, anything time-based) has nowhere to live and no way to “advance by the elapsed gap” when the operator powers back on.
The triggering need is gameplay that feels alive without the operator watching it — a process that advances while you are playing the deck and while the deck is off, then surfaces through the bus. The motivating example was a background “pursuit” that closes in on you across runs; the decision here is deliberately the general substrate, not that one system. Any nOSh metagame is a client of it.
Forcing functions
- We want background activity now, and every ad-hoc approach (bolt another tick into `frame_step`, hand-roll per-system catch-up) compounds host-loop coupling and gets re-paid per system.
- The device is offline-first: it is not intended to be regularly internet-connected (operator decision, 2026-06-22). Without NTP, the OS cannot recover wall-clock time after power-off, and monotonic time (`SDL_GetTicks`, `CLOCK_MONOTONIC`) resets to zero every boot, so “how long was I off?” is unanswerable in software alone. This forces a hardware timekeeping decision concurrently with the software one.
- A prior policy assumed the opposite. GWP-349 (“OS settings — time/clock policy”, in Review, PR #217) accepted the no-RTC reality and leaned on fake-hwclock (restore the last saved time) + NTP-only-in-dev. That restores a clock but cannot measure elapsed-while-off: fake-hwclock’s saved time barely advances across a power cycle, so catch-up against it would undercount the offline gap to ≈0 and the feature would silently do nothing. This ADR adds the RTC as the primary clock and demotes fake-hwclock to a dead-battery fallback, amending GWP-349’s policy.
Constraints
- Single-threaded, single-process runtime on a Pi Zero 2 W. (This rules out the inter-process transports, dbus and zeromq, that were floated; see Options.)
- No malloc in hot paths. Any table is a fixed static pool.
- Background systems that touch the economy/UDS must go through the existing sanctioned `deck_set_*()` mutators (integrity cores + bus publish), not poke state directly.
Decision
Add a runtime-owned scheduler: a fixed table of entries advanced against a virtual clock, with two registration verbs and one per-frame driver. Periodic background systems and one-shot deferred ticks are the same mechanism (a one-shot is an entry with `interval_ms == 0`). Power-off catch-up reuses the same tick path with one large `dt`, sourced from an RTC-synced system wall-clock.
- Scheduler lives in `runtime/` (new `sched.{h,c}`), inside the DeckRunner step (ADR-0040). Both hosts get it for free; the runtime stays SDL-free (the host passes in the clock).
- Entry table: fixed static pool (~16–32 slots), no malloc:

  ```c
  typedef struct {
      uint32_t id;
      uint64_t next_due_ms;   /* against the virtual clock */
      uint32_t interval_ms;   /* 0 = one-shot, else periodic */
      bool     enabled;
      void   (*fn)(uint32_t dt_ms, void *ctx);   /* dt since THIS entry last fired */
      void    *ctx;
  } SchedEntry;
  ```

- Two registration verbs (C, runtime-internal; nOSh-owned, no cart FFI in v1):
  - `sched_every(interval_ms, fn, ctx) -> id`: a background system; runs forever at a cadence (1 Hz, 4 Hz, not 60 fps).
  - `sched_after(delay_ms, fn, ctx) -> id`: a scheduled tick; fires once at T+delay.
  - `sched_cancel(id)`: remove an entry.
- One per-frame driver the host already has the clock for: `nosh_background_tick(now_ms)` walks the table, fires every entry whose `next_due_ms <= virtual_now`, hands each its own `dt_ms` (`virtual_now - last_fired`), then reschedules periodics (`next_due = virtual_now + interval_ms`; `interval == 0` → disable/free). This replaces hand-wiring new systems into `frame_step`; it is additive: the existing hardcoded tickers stay put and may migrate later.
- Catch-up is the same `fn(dt_ms)` path with a fat `dt` (the load-bearing idea):
  - While powered on, `virtual_now` advances by the host’s monotonic delta each frame; every cadence step calls `fn` with a small `dt`.
  - At boot, the runtime computes the offline gap = `realtime_now − last_persisted_realtime` (from the RTC-synced system clock), applies it as a one-time jump to `virtual_now`, and runs the table once. Any entry that came due during the gap fires exactly once with `dt` equal to the whole gap and integrates it itself; there is no replaying 600 ticks for a 10-hour gap. Coalescing falls out of the data model for free.
  - The system periodically persists `realtime_now` (every few seconds, not only on clean shutdown) so a yanked battery still leaves a recent timestamp.
- Add a hardware RTC (DS3231-class) on the Pi’s I²C bus + a backup coin cell. Standard `dtoverlay=i2c-rtc,ds3231` + `hwclock` sync means the kernel restores the system clock at boot. `libnosh` reads the ordinary system clock; there is no RTC driver in the runtime. This is a hardware-spec change (BOM + the umbrella canonical-spec companion).
- Clamp the offline gap to a sane maximum before applying it, to bound integer math and prevent a runaway catch-up (clock fault, dead coin cell, or future tamper) from over-advancing economy-touching systems.
- Output rides the existing bus. Background systems publish on `nosh_event_bus`; consumers (CIPHER, status bar, screen router, the board) subscribe as they already do. Economy/UDS effects go through `deck_set_*()`. What the background does to the foreground (ambient narration → soft world-changes → hard intrusion) is consumer/gameplay behavior built on top, and is out of scope for this substrate ADR. Hard intrusion (a background system seizing the active screen) needs a safe-point-negotiated preemption mechanism, deferred to its own ADR when a consumer actually needs it.
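The table, the two verbs, and the shared tick path described above can be sketched roughly as follows. This is a minimal illustration, not the landed `sched.c`: the pool size, the `last_fired_ms` field, the use of `id == 0` for a free slot, the `sched_add` and `sched_advance` helpers, and the `Probe` client are all assumptions made for the example. A driver like `nosh_background_tick(now_ms)` would presumably compute the delta from the host clock and call something like `sched_advance`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SCHED_SLOTS 16  /* assumed pool size, inside the ADR's ~16-32 range */

typedef void (*SchedFn)(uint32_t dt_ms, void *ctx);

typedef struct {
    uint32_t id;             /* 0 marks a free slot (assumption) */
    uint64_t next_due_ms;    /* against the virtual clock */
    uint64_t last_fired_ms;  /* assumed extra field, used to derive dt */
    uint32_t interval_ms;    /* 0 = one-shot, else periodic */
    bool     enabled;
    SchedFn  fn;
    void    *ctx;
} SchedEntry;

static SchedEntry g_table[SCHED_SLOTS];  /* fixed static pool, no malloc */
static uint64_t   g_virtual_now;
static uint32_t   g_next_id = 1;

/* Shared registration path: both verbs fill one row of the same table. */
static uint32_t sched_add(uint32_t first_delay_ms, uint32_t interval_ms,
                          SchedFn fn, void *ctx)
{
    assert(fn != NULL);
    for (size_t i = 0; i < SCHED_SLOTS; i++) {
        SchedEntry *e = &g_table[i];
        if (e->id != 0) continue;
        e->id            = g_next_id++;
        e->next_due_ms   = g_virtual_now + first_delay_ms;
        e->last_fired_ms = g_virtual_now;
        e->interval_ms   = interval_ms;
        e->enabled       = true;
        e->fn            = fn;
        e->ctx           = ctx;
        return e->id;
    }
    return 0;  /* pool exhausted */
}

uint32_t sched_every(uint32_t interval_ms, SchedFn fn, void *ctx)
{
    assert(interval_ms > 0);
    return sched_add(interval_ms, interval_ms, fn, ctx);
}

uint32_t sched_after(uint32_t delay_ms, SchedFn fn, void *ctx)
{
    return sched_add(delay_ms, 0, fn, ctx);
}

void sched_cancel(uint32_t id)
{
    if (id == 0) return;
    for (size_t i = 0; i < SCHED_SLOTS; i++)
        if (g_table[i].id == id) g_table[i] = (SchedEntry){0};
}

/* Advance the virtual clock and fire each due entry exactly once.
 * A per-frame delta and a boot-time offline gap take the same path. */
void sched_advance(uint64_t delta_ms)
{
    g_virtual_now += delta_ms;
    for (size_t i = 0; i < SCHED_SLOTS; i++) {
        SchedEntry *e = &g_table[i];
        if (e->id == 0 || !e->enabled || e->next_due_ms > g_virtual_now) continue;
        uint64_t dt  = g_virtual_now - e->last_fired_ms;
        SchedFn  fn  = e->fn;
        void    *ctx = e->ctx;
        e->last_fired_ms = g_virtual_now;
        if (e->interval_ms == 0) *e = (SchedEntry){0};         /* one-shot: free the slot */
        else e->next_due_ms = g_virtual_now + e->interval_ms;  /* periodic: reschedule */
        fn(dt > UINT32_MAX ? UINT32_MAX : (uint32_t)dt, ctx);  /* saturate dt to 32 bits */
    }
}

/* Example client: counts fires and records the last dt it was handed. */
typedef struct { uint32_t fires; uint32_t last_dt_ms; } Probe;

void probe_fn(uint32_t dt_ms, void *ctx)
{
    Probe *p = ctx;
    p->fires++;
    p->last_dt_ms = dt_ms;
}
```

In this sketch a 1 Hz entry registered with `sched_every(1000, probe_fn, &p)` fires once per second while live, and fires once with a single large `dt_ms` if `sched_advance` is handed a ten-hour gap. That is the coalescing behavior the decision relies on. The row is updated before `fn` runs so that a callback may cancel or re-register itself safely.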
Options Considered
Option A: Unified scheduler table + virtual-clock catch-up + hardware RTC. (ACCEPTED)
One fixed table; `sched_every`/`sched_after` are the same rows; live ticking and power-off catch-up are one `fn(dt)` code path; an RTC makes the offline gap measurable.
Chosen because it gives the full capability (background run + deferred fire + catch-up) with one small, testable mechanism that reuses the bus we already have, keeps the dangerous parts (memory, the clock) in C, and writes the stepping logic once for both live and catch-up.
Option B: Two separate mechanisms — a tick registry for periodics and a separate timer queue for one-shots.
Rejected. Two data structures, two code paths, two sets of tests for one concept. The table unifies them at zero cost (`interval_ms == 0`).
Option C: No scheduler — keep hardcoding new tickers into frame_step, hand-roll catch-up per system.
Rejected. Doesn’t scale past one system, offers no deferred firing, deepens host-loop coupling, and makes every system reinvent (and re-bug) catch-up. This is the debt we’re buying out.
Option D: Inter-process messaging (dbus / zeromq).
Rejected: wrong tool class. Those are inter-process transports (sockets, a broker, serialization) for crossing address-space or machine boundaries. `libnosh` is a single-threaded, single-process runtime; the “parts” are structs in one `g_state`. A socket bus buys serialization overhead and broker failure modes to talk between things already in the same memory. Documented here because it was explicitly floated in design review.
Option E: Software-only catch-up (no RTC) — NTP when online, treat unknown gaps as zero.
Rejected given the offline-first decision. Monotonic time resets every boot; with no network the OS can’t recover wall time, so “elapsed while off” would silently collapse to zero and the whole point of catch-up evaporates. The RTC is the enabling hardware for the feature.
Trade-off Analysis
| Dimension | A: unified table + RTC (chosen) | B: registry + timer queue | C: hardcode + ad-hoc | D: dbus/zeromq | E: no RTC |
|---|---|---|---|---|---|
| Background run (screen-independent) | ✓ | ✓ | ◐ host-coupled | ✓ (overkill) | ✓ |
| Deferred / scheduled fire | ✓ | ✓ | ✗ | ✓ | ✓ |
| Power-off catch-up | ✓ one path | ◐ separate | ◐ per-system | ◐ | ✗ collapses to 0 |
| One mechanism, low surface | ✓ | ✗ two | ✓ trivial | ✗ broker | ✓ |
| Fits single-process / no-malloc | ✓ | ✓ | ✓ | ✗ | ✓ |
| Hardware cost | ◐ RTC + coin cell | ◐ RTC | ◐ RTC | ◐ RTC | ✓ none |
| Scales to N systems | ✓ | ✓ | ✗ | ✓ | ✓ |
The chosen option’s real cost is the RTC on the BOM (a part, an I²C address, board space, a coin cell) and a virtual-clock concept authors must hold — including writing fns that tolerate a fat catch-up dt. Both are accepted: the offline-first stance makes the RTC mandatory regardless, and the fat-dt contract is exactly what lets one code path serve live + catch-up.
Consequences
Positive
- A general background substrate: any nOSh metagame registers a tick and publishes on the bus, with no host-loop edits.
- Periodic and deferred behavior from one mechanism; coalesced power-off catch-up for free.
- Additive and low-risk — the existing tick order is untouched; nothing has to migrate to land this.
- Reuses the shipped event bus for output; no new messaging machinery.
Negative / Accepted costs
- RTC added to the BOM (DS3231-class + backup coin cell): cost, board space, one more I²C peripheral on the Pi bus.
- A virtual-clock abstraction to reason about; background `fn`s must integrate a large catch-up `dt` correctly (a documented authoring contract, plus the gap clamp as a guardrail).
- Economy-touching background systems must respect UDS integrity (`deck_set_*()`), and catch-up must be clamped against clock faults.
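The gap clamp mentioned above could look something like the following. This is a sketch under assumptions: the function name, the millisecond wall-clock representation, and the 30-day ceiling are illustrative and not taken from the implementation. Both timestamps are assumed to come from the ordinary system wall clock, with `last_persisted_ms` being the value written every few seconds while powered on.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical ceiling: treat anything beyond 30 days as a clock fault. */
#define MAX_GAP_MS (30ull * 24 * 60 * 60 * 1000)

/* Returns the delta to apply to the virtual clock at boot.
 * realtime_now_ms and last_persisted_ms are wall-clock milliseconds. */
uint64_t offline_gap_ms(int64_t realtime_now_ms, int64_t last_persisted_ms)
{
    /* A clock that went backwards (dead coin cell, reset RTC) yields no
     * catch-up rather than a huge unsigned wrap-around. */
    if (realtime_now_ms <= last_persisted_ms) return 0;

    uint64_t gap = (uint64_t)realtime_now_ms - (uint64_t)last_persisted_ms;
    return gap > MAX_GAP_MS ? MAX_GAP_MS : gap;
}
```

The backwards-clock guard matters as much as the upper bound: without it, a reset RTC would produce a negative difference that wraps to an enormous unsigned gap.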
Follow-on work this ADR creates
- C Eng: implement `runtime/src/sched.{h,c}`; wire `nosh_background_tick` into the DeckRunner step + both hosts; persist/restore the wall-clock timestamp; gap clamp; ctest coverage (periodic cadence, one-shot fire, cancel, fat-`dt` catch-up, clamp boundary).
- Hardware: select the RTC part; update the BOM (`build-specification.md`); author the canonical-spec companion RTC row in the umbrella `CLAUDE.md` (cross-repo).
- Platform Eng: `dtoverlay=i2c-rtc` + `hwclock` sync on the system image; confirm the Pi I²C bus/address; verify clock restore across a real power cycle; reconcile GWP-349 (RTC primary, fake-hwclock demoted to fallback), incl. `kn86-nosh.service` ordering vs the clock-restore unit.
- Deferred (separate ADRs, when a consumer needs them): (a) safe-point-negotiated hard intrusion (background→foreground preemption); (b) a minimal cart FFI (`heat-add`/`heat-level`-style) if a cart-owned background consumer ever appears; v1 is nOSh-internal only.
Documentation Updates (REQUIRED — Spec Hygiene Rule 3)
- `docs/adr/ADR-0045-background-system-substrate.md`: this file.
- `docs/adr/README.md`: index row added (0045, Proposed).
- Umbrella `CLAUDE.md` (kinoshita repo), Canonical Hardware Specification: add an RTC row (DS3231-class, on the Pi I²C bus, backup coin cell, enables offline timekeeping / power-off catch-up). Cross-repo companion; the spec table is NOT in `kn-86`.
- `docs/device/hardware/build-specification.md`: BOM (RTC module + coin cell); I²C wiring note.
- `docs/device/os/release-setup.md` (or the device-config doc covering overlays): `i2c-rtc` overlay + `hwclock` sync.
- `docs/device/os/boot-and-systemd.md` / `kiosk-mode.md`: time/clock policy (RTC primary, fake-hwclock demoted to dead-battery fallback); amends GWP-349 (the no-RTC policy).
- `runtime/src/sched.{h,c}`: new files (header documents the virtual-clock + fat-`dt` catch-up contract). Landed + `runtime/tests/test_sched.c` (13 cases) + wired into `runtime/CMakeLists.txt` (libnosh source + test target). Suite green: 121/121.
- `hosts/emulator/`: `sched_init()` beside `nosh_event_bus_init()` + `sched_tick(current_tick)` in `frame_step` (`main.c`) + `sched.c` in the host source list (`CMakeLists.txt`). Emulator builds; `--sys-screen-smoke` OK. (Device host `sched_catchup()` boot wiring is the Platform Eng track.)
- `software/runtime/background-systems.md`: new companion spec (Draft): the scheduler model, the authoring contract for background `fn`s, the catch-up/coalescing rules.
Narrative (for the design history)
The deck could already let its parts talk: there’s a small in-process event bus that mission, cart, and deck-state changes already publish on. What it couldn’t do was let a part live in the background: advance on its own clock while you’re playing something else, fire something on a timer, or pick up where it left off after the deck spent a week in a drawer. This ADR adds that missing layer as one small mechanism: a fixed table of scheduled entries on a virtual clock, where “a background system” and “a one-shot timer” are the same thing, and where catching up after power-off is the exact same step taken with one big stride instead of many small ones. The one thing software couldn’t supply on its own was knowing how much time had passed while the deck was off, so, because this deck is meant to live offline, we add a real-time clock chip to tell it. Everything a future metagame needs to feel alive while you’re not looking now has a home; what each of them does to you is a story told later, on top of this.