
ADR-0045: Background-system substrate — scheduler, deferred ticks, and power-off catch-up

nOSh subsystems can already talk to each other: runtime/src/nosh_event_bus.{h,c} is a live, in-process pub/sub bus — 64 static subscription slots, fixed 32-byte payloads, no malloc, single-threaded fan-out in registration order, error-isolated handlers. Every deck_set_*() UDS mutation publishes; CIPHER and the board generator subscribe. The “how do parts notify each other” problem is solved.

What does not exist is a way for a subsystem to run in the background — to advance on a clock regardless of which screen is active, and to survive the device powering off. Concretely:

  1. The tick order is hardcoded. Every per-frame updater (CIPHER, attract, REPL toast, tutorial wiring) is hand-wired, in a fixed sequence, inside the host’s frame_step() (hosts/emulator/src/main.c). There is no registry; a new nOSh background system cannot exist without editing the host loop.
  2. There is no deferred firing. Nothing can say “do X once at T+30s” or “advance Y every N seconds.” Cadence is implicitly 60 fps (whatever the frame loop runs at) or nothing.
  3. There is no power-off catch-up. State that should evolve while the deck is in a bag (faction heat, world drift, anything time-based) has nowhere to live and no way to “advance by the elapsed gap” when the operator powers back on.

The triggering need is gameplay that feels alive without the operator watching it — a process that advances while you are playing the deck and while the deck is off, then surfaces through the bus. The motivating example was a background “pursuit” that closes in on you across runs; the decision here is deliberately the general substrate, not that one system. Any nOSh metagame is a client of it.

  • We want background activity now, and every ad-hoc approach (bolt another tick into frame_step, hand-roll per-system catch-up) compounds host-loop coupling and gets re-paid per system.
  • The device is offline-first — it is not intended to be regularly internet-connected (operator decision, 2026-06-22). Without NTP, the OS cannot recover wall-clock time after power-off, and monotonic time (SDL_GetTicks, CLOCK_MONOTONIC) restarts from a fresh origin every boot (SDL_GetTicks counts from SDL initialization; on Linux, CLOCK_MONOTONIC counts from boot) — so “how long was I off?” is unanswerable in software alone. This forces a hardware timekeeping decision concurrently with the software one.
  • A prior policy assumed the opposite. GWP-349 (“OS settings — time/clock policy”, in Review, PR #217) accepted the no-RTC reality and leaned on fake-hwclock (restore the last saved time) + NTP-only-in-dev. That restores a clock but cannot measure elapsed-while-off — fake-hwclock’s saved time barely advances across a power cycle, so catch-up against it would undercount the offline gap to ≈0 and the feature would silently do nothing. This ADR adds the RTC as the primary clock and demotes fake-hwclock to a dead-battery fallback, amending GWP-349’s policy.
  • Single-threaded, single-process runtime on a Pi Zero 2 W. (This rules out the inter-process transports — dbus/zeromq — that were floated; see Options.)
  • No malloc in hot paths. Any table is a fixed static pool.
  • Background systems that touch the economy/UDS must go through the existing sanctioned deck_set_*() mutators (integrity cores + bus publish), not poke state directly.

Add a runtime-owned scheduler: a fixed table of entries advanced against a virtual clock, with two registration verbs and one per-frame driver. Periodic background systems and one-shot deferred ticks are the same mechanism (a one-shot is an entry with interval_ms == 0). Power-off catch-up reuses the same tick path with one large dt, sourced from an RTC-synced system wall-clock.

  1. Scheduler lives in runtime/ (new sched.{h,c}), inside the DeckRunner step (ADR-0040). Both hosts get it for free; the runtime stays SDL-free (the host passes in the clock).

  2. Entry table — fixed static pool (~16–32 slots), no malloc:

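    #include <stdbool.h>
    #include <stdint.h>

    typedef struct {
        uint32_t id;           /* handle returned by sched_every / sched_after */
        uint64_t next_due_ms;  /* against the virtual clock */
        uint32_t interval_ms;  /* 0 = one-shot, else periodic */
        bool     enabled;
        void   (*fn)(uint32_t dt_ms, void *ctx); /* dt since THIS entry last fired */
        void    *ctx;          /* opaque pointer handed back to fn */
    } SchedEntry;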
  3. Two registration verbs plus a cancel (C, runtime-internal — nOSh-owned, no cart FFI in v1):

    • sched_every(interval_ms, fn, ctx) -> id — a background system: runs forever at a cadence (1 Hz, 4 Hz — not 60 fps).
    • sched_after(delay_ms, fn, ctx) -> id — a scheduled tick: fires once at T+delay.
    • sched_cancel(id) — remove an entry.
  4. One per-frame driver the host already has the clock for: nosh_background_tick(now_ms) walks the table, fires every entry whose next_due_ms <= virtual_now, hands each its own dt_ms (virtual_now - last_fired), then reschedules periodics (next_due = virtual_now + interval_ms, interval==0 → disable/free). This replaces hand-wiring new systems into frame_step; it is additive — the existing hardcoded tickers stay put and may migrate later.

  5. Catch-up is the same fn(dt_ms) path with a fat dt (the load-bearing idea):

    • While powered on, virtual_now advances by the host’s monotonic delta each frame; every cadence step calls fn with a small dt.
    • At boot, the runtime computes the offline gap = realtime_now − last_persisted_realtime (from the RTC-synced system clock), applies it as a one-time jump to virtual_now, and runs the table once. Any entry that came due during the gap fires exactly once with dt == the whole gap and integrates it itself — no replaying 600 ticks of a once-a-minute entry for a 10-hour gap. Coalescing falls out of the data model for free.
    • The system periodically persists realtime_now (every few seconds, not only on clean shutdown) so a yanked battery still leaves a recent timestamp.
  6. Add a hardware RTC (DS3231-class) on the Pi’s I²C bus + a backup coin cell. Standard dtoverlay=i2c-rtc,ds3231 + hwclock sync means the kernel restores the system clock at boot. libnosh reads the ordinary system clock — no RTC driver in the runtime. This is a hardware-spec change (BOM + the umbrella canonical-spec companion).

  7. Clamp the offline gap to a sane maximum before applying it, to bound integer math and prevent a runaway catch-up (clock fault, dead coin cell, or future tamper) from over-advancing economy-touching systems.

  8. Output rides the existing bus. Background systems publish on nosh_event_bus; consumers (CIPHER, status bar, screen router, the board) subscribe as they already do. Economy/UDS effects go through deck_set_*(). What the background does to the foreground (ambient narration → soft world-changes → hard intrusion) is consumer/gameplay behavior built on top, and is out of scope for this substrate ADR. Hard intrusion (a background system seizing the active screen) needs a safe-point-negotiated preemption mechanism — deferred to its own ADR when a consumer actually needs it.
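
The table, the verbs, and the driver described in the list above can be sketched in a few dozen lines. This is a minimal illustration, not the landed runtime/src/sched.c. The entry layout and the names sched_every, sched_after and sched_cancel follow this ADR; everything else is an assumption made for the sketch: the 16-slot pool, the last_fired_ms field (added so the driver can hand each entry its own dt), id 0 as the free-slot marker, and sched_advance(delta_ms), a hypothetical stand-in for nosh_background_tick that takes the host's monotonic delta. DemoCounter and demo_count are a toy client.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SCHED_MAX_ENTRIES 16u /* assumed; the ADR only says ~16-32 slots */

typedef void (*SchedFn)(uint32_t dt_ms, void *ctx);

typedef struct {
    uint32_t id;            /* 0 = free slot (assumption of this sketch) */
    uint64_t next_due_ms;   /* against the virtual clock */
    uint64_t last_fired_ms; /* assumed field, used to compute per-entry dt */
    uint32_t interval_ms;   /* 0 = one-shot, else periodic */
    bool     enabled;
    SchedFn  fn;
    void    *ctx;
} SchedEntry;

static SchedEntry g_table[SCHED_MAX_ENTRIES]; /* fixed static pool, no malloc */
static uint64_t   g_virtual_now_ms;
static uint32_t   g_next_id = 1; /* id wrap-around is ignored in this sketch */

/* Shared registration path: periodic and one-shot entries are the same row. */
static uint32_t sched_add(uint32_t delay_ms, uint32_t interval_ms,
                          SchedFn fn, void *ctx)
{
    assert(fn != NULL);
    for (size_t i = 0; i < SCHED_MAX_ENTRIES; i++) {
        SchedEntry *e = &g_table[i];
        if (e->id != 0) continue;
        *e = (SchedEntry){ .id = g_next_id++,
                           .next_due_ms = g_virtual_now_ms + delay_ms,
                           .last_fired_ms = g_virtual_now_ms,
                           .interval_ms = interval_ms,
                           .enabled = true, .fn = fn, .ctx = ctx };
        return e->id;
    }
    return 0; /* pool exhausted */
}

uint32_t sched_every(uint32_t interval_ms, SchedFn fn, void *ctx)
{
    assert(interval_ms > 0); /* 0 is reserved for one-shots */
    return sched_add(interval_ms, interval_ms, fn, ctx);
}

uint32_t sched_after(uint32_t delay_ms, SchedFn fn, void *ctx)
{
    return sched_add(delay_ms, 0, fn, ctx);
}

void sched_cancel(uint32_t id)
{
    if (id == 0) return;
    for (size_t i = 0; i < SCHED_MAX_ENTRIES; i++) {
        if (g_table[i].id == id) { g_table[i] = (SchedEntry){0}; return; }
    }
}

/* Advance the virtual clock and fire every entry that has come due.
 * A small delta is a live frame; one large delta is a catch-up. */
void sched_advance(uint64_t delta_ms)
{
    g_virtual_now_ms += delta_ms;
    for (size_t i = 0; i < SCHED_MAX_ENTRIES; i++) {
        SchedEntry *e = &g_table[i];
        if (e->id == 0 || !e->enabled) continue;
        if (e->next_due_ms > g_virtual_now_ms) continue;
        uint32_t id = e->id;
        uint64_t dt = g_virtual_now_ms - e->last_fired_ms;
        e->last_fired_ms = g_virtual_now_ms;
        e->fn(dt > UINT32_MAX ? UINT32_MAX : (uint32_t)dt, e->ctx);
        if (e->id != id) continue;         /* fn cancelled its own entry */
        if (e->interval_ms == 0) *e = (SchedEntry){0};           /* one-shot */
        else e->next_due_ms = g_virtual_now_ms + e->interval_ms; /* periodic */
    }
}

/* Toy client: counts calls and sums the dt it was handed. */
typedef struct { uint32_t calls; uint64_t total_dt_ms; } DemoCounter;

static void demo_count(uint32_t dt_ms, void *ctx)
{
    DemoCounter *c = ctx;
    c->calls++;
    c->total_dt_ms += dt_ms;
}
```

Registering demo_count with sched_every(1000, ...) and stepping sched_advance(100) thirty times fires it three times with dt_ms == 1000 each; a one-shot registered with sched_after(2500, ...) fires once and frees its slot. Because due entries are rescheduled from the current virtual time rather than replayed, a single large delta fires each due entry once, which is the coalescing behavior item 5 relies on.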


Option A: Unified scheduler table + virtual-clock catch-up + hardware RTC. (ACCEPTED)

One fixed table; sched_every/sched_after are the same rows; live ticking and power-off catch-up are one fn(dt) code path; an RTC makes the offline gap measurable. Chosen because it gives the full capability (background run + deferred fire + catch-up) with one small, testable mechanism that reuses the bus we already have, keeps the dangerous parts (memory, the clock) in C, and writes the stepping logic once for both live and catch-up.
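
The boot-time half of that single code path is mostly arithmetic on two wall-clock timestamps. The sketch below shows one way to compute and clamp the offline gap before it is applied to the virtual clock. The function names, the millisecond units, and the 7-day cap are assumptions for illustration; the ADR only asks for "a sane maximum" and does not fix a value.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed cap: 7 days in milliseconds. The ADR does not specify the value. */
#define MAX_OFFLINE_GAP_MS (7ull * 24u * 60u * 60u * 1000u)

/* Offline gap = realtime_now - last_persisted_realtime, clamped.
 * Returns 0 when there is no saved timestamp or the clock moved backwards
 * (for example a dead coin cell restoring an older time). */
uint64_t offline_gap_ms(int64_t realtime_now_ms, int64_t last_persisted_ms)
{
    if (last_persisted_ms <= 0) return 0;               /* nothing saved yet */
    if (realtime_now_ms <= last_persisted_ms) return 0; /* clock went backwards */
    uint64_t gap = (uint64_t)(realtime_now_ms - last_persisted_ms);
    return gap > MAX_OFFLINE_GAP_MS ? MAX_OFFLINE_GAP_MS : gap;
}

/* Apply the gap as a one-time jump of the virtual clock at boot. */
uint64_t virtual_clock_after_boot(uint64_t virtual_before_ms,
                                  int64_t realtime_now_ms,
                                  int64_t last_persisted_ms)
{
    uint64_t gap = offline_gap_ms(realtime_now_ms, last_persisted_ms);
    assert(virtual_before_ms <= UINT64_MAX - gap); /* no wrap-around */
    return virtual_before_ms + gap;
}
```

Treating a backwards clock as a zero gap is a conservative choice made for this sketch: a background system that fails to advance is easier to recover from than one that over-advances economy-touching state.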

Option B: Two separate mechanisms — a tick registry for periodics and a separate timer queue for one-shots.

Rejected. Two data structures, two code paths, two sets of tests for one concept. The table unifies them at zero cost (interval_ms == 0).

Option C: No scheduler — keep hardcoding new tickers into frame_step, hand-roll catch-up per system.

Rejected. Doesn’t scale past one system, offers no deferred firing, deepens host-loop coupling, and makes every system reinvent (and re-bug) catch-up. This is the debt we’re buying out.

Option D: Inter-process messaging (dbus / zeromq).

Rejected — wrong tool class. Those are inter-process transports (sockets, a broker, serialization) for crossing address-space or machine boundaries. libnosh is a single-threaded, single-process runtime; the “parts” are structs in one g_state. A socket bus buys serialization overhead and broker failure modes to talk between things already in the same memory. Documented here because it was explicitly floated in design review.

Option E: Software-only catch-up (no RTC) — NTP when online, treat unknown gaps as zero.

Rejected given the offline-first decision. Monotonic time resets every boot; with no network the OS can’t recover wall time, so “elapsed while off” would silently collapse to zero and the whole point of catch-up evaporates. The RTC is the enabling hardware for the feature.
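
The two clocks this option confuses are read through the same POSIX call, which makes the difference easy to overlook. The sketch below reads both in milliseconds and adds a small throttle for the "persist every few seconds" rule. It assumes a POSIX system such as the Pi's Linux image; clock_ms, wall_now_ms, mono_now_ms and persist_due are hypothetical helper names, not part of libnosh.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for clock_gettime under strict -std=c11 */
#endif
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Milliseconds read from the given POSIX clock, or -1 on failure. */
int64_t clock_ms(clockid_t clk)
{
    struct timespec ts;
    if (clock_gettime(clk, &ts) != 0) return -1;
    return (int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Wall-clock time. Only meaningful across a power cycle if something,
 * such as an RTC, restores the system clock at boot. */
int64_t wall_now_ms(void) { return clock_ms(CLOCK_REALTIME); }

/* Monotonic time. Fine for per-frame deltas while powered on, but its
 * origin is tied to the current boot, so it cannot measure time spent off. */
int64_t mono_now_ms(void) { return clock_ms(CLOCK_MONOTONIC); }

/* True when at least period_ms of monotonic time has passed since the
 * wall-clock timestamp was last saved. */
bool persist_due(int64_t mono_ms, int64_t last_save_mono_ms, int64_t period_ms)
{
    assert(period_ms > 0);
    return mono_ms - last_save_mono_ms >= period_ms;
}
```

The value worth persisting is wall_now_ms(); mono_now_ms() is used only to decide when to persist, since it does not jump if the wall clock is corrected.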


Dimension | A — unified table + RTC (chosen) | B — registry + timer queue | C — hardcode + ad-hoc | D — dbus/zeromq | E — no RTC
Background run (screen-independent) |  |  | ◐ host-coupled | ✓ (overkill) |
Deferred / scheduled fire |  |  |  |  |
Power-off catch-up | ✓ one path | ◐ separate | ◐ per-system |  | ✗ collapses to 0
One mechanism, low surface |  | ✗ two | ✓ trivial | ✗ broker |
Fits single-process / no-malloc |  |  |  |  |
Hardware cost | ◐ RTC + coin cell | ◐ RTC | ◐ RTC | ◐ RTC | ✓ none
Scales to N systems |  |  |  |  |

The chosen option’s real cost is the RTC on the BOM (a part, an I²C address, board space, a coin cell) and a virtual-clock concept authors must hold — including writing fns that tolerate a fat catch-up dt. Both are accepted: the offline-first stance makes the RTC mandatory regardless, and the fat-dt contract is exactly what lets one code path serve live + catch-up.
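
What "tolerating a fat dt" means for an author can be shown with a toy background system. The sketch below decays a heat value by one point per minute and carries the sub-minute remainder between calls, so 150 one-second ticks and a single 150-second catch-up leave identical state. Heat, heat_tick and the decay period are hypothetical; a real economy-touching system would apply the result through deck_set_*() rather than mutate a struct directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HEAT_DECAY_PERIOD_MS 60000u /* assumed: lose one point per minute */

typedef struct {
    uint32_t heat;     /* current heat level */
    uint32_t carry_ms; /* elapsed time not yet converted into a decay step */
} Heat;

/* Scheduler-style callback: integrates dt_ms in closed form, so the cost
 * is the same for a 1-second live tick and a 10-hour catch-up. */
void heat_tick(uint32_t dt_ms, void *ctx)
{
    assert(ctx != NULL);
    Heat *h = ctx;
    uint64_t elapsed = (uint64_t)h->carry_ms + dt_ms;
    uint64_t steps = elapsed / HEAT_DECAY_PERIOD_MS;
    h->carry_ms = (uint32_t)(elapsed % HEAT_DECAY_PERIOD_MS);
    h->heat = steps >= h->heat ? 0 : h->heat - (uint32_t)steps; /* floor at 0 */
}
```

The carry field is what keeps live ticking and catch-up equivalent: without it, ticks shorter than the decay period would round down to zero steps every time and the value would never move.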


  • A general background substrate: any nOSh metagame registers a tick and publishes on the bus — no host-loop edits.
  • Periodic and deferred behavior from one mechanism; coalesced power-off catch-up for free.
  • Additive and low-risk — the existing tick order is untouched; nothing has to migrate to land this.
  • Reuses the shipped event bus for output; no new messaging machinery.
  • RTC added to the BOM (DS3231-class + backup coin cell) — cost, board space, one more I²C peripheral on the Pi bus.
  • A virtual-clock abstraction to reason about; background fns must integrate a large catch-up dt correctly (a documented authoring contract, plus the gap clamp as a guardrail).
  • Economy-touching background systems must respect UDS integrity (deck_set_*()), and catch-up must be clamped against clock faults.
  • C Eng — implement runtime/src/sched.{h,c}; wire nosh_background_tick into the DeckRunner step + both hosts; persist/restore the wall-clock timestamp; gap clamp; ctest coverage (periodic cadence, one-shot fire, cancel, fat-dt catch-up, clamp boundary).
  • Hardware — select the RTC part; update the BOM (build-specification.md); author the canonical-spec companion RTC row in the umbrella CLAUDE.md (cross-repo).
  • Platform Eng — dtoverlay=i2c-rtc + hwclock sync on the system image; confirm the Pi I²C bus/address; verify clock restore across a real power cycle; reconcile GWP-349 (RTC primary, fake-hwclock demoted to fallback) incl. kn86-nosh.service ordering vs the clock-restore unit.
  • Deferred (separate ADRs, when a consumer needs them): (a) safe-point-negotiated hard intrusion (background→foreground preemption); (b) a minimal cart FFI (heat-add / heat-level-style) if a cart-owned background consumer ever appears — v1 is nOSh-internal only.

Documentation Updates (REQUIRED — Spec Hygiene Rule 3)

  • docs/adr/ADR-0045-background-system-substrate.md — this file.
  • docs/adr/README.md — index row added (0045, Proposed).
  • Umbrella CLAUDE.md (kinoshita repo) — Canonical Hardware Specification — add an RTC row (DS3231-class, on the Pi I²C bus, backup coin cell, enables offline timekeeping / power-off catch-up). Cross-repo companion — the spec table is NOT in kn-86.
  • docs/device/hardware/build-specification.md — BOM: RTC module + coin cell; I²C wiring note.
  • docs/device/os/release-setup.md (or the device-config doc covering overlays) — i2c-rtc overlay + hwclock sync.
  • docs/device/os/boot-and-systemd.md / kiosk-mode.md — time/clock policy: RTC primary, fake-hwclock demoted to dead-battery fallback; amends GWP-349 (the no-RTC policy).
  • runtime/src/sched.{h,c} — new files (header documents the virtual-clock + fat-dt catch-up contract). Landed + runtime/tests/test_sched.c (13 cases) + wired into runtime/CMakeLists.txt (libnosh source + test target). Suite green: 121/121.
  • hosts/emulator/ — sched_init() beside nosh_event_bus_init() + sched_tick(current_tick) in frame_step (main.c) + sched.c in the host source list (CMakeLists.txt). Emulator builds; --sys-screen-smoke OK. (Device host sched_catchup() boot wiring is the Platform Eng track.)
  • software/runtime/background-systems.md — new companion spec (Draft): the scheduler model, the authoring contract for background fns, the catch-up/coalescing rules.

The deck could already let its parts talk — there’s a small in-process event bus that mission, cart, and deck-state changes already publish on. What it couldn’t do was let a part live in the background: advance on its own clock while you’re playing something else, fire something on a timer, or pick up where it left off after the deck spent a week in a drawer. This ADR adds that missing layer as one small mechanism — a fixed table of scheduled entries on a virtual clock, where “a background system” and “a one-shot timer” are the same thing, and where catching up after power-off is the exact same step taken with one big stride instead of many small ones. The one thing software couldn’t supply on its own was knowing how much time had passed while the deck was off — so, because this deck is meant to live offline, we add a real-time clock chip to tell it. Everything a future metagame needs to feel alive while you’re not looking now has a home; what each of them does to you is a story told later, on top of this.