Sprint 4 Design Pack — GWP-265
Story narrative
ADR-0009 (token prediction) is Accepted v1.0 and Extended by ADR-0016 with cart-grammar contributions. The implementation in kn86-emulator/src/t9_rank.c ranks predictive-palette candidates by combining five terms: legal-form, vocabulary boost (the +5 bonus from ADR-0009 §3), local-id boost (cart-grammar contributions per ADR-0016), recency (last-N-keystrokes ring), and popularity (lifetime keystroke counts). Unit tests in tests/test_t9_prediction.c verify the math.
What’s missing is observability. During a play session, Josh and PM cannot tell whether the +5 vocabulary boost is firing, whether one term is dominating the score, or why a particular candidate floated to slot 1. The dev overlay (F11, GWP-226) has no T9 panel.
This is straightforward debug-instrumentation work: add a non-mutating t9_rank_explain() entry point that returns the same top-N candidates the ranker just chose, but with each term’s per-candidate contribution exposed. Wire that into the F11 overlay as a new “T9 Ranker” tab. Add a short “Debugging” subsection to the token-prediction reference.
The load-bearing design constraint is no behavior change. The explain entry point must call into the same ranking math as the production path; if explain and production drift, the panel becomes a lie. This is best done by extracting the per-term scoring into a shared helper and having both t9_rank() and t9_rank_explain() call it — one returns a sorted top-N list, the other returns the same list plus the per-term breakdown. The test (test_t9_explain) validates that the sorted candidate list from t9_rank_explain matches t9_rank byte-for-byte.
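The shared-helper arrangement described above can be sketched as follows. This is a minimal illustration, not the contents of t9_rank.c: every name here (T9SketchCandidate, T9SketchRow, sketch_score, sketch_top_n, sketch_rank, sketch_rank_explain) is hypothetical, and the term values other than the +5 vocabulary boost are placeholders rather than the ADR-0009 math.

```c
#include <stddef.h>

#define T9_SKETCH_TOP_N 8

/* Hypothetical per-candidate inputs; the real ranker derives these itself. */
typedef struct {
    const char *string;
    int in_vocab;      /* 1 if the cart vocabulary lists this token */
    int local_id_hit;  /* 1 if a cart-grammar local id matches */
    int recency;       /* placeholder recency contribution */
    int popularity;    /* placeholder popularity contribution */
} T9SketchCandidate;

/* Per-term breakdown; field names follow the design pack. */
typedef struct {
    const char *string;
    int legal_form_score, vocab_score, local_id_score;
    int recency_score, popularity_score, final_score;
} T9SketchRow;

/* The only place the five terms are computed and summed. */
static T9SketchRow sketch_score(const T9SketchCandidate *c)
{
    T9SketchRow r;
    r.string = c->string;
    r.legal_form_score = 1;                     /* placeholder weight */
    r.vocab_score = c->in_vocab ? 5 : 0;        /* the +5 vocabulary boost */
    r.local_id_score = c->local_id_hit ? 1 : 0; /* placeholder weight */
    r.recency_score = c->recency;
    r.popularity_score = c->popularity;
    r.final_score = r.legal_form_score + r.vocab_score + r.local_id_score
                  + r.recency_score + r.popularity_score;
    return r;
}

/* Scores every candidate and keeps the best TOP_N, highest score first.
 * Insertion is stable: candidates with equal scores keep input order. */
static size_t sketch_top_n(const T9SketchCandidate *cands, size_t n,
                           T9SketchRow *rows)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        T9SketchRow r = sketch_score(&cands[i]);
        size_t pos = count;
        while (pos > 0 && rows[pos - 1].final_score < r.final_score)
            pos--;
        if (pos >= T9_SKETCH_TOP_N)
            continue; /* does not make the cut */
        size_t last = count < T9_SKETCH_TOP_N ? count : T9_SKETCH_TOP_N - 1;
        for (size_t j = last; j > pos; j--)
            rows[j] = rows[j - 1];
        rows[pos] = r;
        if (count < T9_SKETCH_TOP_N)
            count++;
    }
    return count;
}

/* Production-style path: same helper, strings only. */
size_t sketch_rank(const T9SketchCandidate *cands, size_t n, const char **out)
{
    T9SketchRow rows[T9_SKETCH_TOP_N];
    size_t count = sketch_top_n(cands, n, rows);
    for (size_t i = 0; i < count; i++)
        out[i] = rows[i].string;
    return count;
}

/* Explain path: same helper, full breakdown. */
size_t sketch_rank_explain(const T9SketchCandidate *cands, size_t n,
                           T9SketchRow *out)
{
    return sketch_top_n(cands, n, out);
}
```

Because both entry points get their ordering from the same call, the two lists cannot diverge unless someone adds scoring logic outside the helper, which is the drift the byte-for-byte test is meant to catch.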
Acceptance criteria expanded (≥4 testable items with file paths)
- kn86-emulator/src/t9_rank.c gains t9_rank_explain(const char *buffer, int position, T9Explanation *out): an additive entry point, with no change to the existing t9_rank() signature or behavior. The output struct carries top-8 candidates × {string, legal_form_score, vocab_score, local_id_score, recency_score, popularity_score, final_score}. The per-term math currently inside t9_rank() is extracted into a shared helper that both entry points call.
- kn86-emulator/src/debug.c gains a “T9 Ranker” tab in the F11 dev overlay (alongside the existing tabs from GWP-226). The tab shows the explanation for the cursor position. Layout: 12-row × 80-col panel, made up of a header row with column titles (CANDIDATE, LEGAL, VOCAB, LOCAL, RECNCY, POP, SCORE), then 8 candidate rows, then 3 footer rows for a “currently dominant term” hint. If no input buffer is active, the panel shows (no input — type to populate).
- kn86-emulator/tests/test_t9_explain.c (new file) asserts:
  - Explanation top-8 list matches t9_rank() top-8 list byte-for-byte (no scoring drift).
  - When the buffer matches a vocab-boosted token, the vocab_score cell is non-zero on that row.
  - When the buffer is short (1–2 keys) and the recency ring is empty, recency_score is 0 across all rows.
  - When a cart-grammar local-id contribution applies, local_id_score is non-zero on the matching row(s).
  - Reading the explanation does NOT mutate the recency ring, the popularity counters, or any other ranker state (call t9_rank_explain() twice in a row and confirm identical output).
- docs/software/api-reference/editor-tools/token-prediction.md gains a “Debugging” subsection citing F11 → T9 Ranker tab as the inspection surface, with one screenshot or ASCII mockup of the panel layout. The subsection should also document the explanation column meanings (a one-line gloss per term: legal-form is “valid characters per T9 mapping,” vocab is “+5 if cart vocabulary lists this token,” etc.) so a cart author can read the panel without re-reading ADR-0009.
- Behavior unchanged: tests/test_t9_prediction.c continues to pass with zero modifications. (If it doesn’t, the helper extraction broke something: fix and re-run; don’t update the existing tests.)
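To make the output struct and the panel columns concrete, here is one possible shape, offered as a sketch under stated assumptions. Only the t9_rank_explain() signature, the field names, and the column titles come from the criteria above; the struct layout, the 32-byte string buffer, the column widths, and the helper names (t9_panel_header, t9_panel_row, t9_panel_dominant) are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdio.h>

#define T9_EXPLAIN_MAX 8

/* Assumed layout: one row per candidate, fields named as in the criteria. */
typedef struct {
    char string[32]; /* assumed maximum candidate length */
    int legal_form_score;
    int vocab_score;
    int local_id_score;
    int recency_score;
    int popularity_score;
    int final_score;
} T9ExplainRow;

typedef struct {
    int candidate_count; /* 0 means no active input buffer */
    T9ExplainRow rows[T9_EXPLAIN_MAX];
} T9Explanation;

/* Header row: a 24-char name column plus six 6-char numeric columns,
 * 66 characters in total, which fits inside the 80-column panel. */
int t9_panel_header(char *buf, size_t len)
{
    return snprintf(buf, len, "%-24s %6s %6s %6s %6s %6s %6s",
                    "CANDIDATE", "LEGAL", "VOCAB", "LOCAL",
                    "RECNCY", "POP", "SCORE");
}

/* One candidate row, aligned with the header columns. */
int t9_panel_row(const T9ExplainRow *r, char *buf, size_t len)
{
    return snprintf(buf, len, "%-24.24s %6d %6d %6d %6d %6d %6d",
                    r->string, r->legal_form_score, r->vocab_score,
                    r->local_id_score, r->recency_score,
                    r->popularity_score, r->final_score);
}

/* Footer hint: title of the largest term on a row.
 * Ties resolve to the leftmost column (an arbitrary choice here). */
const char *t9_panel_dominant(const T9ExplainRow *r)
{
    static const char *const names[] = {
        "LEGAL", "VOCAB", "LOCAL", "RECNCY", "POP"
    };
    const int vals[] = {
        r->legal_form_score, r->vocab_score, r->local_id_score,
        r->recency_score, r->popularity_score
    };
    size_t best = 0;
    for (size_t i = 1; i < sizeof vals / sizeof vals[0]; i++)
        if (vals[i] > vals[best])
            best = i;
    return names[best];
}
```

A renderer in debug.c could call t9_panel_header() once, t9_panel_row() for each of the candidate_count rows, and t9_panel_dominant() on the top row to fill the footer hint. How the real overlay draws text is not specified here, so the sketch stops at producing strings.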
Edge cases (≥2)
- Cursor position outside any input buffer. Player is on the bare-deck terminal HUD with no active text-entry surface. Panel shows (no input — type to populate) rather than an empty 8-row dump. The t9_rank_explain() entry point sets out->candidate_count = 0 as a sentinel; debug.c renders the placeholder string. No null-pointer hazard.
- Tied scores. When two candidates have identical final_score, the panel must render them in a deterministic order matching t9_rank()’s tie-breaking rule. This falls under the “no scoring drift” assertion in the third acceptance criterion above (test_t9_explain.c); make sure test_t9_explain covers a tied-score input case explicitly. Recommend: a 1-key input where the recency ring and popularity counters are both empty, so all candidates score on legal-form alone (which produces the most ties).
- Cart-grammar contribution active but cart not loaded. If ADR-0016’s cart-grammar local-id table is non-empty but the cart that contributed it has been ejected (per ADR-0019 hot-swap), the local_id_score cell should still render correctly (the table is owned by the runtime, not the cart, post-load; see ADR-0016). Add a test case for this if it’s quick; otherwise flag as edge case for QA.
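The first two edge cases can be illustrated with a short sketch. The tie-breaking rule used by t9_rank() is not stated in this pack, so the lexicographic tie-break below is purely an assumption chosen to show what "deterministic" means in practice; TieRow, tie_cmp, tie_sort, and panel_placeholder are hypothetical names. One fact worth keeping in mind is that the C standard does not require qsort to be stable, so an ordering that relies on input order alone can differ between platforms unless the comparator carries an explicit secondary key.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    const char *string;
    int final_score;
} TieRow;

/* Higher score first; equal scores fall back to strcmp order.
 * The strcmp fallback is an assumed rule, not the real ranker's. */
static int tie_cmp(const void *a, const void *b)
{
    const TieRow *x = a;
    const TieRow *y = b;
    if (x->final_score != y->final_score)
        return (x->final_score < y->final_score)
             - (x->final_score > y->final_score);
    return strcmp(x->string, y->string);
}

/* Sorts rows so the result is independent of the input order. */
void tie_sort(TieRow *rows, size_t n)
{
    qsort(rows, n, sizeof rows[0], tie_cmp);
}

/* Returns the placeholder text when the explanation is empty,
 * or NULL when there are candidate rows to draw instead. */
const char *panel_placeholder(int candidate_count)
{
    return candidate_count == 0 ? "(no input — type to populate)" : NULL;
}
```

A tied-score test case could sort two differently ordered copies of the same candidates and assert that both come out identical, which is the property the panel needs in order to match the production list.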
Engineering hand-off notes
- Files owned: kn86-emulator/src/t9_rank.c (additive: explain entry point + extracted shared helper).
- Files added-to: kn86-emulator/src/debug.c (new tab), kn86-emulator/tests/test_t9_explain.c (new file), docs/software/api-reference/editor-tools/token-prediction.md (new subsection).
- Files NOT touched: ranker scoring logic itself (no behavior change). tests/test_t9_prediction.c continues passing without modification.
- Expected PR size: ~80 lines in t9_rank.c (extract helper + add explain entry point), ~120 lines in debug.c (new tab renderer), ~150 lines in test_t9_explain.c (5–6 cases), ~30 lines in docs. Single-engineer task, ~half a day with TDD.
- Test strategy: TDD as constrained. Write test_t9_explain.c first, asserting that the explain output matches the ranker’s top-N. Implement the explain entry point + helper extraction to make the tests pass. Then build the debug.c tab.
- Dispatch shape: single C engineer, additive. Independent of all other Sprint 4 work, with no ordering constraints. Could pair sensibly with GWP-236 (also F11 dev overlay polish) in the same agent’s queue, but doesn’t have to.
- Watch for: the helper extraction is the riskiest moment. If t9_rank()’s scoring math is currently inlined in a way that touches the recency ring or popularity counters as a side effect (it should not, but verify by reading the file), the extraction has to preserve that behavior in t9_rank() without leaking it into t9_rank_explain(). The no-mutation assertion in test_t9_explain.c (acceptance criterion 3) catches this.
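One way to guard against the side-effect risk noted above is to give the scoring helper a const view of the ranker state and keep all mutation in a separate commit step. The sketch below assumes a toy state layout (RankerState with a 4-slot recency ring and three popularity counters) that is not taken from t9_rank.c; the function names are likewise hypothetical.

```c
#include <string.h>

#define RING_LEN 4
#define N_TOKENS 3

/* Toy ranker state; the real layout in t9_rank.c is not shown here.
 * All members are int-sized, so the struct has no padding and memcmp
 * is a valid whole-struct comparison on common ABIs. */
typedef struct {
    int recency_ring[RING_LEN]; /* -1 marks an empty slot */
    int ring_head;
    unsigned popularity[N_TOKENS];
} RankerState;

/* Read-only scoring: the const pointer turns an accidental write
 * into a compile error. */
static int score_token(const RankerState *st, int token)
{
    int score = (int)st->popularity[token];
    for (int i = 0; i < RING_LEN; i++)
        if (st->recency_ring[i] == token)
            score++;
    return score;
}

/* Mutation lives here, outside the scoring helper. */
void commit_token(RankerState *st, int token)
{
    st->recency_ring[st->ring_head] = token;
    st->ring_head = (st->ring_head + 1) % RING_LEN;
    st->popularity[token]++;
}

/* Explain-style path: reads state, writes only to the caller's buffer. */
void explain_scores(const RankerState *st, int out[N_TOKENS])
{
    for (int t = 0; t < N_TOKENS; t++)
        out[t] = score_token(st, t);
}

/* Returns 1 if two consecutive explain calls agree and leave the
 * state byte-identical to a snapshot taken beforehand. */
int explain_is_pure(const RankerState *st)
{
    RankerState before = *st;
    int first[N_TOKENS];
    int second[N_TOKENS];
    explain_scores(st, first);
    explain_scores(st, second);
    return memcmp(&before, st, sizeof before) == 0
        && memcmp(first, second, sizeof first) == 0;
}
```

The snapshot-and-compare check in explain_is_pure() mirrors the "call twice and confirm identical output" assertion, and additionally compares the state itself, which would catch a mutation that happens not to change the scores.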
Open questions
None. The task is well-bounded; ADR-0009 and ADR-0016 are stable; the test surface is clear. This is a clean, small instrumentation task.