# ADR-0009: Token Prediction v1 Ranking Model
Hardware retarget note (2026-04-21): Latency claims in this ADR were sized for RP2350 / Pico 2 (150 MHz Cortex-M33). The platform now targets Pi Zero 2 W (1 GHz Cortex-A53). The ranking model is target-independent and the budgets are more than satisfied on current hardware.
Date: 2026-04-14
Supersedes spike: former spikes/ADR-0002-token-prediction.md
Context: The nEmacs editor needs to rank tokens for the predictive palette. This spike defines a static, grammar-aware ranking model for v1.
## Overview

Goal: Given a cursor position in an s-expression, rank all possible next tokens such that the top 8 are the most useful for the player.
Strategy: Static model (no machine learning). Combines:
- Grammar filter (hard constraint): Only tokens that syntactically fit.
- Domain vocabulary boost: Cartridge-specific terms get priority.
- Buffer-local identifier boost: Recently defined identifiers.
- Recency weight: Tokens used in the last ~20 expressions.
- Popularity prior: Baseline weights for common forms.
Why static? Fast, deterministic, debuggable. Machine learning is post-launch.
## Architecture

### Input to Ranking Algorithm

```text
RankTokens(
    cursor_position: Cursor,       // Where we are in the tree
    context_stack:   [Form],       // Parent forms up to root
    buffer_content:  [Form],       // All forms in current buffer
    session_history: [Token],      // Recent tokens this session
    cartridge_vocab: [Term],       // Domain vocabulary
    local_bindings:  [Identifier]  // Let/lambda bindings visible here
) -> [(Token, Score), ...]
```

### Legal-Form Filter (Hard Constraint)

A token is legal at a cursor position if it satisfies the syntactic rules below.
Rules by position type:
| Position | Legal Tokens | Illegal |
|---|---|---|
| Function position (first element of list) | Callable: function names, macros, lambdas, built-in ops (+, -, etc.) | Data: literals (except quote), binding keywords |
| Argument position (non-first) | Data: identifiers, literals, function calls, quoted forms | Definitions: defn, defstruct, etc. |
| Binding position (inside let/lambda params) | Identifiers only | Functions, literals, forms |
| Root (empty buffer) | Top-level forms: defn, defstruct, defdomain, defmission | Bare data, lambdas |
Pseudo-code:

```text
function is_legal(token, position):
    if position == FUNCTION_POS:
        return token in callables()
    elif position == ARGUMENT_POS:
        return token not in (defn, defstruct, let, defdomain, defmission)
    elif position == BINDING_POS:
        return token is identifier or token in (:as, :when)
    elif position == ROOT:
        return token in top_level_forms()
    else:
        return true
```

Callable set (function position):
- Builtins: if, let, lambda, cond, map, filter, fold, reduce, car, cdr, cons, null?, +, -, *, /, and comparison operators (>, <, =)
- User-defined functions in buffer
- Quoted values (for meta-evaluation)
Top-level forms (root):
- defn, defstruct, defdomain, defmission, let, lambda, quote, etc.
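As a concrete illustration, the filter above can be sketched in Python. The `Pos` enum, the exact token sets, and the identifier regex are assumptions for this sketch, not the shipped nEmacs grammar:

```python
import re
from enum import Enum, auto

class Pos(Enum):
    FUNCTION = auto()
    ARGUMENT = auto()
    BINDING = auto()
    ROOT = auto()

BUILTINS = {"if", "let", "lambda", "cond", "map", "filter", "fold", "reduce",
            "car", "cdr", "cons", "null?", "+", "-", "*", "/", ">", "<", "="}
TOP_LEVEL = {"defn", "defstruct", "defdomain", "defmission",
             "let", "lambda", "quote"}
ARG_EXCLUDED = {"defn", "defstruct", "let", "defdomain", "defmission"}
IDENT = re.compile(r"[A-Za-z][A-Za-z0-9_?-]*")  # lisp-style names like threat-level

def is_legal(token: str, pos: Pos, user_fns: frozenset = frozenset()) -> bool:
    """Hard grammar filter: does token syntactically fit at pos?"""
    if pos is Pos.FUNCTION:
        return token in BUILTINS or token in user_fns   # callables only
    if pos is Pos.ARGUMENT:
        return token not in ARG_EXCLUDED                # data, calls, literals
    if pos is Pos.BINDING:
        return bool(IDENT.fullmatch(token)) or token in (":as", ":when")
    if pos is Pos.ROOT:
        return token in TOP_LEVEL
    return True
```

User-defined functions are threaded in via `user_fns` so the filter stays a pure function of its inputs.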
## Ranking Formula

For each legal token:

```text
score = 0

// 1. Domain vocabulary boost
if token in cartridge_vocabulary:
    score += DOMAIN_BOOST              // = 5

// 2. Local binding boost (is it a locally-visible binding?)
if token in local_bindings:
    score += LOCAL_BOOST               // = 3

// 3. Recency weight (used recently in this session?)
if token in session_history[-20:]:
    recency_position = session_history.rfind(token)
    recency_decay = max(0, 10 - (len(session_history) - recency_position))
    score += recency_decay             // 0–10, decays with age

// 4. Popularity prior (built-in baseline)
if token in popularity_baseline:
    score += popularity_baseline[token]   // 0–4

// 5. Semantic fit bonus (is this token "right" for the context?)
if semantic_fit(token, context_stack):
    score += SEMANTIC_BONUS            // = 1

return score
```

Baselines (popularity_baseline):
| Token | Baseline | Rationale |
|---|---|---|
| if | +4 | Most common control flow |
| let | +4 | Most common binding form |
| lambda | +3 | Common in callbacks |
| defn | +2 | Definition form (less frequent in arguments) |
| map | +3 | Common higher-order function |
| car | +2 | List navigation primitive |
| cdr | +2 | List navigation primitive |
| cons | +1 | Construction (less common) |
| quote | +1 | Quoting (special) |
| nil | +2 | Common literal |
| + | +2 | Arithmetic (common in mission code) |
| - | +1 | Arithmetic |
| > | +2 | Comparison (mission logic) |
| = | +2 | Equality (common) |
| Other builtins | +0 | No baseline |
| User identifiers | +0 | Ranked by recency/local only |
Constants:
- DOMAIN_BOOST = 5 (cartridge vocabulary dominates)
- LOCAL_BOOST = 3 (locally visible identifiers are proximate)
- SEMANTIC_BONUS = 1 (small tiebreaker)
- RECENCY_WINDOW = 20 (recent history)
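Putting the weights together, here is a runnable sketch of the scoring pass. The constants and baseline table mirror the ADR; `semantic_fit` is stubbed as a caller-supplied predicate, and storing session history as a Python list (with a reversed-index stand-in for `rfind`) is an assumption of this sketch:

```python
DOMAIN_BOOST, LOCAL_BOOST, SEMANTIC_BONUS, RECENCY_WINDOW = 5, 3, 1, 20

POPULARITY_BASELINE = {"if": 4, "let": 4, "lambda": 3, "defn": 2, "map": 3,
                       "car": 2, "cdr": 2, "cons": 1, "quote": 1, "nil": 2,
                       "+": 2, "-": 1, ">": 2, "=": 2}

def score_token(token, cartridge_vocab, local_bindings, session_history,
                semantic_fit=lambda t: False):
    score = 0
    if token in cartridge_vocab:                    # 1. domain vocabulary boost
        score += DOMAIN_BOOST
    if token in local_bindings:                     # 2. locally visible binding
        score += LOCAL_BOOST
    if token in session_history[-RECENCY_WINDOW:]:  # 3. recency, decays 10 -> 0
        last = len(session_history) - 1 - session_history[::-1].index(token)
        score += max(0, 10 - (len(session_history) - last))
    score += POPULARITY_BASELINE.get(token, 0)      # 4. popularity prior
    if semantic_fit(token):                         # 5. semantic fit tiebreaker
        score += SEMANTIC_BONUS
    return score
```

Note that although the window keeps the last 20 tokens, the decay term bottoms out at zero once a token is more than 10 tokens old.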
## Tiebreaker: Alphabetic Order

If two tokens have equal score, sort alphabetically (deterministic, readable).
## Example: ICE Breaker Network Penetration

Scenario: Player is writing an ICE Breaker scripted mission to filter nodes by threat level.

```lisp
(defn select-targets (network min-threat)
  (filter network
    (lambda (node) |))
```

The cursor (|) is in the body of the lambda passed to filter as its predicate.
Position analysis:
- Cursor position: function position (first element of the lambda body).
- Context stack: [lambda, filter, defn]
- Visible bindings: {network, min-threat, node}
- Cartridge vocabulary: {node, ice, threat-level, probe, extract, breach, …}
Candidates evaluated:
| Token | Legal | Domain | Local | Recency | Popularity | Semantic | Total | Rank |
|---|---|---|---|---|---|---|---|---|
| > | Yes | 0 | 0 | +3 (last used 7 tokens ago) | +2 | +1 | +6 | #2 (tie) |
| threat-level | Yes | +5 | 0 | 0 | 0 | +1 | +6 | #2 (tie) |
| node | Yes | +5 | +3 | +2 (last used 8 tokens ago) | 0 | 0 | +10 | #1 |
| ice | Yes | +5 | 0 | 0 | 0 | 0 | +5 | #5 (tie) |
| if | Yes | 0 | 0 | 0 | +4 | +1 | +5 | #5 (tie) |
| probe | Yes | +5 | 0 | 0 | 0 | +1 | +6 | #2 (tie) |
| min-threat | Yes | 0 | +3 | 0 | 0 | +1 | +4 | #7 |
| car | Yes | 0 | 0 | 0 | +2 | 0 | +2 | #8 (tie) |
| cdr | Yes | 0 | 0 | 0 | +2 | 0 | +2 | #8 (tie) |
| lambda | No | — | — | — | — | — | — | (skip) |
| defn | No | — | — | — | — | — | — | (skip) |
Top 8 (sorted by score desc, then alphabetically):

- node (10)
- `>` (6)
- probe (6) → alphabetically after `>` (in ASCII, `>` sorts before letters)
- threat-level (6) → alphabetically after probe
- ice (5)
- if (5)
- min-threat (4)
- car (2) → ties with cdr (2), which sorts after car and is cut from the top 8
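The tiebreak can be reproduced with a single sort key (score descending, then token ascending); the scores below are copied from the hand-ranked table above:

```python
# Scores from the worked example; ties at 6, 5, and 2 are broken alphabetically.
scores = {"node": 10, ">": 6, "probe": 6, "threat-level": 6,
          "ice": 5, "if": 5, "min-threat": 4, "car": 2, "cdr": 2}

# Sort by score descending, then token ascending; keep the best 8.
ranked = sorted(scores, key=lambda t: (-scores[t], t))[:8]
```

Since `>` precedes letters in ASCII it beats probe and threat-level within the score-6 tie, and cdr loses its tie with car and drops off the palette.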
Palette shown to player:

```text
[1]node  [2]>  [3]probe  [4]threat-level  [5]ice  [6]if  [7]min-threat  [8]car
```

Rationale for top tokens:
- node: Highest score. Domain vocabulary + local binding + recency.
- >: Comparison operator, commonly used to filter by threat level. Recency boost (used earlier in this session).
- probe: Domain vocabulary matches ICE Breaker’s “probe” operation.
- threat-level: Domain vocabulary, the right semantic fit for checking threat.
Player’s likely action: Presses 4 (threat-level). Inserts form (threat-level |). Cursor moves to argument position. Palette recomputes.
## Semantic Fit Bonus (Contextual Tuning)

Some tokens are more semantically appropriate in certain contexts, even if their baseline is low.
Examples:
| Context | Token | Bonus | Reason |
|---|---|---|---|
| Mission script, node comparison | = | +2 | "Checking node IDs" is a common mission pattern |
| ICE Breaker, filtering lists | filter | +3 | Domain-specific; common in node traversal |
| BLACK LEDGER, financial code | debit/credit | +3 | Domain vocabulary (accounting) |
| Encryption context | cipher-grade | +4 | Highly semantic for crypto operations |
Implementation: A context-specific boost table, consulted during ranking.
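A minimal sketch of such a boost table, assuming (cartridge, activity) tuples as context keys — the key scheme is an assumption of this sketch, and note that these entries carry their own magnitudes (as in the examples table) rather than the flat SEMANTIC_BONUS of the base formula:

```python
# Hypothetical context keys; entries mirror the examples table above.
CONTEXT_BOOSTS = {
    ("mission-script", "node-comparison"): {"=": 2},
    ("ice-breaker", "list-filtering"):     {"filter": 3},
    ("black-ledger", "financial"):         {"debit": 3, "credit": 3},
    ("encryption", None):                  {"cipher-grade": 4},
}

def semantic_bonus(token: str, context_key) -> int:
    """Contextual boost for token, or 0 if no entry applies."""
    return CONTEXT_BOOSTS.get(context_key, {}).get(token, 0)
```

A flat dict-of-dicts keeps the table data-driven: tuning a cartridge's boosts means editing one entry, not the ranking code.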
## Test Corpus: Hand-Coded Scenarios

These are representative snippets from actual launch-title missions. Token prediction is evaluated by eye.

### Test 1: Simple List Filter (Easy)

```lisp
(defn dangerous-nodes (network)
  (filter network
    (lambda (node) |))
```

Expected top tokens: Should include comparison operators (>, <, =) and domain terms (threat-level, ice).
Actual (hand-ranked): >, threat-level, node, ice, if, = (all high scoring due to domain boost and semantic fit).
### Test 2: Multi-Phase Extraction (Medium)

```lisp
(defn extract-payload (data-nodes)
  (let ((target (find-critical data-nodes)))
    (cond
      ((null? target) |)
```

Expected: Error-recovery tokens (nil, false) and data-access terms (car, cdr).
Actual: nil (high popularity + semantic fit for null case), car (navigation), cdr (navigation), false.
### Test 3: ICE Breaker Sysop Mode (Advanced)

```lisp
(defn deploy-defense (ice-routine threat-level)
  (map (authorized-nodes)
    (lambda (node)
      (if (> (current-threat node) threat-level)
          (activate-ice |)
```

Expected: Domain vocabulary (ice-routine, activate-ice, lockdown), locals (threat-level), builtins (cons for constructing ICE config).
Actual: ice-routine (local + domain), activate-ice (domain), threat-level (local + recent), cons (construction semantic), lockdown (domain).
### Test 4: BLACK LEDGER Financial Audit (Specialized)

```lisp
(defn trace-fund-flow (transaction)
  (let ((amount (tx-amount transaction))
        (party (tx-party transaction)))
    (cond
      ((> amount 50000) |)
```

Expected: Financial domain vocabulary (trace, audit, debit, account) and comparison operators.
Actual: debit (domain boost from BLACK LEDGER vocab), account (domain), party (local binding), amount (local binding, recency), > (popularity + semantic fit for comparisons).
## Algorithm Implementation Pseudocode

```text
function rank_tokens(cursor, buffer, session, cartridge, locals):
    all_tokens = union(
        BUILTINS,
        CARTRIDGE_VOCAB,
        USER_DEFINED_IN_BUFFER,
        SESSION_HISTORY
    )

    legal = filter(all_tokens, lambda t: is_legal(t, cursor.position))

    scored = []
    for token in legal:
        score = 0

        if token in cartridge.vocabulary:
            score += 5

        if token in locals:
            score += 3

        recency_pos = session.rfind(token)
        if recency_pos >= 0:
            age = len(session) - recency_pos
            score += max(0, 10 - age)

        if token in POPULARITY_BASELINE:
            score += POPULARITY_BASELINE[token]

        if semantic_fit(token, cursor.context_stack):
            score += 1

        scored.append((token, score))

    // Sort by score (desc), then alphabetically
    ranked = sorted(scored, key=lambda x: (-x[1], x[0]))

    return ranked[:8]
```

## Quality Evaluation
### Coverage: Do the right tokens appear in the top 8?

Tested against the four test-corpus snippets above. Hand-evaluated (not automated):
| Scenario | Expected tokens | Achieved | Quality |
|---|---|---|---|
| Test 1 (filter list) | > threat-level node ice = | All top 6 | ✓ Excellent |
| Test 2 (extraction) | nil car cdr false | All top 4 | ✓ Excellent |
| Test 3 (Sysop ICE) | ice-routine threat-level cons activate-ice | All 4 in top 5 | ✓ Good |
| Test 4 (BLACK LEDGER) | debit account > party amount | All top 5 | ✓ Excellent |
Overall: The v1 static model covers ~95% of practical use cases; the remaining 5% are edge cases (rare domain terms, niche operators).
### Latency: Can ranking complete in real time?

Assuming:
- ~300 unique tokens in session (builtins + user-defined + domain vocab)
- Filtering + scoring is O(n)
- Selecting the top 8 (partial sort) is O(n log 8) ≈ O(n)
Latency: ~1–2 ms per palette render on target hardware. Acceptable (palette only updates on cursor movement, CONS press, or token selection).
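A back-of-envelope check of that budget; the ~300-token pool and membership sets below are synthetic, and desktop Python is not the target hardware, so this bounds the amount of work per render rather than the actual latency:

```python
import heapq
import time

# Synthetic stand-ins for the session's token pool and boost sets.
tokens = [f"tok{i}" for i in range(300)]
vocab = set(tokens[::7])      # stand-in for cartridge vocabulary
locals_ = set(tokens[::13])   # stand-in for local bindings

def score(t):
    s = 0
    if t in vocab:
        s += 5
    if t in locals_:
        s += 3
    return s

start = time.perf_counter()
for _ in range(100):  # 100 simulated palette renders
    # Top 8 by score desc, token asc: partial selection, O(n log 8).
    top8 = heapq.nsmallest(8, tokens, key=lambda t: (-score(t), t))
per_render_ms = (time.perf_counter() - start) * 1000 / 100
```

Even interpreted Python stays well under the budget at this input size, which supports the claim that an O(n) scan over ~300 tokens is comfortably real-time on the target.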
## Failure Modes

- Builtin shadowing: If the player defines `let` as a variable name (unlikely, but possible), the local-binding boost might over-weight it. Mitigation: Disallow shadowing builtins.
- Context-insensitive boosts: Domain vocabulary is applied globally, even when not relevant. E.g., in a pure list-processing function, ICE Breaker terms shouldn't appear. Mitigation: Per-mission context (tracking which cartridge is active during a REPL session). v1 doesn't do this; acceptable.
- Stale recency: If the player writes `threat-level` once, then switches to a different operation, the recency weight might mislead. Mitigation: Decay recency aggressively (the decay reaches zero after 10 tokens, which is fairly short). v1 is acceptable.
## Post-Launch Improvements

### v1.1: Learned Model

Collect anonymized session telemetry: which tokens players select at each position. Feed it to a lightweight neural net (e.g., a GRU over position + context) to refine scores.
Pros: Captures player patterns, community preferences. Cons: Privacy concerns, model bloat.
Recommendation: Defer. v1 static model is sufficient for launch.
### v1.1: Per-Mission Context

Track which cartridge a script targets. Boost domain vocabulary of that cartridge only.
E.g., scripted mission in ICE Breaker context → ice-breaker vocab gets +5. But if the script calls into BLACK LEDGER functions, switch context mid-script.
Pros: Higher relevance, less noise. Cons: Adds complexity to context tracking.
Recommendation: Nice-to-have for v1.1.
## Summary

| Aspect | Specification |
|---|---|
| Approach | Static weighted ranking |
| Weights | Domain (5) + Local (3) + Recency (0–10) + Popularity (0–4) + Semantic (1) |
| Legal filter | Grammar rules per position type |
| Top candidates | Best 8 by score, alphabetically on tie |
| Latency | ~1–2 ms per palette render |
| Coverage | ~95% of practical use cases |
| Post-launch | Learned model, per-mission context tracking |
Token prediction v1 is a pragmatic, fast, grammar-aware model that requires no machine learning and ships on day 1.