Token Prediction
The KN-86 nEmacs editor and REPL share a token-prediction engine that ranks the most likely next ~8 tokens at the cursor and surfaces them in the predictive palette. This page is the implementer-facing reference for the algorithm, ranking weights, performance budget, memory footprint, and how cartridges influence predictions. Runtime engineers building the ranker and cartridge authors writing :vocabulary blocks read this.
../../../adr/ADR-0009-token-prediction.md— canonical ranking model (static, weighted, grammar-aware). Extended 2026-04-24 by ADR-0016 with cart-contributed grammar + vocabulary FFI primitives.../../../adr/ADR-0016-nemacs-repl-input-model.md§7 —emacs-extend-grammarandemacs-extend-vocabularycart-contribution primitives.nemacs.md(palette display,:nemacs-navmode),repl.md(prompt completion),README.md.
Algorithm: static, weighted, grammar-aware
Section titled “Algorithm: static, weighted, grammar-aware”No machine learning. Per ADR-0009, the v1 ranker is a static weighted model — fast, deterministic, debuggable. (A learned model is post-launch v1.1 territory; see “Post-launch” below.)
The model is not n-gram or trie-based despite the original stub language. It is a per-call ranking pass:
- Enumerate all candidate tokens (builtins + cartridge vocabulary + user-defined identifiers + recent session history).
- Apply a legal-form filter (hard constraint based on cursor position).
- Score each surviving candidate.
- Sort by score (descending), break ties alphabetically.
- Return the top 8.
The full pseudocode is in ADR-0009 §“Algorithm Implementation Pseudocode.”
Source corpus (where tokens come from)
Section titled “Source corpus (where tokens come from)”| Source | What’s in it | Lifetime |
|---|---|---|
| Fe builtins | The 26 primitives in primnames[] (see KEC Lisp built-ins) | Permanent. |
| NoshAPI primitives | The Tier 1 / Tier 2 names from ../nosh-api/primitives-by-category.md | Permanent. |
| Cartridge vocabulary | Domain terms contributed via emacs-extend-vocabulary (ADR-0016) | Cart-load to cart-unload. |
| Cartridge grammar productions | Productions added via emacs-extend-grammar (ADR-0016) — these influence the legal-form filter, not the score directly | Cart-load to cart-unload. |
| Buffer-local identifiers | Names defined in the current buffer via (let ...) or (define ...) (assignment is (set ...); = is equality) | Buffer lifetime. |
| Session history | The last ~20 tokens the player typed | Session (REPL or nEmacs instance). |
Cartridge vocabulary is the highest-leverage contribution — domain terms (e.g., node, ice, threat-level from ICE Breaker; debit, account from Black Ledger) get a +5 boost in scoring (see “Ranking weights” below).
Legal-form filter (hard constraint)
Section titled “Legal-form filter (hard constraint)”The filter eliminates syntactically-illegal tokens before scoring. Position types:
| Position | Legal | Illegal |
|---|---|---|
| Function position (first element of list) | Callables: function names, macros, lambdas, builtins | Pure data (literals other than quoted forms), binding keywords |
| Argument position (non-first) | Identifiers, literals, function calls, quoted forms | Definition forms (defn-style macros) |
Binding position (inside let / fn params) | Identifiers only | Functions, literals, forms |
| Root (empty buffer or top-level) | Top-level forms (defn, defstruct, defdomain, defmission) | Bare data, lambdas |
Cartridge-contributed grammar (emacs-extend-grammar per ADR-0016) extends the legal sets — e.g., a cart that introduces (scan-result) as a callable form adds it to the function-position legal list while that cart is loaded.
Ranking weights
Section titled “Ranking weights”For each legal token:
score = 0if token in cartridge_vocabulary: score += 5 ; DOMAIN_BOOSTif token in local_bindings: score += 3 ; LOCAL_BOOSTif token in session_history[-20:]: age = len(session_history) - rfind_position(token) score += max(0, 10 - age) ; recency, decays 0–10if token in popularity_baseline: score += baseline[token] ; 0–4if semantic_fit(token, context_stack): score += 1 ; SEMANTIC_BONUSConstants:
| Constant | Value | Rationale |
|---|---|---|
DOMAIN_BOOST | +5 | Cartridge vocabulary should dominate when relevant. |
LOCAL_BOOST | +3 | Locally visible bindings are proximate. |
RECENCY_WINDOW | 20 tokens | Session window for recency boost. |
RECENCY_MAX | +10 (decays linearly) | Just-typed tokens get full boost; oldest in window get +0. |
SEMANTIC_BONUS | +1 | Small tiebreaker for context-fit (e.g., = in mission node-comparison contexts). |
Popularity baseline (excerpt — full table in ADR-0009 §“Ranking Formula”):
| Token | Baseline | Why |
|---|---|---|
if, let | +4 | Most common control flow / binding |
lambda, map | +3 | Common in callbacks / higher-order |
defn, nil, +, >, = | +2 | Common in mission code |
cdr, car | +2 | List navigation |
cons, quote, − | +1 | Less frequent |
| User identifiers | +0 | Ranked by recency / local boost only |
Tiebreaker: alphabetical order (deterministic, readable).
Semantic fit (contextual)
Section titled “Semantic fit (contextual)”A small per-context bonus consulted during ranking. Examples from ADR-0009 §“Semantic Fit Bonus”:
| Context | Token | Bonus | Reason |
|---|---|---|---|
| Mission script, node comparison | = | +2 | Mission pattern |
| ICE Breaker, list filtering | filter | +3 | Domain idiom |
| Black Ledger, financial code | debit / credit | +3 | Domain vocabulary |
| Crypto context | cipher-grade | +4 | Highly semantic |
Implementation is a context-specific boost table consulted during ranking. Specific bonuses finalize as carts are authored.
UI surface
Section titled “UI surface”Where suggestions appear
Section titled “Where suggestions appear”Per ADR-0016 §4, the predictive palette renders on CIPHER-LINE Rows 2–3 while nEmacs or the REPL is active:
- Row 2 (Palette Row A): 8 candidates numbered
[1]..[8]. - Row 3 (Palette Row B): continuation indicator (
↓if more candidates available), selection feedback, or the alt slots from a cart’s multi-level binding.
The main grid stays free for the buffer view (recovered from ADR-0008’s earlier rows-22–25 allocation).
Accept / reject
Section titled “Accept / reject”- Accept: numpad keys
1–8insert the selected token. Cursor advances to the inserted node. Palette recomputes. - Scroll: beyond the top 8, the player can scroll the candidate set via context-bound keys (per ADR-0008 mock 5). Implementation lands during the editor foundation task.
- Reject / dismiss:
BACKcloses the palette without inserting.LAMBDAenters literal-entry mode for an identifier or number not in the palette.
Multi-level bindings
Section titled “Multi-level bindings”Per ADR-0016 §9, palette keys can carry :tap / :double-tap / :long-press bindings. When a slot has alt bindings, Row 24 (firmware action bar) renders them with superscript markers (² for double-tap, … for long-press). The palette adopts the same convention when context provides multi-level alternates.
Performance budget
Section titled “Performance budget”Per ADR-0009 §“Latency”:
- ~1–2 ms per palette render on the original RP2350 / Pico 2 target. The Pi Zero 2 W (1 GHz Cortex-A53, ADR-0009 hardware-retarget note) trivially exceeds this — the model is target-independent.
- The palette only updates on cursor movement,
CONSpress, or token selection (event-driven, not per-frame). - Filtering + scoring is O(n) where n is the candidate count (~300 unique tokens in a typical session: builtins + user + domain).
- Sorting top 8 is O(n log 8) ≈ O(n).
The static model satisfies the latency budget with comfortable margin. There is no caching layer needed.
Memory footprint
Section titled “Memory footprint”The ranker holds:
- Vocabulary tables. Builtins + NoshAPI: fixed strings, ~2 KB total. Cartridge contributions: variable, typically 50–200 terms × ~16 bytes ≈ 0.8–3.2 KB per cart.
- Session history: 20 tokens × ~16 bytes = ~320 bytes.
- Local bindings: scoped per buffer; rarely exceeds 50 names = ~800 bytes.
- Per-call working set: scoring array for ~300 candidates ≈ ~1.2 KB (token pointer + float score), released after ranking returns.
Per ADR-0016 §“Known Unknowns” #5, the cart grammar arena is shared with the Cipher grammar arena (8 KB) or allocated separately — confirm during the foundation task. Probably separate for isolation.
How cartridge code influences predictions
Section titled “How cartridge code influences predictions”Two FFI primitives (per ADR-0016 §7, see ../nosh-api/primitives-by-category.md):
emacs-extend-vocabulary list-of-strings
Section titled “emacs-extend-vocabulary list-of-strings”Cart contributes domain terms. Each term receives the +5 vocabulary boost in ranking. Typical use:
(emacs-extend-vocabulary (list "node" "ice" "threat-level" "probe" "extract" "breach" "lockdown" "evasion" "sysop" "cipher-grade" "exfil"))Terms are case-sensitive, stored as plain Lisp strings, and matched against the candidate token’s source-form representation. They live in the cart’s grammar arena from cart-load to cart-unload.
emacs-extend-grammar sexp
Section titled “emacs-extend-grammar sexp”Cart contributes grammar productions that influence the legal-form filter, not the score. A cart introducing (scan-result) as a callable form makes it appear at function position in the palette where it wouldn’t otherwise be legal:
(emacs-extend-grammar '(:legal-at-function-position scan-result) '(:legal-at-argument-position node))The exact syntax of grammar fragments lands during the editor foundation task; ADR-0016 §7 commits to the FFI primitive but defers the production syntax to implementation. Treat the production-syntax surface as evolving until the foundation task lands; the FFI signature is stable.
Indirect influence: domain idioms
Section titled “Indirect influence: domain idioms”The semantic-fit bonus (see “Ranking weights” above) is a per-context boost table. As carts ship, the table will grow with their domain idioms. This is implementation-side: cartridge authors don’t write to the table directly, but their domain vocabulary is what the runtime authors put there.
Quality and coverage
Section titled “Quality and coverage”Per ADR-0009 §“Quality Evaluation,” the static model covers ~95% of practical use cases as measured against four hand-coded test scenarios from launch titles (ICE Breaker filter, multi-phase extraction, Sysop ICE deployment, Black Ledger financial audit). The remaining ~5% are edge cases — rare domain terms, niche operators that didn’t make the popularity baseline.
Failure modes (and mitigations)
Section titled “Failure modes (and mitigations)”- Builtin shadowing. A user defining
letas a variable name would over-weight it via domain boost. Mitigation: disallow shadowing builtins in the parser. - Context-insensitive boosts. Domain vocabulary applies globally even when not relevant. Mitigation: v1 doesn’t track per-cartridge context in the REPL; acceptable for launch. (v1.1 would track active cartridge per session.)
- Stale recency. A token used once then abandoned still gets recency boost. Mitigation: the 20-token decay window is short; v1 is acceptable.
Debugging (GWP-265)
Section titled “Debugging (GWP-265)”When the T9 palette produces unexpected results — a token that should rank top doesn’t, or one with a high domain-boost is missing — the F11 dev overlay T9 PALETTE section is the primary inspection surface.
Opening the panel
Section titled “Opening the panel”Press F11 in the desktop emulator to toggle the dev overlay. The T9 PALETTE (top 8: token v/l/r/p) section shows the current ranked candidates and per-component score breakdown:
T9 PALETTE (top 8: token v/l/r/p) threat-level s=5 v=5 l=0 r=0 p=0 node s=3 l=3 v=0 r=0 p=0 operator s=10 v=0 l=0 r=10 p=0 car s=2 v=0 l=0 r=0 p=2 ...Column meanings
Section titled “Column meanings”| Column | Field | Meaning |
|---|---|---|
s= | score | Total score (sum of components, clamped to int8). |
v= | vocab | +T9_DOMAIN_BOOST (+5) if this token appears in the cart vocabulary list. Zero otherwise. |
l= | local | +T9_LOCAL_BOOST (+3) if this token appears in the local-binding scope at the cursor. Zero otherwise. |
r= | recency | 0..T9_RECENCY_MAX_BOOST (+0..+10) based on how recently the operator typed this token. Youngest gets +10; decays linearly; tokens outside the 20-slot window get 0. |
p= | popularity | Static baseline from the builtin popularity table (see “Ranking weights” above). User-defined identifiers and cart-only tokens have baseline 0. |
The underlying entry point is t9_rank_explained() in kn86-emulator/src/t9_rank.h. It is a pure read — it never mutates the recency ring, popularity counters, or any other ranker state. Calling it twice in a row yields identical output.
Reading the panel
Section titled “Reading the panel”- All
v=0rows despite a loaded cart? The cart’semacs-extend-vocabularycall may have failed (arena overflow) or the vocabulary list doesn’t match the token surface forms in the buffer. Check the VOCAB section forcount=0. r=0across the board? The session recency ring is empty — the operator hasn’t committed any tokens yet. The RECENCY section shows the ring contents.- Token absent from the panel entirely? It was either filtered out by the legal-form rule (check the CURSOR section’s
position=label) or the candidate pool exceeded 256 entries and it lost the buffer-cap coin-toss. The latter is rare in practice.
Post-launch
Section titled “Post-launch”v1.1: learned model
Section titled “v1.1: learned model”Collect anonymized session telemetry (which tokens players select at each position type), feed to a lightweight neural net (e.g., GRU over position + context). Privacy concerns + model bloat → deferred. v1 static model ships at launch.
v1.1: per-mission context tracking
Section titled “v1.1: per-mission context tracking”Track which cartridge a script targets; boost domain vocabulary of that cartridge only. Higher relevance, less cross-domain noise. Modest complexity. Nice-to-have for v1.1.