
Token Prediction

The KN-86 nEmacs editor and REPL share a token-prediction engine that ranks the likely next tokens at the cursor and surfaces the top ~8 in the predictive palette. This page is the implementer-facing reference for the algorithm, ranking weights, performance budget, memory footprint, and how cartridges influence predictions. Its audience is runtime engineers building the ranker and cartridge authors writing :vocabulary blocks.


Algorithm: static, weighted, grammar-aware


No machine learning. Per ADR-0009, the v1 ranker is a static weighted model — fast, deterministic, debuggable. (A learned model is post-launch v1.1 territory; see “Post-launch” below.)

Despite the wording of the original stub, the model is neither n-gram nor trie-based. It is a per-call ranking pass:

  1. Enumerate all candidate tokens (builtins + cartridge vocabulary + user-defined identifiers + recent session history).
  2. Apply a legal-form filter (hard constraint based on cursor position).
  3. Score each surviving candidate.
  4. Sort by score (descending), break ties alphabetically.
  5. Return the top 8.

The full pseudocode is in ADR-0009 §“Algorithm Implementation Pseudocode.”
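The five steps can be sketched as a single pass. A minimal Python illustration follows; `legal` and `score` are hypothetical callbacks standing in for the legal-form filter and the weighted scorer, not the runtime's actual interfaces:

```python
def rank_candidates(candidates, position, legal, score, top_k=8):
    """One palette recompute: filter, score, sort, truncate.

    legal(token, position) is the hard legal-form filter (step 2);
    score(token) is the static weighted score (step 3).
    """
    survivors = [t for t in candidates if legal(t, position)]
    # Steps 4-5: score descending, ties broken alphabetically.
    survivors.sort(key=lambda t: (-score(t), t))
    return survivors[:top_k]

# Toy demo: only callables are legal at function position.
CALLABLES = {"if", "let", "map", "filter"}
toy_scores = {"if": 4, "let": 4, "map": 3, "filter": 3, "nil": 2}

palette = rank_candidates(
    sorted(toy_scores),
    "function",
    lambda t, pos: t in CALLABLES if pos == "function" else True,
    toy_scores.get,
)
# "if" and "let" tie at 4; the alphabetical tiebreak puts "if" first.
```

Because the sort key is `(-score, token)`, the output order is fully deterministic for a given candidate set, which is what makes the ranker debuggable.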


| Source | What's in it | Lifetime |
| --- | --- | --- |
| Fe builtins | The 26 primitives in primnames[] (see ../fe-lisp/builtins.md) | Permanent |
| NoshAPI primitives | The Tier 1 / Tier 2 names from ../nosh-api/primitives-by-category.md | Permanent |
| Cartridge vocabulary | Domain terms contributed via emacs-extend-vocabulary (ADR-0016) | Cart-load to cart-unload |
| Cartridge grammar productions | Productions added via emacs-extend-grammar (ADR-0016); these influence the legal-form filter, not the score directly | Cart-load to cart-unload |
| Buffer-local identifiers | Names defined in the current buffer via (let ...) or (= ...) | Buffer lifetime |
| Session history | The last ~20 tokens the player typed | Session (REPL or nEmacs instance) |

Cartridge vocabulary is the highest-leverage contribution — domain terms (e.g., node, ice, threat-level from ICE Breaker; debit, account from Black Ledger) get a +5 boost in scoring (see “Ranking weights” below).


The filter eliminates syntactically illegal tokens before scoring. Position types:

| Position | Legal | Illegal |
| --- | --- | --- |
| Function position (first element of list) | Callables: function names, macros, lambdas, builtins | Pure data (literals other than quoted forms), binding keywords |
| Argument position (non-first) | Identifiers, literals, function calls, quoted forms | Definition forms (defn-style macros) |
| Binding position (inside let / fn params) | Identifiers only | Functions, literals, forms |
| Root (empty buffer or top-level) | Top-level forms (defn, defstruct, defdomain, defmission) | Bare data, lambdas |

Cartridge-contributed grammar (emacs-extend-grammar per ADR-0016) extends the legal sets — e.g., a cart that introduces (scan-result) as a callable form adds it to the function-position legal list while that cart is loaded.
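A sketch of how cart-contributed productions could extend the per-position legal sets. The data layout here is illustrative only; the real production syntax is still being defined (see "Cartridge influence" below in this page's source ADRs):

```python
# Base legal sets per position type (tiny illustrative subset).
BASE_LEGAL = {
    "function": {"if", "let", "map", "filter", "lambda"},
    "argument": {"nil", "x", "node"},
}

def make_legal_filter(cart_grammar=None):
    """cart_grammar maps a position type to extra tokens made legal
    there while the cart is loaded (e.g., scan-result at function
    position)."""
    legal = {pos: set(toks) for pos, toks in BASE_LEGAL.items()}
    for pos, toks in (cart_grammar or {}).items():
        legal.setdefault(pos, set()).update(toks)
    return lambda token, pos: token in legal.get(pos, set())

base_filter = make_legal_filter()
cart_filter = make_legal_filter({"function": {"scan-result"}})
# scan-result is callable only while the cart's grammar is loaded.
```

Building a fresh filter per cart-load/unload keeps the base sets immutable, so unloading a cart cannot corrupt the permanent vocabulary.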


For each legal token:

score = 0
if token in cartridge_vocabulary: score += 5           # DOMAIN_BOOST
if token in local_bindings: score += 3                 # LOCAL_BOOST
if token in session_history[-20:]:
    age = len(session_history) - rfind_position(token)
    score += max(0, 10 - age)                          # recency boost, decays 10 -> 0
if token in popularity_baseline:
    score += popularity_baseline[token]                # popularity baseline, 0-4
if semantic_fit(token, context_stack): score += 1      # SEMANTIC_BONUS

Constants:

| Constant | Value | Rationale |
| --- | --- | --- |
| DOMAIN_BOOST | +5 | Cartridge vocabulary should dominate when relevant. |
| LOCAL_BOOST | +3 | Locally visible bindings are proximate. |
| RECENCY_WINDOW | 20 tokens | Session window for the recency boost. |
| RECENCY_MAX | +10 (decays linearly) | Just-typed tokens get the full boost; the oldest in the window get +0. |
| SEMANTIC_BONUS | +1 | Small tiebreaker for context fit (e.g., = in mission node-comparison contexts). |
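A runnable version of the formula using these constants. The rfind_position helper from the pseudocode is replaced here by an explicit last-occurrence scan, and all names are illustrative rather than the runtime's:

```python
DOMAIN_BOOST = 5
LOCAL_BOOST = 3
RECENCY_WINDOW = 20
RECENCY_MAX = 10
SEMANTIC_BONUS = 1

def score_token(token, cart_vocab, local_bindings, history, baseline,
                fits_context=lambda t: False):
    s = 0
    if token in cart_vocab:
        s += DOMAIN_BOOST
    if token in local_bindings:
        s += LOCAL_BOOST
    window = history[-RECENCY_WINDOW:]
    if token in window:
        # Age 0 = just typed (full +10); decays to +0 at age 10+.
        age = len(window) - 1 - max(i for i, t in enumerate(window) if t == token)
        s += max(0, RECENCY_MAX - age)
    s += baseline.get(token, 0)              # popularity baseline, 0-4
    if fits_context(token):
        s += SEMANTIC_BONUS
    return s

# A cart term the player just typed: +5 domain boost, +10 full recency.
history = ["cons", "car", "node"]
```

One subtlety worth pinning down in implementation: whether a just-typed token has age 0 (full +10, as assumed here) or age 1; the pseudocode's len - rfind_position form yields the latter.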

Popularity baseline (excerpt — full table in ADR-0009 §“Ranking Formula”):

| Token | Baseline | Why |
| --- | --- | --- |
| if, let | +4 | Most common control flow / binding |
| lambda, map | +3 | Common in callbacks / higher-order code |
| defn, nil, +, >, = | +2 | Common in mission code |
| cdr, car | +2 | List navigation |
| cons, quote | +1 | Less frequent |
| User identifiers | +0 | Ranked by recency / local boost only |

Tiebreaker: alphabetical order (deterministic, readable).

The semantic-fit bonus is a small per-context boost consulted during ranking. Examples from ADR-0009 §“Semantic Fit Bonus”:

| Context | Token | Bonus | Reason |
| --- | --- | --- | --- |
| Mission script, node comparison | = | +2 | Mission pattern |
| ICE Breaker, list filtering | filter | +3 | Domain idiom |
| Black Ledger, financial code | debit / credit | +3 | Domain vocabulary |
| Crypto context | cipher-grade | +4 | Highly semantic |

The implementation is a context-keyed boost table; specific bonuses will be finalized as carts are authored.
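One plausible shape for that table: keyed by (context, token) and consulted innermost-context-first. The context names and values mirror the examples above but are otherwise illustrative:

```python
# Hypothetical context-keyed boost table; entries mirror the
# ADR-0009 examples and will grow as carts are authored.
SEMANTIC_FIT = {
    ("mission:node-comparison", "="): 2,
    ("ice-breaker:list-filtering", "filter"): 3,
    ("black-ledger:financial", "debit"): 3,
    ("black-ledger:financial", "credit"): 3,
    ("crypto", "cipher-grade"): 4,
}

def semantic_bonus(token, context_stack):
    """Walk the context stack innermost-first; first hit wins."""
    for ctx in reversed(context_stack):
        bonus = SEMANTIC_FIT.get((ctx, token))
        if bonus is not None:
            return bonus
    return 0
```

Innermost-first lookup means a cart-specific idiom can override a generic mission-level bonus without touching the outer entries.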


Per ADR-0016 §4, the predictive palette renders on CIPHER-LINE Rows 2–3 while nEmacs or the REPL is active:

  • Row 2 (Palette Row A): 8 candidates numbered [1]..[8].
  • Row 3 (Palette Row B): continuation indicator (shown when more candidates are available), selection feedback, or the alt slots from a cart’s multi-level binding.

The main grid stays free for the buffer view (recovered from ADR-0008’s earlier rows 22–25 allocation).

  • Accept: numpad keys 1–8 insert the selected token. The cursor advances to the inserted node and the palette recomputes.
  • Scroll: beyond the top 8, the player can scroll the candidate set via context-bound keys (per ADR-0008 mock 5). Implementation lands during the editor foundation task.
  • Reject / dismiss: BACK closes the palette without inserting. LAMBDA enters literal-entry mode for an identifier or number not in the palette.

Per ADR-0016 §9, palette keys can carry :tap / :double-tap / :long-press bindings. When a slot has alt bindings, Row 24 (firmware action bar) renders them with superscript markers (² for double-tap, plus a corresponding marker for long-press). The palette adopts the same convention when context provides multi-level alternates.


Per ADR-0009 §“Latency”:

  • ~1–2 ms per palette render on the original RP2350 / Pico 2 target. The Pi Zero 2 W (1 GHz Cortex-A53; see the ADR-0009 hardware-retarget note) meets this budget easily; the model is target-independent.
  • The palette only updates on cursor movement, CONS press, or token selection (event-driven, not per-frame).
  • Filtering + scoring is O(n) where n is the candidate count (~300 unique tokens in a typical session: builtins + user + domain).
  • Selecting the top 8 is O(n log 8) ≈ O(n).

The static model satisfies the latency budget with comfortable margin. There is no caching layer needed.
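The O(n log 8) bound corresponds to a bounded-heap selection. A sketch in Python (illustrative; a runtime may simply sort, which is also well within budget at n ≈ 300):

```python
import heapq

def top8(scored):
    """scored: mapping of token -> score. O(n log 8): the
    (-score, token) key gives descending score with
    alphabetical tiebreak, matching the ranker's order."""
    return [t for _, t in heapq.nsmallest(8, ((-s, t) for t, s in scored.items()))]

scores = {"node": 5, "ice": 5, "if": 4, "let": 4, "map": 3,
          "nil": 2, "cdr": 2, "car": 2, "cons": 1, "quote": 1}
```

Because the palette only recomputes on cursor movement, CONS press, or token selection, even a naive full sort would amortize to negligible cost per input event.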


The ranker holds:

  • Vocabulary tables. Builtins + NoshAPI: fixed strings, ~2 KB total. Cartridge contributions: variable, typically 50–200 terms × ~16 bytes ≈ 0.8–3.2 KB per cart.
  • Session history: 20 tokens × ~16 bytes = ~320 bytes.
  • Local bindings: scoped per buffer; rarely exceeds 50 names = ~800 bytes.
  • Per-call working set: scoring array for ~300 candidates ≈ ~1.2 KB (token pointer + float score), released after ranking returns.
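The session history's fixed footprint follows from a bounded ring buffer; a minimal sketch, assuming Python semantics for illustration:

```python
from collections import deque

RECENCY_WINDOW = 20

# maxlen bounds the structure: the oldest token is evicted on
# append, keeping the history at a constant 20 slots
# (~320 bytes of token references in the target's terms).
session_history = deque(maxlen=RECENCY_WINDOW)

for tok in ["let", "x", "map"] + ["nil"] * 25:
    session_history.append(tok)
# Only the last 20 tokens remain; earlier ones fell off.
```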

Per ADR-0016 §“Known Unknowns” #5, the cart grammar arena is shared with the Cipher grammar arena (8 KB) or allocated separately — confirm during the foundation task. Probably separate for isolation.


Two FFI primitives (per ADR-0016 §7, see ../nosh-api/primitives-by-category.md):

emacs-extend-vocabulary: the cart contributes domain terms. Each term receives the +5 vocabulary boost in ranking. Typical use:

(emacs-extend-vocabulary
  (list "node" "ice" "threat-level" "probe" "extract" "breach"
        "lockdown" "evasion" "sysop" "cipher-grade" "exfil"))

Terms are case-sensitive, stored as plain Lisp strings, and matched against the candidate token’s source-form representation. They live in the cart’s grammar arena from cart-load to cart-unload.
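Runtime-side, that contract could be stored as a per-cart term set with exact, case-sensitive matching. The class and method names below are illustrative, not the runtime's API:

```python
class CartVocabulary:
    """Cart-scoped vocabulary: terms live from cart-load to
    cart-unload, matching the lifetime table above."""

    def __init__(self):
        self._terms = {}  # cart id -> frozenset of terms

    def load(self, cart_id, terms):
        self._terms[cart_id] = frozenset(terms)

    def unload(self, cart_id):
        self._terms.pop(cart_id, None)

    def __contains__(self, token):
        # Exact, case-sensitive match: "Node" does not match "node".
        return any(token in terms for terms in self._terms.values())

vocab = CartVocabulary()
vocab.load("ice-breaker", ["node", "ice", "threat-level"])
```

Keying by cart id makes unload an O(1) bulk drop, so a cart swap can never leave stale domain terms boosting the palette.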

emacs-extend-grammar: the cart contributes grammar productions that influence the legal-form filter, not the score. A cart introducing (scan-result) as a callable form makes it appear at function position in the palette, where it would not otherwise be legal:

(emacs-extend-grammar
  '(:legal-at-function-position scan-result)
  '(:legal-at-argument-position node))

ADR-0016 §7 commits to the FFI primitive but defers the production syntax to implementation; the exact syntax of grammar fragments lands during the editor foundation task. Treat the production-syntax surface as evolving until then; the FFI signature is stable.

The semantic-fit bonus (see “Ranking weights” above) is a per-context boost table. As carts ship, the table will grow with their domain idioms. This is implementation-side: cartridge authors don’t write to the table directly, but their domain vocabulary is what the runtime authors put there.


Per ADR-0009 §“Quality Evaluation,” the static model covers ~95% of practical use cases as measured against four hand-coded test scenarios from launch titles (ICE Breaker filter, multi-phase extraction, Sysop ICE deployment, Black Ledger financial audit). The remaining ~5% are edge cases — rare domain terms, niche operators that didn’t make the popularity baseline.

  1. Builtin shadowing. A user defining let as a variable name would over-weight it via the local-binding boost on top of its popularity baseline. Mitigation: disallow shadowing builtins in the parser.
  2. Context-insensitive boosts. Domain vocabulary applies globally even when not relevant. Mitigation: v1 doesn’t track per-cartridge context in the REPL; acceptable for launch. (v1.1 would track active cartridge per session.)
  3. Stale recency. A token used once then abandoned still gets recency boost. Mitigation: the 20-token decay window is short; v1 is acceptable.
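The first mitigation is a parse-time guard; a sketch, where BUILTINS stands in for the real primnames[] table:

```python
# Illustrative subset of the builtin names; the real set is the
# 26 primitives in primnames[].
BUILTINS = {"let", "if", "fn", "quote", "cons", "car", "cdr", "="}

def check_binding_name(name):
    """Reject binding names that would shadow a builtin, so shadowed
    builtins never enter the buffer-local candidate source."""
    if name in BUILTINS:
        raise SyntaxError(f"cannot shadow builtin '{name}'")
    return name
```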

Post-launch (v1.1), a learned model could collect anonymized session telemetry (which tokens players select at each position type) and feed it to a lightweight neural net (e.g., a GRU over position + context). Privacy concerns and model bloat defer this; the v1 static model ships at launch.

Per-cartridge context tracking would note which cartridge a script targets and boost only that cartridge’s domain vocabulary: higher relevance, less cross-domain noise, modest complexity. A nice-to-have for v1.1.