Token Prediction
The KN-86 nEmacs editor and REPL share a token-prediction engine that ranks candidate next tokens at the cursor and surfaces the top ~8 in the predictive palette. This page is the implementer-facing reference for the algorithm, ranking weights, performance budget, memory footprint, and how cartridges influence predictions. It is written for runtime engineers building the ranker and for cartridge authors writing :vocabulary blocks.
- ../../../adr/ADR-0009-token-prediction.md — canonical ranking model (static, weighted, grammar-aware). Extended 2026-04-24 by ADR-0016 with cart-contributed grammar + vocabulary FFI primitives.
- ../../../adr/ADR-0016-nemacs-repl-input-model.md §7 — emacs-extend-grammar and emacs-extend-vocabulary cart-contribution primitives.
- Related: nemacs.md (palette display, :nemacs-nav mode), repl.md (prompt completion), README.md.
Algorithm: static, weighted, grammar-aware
No machine learning. Per ADR-0009, the v1 ranker is a static weighted model — fast, deterministic, debuggable. (A learned model is post-launch v1.1 territory; see “Post-launch” below.)
The model is not n-gram or trie-based despite the original stub language. It is a per-call ranking pass:
- Enumerate all candidate tokens (builtins + cartridge vocabulary + user-defined identifiers + recent session history).
- Apply a legal-form filter (hard constraint based on cursor position).
- Score each surviving candidate.
- Sort by score (descending), break ties alphabetically.
- Return the top 8.
The full pseudocode is in ADR-0009 §“Algorithm Implementation Pseudocode.”
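The five-step pass can be sketched in a few lines of Python (a minimal illustration under assumed names: rank_tokens, is_legal, and score are hypothetical, not the engine’s API):

```python
# Hypothetical sketch of the per-call ranking pass; not the real engine API.
def rank_tokens(candidates, position, is_legal, score, top_n=8):
    """Filter by legality, score, sort (score desc, ties alphabetical), truncate."""
    legal = [tok for tok in candidates if is_legal(tok, position)]  # hard filter
    ranked = sorted(legal, key=lambda tok: (-score(tok), tok))      # deterministic order
    return ranked[:top_n]                                           # top 8 by default

# Toy usage: equal scores, so the alphabetical tiebreak decides the order.
picks = rank_tokens(
    ["let", "if", "map", "42"],
    position="function",
    is_legal=lambda tok, pos: not tok.isdigit(),  # pure data illegal at function position
    score=lambda tok: 0,
)
# picks == ["if", "let", "map"]
```

Sorting the whole legal set, as here, is the simplest correct form; a real implementation may use a top-k selection instead (see “Performance budget”).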
Source corpus (where tokens come from)
| Source | What’s in it | Lifetime |
|---|---|---|
| Fe builtins | The 26 primitives in primnames[] (see ../fe-lisp/builtins.md) | Permanent. |
| NoshAPI primitives | The Tier 1 / Tier 2 names from ../nosh-api/primitives-by-category.md | Permanent. |
| Cartridge vocabulary | Domain terms contributed via emacs-extend-vocabulary (ADR-0016) | Cart-load to cart-unload. |
| Cartridge grammar productions | Productions added via emacs-extend-grammar (ADR-0016) — these influence the legal-form filter, not the score directly | Cart-load to cart-unload. |
| Buffer-local identifiers | Names defined in the current buffer via (let ...) or (= ...) | Buffer lifetime. |
| Session history | The last ~20 tokens the player typed | Session (REPL or nEmacs instance). |
Cartridge vocabulary is the highest-leverage contribution — domain terms (e.g., node, ice, threat-level from ICE Breaker; debit, account from Black Ledger) get a +5 boost in scoring (see “Ranking weights” below).
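Enumeration itself is just a union over these sources (a sketch; the parameter names are illustrative, and duplicates collapse before scoring):

```python
# Sketch: assembling the candidate corpus from the sources in the table above.
def enumerate_candidates(builtins, nosh_api, cart_vocab, buffer_ids, history):
    # Union of all sources; the scoring pass (not shown) decides rank.
    return set(builtins) | set(nosh_api) | set(cart_vocab) | set(buffer_ids) | set(history)

corpus = enumerate_candidates(
    builtins=["if", "let", "cons"],
    nosh_api=["net-scan"],            # hypothetical NoshAPI name, for illustration
    cart_vocab=["node", "ice"],       # will earn the +5 boost at scoring time
    buffer_ids=["my-threshold"],
    history=["let", "node"],          # overlaps with other sources collapse
)
# corpus holds 7 unique tokens
```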
Legal-form filter (hard constraint)
The filter eliminates syntactically illegal tokens before scoring. Position types:
| Position | Legal | Illegal |
|---|---|---|
| Function position (first element of list) | Callables: function names, macros, lambdas, builtins | Pure data (literals other than quoted forms), binding keywords |
| Argument position (non-first) | Identifiers, literals, function calls, quoted forms | Definition forms (defn-style macros) |
| Binding position (inside let / fn params) | Identifiers only | Functions, literals, forms |
| Root (empty buffer or top-level) | Top-level forms (defn, defstruct, defdomain, defmission) | Bare data, lambdas |
Cartridge-contributed grammar (emacs-extend-grammar per ADR-0016) extends the legal sets — e.g., a cart that introduces (scan-result) as a callable form adds it to the function-position legal list while that cart is loaded.
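One plausible shape for the filter, with cart grammar widening a legal set at cart-load (a sketch: the production syntax is deferred to implementation, so the tuple shape here is invented):

```python
# Sketch: per-position legal sets plus cart-contributed extensions.
BASE_LEGAL = {
    "function": {"callable", "macro", "lambda", "builtin"},
    "argument": {"identifier", "literal", "call", "quoted"},
    "binding":  {"identifier"},
    "root":     {"defn", "defstruct", "defdomain", "defmission"},
}

def make_filter(cart_grammar=()):
    """Build is_legal(form, position); cart productions live until cart-unload."""
    legal = {pos: set(forms) for pos, forms in BASE_LEGAL.items()}
    for pos, form in cart_grammar:            # e.g., ("function", "scan-result")
        legal[pos].add(form)
    return lambda form, pos: form in legal[pos]

is_legal = make_filter(cart_grammar=[("function", "scan-result")])
# scan-result is now legal at function position; pure data still is not.
```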
Ranking weights
For each legal token:

```
score = 0
if token in cartridge_vocabulary:       score += 5    ; DOMAIN_BOOST
if token in local_bindings:             score += 3    ; LOCAL_BOOST
if token in session_history[-20:]:
    age = len(session_history) - rfind_position(token)
    score += max(0, 10 - age)                         ; recency, decays 0–10
if token in popularity_baseline:        score += baseline[token]  ; 0–4
if semantic_fit(token, context_stack):  score += 1    ; SEMANTIC_BONUS
```

Constants:
| Constant | Value | Rationale |
|---|---|---|
| DOMAIN_BOOST | +5 | Cartridge vocabulary should dominate when relevant. |
| LOCAL_BOOST | +3 | Locally visible bindings are proximate. |
| RECENCY_WINDOW | 20 tokens | Session window for recency boost. |
| RECENCY_MAX | +10 (decays linearly) | Just-typed tokens get full boost; the oldest in the window get +0. |
| SEMANTIC_BONUS | +1 | Small tiebreaker for context fit (e.g., = in mission node-comparison contexts). |
Popularity baseline (excerpt — full table in ADR-0009 §“Ranking Formula”):
| Token | Baseline | Why |
|---|---|---|
| if, let | +4 | Most common control flow / binding |
| lambda, map | +3 | Common in callbacks / higher-order |
| defn, nil, +, >, = | +2 | Common in mission code |
| cdr, car | +2 | List navigation |
| cons, quote, - | +1 | Less frequent |
| User identifiers | +0 | Ranked by recency / local boost only |
Tiebreaker: alphabetical order (deterministic, readable).
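Combining the weights, a hedged sketch of the per-token scoring step (helper names and data shapes are assumptions, not the engine’s real interfaces):

```python
# Sketch of the scoring formula using the constants above.
DOMAIN_BOOST, LOCAL_BOOST = 5, 3
RECENCY_WINDOW, RECENCY_MAX, SEMANTIC_BONUS = 20, 10, 1

def score(token, cart_vocab, local_bindings, history, baseline, semantic_fit):
    s = 0
    if token in cart_vocab:
        s += DOMAIN_BOOST
    if token in local_bindings:
        s += LOCAL_BOOST
    window = history[-RECENCY_WINDOW:]
    if token in window:
        # age 1 = just typed (boost +9), decaying to +0 at the window edge
        age = len(window) - max(i for i, t in enumerate(window) if t == token)
        s += max(0, RECENCY_MAX - age)
    s += baseline.get(token, 0)          # popularity baseline, 0-4
    if semantic_fit(token):
        s += SEMANTIC_BONUS
    return s

# "node" is cart vocabulary and was just typed: 5 (domain) + 9 (recency) = 14
val = score("node", cart_vocab={"node"}, local_bindings=set(),
            history=["let", "node"], baseline={"if": 4},
            semantic_fit=lambda t: False)
```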
Semantic fit (contextual)
A small per-context bonus consulted during ranking. Examples from ADR-0009 §“Semantic Fit Bonus”:
| Context | Token | Bonus | Reason |
|---|---|---|---|
| Mission script, node comparison | = | +2 | Mission pattern |
| ICE Breaker, list filtering | filter | +3 | Domain idiom |
| Black Ledger, financial code | debit / credit | +3 | Domain vocabulary |
| Crypto context | cipher-grade | +4 | Highly semantic |
Implementation is a context-specific boost table consulted during ranking. Specific bonuses finalize as carts are authored.
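The boost table could be as small as a nested lookup keyed by context (a sketch; the context keys below are invented labels for the examples in the table):

```python
# Sketch: semantic-fit as a per-context boost table, grown as carts ship.
SEMANTIC_FIT = {
    "mission-node-comparison": {"=": 2},
    "ice-breaker-filtering":   {"filter": 3},
    "black-ledger-financial":  {"debit": 3, "credit": 3},
    "crypto":                  {"cipher-grade": 4},
}

def semantic_bonus(token, context):
    # Unknown context or token: no bonus.
    return SEMANTIC_FIT.get(context, {}).get(token, 0)
```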
UI surface
Where suggestions appear
Per ADR-0016 §4, the predictive palette renders on CIPHER-LINE Rows 2–3 while nEmacs or the REPL is active:
- Row 2 (Palette Row A): 8 candidates numbered [1]..[8].
- Row 3 (Palette Row B): continuation indicator (↓ if more candidates are available), selection feedback, or the alt slots from a cart’s multi-level binding.
The main grid stays free for the buffer view (recovered from ADR-0008’s earlier rows-22–25 allocation).
Accept / reject
- Accept: numpad keys 1–8 insert the selected token. Cursor advances to the inserted node. Palette recomputes.
- Scroll: beyond the top 8, the player can scroll the candidate set via context-bound keys (per ADR-0008 mock 5). Implementation lands during the editor foundation task.
- Reject / dismiss: BACK closes the palette without inserting. LAMBDA enters literal-entry mode for an identifier or number not in the palette.
Multi-level bindings
Per ADR-0016 §9, palette keys can carry :tap / :double-tap / :long-press bindings. When a slot has alt bindings, Row 24 (firmware action bar) renders them with superscript markers (² for double-tap, … for long-press). The palette adopts the same convention when context provides multi-level alternates.
Performance budget
Per ADR-0009 §“Latency”:
- ~1–2 ms per palette render on the original RP2350 / Pico 2 target. The Pi Zero 2 W (1 GHz Cortex-A53, ADR-0009 hardware-retarget note) trivially exceeds this — the model is target-independent.
- The palette only updates on cursor movement, CONS press, or token selection (event-driven, not per-frame).
- Filtering + scoring is O(n), where n is the candidate count (~300 unique tokens in a typical session: builtins + user + domain).
- Sorting top 8 is O(n log 8) ≈ O(n).
The static model satisfies the latency budget with comfortable margin. There is no caching layer needed.
Memory footprint
The ranker holds:
- Vocabulary tables. Builtins + NoshAPI: fixed strings, ~2 KB total. Cartridge contributions: variable, typically 50–200 terms × ~16 bytes ≈ 0.8–3.2 KB per cart.
- Session history: 20 tokens × ~16 bytes = ~320 bytes.
- Local bindings: scoped per buffer; rarely exceeds 50 names = ~800 bytes.
- Per-call working set: scoring array for ~300 candidates ≈ ~1.2 KB (token pointer + float score), released after ranking returns.
Per ADR-0016 §“Known Unknowns” #5, the cart grammar arena is shared with the Cipher grammar arena (8 KB) or allocated separately — confirm during the foundation task. Probably separate for isolation.
How cartridge code influences predictions
Section titled “How cartridge code influences predictions”Two FFI primitives (per ADR-0016 §7, see ../nosh-api/primitives-by-category.md):
emacs-extend-vocabulary list-of-strings
Cart contributes domain terms. Each term receives the +5 vocabulary boost in ranking. Typical use:
```
(emacs-extend-vocabulary
  (list "node" "ice" "threat-level" "probe" "extract" "breach"
        "lockdown" "evasion" "sysop" "cipher-grade" "exfil"))
```

Terms are case-sensitive, stored as plain Lisp strings, and matched against the candidate token’s source-form representation. They live in the cart’s grammar arena from cart-load to cart-unload.
emacs-extend-grammar sexp
Cart contributes grammar productions that influence the legal-form filter, not the score. A cart introducing (scan-result) as a callable form makes it appear at function position in the palette where it wouldn’t otherwise be legal:
```
(emacs-extend-grammar
  '(:legal-at-function-position scan-result)
  '(:legal-at-argument-position node))
```

The exact syntax of grammar fragments lands during the editor foundation task; ADR-0016 §7 commits to the FFI primitive but defers the production syntax to implementation. Treat the production-syntax surface as evolving until the foundation task lands; the FFI signature is stable.
Indirect influence: domain idioms
The semantic-fit bonus (see “Ranking weights” above) is a per-context boost table. As carts ship, the table will grow with their domain idioms. This is implementation-side: cartridge authors don’t write to the table directly, but their domain vocabulary is what the runtime authors put there.
Quality and coverage
Per ADR-0009 §“Quality Evaluation,” the static model covers ~95% of practical use cases as measured against four hand-coded test scenarios from launch titles (ICE Breaker filter, multi-phase extraction, Sysop ICE deployment, Black Ledger financial audit). The remaining ~5% are edge cases — rare domain terms, niche operators that didn’t make the popularity baseline.
Failure modes (and mitigations)
- Builtin shadowing. A user defining let as a variable name would over-weight it via the local-binding boost. Mitigation: disallow shadowing builtins in the parser.
- Context-insensitive boosts. Domain vocabulary applies globally even when not relevant. Mitigation: v1 doesn’t track per-cartridge context in the REPL; acceptable for launch. (v1.1 would track the active cartridge per session.)
- Stale recency. A token used once then abandoned still gets recency boost. Mitigation: the 20-token decay window is short; v1 is acceptable.
Post-launch
v1.1: learned model
Collect anonymized session telemetry (which tokens players select at each position type), feed it to a lightweight neural net (e.g., a GRU over position + context). Privacy concerns + model bloat → deferred. The v1 static model ships at launch.
v1.1: per-mission context tracking
Track which cartridge a script targets; boost domain vocabulary of that cartridge only. Higher relevance, less cross-domain noise. Modest complexity. Nice-to-have for v1.1.