PCM Voice Bark
Parent Documents:
- `docs/adr/ADR-0017-realtime-io-coprocessor.md`: Pico 2 coprocessor; Pico owns YM2149 synthesis with I2S out to MAX98357A
- `docs/adr/ADR-0019-cartridge-storage-and-form-factor.md`: Cartridges are full-size SD cards via USB mass storage; no on-cart flash regions
- `docs/adr/ADR-0015-cipher-line-auxiliary-display.md`: CIPHER-LINE auxiliary OLED; CIPHER voice is text-primary
- `docs/software/api-reference/grammars/coprocessor-protocol.md`: Pi to Pico UART wire-format spec (canonical frame types post-ADR-0019)
Cross-references: see CLAUDE.md Canonical Hardware Specification for audio hardware values (MAX98357A, YM2149 register count, speaker spec). Do not restate those values here.
1. Motivation
The Cipher voice is the KN-86’s narrative engine. It currently communicates exclusively through text rendered to the amber display: procedurally constructed sentences from domain word tables, appearing at mission transitions, debriefs, and critical state changes. The PSG provides tonal punctuation (alert stings, confirmation tones) but no vocal content.
This addendum proposes adding short PCM voice barks to the Cipher voice system. These are pre-recorded speech samples — single words or short phrases, stored as files on the cartridge SD card — played through the MAX98357A DAC/amplifier and mixed Pico-side alongside the YM2149 PSG output. The reference point is NES-era digitized speech: Double Dribble’s “DOUBLE DRIBBLE!”, Blades of Steel’s “FIGHT!”, Mike Tyson’s Punch-Out’s “BODY BLOW!”. But with a real DAC and 16-bit headroom, the result can be cleaner than those references while staying in the spirit: short, punched, unmistakable.
Why this matters: The Cipher voice is described as a “competent colleague” — terse, clipped, authoritative. Text on amber screen already sells this. But a barked word at a critical moment — “BREACH.” when ICE catches you, “CLEAN.” on a perfect extraction, “TRACED.” when Black Ledger finds the fraud — bridges the gap between reading a terminal message and feeling like someone is actually there. It’s the difference between seeing > CONTACT on screen and hearing the word punched through a 28mm speaker at the same time.
Design constraint: Barks supplement the text voice. They never replace it. The Cipher voice remains primarily textual. Barks fire at high-impact moments only — a few per session, not every screen transition. Overuse kills the effect.
2. Technical Approach
2A. Post-ADR-0017 Audio Architecture Summary
ADR-0017 moved the entire audio stack off the Pi Zero 2 W and onto the Raspberry Pi Pico 2 (RP2350) coprocessor. The Pico owns:
- YM2149 PSG synthesis: `kn86_psg_sample()` runs on Pico core 1, producing `int16_t` samples at 44,100 Hz.
- I2S output to MAX98357A: a PIO state machine and chained DMA pair (`audio_i2s.c`) clocks 16-bit stereo frames (L=R, mono-replicated) into the MAX98357A at 44.1 kHz. The MAX98357A is a genuine DAC and Class-D amplifier; it accepts PCM directly.
- UART command link from Pi: the Pi sends `PSG_REG_WRITE`, `PSG_BULK_WRITE`, and `PSG_RESET` frames over 1 Mbps UART (see `coprocessor-protocol.md`); the Pico applies them before the next sample.
The YM2149-as-DAC trick described in the v0.1 spec (commandeering Channel C’s amplitude register to feed 4-bit samples at 8 kHz) was a workaround for Pi-side synthesis where the PSG emulator had to double as an output path. That constraint no longer exists. The MAX98357A accepts any PCM value the Pico puts in the I2S frame. PCM bark playback is a mixer addition, not a register-write hack.
2B. Playback Mechanism (post-ADR-0017)
Bark playback happens Pico-side, in the synthesis loop, by mixing a signed 16-bit PCM channel with the YM2149 PSG output before the combined sample is written into the I2S DMA buffer.
The synthesis loop in `audio_i2s.c` currently reads:

```c
void kn86_audio_i2s_core1_synth_loop(void) {
    while (true) {
        uint32_t half = multicore_fifo_pop_blocking();
        audio_frame_t *base = ...;
        for (uint32_t i = 0; i < HALF_FRAMES; i++) {
            uint16_t u = (uint16_t)kn86_psg_sample(g_psg);
            base[i] = ((uint32_t)u << 16) | (uint32_t)u;
        }
    }
}
```

When bark playback is active, the loop mixes a PCM sample alongside the PSG output before packing the I2S frame:

```c
int16_t psg  = kn86_psg_sample(g_psg);
int16_t bark = kn86_bark_next_sample(g_bark);  /* returns 0 when idle */
int32_t mixed = (int32_t)psg + (int32_t)bark;
if (mixed > 32767)  mixed = 32767;
if (mixed < -32768) mixed = -32768;
uint16_t u = (uint16_t)(int16_t)mixed;
base[i] = ((uint32_t)u << 16) | (uint32_t)u;
```

`kn86_bark_next_sample()` advances the bark read pointer and returns the next signed 16-bit sample, or zero when no bark is active. The three YM2149 channels (A, B, C) are completely unaffected: all three remain available throughout bark playback.
Why this is correct: The I2S DMA buffer holds 32-bit stereo frames and is fed by the Pico’s synthesis core continuously. The MAX98357A treats every frame as a direct PCM value. Mixing at the synthesis stage before the DMA buffer write is the natural and correct insertion point — zero protocol overhead, zero jitter, and the mix happens at the same 44.1 kHz rate as the PSG output.
Pi involvement: The Pi triggers a bark by sending a new UART command (see section 2E). The Pico’s UART handler (core 0) writes the bark parameters into a shared structure; core 1 reads them on the next synthesis iteration. The kn86_psg_state_t volatile discipline already in place covers this access pattern — bark state follows the same convention.
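One way to picture the core 0 to core 1 handoff is a small publish/take pair built on C11 atomics. This is a sketch under stated assumptions, not the project's actual bark state: the `bark_shared_t` layout and the `bark_publish` / `bark_try_take` names are hypothetical, and it uses release/acquire ordering on a single flag rather than the volatile convention described above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shared bark parameters; field names are illustrative only. */
typedef struct {
    const int16_t *samples;  /* preloaded PCM buffer */
    uint32_t       count;    /* number of samples in the buffer */
    uint32_t       hold;     /* output frames per sample: 2 at 22,050 Hz, 1 at 44,100 Hz */
    atomic_bool    armed;    /* written last by core 0, checked first by core 1 */
} bark_shared_t;

/* Core 0 (UART handler): fill the parameters, then publish with release order
 * so the parameter writes are visible before the flag is. */
static void bark_publish(bark_shared_t *s, const int16_t *buf,
                         uint32_t count, uint32_t hold)
{
    assert(s != NULL && buf != NULL && hold >= 1u);
    s->samples = buf;
    s->count   = count;
    s->hold    = hold;
    atomic_store_explicit(&s->armed, true, memory_order_release);
}

/* Core 1 (synthesis loop): returns true exactly once per published bark and
 * copies the parameters out. Returns false when nothing new is pending. */
static bool bark_try_take(bark_shared_t *s, const int16_t **buf,
                          uint32_t *count, uint32_t *hold)
{
    if (!atomic_exchange_explicit(&s->armed, false, memory_order_acquire)) {
        return false;
    }
    *buf   = s->samples;
    *count = s->count;
    *hold  = s->hold;
    return true;
}
```

The sketch assumes core 0 does not publish a second bark while core 1 is still copying the first; a real implementation would need to decide how to handle that overlap.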
2C. Sample Format
| Property | Value | Rationale |
|---|---|---|
| Bit depth | 16-bit signed PCM | Matches the I2S frame width and the PSG output type directly. The 4-bit constraint was a YM2149 artifact. |
| Sample rate | 22,050 Hz (preferred) or 44,100 Hz | 22 kHz is adequate for intelligible speech (about 11 kHz of bandwidth), halves storage vs 44 kHz, and is exactly half the 44.1 kHz synthesis rate. The Pico synthesis loop upsamples by holding each sample for two frames. |
| Channels | Mono | Single speaker. |
| Compression | None (raw 16-bit PCM) | SD has ample headroom; compression adds decoder complexity for no meaningful gain at bark durations. |
| Max duration | 1.0 second per bark | Design constraint, not technical limit. |
| Storage per second at 22 kHz | ~44,100 bytes | 22,050 samples x 2 bytes/sample. |
| Storage per second at 44 kHz | ~88,200 bytes | For barks that benefit from the full sample rate. |
On the emulator: The desktop emulator’s sound.c SDL audio callback mixes PCM samples into its output buffer at the same insertion point — before writing the SDL audio frame. The 22 kHz upsampling step is identical: hold each bark sample for two 44.1 kHz SDL frames. Emulator and device playback are bit-equivalent.
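The hold-for-two-frames upsampling can be sketched as a small playback cursor. This is a minimal illustration, assuming the bark is already in memory; `bark_cursor_t`, `bark_cursor_start`, and `bark_cursor_next` are hypothetical names, not the real `kn86_bark_next_sample()` implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical playback cursor; not the project's actual bark state layout. */
typedef struct {
    const int16_t *samples;  /* preloaded mono PCM */
    uint32_t count;          /* samples in the buffer */
    uint32_t pos;            /* index of the sample currently being played */
    uint8_t  hold;           /* output frames per source sample: 2 or 1 */
    uint8_t  held;           /* output frames already emitted for samples[pos] */
    bool     active;
} bark_cursor_t;

static void bark_cursor_start(bark_cursor_t *c, const int16_t *samples,
                              uint32_t count, uint32_t rate_hz)
{
    assert(c != NULL && samples != NULL);
    assert(rate_hz == 22050u || rate_hz == 44100u);
    c->samples = samples;
    c->count   = count;
    c->pos     = 0;
    c->hold    = (rate_hz == 22050u) ? 2 : 1;
    c->held    = 0;
    c->active  = (count > 0);
}

/* Called once per 44.1 kHz output frame. Returns 0 when idle. */
static int16_t bark_cursor_next(bark_cursor_t *c)
{
    if (!c->active) {
        return 0;
    }
    int16_t s = c->samples[c->pos];
    if (++c->held >= c->hold) {      /* this sample has been held long enough */
        c->held = 0;
        if (++c->pos >= c->count) {  /* ran off the end: go idle */
            c->active = false;
        }
    }
    return s;
}
```

Because the same function can be called from the SDL callback and from the Pico synthesis loop, a cursor of this shape would keep the two paths on identical arithmetic.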
2D. Cartridge SD Filesystem Layout (post-ADR-0019)
ADR-0019 replaced the flash-region storage model (ADR-0013 MBC5 SRAM, on-cart flash) with a full-size SD card mounted on the Pi as USB mass storage. There is no on-cart flash region, no bark-table packed into a binary header at a known offset. Bark samples are ordinary files on the cartridge’s SD filesystem.
Directory layout:
```
<cart_root>/
  cart.kn86          -- the .kn86 container (Lisp source + static data + metadata)
  audio/
    barks/
      breach.pcm     -- raw 16-bit signed mono PCM, 22050 Hz, little-endian
      clean.pcm
      burned.pcm
      traced.pcm
      barks.toml     -- bark index (label -> filename, sample_rate, volume_scale)
  save/
    save.sav         -- per-cartridge save state (per ADR-0019 section 6)
```

`barks.toml` is a small plaintext index:

```toml
[[bark]]
label = "BREACH"
file = "audio/barks/breach.pcm"
rate = 22050
vol = 1.0

[[bark]]
label = "CLEAN"
file = "audio/barks/clean.pcm"
rate = 22050
vol = 1.0
```

The nOSh runtime reads `barks.toml` at cartridge load time and builds an in-memory bark table (labels to opened file descriptors or preloaded buffers). File I/O is standard `open()` / `read()` on the Pi side against the USB-MSC mounted SD.
Max bark count: 64 per cartridge (up from the v0.1 constraint of 16). The previous 16-slot limit was sized against the fixed 260-byte BarkTableHeader structure embedded in on-cart flash. With SD storage there is no fixed-size index, and 64 labels fit in a barks.toml under 2 KB. In practice, the design constraint (a few barks per session, no overuse) means most cartridges will carry 6-12 barks.
Total bark footprint: at 22 kHz / 16-bit / 1.0 sec max, each bark is 44,100 bytes (~43 KiB). 64 barks at max duration come to about 2.8 MB (2.7 MiB), which is trivial on any SD card. No bark budget pressure.
No .kn86 container change: the .kn86 container (ADR-0006) does not need a bark_offset / bark_size field. Bark samples are sidecar files on the SD filesystem, not embedded in the container. The cart.kn86 file itself is unchanged.
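The in-memory bark table built from `barks.toml` could look roughly like the sketch below. It assumes the TOML has already been parsed by some other code; `bark_entry_t`, `bark_table_t`, `bark_table_add`, and `bark_table_find` are hypothetical names, and the 64-byte path buffer is an arbitrary choice for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BARK_MAX_COUNT 64  /* per-cartridge cap from this spec */
#define BARK_LABEL_LEN 32  /* matches the label[32] field in BarkPlayback */
#define BARK_PATH_LEN  64  /* assumption: long enough for audio/barks/<name>.pcm */

/* Hypothetical entry built from one [[bark]] table in barks.toml. */
typedef struct {
    char     label[BARK_LABEL_LEN];
    char     file[BARK_PATH_LEN];
    uint32_t rate;  /* 22050 or 44100 */
    float    vol;   /* 0.0 to 1.0 */
} bark_entry_t;

typedef struct {
    bark_entry_t entries[BARK_MAX_COUNT];
    uint8_t      count;
} bark_table_t;

/* Append one entry. Returns its index, or -1 if the table is full or the
 * entry is invalid (label/path too long, unsupported sample rate). */
static int bark_table_add(bark_table_t *t, const char *label,
                          const char *file, uint32_t rate, float vol)
{
    assert(t != NULL && label != NULL && file != NULL);
    if (t->count >= BARK_MAX_COUNT) return -1;
    if (strlen(label) >= BARK_LABEL_LEN) return -1;
    if (strlen(file) >= BARK_PATH_LEN) return -1;
    if (rate != 22050u && rate != 44100u) return -1;

    bark_entry_t *e = &t->entries[t->count];
    strcpy(e->label, label);  /* safe: lengths checked above */
    strcpy(e->file, file);
    e->rate = rate;
    e->vol  = vol;
    return t->count++;
}

/* Linear lookup by label; 64 entries at most, so no hashing is needed. */
static int bark_table_find(const bark_table_t *t, const char *label)
{
    assert(t != NULL && label != NULL);
    for (int i = 0; i < t->count; i++) {
        if (strcmp(t->entries[i].label, label) == 0) return i;
    }
    return -1;
}
```

A label lookup that returns -1 maps naturally onto `stdlib_bark_play()` returning false when the label is not found.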
2E. Wire Protocol Surface (Pi to Pico)
The Pi loads and decodes bark metadata; the Pico synthesizes. When a bark triggers, the Pi must send the PCM data to the Pico for mixing into the I2S stream.
Proposed additions to coprocessor-protocol.md (to be added when implementation begins — flagged here, not fully specified):
| Frame type (proposed) | Direction | Semantics |
|---|---|---|
| `PCM_BUFFER_LOAD` (type TBD, 0x40-0xDF reserved range) | Pi to Pico, fire-and-forget | Sends a chunk of 16-bit PCM payload. Multiple frames needed for a full bark; framed at 1 KB payload (matching `MAX_FRAME_LEN`). |
| `PCM_PLAYBACK_START` (type TBD) | Pi to Pico, fire-and-forget | Signals the Pico to begin mixing the pre-loaded PCM buffer. Payload: total sample count, sample rate, volume scale. |
| `PCM_PLAYBACK_STOP` (type TBD) | Pi to Pico, fire-and-forget | Immediately stops bark mixing, returns to PSG-only output. |
Alternative: full Pico-side file read. If the Pico gains shared-memory access to the SD (not currently planned), it could read bark files directly. The Pi-to-Pico-stream model is the practical path under the current architecture. End-to-end latency from trigger to first sample must fit within the <30 ms PSG-write-to-audible-tone budget established in coprocessor-protocol.md section 7.
Do not implement these frames in this PR. They are flagged here so the protocol spec author has the design intent when implementation begins.
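To make the chunking concrete for the protocol author, here is a sketch of how a bark might be split into `PCM_BUFFER_LOAD` payloads. Everything about the payload layout is an assumption for illustration: the 4-byte header (little-endian sequence number and total chunk count), the `pcm_chunk_count` / `pcm_chunk_build` names, and the choice to count the header against the 1 KB payload cap are not part of any agreed wire format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_FRAME_LEN  1024u  /* 1 KB payload cap cited in this section */
#define CHUNK_HDR_LEN  4u     /* hypothetical: seq (u16 LE) + total (u16 LE) */
#define CHUNK_DATA_MAX (MAX_FRAME_LEN - CHUNK_HDR_LEN)

/* Number of chunks needed to carry pcm_bytes of PCM data. */
static uint32_t pcm_chunk_count(uint32_t pcm_bytes)
{
    return (pcm_bytes + CHUNK_DATA_MAX - 1u) / CHUNK_DATA_MAX;
}

/* Build the payload for chunk `seq` into `out`. Returns the payload length
 * in bytes, or 0 if seq is out of range or the bark needs too many chunks. */
static size_t pcm_chunk_build(const uint8_t *pcm, uint32_t pcm_bytes,
                              uint16_t seq, uint8_t out[MAX_FRAME_LEN])
{
    assert(pcm != NULL && out != NULL);
    uint32_t total = pcm_chunk_count(pcm_bytes);
    if (total > UINT16_MAX || seq >= total) {
        return 0;
    }
    uint32_t offset = (uint32_t)seq * CHUNK_DATA_MAX;
    uint32_t n = pcm_bytes - offset;
    if (n > CHUNK_DATA_MAX) {
        n = CHUNK_DATA_MAX;
    }
    out[0] = (uint8_t)(seq & 0xFFu);
    out[1] = (uint8_t)(seq >> 8);
    out[2] = (uint8_t)(total & 0xFFu);
    out[3] = (uint8_t)(total >> 8);
    memcpy(out + CHUNK_HDR_LEN, pcm + offset, n);
    return CHUNK_HDR_LEN + n;
}
```

Under these assumptions a 44,100-byte bark takes 44 chunks, with the last one carrying 240 bytes of PCM. Carrying the total in every chunk would let the Pico detect a missing or out-of-order chunk before `PCM_PLAYBACK_START` arrives.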
2F. Implementation Gate
GWP-171 (the parent task) is explicitly deferred until all three gate conditions clear (per the Sprint 4 design pack at docs/plans/sprints/2026-04-27-sprint4-gwp-171-design.md):
- ADR-0017 audio path measured on prototype. Stage 1c bring-up has captured real Pico to I2S to MAX98357A latency, jitter, and power figures.
- CIPHER-LINE OLED voice playtested. The text-voice surface (ADR-0015) is on the device and Josh has called its narrative weight sufficient or insufficient on its own.
- This spec is current. The rewrite in this document (GWP-171a) satisfies gate condition 3. Gates 1 and 2 remain open.
3. Software Interface
3A. Stdlib Addition (nosh_stdlib.h)
The callable interface is unchanged from v0.1: cartridge authors still call `stdlib_bark_play(g_state, "CLEAN")`. The implementation behind the call changes: instead of writing a nibble into a PSG register, it triggers the Pi-side bark dispatch to the Pico.

```c
/* ---- Voice Bark Playback ---- */

/* Play a bark by label. Returns false if label not found or no bark table. */
bool stdlib_bark_play(SystemState *state, const char *label);

/* Play a bark by index (0 to bark_count-1). Returns false if index out of range. */
bool stdlib_bark_play_index(SystemState *state, uint8_t index);

/* Stop any currently playing bark immediately. */
void stdlib_bark_stop(SystemState *state);

/* Is a bark currently playing? */
bool stdlib_bark_active(SystemState *state);
```

3B. Runtime State Addition (types.h)

```c
/* PCM bark playback state (inside RuntimeState) */
typedef struct {
    char     label[32];      /* Currently playing bark label, for logging */
    uint32_t total_samples;  /* Total sample count for current bark */
    uint32_t samples_sent;   /* Samples sent to Pico so far */
    uint16_t sample_rate;    /* Bark sample rate in Hz (22050 or 44100) */
    float    volume_scale;   /* 0.0 to 1.0 */
    bool     active;         /* true = bark is playing or being streamed */
} BarkPlayback;
```

3C. Cartridge Authoring

```lisp
; In cartridge Lisp source, declare bark triggers:
(on-event :node-compromised
  (fn (node)
    (bark-play "BREACH")
    ; Text still displays simultaneously:
    (text-puts 0 12 "> NETWORK COMPROMISED. NO TRACE.")))
```

The `bark-play` NoshAPI primitive maps to `stdlib_bark_play()`. Cartridge authors never touch the UART protocol or file I/O. The bark files live on the SD card as sidecar assets alongside the cart.kn86 container.
4. Content Design
Section titled “4. Content Design”4A. Bark Vocabulary per Module
Each cartridge gets a domain-specific bark palette. These should be single words or two-word phrases, recorded with an authoritative, clipped delivery, like a military radio operator or an air traffic controller. Not conversational. Not friendly. Functional.
| Module | Proposed Barks | Trigger Context |
|---|---|---|
| ICE Breaker | BREACH, CLEAN, BURNED, LOCKED, OPEN, TRACE, EXIT | ICE detection, extraction success/fail, node state changes |
| Depthcharge | CONTACT, DEPTH, SURFACE, LAUNCH, HIT, MISS, CLEAR | Sonar events, depth charge outcomes |
| Black Ledger | FRAUD, TRACED, VOID, FLAGGED, CLEAN, AUDIT | Transaction analysis results, audit outcomes |
| NeonGrid | PATROL, CLEAR, BLOCKED, ROUTE, BREACH | Guard detection, path validation |
| Cipher Garden | DECRYPT, LOCKED, KEY, MATCH, FAIL | Cipher operations, key verification |
| nOSh (firmware) | READY, LINK, SWAP, COMPLETE | Boot, cartridge swap, mission completion |
4B. Recording Guidelines
- Source: Record at 44.1 kHz / 16-bit WAV.
- Target: Downsample to 22 kHz / 16-bit via build tool before packing. The 4-bit encode step is removed — the MAX98357A takes 16-bit directly, so there is no forced quantization to 16 amplitude levels.
- Delivery: Short, barked, declarative. Hard consonants (B, D, K, T, CH) cut through a 28mm speaker clearly; sibilants and fricatives (S, SH, F, TH) carry less punch. The 22 kHz path is far more forgiving than the old 4-bit/8 kHz pipeline, so the lo-fi aesthetic is a deliberate choice, not a technical floor.
- Processing: Compression / limiting to control dynamic range, high-pass filter at 200 Hz to remove room rumble. At 16-bit there is real dynamic headroom — choose the aesthetic deliberately rather than fighting quantization noise.
- Duration target: 0.3-0.7 seconds per bark. Anything over 1.0 second should be cut. The bark should feel like a punch, not a sentence.
- Tone: Not robotic. Not text-to-speech. A real human voice. The 28mm speaker through the Pelican case will impose its own character.
4C. Interaction with Existing Audio
All three YM2149 channels (A, B, C) remain fully available during bark playback. The design rules:
- Barks mix with, not over, PSG output. The mixed signal is the arithmetic sum of PSG output and bark PCM, clamped to the 16-bit signed range. No PSG channel is silenced, hijacked, or interrupted.
- Bark volume is scaled before mix. The `vol` field in `barks.toml` lets cartridge authors balance bark loudness against the PSG background.
- One bark at a time. Triggering a new bark while one is playing stops the current bark and starts the new one immediately.
- No barks during LAMBDA playback. Macro replay should be silent; barks during fast replay would be cacophonous.
- SYS hold abort stops all barks. The emergency exit silences everything.
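The trigger rules above can be sketched as a tiny policy layer that sits in front of playback. The `bark_policy_t` type and both function names are hypothetical, and the sketch only models the three rules stated in this section (newest bark wins, silence during LAMBDA replay, abort on SYS hold).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical policy state; names are illustrative, not part of nOSh. */
typedef struct {
    int  playing;        /* index of the active bark, or -1 when idle */
    bool lambda_replay;  /* true while a LAMBDA macro replay is running */
} bark_policy_t;

/* Request a bark. Returns true if it starts, pre-empting any current bark. */
static bool bark_policy_trigger(bark_policy_t *p, int index)
{
    assert(p != NULL && index >= 0);
    if (p->lambda_replay) {
        return false;       /* macro replay stays silent */
    }
    p->playing = index;     /* one bark at a time: the newest request wins */
    return true;
}

/* SYS hold abort: silence whatever is playing. */
static void bark_policy_abort(bark_policy_t *p)
{
    assert(p != NULL);
    p->playing = -1;
}
```

Whether entering LAMBDA replay should also cut off a bark that is already playing is not stated above, so the sketch leaves that case alone.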
5. Build Tooling
Section titled “5. Build Tooling”5A. kn86bark — WAV to Bark Converter (revised)
```
kn86bark input.wav output.pcm [--rate 22050] [--normalize] [--preview]
```

- Reads any WAV format (via dr_wav or similar single-header library)
- Resamples to target rate (default 22050 Hz)
- Outputs raw signed 16-bit PCM (little-endian), not packed nibbles
- Optionally generates a TOML snippet for `barks.toml`
- `--preview` flag plays back through SDL audio for quick listening
The forced nibble-packing step from v0.1 is removed. The output is standard 16-bit PCM.
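For the common 44.1 kHz to 22.05 kHz case, the resampling step reduces to a 2:1 decimation. The sketch below averages adjacent sample pairs, which is a deliberately naive stand-in (a 2-tap box filter) and not a claim about how `kn86bark` resamples; a production converter would likely use a proper anti-aliasing filter. The `decimate_by_two` name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Halve the sample rate by averaging adjacent pairs (2-tap box filter
 * followed by 2:1 decimation). Writes in_count / 2 samples to `out` and
 * returns that count. A trailing odd sample is dropped. */
static size_t decimate_by_two(const int16_t *in, size_t in_count, int16_t *out)
{
    assert(in != NULL && out != NULL);
    size_t out_count = in_count / 2;
    for (size_t i = 0; i < out_count; i++) {
        /* Sum in 32 bits; the average of two int16 values always fits int16. */
        int32_t sum = (int32_t)in[2 * i] + (int32_t)in[2 * i + 1];
        out[i] = (int16_t)(sum / 2);
    }
    return out_count;
}
```

Since the Pico later holds each 22 kHz sample for two output frames, the decimate-then-hold round trip gives a quick way to preview roughly what a bark will sound like on the device.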
5B. Cartridge Build Integration
Bark WAV assets live in the cartridge source tree under `assets/barks/`. The build step converts and places them into the SD card layout:

```cmake
kn86bark_convert(
  BARKS
    assets/barks/breach.wav
    assets/barks/clean.wav
    assets/barks/burned.wav
  OUTPUT_DIR ${CMAKE_CURRENT_BINARY_DIR}/sd_root/audio/barks
  RATE 22050
)
```

This generates `breach.pcm`, `clean.pcm`, etc. and a `barks.toml` index, which are copied alongside `cart.kn86` onto the SD card image during the pack step.
6. Questions for Agent Review
For Platform Engineering Agent:
- Pi to Pico stream latency: Barks are triggered from cartridge Lisp during a cell evaluation. The Pi must stream PCM data to the Pico over UART before playback starts. At 1 Mbps (about 100 KB/s) and 44,100 bytes per bark (22 kHz / 1.0 sec max), the wire time is ~440 ms. That is under the duration of the bark itself, but far beyond the <30 ms latency budget cited in section 2E. Options: (a) pre-load bark data to the Pico at cart-load time (fits ~5 barks at 43 KB each in the Pico’s 520 KB SRAM budget); (b) stream on trigger and accept the latency for longer barks; (c) compress bark data (IMA ADPCM at 4:1 gives ~11 KB/bark, fitting ~5 barks at 1 sec each in under 65 KB). Recommend option (a) with a 6-bark preload budget and a compressed fallback. Resolve before implementation begins.
- Pico SRAM budget: At 22 kHz / 16-bit / 1.0 sec, one bark is 44,100 bytes. Six pre-loaded barks = ~264 KB. The Pico has 520 KB total SRAM; the existing audio buffer (`g_audio_buf`) is 2 KB; PSG state is ~100 bytes; OLED framebuffer (SSD1322 256x64 at 4bpp) is 8 KB. Roughly 260 KB headroom for bark preload. This is workable but tight. Validate against Pico memory map during bring-up.
- Wire format for `PCM_BUFFER_LOAD`: The 1 KB `MAX_FRAME_LEN` cap means a 44 KB bark requires ~44 frames. The Pi must send all frames before `PCM_PLAYBACK_START`. Propose a sequence number and total-frame-count handshake in the payload, matching the pattern of the now-obsolete `CART_READ_BANK_DATA`. Full protocol spec deferred to `coprocessor-protocol.md`.
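The latency and preload figures in this list follow from simple arithmetic, sketched below so the numbers can be re-derived if the sample rate or link speed changes. The 10 bits per byte figure assumes 8N1 UART framing, which is consistent with the 100 KB/s rate used in the risk table but is not stated explicitly in this spec; frame header overhead is ignored. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Size in bytes of a mono 16-bit bark of the given rate and duration. */
static uint32_t bark_bytes(uint32_t rate_hz, uint32_t duration_ms)
{
    uint64_t samples = (uint64_t)rate_hz * duration_ms / 1000u;
    return (uint32_t)(samples * 2u);  /* 2 bytes per 16-bit sample */
}

/* Time in milliseconds to push `bytes` over a UART at `baud`, assuming 8N1
 * framing (start + 8 data + stop = 10 bit times per byte). Rounds down. */
static uint32_t uart_wire_time_ms(uint32_t bytes, uint32_t baud)
{
    assert(baud > 0);
    uint64_t bits = (uint64_t)bytes * 10u;
    return (uint32_t)(bits * 1000u / baud);
}
```

With these helpers, `uart_wire_time_ms(bark_bytes(22050, 1000), 1000000)` gives 441 ms for one maximum-length bark, and six such barks give 2646 ms, matching the ~2.6 s cart-load preload figure.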
For C Engineer Agent:
- `kn86_bark_next_sample()` in the synthesis loop: This function must be zero-cost when inactive (a single, predictable branch on `bark_state.active`). When active, it reads from the pre-loaded buffer through a read index. For a 22 kHz bark the index advances once every two output frames (each sample is held for two 44.1 kHz frames); for a 44.1 kHz bark it advances once per output frame. It returns 0 after the last sample. No locking needed if bark_state writes from core 0 are aligned-word stores (RP2350 guarantees atomic aligned stores up to 32 bits).
- Volume mixing clamp: The mixed output is `psg + bark * vol_scale`. At max PSG output (~32767) plus max bark (~32767), the sum exceeds the int16 range, so the clamp-to-int16 must be applied before packing the I2S frame. This is already the natural location (see section 2B code sketch).
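The `psg + bark * vol_scale` mix with its clamp can be written as a pair of small helpers. This is a sketch only: `mix_clamped` and `pack_i2s_frame` are hypothetical names, and the use of a `float` scale on the hot path is an assumption carried over from the `volume_scale` field rather than a measured choice.

```c
#include <assert.h>
#include <stdint.h>

/* Scale the bark, add it to the PSG sample in 32 bits, then saturate to
 * the int16 range so the sum can never wrap. */
static int16_t mix_clamped(int16_t psg, int16_t bark, float vol)
{
    assert(vol >= 0.0f && vol <= 1.0f);
    int32_t scaled = (int32_t)((float)bark * vol);
    int32_t mixed  = (int32_t)psg + scaled;
    if (mixed > INT16_MAX) mixed = INT16_MAX;
    if (mixed < INT16_MIN) mixed = INT16_MIN;
    return (int16_t)mixed;
}

/* Replicate one mono sample into both halves of a 32-bit stereo frame
 * (L = R), the same packing the section 2B sketch uses. */
static uint32_t pack_i2s_frame(int16_t sample)
{
    uint16_t u = (uint16_t)sample;
    return ((uint32_t)u << 16) | (uint32_t)u;
}
```

Doing the sum in `int32_t` is what makes the clamp meaningful: two full-scale `int16_t` inputs total at most 65,534 in magnitude, which fits comfortably before saturation is applied.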
For Gameplay Design Agent:
- Bark frequency per session: How many barks per 30-minute session feels right before the novelty wears off? Propose a “bark budget” per mission type (e.g., max 3 barks per single-phase contract, max 6 per multi-phase campaign).
- Bark selection determinism: Should bark choice be LFSR-driven (same seed = same bark at same moment) or event-driven (always play “BREACH” on ICE detection regardless of seed)? The event-driven model is more intuitive for authors and aligns with how barks are labeled.
- Bare deck barks: Should the nOSh runtime have its own bark table for boot, cartridge swap, and mission board events? If so, firmware barks live in the device root filesystem, not on a cartridge SD card.
For QA Agent:
- Audio quality acceptance criteria: At 22 kHz / 16-bit the intelligibility bar is substantially higher than the old 4-bit/8 kHz standard. Define: “recognizable as the word on first hearing, without accompanying text.” The Double Dribble standard was the floor; 22 kHz should clear it easily.
- Mixing regression: Bark playback must not introduce audible artifacts in PSG output when the bark is playing alongside PSG tones. Test: run all three YM2149 channels during a bark; verify no clipping, phase artifacts, or dropout.
- Emulator/device parity: Bark playback must sound identical on emulator (SDL audio callback) and device (Pico I2S path). The mixing arithmetic is identical; verify by recording both and comparing waveforms.
7. Risk Assessment
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Pico SRAM too tight for bark preload | Medium | Medium | Compress barks (IMA ADPCM 4:1 = ~11 KB/bark); or reduce max preloaded barks to 4. Measure at bring-up. |
| UART stream latency makes bark feel late | Medium | High | Pre-load at cart-load time. Preloading a full 64-bark cartridge would take 64 barks x 44 KB / 100 KB/s = ~28 s, which is too long. Design for 6-bark preload (264 KB / 100 KB/s = 2.6 s at cart-load, acceptable). |
| Bark overuse kills impact | Medium | Medium | Enforce bark budget in design reviews. Gameplay Design agent owns bark trigger criteria. QA agent validates frequency in playtesting. |
| PSG mixing clamp distortion | Low | Medium | Clamp is applied before I2S frame pack (section 2B). Volume scale defaults (1.0 for bark, 1.0 for PSG) may need adjustment during content authoring — provide vol field in barks.toml for per-bark tuning. |
| Scope creep toward full speech | Low | Medium | This spec explicitly caps barks at 1.0 second and 64 per cartridge. Longer speech, streaming playback, or multi-bark queuing are out of scope. The constraint is the feature. |
8. Success Criteria
- A recorded “BREACH” bark, converted to 22 kHz / 16-bit PCM and played through the emulator’s SDL audio path mixed alongside PSG output, is recognizable as the word “breach” to a listener on first hearing without accompanying text.
- During bark playback, all three YM2149 channels continue producing tones and noise without audible artifacts (clips, pops, pitch glitches).
- `kn86_bark_next_sample()` returns 0 with a single branch when no bark is active (zero overhead on the synthesis hot path).
- The `kn86bark` build tool converts a 44.1 kHz WAV to 22 kHz 16-bit PCM and the round-trip (record to convert to play in emulator) is completable in under 5 minutes.
- ICE Breaker’s `on-event` handler can trigger a bark with a single `(bark-play "CLEAN")` call, with no direct Pico protocol manipulation required by cartridge authors.
- Six bark files pre-loaded from the cartridge SD card at cart-load time fit within the Pico’s SRAM budget without displacing the audio buffer or OLED framebuffer.
Amendment Log
| Date | Author | Change | Reference |
|---|---|---|---|
| 2026-04-13 | Josh Schairbaum | Original spec (v0.1) — YM2149 Channel C amplitude DAC model, 4-bit/8 kHz, on-cart flash Bark Table. Status: PROPOSED. | — |
| 2026-04-27 | Platform Engineering agent | v0.2 rewrite for post-ADR-0017 architecture: replaced YM2149-DAC playback with Pico-side PCM mixer in audio_i2s.c synthesis loop; updated sample format to 16-bit/22 kHz; replaced flash Bark Table with SD filesystem sidecar layout per ADR-0019; added PCM wire protocol surface placeholder; added ADR cross-references; updated Status. | GWP-171a; docs/plans/sprints/2026-04-27-sprint4-gwp-171-design.md |