PCM Voice Bark

Parent Documents:

docs/adr/ADR-0017-realtime-io-coprocessor.md — Pico 2 coprocessor; Pico owns YM2149 synthesis with I2S out to MAX98357A
docs/adr/ADR-0019-cartridge-storage-and-form-factor.md — Cartridges are full-size SD cards via USB mass storage; no on-cart flash regions
docs/adr/ADR-0015-cipher-line-auxiliary-display.md — CIPHER-LINE auxiliary OLED; CIPHER voice is text-primary
docs/software/api-reference/grammars/coprocessor-protocol.md — Pi to Pico UART wire-format spec (canonical frame types post-ADR-0019)

Cross-references: see CLAUDE.md Canonical Hardware Specification for audio hardware values (MAX98357A, YM2149 register count, speaker spec). Do not restate those values here.

1. Motivation

The Cipher voice is the KN-86’s narrative engine. It currently communicates exclusively through text rendered to the amber display — procedurally constructed sentences from domain word tables, appearing at mission transitions, debriefs, and critical state changes. The PSG provides tonal punctuation (alert stings, confirmation tones) but no vocal content.

This addendum proposes adding short PCM voice barks to the Cipher voice system. These are pre-recorded speech samples — single words or short phrases, stored as files on the cartridge SD card — played through the MAX98357A DAC/amplifier and mixed Pico-side alongside the YM2149 PSG output. The reference point is NES-era digitized speech: Double Dribble’s “DOUBLE DRIBBLE!”, Blades of Steel’s “FIGHT!”, Mike Tyson’s Punch-Out’s “BODY BLOW!”. But with a real DAC and 16-bit headroom, the result can be cleaner than those references while staying in the spirit: short, punched, unmistakable.

Why this matters: The Cipher voice is described as a “competent colleague” (terse, clipped, authoritative). Text on the amber screen already sells this. But a barked word at a critical moment, such as “BREACH.” when ICE catches you, “CLEAN.” on a perfect extraction, or “TRACED.” when Black Ledger finds the fraud, bridges the gap between reading a terminal message and feeling like someone is actually there. It’s the difference between seeing > CONTACT on screen and hearing the word punched through a 28mm speaker at the same time.

Design constraint: Barks supplement the text voice. They never replace it. The Cipher voice remains primarily textual. Barks fire at high-impact moments only — a few per session, not every screen transition. Overuse kills the effect.

2. Technical Approach

2A. Post-ADR-0017 Audio Architecture Summary

ADR-0017 moved the entire audio stack off the Pi Zero 2 W and onto the Raspberry Pi Pico 2 (RP2350) coprocessor. The Pico owns:

YM2149 PSG synthesis — kn86_psg_sample() runs on Pico core 1, producing int16_t samples at 44,100 Hz.
I2S output to MAX98357A — a PIO state machine and chained DMA pair (audio_i2s.c) clocks 16-bit stereo frames (L=R, mono-replicated) into the MAX98357A at 44.1 kHz. The MAX98357A is a genuine DAC and Class-D amplifier; it accepts PCM directly.
UART command link from Pi — the Pi sends PSG_REG_WRITE, PSG_BULK_WRITE, and PSG_RESET frames over 1 Mbps UART (see coprocessor-protocol.md); the Pico applies them before the next sample.

The YM2149-as-DAC trick described in the v0.1 spec (commandeering Channel C’s amplitude register to feed 4-bit samples at 8 kHz) was a workaround for Pi-side synthesis where the PSG emulator had to double as an output path. That constraint no longer exists. The MAX98357A accepts any PCM value the Pico puts in the I2S frame. PCM bark playback is a mixer addition, not a register-write hack.

2B. Playback Mechanism (post-ADR-0017)

Bark playback happens Pico-side, in the synthesis loop, by mixing a signed 16-bit PCM channel with the YM2149 PSG output before the combined sample is written into the I2S DMA buffer.

The synthesis loop in audio_i2s.c currently reads:

void kn86_audio_i2s_core1_synth_loop(void) {
    while (true) {
        uint32_t half = multicore_fifo_pop_blocking();
        audio_frame_t *base = ...;
        for (uint32_t i = 0; i < HALF_FRAMES; i++) {
            uint16_t u = (uint16_t)kn86_psg_sample(g_psg);
            base[i] = ((uint32_t)u << 16) | (uint32_t)u;
        }
    }
}

When bark playback is active, the loop mixes a PCM sample alongside the PSG output before packing the I2S frame:

int16_t psg  = kn86_psg_sample(g_psg);
int16_t bark = kn86_bark_next_sample(g_bark);   /* returns 0 when idle */
int32_t mixed = (int32_t)psg + (int32_t)bark;
if (mixed >  32767) mixed =  32767;
if (mixed < -32768) mixed = -32768;
uint16_t u = (uint16_t)(int16_t)mixed;
base[i] = ((uint32_t)u << 16) | (uint32_t)u;

kn86_bark_next_sample() advances the bark read pointer and returns the next signed 16-bit sample, or zero when no bark is active. The sketch omits the per-bark volume scale (section 4C), which is applied to the bark sample before the sum. The three YM2149 channels (A, B, C) are completely unaffected; all three remain available throughout bark playback.

Why this is correct: The I2S DMA buffer holds 32-bit stereo frames and is fed by the Pico’s synthesis core continuously. The MAX98357A treats every frame as a direct PCM value. Mixing at the synthesis stage before the DMA buffer write is the natural and correct insertion point — zero protocol overhead, zero jitter, and the mix happens at the same 44.1 kHz rate as the PSG output.

Pi involvement: The Pi triggers a bark by sending a new UART command (see section 2E). The Pico’s UART handler (core 0) writes the bark parameters into a shared structure; core 1 reads them on the next synthesis iteration. The kn86_psg_state_t volatile discipline already in place covers this access pattern — bark state follows the same convention.

2C. Sample Format

Property	Value	Rationale
Bit depth	16-bit signed PCM	Matches the I2S frame width and the PSG output type directly. The 4-bit constraint was a YM2149 artifact.
Sample rate	22,050 Hz (preferred) or 44,100 Hz	22.05 kHz is adequate for intelligible speech (bandwidth up to the 11.025 kHz Nyquist limit), halves storage vs 44.1 kHz, and is an exact integer divisor of the 44.1 kHz synthesis rate. The Pico synthesis loop upsamples by holding each sample for two frames.
Channels	Mono	Single speaker.
Compression	None (raw 16-bit PCM)	SD has ample headroom; compression adds decoder complexity for no meaningful gain at bark durations.
Max duration	1.0 second per bark	Design constraint, not technical limit.
Storage per second at 22 kHz	44,100 bytes	22,050 samples x 2 bytes/sample.
Storage per second at 44 kHz	88,200 bytes	44,100 samples x 2 bytes/sample. For barks that benefit from the full sample rate.

On the emulator: The desktop emulator’s sound.c SDL audio callback mixes PCM samples into its output buffer at the same insertion point, before writing the SDL audio frame. The 22 kHz upsampling step is identical: hold each bark sample for two 44.1 kHz SDL frames. Because the arithmetic is the same, emulator and device output should be bit-equivalent; the parity check in section 6 verifies this.

2D. Cartridge SD Filesystem Layout (post-ADR-0019)

ADR-0019 replaced the flash-region storage model (ADR-0013 MBC5 SRAM, on-cart flash) with a full-size SD card mounted on the Pi as USB mass storage. There is no on-cart flash region, no bark-table packed into a binary header at a known offset. Bark samples are ordinary files on the cartridge’s SD filesystem.

Directory layout:

<cart_root>/
  cart.kn86            -- the .kn86 container (Lisp source + static data + metadata)
  audio/
    barks/
      breach.pcm       -- raw 16-bit signed mono PCM, 22050 Hz, little-endian
      clean.pcm
      burned.pcm
      traced.pcm
    barks.toml         -- bark index (label -> filename, sample_rate, volume_scale)
  save/
    save.sav           -- per-cartridge save state (per ADR-0019 section 6)

barks.toml is a small plaintext index:

[[bark]]
label = "BREACH"
file  = "audio/barks/breach.pcm"
rate  = 22050
vol   = 1.0

[[bark]]
label = "CLEAN"
file  = "audio/barks/clean.pcm"
rate  = 22050
vol   = 1.0

The nOSh runtime reads barks.toml at cartridge load time and builds an in-memory bark table (labels to opened file descriptors or preloaded buffers). File I/O is standard open() / read() on the Pi side against the USB-MSC mounted SD.
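The in-memory bark table described above could look something like the following. This is a minimal sketch, not the nOSh implementation: the type names, field sizes, and the `bark_table_add` / `bark_table_find` helpers are all hypothetical, and TOML parsing itself is left out (the caller is assumed to have already extracted each entry's fields).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BARK_MAX_COUNT 64            /* per-cartridge cap from this spec */

/* Hypothetical in-memory form of one [[bark]] entry from barks.toml. */
typedef struct {
    char     label[32];
    char     file[64];
    uint32_t rate;                   /* 22050 or 44100 */
    float    vol;                    /* 0.0 to 1.0 */
} bark_entry_t;

typedef struct {
    bark_entry_t entries[BARK_MAX_COUNT];
    size_t       count;
} bark_table_t;

/* Append one entry. Returns false if the table is full or a string
 * does not fit its fixed-size field. */
static inline bool bark_table_add(bark_table_t *t, const char *label,
                                  const char *file, uint32_t rate, float vol)
{
    assert(t != NULL && label != NULL && file != NULL);
    if (t->count >= BARK_MAX_COUNT) return false;
    if (strlen(label) >= sizeof t->entries[0].label) return false;
    if (strlen(file)  >= sizeof t->entries[0].file)  return false;

    bark_entry_t *e = &t->entries[t->count++];
    strcpy(e->label, label);         /* lengths checked above */
    strcpy(e->file, file);
    e->rate = rate;
    e->vol  = vol;
    return true;
}

/* Linear search by label. Returns the entry index, or -1 if absent. */
static inline int bark_table_find(const bark_table_t *t, const char *label)
{
    assert(t != NULL && label != NULL);
    for (size_t i = 0; i < t->count; i++) {
        if (strcmp(t->entries[i].label, label) == 0) return (int)i;
    }
    return -1;
}
```

With at most 64 entries, a linear scan is likely sufficient; a lookup by label then yields the same index that an index-based play call would take.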

Max bark count: 64 per cartridge (up from the v0.1 constraint of 16). The previous 16-slot limit was sized against the fixed 260-byte BarkTableHeader structure embedded in on-cart flash. With SD storage there is no fixed-size index, and 64 entries in the barks.toml format shown above come to roughly 5.5 KB. In practice, the design constraint (a few barks per session, no overuse) means most cartridges will carry 6-12 barks.

Total bark footprint: at 22 kHz / 16-bit / 1.0 sec max, each bark is 44,100 bytes (~43 KiB). 64 barks at max duration = 2,822,400 bytes (~2.7 MiB), trivial on any SD card. No bark budget pressure.
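The footprint arithmetic can be checked with a few lines of C. The helper names `bark_bytes` and `cart_bark_bytes` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed for one mono 16-bit bark at the given rate and duration. */
static inline uint32_t bark_bytes(uint32_t sample_rate_hz, uint32_t duration_ms)
{
    assert(duration_ms <= 1000u);    /* this spec caps barks at 1.0 s */
    uint32_t samples = (sample_rate_hz * duration_ms) / 1000u;
    return samples * (uint32_t)sizeof(int16_t);
}

/* Worst case for a cartridge carrying `count` barks of maximum length. */
static inline uint32_t cart_bark_bytes(uint32_t count, uint32_t sample_rate_hz)
{
    return count * bark_bytes(sample_rate_hz, 1000u);
}
```

For example, `cart_bark_bytes(64, 22050)` evaluates to 2,822,400 bytes, and a typical 0.5 s bark at 22,050 Hz comes to 22,050 bytes.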

No .kn86 container change: the .kn86 container (ADR-0006) does not need a bark_offset / bark_size field. Bark samples are sidecar files on the SD filesystem, not embedded in the container. The cart.kn86 file itself is unchanged.

2E. Wire Protocol Surface (Pi to Pico)

The Pi loads and decodes bark metadata; the Pico synthesizes. When a bark triggers, the Pi must send the PCM data to the Pico for mixing into the I2S stream.

Proposed additions to coprocessor-protocol.md (to be added when implementation begins — flagged here, not fully specified):

Frame type (proposed)	Direction	Semantics
`PCM_BUFFER_LOAD` (type TBD, 0x40-0xDF reserved range)	Pi to Pico, fire-and-forget	Sends a chunk of 16-bit PCM payload. Multiple frames needed for a full bark; framed at 1 KB payload (matching `MAX_FRAME_LEN`).
`PCM_PLAYBACK_START` (type TBD)	Pi to Pico, fire-and-forget	Signals the Pico to begin mixing the pre-loaded PCM buffer. Payload: total sample count, sample rate, volume scale.
`PCM_PLAYBACK_STOP` (type TBD)	Pi to Pico, fire-and-forget	Immediately stops bark mixing, returns to PSG-only output.

Alternative: full Pico-side file read. If the Pico gains shared-memory access to the SD (not currently planned), it could read bark files directly. The Pi-to-Pico-stream model is the practical path under the current architecture. End-to-end latency from trigger to first sample must fit within the <30 ms PSG-write-to-audible-tone budget established in coprocessor-protocol.md section 7.
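To see how the stream model compares with that latency budget, the wire time can be estimated as below. This is a rough sketch under stated assumptions: 10 bit-times per payload byte (8N1 framing, which this document does not specify), no allowance for frame headers or inter-frame gaps, and a hypothetical helper name `uart_wire_ms`.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 8N1 framing, so each payload byte costs 10 bit-times. */
#define UART_BITS_PER_BYTE 10u

/* Milliseconds, rounded up, to move `bytes` over a link running at
 * `baud` bits per second. Frame headers are ignored. */
static inline uint32_t uart_wire_ms(uint32_t bytes, uint32_t baud)
{
    assert(baud > 0u);
    uint64_t bits = (uint64_t)bytes * UART_BITS_PER_BYTE;
    return (uint32_t)((bits * 1000u + baud - 1u) / baud);
}
```

Under these assumptions a full 1.0 s bark at 22 kHz (44,100 bytes) needs about 441 ms at 1 Mbps, while only about 3,000 bytes (roughly 68 ms of 22 kHz audio) can cross the link inside a 30 ms window.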

Do not implement these frames in this PR. They are flagged here so the protocol spec author has the design intent when implementation begins.

2F. Implementation Gate

GWP-171 (the parent task) is explicitly deferred until all three gate conditions clear (per the Sprint 4 design pack at docs/plans/sprints/2026-04-27-sprint4-gwp-171-design.md):

ADR-0017 audio path measured on prototype. Stage 1c bring-up has captured real Pico to I2S to MAX98357A latency, jitter, and power figures.
CIPHER-LINE OLED voice playtested. The text-voice surface (ADR-0015) is on the device and Josh has called its narrative weight sufficient or insufficient on its own.
This spec is current. The rewrite in this document (GWP-171a) satisfies gate condition 3. Gates 1 and 2 remain open.

3. Software Interface

3A. Stdlib Addition (`nosh_stdlib.h`)

The callable interface is unchanged from v0.1 — cartridge authors still call stdlib_bark_play(g_state, "CLEAN"). The implementation behind the call changes: instead of writing a nibble into a PSG register, it triggers the Pi-side bark dispatch to the Pico.

/* ---- Voice Bark Playback ---- */

/* Play a bark by label. Returns false if label not found or no bark table. */
bool stdlib_bark_play(SystemState *state, const char *label);

/* Play a bark by index (0 to bark_count-1). Returns false if index out of range. */
bool stdlib_bark_play_index(SystemState *state, uint8_t index);

/* Stop any currently playing bark immediately. */
void stdlib_bark_stop(SystemState *state);

/* Is a bark currently playing? */
bool stdlib_bark_active(SystemState *state);

3B. Runtime State Addition (`types.h`)

/* PCM bark playback state (inside RuntimeState) */
typedef struct {
    char     label[32];        /* Currently playing bark label, for logging */
    uint32_t total_samples;    /* Total sample count for current bark */
    uint32_t samples_sent;     /* Samples sent to Pico so far */
    uint16_t sample_rate;      /* Bark sample rate in Hz (22050 or 44100) */
    float    volume_scale;     /* 0.0 to 1.0 */
    bool     active;           /* true = bark is playing or being streamed */
} BarkPlayback;

3C. Cartridge Authoring

; In cartridge Lisp source, declare bark triggers:
(on-event :node-compromised
  (fn (node)
    (bark-play "BREACH")
    ; Text still displays simultaneously:
    (text-puts 0 12 "> NETWORK COMPROMISED. NO TRACE.")))

The bark-play NoshAPI primitive maps to stdlib_bark_play(). Cartridge authors never touch the UART protocol or file I/O. The bark files live on the SD card as sidecar assets alongside the cart.kn86 container.

4. Content Design

4A. Bark Vocabulary per Module

Each cartridge gets a domain-specific bark palette. These should be single words or two-word phrases, recorded with an authoritative, clipped delivery — like a military radio operator or an air traffic controller. Not conversational. Not friendly. Functional.

Module	Proposed Barks	Trigger Context
ICE Breaker	BREACH, CLEAN, BURNED, LOCKED, OPEN, TRACE, EXIT	ICE detection, extraction success/fail, node state changes
Depthcharge	CONTACT, DEPTH, SURFACE, LAUNCH, HIT, MISS, CLEAR	Sonar events, depth charge outcomes
Black Ledger	FRAUD, TRACED, VOID, FLAGGED, CLEAN, AUDIT	Transaction analysis results, audit outcomes
NeonGrid	PATROL, CLEAR, BLOCKED, ROUTE, BREACH	Guard detection, path validation
Cipher Garden	DECRYPT, LOCKED, KEY, MATCH, FAIL	Cipher operations, key verification
nOSh (firmware)	READY, LINK, SWAP, COMPLETE	Boot, cartridge swap, mission completion

4B. Recording Guidelines

Source: Record at 44.1 kHz / 16-bit WAV.
Target: Downsample to 22 kHz / 16-bit via build tool before packing. The 4-bit encode step is removed — the MAX98357A takes 16-bit directly, so there is no forced quantization to 16 amplitude levels.
Delivery: Short, barked, declarative. Hard consonants (B, D, K, T, CH) cut through a 28mm speaker clearly; sibilants and other fricatives (S, SH, F, TH) carry less punch. The 22 kHz path is far more forgiving than the old 4-bit/8 kHz pipeline, so the lo-fi aesthetic is a deliberate choice, not a technical floor.
Processing: Compression / limiting to control dynamic range, high-pass filter at 200 Hz to remove room rumble. At 16-bit there is real dynamic headroom — choose the aesthetic deliberately rather than fighting quantization noise.
Duration target: 0.3-0.7 seconds per bark. Anything over 1.0 second should be cut. The bark should feel like a punch, not a sentence.
Tone: Not robotic. Not text-to-speech. A real human voice. The 28mm speaker through the Pelican case will impose its own character.

4C. Interaction with Existing Audio

All three YM2149 channels (A, B, C) remain fully available during bark playback. The design rules:

Barks mix with, not over, PSG output. The mixed signal is the arithmetic sum of PSG output and bark PCM, clamped to the 16-bit signed range. No PSG channel is silenced, hijacked, or interrupted.
Bark volume is scaled before mix. The vol field in barks.toml lets cartridge authors balance bark loudness against the PSG background.
One bark at a time. Triggering a new bark while one is playing stops the current bark and starts the new one immediately.
No barks during LAMBDA playback. Macro replay should be silent — barks during fast replay would be cacophonous.
SYS hold abort stops all barks. The emergency exit silences everything.
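The first two rules above (scale the bark, then sum and clamp) might be implemented along these lines. This is a sketch, not the contents of audio_i2s.c: the helper names `vol_to_q15` and `mix_bark` are hypothetical, and the choice of Q15 fixed point is an assumption made here to keep the per-sample path integer-only.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a 0.0..1.0 volume (the vol field) to Q15, where 32768 is unity.
 * Intended to run once per bark, not once per sample. */
static inline int32_t vol_to_q15(float vol)
{
    assert(vol >= 0.0f && vol <= 1.0f);
    return (int32_t)(vol * 32768.0f + 0.5f);
}

/* Scale the bark sample, add the PSG sample, clamp to the int16_t range. */
static inline int16_t mix_bark(int16_t psg, int16_t bark, int32_t vol_q15)
{
    /* Division rather than a right shift: shifting a negative value
     * is implementation-defined in C. */
    int32_t scaled = ((int32_t)bark * vol_q15) / 32768;
    int32_t mixed  = (int32_t)psg + scaled;
    if (mixed > INT16_MAX) mixed = INT16_MAX;
    if (mixed < INT16_MIN) mixed = INT16_MIN;
    return (int16_t)mixed;
}
```

With a bark sample of 0 the function returns the PSG sample unchanged, which matches the rule that PSG output is never altered by an idle bark channel.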

5. Build Tooling

5A. `kn86bark` — WAV to Bark Converter (revised)

kn86bark input.wav output.pcm [--rate 22050] [--normalize] [--preview]

Reads any WAV format (via dr_wav or similar single-header library)
Resamples to target rate (default 22050 Hz)
Outputs raw signed 16-bit PCM (little-endian), not packed nibbles
Optionally generates a TOML snippet for barks.toml
--preview flag plays back through SDL audio for quick listening

The forced nibble-packing step from v0.1 is removed. The output is standard 16-bit PCM.
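For the common 44.1 kHz to 22.05 kHz case, the resample step reduces to a 2:1 decimation. The sketch below averages adjacent sample pairs, which acts as a crude two-tap low-pass filter. It is a stand-in for illustration only: this document does not say which resampling method kn86bark uses, a production tool would likely want a better anti-aliasing filter, and `decimate_by_2` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Halve the sample rate by averaging each adjacent pair of samples.
 * Writes in_count / 2 samples to `out` and returns that count.
 * A trailing odd sample is dropped. */
static inline size_t decimate_by_2(const int16_t *in, size_t in_count,
                                   int16_t *out)
{
    assert(in != NULL && out != NULL);
    size_t out_count = in_count / 2;
    for (size_t i = 0; i < out_count; i++) {
        /* Sum in 32 bits so two full-scale samples cannot overflow. */
        int32_t sum = (int32_t)in[2 * i] + (int32_t)in[2 * i + 1];
        out[i] = (int16_t)(sum / 2);
    }
    return out_count;
}
```

The output buffer needs room for `in_count / 2` samples; writing it to disk as little-endian 16-bit values gives the raw .pcm layout described in section 2D.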

5B. Cartridge Build Integration

Bark WAV assets live in the cartridge source tree under assets/barks/. The build step converts and places them into the SD card layout:

kn86bark_convert(
    BARKS
        assets/barks/breach.wav
        assets/barks/clean.wav
        assets/barks/burned.wav
    OUTPUT_DIR ${CMAKE_CURRENT_BINARY_DIR}/sd_root/audio/barks
    RATE 22050
)

This generates breach.pcm, clean.pcm, etc. and a barks.toml index, which are copied alongside cart.kn86 onto the SD card image during the pack step.

6. Questions for Agent Review

For Platform Engineering Agent:

Pi to Pico stream latency: Barks are triggered from cartridge Lisp during a cell evaluation. The Pi must stream PCM data to the Pico over UART before playback starts. At 1 Mbps (about 100 KB/s on the wire) and 43 KB per bark (22 kHz / 1.0 sec max), the wire time is ~430 ms. That is under half the bark's own duration, but more than ten times the <30 ms trigger-to-first-sample budget in section 2E if the whole bark must arrive before playback starts. Options: (a) pre-load bark data to the Pico at cart-load time (fits ~5 barks at 43 KB each in the Pico’s 520 KB SRAM budget); (b) stream on trigger and accept the latency for longer barks; (c) compress bark data (IMA ADPCM at 4:1 gives ~11 KB/bark, fitting ~5 barks at 1 sec each in under 65 KB). Recommend option (a) with a 6-bark preload budget and a compressed fallback. Resolve before implementation begins.
Pico SRAM budget: At 22 kHz / 16-bit / 1.0 sec, one bark is 44,100 bytes. Six pre-loaded barks = ~264 KB. The Pico has 520 KB total SRAM; the existing audio buffer (g_audio_buf) is 2 KB; PSG state is ~100 bytes; OLED framebuffer (SSD1322 256x64 at 4bpp) is 8 KB. Roughly 260 KB headroom for bark preload. This is workable but tight. Validate against Pico memory map during bring-up.
Wire format for PCM_BUFFER_LOAD: The 1 KB MAX_FRAME_LEN cap means a 44 KB bark requires ~44 frames. The Pi must send all frames before PCM_PLAYBACK_START. Propose a sequence number and total-frame-count handshake in the payload, matching the pattern of the now-obsolete CART_READ_BANK_DATA. Full protocol spec deferred to coprocessor-protocol.md.
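One possible shape for the sequence-number and total-frame-count idea is sketched below. Everything here is hypothetical: the header layout, the field widths, and the assumption that the full 1,024 bytes are available for PCM payload (if the header counts against MAX_FRAME_LEN, the usable payload per frame shrinks accordingly). The real layout belongs in coprocessor-protocol.md.

```c
#include <assert.h>
#include <stdint.h>

#define PCM_CHUNK_BYTES 1024u   /* assumed PCM payload per frame */

/* Hypothetical per-frame header for a PCM_BUFFER_LOAD payload. */
typedef struct {
    uint16_t seq;     /* 0-based chunk index */
    uint16_t total;   /* total chunks in this bark */
    uint16_t len;     /* PCM bytes in this chunk, at most PCM_CHUNK_BYTES */
} pcm_chunk_hdr_t;

/* Number of chunks needed for a bark of `bark_bytes` bytes (rounds up). */
static inline uint16_t pcm_chunk_count(uint32_t bark_bytes)
{
    return (uint16_t)((bark_bytes + PCM_CHUNK_BYTES - 1u) / PCM_CHUNK_BYTES);
}

/* Build the header for chunk `seq`; the last chunk may be short. */
static inline pcm_chunk_hdr_t pcm_chunk_hdr(uint32_t bark_bytes, uint16_t seq)
{
    uint16_t total = pcm_chunk_count(bark_bytes);
    assert(seq < total);
    uint32_t remaining = bark_bytes - (uint32_t)seq * PCM_CHUNK_BYTES;
    pcm_chunk_hdr_t h = {
        .seq   = seq,
        .total = total,
        .len   = (uint16_t)(remaining < PCM_CHUNK_BYTES ? remaining
                                                        : PCM_CHUNK_BYTES),
    };
    return h;
}
```

Carrying `total` in every chunk lets the receiver detect a missing or out-of-order chunk without a separate announce frame, at the cost of two redundant bytes per frame.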

For C Engineer Agent:

kn86_bark_next_sample() in the synthesis loop: This function must be near-zero cost when inactive (a single, predictable branch on bark_state.active). When active, it reads from the pre-loaded buffer with a counter and returns 0 after the last sample. The read index advances once every two output frames for a 22 kHz bark (each sample is held for two frames) or once per output frame for a 44.1 kHz bark. No locking is needed if bark_state writes from core 0 are aligned-word stores (RP2350 guarantees atomic aligned stores up to 32 bits). Atomicity applies to each field separately, so core 0 should write the active flag last, with a memory barrier before it, so that core 1 never sees a half-written state.
Volume mixing clamp: The mixed output is psg + bark * vol_scale. At max PSG output (~32767) plus max bark (~32767), the sum saturates. The clamp-to-int16 must be applied before packing the I2S frame. This is already the natural location (see section 2B code sketch).
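A minimal single-threaded sketch of the sample-hold read described above follows. The struct layout and the names `bark_chan_t`, `bark_start`, and `bark_next_sample` are hypothetical, and cross-core publication (barriers, write ordering from core 0) is deliberately not handled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bark channel state. */
typedef struct {
    const int16_t *buf;     /* preloaded PCM samples */
    uint32_t       len;     /* sample count */
    uint32_t       pos;     /* next sample to read */
    uint8_t        hold;    /* output frames per sample: 2 at 22.05 kHz, 1 at 44.1 kHz */
    uint8_t        phase;   /* frames already emitted for the current sample */
    volatile bool  active;
} bark_chan_t;

/* Arm the channel. Starting a new bark simply replaces the current one. */
static inline void bark_start(bark_chan_t *b, const int16_t *buf,
                              uint32_t len, uint32_t rate_hz)
{
    assert(b != NULL && buf != NULL && len > 0u);
    assert(rate_hz == 22050u || rate_hz == 44100u);
    b->buf   = buf;
    b->len   = len;
    b->pos   = 0;
    b->phase = 0;
    b->hold  = (rate_hz == 22050u) ? 2u : 1u;
    b->active = true;       /* set last, once the other fields are valid */
}

/* Called once per 44.1 kHz output frame. Returns 0 when idle. */
static inline int16_t bark_next_sample(bark_chan_t *b)
{
    if (!b->active) return 0;            /* idle path: one branch */
    int16_t s = b->buf[b->pos];
    if (++b->phase >= b->hold) {         /* held long enough, move on */
        b->phase = 0;
        if (++b->pos >= b->len) b->active = false;
    }
    return s;
}
```

A two-sample bark at 22,050 Hz therefore produces four output frames (each sample twice) and then zeros, which is the behavior the emulator's SDL callback would need to reproduce for parity.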

For Gameplay Design Agent:

Bark frequency per session: How many barks per 30-minute session feels right before the novelty wears off? Propose a “bark budget” per mission type (e.g., max 3 barks per single-phase contract, max 6 per multi-phase campaign).
Bark selection determinism: Should bark choice be LFSR-driven (same seed = same bark at same moment) or event-driven (always play “BREACH” on ICE detection regardless of seed)? The event-driven model is more intuitive for authors and aligns with how barks are labeled.
Bare deck barks: Should the nOSh runtime have its own bark table for boot, cartridge swap, and mission board events? If so, firmware barks live in the device root filesystem, not on a cartridge SD card.

For QA Agent:

Audio quality acceptance criteria: At 22 kHz / 16-bit the intelligibility bar is substantially higher than the old 4-bit/8 kHz standard. Define: “recognizable as the word on first hearing, without accompanying text.” The Double Dribble standard was the floor; 22 kHz should clear it easily.
Mixing regression: Bark playback must not introduce audible artifacts in PSG output when the bark is playing alongside PSG tones. Test: run all three YM2149 channels during a bark; verify no clipping, phase artifacts, or dropout.
Emulator/device parity: Bark playback must sound identical on emulator (SDL audio callback) and device (Pico I2S path). The mixing arithmetic is identical; verify by recording both and comparing waveforms.

7. Risk Assessment

Risk	Likelihood	Impact	Mitigation
Pico SRAM too tight for bark preload	Medium	Medium	Compress barks (IMA ADPCM 4:1 = ~11 KB/bark); or reduce max preloaded barks to 4. Measure at bring-up.
UART stream latency makes bark feel late	Medium	High	Pre-load at cart-load time. Preloading a full 64-bark cartridge would take 64 x 44 KB / 100 KB/s = ~28 s, which is too long. Design for 6-bark preload (264 KB / 100 KB/s = 2.6 s at cart-load, acceptable).
Bark overuse kills impact	Medium	Medium	Enforce bark budget in design reviews. Gameplay Design agent owns bark trigger criteria. QA agent validates frequency in playtesting.
PSG mixing clamp distortion	Low	Medium	Clamp is applied before I2S frame pack (section 2B). Volume scale defaults (1.0 for bark, 1.0 for PSG) may need adjustment during content authoring — provide `vol` field in `barks.toml` for per-bark tuning.
Scope creep toward full speech	Low	Medium	This spec explicitly caps barks at 1.0 second and 64 per cartridge. Longer speech, streaming playback, or multi-bark queuing are out of scope. The constraint is the feature.

8. Success Criteria

A recorded “BREACH” bark, converted to 22 kHz / 16-bit PCM and played through the emulator’s SDL audio path mixed alongside PSG output, is recognizable as the word “breach” to a listener on first hearing without accompanying text.
During bark playback, all three YM2149 channels continue producing tones and noise without audible artifacts (clips, pops, pitch glitches).
kn86_bark_next_sample() returns 0 with a single branch when no bark is active — zero overhead on the synthesis hot path.
The kn86bark build tool converts a 44.1 kHz WAV to 22 kHz 16-bit PCM and the round-trip (record to convert to play in emulator) is completable in under 5 minutes.
ICE Breaker’s on-event handler can trigger a bark with a single (bark-play "CLEAN") call — no direct Pico protocol manipulation required by cartridge authors.
Six bark files pre-loaded from the cartridge SD card at cart-load time fit within the Pico’s SRAM budget without displacing the audio buffer or OLED framebuffer.
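The last criterion can be expressed as a simple budget check. The constants below are the figures quoted in section 6; the `reserve` parameter stands for everything this document does not itemize (stacks, UART buffers, other firmware state) and must come from measurement at bring-up. The helper name `preload_fits` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PICO_SRAM_BYTES  (520u * 1024u)      /* total SRAM, per section 6 */
#define AUDIO_BUF_BYTES  (2u * 1024u)        /* g_audio_buf */
#define OLED_FB_BYTES    (256u * 64u / 2u)   /* SSD1322 256x64 at 4 bpp */
#define PSG_STATE_BYTES  100u                /* approximate */

/* True if `count` barks of `bytes_each` fit alongside the fixed buffers
 * while leaving `reserve` bytes free for everything not modeled here. */
static inline bool preload_fits(uint32_t count, uint32_t bytes_each,
                                uint32_t reserve)
{
    uint32_t fixed = AUDIO_BUF_BYTES + OLED_FB_BYTES + PSG_STATE_BYTES + reserve;
    assert(fixed <= PICO_SRAM_BYTES);
    return (uint64_t)count * bytes_each <= (uint64_t)(PICO_SRAM_BYTES - fixed);
}
```

With a measured reserve plugged in, `preload_fits(6, 44100, reserve)` gives a pass/fail answer for the six-bark target, and the same call with 11,025 bytes per bark covers the 4:1 ADPCM fallback.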

Amendment Log

Date	Author	Change	Reference
2026-04-13	Josh Schairbaum	Original spec (v0.1) — YM2149 Channel C amplitude DAC model, 4-bit/8 kHz, on-cart flash Bark Table. Status: PROPOSED.	—
2026-04-27	Platform Engineering agent	v0.2 rewrite for post-ADR-0017 architecture: replaced YM2149-DAC playback with Pico-side PCM mixer in `audio_i2s.c` synthesis loop; updated sample format to 16-bit/22 kHz; replaced flash Bark Table with SD filesystem sidecar layout per ADR-0019; added PCM wire protocol surface placeholder; added ADR cross-references; updated Status.	GWP-171a; `docs/plans/sprints/2026-04-27-sprint4-gwp-171-design.md`