
PCM Voice Bark

Parent Documents:

Cross-references: see CLAUDE.md Canonical Hardware Specification for audio hardware values (MAX98357A, YM2149 register count, speaker spec). Do not restate those values here.


The Cipher voice is the KN-86’s narrative engine. It currently communicates exclusively through text rendered to the amber display — procedurally constructed sentences from domain word tables, appearing at mission transitions, debriefs, and critical state changes. The PSG provides tonal punctuation (alert stings, confirmation tones) but no vocal content.

This addendum proposes adding short PCM voice barks to the Cipher voice system. These are pre-recorded speech samples — single words or short phrases, stored as files on the cartridge SD card — played through the MAX98357A DAC/amplifier and mixed Pico-side alongside the YM2149 PSG output. The reference point is NES-era digitized speech: Double Dribble’s “DOUBLE DRIBBLE!”, Blades of Steel’s “FIGHT!”, Mike Tyson’s Punch-Out’s “BODY BLOW!”. But with a real DAC and 16-bit headroom, the result can be cleaner than those references while staying in the spirit: short, punched, unmistakable.

Why this matters: The Cipher voice is described as a “competent colleague” — terse, clipped, authoritative. Text on amber screen already sells this. But a barked word at a critical moment — “BREACH.” when ICE catches you, “CLEAN.” on a perfect extraction, “TRACED.” when Black Ledger finds the fraud — bridges the gap between reading a terminal message and feeling like someone is actually there. It’s the difference between seeing > CONTACT on screen and hearing the word punched through a 28mm speaker at the same time.

Design constraint: Barks supplement the text voice. They never replace it. The Cipher voice remains primarily textual. Barks fire at high-impact moments only — a few per session, not every screen transition. Overuse kills the effect.


2A. Post-ADR-0017 Audio Architecture Summary


ADR-0017 moved the entire audio stack off the Pi Zero 2 W and onto the Raspberry Pi Pico 2 (RP2350) coprocessor. The Pico owns:

  1. YM2149 PSG synthesis: kn86_psg_sample() runs on Pico core 1, producing int16_t samples at 44,100 Hz.
  2. I2S output to MAX98357A — a PIO state machine and chained DMA pair (audio_i2s.c) clocks 16-bit stereo frames (L=R, mono-replicated) into the MAX98357A at 44.1 kHz. The MAX98357A is a genuine DAC and Class-D amplifier; it accepts PCM directly.
  3. UART command link from Pi — the Pi sends PSG_REG_WRITE, PSG_BULK_WRITE, and PSG_RESET frames over 1 Mbps UART (see coprocessor-protocol.md); the Pico applies them before the next sample.

The YM2149-as-DAC trick described in the v0.1 spec (commandeering Channel C’s amplitude register to feed 4-bit samples at 8 kHz) was a workaround for Pi-side synthesis where the PSG emulator had to double as an output path. That constraint no longer exists. The MAX98357A accepts any PCM value the Pico puts in the I2S frame. PCM bark playback is a mixer addition, not a register-write hack.

Bark playback happens Pico-side, in the synthesis loop, by mixing a signed 16-bit PCM channel with the YM2149 PSG output before the combined sample is written into the I2S DMA buffer.

The synthesis loop in audio_i2s.c currently reads:

void kn86_audio_i2s_core1_synth_loop(void) {
    while (true) {
        /* Block until the DMA side reports which buffer half is free. */
        uint32_t half = multicore_fifo_pop_blocking();
        audio_frame_t *base = ...;
        for (uint32_t i = 0; i < HALF_FRAMES; i++) {
            uint16_t u = (uint16_t)kn86_psg_sample(g_psg);
            /* Mono sample replicated into both channels (L=R). */
            base[i] = ((uint32_t)u << 16) | (uint32_t)u;
        }
    }
}

When bark playback is active, the loop mixes a PCM sample alongside the PSG output before packing the I2S frame:

int16_t psg = kn86_psg_sample(g_psg);
int16_t bark = kn86_bark_next_sample(g_bark); /* returns 0 when idle */
int32_t mixed = (int32_t)psg + (int32_t)bark;
if (mixed > 32767) mixed = 32767;
if (mixed < -32768) mixed = -32768;
uint16_t u = (uint16_t)(int16_t)mixed;
base[i] = ((uint32_t)u << 16) | (uint32_t)u;

kn86_bark_next_sample() advances the bark read pointer and returns the next signed 16-bit sample, or zero when no bark is active. The three YM2149 channels (A, B, C) are completely unaffected — all three remain available throughout bark playback.

Why this is correct: The I2S DMA buffer holds 32-bit stereo frames and is fed by the Pico’s synthesis core continuously. The MAX98357A treats every frame as a direct PCM value. Mixing at the synthesis stage before the DMA buffer write is the natural and correct insertion point — zero protocol overhead, zero jitter, and the mix happens at the same 44.1 kHz rate as the PSG output.

Pi involvement: The Pi triggers a bark by sending a new UART command (see section 2E). The Pico’s UART handler (core 0) writes the bark parameters into a shared structure; core 1 reads them on the next synthesis iteration. The kn86_psg_state_t volatile discipline already in place covers this access pattern — bark state follows the same convention.
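The core 0 to core 1 handoff described above can be sketched with C11 atomics. This is a minimal illustration, not the firmware's actual code: bark_shared_t, bark_init(), bark_publish(), and bark_is_ready() are hypothetical names, and the sketch assumes the toolchain provides <stdatomic.h>. It covers only the idle-to-active case; retriggering while a bark is already playing would need extra care, such as a second descriptor.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical shared bark descriptor; field names are illustrative only. */
typedef struct {
    const int16_t *samples; /* preloaded PCM buffer */
    uint32_t       count;   /* number of samples in the buffer */
    uint32_t       hold;    /* output frames per sample: 2 at 22,050 Hz, 1 at 44,100 Hz */
    atomic_bool    active;  /* written last by core 0, read first by core 1 */
} bark_shared_t;

void bark_init(bark_shared_t *b)
{
    b->samples = 0;
    b->count = 0;
    b->hold = 1;
    atomic_init(&b->active, false);
}

/* Core 0 side: fill in the parameters, then publish them with a release store.
 * Precondition (assumed by this sketch): no bark is currently active. */
void bark_publish(bark_shared_t *b, const int16_t *samples,
                  uint32_t count, uint32_t sample_rate_hz)
{
    assert(!atomic_load_explicit(&b->active, memory_order_acquire));
    b->samples = samples;
    b->count   = count;
    b->hold    = (sample_rate_hz == 44100u) ? 1u : 2u;
    atomic_store_explicit(&b->active, true, memory_order_release);
}

/* Core 1 side: the acquire load pairs with the release store above, so the
 * other fields are fully written whenever this returns true. */
bool bark_is_ready(bark_shared_t *b)
{
    return atomic_load_explicit(&b->active, memory_order_acquire);
}
```

Writing the flag last is what lets the reader trust the other fields without a lock.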

| Property | Value | Rationale |
| --- | --- | --- |
| Bit depth | 16-bit signed PCM | Matches the I2S frame width and the PSG output type directly. The 4-bit constraint was a YM2149 artifact. |
| Sample rate | 22,050 Hz (preferred) or 44,100 Hz | 22 kHz is adequate for intelligible speech (10 kHz bandwidth), halves storage vs 44 kHz, and is an exact integer divisor of the 44.1 kHz synthesis rate. The Pico synthesis loop upsamples by holding each sample for two frames. |
| Channels | Mono | Single speaker. |
| Compression | None (raw 16-bit PCM) | SD has ample headroom; compression adds decoder complexity for no meaningful gain at bark durations. |
| Max duration | 1.0 second per bark | Design constraint, not technical limit. |
| Storage per second at 22 kHz | ~44,100 bytes | 22,050 samples x 2 bytes/sample. |
| Storage per second at 44 kHz | ~88,200 bytes | For barks that benefit from the full sample rate. |
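
The storage figures in the table follow from sample rate times duration times two bytes per sample. A small sketch of that arithmetic, with bark_bytes() as a hypothetical helper name:

```c
#include <stdint.h>

/* Bytes needed for a mono 16-bit PCM clip of the given length.
 * Hypothetical helper for illustration; not part of the nOSh API. */
uint32_t bark_bytes(uint32_t sample_rate_hz, uint32_t duration_ms)
{
    /* 64-bit intermediate so rate * milliseconds cannot overflow. */
    uint64_t samples = ((uint64_t)sample_rate_hz * duration_ms) / 1000u;
    return (uint32_t)(samples * sizeof(int16_t));
}
```

For a 1.0 second bark this gives 44,100 bytes at 22,050 Hz and 88,200 bytes at 44,100 Hz, matching the table rows.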

On the emulator: The desktop emulator’s sound.c SDL audio callback mixes PCM samples into its output buffer at the same insertion point — before writing the SDL audio frame. The 22 kHz upsampling step is identical: hold each bark sample for two 44.1 kHz SDL frames. Emulator and device playback are bit-equivalent.

2D. Cartridge SD Filesystem Layout (post-ADR-0019)


ADR-0019 replaced the flash-region storage model (ADR-0013 MBC5 SRAM, on-cart flash) with a full-size SD card mounted on the Pi as USB mass storage. There is no on-cart flash region and no bark table packed into a binary header at a known offset. Bark samples are ordinary files on the cartridge’s SD filesystem.

Directory layout:

<cart_root>/
  cart.kn86          -- the .kn86 container (Lisp source + static data + metadata)
  audio/
    barks/
      breach.pcm     -- raw 16-bit signed mono PCM, 22050 Hz, little-endian
      clean.pcm
      burned.pcm
      traced.pcm
      barks.toml     -- bark index (label -> filename, sample_rate, volume_scale)
  save/
    save.sav         -- per-cartridge save state (per ADR-0019 section 6)

barks.toml is a small plaintext index:

[[bark]]
label = "BREACH"
file = "audio/barks/breach.pcm"
rate = 22050
vol = 1.0
[[bark]]
label = "CLEAN"
file = "audio/barks/clean.pcm"
rate = 22050
vol = 1.0

The nOSh runtime reads barks.toml at cartridge load time and builds an in-memory bark table (labels to opened file descriptors or preloaded buffers). File I/O is standard open() / read() on the Pi side against the USB-MSC mounted SD.

Max bark count: 64 per cartridge (up from the v0.1 constraint of 16). The previous 16-slot limit was sized against the fixed 260-byte BarkTableHeader structure embedded in on-cart flash. With SD storage there is no fixed-size index, and 64 labels fits in a barks.toml under 2 KB. In practice, the design constraint (a few barks per session, no overuse) means most cartridges will carry 6-12 barks.
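
One way to picture the in-memory bark table is a fixed array with a label lookup. This is a sketch under assumptions: bark_entry_t, bark_table_t, bark_table_add(), and bark_table_find() are hypothetical names, the field sizes are illustrative, and parsing barks.toml itself is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BARK_MAX_COUNT 64 /* per-cartridge cap stated in this spec */

/* One row of barks.toml after parsing (the layout is an assumption). */
typedef struct {
    char     label[32];
    char     file[64];
    uint32_t rate; /* 22050 or 44100 */
    float    vol;  /* 0.0 to 1.0 */
} bark_entry_t;

typedef struct {
    bark_entry_t entries[BARK_MAX_COUNT];
    size_t       count;
} bark_table_t;

/* Append one entry. Returns 0 on success, -1 if the table is full or a
 * string does not fit its field. */
int bark_table_add(bark_table_t *t, const char *label, const char *file,
                   uint32_t rate, float vol)
{
    if (t->count >= BARK_MAX_COUNT) return -1;
    if (strlen(label) >= sizeof t->entries[0].label) return -1;
    if (strlen(file) >= sizeof t->entries[0].file) return -1;

    bark_entry_t *e = &t->entries[t->count++];
    strcpy(e->label, label); /* lengths were checked above */
    strcpy(e->file, file);
    e->rate = rate;
    e->vol = vol;
    return 0;
}

/* Linear search by label. Returns the entry index, or -1 if not found. */
int bark_table_find(const bark_table_t *t, const char *label)
{
    for (size_t i = 0; i < t->count; i++) {
        if (strcmp(t->entries[i].label, label) == 0) return (int)i;
    }
    return -1;
}
```

With at most 64 entries, a linear scan is likely sufficient; a hash table would add complexity for little benefit at this size.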

Total bark footprint: at 22 kHz / 16-bit / 1.0 sec max, each bark is 44,100 bytes (~43 KB). 64 barks at max duration come to 2,822,400 bytes (~2.7 MB), which is trivial on any SD card. No bark budget pressure.

No .kn86 container change: the .kn86 container (ADR-0006) does not need a bark_offset / bark_size field. Bark samples are sidecar files on the SD filesystem, not embedded in the container. The cart.kn86 file itself is unchanged.

The Pi loads and decodes bark metadata; the Pico synthesizes. When a bark triggers, the Pi must send the PCM data to the Pico for mixing into the I2S stream.

Proposed additions to coprocessor-protocol.md (to be added when implementation begins — flagged here, not fully specified):

| Frame type (proposed) | Direction | Semantics |
| --- | --- | --- |
| PCM_BUFFER_LOAD (type TBD, 0x40-0xDF reserved range) | Pi to Pico, fire-and-forget | Sends a chunk of 16-bit PCM payload. Multiple frames needed for a full bark; framed at 1 KB payload (matching MAX_FRAME_LEN). |
| PCM_PLAYBACK_START (type TBD) | Pi to Pico, fire-and-forget | Signals the Pico to begin mixing the pre-loaded PCM buffer. Payload: total sample count, sample rate, volume scale. |
| PCM_PLAYBACK_STOP (type TBD) | Pi to Pico, fire-and-forget | Immediately stops bark mixing, returns to PSG-only output. |
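
To make the chunking concrete, the sketch below splits a bark into payload chunks. It assumes the full 1 KB is available for PCM data in each PCM_BUFFER_LOAD frame; the real frame layout is not yet specified, so any header bytes would reduce that figure. PCM_CHUNK_BYTES, pcm_chunk_count(), and pcm_chunk_len() are hypothetical names.

```c
#include <stdint.h>

#define PCM_CHUNK_BYTES 1024u /* assumed PCM payload per frame (1 KB framing) */

/* Number of frames needed to carry total_bytes of PCM data. */
uint32_t pcm_chunk_count(uint32_t total_bytes)
{
    return total_bytes / PCM_CHUNK_BYTES
         + ((total_bytes % PCM_CHUNK_BYTES) != 0u ? 1u : 0u);
}

/* Payload length of chunk `index` (0-based); 0 when index is past the end. */
uint32_t pcm_chunk_len(uint32_t total_bytes, uint32_t index)
{
    if (index >= pcm_chunk_count(total_bytes)) return 0u;
    uint32_t remaining = total_bytes - index * PCM_CHUNK_BYTES;
    return remaining < PCM_CHUNK_BYTES ? remaining : PCM_CHUNK_BYTES;
}
```

A maximum-length 22 kHz bark (44,100 bytes) needs 44 frames under this assumption: 43 full chunks and a final chunk of 68 bytes.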

Alternative: full Pico-side file read. If the Pico gains shared-memory access to the SD (not currently planned), it could read bark files directly. The Pi-to-Pico-stream model is the practical path under the current architecture. End-to-end latency from trigger to first sample must fit within the <30 ms PSG-write-to-audible-tone budget established in coprocessor-protocol.md section 7.

Do not implement these frames in this PR. They are flagged here so the protocol spec author has the design intent when implementation begins.

GWP-171 (the parent task) is explicitly deferred until all three gate conditions clear (per the Sprint 4 design pack at docs/plans/sprints/2026-04-27-sprint4-gwp-171-design.md):

  1. ADR-0017 audio path measured on prototype. Stage 1c bring-up has captured real Pico to I2S to MAX98357A latency, jitter, and power figures.
  2. CIPHER-LINE OLED voice playtested. The text-voice surface (ADR-0015) is on the device and Josh has called its narrative weight sufficient or insufficient on its own.
  3. This spec is current. The rewrite in this document (GWP-171a) satisfies gate condition 3. Gates 1 and 2 remain open.

The callable interface is unchanged from v0.1 — cartridge authors still call stdlib_bark_play(g_state, "CLEAN"). The implementation behind the call changes: instead of writing a nibble into a PSG register, it triggers the Pi-side bark dispatch to the Pico.

/* ---- Voice Bark Playback ---- */

/* Play a bark by label. Returns false if label not found or no bark table. */
bool stdlib_bark_play(SystemState *state, const char *label);

/* Play a bark by index (0 to bark_count-1). Returns false if index out of range. */
bool stdlib_bark_play_index(SystemState *state, uint8_t index);

/* Stop any currently playing bark immediately. */
void stdlib_bark_stop(SystemState *state);

/* Is a bark currently playing? */
bool stdlib_bark_active(SystemState *state);

/* PCM bark playback state (inside RuntimeState) */
typedef struct {
    char     label[32];      /* Currently playing bark label, for logging */
    uint32_t total_samples;  /* Total sample count for current bark */
    uint32_t samples_sent;   /* Samples sent to Pico so far */
    uint16_t sample_rate;    /* Bark sample rate in Hz (22050 or 44100) */
    float    volume_scale;   /* 0.0 to 1.0 */
    bool     active;         /* true = bark is playing or being streamed */
} BarkPlayback;

; In cartridge Lisp source, declare bark triggers:
(on-event :node-compromised
  (fn (node)
    (bark-play "BREACH")
    ; Text still displays simultaneously:
    (text-puts 0 12 "> NETWORK COMPROMISED. NO TRACE.")))

The bark-play NoshAPI primitive maps to stdlib_bark_play(). Cartridge authors never touch the UART protocol or file I/O. The bark files live on the SD card as sidecar assets alongside the cart.kn86 container.


Each cartridge gets a domain-specific bark palette. These should be single words or two-word phrases, recorded with an authoritative, clipped delivery — like a military radio operator or an air traffic controller. Not conversational. Not friendly. Functional.

| Module | Proposed Barks | Trigger Context |
| --- | --- | --- |
| ICE Breaker | BREACH, CLEAN, BURNED, LOCKED, OPEN, TRACE, EXIT | ICE detection, extraction success/fail, node state changes |
| Depthcharge | CONTACT, DEPTH, SURFACE, LAUNCH, HIT, MISS, CLEAR | Sonar events, depth charge outcomes |
| Black Ledger | FRAUD, TRACED, VOID, FLAGGED, CLEAN, AUDIT | Transaction analysis results, audit outcomes |
| NeonGrid | PATROL, CLEAR, BLOCKED, ROUTE, BREACH | Guard detection, path validation |
| Cipher Garden | DECRYPT, LOCKED, KEY, MATCH, FAIL | Cipher operations, key verification |
| nOSh (firmware) | READY, LINK, SWAP, COMPLETE | Boot, cartridge swap, mission completion |

  • Source: Record at 44.1 kHz / 16-bit WAV.
  • Target: Downsample to 22 kHz / 16-bit via build tool before packing. The 4-bit encode step is removed — the MAX98357A takes 16-bit directly, so there is no forced quantization to 16 amplitude levels.
  • Delivery: Short, barked, declarative. Hard consonants (B, D, K, T, CH) cut through a 28mm speaker clearly; sibilants (S, SH, F, TH) are less punched. The 22 kHz path is far more forgiving than the old 4-bit/8 kHz pipeline — the lo-fi aesthetic is a deliberate choice, not a technical floor.
  • Processing: Compression / limiting to control dynamic range, high-pass filter at 200 Hz to remove room rumble. At 16-bit there is real dynamic headroom — choose the aesthetic deliberately rather than fighting quantization noise.
  • Duration target: 0.3-0.7 seconds per bark. Anything over 1.0 second should be cut. The bark should feel like a punch, not a sentence.
  • Tone: Not robotic. Not text-to-speech. A real human voice. The 28mm speaker through the Pelican case will impose its own character.

All three YM2149 channels (A, B, C) remain fully available during bark playback. The design rules:

  1. Barks mix with, not over, PSG output. The mixed signal is the arithmetic sum of PSG output and bark PCM, clamped to the 16-bit signed range. No PSG channel is silenced, hijacked, or interrupted.
  2. Bark volume is scaled before mix. The vol field in barks.toml lets cartridge authors balance bark loudness against the PSG background.
  3. One bark at a time. Triggering a new bark while one is playing stops the current bark and starts the new one immediately.
  4. No barks during LAMBDA playback. Macro replay should be silent — barks during fast replay would be cacophonous.
  5. SYS hold abort stops all barks. The emergency exit silences everything.
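
Rules 3 to 5 amount to a small trigger policy. The sketch below is one possible reading, with bark_policy_t and its functions as hypothetical names; how the runtime actually tracks LAMBDA replay or the SYS hold is not specified here. It also leaves open whether starting a LAMBDA replay should cut off a bark that is already playing.

```c
#include <stdbool.h>

/* Hypothetical Pi-side policy state; not part of the documented API. */
typedef struct {
    bool playing;       /* a bark is currently sounding */
    int  current;       /* index of that bark, -1 when idle */
    bool lambda_replay; /* true while a LAMBDA macro replay is running */
} bark_policy_t;

/* Request a bark. Returns true if it was started. */
bool bark_policy_trigger(bark_policy_t *p, int index)
{
    if (p->lambda_replay) {
        return false;   /* rule 4: macro replay stays silent */
    }
    p->current = index; /* rule 3: a new bark replaces the current one */
    p->playing = true;
    return true;
}

/* Rule 5: the SYS hold abort silences any bark. */
void bark_policy_sys_abort(bark_policy_t *p)
{
    p->playing = false;
    p->current = -1;
}
```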

5A. kn86bark — WAV to Bark Converter (revised)

kn86bark input.wav output.pcm [--rate 22050] [--normalize] [--preview]
  • Reads any WAV format (via dr_wav or similar single-header library)
  • Resamples to target rate (default 22050 Hz)
  • Outputs raw signed 16-bit PCM (little-endian), not packed nibbles
  • Optionally generates a TOML snippet for barks.toml
  • --preview flag plays back through SDL audio for quick listening

The forced nibble-packing step from v0.1 is removed. The output is standard 16-bit PCM.
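
For the default 44.1 kHz to 22.05 kHz case, the resample step is a 2:1 decimation. The sketch below averages adjacent sample pairs, which is a crude low-pass filter chosen here only for brevity; it is not a description of what kn86bark does, and a real converter would more likely use a proper anti-aliasing filter. decimate_by_two() is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Halve the sample rate by averaging adjacent pairs of samples.
 * `out` must have room for in_count / 2 samples. A trailing odd sample
 * is dropped. Returns the number of samples written. */
size_t decimate_by_two(const int16_t *in, size_t in_count, int16_t *out)
{
    size_t out_count = in_count / 2;
    for (size_t i = 0; i < out_count; i++) {
        /* 32-bit sum so two full-scale samples cannot overflow. */
        int32_t sum = (int32_t)in[2 * i] + (int32_t)in[2 * i + 1];
        out[i] = (int16_t)(sum / 2);
    }
    return out_count;
}
```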

Bark WAV assets live in the cartridge source tree under assets/barks/. The build step converts and places them into the SD card layout:

kn86bark_convert(
    BARKS
        assets/barks/breach.wav
        assets/barks/clean.wav
        assets/barks/burned.wav
    OUTPUT_DIR ${CMAKE_CURRENT_BINARY_DIR}/sd_root/audio/barks
    RATE 22050
)

This generates breach.pcm, clean.pcm, etc. and a barks.toml index, which are copied alongside cart.kn86 onto the SD card image during the pack step.


  1. Pi to Pico stream latency: Barks are triggered from cartridge Lisp during a cell evaluation. The Pi must stream PCM data to the Pico over UART before playback starts. At 1 Mbps (about 100 KB/s of payload) and 43 KB per bark (22 kHz / 1.0 sec max), the wire time is ~430 ms: shorter than a full-length bark, but more than ten times the <30 ms trigger-to-audio budget. Options: (a) pre-load bark data to the Pico at cart-load time (fits ~5 barks at 43 KB each in the Pico’s 520 KB SRAM budget); (b) stream on trigger and accept the latency for longer barks; (c) compress bark data (IMA ADPCM at 4:1 gives ~11 KB/bark, fitting ~5 barks at 1 sec each in under 65 KB). Recommend option (a) with a 6-bark preload budget and a compressed fallback. Resolve before implementation begins.
  2. Pico SRAM budget: At 22 kHz / 16-bit / 1.0 sec, one bark is 44,100 bytes. Six pre-loaded barks = ~264 KB. The Pico has 520 KB total SRAM; the existing audio buffer (g_audio_buf) is 2 KB; PSG state is ~100 bytes; OLED framebuffer (SSD1322 256x64 at 4bpp) is 8 KB. Roughly 260 KB headroom for bark preload. This is workable but tight. Validate against Pico memory map during bring-up.
  3. Wire format for PCM_BUFFER_LOAD: The 1 KB MAX_FRAME_LEN cap means a 44 KB bark requires ~44 frames. The Pi must send all frames before PCM_PLAYBACK_START. Propose a sequence number and total-frame-count handshake in the payload, matching the pattern of the now-obsolete CART_READ_BANK_DATA. Full protocol spec deferred to coprocessor-protocol.md.
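
The latency and SRAM arithmetic in the items above can be checked with two small helpers. This is a sketch under stated assumptions: the 100,000 bytes/s figure assumes 10 bits on the wire per byte (8N1 framing) at 1 Mbps, and uart_wire_ms() and barks_that_fit() are hypothetical names.

```c
#include <stdint.h>

#define UART_BYTES_PER_SEC 100000u /* assumed: 1 Mbps with 8N1 framing */

/* Milliseconds needed to move `bytes` over the UART, ignoring frame overhead. */
uint32_t uart_wire_ms(uint32_t bytes)
{
    return (uint32_t)(((uint64_t)bytes * 1000u) / UART_BYTES_PER_SEC);
}

/* How many barks of bark_size bytes fit in a preload budget. */
uint32_t barks_that_fit(uint32_t budget_bytes, uint32_t bark_size)
{
    return bark_size != 0u ? budget_bytes / bark_size : 0u;
}
```

One full-length bark (44,100 bytes) takes about 441 ms on the wire, and six take about 2.6 s. Whether six fit in the headroom depends on how "260 KB" is counted: 260,000 bytes holds five full-length barks, while 266,240 bytes (260 x 1024) holds six. That margin is thin enough to justify measuring against the real memory map.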

  1. kn86_bark_next_sample() in the synthesis loop: This function must be zero-cost when inactive (a single, predictable branch on bark_state.active). When active, it reads from the pre-loaded buffer through a sample index. For a 22 kHz bark the index advances once every two output frames (each sample is held for two frames); for a 44.1 kHz bark it advances once per output frame. After the last sample the function returns 0. No locking is needed if bark_state writes from core 0 are aligned-word stores (RP2350 guarantees atomic aligned stores up to 32 bits), provided core 0 writes the active flag last so core 1 never sees a half-filled structure.
  2. Volume mixing clamp: The mixed output is psg + bark * vol_scale. At max PSG output (~32767) plus max bark (~32767), the sum saturates. The clamp-to-int16 must be applied before packing the I2S frame. This is already the natural location (see section 2B code sketch).
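
The two items above can be sketched as a sample reader with a hold counter. This is an illustration only: bark_state_t, bark_start(), and bark_next_sample() are hypothetical stand-ins for the firmware's real types, and the float volume multiply is an assumption (a fixed-point scale would work as well).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical Pico-side playback state; field names are illustrative. */
typedef struct {
    const int16_t *samples; /* preloaded PCM buffer */
    uint32_t       count;   /* samples in the buffer */
    uint32_t       pos;     /* index of the sample being played */
    uint8_t        hold;    /* output frames per sample: 2 at 22,050 Hz, 1 at 44,100 Hz */
    uint8_t        held;    /* frames already emitted for the current sample */
    float          vol;     /* 0.0 to 1.0 */
    bool           active;
} bark_state_t;

/* Arm playback. Returns false and stays idle if there is nothing to play. */
bool bark_start(bark_state_t *b, const int16_t *samples, uint32_t count,
                uint32_t sample_rate_hz, float vol)
{
    b->active = false;
    if (samples == 0 || count == 0u) return false;
    if (vol < 0.0f) vol = 0.0f;
    if (vol > 1.0f) vol = 1.0f;
    b->samples = samples;
    b->count = count;
    b->pos = 0u;
    b->hold = (sample_rate_hz == 44100u) ? 1u : 2u;
    b->held = 0u;
    b->vol = vol;
    b->active = true;
    return true;
}

/* Called once per 44.1 kHz output frame from the synthesis loop. */
int16_t bark_next_sample(bark_state_t *b)
{
    if (!b->active) {
        return 0; /* idle path: a single branch */
    }
    /* vol is clamped to [0, 1], so the scaled value stays within int16_t. */
    int16_t out = (int16_t)((float)b->samples[b->pos] * b->vol);
    b->held++;
    if (b->held >= b->hold) { /* finished holding this sample */
        b->held = 0u;
        b->pos++;
        if (b->pos >= b->count) b->active = false;
    }
    return out;
}
```

The returned value would then be summed with the PSG sample and clamped to the 16-bit range before the I2S frame is packed.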

  1. Bark frequency per session: How many barks per 30-minute session feel right before the novelty wears off? Propose a “bark budget” per mission type (e.g., max 3 barks per single-phase contract, max 6 per multi-phase campaign).
  2. Bark selection determinism: Should bark choice be LFSR-driven (same seed = same bark at same moment) or event-driven (always play “BREACH” on ICE detection regardless of seed)? The event-driven model is more intuitive for authors and aligns with how barks are labeled.
  3. Bare deck barks: Should the nOSh runtime have its own bark table for boot, cartridge swap, and mission board events? If so, firmware barks live in the device root filesystem, not on a cartridge SD card.

  1. Audio quality acceptance criteria: At 22 kHz / 16-bit the intelligibility bar is substantially higher than the old 4-bit/8 kHz standard. Define: “recognizable as the word on first hearing, without accompanying text.” The Double Dribble standard was the floor; 22 kHz should clear it easily.
  2. Mixing regression: Bark playback must not introduce audible artifacts in PSG output when the bark is playing alongside PSG tones. Test: run all three YM2149 channels during a bark; verify no clipping, phase artifacts, or dropout.
  3. Emulator/device parity: Bark playback must sound identical on emulator (SDL audio callback) and device (Pico I2S path). The mixing arithmetic is identical; verify by recording both and comparing waveforms.

| Risk | Likelihood | Impact | Mitigation |
| --- | --- | --- | --- |
| Pico SRAM too tight for bark preload | Medium | Medium | Compress barks (IMA ADPCM 4:1 = ~11 KB/bark); or reduce max preloaded barks to 4. Measure at bring-up. |
| UART stream latency makes bark feel late | Medium | High | Pre-load at cart-load time. Preloading a full 64-bark cartridge would take 64 x 44 KB / 100 KB/s = ~28 s, which is too long. Design for 6-bark preload (264 KB / 100 KB/s = 2.6 s at cart-load, acceptable). |
| Bark overuse kills impact | Medium | Medium | Enforce bark budget in design reviews. Gameplay Design agent owns bark trigger criteria. QA agent validates frequency in playtesting. |
| PSG mixing clamp distortion | Low | Medium | Clamp is applied before I2S frame pack (section 2B). Volume scale defaults (1.0 for bark, 1.0 for PSG) may need adjustment during content authoring; the vol field in barks.toml allows per-bark tuning. |
| Scope creep toward full speech | Low | Medium | This spec explicitly caps barks at 1.0 second and 64 per cartridge. Longer speech, streaming playback, or multi-bark queuing are out of scope. The constraint is the feature. |

  1. A recorded “BREACH” bark, converted to 22 kHz / 16-bit PCM and played through the emulator’s SDL audio path mixed alongside PSG output, is recognizable as the word “breach” to a listener on first hearing without accompanying text.
  2. During bark playback, all three YM2149 channels continue producing tones and noise without audible artifacts (clips, pops, pitch glitches).
  3. kn86_bark_next_sample() returns 0 with a single branch when no bark is active — zero overhead on the synthesis hot path.
  4. The kn86bark build tool converts a 44.1 kHz WAV to 22 kHz 16-bit PCM and the round-trip (record to convert to play in emulator) is completable in under 5 minutes.
  5. ICE Breaker’s on-event handler can trigger a bark with a single (bark-play "CLEAN") call — no direct Pico protocol manipulation required by cartridge authors.
  6. Six bark files pre-loaded from the cartridge SD card at cart-load time fit within the Pico’s SRAM budget without displacing the audio buffer or OLED framebuffer.

| Date | Author | Change | Reference |
| --- | --- | --- | --- |
| 2026-04-13 | Josh Schairbaum | Original spec (v0.1): YM2149 Channel C amplitude DAC model, 4-bit/8 kHz, on-cart flash Bark Table. Status: PROPOSED. | |
| 2026-04-27 | Platform Engineering agent | v0.2 rewrite for post-ADR-0017 architecture: replaced YM2149-DAC playback with Pico-side PCM mixer in audio_i2s.c synthesis loop; updated sample format to 16-bit/22 kHz; replaced flash Bark Table with SD filesystem sidecar layout per ADR-0019; added PCM wire protocol surface placeholder; added ADR cross-references; updated Status. | GWP-171a; docs/plans/sprints/2026-04-27-sprint4-gwp-171-design.md |