Description
What would you like to be added?
A formatForSpeech() function in packages/core/src/voice/responseFormatter.ts that converts markdown/ANSI-formatted tool output and LLM responses into speech-clean plain text for TTS playback in voice mode.
Required transformations:
- Strip all markdown syntax (bold, italic, code fences, blockquotes, headings, links, lists)
- Strip ANSI escape codes
- Abbreviate deep absolute paths (/a/b/c/d/e/file.ts:142 → …/e/file.ts line 142)
- Summarise large JSON blobs as (JSON object with N keys)
- Collapse stack traces to first line + (and N more frames)
- Truncate outputs above ~500 chars with a … (N chars total) suffix
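A minimal sketch of the last four transformations is given below, to make the intended behaviour concrete. All helper names (abbreviatePaths, summariseJson, collapseStackTrace, truncateForSpeech) are hypothetical and not part of the proposal. The default path depth of 2 is inferred from the example above, the 200-character threshold for "large" JSON is an assumption, and the stack-trace helper assumes "first line" means the error message line, with every `at ...` frame counted in N.

```typescript
// Hypothetical helpers; only formatForSpeech() is named in the proposal.

/** Shorten deep absolute paths: /a/b/c/d/e/file.ts:142 becomes …/e/file.ts line 142. */
function abbreviatePaths(text: string, pathDepth = 2): string {
  // prefix: start of text or a character that cannot precede an absolute path
  // (this skips URLs and relative paths); path: two or more "/segment" parts;
  // optional ":line" and ":col" suffix, of which only the line number is kept.
  const pattern = /(^|[^\w\/.])((?:\/[\w.@-]+){2,})(?::(\d+)(?::\d+)?)?/g;
  return text.replace(
    pattern,
    (_match: string, prefix: string, path: string, line?: string) => {
      const segments = path.split("/").filter((s) => s.length > 0);
      const short =
        segments.length > pathDepth
          ? "…/" + segments.slice(-pathDepth).join("/")
          : path;
      return prefix + (line ? `${short} line ${line}` : short);
    },
  );
}

/** Replace a large JSON object with a spoken summary (assumed threshold: 200 chars). */
function summariseJson(text: string, minLength = 200): string {
  const trimmed = text.trim();
  if (trimmed.length < minLength || trimmed.charAt(0) !== "{") return text;
  try {
    const parsed: unknown = JSON.parse(trimmed);
    if (parsed !== null && typeof parsed === "object" && !Array.isArray(parsed)) {
      return `(JSON object with ${Object.keys(parsed).length} keys)`;
    }
  } catch {
    // Not valid JSON: leave the text untouched.
  }
  return text;
}

/** Replace each run of "at ..." frame lines with a single frame count. */
function collapseStackTrace(text: string): string {
  const out: string[] = [];
  let frames = 0;
  const flush = (): void => {
    if (frames > 0) out.push(`(and ${frames} more frames)`);
    frames = 0;
  };
  for (const line of text.split("\n")) {
    if (/^\s*at\s+\S/.test(line)) {
      frames += 1;
    } else {
      flush();
      out.push(line);
    }
  }
  flush();
  return out.join("\n");
}

/** Cut long output and append the total length as a spoken hint. */
function truncateForSpeech(text: string, maxLength = 500): string {
  if (text.length <= maxLength) return text;
  const head = text.slice(0, maxLength).replace(/\s+$/, "");
  return `${head}… (${text.length} chars total)`;
}
```

In a full formatter these would probably run in the order stack traces, JSON, paths, truncation, so that the character count is taken on already-shortened text. That ordering is a design guess, not something the proposal specifies.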
Proposed API:
export function formatForSpeech(
  text: string,
  options?: { maxLength?: number; pathDepth?: number },
): string;

Why is this needed?
The voice mode architecture skeleton (PR #20779) introduced VoiceModeController and the core interfaces. Before any text can be streamed to TTS, it must be converted to speech-clean plain text.
Every existing tool result and LLM response is formatted for visual rendering. A response like:
**Error**: `ENOENT: no such file or directory`
> at /home/user/project/packages/core/src/tools/file-utils.ts:142:7
would be read aloud as:
"asterisk asterisk Error asterisk asterisk backtick E N O E N T colon no such file or directory backtick greater-than at slash home slash user slash..."
Without this formatter, VoiceModeService cannot produce natural-sounding output. It is a hard prerequisite for the Live API streaming integration.
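To illustrate the problem described above, here is a rough sketch of the first two required transformations (markdown and ANSI stripping) applied to a response of that shape. The helper names stripAnsi and stripMarkdown are hypothetical, the regular expressions cover only common cases (CSI escape sequences, asterisk-based emphasis, single-line constructs), and a real implementation might prefer a proper markdown parser.

```typescript
// Hypothetical helpers; regex-based and deliberately incomplete.

// Matches CSI sequences such as "\x1b[31m" (colour) or "\x1b[0m" (reset).
const ANSI_CSI = /\x1b\[[0-9;]*[A-Za-z]/g;

function stripAnsi(text: string): string {
  return text.replace(ANSI_CSI, "");
}

function stripMarkdown(text: string): string {
  return text
    .replace(/^```.*$/gm, "") // code fence lines (fenced content is kept)
    .replace(/`([^`]*)`/g, "$1") // inline code
    .replace(/!?\[([^\]]*)\]\([^)]*\)/g, "$1") // links and images: keep the label
    .replace(/^[ \t]{0,3}#{1,6}[ \t]+/gm, "") // heading markers
    .replace(/^[ \t]{0,3}>[ \t]?/gm, "") // blockquote markers
    .replace(/^[ \t]*(?:[-*+]|\d+\.)[ \t]+/gm, "") // list markers
    .replace(/\*\*(\S(?:.*?\S)?)\*\*/g, "$1") // bold (asterisk form only)
    .replace(/\*(\S(?:.*?\S)?)\*/g, "$1"); // italic (asterisk form only)
}

// A response shaped like the example above, wrapped in ANSI colour codes.
const raw =
  "\x1b[31m**Error**: `ENOENT: no such file or directory`\x1b[0m\n" +
  "> at /home/user/file.ts:142:7";

const clean = stripMarkdown(stripAnsi(raw));
console.log(clean);
// Error: ENOENT: no such file or directory
// at /home/user/file.ts:142:7
```

Underscore emphasis is left alone on purpose in this sketch, since stripping it naively would damage identifiers such as snake_case names in tool output.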
Additional context
- Architecture skeleton PR: feat(cli): introduce experimental voice mode architecture skeleton #20779
- RFC/Architecture discussion: [RFC] Architecture Proposal: Hands-Free Multimodal Voice Mode (GSoC 2026) #20456
- GSoC 2026 Project 11: Hands-Free Multimodal Voice Mode
Files to create:
- packages/core/src/voice/responseFormatter.ts
- packages/core/src/voice/responseFormatter.test.ts