feat(voice): implement speech-friendly response formatter #20989

Merged
spencer426 merged 18 commits into google-gemini:main from ayush31010:fix/voice-response-formatter
Mar 10, 2026

Conversation

@ayush31010 ayush31010 commented Mar 3, 2026

Summary

Closes #20985
Related: #20779 (voice mode skeleton), #20456 (RFC)

  • Adds packages/core/src/voice/responseFormatter.ts with formatForSpeech() — a pure-TypeScript utility that converts markdown/ANSI-formatted output into speech-clean plain text for TTS playback in voice mode
  • Adds 33 unit tests covering every transformation and edge case
  • Zero new runtime dependencies

Transformations (applied in order)

| Input | Speech output |
| --- | --- |
| \x1b[31mError\x1b[0m | Error |
| ```json\n{...}\n``` | (JSON object with N keys) |
| Multi-frame Node.js stack trace | First line + (and N more frames) |
| **bold**, *italic*, `code` | Text only, delimiters stripped |
| > blockquote, # Heading, [link](url) | Text only |
| /home/user/project/src/tools/file.ts:142 | …/src/tools/file.ts line 142 |
| Output > 500 chars | Truncated + … (N chars total) |
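
As a rough illustration of the JSON row above, the summary could be produced along these lines. This is a minimal sketch, not the code from the PR: the function name `summariseJsonSketch` and the reading of `jsonThreshold` as a character count are assumptions made for the example.

```typescript
// Hypothetical helper: name, signature, and threshold semantics are assumptions.
function summariseJsonSketch(body: string, jsonThreshold: number): string {
  let parsed: unknown;
  try {
    parsed = JSON.parse(body);
  } catch {
    return body; // not valid JSON: leave the content unchanged
  }
  // Assumption: short JSON is spoken as-is; only content longer than the
  // threshold (measured in characters) is replaced by a summary.
  if (body.length <= jsonThreshold) return body;
  if (Array.isArray(parsed)) {
    return `(JSON array with ${parsed.length} items)`;
  }
  if (parsed !== null && typeof parsed === 'object') {
    return `(JSON object with ${Object.keys(parsed).length} keys)`;
  }
  return body; // JSON primitives (numbers, strings) are left alone
}
```

Counting only top-level keys keeps the spoken summary short; nested structure is deliberately ignored in this sketch.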

Test plan

npx vitest run packages/core/src/voice/responseFormatter.test.ts
# ✓ 33 tests passed

Adds packages/core/src/voice/responseFormatter.ts with a
formatForSpeech() function that converts markdown/ANSI-formatted tool
output and LLM responses into speech-clean plain text suitable for
TTS playback in voice mode.

Transformations applied (in order):
1. Strip ANSI escape codes (color, bold, cursor movement)
2. Unwrap fenced code blocks; summarise large JSON content as
   "(JSON object with N keys)" / "(JSON array with N items)"
3. Collapse Node.js stack traces to first frame + "(and N more frames)"
4. Strip markdown syntax: bold, italic, inline code, blockquotes,
   headings, links, unordered/ordered list markers
5. Abbreviate deep absolute Unix and Windows paths to last pathDepth
   segments prefixed with "…"; convert ":142" suffixes to "line 142"
6. Normalise whitespace (collapse excess blank lines, trim)
7. Truncate to maxLength with "… (N chars total)" suffix
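
Steps 1, 6, and 7 of the list above are simple enough to sketch in a few lines. The following is an illustrative approximation under stated assumptions, not the merged implementation: all names ending in `Sketch` are hypothetical, the ANSI pattern covers only CSI sequences, and N in the suffix is assumed to be the length of the untruncated text.

```typescript
// Matches CSI sequences such as colour codes (ESC[31m) and cursor
// movement (ESC[2A). Other escape families are not handled here.
const ANSI_CSI_SKETCH = /\x1b\[[0-9;?]*[A-Za-z]/g;

function stripAnsiSketch(text: string): string {
  return text.replace(ANSI_CSI_SKETCH, '');
}

// Collapse runs of three or more newlines to one blank line, then trim.
function normaliseWhitespaceSketch(text: string): string {
  return text.replace(/\n{3,}/g, '\n\n').trim();
}

// Assumption: N in "(N chars total)" is the length before truncation.
function truncateSketch(text: string, maxLength: number): string {
  if (text.length <= maxLength) return text;
  return `${text.slice(0, maxLength).trimEnd()}… (${text.length} chars total)`;
}

// Apply the three steps in the order given in the commit message.
function pipelineSketch(text: string, maxLength = 500): string {
  return truncateSketch(
    normaliseWhitespaceSketch(stripAnsiSketch(text)),
    maxLength,
  );
}
```

Running truncation last means the length limit applies to what will actually be spoken, not to the raw input with escape codes still in it.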

Public API:
  formatForSpeech(text, options?) → string
  options: { maxLength?, pathDepth?, jsonThreshold? }

All defaults chosen to produce natural-sounding output from typical
tool results without requiring any new runtime dependencies.
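
For orientation, the options object could be resolved against defaults roughly as follows. This is a sketch: the three field names come from the description above, but the interface name suffix, the helper `resolveOptionsSketch`, and the default values for `pathDepth` and `jsonThreshold` are assumptions (only the 500-character limit appears in the PR summary table, and a depth of 3 is merely consistent with the path example there).

```typescript
// Field names follow the PR description; everything else is illustrative.
interface FormatForSpeechOptionsSketch {
  maxLength?: number;
  pathDepth?: number;
  jsonThreshold?: number;
}

const DEFAULTS_SKETCH: Required<FormatForSpeechOptionsSketch> = {
  maxLength: 500, // the summary table mentions truncation above 500 chars
  pathDepth: 3, // assumption, consistent with "…/src/tools/file.ts"
  jsonThreshold: 80, // assumption: placeholder value
};

// Use ?? per field so an explicit `undefined` still falls back to the default.
function resolveOptionsSketch(
  options: FormatForSpeechOptionsSketch = {},
): Required<FormatForSpeechOptionsSketch> {
  return {
    maxLength: options.maxLength ?? DEFAULTS_SKETCH.maxLength,
    pathDepth: options.pathDepth ?? DEFAULTS_SKETCH.pathDepth,
    jsonThreshold: options.jsonThreshold ?? DEFAULTS_SKETCH.jsonThreshold,
  };
}
```

A caller would then override a single field, for example `resolveOptionsSketch({ maxLength: 120 })`, and get the remaining defaults filled in.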

Closes google-gemini#20985

Related: google-gemini#20779 (voice mode skeleton), google-gemini#20456 (RFC)
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new utility function, formatForSpeech, designed to prepare complex, formatted text for clear and concise text-to-speech (TTS) playback within the voice mode. It systematically cleans up input by removing visual formatting like ANSI escape codes and markdown, summarizing verbose content such as large JSON objects and stack traces, and abbreviating file paths, ensuring that spoken output is streamlined and easily understandable for an optimal voice experience.

Highlights

  • New Speech-Friendly Response Formatter: Implemented the formatForSpeech() utility in packages/core/src/voice/responseFormatter.ts to convert markdown and ANSI-formatted output into plain text suitable for text-to-speech (TTS) playback.
  • Comprehensive Unit Testing: Added 33 unit tests covering every transformation and edge case for the formatForSpeech utility, ensuring robustness and correctness.
  • Zero New Dependencies: The new utility introduces no additional runtime dependencies, maintaining a lightweight footprint.
  • Key Transformations: The formatter applies several transformations including stripping ANSI escape codes, summarizing large JSON objects, collapsing multi-frame stack traces, stripping various markdown elements (bold, italic, code, blockquotes, headings, links, lists), abbreviating deep file paths, normalizing whitespace, and truncating long outputs.
Changelog
  • packages/core/src/voice/responseFormatter.test.ts
    • Added a new test file to validate the formatForSpeech utility.
    • Included tests for edge cases like empty input and plain text.
    • Verified correct stripping of ANSI color, bold, and cursor codes.
    • Tested markdown stripping for bold, italic, inline code, blockquotes, headings, links, and list markers.
    • Ensured proper unwrapping and summarization of fenced code blocks, including JSON objects and arrays.
    • Validated path abbreviation for Unix and Windows paths, including line number conversion.
    • Confirmed stack trace collapsing functionality.
    • Tested truncation behavior for long outputs.
    • Included tests for whitespace normalization.
    • Provided real-world examples to demonstrate end-to-end cleaning.
  • packages/core/src/voice/responseFormatter.ts
    • Added a new utility file containing the formatForSpeech function.
    • Defined FormatForSpeechOptions interface for configurable maxLength, pathDepth, and jsonThreshold.
    • Implemented regular expressions for identifying and processing ANSI codes, markdown elements, code blocks, stack traces, and file paths.
    • Developed abbreviatePath helper to shorten file paths and format line numbers.
    • Created summariseJson helper to provide concise descriptions for large JSON structures.
    • Implemented the main formatForSpeech logic, applying transformations in a specific order: ANSI stripping, code block processing, stack trace collapsing, markdown stripping, path abbreviation, whitespace normalization, and truncation.
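
The path abbreviation described in the changelog can be pictured with a sketch like the one below. It is an approximation, not the `abbreviatePath` helper from the PR: the name `abbreviatePathSketch`, its signature, and the handling of short paths are assumptions for illustration, and `pathDepth` is assumed to be at least 1.

```typescript
// Hypothetical stand-in for the abbreviatePath helper; assumes pathDepth >= 1.
function abbreviatePathSketch(fullPath: string, pathDepth: number): string {
  // Split off an optional ":<line>" suffix such as ":142".
  const lineMatch = /:(\d+)$/.exec(fullPath);
  const pathOnly = lineMatch ? fullPath.slice(0, lineMatch.index) : fullPath;
  const lineSuffix = lineMatch ? ` line ${lineMatch[1]}` : '';

  // Pick the separator from the path itself so Windows paths also work.
  const sep = pathOnly.includes('\\') ? '\\' : '/';
  const segments = pathOnly.split(sep).filter((s) => s.length > 0);

  // Paths that are already short are left unabbreviated.
  if (segments.length <= pathDepth) return pathOnly + lineSuffix;
  return `…${sep}${segments.slice(-pathDepth).join(sep)}${lineSuffix}`;
}
```

With a depth of 3, the example from the PR summary, `/home/user/project/src/tools/file.ts:142`, becomes `…/src/tools/file.ts line 142`.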

@gemini-cli gemini-cli bot added the area/core label (Issues related to User Interface, OS Support, Core Functionality) Mar 3, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a valuable utility for converting formatted text into a speech-friendly format. While no security vulnerabilities were found, critical bugs were identified in the implementation related to Windows path handling and stack trace collapsing, which can lead to incorrect or malformed output. A less severe issue with markdown parsing for bold/italic text was also noted. Detailed suggestions and code snippets are provided to address these, along with recommendations for additional test cases to prevent future regressions.

Ayush Debnath added 2 commits March 3, 2026 22:48
The WIN_PATH_RE replacement callback previously hardcoded 'C:\' as the
path prefix, so paths on any other drive (D:\, E:\, etc.) were silently
reconstructed as C: paths and abbreviated incorrectly.

Fix: capture the full path (drive letter included) as a single regex
group, matching the pattern already used by UNIX_PATH_RE, and pass it
directly to abbreviatePath() without any manual prefix concatenation.

Also adds two missing test cases identified in review:
- Windows path abbreviation on a non-C drive (D:\...)
- Stack trace collapsing preserves surrounding text before and after
  the trace frames

Addresses review feedback on google-gemini#20989
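
The drive-letter fix described in this commit can be sketched as follows. The real `WIN_PATH_RE` is not shown in this conversation, so the pattern and the function name `shortenWindowsPathsSketch` below are illustrative assumptions; the point is only that the drive letter is part of the captured group instead of being re-added as a hardcoded `C:\`.

```typescript
// Illustrative pattern: the whole path, drive letter included, is one group.
const WIN_PATH_SKETCH = /\b([A-Za-z]:\\(?:[\w.@-]+\\)*[\w.@-]+)/g;

// Assumes pathDepth >= 1.
function shortenWindowsPathsSketch(text: string, pathDepth: number): string {
  return text.replace(WIN_PATH_SKETCH, (_match, fullPath: string) => {
    const segments = fullPath.split('\\');
    // segments[0] is the drive, e.g. "D:"; short paths are left alone.
    if (segments.length - 1 <= pathDepth) return fullPath;
    // No prefix is concatenated by hand, so the drive letter cannot be
    // silently rewritten to a different one.
    return `…\\${segments.slice(-pathDepth).join('\\')}`;
  });
}
```

For example, `D:\work\repo\src\index.ts` with a depth of 2 yields `…\src\index.ts`, and a short path such as `C:\a\b.txt` is returned unchanged.
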
@gemini-cli gemini-cli bot added the priority/p3 (Backlog - a good idea but not currently a priority.) and help wanted (We will accept PRs from all issues marked as "help wanted". Thanks for your support!) labels Mar 4, 2026
@spencer426 spencer426 self-requested a review March 7, 2026 04:57
Ayush Debnath added 2 commits March 9, 2026 12:43
Scoped package paths like @google/gemini-cli-core were only matched up
to the @ character, producing broken TTS output. Adding @ to the
character class fixes the match and a test case is added to cover it.
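
To see why a missing `@` cuts the match short, compare two simplified path patterns. These are illustrative stand-ins, not the regular expressions from the PR; the names `SEGMENT_WITHOUT_AT`, `SEGMENT_WITH_AT`, and `firstPathMatchSketch` are hypothetical.

```typescript
// Same pattern twice; only the character class for a path segment differs.
const SEGMENT_WITHOUT_AT = /\/(?:[\w.-]+\/)+[\w.-]+/g;
const SEGMENT_WITH_AT = /\/(?:[\w.@-]+\/)+[\w.@-]+/g;

// Return the first path-like match in the text, or null if there is none.
function firstPathMatchSketch(re: RegExp, text: string): string | null {
  const matches = text.match(re);
  return matches ? matches[0] : null;
}

const scopedPath = '/repo/node_modules/@google/gemini-cli-core/dist/index.js';

// Without "@" the match stops just before the scoped segment.
const truncated = firstPathMatchSketch(SEGMENT_WITHOUT_AT, scopedPath);
// With "@" the whole path is matched as one unit.
const complete = firstPathMatchSketch(SEGMENT_WITH_AT, scopedPath);
```

Here `truncated` is `/repo/node_modules` while `complete` is the full path, which is why only the second form can be abbreviated as a single path for speech.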

Contributor

@mrpmohiburrahman mrpmohiburrahman left a comment

LGTM

Contributor

@spencer426 spencer426 left a comment

Automated Code Review

Contributor

@spencer426 spencer426 left a comment

Here are a few specific issues with the implementation that need to be addressed.

Contributor

@spencer426 spencer426 left a comment

Missing Export Rule

- Update copyright year to 2026 in responseFormatter.ts and test file
- Fix BOLD_ITALIC_RE to exclude newlines, preventing cross-line matches
  that could consume list markers before they are stripped
- Fix stack trace collapsing to replace frames in-place (STACK_BLOCK_RE)
  instead of stripping all frames and appending summary at end, which
  was mangling text that followed the trace
- Export formatForSpeech and FormatForSpeechOptions from packages/core index
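
Two of the fixes in this commit can be illustrated with a short sketch. The patterns below are assumptions written for this example, not the actual `BOLD_ITALIC_RE` or `STACK_BLOCK_RE` from the PR, and the function names are hypothetical.

```typescript
// Emphasis delimiters: excluding "\n" from the inner class keeps the match
// on one line, so "* item" list markers on adjacent lines are not paired up.
const BOLD_ITALIC_SKETCH = /(\*{1,3})([^*\n]+)\1/g;

function stripEmphasisSketch(text: string): string {
  return text.replace(BOLD_ITALIC_SKETCH, '$2');
}

// A run of two or more consecutive "    at ..." frame lines.
const STACK_BLOCK_SKETCH = /^[ \t]+at [^\n]+(?:\n[ \t]+at [^\n]+)+/gm;

// Replace each block where it stands, so text after the trace is untouched.
function collapseStackSketch(text: string): string {
  return text.replace(STACK_BLOCK_SKETCH, (block) => {
    const frames = block.split('\n');
    const first = (frames[0] ?? '').trim();
    return `${first} (and ${frames.length - 1} more frames)`;
  });
}
```

Because the replacement happens at the position of the matched block, a trailing line such as `Done.` stays after the collapsed trace instead of ending up before an appended summary.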

Contributor

@spencer426 spencer426 left a comment

LGTM

@spencer426 spencer426 enabled auto-merge March 10, 2026 15:15
@spencer426 spencer426 added this pull request to the merge queue Mar 10, 2026
@spencer426 spencer426 removed this pull request from the merge queue due to a manual request Mar 10, 2026
@spencer426 spencer426 self-requested a review March 10, 2026 15:32
Contributor

@spencer426 spencer426 left a comment

Please address the "Polynomial regular expression used on uncontrolled data" issue.
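
Which pattern triggered this alert is not stated in the conversation. As a generic illustration of the class of problem, the sketch below contrasts a pattern that can backtrack heavily with a single-pass alternative; the names are hypothetical and this is not the fix that was applied in the PR.

```typescript
// Potentially quadratic: on input like "a" + " ".repeat(n) + "b" a
// backtracking engine may retry \s+ from every whitespace position
// before concluding that no run reaches the end of the string.
const TRAILING_WS_SLOW = /\s+$/;

// Linear alternative: scan backwards once, with no backtracking.
function stripTrailingWhitespaceSketch(text: string): string {
  let end = text.length;
  while (end > 0 && /\s/.test(text.charAt(end - 1))) end--;
  return text.slice(0, end);
}
```

For this particular case the built-in `trimEnd()` should give the same result in one call; the explicit loop is shown only to make the single-pass idea visible. The usual remedies for such alerts are to remove overlapping quantifiers, anchor the pattern, or replace the regex with a manual scan.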

@spencer426 spencer426 enabled auto-merge March 10, 2026 19:42
Contributor

@spencer426 spencer426 left a comment

LGTM

@spencer426 spencer426 added this pull request to the merge queue Mar 10, 2026
Merged via the queue into google-gemini:main with commit 9eae91a Mar 10, 2026
27 checks passed
JaisalJain pushed a commit to JaisalJain/gemini-cli that referenced this pull request Mar 11, 2026
liamhelmer pushed a commit to badal-io/gemini-cli that referenced this pull request Mar 12, 2026
yashodipmore pushed a commit to yashodipmore/geemi-cli that referenced this pull request Mar 21, 2026
Development

Successfully merging this pull request may close these issues.

feat(voice): implement speech-friendly response formatter for voice mode output

3 participants