feat(cli): introduce experimental voice mode architecture skeleton by Sangini-spec · Pull Request #20779 · google-gemini/gemini-cli

Sangini-spec · 2026-03-01T19:20:49Z

Summary

This PR introduces a foundational architecture for a future experimental voice interaction mode. It adds a hidden --voice CLI flag, defines clean audio pipeline interfaces (AudioInputProvider, SpeechToTextAdapter, TextToSpeechAdapter), and creates a VoiceModeController skeleton — all following the established experimentalZedIntegration pattern. No real audio I/O is implemented; no dependencies are added; existing behavior is unchanged.

Details

Design decisions:

Followed the experimentalZedIntegration precedent exactly — the same CliArgs → ConfigParams → Config field → getter → main() dispatch chain. This is the repo's established convention for experimental modes, so reviewers see a familiar shape.
Interfaces live in packages/core/src/voice/, not cli/ — the audio contracts are backend concerns. Placing them in core lets future implementations be consumed by the sdk, a2a-server, and cli packages without circular dependencies.
--voice is hidden: true — matches the convention for experimental flags (--fake-responses, --record-responses). Does not appear in --help output.
Dispatch branch exits cleanly with an informational message rather than silently no-oping. This prevents users from entering a broken state while clearly communicating the flag is recognized but not yet functional.
VoiceModeController uses constructor injection for all three providers, making it unit-testable without real audio hardware from day one.
Zero new dependencies — the controller imports only the existing debugLogger from core.
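
The flag-to-dispatch chain described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the PR's code: the real `loadCliConfig()` and `Config` carry many more fields and a different signature, and `DispatchResult`, `dispatchVoiceMode`, and `VOICE_NOT_IMPLEMENTED` are hypothetical names. The sketch returns an exit code instead of exiting the process so the branch stays easy to test.

```typescript
// Simplified stand-ins for the real gemini-cli types (assumed shapes).
interface CliArgs {
  experimentalVoice?: boolean; // populated from the hidden --voice flag
}

interface ConfigParameters {
  experimentalVoice?: boolean;
}

class Config {
  private readonly experimentalVoice: boolean;

  constructor(params: ConfigParameters) {
    // Defaults to false, so the voice branch is unreachable in normal usage.
    this.experimentalVoice = params.experimentalVoice ?? false;
  }

  getExperimentalVoice(): boolean {
    return this.experimentalVoice;
  }
}

// Copies the parsed flag from CliArgs into the core config parameters.
function loadCliConfig(argv: CliArgs): Config {
  return new Config({ experimentalVoice: argv.experimentalVoice });
}

// Hypothetical result type: what main() would print and exit with.
interface DispatchResult {
  handled: boolean;
  message?: string;
  exitCode?: number;
}

const VOICE_NOT_IMPLEMENTED =
  '[experimental] Voice mode is not yet implemented. ' +
  'The --voice flag registers the architectural skeleton only.';

// main() would consult the getter and stop early with an informational message.
function dispatchVoiceMode(config: Config): DispatchResult {
  if (!config.getExperimentalVoice()) {
    return { handled: false }; // fall through to the normal CLI path
  }
  return { handled: true, message: VOICE_NOT_IMPLEMENTED, exitCode: 0 };
}
```

In this sketch, `dispatchVoiceMode(loadCliConfig({ experimentalVoice: true }))` reports the informational message with exit code 0, while an empty `CliArgs` object leaves the normal path untouched.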

Files changed (8 total, 202 lines):

File	Change
packages/core/src/voice/types.ts	NEW — AudioInputProvider, SpeechToTextAdapter, TextToSpeechAdapter, AudioChunk, VoiceSessionConfig, VoiceState
packages/core/src/voice/voiceModeController.ts	NEW — Lifecycle orchestrator skeleton (start/stop, state machine)
packages/core/src/config/config.ts	experimentalVoice param, field, constructor assignment, getter
packages/cli/src/config/config.ts	--voice yargs option, experimentalVoice in CliArgs, wired into loadCliConfig()
packages/cli/src/gemini.tsx	Voice mode dispatch branch in main()
packages/cli/src/test-utils/mockConfig.ts	getExperimentalVoice mock
packages/cli/src/gemini.test.tsx	experimentalVoice in CliArgs fixture
packages/cli/src/gemini_cleanup.test.tsx	getExperimentalVoice mock
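
As a rough illustration of how the contracts in types.ts and the controller skeleton might fit together, here is a minimal sketch. Only the type and class names come from this PR; every member, method signature, and state value below is an assumption made for illustration. `VoiceState` is modeled as a string union for brevity (the change summary describes it as an enum), and `VoiceSessionConfig` is omitted.

```typescript
// Names come from the PR; all members and signatures are assumed.
interface AudioChunk {
  data: Uint8Array;
  sampleRateHz: number;
}

interface AudioInputProvider {
  start(onChunk: (chunk: AudioChunk) => void): Promise<void>;
  stop(): Promise<void>;
}

interface SpeechToTextAdapter {
  transcribe(chunk: AudioChunk): Promise<string>;
}

interface TextToSpeechAdapter {
  speak(text: string): Promise<void>;
}

type VoiceState = 'idle' | 'listening';

class VoiceModeController {
  private state: VoiceState = 'idle';
  private readonly input: AudioInputProvider;
  private readonly stt: SpeechToTextAdapter;
  private readonly tts: TextToSpeechAdapter;

  // Constructor injection: tests can pass fakes instead of real audio hardware.
  constructor(
    input: AudioInputProvider,
    stt: SpeechToTextAdapter,
    tts: TextToSpeechAdapter,
  ) {
    this.input = input;
    this.stt = stt;
    this.tts = tts;
  }

  getState(): VoiceState {
    return this.state;
  }

  async start(): Promise<void> {
    if (this.state !== 'idle') return; // already running
    this.state = 'listening';
    await this.input.start((chunk) => {
      void this.handleChunk(chunk);
    });
  }

  async stop(): Promise<void> {
    if (this.state === 'idle') return; // nothing to stop
    await this.input.stop();
    this.state = 'idle';
  }

  // Placeholder pipeline for the sketch: transcribe a chunk, then speak it back.
  private async handleChunk(chunk: AudioChunk): Promise<void> {
    const text = await this.stt.transcribe(chunk);
    await this.tts.speak(text);
  }
}
```

Because all three collaborators are passed in, a unit test can supply in-memory fakes that record calls, which is the testability property the design decisions point to.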

Related Issues

Related to #21216 (Hands-Free Multimodal Voice Mode tracking issue)

How to Validate

1. Flag is accepted without error:

npm run build
node bundle/gemini.js --voice

Expected output:

[experimental] Voice mode is not yet implemented. The --voice flag registers the architectural skeleton only.

Process exits with code 0.

2. Flag is hidden from help:

node bundle/gemini.js --help

Expected: --voice does NOT appear in the output.

3. Normal CLI behavior is unchanged:

node bundle/gemini.js --version
node bundle/gemini.js -p "hello"
npm start

4. Tests pass:

npx vitest run packages/cli/src/gemini.test.tsx
# → 34 passed, 0 failed, 1 skipped

5. Build is clean:

npm run build
# → All packages compile with zero errors

Pre-Merge Checklist

Note to reviewers: Changes are purely additive (no existing lines deleted) and gated behind the --voice flag, which defaults to false. The voice dispatch branch is unreachable in normal usage. Validation was run on Windows only; other platforms should be unaffected, since the change is config plumbing plus new files with no platform-specific code.

google-cla · 2026-03-01T19:20:54Z

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up-to-date status, view the checks section at the bottom of the pull request.

gemini-code-assist · 2026-03-01T19:21:11Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request lays the groundwork for an experimental hands-free voice interaction mode within the CLI. It establishes the core architectural components, including interfaces for audio input, speech-to-text, and text-to-speech, along with a controller skeleton to orchestrate their lifecycle. A hidden --voice CLI flag is introduced to activate this experimental feature, which currently provides an informational message rather than full functionality, ensuring no changes to existing CLI behavior or dependencies.

Highlights

Experimental Voice Mode Architecture: Introduced a foundational architecture for a future experimental voice interaction mode.
Hidden CLI Flag: Added a hidden --voice CLI flag to enable the experimental voice mode.
Audio Pipeline Interfaces: Defined clear interfaces for AudioInputProvider, SpeechToTextAdapter, and TextToSpeechAdapter within the core package.
Voice Mode Controller Skeleton: Created a VoiceModeController skeleton to orchestrate the voice mode lifecycle, designed for testability via constructor injection.
No Functional Audio I/O: The current implementation includes no real audio input/output, no new dependencies, and does not alter existing CLI behavior.
Clean Exit for Experimental Flag: The --voice flag, when used, results in a clean exit with an informational message, preventing broken states.

Changelog

.claude/settings.local.json
- Added permissions for running node bundle/gemini.js --voice.
packages/cli/src/config/config.ts
- Added experimentalVoice property to CliArgs interface.
- Introduced a hidden --voice option to the CLI argument parser.
- Wired the experimentalVoice argument into the loadCliConfig function.
packages/cli/src/gemini.test.tsx
- Included experimentalVoice in the CliArgs test fixture.
packages/cli/src/gemini.tsx
- Implemented a conditional branch to handle the experimentalVoice flag, displaying a message and exiting.
packages/cli/src/gemini_cleanup.test.tsx
- Added a mock for getExperimentalVoice in cleanup tests.
packages/cli/src/test-utils/mockConfig.ts
- Added a mock for getExperimentalVoice in the createMockConfig utility.
packages/core/src/config/config.ts
- Added experimentalVoice to ConfigParameters interface.
- Declared and initialized experimentalVoice as a private class member.
- Assigned experimentalVoice in the Config class constructor.
- Implemented a getExperimentalVoice() getter method.
packages/core/src/voice/types.ts
- Created a new file defining core interfaces for audio processing (AudioChunk, AudioInputProvider, SpeechToTextAdapter, TextToSpeechAdapter) and voice session configuration (VoiceSessionConfig, VoiceState enum).
packages/core/src/voice/voiceModeController.ts
- Created a new file implementing the VoiceModeController class, which orchestrates the voice mode lifecycle using injected audio providers.

Activity

The author confirmed that relevant documentation was updated, noting that no user-facing docs were needed due to the hidden/experimental nature of the flag.
Tests were updated to ensure existing mock Config objects and CliArgs fixtures pass with the new getter.
The author explicitly stated that no breaking changes were introduced.
Validation was performed on Windows using npm run and npx.
A note was provided to reviewers emphasizing the additive and flag-gated nature of the changes.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request introduces a well-structured skeleton for the experimental voice mode. The changes are cleanly implemented, following the existing experimentalZedIntegration pattern for feature flagging. The new interfaces in packages/core/src/voice/ are well-defined, and the VoiceModeController provides a solid, testable foundation for future development. The changes are additive and don't affect existing functionality. I have one high-severity comment regarding a potential race condition in the VoiceModeController's state management that should be addressed to ensure robustness, aligning with our guidelines on explicit state management for asynchronous operations.

packages/core/src/voice/voiceModeController.ts

Sangini-spec · 2026-03-01T20:00:25Z

Thanks for highlighting the potential race condition.

I’ve introduced an explicit isStopping flag to guard state transitions during async shutdown and updated the finally block accordingly.

Please let me know if this aligns better with the project’s async state management guidelines.
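
For readers following along, the guard described above might look something like the sketch below. The updated controller code is not shown in this thread, so this is only an assumed shape: `GuardedVoiceController`, `releaseAudio`, and `LifecycleState` are hypothetical names, and only the `isStopping` flag and the use of a `finally` block are taken from the comment.

```typescript
type LifecycleState = 'idle' | 'listening';

class GuardedVoiceController {
  private state: LifecycleState = 'idle';
  private isStopping = false;
  private readonly releaseAudio: () => Promise<void>;

  // releaseAudio stands in for whatever async teardown the real controller does.
  constructor(releaseAudio: () => Promise<void>) {
    this.releaseAudio = releaseAudio;
  }

  getState(): LifecycleState {
    return this.state;
  }

  start(): boolean {
    // Refuse to start while a shutdown is still in flight.
    if (this.isStopping || this.state !== 'idle') return false;
    this.state = 'listening';
    return true;
  }

  async stop(): Promise<void> {
    // A repeated stop() during shutdown, or a stop() while idle, is a no-op.
    if (this.isStopping || this.state === 'idle') return;
    this.isStopping = true;
    try {
      await this.releaseAudio();
    } finally {
      // Runs even if releaseAudio() rejects, so the controller cannot get stuck.
      this.state = 'idle';
      this.isStopping = false;
    }
  }
}
```

The flag is set synchronously before the first `await`, so a `start()` or second `stop()` issued while teardown is pending sees `isStopping` and backs off rather than racing the state transition.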

anowardear062-svg · 2026-03-04T13:44:10Z

Thanks

Sangini-spec · 2026-03-04T14:03:43Z

Thanks for the review!

Please let me know if any changes are required.
Happy to refine the architecture if needed.

anowardear062-svg · 2026-03-04T14:11:37Z

Thanks for the update

gemini-cli · 2026-03-16T03:07:49Z

Hi there! Thank you for your interest in contributing to Gemini CLI.

To ensure we maintain high code quality and focus on our prioritized roadmap, we have updated our contribution policy (see Discussion #17383).

We only guarantee review and consideration of pull requests for issues that are explicitly labeled as 'help wanted'. All other community pull requests are subject to closure after 14 days if they do not align with our current focus areas. For this reason, we strongly recommend that contributors only submit pull requests against issues explicitly labeled as 'help-wanted'.

This pull request is being closed as it has been open for 14 days without a 'help wanted' designation. We encourage you to find and contribute to existing 'help wanted' issues in our backlog! Thank you for your understanding and for being part of our community!

Sangini-spec requested a review from a team as a code owner March 1, 2026 19:20

gemini-code-assist bot reviewed Mar 1, 2026

packages/core/src/voice/voiceModeController.ts (review thread resolved)

Sangini-spec force-pushed the feat/voice-architecture-skeleton branch from 6ae6c9a to 7fbb40e Compare March 1, 2026 19:56

gemini-code-assist bot mentioned this pull request Mar 3, 2026

feat(voice): implement speech-friendly response formatter #20989

Merged

anowardear062-svg approved these changes Mar 4, 2026

gemini-cli bot added the area/core label (Issues related to User Interface, OS Support, Core Functionality) Mar 5, 2026

feat(cli): introduce experimental voice mode architecture skeleton

5e313f9

Sangini-spec force-pushed the feat/voice-architecture-skeleton branch from db1f093 to 5e313f9 Compare March 8, 2026 16:35

gemini-cli bot closed this Mar 16, 2026

Sangini-spec wants to merge 1 commit into google-gemini:main from Sangini-spec:feat/voice-architecture-skeleton
