feat(cli): introduce experimental voice mode architecture skeleton #20779

Closed
Sangini-spec wants to merge 1 commit into google-gemini:main from Sangini-spec:feat/voice-architecture-skeleton

@Sangini-spec Sangini-spec commented Mar 1, 2026

Summary

This PR introduces a foundational architecture for a future experimental voice interaction mode. It adds a hidden --voice CLI flag, defines clean audio pipeline interfaces (AudioInputProvider, SpeechToTextAdapter, TextToSpeechAdapter), and creates a VoiceModeController skeleton — all following the established experimentalZedIntegration pattern. No real audio I/O is implemented; no dependencies are added; existing behavior is unchanged.

Details

Design decisions:

  • Followed the experimentalZedIntegration precedent exactly: the same CliArgs → ConfigParams → Config field → getter → main() dispatch chain. This is the repo's established convention for experimental modes, so reviewers see a familiar shape.

  • Interfaces live in packages/core/src/voice/, not cli/ — the audio contracts are backend concerns. Placing them in core lets future implementations be consumed by the sdk, a2a-server, and cli packages without circular dependencies.

  • --voice is hidden: true — matches the convention for experimental flags (--fake-responses, --record-responses). Does not appear in --help output.

  • Dispatch branch exits cleanly with an informational message rather than silently no-oping. This prevents users from entering a broken state while clearly communicating the flag is recognized but not yet functional.

  • VoiceModeController uses constructor injection for all three providers, making it unit-testable without real audio hardware from day one.

  • Zero new dependencies — the controller imports only the existing debugLogger from core.
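
For reviewers who want to see the shape without opening the diff, here is a minimal sketch of the pipeline contracts and the constructor-injected controller. The type and class names come from this PR; every member, method signature, and the fake providers are assumptions for illustration, not the contents of types.ts. VoiceState is modeled as a string union for brevity, although the PR defines it as an enum.

```typescript
// Hypothetical member shapes: only the type names are taken from the PR.
interface AudioChunk {
  data: Uint8Array;
  sampleRateHz: number;
}

interface AudioInputProvider {
  start(onChunk: (chunk: AudioChunk) => Promise<void>): Promise<void>;
  stop(): Promise<void>;
}

interface SpeechToTextAdapter {
  transcribe(chunk: AudioChunk): Promise<string>;
}

interface TextToSpeechAdapter {
  speak(text: string): Promise<void>;
}

// Assumed states; the real enum may have more.
type VoiceState = 'idle' | 'listening';

class VoiceModeController {
  private state: VoiceState = 'idle';
  private readonly input: AudioInputProvider;
  private readonly stt: SpeechToTextAdapter;
  private readonly tts: TextToSpeechAdapter;

  // Constructor injection: tests pass fakes instead of real audio hardware.
  constructor(
    input: AudioInputProvider,
    stt: SpeechToTextAdapter,
    tts: TextToSpeechAdapter,
  ) {
    this.input = input;
    this.stt = stt;
    this.tts = tts;
  }

  getState(): VoiceState {
    return this.state;
  }

  async start(): Promise<void> {
    if (this.state !== 'idle') return;
    this.state = 'listening';
    // Each captured chunk is transcribed and echoed back (placeholder behavior).
    await this.input.start(async (chunk) => {
      const text = await this.stt.transcribe(chunk);
      await this.tts.speak(text);
    });
  }

  async stop(): Promise<void> {
    if (this.state === 'idle') return;
    await this.input.stop();
    this.state = 'idle';
  }
}

// In-memory fakes: no microphone, no speech service.
const spoken: string[] = [];
const fakeInput: AudioInputProvider = {
  async start(onChunk) {
    await onChunk({ data: new Uint8Array([1, 2, 3]), sampleRateHz: 16000 });
  },
  async stop() {},
};
const fakeStt: SpeechToTextAdapter = {
  async transcribe(chunk) {
    return `heard ${chunk.data.length} bytes`;
  },
};
const fakeTts: TextToSpeechAdapter = {
  async speak(text) {
    spoken.push(text);
  },
};
```

Constructing `new VoiceModeController(fakeInput, fakeStt, fakeTts)` and awaiting `start()` drives one fake chunk through the pipeline, which is the kind of hardware-free unit test the injection is meant to allow.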

Files changed (8 total, 202 lines):

| File | Change |
| --- | --- |
| packages/core/src/voice/types.ts | NEW: AudioInputProvider, SpeechToTextAdapter, TextToSpeechAdapter, AudioChunk, VoiceSessionConfig, VoiceState |
| packages/core/src/voice/voiceModeController.ts | NEW: Lifecycle orchestrator skeleton (start/stop, state machine) |
| packages/core/src/config/config.ts | experimentalVoice param, field, constructor assignment, getter |
| packages/cli/src/config/config.ts | --voice yargs option, experimentalVoice in CliArgs, wired into loadCliConfig() |
| packages/cli/src/gemini.tsx | Voice mode dispatch branch in main() |
| packages/cli/src/test-utils/mockConfig.ts | getExperimentalVoice mock |
| packages/cli/src/gemini.test.tsx | experimentalVoice in CliArgs fixture |
| packages/cli/src/gemini_cleanup.test.tsx | getExperimentalVoice mock |
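
The config plumbing listed above can be pictured roughly as follows. This is a simplified sketch, not the actual code: the real CliArgs, ConfigParameters, Config, and loadCliConfig() carry many more fields and inputs, and the `?? false` default is an assumption based on the note that the flag defaults to false.

```typescript
// CLI side (packages/cli): only the field relevant to this PR is shown.
interface CliArgs {
  experimentalVoice: boolean | undefined;
}

// Core side (packages/core): constructor parameters for Config.
interface ConfigParameters {
  experimentalVoice?: boolean;
}

class Config {
  private readonly experimentalVoice: boolean;

  constructor(params: ConfigParameters) {
    // Assumed default: off, so the voice branch stays unreachable unless requested.
    this.experimentalVoice = params.experimentalVoice ?? false;
  }

  getExperimentalVoice(): boolean {
    return this.experimentalVoice;
  }
}

// Simplified stand-in for loadCliConfig(); the real function takes more inputs.
function loadCliConfig(argv: CliArgs): Config {
  return new Config({ experimentalVoice: argv.experimentalVoice ?? false });
}
```

Because the getter is part of the Config surface, every mock Config and CliArgs fixture in the test suite needs the matching field or method, which is why the three test files appear in the table.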

Related Issues

Related to #21216 (Hands-Free Multimodal Voice Mode tracking issue)

How to Validate

1. Flag is accepted without error:

npm run build
node bundle/gemini.js --voice

Expected output:

[experimental] Voice mode is not yet implemented. The --voice flag registers the architectural skeleton only.

Process exits with code 0.
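
The behavior in step 1 corresponds to a dispatch branch along these lines. It is a sketch only: `dispatchVoiceMode`, `VoiceAwareConfig`, and the injected `write` callback are hypothetical names chosen so the branch can be exercised without spawning a process, and the real branch in main() may be structured differently.

```typescript
// Minimal view of Config: only the getter this branch needs.
interface VoiceAwareConfig {
  getExperimentalVoice(): boolean;
}

const VOICE_NOT_IMPLEMENTED =
  '[experimental] Voice mode is not yet implemented. ' +
  'The --voice flag registers the architectural skeleton only.';

// Returns an exit code when the voice branch handles the run,
// or undefined so the caller falls through to normal CLI behavior.
function dispatchVoiceMode(
  config: VoiceAwareConfig,
  write: (line: string) => void,
): number | undefined {
  if (!config.getExperimentalVoice()) return undefined;
  write(VOICE_NOT_IMPLEMENTED);
  return 0;
}
```

Returning `undefined` when the flag is off keeps the default path untouched, which matches the claim that existing behavior is unchanged.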

2. Flag is hidden from help:

node bundle/gemini.js --help

Expected: --voice does NOT appear in the output.

3. Normal CLI behavior is unchanged:

node bundle/gemini.js --version

node bundle/gemini.js -p "hello"

npm start

4. Tests pass:

npx vitest run packages/cli/src/gemini.test.tsx
# → 34 passed, 0 failed, 1 skipped

5. Build is clean:

npm run build
# → All packages compile with zero errors

Pre-Merge Checklist

  • Updated relevant documentation and README (if needed)
    • No user-facing docs needed — flag is hidden/experimental
  • Added/updated tests (if needed)
    • Updated all mock Config objects and CliArgs fixtures so existing tests pass with the new getter
  • Noted breaking changes (if any)
    • None — hidden flag, no API surface change, no dependency additions
  • Validated on required platforms/methods:
    • MacOS
      • npm run
      • npx
      • Docker
      • Podman
      • Seatbelt
    • Windows
      • npm run
      • npx
      • Docker
    • Linux
      • npm run
      • npx
      • Docker

Note to reviewers: Changes are purely additive (no existing lines deleted) and gated behind --voice, which defaults to false. The voice dispatch branch is unreachable in normal usage. Only Windows was validated; other platforms should be safe because the change is config plumbing plus new files with no platform-specific code.

@Sangini-spec Sangini-spec requested a review from a team as a code owner March 1, 2026 19:20

google-cla bot commented Mar 1, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request lays the groundwork for an experimental hands-free voice interaction mode within the CLI. It establishes the core architectural components, including interfaces for audio input, speech-to-text, and text-to-speech, along with a controller skeleton to orchestrate their lifecycle. A hidden --voice CLI flag is introduced to activate this experimental feature, which currently provides an informational message rather than full functionality, ensuring no changes to existing CLI behavior or dependencies.

Highlights

  • Experimental Voice Mode Architecture: Introduced a foundational architecture for a future experimental voice interaction mode.
  • Hidden CLI Flag: Added a hidden --voice CLI flag to enable the experimental voice mode.
  • Audio Pipeline Interfaces: Defined clear interfaces for AudioInputProvider, SpeechToTextAdapter, and TextToSpeechAdapter within the core package.
  • Voice Mode Controller Skeleton: Created a VoiceModeController skeleton to orchestrate the voice mode lifecycle, designed for testability via constructor injection.
  • No Functional Audio I/O: The current implementation includes no real audio input/output, no new dependencies, and does not alter existing CLI behavior.
  • Clean Exit for Experimental Flag: The --voice flag, when used, results in a clean exit with an informational message, preventing broken states.

Changelog

  • .claude/settings.local.json
    • Added permissions for running node bundle/gemini.js --voice.
  • packages/cli/src/config/config.ts
    • Added experimentalVoice property to CliArgs interface.
    • Introduced a hidden --voice option to the CLI argument parser.
    • Wired the experimentalVoice argument into the loadCliConfig function.
  • packages/cli/src/gemini.test.tsx
    • Included experimentalVoice in the CliArgs test fixture.
  • packages/cli/src/gemini.tsx
    • Implemented a conditional branch to handle the experimentalVoice flag, displaying a message and exiting.
  • packages/cli/src/gemini_cleanup.test.tsx
    • Added a mock for getExperimentalVoice in cleanup tests.
  • packages/cli/src/test-utils/mockConfig.ts
    • Added a mock for getExperimentalVoice in the createMockConfig utility.
  • packages/core/src/config/config.ts
    • Added experimentalVoice to ConfigParameters interface.
    • Declared and initialized experimentalVoice as a private class member.
    • Assigned experimentalVoice in the Config class constructor.
    • Implemented a getExperimentalVoice() getter method.
  • packages/core/src/voice/types.ts
    • Created a new file defining core interfaces for audio processing (AudioChunk, AudioInputProvider, SpeechToTextAdapter, TextToSpeechAdapter) and voice session configuration (VoiceSessionConfig, VoiceState enum).
  • packages/core/src/voice/voiceModeController.ts
    • Created a new file implementing the VoiceModeController class, which orchestrates the voice mode lifecycle using injected audio providers.

Activity

  • The author addressed the documentation checklist item, noting that no user-facing docs were needed because the flag is hidden and experimental.
  • Tests were updated to ensure existing mock Config objects and CliArgs fixtures pass with the new getter.
  • The author explicitly stated that no breaking changes were introduced.
  • Validation was performed on Windows using npm run and npx.
  • A note was provided to reviewers emphasizing the additive and flag-gated nature of the changes.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a well-structured skeleton for the experimental voice mode. The changes are cleanly implemented, following the existing experimentalZedIntegration pattern for feature flagging. The new interfaces in packages/core/src/voice/ are well-defined, and the VoiceModeController provides a solid, testable foundation for future development. The changes are additive and don't affect existing functionality. I have one high-severity comment regarding a potential race condition in the VoiceModeController's state management that should be addressed to ensure robustness, aligning with our guidelines on explicit state management for asynchronous operations.

@Sangini-spec Sangini-spec force-pushed the feat/voice-architecture-skeleton branch from 6ae6c9a to 7fbb40e Compare March 1, 2026 19:56
@Sangini-spec
Author

Thanks for highlighting the potential race condition.

I’ve introduced an explicit isStopping flag to guard state transitions during async shutdown and updated the finally block accordingly.

Please let me know if this aligns better with the project’s async state management guidelines.
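
For context, the guard described above could look roughly like the sketch below. This is an illustration of the idea rather than the code in the pushed commit: `GuardedController`, the injected `runSession` and `shutDown` callbacks, and the extra 'stopping' state are all assumptions.

```typescript
type GuardedState = 'idle' | 'listening' | 'stopping';

class GuardedController {
  private state: GuardedState = 'idle';
  private isStopping = false;
  private readonly runSession: () => Promise<void>;
  private readonly shutDown: () => Promise<void>;

  constructor(
    runSession: () => Promise<void>,
    shutDown: () => Promise<void>,
  ) {
    this.runSession = runSession;
    this.shutDown = shutDown;
  }

  getState(): GuardedState {
    return this.state;
  }

  async start(): Promise<void> {
    if (this.state !== 'idle') return;
    this.state = 'listening';
    try {
      await this.runSession();
    } finally {
      // If stop() is mid-shutdown it owns the transition back to idle;
      // resetting here would report idle while teardown is still running.
      if (!this.isStopping) this.state = 'idle';
    }
  }

  async stop(): Promise<void> {
    if (this.state !== 'listening' || this.isStopping) return;
    this.isStopping = true;
    this.state = 'stopping';
    try {
      await this.shutDown();
    } finally {
      this.state = 'idle';
      this.isStopping = false;
    }
  }
}
```

Without the flag, a session that ends while stop() is still awaiting teardown would let start()'s finally block set the state to idle early, and a second start() could then overlap with the unfinished shutdown.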

@anowardear062-svg

Thanks

@Sangini-spec
Author

Thanks for the review!

Please let me know if any changes are required.
Happy to refine the architecture if needed.

@anowardear062-svg

Thanks for the update

@gemini-cli gemini-cli bot added the area/core Issues related to User Interface, OS Support, Core Functionality label Mar 5, 2026
@Sangini-spec Sangini-spec force-pushed the feat/voice-architecture-skeleton branch from db1f093 to 5e313f9 Compare March 8, 2026 16:35

gemini-cli bot commented Mar 16, 2026

Hi there! Thank you for your interest in contributing to Gemini CLI.

To ensure we maintain high code quality and focus on our prioritized roadmap, we have updated our contribution policy (see Discussion #17383).

We only guarantee review and consideration of pull requests for issues that are explicitly labeled as 'help wanted'. All other community pull requests are subject to closure after 14 days if they do not align with our current focus areas. For this reason, we strongly recommend that contributors only submit pull requests against issues explicitly labeled as 'help-wanted'.

This pull request is being closed as it has been open for 14 days without a 'help wanted' designation. We encourage you to find and contribute to existing 'help wanted' issues in our backlog! Thank you for your understanding and for being part of our community!

@gemini-cli gemini-cli bot closed this Mar 16, 2026