feat(cli): introduce experimental voice mode architecture skeleton #20779
Sangini-spec wants to merge 1 commit into google-gemini:main from …
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up-to-date status, view the checks section at the bottom of the pull request.
Summary of Changes
Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request lays the groundwork for an experimental hands-free voice interaction mode within the CLI. It establishes the core architectural components, including interfaces for audio input, speech-to-text, and text-to-speech, along with a controller skeleton to orchestrate their lifecycle. A hidden --voice CLI flag is also added.
Code Review
This pull request introduces a well-structured skeleton for the experimental voice mode. The changes are cleanly implemented, following the existing experimentalZedIntegration pattern for feature flagging. The new interfaces in packages/core/src/voice/ are well-defined, and the VoiceModeController provides a solid, testable foundation for future development. The changes are additive and don't affect existing functionality. I have one high-severity comment regarding a potential race condition in the VoiceModeController's state management that should be addressed to ensure robustness, aligning with our guidelines on explicit state management for asynchronous operations.
The branch was updated from 6ae6c9a to 7fbb40e.
Thanks for highlighting the potential race condition. I’ve introduced an explicit … Please let me know if this aligns better with the project’s async state management guidelines.
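One common way to make a controller's lifecycle state explicit is a small state machine. The sketch below is hypothetical: the VoiceModeState union, the LifecycleGuard class, and its start()/stop() methods are assumptions, not code from this PR. The key property is that the state is updated synchronously, before the first await, so a second, overlapping start() call sees 'starting' and is rejected instead of racing the first.

```typescript
// Hypothetical lifecycle states; the PR's actual field or enum names are not shown here.
type VoiceModeState = 'idle' | 'starting' | 'running' | 'stopping';

class LifecycleGuard {
  private state: VoiceModeState = 'idle';

  get current(): VoiceModeState {
    return this.state;
  }

  // Runs `startFn` only from 'idle'. The state flips to 'starting' before
  // the first await, so a concurrent second call is rejected immediately.
  async start(startFn: () => Promise<void>): Promise<boolean> {
    if (this.state !== 'idle') return false;
    this.state = 'starting';
    try {
      await startFn();
      this.state = 'running';
      return true;
    } catch (err) {
      // Roll back so a later start() can retry after a failure.
      this.state = 'idle';
      throw err;
    }
  }

  // Runs `stopFn` only from 'running'; always returns to 'idle' afterwards.
  async stop(stopFn: () => Promise<void>): Promise<boolean> {
    if (this.state !== 'running') return false;
    this.state = 'stopping';
    try {
      await stopFn();
    } finally {
      this.state = 'idle';
    }
    return true;
  }
}
```

For example, calling start() twice without awaiting invokes the start callback only once; the second call resolves to false because the guard is already in 'starting'.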
Thanks

Thanks for the review! Please let me know if any changes are required.

Thanks for the update
The branch was updated from db1f093 to 5e313f9.
Hi there! Thank you for your interest in contributing to Gemini CLI. To ensure we maintain high code quality and focus on our prioritized roadmap, we have updated our contribution policy (see Discussion #17383). We only guarantee review and consideration of pull requests for issues that are explicitly labeled as 'help wanted'. All other community pull requests are subject to closure after 14 days if they do not align with our current focus areas. For this reason, we strongly recommend that contributors only submit pull requests against issues explicitly labeled as 'help wanted'. This pull request is being closed as it has been open for 14 days without a 'help wanted' designation. We encourage you to find and contribute to existing 'help wanted' issues in our backlog! Thank you for your understanding and for being part of our community!
Summary
This PR introduces a foundational architecture for a future experimental voice interaction mode. It adds a hidden --voice CLI flag, defines clean audio pipeline interfaces (AudioInputProvider, SpeechToTextAdapter, TextToSpeechAdapter), and creates a VoiceModeController skeleton, all following the established experimentalZedIntegration pattern. No real audio I/O is implemented, no dependencies are added, and existing behavior is unchanged.

Details
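To make the constructor-injection design concrete, here is a minimal TypeScript sketch. Only the four type names come from this PR; every method name and signature (start, stop, read, transcribe, speak, runOnce) is a hypothetical assumption, and the real interfaces in packages/core/src/voice/ may differ. The controller receives all three collaborators through its constructor, so a unit test can pass plain in-memory fakes instead of real audio devices.

```typescript
// Hypothetical contracts. The type names come from the PR description;
// all method names and signatures are assumptions for illustration.
interface AudioInputProvider {
  start(): Promise<void>;
  stop(): Promise<void>;
  // Resolves to the next captured audio chunk, or null when input has ended.
  read(): Promise<Uint8Array | null>;
}

interface SpeechToTextAdapter {
  transcribe(audio: Uint8Array): Promise<string>;
}

interface TextToSpeechAdapter {
  speak(text: string): Promise<void>;
}

class VoiceModeController {
  private readonly input: AudioInputProvider;
  private readonly stt: SpeechToTextAdapter;
  private readonly tts: TextToSpeechAdapter;

  // All three collaborators are injected, so tests can supply fakes
  // instead of touching real audio hardware.
  constructor(
    input: AudioInputProvider,
    stt: SpeechToTextAdapter,
    tts: TextToSpeechAdapter,
  ) {
    this.input = input;
    this.stt = stt;
    this.tts = tts;
  }

  // Hypothetical single turn: capture one chunk, transcribe it, ask
  // `respond` for a reply, and speak that reply. Returns the transcript,
  // or null if no audio was captured. Input is always stopped afterwards.
  async runOnce(
    respond: (transcript: string) => Promise<string>,
  ): Promise<string | null> {
    await this.input.start();
    try {
      const chunk = await this.input.read();
      if (chunk === null) return null;
      const transcript = await this.stt.transcribe(chunk);
      const reply = await respond(transcript);
      await this.tts.speak(reply);
      return transcript;
    } finally {
      await this.input.stop();
    }
  }
}
```

In a test, each collaborator can be an object literal, for instance a fake SpeechToTextAdapter whose transcribe() returns a fixed string, which keeps the controller's orchestration logic verifiable without microphones or speakers.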
Design decisions:
- Followed the experimentalZedIntegration precedent exactly: the same CliArgs → ConfigParams → Config field → getter → main() dispatch chain. This is the repo's established convention for experimental modes, so reviewers see a familiar shape.
- Interfaces live in packages/core/src/voice/, not cli/. The audio contracts are backend concerns; placing them in core lets future implementations be consumed by the sdk, a2a-server, and cli packages without circular dependencies.
- --voice is hidden: true, matching the convention for experimental flags (--fake-responses, --record-responses). It does not appear in --help output.
- The dispatch branch exits cleanly with an informational message rather than silently no-oping. This prevents users from entering a broken state while clearly communicating that the flag is recognized but not yet functional.
- VoiceModeController uses constructor injection for all three providers, making it unit-testable without real audio hardware from day one.
- Zero new dependencies: the controller imports only the existing debugLogger from core.

Files changed (8 total, 202 lines):
Related Issues
Related to #21216 (Hands-Free Multimodal Voice Mode tracking issue)
How to Validate
1. Flag is accepted without error:
Expected output:
Process exits with code 0.
2. Flag is hidden from help:
Expected:
--voice does NOT appear in the output.
3. Normal CLI behavior is unchanged:
4. Tests pass:
5. Build is clean:
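Steps 1 and 3 exercise the two sides of the main() dispatch branch described under Design decisions. The sketch below is a hypothetical, heavily simplified model of that chain: CliArgs, ConfigParams, Config, and main() are named in the PR, but the experimentalVoiceMode field, the getExperimentalVoiceMode() getter, loadConfig(), the log callback, and the message text are all assumptions, and real argument parsing (including hidden: true) is omitted. It shows the flag's value flowing from parsed arguments into Config, and main() branching on the getter to exit cleanly with code 0.

```typescript
// Hypothetical, simplified shapes; not the repo's actual definitions.
interface CliArgs {
  voice?: boolean;
}

interface ConfigParams {
  experimentalVoiceMode?: boolean;
}

class Config {
  private readonly experimentalVoiceMode: boolean;

  constructor(params: ConfigParams) {
    this.experimentalVoiceMode = params.experimentalVoiceMode ?? false;
  }

  // Getter consulted by main(), mirroring the CliArgs -> ConfigParams -> Config -> getter chain.
  getExperimentalVoiceMode(): boolean {
    return this.experimentalVoiceMode;
  }
}

// Maps parsed CLI arguments onto config parameters.
function loadConfig(argv: CliArgs): Config {
  return new Config({ experimentalVoiceMode: argv.voice });
}

// Returns an exit code instead of calling process.exit so it stays testable.
function main(argv: CliArgs, log: (msg: string) => void): number {
  const config = loadConfig(argv);
  if (config.getExperimentalVoiceMode()) {
    // Recognize the flag, explain it, and exit cleanly (placeholder message text).
    log('Voice mode is experimental and not yet functional.');
    return 0;
  }
  // Placeholder for the normal interactive CLI path, which stays unchanged.
  log('Starting the normal CLI.');
  return 0;
}
```

Returning an exit code rather than exiting inside main() is a design choice of this sketch that lets both branches be checked in-process.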
Pre-Merge Checklist