feat: Enable smart barge-in with AEC for seamless voice interaction, allowing users to naturally interrupt the AI's TTS playback by speaking. #4171
Open
JedLee6 wants to merge 5 commits into alibaba:master from
…ted at the bottom of the Voice Chat screen were incorrectly propagating to the Text Chat input field in the background layer.

Changes:
- Configured the root layout of the Voice Chat fragment (`fragment_voice_chat.xml`) to be clickable (`android:clickable="true"`).
- This ensures the Voice Chat overlay consumes all touch events within its bounds, preventing unintended activation of the text input box or keyboard in the underlying activity.
…omatically scroll to the bottom during lengthy AI responses, causing new content to remain off-screen.

Changes:
- Implemented auto-scroll logic so that the latest generated text is always visible.
- Replaced the instant jump with a smooth scrolling animation (`smoothScrollToPosition`) to improve the user experience during text streaming.
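A minimal sketch of the auto-scroll hook, assuming a hypothetical `AutoScroller` class that is notified after every streamed chunk has been applied to the adapter. The scroll action is injected as a lambda so the logic runs without Android; in the app it would presumably be bound to `RecyclerView::smoothScrollToPosition`.

```kotlin
// Hypothetical helper (not from the PR): keeps the newest message in view while
// the LLM streams text. `scrollTo` is injected so the logic stays testable; on
// Android it could be bound to recyclerView::smoothScrollToPosition.
class AutoScroller(private val scrollTo: (Int) -> Unit) {

    // Call after each streamed chunk has been applied to the adapter.
    fun onContentUpdated(itemCount: Int) {
        if (itemCount <= 0) return   // empty list: nothing to scroll to
        scrollTo(itemCount - 1)      // animate to the last (still growing) message
    }
}
```

For example, `AutoScroller(recyclerView::smoothScrollToPosition)` would animate to the latest message on every streaming update rather than jumping there.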
…allowing users to naturally interrupt the AI's TTS playback by speaking, creating a full-duplex conversational experience that mimics the flow of natural human conversation.

Key Implementation Details:
- Acoustic Echo Cancellation (AEC): Configured `AsrService` to use the `VOICE_COMMUNICATION` audio source, which applies the device's hardware echo cancellation where available. This filters the AI's own voice (TTS output) out of the microphone input, preventing the AI from hearing itself and re-triggering the ASR.
- Real-time Interruption Logic: Implemented an `onSpeechDetected` callback in `AsrService` that monitors partial ASR results. As soon as valid user speech is detected (even during TTS playback), an interruption signal is sent immediately.
- Session State Management: Updated `VoiceChatPresenter` to keep the microphone active during playback. On receiving the `onSpeechDetected` signal, it calls `interruptCurrentSession()`, which:
  - Halts the current TTS playback instantly.
  - Stops ongoing LLM text generation.
  - Resets internal buffers and state so the new user input is processed first.
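The interruption path can be sketched in plain Kotlin as follows. Only `onSpeechDetected` and `interruptCurrentSession()` come from the PR description; `TtsPlayer`, `LlmGenerator`, `SpeechDetector`, and `BargeInController` are hypothetical stand-ins, and treating any non-blank partial result as "valid speech" is an assumption (the real detector may apply stricter checks).

```kotlin
// Hypothetical stand-ins for the app's TTS player and LLM generation session.
interface TtsPlayer { fun stop() }
interface LlmGenerator { fun cancel() }

// Watches partial ASR results and fires onSpeechDetected once per utterance.
// Assumption: any non-blank partial result counts as valid user speech.
class SpeechDetector(private val onSpeechDetected: () -> Unit) {
    private var fired = false

    fun onPartialResult(text: String) {
        if (!fired && text.isNotBlank()) {
            fired = true
            onSpeechDetected()
        }
    }

    // Re-arm once the utterance has been finalized.
    fun onFinalResult() {
        fired = false
    }
}

// Hypothetical controller; the microphone stays open while the AI responds.
class BargeInController(
    private val tts: TtsPlayer,
    private val llm: LlmGenerator,
) {
    var isResponding = false                   // true while the LLM generates and/or TTS plays
        private set
    private val pendingText = StringBuilder()  // streamed text not yet spoken

    fun startReply() {
        isResponding = true
    }

    fun appendChunk(chunk: String) {
        pendingText.append(chunk)
    }

    fun bufferedText(): String = pendingText.toString()

    // Mirrors the steps in the PR: halt TTS, stop generation, reset state.
    fun interruptCurrentSession() {
        if (!isResponding) return  // nothing to interrupt; normal ASR flow continues
        tts.stop()
        llm.cancel()
        pendingText.setLength(0)
        isResponding = false
    }
}
```

The two pieces are wired as `SpeechDetector { controller.interruptCurrentSession() }`. Re-arming the detector only on the final result keeps a single utterance from firing repeated interrupts while its partial results keep arriving.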
…es (with good AEC) and older/budget devices (needing software assistance), ensuring clear communication without audio feedback loops.

Key Changes:
- Manual Mute: Added a dedicated mute button to the UI, allowing users to disable microphone input manually at any time.
- Echo Cancellation Modes:
  - Hardware AEC (Default): Relies on the device's built-in acoustic echo cancellation (`VOICE_COMMUNICATION` source).
  - Auto-Mute Mode (Software Fallback): Designed for devices with poor hardware AEC. In this mode, the microphone is muted automatically while the AI is speaking (TTS playback) and unmuted when playback finishes. This prevents the AI from hearing itself and re-triggering the ASR, at the cost of temporarily disabling barge-in.
- Silence Injection: Updated `AsrService` to fill the audio buffer with zeros when muted (`buffer.fill(0)`), so no captured audio reaches the ASR engine even if the hardware keeps recording.
- UI Enhancements: Added a toggle switch for changing the AEC mode and visual indicators for the current mute state.
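The mute rules above reduce to one predicate plus silence injection. A minimal sketch, assuming 16-bit PCM samples delivered in a `ShortArray` and the hypothetical names `AecMode` and `MicGate`; only `buffer.fill(0)` is taken from the PR text.

```kotlin
// Echo-handling modes described in the PR; the enum itself is a hypothetical sketch.
enum class AecMode { HARDWARE_AEC, AUTO_MUTE }

class MicGate(var mode: AecMode = AecMode.HARDWARE_AEC) {
    var manualMute = false   // set by the user's mute button
    var ttsPlaying = false   // updated from TTS start/finish callbacks

    // Muted if the user asked for it, or if Auto-Mute mode is on while the AI speaks.
    val isMuted: Boolean
        get() = manualMute || (mode == AecMode.AUTO_MUTE && ttsPlaying)

    // Silence injection: call after each read from the recorder. When muted,
    // overwrite the captured samples with zeros so the ASR engine receives
    // silence even though recording continues.
    fun process(buffer: ShortArray) {
        if (isMuted) buffer.fill(0)
    }
}
```

In Hardware AEC mode `isMuted` follows only the manual button, so partial results keep reaching the ASR during TTS and barge-in stays available. In Auto-Mute mode the gate zeroes every buffer while `ttsPlaying` is true, which is why barge-in is unavailable there.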
😃 Hi @wangzhaode @Juude, could you please review and merge this pull request at your convenience? Thanks!
This PR improves the Voice Chat experience by letting users interrupt the AI's TTS playback simply by speaking, enabling a full-duplex conversation that mimics the flow of natural human dialogue.
Demo video: `d8ff5fae265109d5945850ebcae7d0aa.mp4`
Key Changes:
Testing: