
feat: Enable smart barge-in with AEC for seamless voice interaction, allowing users to naturally interrupt the AI's TTS playback by speaking. #4171

Open
JedLee6 wants to merge 5 commits into alibaba:master from JedLee6:jedlee/ft/master_260218v3

JedLee6 commented Feb 18, 2026

😃 Hi @wangzhaode @Juude. Could you please review and merge this pull request at your convenience? Thanks!

This PR significantly improves the Voice Chat experience: users can interrupt the AI's TTS playback simply by speaking, which gives a full-duplex exchange closer to natural human conversation.

Screenshot Video Demo
d8ff5fae265109d5945850ebcae7d0aa.mp4

Key Changes:

  1. Real-time Interruption Logic:
  • Acoustic Echo Cancellation (AEC): Configured AsrService to use the VOICE_COMMUNICATION audio source, which uses the device's hardware AEC where available. This filters the AI's own voice (TTS output) out of the microphone input, so the AI does not hear itself and re-trigger the ASR.
  • Interruption signal: Implemented an onSpeechDetected callback in AsrService that monitors partial ASR results. As soon as valid user speech is detected (even during TTS playback), an interruption signal is sent.
  2. Smart Barge-in with Auto-Mute (Software AEC Fallback):
  • Introduced an "Auto-Mute" mode for devices with poor hardware AEC.
  • Logic: Automatically mutes the microphone while the AI is speaking (TTS playback) and unmutes it as soon as playback ends. This prevents the AI's own voice from feeding back into the microphone and triggering a false interruption loop.
  • UI: Added a toggle to switch between "Hardware AEC" (default) and "Auto-Mute" modes, so users can choose based on their device's performance.
  3. Manual Mute Control:
  • Added a manual mute button to the Voice Chat interface.
  • Implemented silence injection in AsrService (buffer.fill(0)) when muted, so that no captured audio reaches the ASR engine and accidental ASR triggers are prevented.
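
For reviewers, here is a minimal plain-JVM sketch of the silence-injection idea from the Manual Mute item. It is not the code in this PR: `MicGate` and its method names are hypothetical, and the Java `Arrays.fill` call stands in for the `buffer.fill(0)` mentioned above. It assumes the recorder hands over a `short[]` buffer plus a count of samples read, which may be zero or negative on a read error.

```java
// Hypothetical helper for illustration; not a class from this PR.
final class MicGate {
    private volatile boolean muted;

    void setMuted(boolean muted) {
        this.muted = muted;
    }

    boolean isMuted() {
        return muted;
    }

    /**
     * Call on each buffer read from the recorder before it is passed on.
     * While muted, the samples that were just read are overwritten with zeros,
     * so only silence is forwarded. Samples beyond samplesRead are left untouched.
     */
    short[] process(short[] buffer, int samplesRead) {
        if (muted) {
            // Clamp the count so an error code (negative) or an oversized count is safe.
            int n = Math.max(0, Math.min(samplesRead, buffer.length));
            java.util.Arrays.fill(buffer, 0, n, (short) 0);
        }
        return buffer;
    }
}
```

Zeroing the buffer instead of stopping the recorder keeps the capture loop and its timing unchanged, which is presumably why the PR takes this route.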

Testing:

  • Verified "Barge-in" works on devices with good hardware AEC.
  • Verified "Auto-Mute" mode effectively stops self-triggering on devices with poor AEC (at the cost of barge-in during playback).
  • Verified manual mute completely stops audio data transmission.

…ted at the bottom of the Voice Chat screen were incorrectly propagating to the Text Chat input field in the background layer.

Changes:
- Configured the root layout of the Voice Chat fragment (`fragment_voice_chat.xml`) to be clickable (`android:clickable="true"`).
- This ensures the Voice Chat overlay consumes all touch events within its bounds, preventing unintended activation of the text input box or keyboard in the underlying activity.
…omatically scroll to the bottom during lengthy AI responses, causing new content to remain off-screen.

Changes:
- Implemented auto-scroll logic to ensure the latest generated text is always visible.
- Replaced the instant jump with a smooth scrolling animation (`smoothScrollToPosition`) to improve the user experience during text streaming.
…allowing users to naturally interrupt the AI's TTS playback by speaking, creating a full-duplex conversational experience mimicking natural human conversation flow.

Key Implementation Details:
- Acoustic Echo Cancellation (AEC): Configured AsrService to use the VOICE_COMMUNICATION audio source, which leverages hardware AEC. This effectively filters out the AI's own voice (TTS output) from the microphone input, preventing the AI from hearing itself and re-triggering the ASR.
- Real-time Interruption Logic: Implemented an onSpeechDetected callback in AsrService that monitors partial ASR results. As soon as valid user speech is detected (even during TTS playback), an interruption signal is sent immediately.
- Session State Management: Updated VoiceChatPresenter to keep the microphone active during playback. Upon receiving the onSpeechDetected signal, the interruptCurrentSession() method is triggered to:
  - Halt the current TTS playback instantly.
  - Stop ongoing LLM text generation.
  - Reset internal buffers and state to prioritize processing the new user input.
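
The interruption flow described above can be sketched roughly as follows. This is an illustrative plain-JVM outline, not the actual VoiceChatPresenter code: `TtsPlayer`, `LlmGenerator`, and `VoiceSessionSketch` are hypothetical names, and treating any non-blank partial result as "valid user speech" is an assumption, since the commit message does not define the validity check.

```java
// Hypothetical collaborators; the real TTS and LLM classes are not shown in this PR text.
interface TtsPlayer {
    void stop();
}

interface LlmGenerator {
    void cancel();
}

final class VoiceSessionSketch {
    private final TtsPlayer tts;
    private final LlmGenerator llm;
    private final StringBuilder pendingResponse = new StringBuilder();
    private boolean responding;

    VoiceSessionSketch(TtsPlayer tts, LlmGenerator llm) {
        this.tts = tts;
        this.llm = llm;
    }

    /** Marks that the assistant has started answering. */
    synchronized void onResponseStarted() {
        responding = true;
    }

    /** Buffers generated text while a response is in flight. */
    synchronized void onResponseChunk(String text) {
        if (responding) {
            pendingResponse.append(text);
        }
    }

    /**
     * Mirrors the onSpeechDetected signal. Returns true if a response was interrupted.
     * Assumption: a non-blank partial result counts as valid user speech.
     */
    synchronized boolean onSpeechDetected(String partialText) {
        if (!responding || partialText == null || partialText.trim().isEmpty()) {
            return false;
        }
        interruptCurrentSession();
        return true;
    }

    // Same three steps as listed above: halt TTS, stop generation, reset state.
    private void interruptCurrentSession() {
        tts.stop();
        llm.cancel();
        pendingResponse.setLength(0);
        responding = false;
    }

    synchronized String pendingText() {
        return pendingResponse.toString();
    }

    synchronized boolean isResponding() {
        return responding;
    }
}
```

Clearing the `responding` flag inside the interrupt makes repeated partial results for the same utterance no-ops, so the TTS and LLM are each stopped once per interruption.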
…es (with good AEC) and older/budget devices (needing software assistance), ensuring clear communication without audio feedback loops.

Key Changes:
- Manual Mute: Added a dedicated mute button in the UI, allowing users to manually disable microphone input at any time.
- Echo Cancellation Modes:
  - Hardware AEC (Default): Relies on the device's built-in acoustic echo cancellation (VOICE_COMMUNICATION source).
  - Auto-Mute Mode (Software Fallback): Designed for devices with poor hardware AEC. In this mode, the microphone is automatically muted while the AI is speaking (TTS playback) and unmuted when it finishes. This prevents the AI from hearing itself and re-triggering the ASR, though it disables the "Barge-in" capability during playback.
- Silence Injection: Updated AsrService to fill the audio buffer with zeros when muted (buffer.fill(0)), so that no audio reaches the ASR engine even if the hardware continues recording.
- UI Enhancements: Added a toggle switch to change AEC modes and visual indicators for the current mute state.
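
One way to read the interaction between the two echo-cancellation modes and the manual mute is as a single decision about whether the microphone should currently be silenced. The sketch below is a plain-JVM illustration under that reading. `EchoMode` and `MutePolicy` are hypothetical names, not classes from this PR, and the rule that manual mute overrides both modes is an assumption.

```java
// Hypothetical names for illustration only.
enum EchoMode { HARDWARE_AEC, AUTO_MUTE }

final class MutePolicy {
    private EchoMode mode = EchoMode.HARDWARE_AEC; // default mode per the description above
    private boolean manualMute;
    private boolean ttsPlaying;

    void setMode(EchoMode mode) {
        this.mode = mode;
    }

    void setManualMute(boolean manualMute) {
        this.manualMute = manualMute;
    }

    void setTtsPlaying(boolean ttsPlaying) {
        this.ttsPlaying = ttsPlaying;
    }

    /** True when microphone audio should be replaced with silence. */
    boolean shouldSilenceMic() {
        // Assumption: manual mute wins regardless of mode;
        // Auto-Mute only silences the mic while TTS is playing.
        return manualMute || (mode == EchoMode.AUTO_MUTE && ttsPlaying);
    }

    /** Barge-in is only possible while the mic stays live during playback. */
    boolean bargeInAvailable() {
        return ttsPlaying && !shouldSilenceMic();
    }
}
```

In this reading, switching to Auto-Mute trades barge-in for protection against self-triggering, which matches the trade-off stated for that mode.
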
wangzhaode requested a review from Juude on February 23, 2026, 03:28