feat(channels): add voice-loop mode for hands-free audio conversations#3005
feat(channels): add voice-loop mode for hands-free audio conversations#3005rareba wants to merge 5 commits into
Conversation
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. 🗂️ Base branches to auto review (1)
Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
Note
|
| Cohort / File(s) | Summary |
|---|---|
Core struct & reexports src/channels/traits.rs, src/channels/mod.rs |
Added pub is_voice: Option<bool> to ChannelMessage; introduced and re-exported voice_loop module plus should_reply_as_audio and VoiceLoopConfig. |
Voice loop implementation src/channels/voice_loop.rs |
New module: VoiceLoopConfig (serde/JsonSchema, Default), should_reply_as_audio() decision function, synthesize_response() async placeholder, defaults helpers, and unit tests. |
Channel message initializers src/channels/{bluebubbles,cli,dingtalk,discord,email_channel,github,imessage,irc,lark,linq,matrix,mattermost,napcat,nextcloud_talk,nostr,qq,signal,slack,telegram,wati,whatsapp,whatsapp_web}.rs |
Updated ChannelMessage constructions to include is_voice: None across all parsing/emit paths. |
Config schema & reexports src/config/schema.rs, src/config/mod.rs |
Added voice_loop: VoiceLoopConfig to TelegramConfig, SlackConfig, WhatsAppConfig with serde(default); tests updated to supply voice_loop. |
Onboarding / setup changes src/onboard/wizard.rs, src/onboard/tui.rs, src/integrations/registry.rs |
Channel config initializers updated to include voice_loop: Default::default() for relevant channels. |
Tests & callers updated src/daemon/mod.rs, src/cron/scheduler.rs, src/migration.rs, src/gateway/mod.rs, tests/channel_routing.rs |
Test literals and config instantiations updated to include is_voice: None and voice_loop: Default::default() where applicable. |
Misc. small updates src/cron/scheduler.rs (test), other test scaffolding |
Minor test adjustments to initialize new config/field values. |
Sequence Diagram(s)
sequenceDiagram
participant Channel as Channel Listener
participant Config as VoiceLoopConfig
participant Decision as should_reply_as_audio
participant TTS as synthesize_response
participant Outbound as Outbound Sender
Note over Channel,Decision: incoming message parsed into ChannelMessage (includes is_voice)
Channel->>Config: read channel.voice_loop
Channel->>Decision: should_reply_as_audio(original_message, config, response_text)
Decision-->>Channel: bool (true/false)
alt true
Channel->>TTS: synthesize_response(response_text, config)
TTS-->>Channel: Option<Vec<u8>> (audio payload or None)
alt audio payload
Channel->>Outbound: send audio response
else no audio
Channel->>Outbound: send text response
end
else false
Channel->>Outbound: send text response
end
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~20 minutes
Possibly related PRs
- feat: add WhatsApp and Email channel integrations #37: Modifies channel message construction for WhatsApp/Email and touches channel parsing paths that now include
is_voiceand integratevoice_loopconfig.
Suggested labels
size: M, risk: medium, channel, config: core
Suggested reviewers
- theonlyhennygod
🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'feat(channels): add voice-loop mode for hands-free audio conversations' clearly summarizes the main feature addition, is concise, and accurately reflects the primary change across all modified files. |
| Description check | ✅ Passed | The PR description covers key requirements: addresses the problem (voice-loop support), explains what changed (VoiceLoopConfig, is_voice field, config fields), identifies non-goals, provides test evidence, and includes risk/rollback assessment. All critical template sections are present. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
✨ Finishing Touches
🧪 Generate unit tests (beta)
- Create PR with unit tests
- Post copyable unit tests in a comment
Comment @coderabbitai help to get the list of available commands and usage tips.
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
src/channels/whatsapp_web.rs (1)
593-683:⚠️ Potential issue | 🟠 MajorDon't drop the voice-note bit after transcription.
This closure already knows when the inbound payload came from
msg.audio_message, but the forwardedChannelMessagestill setsis_voice: None. That makes a transcribed voice note look identical to plain text downstream, so the WhatsApp voice-loop can never reply in audio to voice-originated messages. SetSome(true)for the audio path andSome(false)for normal text.Suggested fix
- let trimmed = text.trim(); - let content = if !trimmed.is_empty() { - trimmed.to_string() - } else if let Some(ref tc) = transcription { + let trimmed = text.trim(); + let (content, is_voice) = if !trimmed.is_empty() { + (trimmed.to_string(), false) + } else if let Some(ref tc) = transcription { // Attempt to transcribe audio/voice messages if let Some(ref audio_msg) = msg.audio_message { let duration_secs = audio_msg.seconds.unwrap_or(0) as u64; if duration_secs > tc.max_duration_secs { @@ - Ok(t) if !t.trim().is_empty() => { - format!("[Voice] {}", t.trim()) - } + Ok(t) if !t.trim().is_empty() => { + (format!("[Voice] {}", t.trim()), true) + } Ok(_) => { tracing::info!( "WhatsApp Web: voice transcription \ @@ } else { tracing::debug!( "WhatsApp Web: ignoring non-text/non-audio \ @@ } else { tracing::debug!( "WhatsApp Web: ignoring empty or non-text message \ @@ content, timestamp: chrono::Utc::now().timestamp() as u64, thread_ts: None, - is_voice: None, + is_voice: Some(is_voice), })🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/channels/whatsapp_web.rs` around lines 593 - 683, The ChannelMessage sent from this handler always sets is_voice: None, losing the fact that content may have come from msg.audio_message; update the send logic in the block that builds ChannelMessage so that when the content was produced from transcription of msg.audio_message you set is_voice: Some(true), and for plain/text paths set is_voice: Some(false) (e.g. when using trimmed text or the non-audio branches). Ensure this change is applied where ChannelMessage is constructed and sent so downstream logic can distinguish transcribed voice messages.src/channels/slack.rs (1)
641-653:⚠️ Potential issue | 🟠 MajorSlack voice inputs still never reach the new voice-loop gate.
Both inbound paths hard-code
is_voice: None, so downstreamshould_reply_as_audio()will never see a positive Slack voice/audio signal. With the current subtype/text filtering, Slack voice notes will always fall back to text despite this PR advertising Slack voice-loop support. Please plumb Slack audio/file-share detection into theseChannelMessagebuilders and add a regression test for a voice-note event.Also applies to: 991-1003
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/channels/slack.rs` around lines 641 - 653, The ChannelMessage builders (creating ChannelMessage with fields like id, sender, reply_target, content, channel, timestamp, thread_ts via Self::inbound_thread_ts) currently hard-code is_voice: None; update both builders (the shown block and the similar block around the other occurrence) to detect Slack audio/file-share signals (e.g., event subtype file_share or presence of event.files with audio mime types or audio filetypes) and set is_voice to Some(true) when such audio is present (else Some(false) or None per existing enum expectations), so downstream should_reply_as_audio() can see the flag; also add a regression test that constructs a Slack voice-note event and asserts the produced ChannelMessage has is_voice set appropriately and triggers the audio reply path.src/channels/telegram.rs (1)
1925-1936:⚠️ Potential issue | 🟠 MajorSet
is_voiceon the Telegram voice/audio path.This constructor is the only one that represents an actual incoming Telegram voice/audio message, but it still emits
is_voice: None. That means the new voice-loop gate cannot distinguish a transcribed voice note from plain text, so Telegram will never qualify for the audio-reply branch from this path.🐛 Proposed fix
Some(ChannelMessage { id: format!("telegram_{chat_id}_{message_id}"), sender: sender_identity, reply_target, content, channel: "telegram".to_string(), timestamp: std::time::SystemTime::now() .duration_since(std::time::UNIX_EPOCH) .unwrap_or_default() .as_secs(), thread_ts: thread_id, - is_voice: None, + is_voice: Some(true), })🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/channels/telegram.rs` around lines 1925 - 1936, The ChannelMessage constructed here for incoming Telegram voice/audio messages sets is_voice to None so the voice-loop gate can't detect audio; update this constructor (the ChannelMessage created in the Telegram voice/audio handling path) to set is_voice: Some(true) instead of None so downstream logic can recognize and route audio messages to the audio-reply branch; ensure the field is_voice on ChannelMessage is populated only for this voice/audio path and leave other constructors unchanged.
🧹 Nitpick comments (4)
src/onboard/tui.rs (1)
2272-2284: Expose Telegram voice-loop in the TUI, or explicitly point users to the manual config step.Line 2283 always writes the new field with defaults, but this file has no onboarding field for it. As a result, TUI-generated Telegram configs can’t opt into voice-loop without a post-setup manual edit.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/onboard/tui.rs` around lines 2272 - 2284, The current onboarding TUI always sets TelegramConfig.voice_loop to Default::default() when building channels.telegram, but the TUI has no input for configuring voice_loop so users cannot opt into it; update the onboarding flow to either (A) add a TUI field/bool (e.g., telegram_voice_loop_enabled and any needed params) and populate TelegramConfig.voice_loop from that input when constructing channels.telegram, or (B) only set voice_loop if an existing onboarding value is present (leave it None or omit the field) and update the TUI/README to point users to the manual configuration step; locate the TelegramConfig construction in the function that sets channels.telegram and wire it to the new onboarding field (or conditional assignment) so voice_loop is user-configurable.tests/channel_routing.rs (1)
90-110: Assertis_voicein the clone coverage.This test no longer verifies “all fields” after adding
ChannelMessage::is_voice, so a regression in the new voice-loop metadata would still pass here.🧪 Suggested assertion
assert_eq!(cloned.reply_target, original.reply_target); assert_eq!(cloned.content, original.content); assert_eq!(cloned.channel, original.channel); assert_eq!(cloned.timestamp, original.timestamp); + assert_eq!(cloned.is_voice, original.is_voice);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/channel_routing.rs` around lines 90 - 110, The test channel_message_preserves_all_fields_on_clone no longer checks the newly added ChannelMessage::is_voice (and optionally thread_ts), so update the test to assert those fields are preserved after cloning: in the channel_message_preserves_all_fields_on_clone function, set original.is_voice (and original.thread_ts if relevant) to a non-None value and add assert_eq!(cloned.is_voice, original.is_voice) (and assert_eq!(cloned.thread_ts, original.thread_ts)) alongside the existing assertions to ensure clone() preserves these fields.src/channels/mod.rs (1)
43-43: Keepvoice_loopprivate and re-export only the stable API.
pub mod voice_loop;is redundant once Line 75 re-exports the intended entry points, and it also makes everypubitem insrc/channels/voice_loop.rspart of thechannelsAPI surface.mod voice_loop;keeps placeholder/helper internals hidden until they need to be public.Proposed change
-pub mod voice_loop; +mod voice_loop;As per coding guidelines "Deny-by-default for access and exposure boundaries; never silently broaden permissions/capabilities".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/channels/mod.rs` at line 43, Change the module declaration from a public module to a private one by replacing "pub mod voice_loop;" with "mod voice_loop;". Keep the existing top-level re-exports (the "pub use ... voice_loop::{...}" entries) as the only public surface so helper items in voice_loop remain hidden; if any item currently relied on being pub within voice_loop, make it private and add an explicit pub re-export in the existing "pub use self::voice_loop::{...}" list to expose only the stable API.src/channels/voice_loop.rs (1)
64-73: Consider using character count instead of byte length formax_tts_lengthcheck.
response_text.len()returns byte length, not character count. For UTF-8 strings with multibyte characters (e.g., emoji, non-ASCII text), the byte count can significantly exceed the character count. TTS providers typically bill and limit based on character count.This could cause valid short messages to be rejected when they contain multibyte characters.
♻️ Proposed fix using character count
pub fn should_reply_as_audio( original_message: &ChannelMessage, voice_config: &VoiceLoopConfig, response_text: &str, ) -> bool { voice_config.enabled && original_message.is_voice.unwrap_or(false) - && response_text.len() <= voice_config.max_tts_length + && response_text.chars().count() <= voice_config.max_tts_length }If byte-based limits are intentional, consider renaming the field to
max_tts_bytesand updating the doc comment for clarity.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/channels/voice_loop.rs` around lines 64 - 73, The check in should_reply_as_audio uses response_text.len() which returns byte length; change it to compare the character count (e.g., response_text.chars().count()) against voice_config.max_tts_length so multibyte characters (emoji/non-ASCII) are measured correctly; update the use in should_reply_as_audio to use chars().count() and, if you intended byte-based limits instead, rename voice_config.max_tts_length to max_tts_bytes and update its doc comment to clarify bytes vs characters.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/channels/mod.rs`:
- Line 75: The TTS length gate uses byte length rather than character count,
causing CJK/emoji-heavy responses to be treated as too long; update
should_reply_as_audio in voice_loop.rs to compare the character count of
response_text (e.g., response_text.chars().count()) against
VoiceLoopConfig::max_tts_length (or the max_tts_length field) instead of using
response_text.len(), ensuring the comparison uses characters/unicode scalars
rather than UTF-8 bytes.
In `@src/channels/whatsapp.rs`:
- Line 145: The ChannelMessage produced in the WhatsApp handler sets is_voice to
None, preventing the voice-loop from ever activating; update
parse_webhook_payload()/the WhatsApp branch that constructs ChannelMessage to
detect WhatsApp voice/audio messages (e.g., inspect the incoming message type
field or presence of an audio/voice URL/ID) and set is_voice: Some(true) for
those cases (and Some(false) for explicit text), ensuring non-text branches that
represent voice media populate ChannelMessage::is_voice so the voice-loop path
can trigger.
In `@src/config/schema.rs`:
- Around line 4961-4963: Update the documentation comment for the public field
voice_loop (type VoiceLoopConfig) to explicitly state the config contract: that
this key is optional and defaults to the crate::channels::voice_loop default
(opt-in/opt-out behavior), what behavior is expected when omitted
(backward-compatible/no-op), the compatibility and rollback guidance (how to
remove or revert the key safely), and the current temporary fallback (text
response only while TTS synthesis is stubbed) instead of claiming “auto-reply
with audio”; apply the same expanded doc-comment pattern to the other
VoiceLoopConfig occurrences noted elsewhere in this module so the schema exposes
defaults, compatibility impact, migration/rollback instructions, and the
intended text fallback until TTS is implemented.
- Around line 6752-6754: The calls to sync_directory are missing .await on Unix:
update the two call sites that reference sync_directory(marker_root) (the one
shown and the other around where marker_root is used) to use the cfg-split
pattern used elsewhere in this file and invoke sync_directory(...).await under
#[cfg(unix)] while keeping the synchronous no-op or blocking call under
#[cfg(not(unix))]; specifically modify the blocks that call sync_directory to
match the existing pattern used near the other sync_directory usages so the
async function (sync_directory) is awaited on Unix builds.
---
Outside diff comments:
In `@src/channels/slack.rs`:
- Around line 641-653: The ChannelMessage builders (creating ChannelMessage with
fields like id, sender, reply_target, content, channel, timestamp, thread_ts via
Self::inbound_thread_ts) currently hard-code is_voice: None; update both
builders (the shown block and the similar block around the other occurrence) to
detect Slack audio/file-share signals (e.g., event subtype file_share or
presence of event.files with audio mime types or audio filetypes) and set
is_voice to Some(true) when such audio is present (else Some(false) or None per
existing enum expectations), so downstream should_reply_as_audio() can see the
flag; also add a regression test that constructs a Slack voice-note event and
asserts the produced ChannelMessage has is_voice set appropriately and triggers
the audio reply path.
In `@src/channels/telegram.rs`:
- Around line 1925-1936: The ChannelMessage constructed here for incoming
Telegram voice/audio messages sets is_voice to None so the voice-loop gate can't
detect audio; update this constructor (the ChannelMessage created in the
Telegram voice/audio handling path) to set is_voice: Some(true) instead of None
so downstream logic can recognize and route audio messages to the audio-reply
branch; ensure the field is_voice on ChannelMessage is populated only for this
voice/audio path and leave other constructors unchanged.
In `@src/channels/whatsapp_web.rs`:
- Around line 593-683: The ChannelMessage sent from this handler always sets
is_voice: None, losing the fact that content may have come from
msg.audio_message; update the send logic in the block that builds ChannelMessage
so that when the content was produced from transcription of msg.audio_message
you set is_voice: Some(true), and for plain/text paths set is_voice: Some(false)
(e.g. when using trimmed text or the non-audio branches). Ensure this change is
applied where ChannelMessage is constructed and sent so downstream logic can
distinguish transcribed voice messages.
---
Nitpick comments:
In `@src/channels/mod.rs`:
- Line 43: Change the module declaration from a public module to a private one
by replacing "pub mod voice_loop;" with "mod voice_loop;". Keep the existing
top-level re-exports (the "pub use ... voice_loop::{...}" entries) as the only
public surface so helper items in voice_loop remain hidden; if any item
currently relied on being pub within voice_loop, make it private and add an
explicit pub re-export in the existing "pub use self::voice_loop::{...}" list to
expose only the stable API.
In `@src/channels/voice_loop.rs`:
- Around line 64-73: The check in should_reply_as_audio uses response_text.len()
which returns byte length; change it to compare the character count (e.g.,
response_text.chars().count()) against voice_config.max_tts_length so multibyte
characters (emoji/non-ASCII) are measured correctly; update the use in
should_reply_as_audio to use chars().count() and, if you intended byte-based
limits instead, rename voice_config.max_tts_length to max_tts_bytes and update
its doc comment to clarify bytes vs characters.
In `@src/onboard/tui.rs`:
- Around line 2272-2284: The current onboarding TUI always sets
TelegramConfig.voice_loop to Default::default() when building channels.telegram,
but the TUI has no input for configuring voice_loop so users cannot opt into it;
update the onboarding flow to either (A) add a TUI field/bool (e.g.,
telegram_voice_loop_enabled and any needed params) and populate
TelegramConfig.voice_loop from that input when constructing channels.telegram,
or (B) only set voice_loop if an existing onboarding value is present (leave it
None or omit the field) and update the TUI/README to point users to the manual
configuration step; locate the TelegramConfig construction in the function that
sets channels.telegram and wire it to the new onboarding field (or conditional
assignment) so voice_loop is user-configurable.
In `@tests/channel_routing.rs`:
- Around line 90-110: The test channel_message_preserves_all_fields_on_clone no
longer checks the newly added ChannelMessage::is_voice (and optionally
thread_ts), so update the test to assert those fields are preserved after
cloning: in the channel_message_preserves_all_fields_on_clone function, set
original.is_voice (and original.thread_ts if relevant) to a non-None value and
add assert_eq!(cloned.is_voice, original.is_voice) (and
assert_eq!(cloned.thread_ts, original.thread_ts)) alongside the existing
assertions to ensure clone() preserves these fields.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: fb112ca8-9fde-41e2-bd43-baa471662b73
📒 Files selected for processing (35)
src/channels/bluebubbles.rssrc/channels/cli.rssrc/channels/dingtalk.rssrc/channels/discord.rssrc/channels/email_channel.rssrc/channels/github.rssrc/channels/imessage.rssrc/channels/irc.rssrc/channels/lark.rssrc/channels/linq.rssrc/channels/matrix.rssrc/channels/mattermost.rssrc/channels/mod.rssrc/channels/napcat.rssrc/channels/nextcloud_talk.rssrc/channels/nostr.rssrc/channels/qq.rssrc/channels/signal.rssrc/channels/slack.rssrc/channels/telegram.rssrc/channels/traits.rssrc/channels/voice_loop.rssrc/channels/wati.rssrc/channels/whatsapp.rssrc/channels/whatsapp_web.rssrc/config/mod.rssrc/config/schema.rssrc/cron/scheduler.rssrc/daemon/mod.rssrc/gateway/mod.rssrc/integrations/registry.rssrc/migration.rssrc/onboard/tui.rssrc/onboard/wizard.rstests/channel_routing.rs
| channel: "whatsapp".to_string(), | ||
| timestamp, | ||
| thread_ts: None, | ||
| is_voice: None, |
There was a problem hiding this comment.
Populate is_voice for WhatsApp voice inputs, or this channel won't enter voice-loop mode.
parse_webhook_payload() only emits ChannelMessage values from the text branch and skips every non-text payload earlier in the function, so this field is effectively always None here. That means the new voice-loop path can never be activated for Cloud API WhatsApp messages.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/channels/whatsapp.rs` at line 145, The ChannelMessage produced in the
WhatsApp handler sets is_voice to None, preventing the voice-loop from ever
activating; update parse_webhook_payload()/the WhatsApp branch that constructs
ChannelMessage to detect WhatsApp voice/audio messages (e.g., inspect the
incoming message type field or presence of an audio/voice URL/ID) and set
is_voice: Some(true) for those cases (and Some(false) for explicit text),
ensuring non-text branches that represent voice media populate
ChannelMessage::is_voice so the voice-loop path can trigger.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/channels/voice_loop.rs (1)
154-167: Minor: Test comment references "bytes" but implementation counts characters.The comment "this text is longer than ten bytes" is slightly misleading since
should_reply_as_audiouseschars().count()for character-based length checking. Consider updating for clarity.📝 Suggested comment fix
#[test] fn should_not_reply_when_text_exceeds_max_length() { let config = VoiceLoopConfig { enabled: true, max_tts_length: 10, ..Default::default() }; let msg = make_message(Some(true)); - assert!(!should_reply_as_audio( - &msg, - &config, - "this text is longer than ten bytes" - )); + // 35 characters, exceeds max_tts_length of 10 + assert!(!should_reply_as_audio( + &msg, + &config, + "this text is longer than ten chars" + )); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/channels/voice_loop.rs` around lines 154 - 167, The test's message text and/or comment is misleading because should_reply_as_audio uses character count (chars().count()) not byte length; update the test in should_not_reply_when_text_exceeds_max_length to use wording like "this text is longer than ten characters" or adjust the message string to demonstrate character length behavior, referencing VoiceLoopConfig.max_tts_length and the should_reply_as_audio function so the test clearly reflects character-based length checking.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/config/schema.rs`:
- Around line 4961-4965: Update the doc comment for the public field voice_loop
(crate::channels::voice_loop::VoiceLoopConfig) to explicitly state that not
every voice message will become an audio reply even when wired: responses longer
than voice_loop.max_tts_length will continue to fall back to text, so users
should expect that runtime TTS length gating; apply the same clarifying phrasing
to the other voice_loop documentation blocks referenced in the file to keep the
public config schema contract exact.
---
Nitpick comments:
In `@src/channels/voice_loop.rs`:
- Around line 154-167: The test's message text and/or comment is misleading
because should_reply_as_audio uses character count (chars().count()) not byte
length; update the test in should_not_reply_when_text_exceeds_max_length to use
wording like "this text is longer than ten characters" or adjust the message
string to demonstrate character length behavior, referencing
VoiceLoopConfig.max_tts_length and the should_reply_as_audio function so the
test clearly reflects character-based length checking.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: a7ffc7dd-2794-4070-8297-2f1102892fce
📒 Files selected for processing (2)
src/channels/voice_loop.rssrc/config/schema.rs
| /// Voice-loop configuration. Disabled by default (opt-in). When enabled and | ||
| /// TTS is wired, voice messages receive audio replies; until then responses | ||
| /// fall back to text. Safe to omit or remove — no migration needed. | ||
| #[serde(default)] | ||
| pub voice_loop: crate::channels::voice_loop::VoiceLoopConfig, |
There was a problem hiding this comment.
Tighten the voice_loop contract to cover the remaining text-fallback case.
These docs are much better now, but they still imply every voice message becomes an audio reply once the feature is wired. The runtime gate also depends on voice_loop.max_tts_length, so longer replies will continue to fall back to text. Calling that out here would keep the public schema contract exact.
✏️ Suggested wording
- /// Voice-loop configuration. Disabled by default (opt-in). When enabled and
- /// TTS is wired, voice messages receive audio replies; until then responses
- /// fall back to text. Safe to omit or remove — no migration needed.
+ /// Voice-loop configuration. Disabled by default (opt-in). When enabled,
+ /// eligible voice messages may receive audio replies once audio reply
+ /// support is available; replies over `voice_loop.max_tts_length` still
+ /// fall back to text. Today all responses fall back to text. Safe to omit
+ /// or remove — no migration needed.Based on learnings, config/schema changes are a public API contract and intentional fallback behavior should be documented.
Also applies to: 5062-5066, 5273-5277
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/config/schema.rs` around lines 4961 - 4965, Update the doc comment for
the public field voice_loop (crate::channels::voice_loop::VoiceLoopConfig) to
explicitly state that not every voice message will become an audio reply even
when wired: responses longer than voice_loop.max_tts_length will continue to
fall back to text, so users should expect that runtime TTS length gating; apply
the same clarifying phrasing to the other voice_loop documentation blocks
referenced in the file to keep the public config schema contract exact.
|
Thank you for the contribution! We appreciate the effort. We are closing this PR because it introduces a standalone module with no integration into the existing codebase — the new types/functions are not called by any existing code path. ZeroClaw follows a trait-driven architecture where new features must be wired through factory registration and have at least one active caller. To reopen, please:
We welcome smaller, integrated contributions. See CLAUDE.md §7 for playbooks on adding providers, channels, tools, and peripherals. |
|
Review: This PR introduces a standalone module that is not wired into any existing code path. The new types/functions have no callers in the codebase. To make this mergeable:
See CLAUDE.md §7 for playbooks. Recommend closing and resubmitting as smaller, integrated PRs. |
7f535f2 to
2639674
Compare
|
Rebased onto current master as a clean, focused single commit. Changes from previous version:
This is now a focused, reviewable PR that integrates properly with the existing codebase rather than being a standalone mega-patch. |
c6a5826 to
75b45c2
Compare
8b8a64a to
23ea9c5
Compare
Add is_voice: None to all ChannelMessage constructors in slack.rs, channels/mod.rs tests, daemon/mod.rs tests, and integration tests. Add voice_loop field to TelegramConfig constructors in channels/mod.rs and daemon/mod.rs. Fix clippy::large_futures in cron/scheduler.rs and tools/cron_run.rs with Box::pin().
23ea9c5 to
67e7e64
Compare
|
Superseded: reopening from feat/voice-loop branch (corrected prefix per CONTRIBUTING.md). |
Summary
masterVoiceLoopConfigand voice-loop infrastructure module (src/channels/voice_loop.rs) withshould_reply_as_audio()logic andsynthesize_response()placeholder. Addedis_voicefield toChannelMessagefor detecting incoming voice/audio messages. Added backward-compatiblevoice_loopconfig field toTelegramConfig,SlackConfig, andWhatsAppConfig.#[serde(default)].Label Snapshot (required)
risk: mediumsize: Mchannel,configchannel: telegram,channel: slack,channel: whatsappChange Metadata
featurechannelLinked Issue
Supersede Attribution (required when
Supersedes #is used)N/A
Validation Evidence (required)
Commands and result summary:
cargo fmt --all -- --check cargo clippy --all-targets -- -D warnings cargo testVoiceLoopConfigdefaults,should_reply_as_audio()logic, boundary conditions, and deserialization.cargo fmt --allpasses.cargo buildsucceeds.cargo test --lib channels::voice_loop— all 10 tests pass.cargo test --lib channels::traits— all 7 tests pass.cargo test --lib config::tests— all 30 tests pass.Security Impact (required)
Privacy and Data Hygiene (required)
passis_voiceis a boolean flag onlyCompatibility / Migration
voice_loopconfig field on Telegram, Slack, and WhatsApp configsi18n Follow-Through (required when docs or user-facing wording changes)
Human Verification (required)
What was personally validated beyond CI:
Side Effects / Blast Radius (required)
is_voicefield added toChannelMessage), Telegram/Slack/WhatsApp configs, channel traitssynthesize_response()returnsNoneuntil TTS module landsAgent Collaboration Notes (recommended)
AGENTS.md+CONTRIBUTING.md)Rollback Plan (required)
git revert <commit>voice_loop.enabled = false(default) per channel configRisks and Mitigations
is_voicetoChannelMessagetouches many channel filesis_voice: falsedefaults; no behavior change in existing channelssynthesize_response()returnsNoneuntil TTS module PR lands