Skip to content

feat(channels): add voice-loop mode for hands-free audio conversations#3005

Closed
rareba wants to merge 5 commits into
zeroclaw-labs:masterfrom
rareba:feature/voice-loop
Closed

feat(channels): add voice-loop mode for hands-free audio conversations#3005
rareba wants to merge 5 commits into
zeroclaw-labs:masterfrom
rareba:feature/voice-loop

Conversation

@rareba
Copy link
Copy Markdown
Contributor

@rareba rareba commented Mar 8, 2026

Summary

  • Base branch target: master
  • Problem: ZeroClaw channels had no infrastructure for responding to voice messages with audio, requiring all responses to be text-only regardless of input modality.
  • Why it matters: Hands-free and voice-first use cases (e.g., WhatsApp voice notes, Telegram voice messages) benefit from audio responses, enabling more natural conversational flows.
  • What changed: Added VoiceLoopConfig and voice-loop infrastructure module (src/channels/voice_loop.rs) with should_reply_as_audio() logic and synthesize_response() placeholder. Added is_voice field to ChannelMessage for detecting incoming voice/audio messages. Added backward-compatible voice_loop config field to TelegramConfig, SlackConfig, and WhatsAppConfig.
  • What did not change (scope boundary): No actual TTS synthesis (placeholder until TTS module PR lands). No channel-specific audio sending logic (future PR). All existing channel behavior unchanged — new fields use #[serde(default)].

Label Snapshot (required)

  • Risk label: risk: medium
  • Size label: size: M
  • Scope labels: channel, config
  • Module labels: channel: telegram, channel: slack, channel: whatsapp
  • Contributor tier label: (auto-managed)
  • If any auto-label is incorrect, note requested correction: N/A

Change Metadata

  • Change type: feature
  • Primary scope: channel

Linked Issue

  • Closes #
  • Related #
  • Depends on #
  • Supersedes #

Supersede Attribution (required when Supersedes # is used)

N/A

Validation Evidence (required)

Commands and result summary:

cargo fmt --all -- --check
cargo clippy --all-targets -- -D warnings
cargo test
  • Evidence provided: 10 unit tests for VoiceLoopConfig defaults, should_reply_as_audio() logic, boundary conditions, and deserialization. cargo fmt --all passes. cargo build succeeds. cargo test --lib channels::voice_loop — all 10 tests pass. cargo test --lib channels::traits — all 7 tests pass. cargo test --lib config::tests — all 30 tests pass.
  • If any command is intentionally skipped, explain why: Only pre-existing build errors remain.

Security Impact (required)

  • New permissions/capabilities? No
  • New external network calls? No
  • Secrets/tokens handling changed? No
  • File system access scope changed? No

Privacy and Data Hygiene (required)

  • Data-hygiene status: pass
  • Redaction/anonymization notes: No personal data introduced; is_voice is a boolean flag only
  • Neutral wording confirmation: Confirmed

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? Yes — new optional voice_loop config field on Telegram, Slack, and WhatsApp configs
  • Migration needed? No
  • If yes, exact upgrade steps: N/A

i18n Follow-Through (required when docs or user-facing wording changes)

  • i18n follow-through triggered? No

Human Verification (required)

What was personally validated beyond CI:

  • Verified scenarios: VoiceLoopConfig defaults, should_reply_as_audio with voice-loop enabled/disabled, is_voice true/false, text length within/exceeding max TTS length
  • Edge cases checked: Boundary conditions for max TTS length, deserialization of configs without voice_loop field (backward compat)
  • What was not verified: End-to-end voice message flow (TTS not yet implemented), channel-specific audio sending

Side Effects / Blast Radius (required)

  • Affected subsystems/workflows: All channel implementations (minor — is_voice field added to ChannelMessage), Telegram/Slack/WhatsApp configs, channel traits
  • Potential unintended effects: None — all new fields default to disabled/false; existing message construction is unaffected
  • Guardrails/monitoring for early detection: Voice-loop is opt-in per channel; synthesize_response() returns None until TTS module lands

Agent Collaboration Notes (recommended)

  • Agent tools used: Claude Code
  • Workflow/plan summary: Infrastructure-first approach — lay groundwork for voice-loop before TTS integration
  • Verification focus: Backward compatibility of config changes, correctness of should_reply_as_audio logic
  • Confirmation: naming + architecture boundaries followed (AGENTS.md + CONTRIBUTING.md)

Rollback Plan (required)

  • Fast rollback command/path: git revert <commit>
  • Feature flags or config toggles: voice_loop.enabled = false (default) per channel config
  • Observable failure symptoms: Unexpected audio response attempts, config deserialization failures (unlikely due to serde defaults)

Risks and Mitigations

  • Risk: Adding is_voice to ChannelMessage touches many channel files
    • Mitigation: All additions are is_voice: false defaults; no behavior change in existing channels
  • Risk: TTS placeholder may confuse contributors expecting working synthesis
    • Mitigation: Clear documentation that synthesize_response() returns None until TTS module PR lands

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Mar 8, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

🗂️ Base branches to auto review (1)
  • master

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1e00bfcc-e8d3-4c17-bc1c-b610c225c6c6

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review

Note

.coderabbit.yaml has unrecognized properties

CodeRabbit is using all valid settings from your configuration. Unrecognized properties (listed below) have been ignored and may indicate typos or deprecated fields that can be removed.

⚠️ Parsing warnings (1)
Validation error: Unrecognized key(s) in object: 'tools', 'path_filters', 'review_instructions'
⚙️ Configuration instructions
  • Please see the configuration documentation for more information.
  • You can also validate your configuration using the online YAML validator.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json
📝 Walkthrough

Walkthrough

Adds a new optional is_voice: Option<bool> to ChannelMessage, a new voice_loop module with TTS decision/config (VoiceLoopConfig, should_reply_as_audio, synthesize_response), and wires voice_loop config into channel and onboarding/config structs; channels initialize is_voice: None.

Changes

Cohort / File(s) Summary
Core struct & reexports
src/channels/traits.rs, src/channels/mod.rs
Added pub is_voice: Option<bool> to ChannelMessage; introduced and re-exported voice_loop module plus should_reply_as_audio and VoiceLoopConfig.
Voice loop implementation
src/channels/voice_loop.rs
New module: VoiceLoopConfig (serde/JsonSchema, Default), should_reply_as_audio() decision function, synthesize_response() async placeholder, defaults helpers, and unit tests.
Channel message initializers
src/channels/{bluebubbles,cli,dingtalk,discord,email_channel,github,imessage,irc,lark,linq,matrix,mattermost,napcat,nextcloud_talk,nostr,qq,signal,slack,telegram,wati,whatsapp,whatsapp_web}.rs
Updated ChannelMessage constructions to include is_voice: None across all parsing/emit paths.
Config schema & reexports
src/config/schema.rs, src/config/mod.rs
Added voice_loop: VoiceLoopConfig to TelegramConfig, SlackConfig, WhatsAppConfig with serde(default); tests updated to supply voice_loop.
Onboarding / setup changes
src/onboard/wizard.rs, src/onboard/tui.rs, src/integrations/registry.rs
Channel config initializers updated to include voice_loop: Default::default() for relevant channels.
Tests & callers updated
src/daemon/mod.rs, src/cron/scheduler.rs, src/migration.rs, src/gateway/mod.rs, tests/channel_routing.rs
Test literals and config instantiations updated to include is_voice: None and voice_loop: Default::default() where applicable.
Misc. small updates
src/cron/scheduler.rs (test), other test scaffolding
Minor test adjustments to initialize new config/field values.

Sequence Diagram(s)

sequenceDiagram
  participant Channel as Channel Listener
  participant Config as VoiceLoopConfig
  participant Decision as should_reply_as_audio
  participant TTS as synthesize_response
  participant Outbound as Outbound Sender

  Note over Channel,Decision: incoming message parsed into ChannelMessage (includes is_voice)
  Channel->>Config: read channel.voice_loop
  Channel->>Decision: should_reply_as_audio(original_message, config, response_text)
  Decision-->>Channel: bool (true/false)
  alt true
    Channel->>TTS: synthesize_response(response_text, config)
    TTS-->>Channel: Option<Vec<u8>> (audio payload or None)
    alt audio payload
      Channel->>Outbound: send audio response
    else no audio
      Channel->>Outbound: send text response
    end
  else false
    Channel->>Outbound: send text response
  end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested labels

size: M, risk: medium, channel, config: core

Suggested reviewers

  • theonlyhennygod
🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title 'feat(channels): add voice-loop mode for hands-free audio conversations' clearly summarizes the main feature addition, is concise, and accurately reflects the primary change across all modified files.
Description check ✅ Passed The PR description covers key requirements: addresses the problem (voice-loop support), explains what changed (VoiceLoopConfig, is_voice field, config fields), identifies non-goals, provides test evidence, and includes risk/rollback assessment. All critical template sections are present.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
src/channels/whatsapp_web.rs (1)

593-683: ⚠️ Potential issue | 🟠 Major

Don't drop the voice-note bit after transcription.

This closure already knows when the inbound payload came from msg.audio_message, but the forwarded ChannelMessage still sets is_voice: None. That makes a transcribed voice note look identical to plain text downstream, so the WhatsApp voice-loop can never reply in audio to voice-originated messages. Set Some(true) for the audio path and Some(false) for normal text.

Suggested fix
-                                let trimmed = text.trim();
-                                let content = if !trimmed.is_empty() {
-                                    trimmed.to_string()
-                                } else if let Some(ref tc) = transcription {
+                                let trimmed = text.trim();
+                                let (content, is_voice) = if !trimmed.is_empty() {
+                                    (trimmed.to_string(), false)
+                                } else if let Some(ref tc) = transcription {
                                     // Attempt to transcribe audio/voice messages
                                     if let Some(ref audio_msg) = msg.audio_message {
                                         let duration_secs =
                                             audio_msg.seconds.unwrap_or(0) as u64;
                                         if duration_secs > tc.max_duration_secs {
@@
-                                                    Ok(t) if !t.trim().is_empty() => {
-                                                        format!("[Voice] {}", t.trim())
-                                                    }
+                                                    Ok(t) if !t.trim().is_empty() => {
+                                                        (format!("[Voice] {}", t.trim()), true)
+                                                    }
                                                     Ok(_) => {
                                                         tracing::info!(
                                                             "WhatsApp Web: voice transcription \
@@
                                     } else {
                                         tracing::debug!(
                                             "WhatsApp Web: ignoring non-text/non-audio \
@@
                                 } else {
                                     tracing::debug!(
                                         "WhatsApp Web: ignoring empty or non-text message \
@@
                                         content,
                                         timestamp: chrono::Utc::now().timestamp() as u64,
                                         thread_ts: None,
-                                        is_voice: None,
+                                        is_voice: Some(is_voice),
                                     })
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/channels/whatsapp_web.rs` around lines 593 - 683, The ChannelMessage sent
from this handler always sets is_voice: None, losing the fact that content may
have come from msg.audio_message; update the send logic in the block that builds
ChannelMessage so that when the content was produced from transcription of
msg.audio_message you set is_voice: Some(true), and for plain/text paths set
is_voice: Some(false) (e.g. when using trimmed text or the non-audio branches).
Ensure this change is applied where ChannelMessage is constructed and sent so
downstream logic can distinguish transcribed voice messages.
src/channels/slack.rs (1)

641-653: ⚠️ Potential issue | 🟠 Major

Slack voice inputs still never reach the new voice-loop gate.

Both inbound paths hard-code is_voice: None, so downstream should_reply_as_audio() will never see a positive Slack voice/audio signal. With the current subtype/text filtering, Slack voice notes will always fall back to text despite this PR advertising Slack voice-loop support. Please plumb Slack audio/file-share detection into these ChannelMessage builders and add a regression test for a voice-note event.

Also applies to: 991-1003

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/channels/slack.rs` around lines 641 - 653, The ChannelMessage builders
(creating ChannelMessage with fields like id, sender, reply_target, content,
channel, timestamp, thread_ts via Self::inbound_thread_ts) currently hard-code
is_voice: None; update both builders (the shown block and the similar block
around the other occurrence) to detect Slack audio/file-share signals (e.g.,
event subtype file_share or presence of event.files with audio mime types or
audio filetypes) and set is_voice to Some(true) when such audio is present (else
Some(false) or None per existing enum expectations), so downstream
should_reply_as_audio() can see the flag; also add a regression test that
constructs a Slack voice-note event and asserts the produced ChannelMessage has
is_voice set appropriately and triggers the audio reply path.
src/channels/telegram.rs (1)

1925-1936: ⚠️ Potential issue | 🟠 Major

Set is_voice on the Telegram voice/audio path.

This constructor is the only one that represents an actual incoming Telegram voice/audio message, but it still emits is_voice: None. That means the new voice-loop gate cannot distinguish a transcribed voice note from plain text, so Telegram will never qualify for the audio-reply branch from this path.

🐛 Proposed fix
         Some(ChannelMessage {
             id: format!("telegram_{chat_id}_{message_id}"),
             sender: sender_identity,
             reply_target,
             content,
             channel: "telegram".to_string(),
             timestamp: std::time::SystemTime::now()
                 .duration_since(std::time::UNIX_EPOCH)
                 .unwrap_or_default()
                 .as_secs(),
             thread_ts: thread_id,
-            is_voice: None,
+            is_voice: Some(true),
         })
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/channels/telegram.rs` around lines 1925 - 1936, The ChannelMessage
constructed here for incoming Telegram voice/audio messages sets is_voice to
None so the voice-loop gate can't detect audio; update this constructor (the
ChannelMessage created in the Telegram voice/audio handling path) to set
is_voice: Some(true) instead of None so downstream logic can recognize and route
audio messages to the audio-reply branch; ensure the field is_voice on
ChannelMessage is populated only for this voice/audio path and leave other
constructors unchanged.
🧹 Nitpick comments (4)
src/onboard/tui.rs (1)

2272-2284: Expose Telegram voice-loop in the TUI, or explicitly point users to the manual config step.

Line 2283 always writes the new field with defaults, but this file has no onboarding field for it. As a result, TUI-generated Telegram configs can’t opt into voice-loop without a post-setup manual edit.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/onboard/tui.rs` around lines 2272 - 2284, The current onboarding TUI
always sets TelegramConfig.voice_loop to Default::default() when building
channels.telegram, but the TUI has no input for configuring voice_loop so users
cannot opt into it; update the onboarding flow to either (A) add a TUI
field/bool (e.g., telegram_voice_loop_enabled and any needed params) and
populate TelegramConfig.voice_loop from that input when constructing
channels.telegram, or (B) only set voice_loop if an existing onboarding value is
present (leave it None or omit the field) and update the TUI/README to point
users to the manual configuration step; locate the TelegramConfig construction
in the function that sets channels.telegram and wire it to the new onboarding
field (or conditional assignment) so voice_loop is user-configurable.
tests/channel_routing.rs (1)

90-110: Assert is_voice in the clone coverage.

This test no longer verifies “all fields” after adding ChannelMessage::is_voice, so a regression in the new voice-loop metadata would still pass here.

🧪 Suggested assertion
     assert_eq!(cloned.reply_target, original.reply_target);
     assert_eq!(cloned.content, original.content);
     assert_eq!(cloned.channel, original.channel);
     assert_eq!(cloned.timestamp, original.timestamp);
+    assert_eq!(cloned.is_voice, original.is_voice);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/channel_routing.rs` around lines 90 - 110, The test
channel_message_preserves_all_fields_on_clone no longer checks the newly added
ChannelMessage::is_voice (and optionally thread_ts), so update the test to
assert those fields are preserved after cloning: in the
channel_message_preserves_all_fields_on_clone function, set original.is_voice
(and original.thread_ts if relevant) to a non-None value and add
assert_eq!(cloned.is_voice, original.is_voice) (and assert_eq!(cloned.thread_ts,
original.thread_ts)) alongside the existing assertions to ensure clone()
preserves these fields.
src/channels/mod.rs (1)

43-43: Keep voice_loop private and re-export only the stable API.

pub mod voice_loop; is redundant once Line 75 re-exports the intended entry points, and it also makes every pub item in src/channels/voice_loop.rs part of the channels API surface. mod voice_loop; keeps placeholder/helper internals hidden until they need to be public.

Proposed change
-pub mod voice_loop;
+mod voice_loop;

As per coding guidelines "Deny-by-default for access and exposure boundaries; never silently broaden permissions/capabilities".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/channels/mod.rs` at line 43, Change the module declaration from a public
module to a private one by replacing "pub mod voice_loop;" with "mod
voice_loop;". Keep the existing top-level re-exports (the "pub use ...
voice_loop::{...}" entries) as the only public surface so helper items in
voice_loop remain hidden; if any item currently relied on being pub within
voice_loop, make it private and add an explicit pub re-export in the existing
"pub use self::voice_loop::{...}" list to expose only the stable API.
src/channels/voice_loop.rs (1)

64-73: Consider using character count instead of byte length for max_tts_length check.

response_text.len() returns byte length, not character count. For UTF-8 strings with multibyte characters (e.g., emoji, non-ASCII text), the byte count can significantly exceed the character count. TTS providers typically bill and limit based on character count.

This could cause valid short messages to be rejected when they contain multibyte characters.

♻️ Proposed fix using character count
 pub fn should_reply_as_audio(
     original_message: &ChannelMessage,
     voice_config: &VoiceLoopConfig,
     response_text: &str,
 ) -> bool {
     voice_config.enabled
         && original_message.is_voice.unwrap_or(false)
-        && response_text.len() <= voice_config.max_tts_length
+        && response_text.chars().count() <= voice_config.max_tts_length
 }

If byte-based limits are intentional, consider renaming the field to max_tts_bytes and updating the doc comment for clarity.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/channels/voice_loop.rs` around lines 64 - 73, The check in
should_reply_as_audio uses response_text.len() which returns byte length; change
it to compare the character count (e.g., response_text.chars().count()) against
voice_config.max_tts_length so multibyte characters (emoji/non-ASCII) are
measured correctly; update the use in should_reply_as_audio to use
chars().count() and, if you intended byte-based limits instead, rename
voice_config.max_tts_length to max_tts_bytes and update its doc comment to
clarify bytes vs characters.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/channels/mod.rs`:
- Line 75: The TTS length gate uses byte length rather than character count,
causing CJK/emoji-heavy responses to be treated as too long; update
should_reply_as_audio in voice_loop.rs to compare the character count of
response_text (e.g., response_text.chars().count()) against
VoiceLoopConfig::max_tts_length (or the max_tts_length field) instead of using
response_text.len(), ensuring the comparison uses characters/unicode scalars
rather than UTF-8 bytes.

In `@src/channels/whatsapp.rs`:
- Line 145: The ChannelMessage produced in the WhatsApp handler sets is_voice to
None, preventing the voice-loop from ever activating; update
parse_webhook_payload()/the WhatsApp branch that constructs ChannelMessage to
detect WhatsApp voice/audio messages (e.g., inspect the incoming message type
field or presence of an audio/voice URL/ID) and set is_voice: Some(true) for
those cases (and Some(false) for explicit text), ensuring non-text branches that
represent voice media populate ChannelMessage::is_voice so the voice-loop path
can trigger.

In `@src/config/schema.rs`:
- Around line 4961-4963: Update the documentation comment for the public field
voice_loop (type VoiceLoopConfig) to explicitly state the config contract: that
this key is optional and defaults to the crate::channels::voice_loop default
(opt-in/opt-out behavior), what behavior is expected when omitted
(backward-compatible/no-op), the compatibility and rollback guidance (how to
remove or revert the key safely), and the current temporary fallback (text
response only while TTS synthesis is stubbed) instead of claiming “auto-reply
with audio”; apply the same expanded doc-comment pattern to the other
VoiceLoopConfig occurrences noted elsewhere in this module so the schema exposes
defaults, compatibility impact, migration/rollback instructions, and the
intended text fallback until TTS is implemented.
- Around line 6752-6754: The calls to sync_directory are missing .await on Unix:
update the two call sites that reference sync_directory(marker_root) (the one
shown and the other around where marker_root is used) to use the cfg-split
pattern used elsewhere in this file and invoke sync_directory(...).await under
#[cfg(unix)] while keeping the synchronous no-op or blocking call under
#[cfg(not(unix))]; specifically modify the blocks that call sync_directory to
match the existing pattern used near the other sync_directory usages so the
async function (sync_directory) is awaited on Unix builds.

---

Outside diff comments:
In `@src/channels/slack.rs`:
- Around line 641-653: The ChannelMessage builders (creating ChannelMessage with
fields like id, sender, reply_target, content, channel, timestamp, thread_ts via
Self::inbound_thread_ts) currently hard-code is_voice: None; update both
builders (the shown block and the similar block around the other occurrence) to
detect Slack audio/file-share signals (e.g., event subtype file_share or
presence of event.files with audio mime types or audio filetypes) and set
is_voice to Some(true) when such audio is present (else Some(false) or None per
existing enum expectations), so downstream should_reply_as_audio() can see the
flag; also add a regression test that constructs a Slack voice-note event and
asserts the produced ChannelMessage has is_voice set appropriately and triggers
the audio reply path.

In `@src/channels/telegram.rs`:
- Around line 1925-1936: The ChannelMessage constructed here for incoming
Telegram voice/audio messages sets is_voice to None so the voice-loop gate can't
detect audio; update this constructor (the ChannelMessage created in the
Telegram voice/audio handling path) to set is_voice: Some(true) instead of None
so downstream logic can recognize and route audio messages to the audio-reply
branch; ensure the field is_voice on ChannelMessage is populated only for this
voice/audio path and leave other constructors unchanged.

In `@src/channels/whatsapp_web.rs`:
- Around line 593-683: The ChannelMessage sent from this handler always sets
is_voice: None, losing the fact that content may have come from
msg.audio_message; update the send logic in the block that builds ChannelMessage
so that when the content was produced from transcription of msg.audio_message
you set is_voice: Some(true), and for plain/text paths set is_voice: Some(false)
(e.g. when using trimmed text or the non-audio branches). Ensure this change is
applied where ChannelMessage is constructed and sent so downstream logic can
distinguish transcribed voice messages.

---

Nitpick comments:
In `@src/channels/mod.rs`:
- Line 43: Change the module declaration from a public module to a private one
by replacing "pub mod voice_loop;" with "mod voice_loop;". Keep the existing
top-level re-exports (the "pub use ... voice_loop::{...}" entries) as the only
public surface so helper items in voice_loop remain hidden; if any item
currently relied on being pub within voice_loop, make it private and add an
explicit pub re-export in the existing "pub use self::voice_loop::{...}" list to
expose only the stable API.

In `@src/channels/voice_loop.rs`:
- Around line 64-73: The check in should_reply_as_audio uses response_text.len()
which returns byte length; change it to compare the character count (e.g.,
response_text.chars().count()) against voice_config.max_tts_length so multibyte
characters (emoji/non-ASCII) are measured correctly; update the use in
should_reply_as_audio to use chars().count() and, if you intended byte-based
limits instead, rename voice_config.max_tts_length to max_tts_bytes and update
its doc comment to clarify bytes vs characters.

In `@src/onboard/tui.rs`:
- Around line 2272-2284: The current onboarding TUI always sets
TelegramConfig.voice_loop to Default::default() when building channels.telegram,
but the TUI has no input for configuring voice_loop so users cannot opt into it;
update the onboarding flow to either (A) add a TUI field/bool (e.g.,
telegram_voice_loop_enabled and any needed params) and populate
TelegramConfig.voice_loop from that input when constructing channels.telegram,
or (B) only set voice_loop if an existing onboarding value is present (leave it
None or omit the field) and update the TUI/README to point users to the manual
configuration step; locate the TelegramConfig construction in the function that
sets channels.telegram and wire it to the new onboarding field (or conditional
assignment) so voice_loop is user-configurable.

In `@tests/channel_routing.rs`:
- Around line 90-110: The test channel_message_preserves_all_fields_on_clone no
longer checks the newly added ChannelMessage::is_voice (and optionally
thread_ts), so update the test to assert those fields are preserved after
cloning: in the channel_message_preserves_all_fields_on_clone function, set
original.is_voice (and original.thread_ts if relevant) to a non-None value and
add assert_eq!(cloned.is_voice, original.is_voice) (and
assert_eq!(cloned.thread_ts, original.thread_ts)) alongside the existing
assertions to ensure clone() preserves these fields.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: fb112ca8-9fde-41e2-bd43-baa471662b73

📥 Commits

Reviewing files that changed from the base of the PR and between 326b60d and 871008c.

📒 Files selected for processing (35)
  • src/channels/bluebubbles.rs
  • src/channels/cli.rs
  • src/channels/dingtalk.rs
  • src/channels/discord.rs
  • src/channels/email_channel.rs
  • src/channels/github.rs
  • src/channels/imessage.rs
  • src/channels/irc.rs
  • src/channels/lark.rs
  • src/channels/linq.rs
  • src/channels/matrix.rs
  • src/channels/mattermost.rs
  • src/channels/mod.rs
  • src/channels/napcat.rs
  • src/channels/nextcloud_talk.rs
  • src/channels/nostr.rs
  • src/channels/qq.rs
  • src/channels/signal.rs
  • src/channels/slack.rs
  • src/channels/telegram.rs
  • src/channels/traits.rs
  • src/channels/voice_loop.rs
  • src/channels/wati.rs
  • src/channels/whatsapp.rs
  • src/channels/whatsapp_web.rs
  • src/config/mod.rs
  • src/config/schema.rs
  • src/cron/scheduler.rs
  • src/daemon/mod.rs
  • src/gateway/mod.rs
  • src/integrations/registry.rs
  • src/migration.rs
  • src/onboard/tui.rs
  • src/onboard/wizard.rs
  • tests/channel_routing.rs

Comment thread src/channels/mod.rs
Comment thread src/channels/whatsapp.rs
channel: "whatsapp".to_string(),
timestamp,
thread_ts: None,
is_voice: None,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Populate is_voice for WhatsApp voice inputs, or this channel won't enter voice-loop mode.

parse_webhook_payload() only emits ChannelMessage values from the text branch and skips every non-text payload earlier in the function, so this field is effectively always None here. That means the new voice-loop path can never be activated for Cloud API WhatsApp messages.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/channels/whatsapp.rs` at line 145, The ChannelMessage produced in the
WhatsApp handler sets is_voice to None, preventing the voice-loop from ever
activating; update parse_webhook_payload()/the WhatsApp branch that constructs
ChannelMessage to detect WhatsApp voice/audio messages (e.g., inspect the
incoming message type field or presence of an audio/voice URL/ID) and set
is_voice: Some(true) for those cases (and Some(false) for explicit text),
ensuring non-text branches that represent voice media populate
ChannelMessage::is_voice so the voice-loop path can trigger.

Comment thread src/config/schema.rs Outdated
Comment thread src/config/schema.rs Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/channels/voice_loop.rs (1)

154-167: Minor: Test comment references "bytes" but implementation counts characters.

The comment "this text is longer than ten bytes" is slightly misleading since should_reply_as_audio uses chars().count() for character-based length checking. Consider updating for clarity.

📝 Suggested comment fix
     #[test]
     fn should_not_reply_when_text_exceeds_max_length() {
         let config = VoiceLoopConfig {
             enabled: true,
             max_tts_length: 10,
             ..Default::default()
         };
         let msg = make_message(Some(true));
-        assert!(!should_reply_as_audio(
-            &msg,
-            &config,
-            "this text is longer than ten bytes"
-        ));
+        // 35 characters, exceeds max_tts_length of 10
+        assert!(!should_reply_as_audio(
+            &msg,
+            &config,
+            "this text is longer than ten chars"
+        ));
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/channels/voice_loop.rs` around lines 154 - 167, The test's message text
and/or comment is misleading because should_reply_as_audio uses character count
(chars().count()) not byte length; update the test in
should_not_reply_when_text_exceeds_max_length to use wording like "this text is
longer than ten characters" or adjust the message string to demonstrate
character length behavior, referencing VoiceLoopConfig.max_tts_length and the
should_reply_as_audio function so the test clearly reflects character-based
length checking.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/config/schema.rs`:
- Around line 4961-4965: Update the doc comment for the public field voice_loop
(crate::channels::voice_loop::VoiceLoopConfig) to explicitly state that not
every voice message will become an audio reply even when wired: responses longer
than voice_loop.max_tts_length will continue to fall back to text, so users
should expect that runtime TTS length gating; apply the same clarifying phrasing
to the other voice_loop documentation blocks referenced in the file to keep the
public config schema contract exact.

---

Nitpick comments:
In `@src/channels/voice_loop.rs`:
- Around line 154-167: The test's message text and/or comment is misleading
because should_reply_as_audio uses character count (chars().count()) not byte
length; update the test in should_not_reply_when_text_exceeds_max_length to use
wording like "this text is longer than ten characters" or adjust the message
string to demonstrate character length behavior, referencing
VoiceLoopConfig.max_tts_length and the should_reply_as_audio function so the
test clearly reflects character-based length checking.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a7ffc7dd-2794-4070-8297-2f1102892fce

📥 Commits

Reviewing files that changed from the base of the PR and between 871008c and 7f535f2.

📒 Files selected for processing (2)
  • src/channels/voice_loop.rs
  • src/config/schema.rs

Comment thread src/config/schema.rs Outdated
Comment on lines +4961 to +4965
/// Voice-loop configuration. Disabled by default (opt-in). When enabled and
/// TTS is wired, voice messages receive audio replies; until then responses
/// fall back to text. Safe to omit or remove — no migration needed.
#[serde(default)]
pub voice_loop: crate::channels::voice_loop::VoiceLoopConfig,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Tighten the voice_loop contract to cover the remaining text-fallback case.

These docs are much better now, but they still imply every voice message becomes an audio reply once the feature is wired. The runtime gate also depends on voice_loop.max_tts_length, so longer replies will continue to fall back to text. Calling that out here would keep the public schema contract exact.

✏️ Suggested wording
-    /// Voice-loop configuration. Disabled by default (opt-in). When enabled and
-    /// TTS is wired, voice messages receive audio replies; until then responses
-    /// fall back to text. Safe to omit or remove — no migration needed.
+    /// Voice-loop configuration. Disabled by default (opt-in). When enabled,
+    /// eligible voice messages may receive audio replies once audio reply
+    /// support is available; replies over `voice_loop.max_tts_length` still
+    /// fall back to text. Today all responses fall back to text. Safe to omit
+    /// or remove — no migration needed.

Based on learnings, config/schema changes are a public API contract and intentional fallback behavior should be documented.

Also applies to: 5062-5066, 5273-5277

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/config/schema.rs` around lines 4961 - 4965, Update the doc comment for
the public field voice_loop (crate::channels::voice_loop::VoiceLoopConfig) to
explicitly state that not every voice message will become an audio reply even
when wired: responses longer than voice_loop.max_tts_length will continue to
fall back to text, so users should expect that runtime TTS length gating; apply
the same clarifying phrasing to the other voice_loop documentation blocks
referenced in the file to keep the public config schema contract exact.

@rikitrader
Copy link
Copy Markdown

Thank you for the contribution! We appreciate the effort.

We are closing this PR because it introduces a standalone module with no integration into the existing codebase — the new types/functions are not called by any existing code path. ZeroClaw follows a trait-driven architecture where new features must be wired through factory registration and have at least one active caller.

To reopen, please:

  1. Wire the feature into an existing subsystem (channel, provider, tool factory, or agent loop)
  2. Add integration tests that exercise the end-to-end flow
  3. Keep PR scope focused — one feature per PR, ideally under 500 lines

We welcome smaller, integrated contributions. See CLAUDE.md §7 for playbooks on adding providers, channels, tools, and peripherals.

@rikitrader
Copy link
Copy Markdown

Review: This PR introduces a standalone module that is not wired into any existing code path. The new types/functions have no callers in the codebase.

To make this mergeable:

  1. Wire into an existing subsystem (channel, provider, tool factory, or agent loop)
  2. Add integration tests exercising the end-to-end flow
  3. Keep scope under 500 lines per PR

See CLAUDE.md §7 for playbooks. Recommend closing and resubmitting as smaller, integrated PRs.

@rareba rareba force-pushed the feature/voice-loop branch from 7f535f2 to 2639674 Compare March 9, 2026 08:42
@rareba
Copy link
Copy Markdown
Contributor Author

rareba commented Mar 9, 2026

Rebased onto current master as a clean, focused single commit.

Changes from previous version:

  • Removed the 200K-line mega-patch (was based on divergent dev branch)
  • Extracted only the voice-loop feature files and cleanly wired them into master's existing factory/config architecture
  • Added VoiceLoopConfig to config schema with #[serde(default)]
  • Added is_voice field to ChannelMessage and SendMessage in channel traits
  • Wired voice-loop channel support into src/channels/voice_loop.rs
  • cargo check passes cleanly against current master

This is now a focused, reviewable PR that integrates properly with the existing codebase rather than being a standalone mega-patch.

@rareba rareba force-pushed the feature/voice-loop branch from 8b8a64a to 23ea9c5 Compare March 15, 2026 15:50
Add is_voice: None to all ChannelMessage constructors in slack.rs,
channels/mod.rs tests, daemon/mod.rs tests, and integration tests.
Add voice_loop field to TelegramConfig constructors in channels/mod.rs
and daemon/mod.rs. Fix clippy::large_futures in cron/scheduler.rs and
tools/cron_run.rs with Box::pin().
@rareba rareba force-pushed the feature/voice-loop branch from 23ea9c5 to 67e7e64 Compare March 15, 2026 17:25
@rareba
Copy link
Copy Markdown
Contributor Author

rareba commented Mar 15, 2026

Superseded: reopening from feat/voice-loop branch (corrected prefix per CONTRIBUTING.md).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants