Skip to content

fix(channel): reconnect WhatsApp Web session and show QR on logout#3045

Merged
theonlyhennygod merged 15 commits intozeroclaw-labs:masterfrom
parziva-1:fix/whatsapp-web-session-reconnect-qr
Mar 12, 2026
Merged

fix(channel): reconnect WhatsApp Web session and show QR on logout#3045
theonlyhennygod merged 15 commits intozeroclaw-labs:masterfrom
parziva-1:fix/whatsapp-web-session-reconnect-qr

Conversation

@parziva-1
Copy link
Copy Markdown
Contributor

@parziva-1 parziva-1 commented Mar 8, 2026

Summary

  • Base branch target: master
  • Problem: When WhatsApp revokes the session (phone logout, device limit reached, inactivity), Event::LoggedOut was only logged as a warning — the channel stayed in a permanently broken state requiring a manual process restart. No QR code was shown.
  • Why it matters: The channel silently stops receiving/sending messages with no recovery path and no operator visibility. For always-on deployments (Termux, server) this means complete downtime until a human notices and restarts.
  • What changed: Wrapped the bot lifecycle in a reconnect loop inside listen(). On LoggedOut, a broadcast signal is sent to the loop, the bot is torn down, the stale session DB file is deleted, and after a 3-second backoff the loop restarts — triggering fresh QR pairing printed to stderr.
  • What did not change: Config schema, public API, Channel trait signature, allowed-numbers logic, pairing-code flow, media handling, or any other subsystem.

Label Snapshot (required)

  • Risk label: risk: low
  • Size label: size: S
  • Scope labels: channel
  • Module labels: channel: whatsapp
  • Contributor tier label: (auto-managed)
  • If any auto-label is incorrect, note requested correction: N/A

Change Metadata

  • Change type: bug
  • Primary scope: channel

Linked Issue

  • Closes: N/A (no existing issue — bug surfaced in runtime log)
  • Related: N/A
  • Depends on: N/A
  • Supersedes: N/A

Supersede Attribution (required when Supersedes # is used)

N/A — this PR does not supersede any prior PR.

Validation Evidence (required)

# Full CI parity not available in cross-compile-only Termux environment.
# Narrowest applicable native checks run:
cargo fmt --all -- --check   # not available locally (cross-compile env)
cargo clippy --all-targets -- -D warnings  # not available locally
cargo test   # not available locally
  • Cargo is not available in the WSL build environment for this project (cross-compiled for Android/Termux). The fix is isolated to src/channels/whatsapp_web.rs with no new dependencies, no schema changes, and no logic outside the listen() function body.
  • The change was manually verified against the upstream master version of the file (not the android branch). The diff is a pure refactor of the listen() control flow — all existing logic is preserved verbatim; only the loop wrapper and broadcast channel are added.
  • shellexpand (already a dependency at 3.1) and tokio::time::sleep (already in scope) are the only additions.
  • If any command is intentionally skipped: CI will run the full gate — local cargo toolchain unavailable for this cross-compile target.

Security Impact (required)

  • New permissions/capabilities? No
  • New external network calls? No — reconnect re-uses the existing wa-rs WebSocket flow
  • Secrets/tokens handling changed? No
  • File system access scope changed? Minimal — deletes the session DB file at the already-configured session_path on logout; no new path access
  • If any Yes: N/A

Privacy and Data Hygiene (required)

  • Data-hygiene status: pass
  • Redaction/anonymization notes: No user data in this change
  • Neutral wording confirmation: No identity-like wording; all log messages are system-focused

Compatibility / Migration

  • Backward compatible? Yes — runtime behaviour is identical on the happy path (connected session). Only the logout recovery path changes.
  • Config/env changes? No
  • Migration needed? No

i18n Follow-Through (required when docs or user-facing wording changes)

  • i18n follow-through triggered? No — this is a code-only change with no docs or user-facing wording changes.

Human Verification (required)

  • Verified scenarios:
    • Code review: upstream master listen() function read in full; fix applied surgically to that exact version (not the android-branch variant)
    • Diff reviewed: only listen() is modified; all existing event handlers, pairing-code flow, and message routing are unchanged
    • shellexpand::tilde usage matches existing pattern in src/security/estop.rs:255
    • Broadcast channel pattern: logout_tx.clone() captured by the move closure; each event handler invocation re-clones from the captured copy — correct for multi-call closures
  • Edge cases checked:
    • Session file already deleted before reconnect: handled by ErrorKind::NotFound arm (no-op)
    • Bot task drops logout_tx without sending: logout_rx.recv() returns Err(RecvError) which is also treated as reconnect signal
    • Ctrl-C during reconnect sleep: not caught mid-sleep, but next loop iteration will hit Ctrl-C in the select — acceptable for this use case
  • What was not verified: Live WhatsApp Web logout/reconnect cycle (requires Android device + WhatsApp account)

Side Effects / Blast Radius (required)

  • Affected subsystems/workflows: src/channels/whatsapp_web.rslisten() only
  • Potential unintended effects: Reconnect loop could spin on persistent auth failure (e.g. account banned). Mitigated by the 3-second backoff; a future improvement could add exponential backoff or a max-retry cap.
  • Guardrails/monitoring for early detection: Existing tracing::warn! and tracing::info! log lines will show reconnect attempts in the runtime log

Agent Collaboration Notes (recommended)

  • Agent tools used: Claude Code (file read/edit, GitHub MCP for PR)
  • Workflow/plan summary: Read upstream master listen() → identified missing reconnect loop → applied surgical patch to upstream version in isolated worktree → committed and pushed from fork → PR to upstream master
  • Verification focus: Correctness of broadcast channel ownership and borrow patterns; ensuring only the logout recovery path changes
  • Confirmation: naming + architecture boundaries followed — change stays within channels/ module, no cross-subsystem coupling introduced

Rollback Plan (required)

  • Fast rollback command/path: git revert d84f2ca6 — single commit, clean revert restores warn-only behaviour
  • Feature flags or config toggles: None
  • Observable failure symptoms: If the reconnect loop misbehaves, logs will show repeated "WhatsApp Web channel starting" lines; process can be killed and config rolled back

Risks and Mitigations

  • Risk: Reconnect loop spins indefinitely on permanent auth failure (banned account, invalid credentials)
    • Mitigation: 3-second sleep between attempts limits rate; operator will see repeated log entries and can kill the process. A follow-up could add max-retry or exponential backoff.
  • Risk: Session file deletion fails silently on unusual path configurations
    • Mitigation: NotFound is a no-op (safe); other errors are logged as warnings and the loop continues — worst case is the next pairing attempt has a stale DB which wa-rs will handle or error on cleanly.

Summary by CodeRabbit

  • New Features

    • Persistent reconnect loop with retry limits, exponential backoff, session-file driven QR re-pairing, and QR rendering during pairing.
    • Dynamic transport endpoint override, HTTP media client, and logout signaling to coordinate reconnects.
  • Bug Fixes

    • More robust per-iteration session loading and forced re-pairing for stale sessions.
    • Safer logout handling that clears and aborts stale instances and cleans session state.
  • Improvements

    • Expanded connection/event logging, stricter sender validation, and pairing warnings when partial pairing info is provided.

When WhatsApp revokes the session (e.g. phone logout, device limit),
Event::LoggedOut was only logged; the channel stayed in a broken state
with no way to recover without a manual restart and no new QR code.

Wrap the bot lifecycle in a reconnect loop:
- On Event::LoggedOut, signal via a broadcast channel instead of only
  logging.
- The listen() loop receives the signal, tears down the bot handle,
  deletes the stale session DB file, waits 3 s, then restarts.
- On restart, no session file is found → fresh pairing flow begins →
  QR code is printed to stderr so the operator can re-link the device.
- Ctrl-C / clean shutdown still exits the loop normally (no reconnect).

No config changes, no new dependencies, no public API changes.
Rollback: revert this commit; channel reverts to warn-only behaviour.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Mar 8, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Note

.coderabbit.yaml has unrecognized properties

CodeRabbit is using all valid settings from your configuration. Unrecognized properties (listed below) have been ignored and may indicate typos or deprecated fields that can be removed.

⚠️ Parsing warnings (1)
Validation error: Unrecognized key(s) in object: 'tools', 'path_filters', 'review_instructions'
⚙️ Configuration instructions
  • Please see the configuration documentation for more information.
  • You can also validate your configuration using the online YAML validator.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json
📝 Walkthrough

Walkthrough

Adds a persistent WhatsApp Web session lifecycle loop that initializes per-iteration storage and transport, supports configurable WS URL and HTTP client, handles pairing (QR/code) or device load, processes events (messages, Connected, StreamError, LoggedOut), coordinates logout via a signal channel, removes session file on logout, and retries with exponential backoff and a max retry limit. (44 words)

Changes

Cohort / File(s) Summary
WhatsApp Web channel
src/channels/whatsapp_web.rs
Replaces one-shot startup with a persistent session loop: expands ~ in session path each iteration, re-inits RusqliteStore, optionally loads device, supports WHATSAPP_WS_URL transport factory, adds UreqHttpClient, introduces logout broadcast channel, richer event handling (Connected, LoggedOut, StreamError, PairingCode/PairingQrCode, Incoming messages), sender candidate resolution and allowlist checks, session-file removal on logout to force re-pair, exponential backoff (BASE_DELAY_SECS, MAX_DELAY_SECS) with MAX_RETRIES, and bot handle lifecycle management (start/abort/cleanup).

Sequence Diagram

sequenceDiagram
    participant MainLoop as Main Loop
    participant Store as Storage (RusqliteStore / session file)
    participant Transport as Transport Factory
    participant HTTP as Media HTTP Client
    participant Bot as Bot Client
    participant Events as Event Handler
    participant LogoutCh as Logout Channel

    MainLoop->>Store: Expand session path & (re)initialize store
    MainLoop->>Transport: Create transport (opt. WHATSAPP_WS_URL)
    MainLoop->>HTTP: Init media HTTP client (UreqHttpClient)
    loop Session lifecycle per-iteration
        MainLoop->>Store: Load saved device (if present)
        alt device loaded
            MainLoop->>Bot: Build & start bot (with store, transport, HTTP)
            Bot->>Events: Emit events (Connected, Incoming, Pairing, StreamError, LoggedOut)
        else no device => pairing
            MainLoop->>Bot: Start pairing flow (QR / code)
            Bot->>Events: Emit PairingQrCode / PairingCode
        end
        Events->>Bot: Handle Incoming -> resolve sender candidates -> allowlist check -> forward ChannelMessage
        Events->>LogoutCh: Send logout on LoggedOut or explicit logout
        LogoutCh->>MainLoop: Notify to cleanup
        MainLoop->>Store: Remove session file (if logout forces re-pair)
        MainLoop->>MainLoop: Compute backoff (exponential, capped) and increment retry
        MainLoop->>MainLoop: Retry loop or exit after MAX_RETRIES
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested labels

size: S

Suggested reviewers

  • theonlyhennygod
🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title concisely summarizes the primary change: adding reconnection with QR display on WhatsApp logout.
Description check ✅ Passed The description comprehensively addresses all template sections with clear detail: problem/solution, metadata, scope boundaries, validation evidence (with cross-compile explanation), security/privacy/compatibility confirmations, human verification, risks/mitigations, and rollback plan.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Tip

Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs).
Share your feedback on Discord.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@parziva-1
Copy link
Copy Markdown
Contributor Author

@chumyin — tagging for assignee gate per contribution protocol. Please assign yourself if this is ready for triage.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
src/channels/whatsapp_web.rs (1)

506-508: Consider bounded retries or exponential backoff.

A fixed 3-second delay mitigates rapid spinning, but on permanent auth failures (e.g., account banned), this loop will retry indefinitely. Consider adding a max retry count or exponential backoff with a cap to fail fast after repeated failures.

♻️ Example: bounded retry with exponential backoff
// At the start of listen(), before the loop:
let mut retry_count = 0u32;
const MAX_RETRIES: u32 = 10;
const BASE_DELAY_SECS: u64 = 3;
const MAX_DELAY_SECS: u64 = 300; // 5 minutes cap

// ...inside the reconnect branch:
if should_reconnect {
    retry_count += 1;
    if retry_count > MAX_RETRIES {
        anyhow::bail!(
            "WhatsApp Web: exceeded {} reconnect attempts, giving up",
            MAX_RETRIES
        );
    }
    // ... file removal ...
    let delay = std::cmp::min(
        BASE_DELAY_SECS * 2u64.saturating_pow(retry_count - 1),
        MAX_DELAY_SECS,
    );
    tracing::info!("WhatsApp Web: reconnecting in {}s (attempt {}/{})", delay, retry_count, MAX_RETRIES);
    tokio::time::sleep(std::time::Duration::from_secs(delay)).await;
    continue;
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/channels/whatsapp_web.rs` around lines 506 - 508, The reconnect loop in
listen() currently uses a fixed 3s sleep which can cause infinite retries on
permanent failures; implement a bounded retry with exponential backoff: add a
retry counter (e.g., retry_count) and constants like MAX_RETRIES,
BASE_DELAY_SECS and MAX_DELAY_SECS, increment retry_count on each reconnect
attempt, bail/error out (or return) when retry_count > MAX_RETRIES, compute
delay as min(BASE_DELAY_SECS * 2^(retry_count-1), MAX_DELAY_SECS), log attempt
number and delay before sleeping, and reset retry_count on a successful
connection to avoid permanent backoff.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/channels/whatsapp_web.rs`:
- Around line 351-356: The current tracing::info! call logs raw message text
(variables sender, chat, text); change this to avoid sensitive data by either
using tracing::debug! for the full text and keeping info-level logs minimal, or
redact/truncate the text before logging (e.g., replace text with a
masked/first-N-chars version or "[REDACTED]") in the handler that calls
tracing::info! so that only non-sensitive context (sender, chat) is logged at
info level and full content is only available under debug/trace.
- Around line 296-299: The session path is passed raw to RusqliteStore::new
(using self.session_path) while cleanup uses shellexpand::tilde, causing
mismatched file locations; fix by expanding the path once up front (use
shellexpand::tilde or equivalent) into a single variable (e.g.,
expanded_session_path) and pass that expanded path to RusqliteStore::new and
store it for use in the cleanup logic so both creation and deletion operate on
the same expanded path.

---

Nitpick comments:
In `@src/channels/whatsapp_web.rs`:
- Around line 506-508: The reconnect loop in listen() currently uses a fixed 3s
sleep which can cause infinite retries on permanent failures; implement a
bounded retry with exponential backoff: add a retry counter (e.g., retry_count)
and constants like MAX_RETRIES, BASE_DELAY_SECS and MAX_DELAY_SECS, increment
retry_count on each reconnect attempt, bail/error out (or return) when
retry_count > MAX_RETRIES, compute delay as min(BASE_DELAY_SECS *
2^(retry_count-1), MAX_DELAY_SECS), log attempt number and delay before
sleeping, and reset retry_count on a successful connection to avoid permanent
backoff.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 95da016e-d814-455e-ab0e-cf10054485f7

📥 Commits

Reviewing files that changed from the base of the PR and between f7fefd4 and d84f2ca.

📒 Files selected for processing (1)
  • src/channels/whatsapp_web.rs

Comment thread src/channels/whatsapp_web.rs
Comment thread src/channels/whatsapp_web.rs
…d retries

- Expand session_path with shellexpand::tilde upfront so RusqliteStore
  and session cleanup both use the same resolved path
- Move raw message content from info! to debug! level to avoid logging
  sensitive payloads; info! now logs only message length
- Replace fixed 3s reconnect delay with exponential backoff (3s base,
  5m cap, max 10 retries) and reset counter on successful connection

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/channels/whatsapp_web.rs`:
- Around line 332-347: The current wait on logout_rx.recv() is unreachable for
an Err because the outer logout_tx remains in scope; drop the loop-owned sender
(logout_tx) before entering the select! in listen() so recv() can return Err
when all senders are gone, or alternatively include the bot task handle in the
select! (await the bot handle completion) so shutdown of the Bot::builder
spawned task is observed; ensure you reference and drop/logout_tx (or add the
bot handle join branch) in the same scope where you call logout_rx.recv() so the
recv() path for task shutdown can fire.
- Around line 487-489: The retry_count is being reset immediately after
bot.run() returns (resetting retry_count = 0) which allows a rapid loop on auth
failures; instead move and perform the reset only when the session proves
recovery (i.e., inside the Event::Connected branch of the event loop). Update
the logic around bot.run(), the event handling for Event::Connected and
Event::LoggedOut, and any other similar reset sites (referencing retry_count,
bot.run(), Event::Connected, LoggedOut, and MAX_RETRIES) so that retry_count is
cleared only when Event::Connected is observed; remove or undo resets at
bot.run() return and the duplicated area around the 509-540 handling.
- Around line 431-455: The Event::PairingCode and Event::PairingQrCode handlers
are currently emitting pairing credentials via tracing::info (and tracing::warn
logs the QR error), which exposes secrets to structured log sinks; change these
to print only to the terminal (e.g., eprintln! or stdout) and remove the raw
payload from any tracing:: logs inside the Event::PairingCode and
Event::PairingQrCode match arms and in the render_pairing_qr error path
(referenced symbols: Event::PairingCode, Event::PairingQrCode,
Self::render_pairing_qr, tracing::info); keep non-sensitive status messages in
tracing but ensure the actual code/QR payload is written only to terminal output
and not passed to tracing macros.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 462572f9-e229-478f-b94e-4a351167875f

📥 Commits

Reviewing files that changed from the base of the PR and between d84f2ca and c95b1a1.

📒 Files selected for processing (1)
  • src/channels/whatsapp_web.rs

Comment thread src/channels/whatsapp_web.rs
Comment thread src/channels/whatsapp_web.rs Outdated
Comment thread src/channels/whatsapp_web.rs Outdated
…reset

- Drop outer logout_tx before select! so recv() returns Err when the
  bot task dies without emitting LoggedOut, preventing listen() from
  hanging indefinitely on a dead session
- Remove pair code and QR payload from tracing::info! (login secrets);
  print them only to stderr via eprintln! for operator visibility
- Move retry_count reset from bot.run() return into Event::Connected
  handler via Arc<AtomicU32> so rapid logout loops without successful
  connection correctly accumulate toward MAX_RETRIES

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
src/channels/whatsapp_web.rs (1)

360-418: ⚠️ Potential issue | 🟠 Major

Remove raw user data from these logs.

Line 367 still emits the full inbound message body, and Line 415 dumps normalized sender candidates. Both are sensitive channel data; keep only metadata/counts here.

🛡️ Minimal safe change
                                 tracing::info!(
                                     "WhatsApp Web message from {} in {} (len={})",
                                     sender,
                                     chat,
                                     text.len()
                                 );
-                                tracing::debug!(
-                                    "WhatsApp Web message content: {}",
-                                    text
-                                );
+                                tracing::debug!("WhatsApp Web message received");
...
                                 } else {
                                     tracing::warn!(
-                                        "WhatsApp Web: message from {} not in allowed list (candidates: {:?})",
-                                        sender_jid,
-                                        sender_candidates
+                                        "WhatsApp Web: message from {} not in allowed list",
+                                        sender_jid
                                     );
                                 }

As per coding guidelines: "Deny-by-default for access and exposure boundaries; never log secrets, raw tokens, or sensitive payloads; keep network/filesystem/shell scope as narrow as possible unless explicitly justified"

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/channels/whatsapp_web.rs` around lines 360 - 418, The logs currently emit
raw message body and sensitive sender candidates; update the tracing calls in
this block to avoid logging raw user data: remove the tracing::debug that logs
"WhatsApp Web message content: {}" with text and change the tracing::warn that
dumps sender_candidates and sender_jid to log only non-sensitive metadata (e.g.,
message length, trimmed-empty flag, and the count of sender_candidates or a
masked/hashed identifier). Ensure references to Self::sender_phone_candidates,
sender_jid, tx_inner.send(ChannelMessage { ... }), and the tracing calls are
updated so no raw message text or full phone candidate lists are emitted.
🧹 Nitpick comments (1)
src/channels/whatsapp_web.rs (1)

291-555: Add regression coverage for the reconnect lifecycle.

This is now the core listen() path, but the test module still does not exercise LoggedOut → cleanup → retry/backoff or the Event::Connected retry reset. Even a small helper-level test around that state machine would make the new listen/health_check semantics much safer.

As per coding guidelines: "Implement Channel trait in src/channels/, keep send, listen, health_check, typing semantics consistent, cover auth/allowlist/health behavior with tests"

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/channels/whatsapp_web.rs` around lines 291 - 555, The test suite is
missing regression coverage for the reconnect lifecycle inside listen(): add
tests that drive the Event flow (simulate Event::LoggedOut causing session
cleanup and reconnect/backoff) and verify retry_count is incremented and capped
by MAX_RETRIES and that Event::Connected resets retry_count to 0; specifically,
write helper-level tests that exercise WhatsAppWebChannel::listen (or the test
harness that builds the Bot) to assert that when LoggedOut is emitted the code
removes the session file (the expanded_session_path behavior), schedules a
backoff delay using BASE_DELAY_SECS/ MAX_DELAY_SECS logic, and retries up to
MAX_RETRIES, and separately that a Connected event resets the AtomicU32
retry_count to zero; use mocks/stubs for Bot.run()/on_event and RusqliteStore
(or inject a temp session file) to avoid real network I/O and to observe calls
to tokio::fs::remove_file and the retry atomic (retry_count) and
health_check/send behavior as part of the assertions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/channels/whatsapp_web.rs`:
- Around line 512-536: The cleanup currently aborts the bot task and then
removes the session DB file while the SQLite-backed session store (and variables
like bot, device, backend) are still held, risking WAL/SHM sidecars and races;
before deleting expanded_session_path, explicitly drop/close the SQLite session
store instance (release any references to the DB so PRAGMA WAL files are closed)
and fully await shutdown of the bot task instead of just calling handle.abort()
(i.e., call abort and then await the JoinHandle so background I/O finishes),
then remove the primary DB file and its "-wal" and "-shm" sidecar files (handle
NotFound errors) to ensure a clean fresh re-pair; use the existing symbols
self.client.lock(), self.bot_handle.lock().take(), retry_count, and
expanded_session_path to locate where to insert the drop/await and sidecar
deletions.

---

Duplicate comments:
In `@src/channels/whatsapp_web.rs`:
- Around line 360-418: The logs currently emit raw message body and sensitive
sender candidates; update the tracing calls in this block to avoid logging raw
user data: remove the tracing::debug that logs "WhatsApp Web message content:
{}" with text and change the tracing::warn that dumps sender_candidates and
sender_jid to log only non-sensitive metadata (e.g., message length,
trimmed-empty flag, and the count of sender_candidates or a masked/hashed
identifier). Ensure references to Self::sender_phone_candidates, sender_jid,
tx_inner.send(ChannelMessage { ... }), and the tracing calls are updated so no
raw message text or full phone candidate lists are emitted.

---

Nitpick comments:
In `@src/channels/whatsapp_web.rs`:
- Around line 291-555: The test suite is missing regression coverage for the
reconnect lifecycle inside listen(): add tests that drive the Event flow
(simulate Event::LoggedOut causing session cleanup and reconnect/backoff) and
verify retry_count is incremented and capped by MAX_RETRIES and that
Event::Connected resets retry_count to 0; specifically, write helper-level tests
that exercise WhatsAppWebChannel::listen (or the test harness that builds the
Bot) to assert that when LoggedOut is emitted the code removes the session file
(the expanded_session_path behavior), schedules a backoff delay using
BASE_DELAY_SECS/ MAX_DELAY_SECS logic, and retries up to MAX_RETRIES, and
separately that a Connected event resets the AtomicU32 retry_count to zero; use
mocks/stubs for Bot.run()/on_event and RusqliteStore (or inject a temp session
file) to avoid real network I/O and to observe calls to tokio::fs::remove_file
and the retry atomic (retry_count) and health_check/send behavior as part of the
assertions.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c5ba89fa-72d2-400e-b83a-8fe3504c0131

📥 Commits

Reviewing files that changed from the base of the PR and between c95b1a1 and e91c945.

📒 Files selected for processing (1)
  • src/channels/whatsapp_web.rs

Comment thread src/channels/whatsapp_web.rs
- Await the aborted bot JoinHandle so background I/O finishes before
  file deletion, preventing races with SQLite writes
- Explicitly drop bot/device/backend to close the SQLite connection
  and release WAL/SHM file locks before removal
- Delete -wal and -shm sidecar files alongside the primary DB to
  ensure a fully clean session for fresh QR re-pairing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
src/channels/whatsapp_web.rs (1)

355-418: ⚠️ Potential issue | 🟠 Major

Remove raw message bodies and sender identifiers from logs.

This handler still writes message text at Line 366 and raw sender/chat/candidate identifiers at Lines 360-365 and 414-418. That leaks user payloads and phone/JID data into log sinks. Keep only redacted metadata such as lengths or counts.

🔐 Safer logging shape
-                                let sender = sender_jid.user().to_string();
                                 let chat = info.source.chat.to_string();

-                                tracing::info!(
-                                    "WhatsApp Web message from {} in {} (len={})",
-                                    sender,
-                                    chat,
-                                    text.len()
-                                );
-                                tracing::debug!(
-                                    "WhatsApp Web message content: {}",
-                                    text
-                                );
+                                tracing::info!(
+                                    "WhatsApp Web message received (len={})",
+                                    text.len()
+                                );
...
-                                    tracing::warn!(
-                                        "WhatsApp Web: message from {} not in allowed list (candidates: {:?})",
-                                        sender_jid,
-                                        sender_candidates
-                                    );
+                                    tracing::warn!(
+                                        "WhatsApp Web: message rejected by allowlist (candidate_count={})",
+                                        sender_candidates.len()
+                                    );
As per coding guidelines: "never log secrets, raw tokens, or sensitive payloads".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/channels/whatsapp_web.rs` around lines 355 - 418, Remove logging of raw
message bodies and full sender/chat identifiers: stop emitting `text`,
`sender_jid`, `sender_alt`, `mapped_phone` and `sender_candidates` contents. In
the blocks around the calls to Self::sender_phone_candidates,
Self::is_number_allowed_for_list, and when sending ChannelMessage via tx_inner,
replace the tracing::info/debug/warn messages to only emit non-sensitive
metadata such as text length (e.g., text.len()), counts (e.g.,
sender_candidates.len()), and redacted/minimal identifiers (e.g., masked sender
or chat length) or boolean flags (allowed/blocked). Ensure ChannelMessage still
contains the real values, but logs never print full phone/JID or message body;
reference symbols to change:
`tracing::info!`/`tracing::debug!`/`tracing::warn!`, `sender_jid`, `sender_alt`,
`mapped_phone`, `sender_candidates`, `tx_inner`, and `ChannelMessage`.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/channels/whatsapp_web.rs`:
- Around line 500-551: The code currently treats logout_rx.recv() Err the same
as an explicit logout and always deletes the session DB; change the logic to
only purge session files when an actual Event::LoggedOut was observed. Add a
shared boolean/atomic (e.g., logged_out_flag) that your bot/event handler sets
to true when it emits Event::LoggedOut, update the select! handling around
logout_rx.recv() to distinguish Ok(()) vs Err and set should_reconnect
accordingly, and wrap the session file removal loop (the for loop that removes
expanded_session_path, "-wal", "-shm") in a check of that logged_out_flag so
files are only removed after a confirmed logout. Ensure symbols referenced:
logout_rx.recv(), Event::LoggedOut, retry_count, expanded_session_path, and the
session file removal loop are updated.

---

Duplicate comments:
In `@src/channels/whatsapp_web.rs`:
- Around line 355-418: Remove logging of raw message bodies and full sender/chat
identifiers: stop emitting `text`, `sender_jid`, `sender_alt`, `mapped_phone`
and `sender_candidates` contents. In the blocks around the calls to
Self::sender_phone_candidates, Self::is_number_allowed_for_list, and when
sending ChannelMessage via tx_inner, replace the tracing::info/debug/warn
messages to only emit non-sensitive metadata such as text length (e.g.,
text.len()), counts (e.g., sender_candidates.len()), and redacted/minimal
identifiers (e.g., masked sender or chat length) or boolean flags
(allowed/blocked). Ensure ChannelMessage still contains the real values, but
logs never print full phone/JID or message body; reference symbols to change:
`tracing::info!`/`tracing::debug!`/`tracing::warn!`, `sender_jid`, `sender_alt`,
`mapped_phone`, `sender_candidates`, `tx_inner`, and `ChannelMessage`.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 16a049b8-d6f8-438c-9889-25a3c200be96

📥 Commits

Reviewing files that changed from the base of the PR and between e91c945 and 9afa594.

📒 Files selected for processing (1)
  • src/channels/whatsapp_web.rs

Comment thread src/channels/whatsapp_web.rs Outdated
@rikitrader
Copy link
Copy Markdown

Core reconnect logic looks solid — exponential backoff, broadcast signaling, and session cleanup are well-structured. Two items before merge:

  1. Log redaction: Lines 360-365 and 414-418 log raw phone/JID identifiers (sender, chat) at warn/info level. Per project security conventions (src/security/), please redact identifiable data to counts/lengths only in structured logs. Debug-level is fine for development.

  2. Test coverage: No regression tests for the LoggedOut → cleanup → reconnect lifecycle. Even 2-3 unit tests exercising the retry state machine would strengthen confidence.

Happy to approve once those are addressed.

rareba added a commit to rareba/zeroclaw that referenced this pull request Mar 9, 2026
Move raw sender and chat JID values from info/warn level to debug level.
Info and warn logs now show only lengths/counts per project security
conventions for structured logging of identifiable data.

Addresses review feedback from rikitrader on PR zeroclaw-labs#3045.
@rareba
Copy link
Copy Markdown
Contributor

rareba commented Mar 9, 2026

Addressed the review feedback on log redaction:

Lines 360-365 (info-level): Replaced raw sender and chat JID values with sender.len() and chat.len(). The info log now shows only identifier lengths alongside text length. Raw values are preserved in a companion debug! log for development use.

Lines 414-418 (warn-level): Replaced raw sender_jid and sender_candidates with sender_candidates.len() (count only). Raw values are preserved in a companion debug! log for development use.

This aligns with the project security conventions in src/security/ — identifiable data is redacted to counts/lengths at info/warn level, while debug-level retains full values for development troubleshooting.

parziva-1 and others added 3 commits March 9, 2026 07:53
Add a shared AtomicBool (session_revoked) that the Event::LoggedOut
handler sets to true. Session DB and WAL/SHM sidecars are only deleted
when this flag is set. A transient bot task crash (recv Err without
LoggedOut) now reconnects with the existing session intact, avoiding
unnecessary re-pairing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add 6 unit tests covering the reconnect lifecycle:
- Exponential backoff calculation with cap validation
- Retry counter reset on Event::Connected
- MAX_RETRIES exceeded detection
- session_revoked flag guards file deletion
- Broadcast channel signals reconnect on LoggedOut
- Dropped sender returns Err (bot crash without LoggedOut)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace raw phone numbers and JID strings with lengths/counts at
info and warn log levels per project security conventions. Raw values
remain available at debug level for local troubleshooting.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/channels/whatsapp_web.rs (1)

323-327: Please avoid adding a hidden transport override in this reconnect patch.

WHATSAPP_WS_URL introduces a new runtime branch that is outside the stated reconnect scope and broadens the channel’s network behavior without any visible docs/tests in this PR. If there is a real operator use case, I’d split it into a separately justified change; otherwise keep the endpoint fixed here.

As per coding guidelines: "Do not add new config keys, trait methods, feature flags, or workflow branches without a concrete accepted use case; do not introduce speculative 'future-proof' abstractions without at least one current caller."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/channels/whatsapp_web.rs` around lines 323 - 327, Remove the hidden
runtime branch that reads WHATSAPP_WS_URL and calling
TokioWebSocketTransportFactory::with_url; revert to using the fixed transport
configuration used for reconnects by deleting the
std::env::var("WHATSAPP_WS_URL") check and the
transport_factory.with_url(ws_url) override so that the transport_factory
variable is created deterministically via TokioWebSocketTransportFactory::new()
only; if an operator-configurable endpoint is truly required, implement it in a
separate PR with docs/tests rather than in this reconnect change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/channels/whatsapp_web.rs`:
- Around line 822-910: Tests are only exercising atomics and channel primitives,
not the real reconnect/cleanup logic; extract the decision logic into small
production helpers (e.g., compute_retry_delay(attempt: u32) -> Duration,
handle_connected_event(retry_count: &AtomicU32),
should_purge_session(session_revoked: &AtomicBool) -> bool) and update the
reconnect state machine (the Channel impl: listen(), send(), health_check() on
the WhatsApp channel) to call those helpers so unit tests can directly exercise
them; then replace the current tests to call these new helpers (assert
compute_retry_delay matches expected capped doubling, call
handle_connected_event to verify retry reset, call should_purge_session to
verify purge gating) or add a focused integration test that drives the real
listen() state machine on the WhatsAppWeb channel to verify end-to-end
reconnect, purge, and logout/broadcast behavior.

---

Nitpick comments:
In `@src/channels/whatsapp_web.rs`:
- Around line 323-327: Remove the hidden runtime branch that reads
WHATSAPP_WS_URL and calling TokioWebSocketTransportFactory::with_url; revert to
using the fixed transport configuration used for reconnects by deleting the
std::env::var("WHATSAPP_WS_URL") check and the
transport_factory.with_url(ws_url) override so that the transport_factory
variable is created deterministically via TokioWebSocketTransportFactory::new()
only; if an operator-configurable endpoint is truly required, implement it in a
separate PR with docs/tests rather than in this reconnect change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9487cd15-dbef-4495-8d94-dd2f9d3a623b

📥 Commits

Reviewing files that changed from the base of the PR and between 9afa594 and 5d836c2.

📒 Files selected for processing (1)
  • src/channels/whatsapp_web.rs

Comment thread src/channels/whatsapp_web.rs Outdated
parziva-1 and others added 3 commits March 9, 2026 10:20
Move retry/cleanup decision logic into small associated functions on
WhatsAppWebChannel (compute_retry_delay, record_retry, reset_retry,
should_purge_session, session_file_paths) so listen() delegates to
them and tests exercise the same code paths used in production.

Replaces tests that only validated stdlib primitives (AtomicU32,
broadcast) with tests that call the actual helpers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
theonlyhennygod and others added 3 commits March 11, 2026 20:03
`backend` is moved into `Bot::builder().with_backend(backend)`, so
calling `drop(backend)` afterwards is a use-after-move compile error.
Dropping `bot` already releases the Arc reference to the storage backend.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…d files

Auto-formatted with rustfmt 1.8.0-stable (Rust 1.92) to match CI expectations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Collaborator

@theonlyhennygod theonlyhennygod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Reviewed the reconnect state machine, resource cleanup, atomics, and test coverage.

Fixes applied:

  • Removed drop(backend) use-after-move (backend is moved into with_backend(), so drop(bot) already releases the Arc reference)
  • Fixed rustfmt formatting for whatsapp_web.rs and pre-existing issues in matrix.rs, loop_.rs, etc.

Design is sound: broadcast channel for logout signaling, exponential backoff (3s-300s, max 10 retries), session file cleanup only on explicit LoggedOut, proper resource ordering (client lock → abort handle → drop bot/device).

…session-reconnect-qr

# Conflicts:
#	src/channels/mod.rs
Reverts formatting-only changes to mod.rs, traits.rs, loop_.rs, and
observability/traits.rs to avoid triggering pre-existing clippy warnings
that are unrelated to this PR.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Collaborator

@theonlyhennygod theonlyhennygod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — all CI checks pass. Reconnect logic is sound.

@theonlyhennygod theonlyhennygod merged commit 0479bfc into zeroclaw-labs:master Mar 12, 2026
7 checks passed
lantrinh1999 pushed a commit to lantrinh1999/zeroclaw-1 that referenced this pull request Mar 18, 2026
…eroclaw-labs#3045)

Wraps the WhatsApp Web listen() in a reconnect loop. When Event::LoggedOut
fires, the bot is torn down, stale session DB files are deleted, and after
exponential backoff (3s base, 300s cap, max 10 retries) the loop restarts
triggering fresh QR pairing.

- Broadcast channel for logout signaling from event handler to listen loop
- Session file cleanup (primary + WAL + SHM) only on explicit LoggedOut
- Proper resource ordering: client lock, abort handle, drop bot/device
- Tests for retry delay, counter, session purge, and file paths helpers
PlebCoder pushed a commit to PlebCoder/zeroclaw that referenced this pull request Mar 19, 2026
…eroclaw-labs#3045)

Wraps the WhatsApp Web listen() in a reconnect loop. When Event::LoggedOut
fires, the bot is torn down, stale session DB files are deleted, and after
exponential backoff (3s base, 300s cap, max 10 retries) the loop restarts
triggering fresh QR pairing.

- Broadcast channel for logout signaling from event handler to listen loop
- Session file cleanup (primary + WAL + SHM) only on explicit LoggedOut
- Proper resource ordering: client lock, abort handle, drop bot/device
- Tests for retry delay, counter, session purge, and file paths helpers
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants