
feat: download and store WhatsApp media for agent access #128

Closed
baijunjie wants to merge 1 commit into qwibitai:main from baijunjie:feat/whatsapp-media-download

Conversation

@baijunjie
Contributor

Summary

When a registered group receives a media message (image, video, audio, document, or sticker), NanoClaw now automatically downloads the file and saves it to groups/{folder}/media/. The container-accessible path is prepended to the message content as [media: /workspace/group/media/...], so agents can read and use the file directly.

No database schema changes — the media path is embedded in the existing content field. The existing container mount already makes the media directory visible, so no mount changes are needed either.
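
To make the tagging step concrete, here is a minimal sketch of how a container path could be prepended to message content. The names `CONTAINER_MEDIA_DIR`, `containerMediaPath`, and `prependMediaTag` are hypothetical and are not taken from the PR's code; only the `[media: ...]` format and the `/workspace/group/media/` location come from the description above.

```typescript
// Assumption: the group folder is mounted at /workspace/group inside the container,
// so files saved to groups/{folder}/media/ appear under this directory.
const CONTAINER_MEDIA_DIR = "/workspace/group/media";

/** Builds the container-visible path for a saved media file (hypothetical helper). */
function containerMediaPath(filename: string): string {
  return `${CONTAINER_MEDIA_DIR}/${filename}`;
}

/**
 * Prepends the media tag to the message content.
 * The caption may be empty; the tag line is emitted either way.
 */
function prependMediaTag(content: string, filename: string): string {
  return `[media: ${containerMediaPath(filename)}]\n${content}`;
}
```

Because the path rides inside the existing content field, nothing downstream needs to know about a new column or message type.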

Motivation

I'm working on letting the assistant post tweets with images that users send via WhatsApp. Currently NanoClaw only extracts caption text from media messages and discards the actual file, so the agent has no way to access user-sent images. This PR fixes that by downloading and storing media on disk, making it available for the agent to use in integrations like X image posting.

@baijunjie baijunjie requested a review from gavrielc as a code owner February 7, 2026 10:32
@baijunjie baijunjie force-pushed the feat/whatsapp-media-download branch from cddc04c to 4786d6e Compare February 7, 2026 10:36
@baijunjie baijunjie force-pushed the feat/whatsapp-media-download branch from 4786d6e to bf4d2e4 Compare February 12, 2026 03:30
@baijunjie
Contributor Author

I noticed there was a major refactor (index split into channels/whatsapp, ipc, router modules). I've rebased and updated the code — this is now built on top of the latest version.

@TomGranot
Collaborator

Changes needed:

  • Good feature — the new src/whatsapp-media.ts file is clean.
  • The src/channels/whatsapp.ts changes correctly target the post-refactor file.
  • Please rebase to verify the messages.upsert handler lines match current main exactly (the handler has been modified since this PR was opened).
  • Consider adding tests for the media download logic.

@baijunjie baijunjie force-pushed the feat/whatsapp-media-download branch from a618ae4 to ba46995 Compare February 13, 2026 01:55
@baijunjie
Contributor Author

@TomGranot Updated and rebased on the latest main. Also added comprehensive tests:

  • src/whatsapp-media.test.ts — 17 unit tests covering getMediaInfo (all media types, edge cases) and downloadAndSaveMedia (MIME-to-extension mapping, fallback extensions, download failure handling)
  • src/channels/whatsapp.test.ts — 3 new integration tests for media download in the message handler (successful download, download failure, text-only skip), plus fixes for existing test compatibility with the async handler (Browsers mock, GROUPS_DIR mock, flushPromises for async awaits)
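
For readers unfamiliar with the behavior these tests exercise, the following is a rough sketch of MIME-to-extension mapping with a per-type fallback. The table contents and the names `MIME_TO_EXT`, `DEFAULT_EXT`, `extensionFor`, and `mediaFilename` are assumptions made for illustration; the actual mapping in whatsapp-media.ts may differ.

```typescript
type MediaType = "image" | "video" | "audio" | "document" | "sticker";

// Assumed mapping for illustration only; not the PR's actual table.
const MIME_TO_EXT: Record<string, string> = {
  "image/jpeg": "jpg",
  "image/png": "png",
  "image/webp": "webp",
  "video/mp4": "mp4",
  "audio/ogg": "ogg",
  "audio/mpeg": "mp3",
  "application/pdf": "pdf",
};

// Assumed fallback extension per media type when the MIME type is missing or unknown.
const DEFAULT_EXT: Record<MediaType, string> = {
  image: "jpg",
  video: "mp4",
  audio: "ogg",
  document: "bin",
  sticker: "webp",
};

/** Picks a file extension from the MIME type, falling back to the per-type default. */
function extensionFor(mediaType: MediaType, mimetype?: string | null): string {
  // MIME types can carry parameters (e.g. "audio/ogg; codecs=opus"); keep only the base type.
  const base = mimetype?.split(";")[0].trim().toLowerCase();
  return (base && MIME_TO_EXT[base]) || DEFAULT_EXT[mediaType];
}

/** Deterministic filename based on the message ID, as described in this PR. */
function mediaFilename(msgId: string, mediaType: MediaType, mimetype?: string | null): string {
  return `${msgId}.${extensionFor(mediaType, mimetype)}`;
}
```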

All 159 tests passing.

@baijunjie baijunjie force-pushed the feat/whatsapp-media-download branch from ba46995 to 6500dd6 Compare February 16, 2026 04:13
@baijunjie
Contributor Author

I've rebased onto the latest version of the base branch. Is this PR still on track to be merged, or is there something that doesn't meet the requirements? Or perhaps no one really needs the AI assistant to support image viewing?

Apologies for pressing — I think this will be my last update to this PR. If it's not going to be merged, please feel free to close it.

Add media download support for registered groups. When a message
contains an image, video, audio, document, or sticker, it is
downloaded and saved to groups/{folder}/media/. The container path
is prepended to the message content as [media: /workspace/group/media/filename]
so the agent can access the file.

Add unit tests for whatsapp-media module and integration tests for
media download in the WhatsApp channel handler. Fix existing test
compatibility (add Browsers/GROUPS_DIR mocks, async handler awaits).
@TomGranot
Collaborator

Heads up — PR #281 also implements WhatsApp media download and file send support. You might want to compare approaches and see what each can learn from the other.

@baijunjie
Contributor Author

@TomGranot Thanks for the heads up. I've done a detailed comparison between PR #281 and this PR. Here's what I found:

Architecture

| Aspect | PR #281 | This PR (#128) |
| --- | --- | --- |
| Code organization | Inline downloadMedia() private method in whatsapp.ts | Separate whatsapp-media.ts module with exported getMediaInfo() and downloadAndSaveMedia() |
| Scope | Bidirectional: download + send (send_file MCP tool) | Download only |

Media Download

| Aspect | PR #281 | This PR (#128) |
| --- | --- | --- |
| Nested message unwrapping | Handles ephemeral, viewOnce, viewOnceV2, documentWithCaption wrappers | None; only checks top-level message keys |
| MIME safety check | SAFE_MIME_PREFIXES allowlist, rejects executables/scripts | None; downloads any media type |
| File naming | `${Date.now()}-${randomHex}.${ext}` (random) | `${msgId}.${ext}` (deterministic, based on message ID) |
| Empty content handling | Sets placeholder like `[image]` or `[document: file.pdf]` | Prepends `[media: /workspace/group/media/xxx.jpg]\n` to content |
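
The nested-unwrapping row is easier to follow with an example. Below is a minimal sketch of peeling wrapper messages before looking for media keys. The interfaces are simplified local stand-ins, not the real WhatsApp library types, and `unwrapMessage` and `detectMediaKey` are hypothetical names; neither PR's code is reproduced here.

```typescript
// Simplified stand-in types (assumption): just enough structure for the example.
interface MediaPayload { mimetype?: string | null }
interface Wrapper { message?: MessageContent | null }
interface MessageContent {
  imageMessage?: MediaPayload;
  videoMessage?: MediaPayload;
  audioMessage?: MediaPayload;
  documentMessage?: MediaPayload;
  stickerMessage?: MediaPayload;
  ephemeralMessage?: Wrapper;
  viewOnceMessage?: Wrapper;
  viewOnceMessageV2?: Wrapper;
  documentWithCaptionMessage?: Wrapper;
}

const WRAPPER_KEYS = [
  "ephemeralMessage", "viewOnceMessage", "viewOnceMessageV2", "documentWithCaptionMessage",
] as const;
const MEDIA_KEYS = [
  "imageMessage", "videoMessage", "audioMessage", "documentMessage", "stickerMessage",
] as const;
type MediaKey = (typeof MEDIA_KEYS)[number];

/** Follows wrapper layers until none is left; maxDepth guards against malformed input. */
function unwrapMessage(message?: MessageContent | null, maxDepth = 5): MessageContent | undefined {
  let current: MessageContent | undefined = message ?? undefined;
  for (let depth = 0; depth < maxDepth; depth++) {
    if (!current) return undefined;
    const node: MessageContent = current;
    const key = WRAPPER_KEYS.find((k) => node[k]?.message);
    if (!key) return node; // no wrapper present: this is the innermost message
    current = node[key]?.message ?? undefined;
  }
  return current;
}

/** Returns the first media key found on the unwrapped message, if any. */
function detectMediaKey(message?: MessageContent | null): MediaKey | undefined {
  const inner = unwrapMessage(message);
  return inner ? MEDIA_KEYS.find((k) => inner[k] != null) : undefined;
}
```

Checking only top-level keys would miss the image in a message shaped like `{ ephemeralMessage: { message: { imageMessage: ... } } }`, which is the gap the table describes.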

Data Persistence

| Aspect | PR #281 | This PR (#128) |
| --- | --- | --- |
| DB schema | Adds 4 columns: media_type, media_path, media_mime, media_filename | No DB changes |
| Router format | Structured XML attributes: `<message media_type="image" media_path="..." ...>` | Media path embedded directly in content text |
| Type changes | NewMessage gets 4 optional fields + Channel gets sendFile method | No type changes |
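
One consequence of embedding the path in the content text is that any consumer wanting the path back has to parse it out. A minimal sketch of that parsing, assuming the `[media: ...]` tag always sits on the first line followed by a newline; `splitMediaTag` is a hypothetical helper and is not part of either PR.

```typescript
// Matches a leading "[media: <path>]" tag and the optional newline that follows it.
const MEDIA_TAG = /^\[media: ([^\]\n]+)\]\n?/;

/** Splits content into the embedded media path (if any) and the remaining text. */
function splitMediaTag(content: string): { mediaPath: string | null; text: string } {
  const match = MEDIA_TAG.exec(content);
  if (!match) return { mediaPath: null, text: content };
  return { mediaPath: match[1], text: content.slice(match[0].length) };
}
```

The structured-attribute approach in PR #281 avoids this parsing step at the cost of schema and type changes.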

Media Send (PR #281 only)

PR #281 also implements the full file-send pipeline from agent container back to WhatsApp:

  • send_file MCP tool in the container (path/size validation, 64MB limit)
  • IPC layer handles type: 'file' messages with container-to-host path translation
  • WhatsAppChannel.sendFile() picks image/video/audio/document based on MIME type
  • Authorization: non-main groups can only send to their own chat
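
As a rough illustration of the validation and path-translation steps listed above, here is a sketch that combines them into one place for brevity (in PR #281 they reportedly live in different layers). The 64MB limit and the `/workspace/group` mount come from this thread; the function names, the `hostGroupDir` parameter, and the error strings are assumptions.

```typescript
const MAX_FILE_BYTES = 64 * 1024 * 1024; // 64MB limit mentioned above
const CONTAINER_GROUP_ROOT = "/workspace/group"; // assumed container mount point

/**
 * Translates a container path to a host path under hostGroupDir.
 * Returns null if the path is outside the group workspace or contains
 * empty, "." or ".." segments (a conservative traversal guard).
 */
function toHostPath(containerPath: string, hostGroupDir: string): string | null {
  const prefix = `${CONTAINER_GROUP_ROOT}/`;
  if (!containerPath.startsWith(prefix)) return null;
  const segments = containerPath.slice(prefix.length).split("/");
  if (segments.some((s) => s === "" || s === "." || s === "..")) return null;
  return `${hostGroupDir}/${segments.join("/")}`;
}

/** Validates a send request and returns either the host path or an error message. */
function validateSendFile(
  containerPath: string,
  sizeBytes: number,
  hostGroupDir: string,
): { hostPath: string } | { error: string } {
  const hostPath = toHostPath(containerPath, hostGroupDir);
  if (hostPath === null) return { error: "path is outside the group workspace" };
  if (sizeBytes > MAX_FILE_BYTES) return { error: "file exceeds the 64MB limit" };
  return { hostPath };
}
```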

Takeaways

PR #281 is more complete: MIME safety checks, nested message unwrapping, structured media metadata in DB, bidirectional file transfer.

This PR is simpler and more testable: separate module with dedicated unit tests, deterministic message-ID-based naming (avoids duplicate downloads), but lacks nested message unwrapping (ephemeral/viewOnce media won't download) and MIME safety checks.

I think the key gaps on my side are:

  1. Nested message unwrapping — without it, ephemeral/viewOnce media silently fails
  2. MIME safety check — should block potentially dangerous file types
  3. send_file capability — not in scope for this PR but a natural next step
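
For the second gap, a prefix allowlist could look roughly like the sketch below. Only the name SAFE_MIME_PREFIXES comes from the comparison above; the list contents and the `isSafeMime` helper are assumptions, and a real allowlist would need a deliberate decision about types such as image/svg+xml, which can carry scripts.

```typescript
// Assumed allowlist contents for illustration; PR #281's actual list may differ.
const SAFE_MIME_PREFIXES = ["image/", "video/", "audio/", "application/pdf", "text/plain"];

/** Returns true only when the base MIME type starts with an allowlisted prefix. */
function isSafeMime(mimetype: string | null | undefined): boolean {
  if (!mimetype) return false; // missing MIME type: reject rather than guess
  const base = mimetype.split(";")[0].trim().toLowerCase();
  return SAFE_MIME_PREFIXES.some((prefix) => base.startsWith(prefix));
}
```

An allowlist fails closed: an unfamiliar type is skipped instead of being written to disk where an agent might later open it.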

Happy to incorporate any of these improvements if this PR is still being considered.

@TomGranot
Collaborator

@gavrielc — This and PR #281 both implement WhatsApp media. #128 is simpler with 17 tests and an engaged author. #281 is more comprehensive (MIME safety, bidirectional send, nested message unwrapping). The author of #128 did a thorough comparison in the comments. Could you decide which approach to go with?

youyouhe pushed a commit to youyouhe/nanoclaw that referenced this pull request Feb 24, 2026
Adopt PR qwibitai#128's modular structure: move media download/detection
into a separate whatsapp-media.ts with dedicated tests (36 tests).
Also adopt deterministic message-ID-based filenames and per-type
default extensions, while keeping our MIME safety checks, nested
message unwrapping, and structured DB metadata.
@Andy-NanoClaw-AI Andy-NanoClaw-AI added the PR: Feature (New feature or enhancement) and Status: Blocked (Blocked by merge conflicts or dependencies) labels Mar 5, 2026
@gavrielc gavrielc requested a review from gabi-simons as a code owner March 6, 2026 10:17
@Andy-NanoClaw-AI
Collaborator

Hey @baijunjie 👋 Thank you for this — automatically downloading and surfacing WhatsApp media to agents is exactly the kind of quality-of-life improvement NanoClaw needs!

This feature was subsequently implemented and merged in #770 (the image vision skill), which handles media delivery to container agents. This PR also has merge conflicts with the current codebase.

We're adding Status: Pending Closure. Your idea was spot on — thanks for the contribution! 🙌

@Andy-NanoClaw-AI Andy-NanoClaw-AI added the Status: Pending Closure (PR flagged for closure during triage) label Mar 7, 2026
@baijunjie
Contributor Author

Glad to see NanoClaw has implemented image download capability. This PR is no longer needed and can be closed.

@baijunjie baijunjie closed this Mar 8, 2026
Copilot AI pushed a commit to youyouhe/nanoclaw that referenced this pull request Mar 13, 2026
Adopt PR qwibitai#128's modular structure: move media download/detection
into a separate whatsapp-media.ts with dedicated tests (36 tests).
Also adopt deterministic message-ID-based filenames and per-type
default extensions, while keeping our MIME safety checks, nested
message unwrapping, and structured DB metadata.
kenansun-dev-bot bot pushed a commit to kenansun-dev/nanoclaw-github-copilot that referenced this pull request Apr 12, 2026
Agent was too conservative — told users 'cannot restart yourself'.
In host mode, agent can edit config + run nanoclaw restart via bash.
Updated model change flow and removed restart from 'cannot do' list.

Co-authored-by: Kenan Rpi5 Claw <rpi5-claw@nanoclaw.dev>

Labels

  • PR: Feature (New feature or enhancement)
  • Status: Blocked (Blocked by merge conflicts or dependencies)
  • Status: Pending Closure (PR flagged for closure during triage)


3 participants