feat: download and store WhatsApp media for agent access #128
baijunjie wants to merge 1 commit into qwibitai:main
I noticed there was a major refactor (index split into channels/whatsapp, ipc, router modules). I've rebased and updated the code, so this is now built on top of the latest version.
Changes needed:
@TomGranot Updated and rebased on the latest main. Also added comprehensive tests:
All 159 tests passing.
I've rebased this branch onto the latest version of the base branch. I'd like to know if this PR is still on track to be merged, or if there's something that doesn't meet the requirements. Or perhaps no one really needs the AI assistant to support image viewing? Apologies for pressing. I think this will be my last update to this PR. If it's not going to be merged, please feel free to close it.
Add media download support for registered groups. When a message
contains an image, video, audio, document, or sticker, it is
downloaded and saved to groups/{folder}/media/. The container path
is prepended to the message content as [media: /workspace/group/media/filename]
so the agent can access the file.
Add unit tests for whatsapp-media module and integration tests for
media download in the WhatsApp channel handler. Fix existing test
compatibility (add Browsers/GROUPS_DIR mocks, async handler awaits).
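The path handling described in this commit message can be sketched as follows. This is a minimal illustration, not the PR's code: the helper names (`hostMediaPath`, `containerMediaPath`, `prependMediaPath`) are hypothetical, and the single space between the marker and the original content is an assumption, since the commit message does not specify the separator.

```typescript
// Hypothetical helpers; names are illustrative and not taken from the PR.

// Container-side directory, taken from the marker format in the commit message.
const CONTAINER_MEDIA_DIR = '/workspace/group/media';

// Host-side location of a downloaded file: groups/{folder}/media/{filename}.
function hostMediaPath(groupFolder: string, filename: string): string {
  return `groups/${groupFolder}/media/${filename}`;
}

// Path at which the agent sees the same file inside the container.
function containerMediaPath(filename: string): string {
  return `${CONTAINER_MEDIA_DIR}/${filename}`;
}

// Prepend the media marker to the message content.
// Assumption: marker and text are separated by one space; an empty
// caption yields the marker alone.
function prependMediaPath(content: string, filename: string): string {
  const marker = `[media: ${containerMediaPath(filename)}]`;
  return content ? `${marker} ${content}` : marker;
}
```

For example, `prependMediaPath('hello', 'abc.jpg')` yields `[media: /workspace/group/media/abc.jpg] hello`.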
Heads up: PR #281 also implements WhatsApp media download and file send support. You might want to compare approaches and see what each can learn from the other.
@TomGranot Thanks for the heads up. I've done a detailed comparison between PR #281 and this PR. Here's what I found:
Architecture
Media Download
Data Persistence
Media Send (PR #281 only)
PR #281 also implements the full file-send pipeline from agent container back to WhatsApp:
Takeaways
PR #281 is more complete: MIME safety checks, nested message unwrapping, structured media metadata in DB, bidirectional file transfer.
This PR is simpler and more testable: separate module with dedicated unit tests, deterministic message-ID-based naming (avoids duplicate downloads), but lacks nested message unwrapping (ephemeral/viewOnce media won't download) and MIME safety checks.
I think the key gaps on my side are:
Happy to incorporate any of these improvements if this PR is still being considered.
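To make the two gaps named above concrete, here is a hedged sketch of nested message unwrapping and a MIME allowlist check. It does not reproduce PR #281's code, which is not shown in this thread. The `MessageContent` interface is a simplified stand-in for a WhatsApp message payload, the wrapper field names are modeled on the ephemeral/viewOnce wrappers mentioned above, and the allowlist contents are purely illustrative.

```typescript
// Simplified stand-in for a message payload; real payloads carry many more fields.
interface MessageContent {
  imageMessage?: { mimetype?: string };
  videoMessage?: { mimetype?: string };
  audioMessage?: { mimetype?: string };
  documentMessage?: { mimetype?: string };
  stickerMessage?: { mimetype?: string };
  ephemeralMessage?: { message?: MessageContent };
  viewOnceMessage?: { message?: MessageContent };
}

const MEDIA_KEYS = [
  'imageMessage', 'videoMessage', 'audioMessage', 'documentMessage', 'stickerMessage',
] as const;
type MediaKey = typeof MEDIA_KEYS[number];

// Peel off wrapper layers until the innermost message is reached.
// The depth bound keeps a malformed payload from looping for long.
function unwrapMessage(msg: MessageContent | undefined): MessageContent | undefined {
  let current = msg;
  for (let depth = 0; depth < 5 && current; depth++) {
    const inner = current.ephemeralMessage?.message ?? current.viewOnceMessage?.message;
    if (!inner) break;
    current = inner;
  }
  return current;
}

// Return the first media part found after unwrapping, if any.
function findMedia(msg: MessageContent | undefined): { type: MediaKey; mimetype: string } | undefined {
  const inner = unwrapMessage(msg);
  if (!inner) return undefined;
  for (const key of MEDIA_KEYS) {
    const media = inner[key];
    if (media) return { type: key, mimetype: media.mimetype ?? '' };
  }
  return undefined;
}

// Illustrative allowlist (an assumption, not either PR's actual policy).
const ALLOWED_MIME_PREFIXES = ['image/', 'video/', 'audio/', 'application/pdf'];

// Compare only the base type, ignoring parameters such as "; codecs=opus".
function isAllowedMime(mimetype: string): boolean {
  const base = (mimetype.split(';')[0] ?? '').trim().toLowerCase();
  return ALLOWED_MIME_PREFIXES.some((prefix) => base.startsWith(prefix));
}
```

With this shape, a view-once image wrapped inside an ephemeral message is still detected, and a file whose MIME type falls outside the allowlist can be skipped before it is written to disk.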
@gavrielc This PR and #281 both implement WhatsApp media handling. #128 is simpler, with 17 tests and an engaged author. #281 is more comprehensive (MIME safety, bidirectional send, nested message unwrapping). The author of #128 did a thorough comparison in the comments. Could you decide which approach to go with?
Adopt PR qwibitai#128's modular structure: move media download/detection into a separate whatsapp-media.ts with dedicated tests (36 tests). Also adopt deterministic message-ID-based filenames and per-type default extensions, while keeping our MIME safety checks, nested message unwrapping, and structured DB metadata.
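One way to picture the deterministic message-ID-based filenames and per-type default extensions mentioned here is the sketch below. The function names, the extension tables, and the ID sanitizing rule are all assumptions made for illustration; neither PR's actual values appear in this thread.

```typescript
type MediaType = 'image' | 'video' | 'audio' | 'document' | 'sticker';

// Illustrative per-type fallbacks, used when the MIME type is missing or unknown.
const DEFAULT_EXTENSION: Record<MediaType, string> = {
  image: 'jpg',
  video: 'mp4',
  audio: 'ogg',
  document: 'bin',
  sticker: 'webp',
};

// Illustrative MIME-to-extension table (deliberately incomplete).
const MIME_EXTENSION = new Map<string, string>([
  ['image/jpeg', 'jpg'],
  ['image/png', 'png'],
  ['video/mp4', 'mp4'],
  ['audio/ogg', 'ogg'],
  ['application/pdf', 'pdf'],
]);

// The same message ID always maps to the same filename, so a message that is
// delivered twice resolves to one file on disk.
function mediaFilename(messageId: string, type: MediaType, mimetype?: string): string {
  // Replace anything outside a conservative character set to keep the name path-safe.
  const safeId = messageId.replace(/[^A-Za-z0-9_-]/g, '_');
  const base = ((mimetype ?? '').split(';')[0] ?? '').trim().toLowerCase();
  const ext = MIME_EXTENSION.get(base) ?? DEFAULT_EXTENSION[type];
  return `${safeId}.${ext}`;
}

// Skip the download when a file with the deterministic name already exists.
function shouldDownload(filename: string, existing: ReadonlySet<string>): boolean {
  return !existing.has(filename);
}
```

Because the name depends only on the message ID and media type, a repeated delivery of the same message can be detected with a simple existence check rather than a database lookup.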
Hey @baijunjie 👋 Thank you for this. Automatically downloading and surfacing WhatsApp media to agents is exactly the kind of quality-of-life improvement NanoClaw needs! This feature was subsequently implemented and merged in #770 (the image vision skill), which handles media delivery to container agents. This PR also has merge conflicts with the current codebase. We're adding Status: Pending Closure. Your idea was spot on, and thanks for the contribution! 🙌
Glad to see NanoClaw has implemented image download capability. This PR is no longer needed and can be closed. |
Agent was too conservative — told users 'cannot restart yourself'. In host mode, agent can edit config + run nanoclaw restart via bash. Updated model change flow and removed restart from 'cannot do' list. Co-authored-by: Kenan Rpi5 Claw <rpi5-claw@nanoclaw.dev>
Summary
When a registered group receives a media message (image, video, audio, document, or sticker), NanoClaw now automatically downloads the file and saves it to groups/{folder}/media/. The container-accessible path is prepended to the message content as [media: /workspace/group/media/...], so agents can read and use the file directly.
No database schema changes: the media path is embedded in the existing content field. The existing container mount already makes the media directory visible, so no mount changes are needed either.
Motivation
I'm working on letting the assistant post tweets with images that users send via WhatsApp. Currently NanoClaw only extracts caption text from media messages and discards the actual file, so the agent has no way to access user-sent images. This PR fixes that by downloading and storing media on disk, making it available for the agent to use in integrations like X image posting.
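On the consuming side, an integration such as the image-posting use case could recover the file path from the [media: ...] prefix described in the Summary. The sketch below is one possible approach under stated assumptions: `parseMediaContent` is a hypothetical helper, and it assumes the marker sits at the very start of the content and that the path contains no `]` character.

```typescript
// Matches a leading "[media: <path>]" marker plus any whitespace after it.
const MEDIA_MARKER = /^\[media: ([^\]]+)\]\s*/;

// Split message content into the media path (if present) and the remaining text.
function parseMediaContent(content: string): { mediaPath?: string; text: string } {
  const match = MEDIA_MARKER.exec(content);
  if (!match) return { text: content };
  return { mediaPath: match[1], text: content.slice(match[0].length) };
}
```

A message without the marker passes through unchanged, so existing text-only handling would not need to special-case media.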