feat(tools): add send_document tool for file sharing to channels#464
feat(tools): add send_document tool for file sharing to channels#464
Conversation
Add a new send_document tool that allows sending local files (PDF, CSV, DOCX, TXT, JSON, ZIP, etc.) to conversation channels. Key changes: - New SendDocumentTool with blocked-extension security list and MIME validation via moltis_media::mime - Extract shared file_io module from send_image for host/sandbox reads with size validation and TOCTOU protection - Media-ref persistence path: saves files to session media dir, avoiding base64 in JSON/WS/persistence; falls back to data URI without store - Add filename field to MediaAttachment for original filenames in channel outbounds (Discord, Slack, Telegram) - Document card UI component with icon, filename, size, and download/open button - Deduplicate MIME maps: single source of truth in moltis_media::mime, used by api.rs, send_image, and send_document - Cache container_cli() with OnceLock to avoid repeated subprocess spawns - Add Content-Disposition header to media API with proper sanitization - Podman support for image_cache, sandbox listing, swift-bridge, and web API check-packages - E2E tests for document card rendering (PDF, ZIP, CSV) Closes #337 Closes #332 Entire-Checkpoint: 5b7cc2977dc0
Merging this PR will not alter performance
Comparing Footnotes
|
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
Greptile SummaryThis PR adds a Key points from the review:
Confidence Score: 4/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant LLM as LLM Agent
participant Tool as SendDocumentTool
participant FileIO as file_io
participant Store as SessionStore
participant Chat as chat::run_with_tools
participant Outbound as ChannelOutbound
participant API as /api/sessions/.../media
LLM->>Tool: execute({ path, caption })
Tool->>Tool: validate extension (blocklist + MIME check)
Tool->>FileIO: read_file_for_session(router, session, path)
alt sandboxed session
FileIO->>FileIO: read_sandbox_file (base64 via exec)
else host session
FileIO->>FileIO: read_host_file (size check + TOCTOU guard)
end
FileIO-->>Tool: Vec<u8>
alt SessionStore available
Tool->>Store: save_media(session, uuid_filename, bytes)
Store-->>Tool: media_ref ("media/main/abc_report.pdf")
Tool-->>Chat: { document_ref, mime_type, filename, size_bytes }
Chat->>Store: read_media(session, ref_filename)
Store-->>Chat: Vec<u8>
Chat->>Chat: base64-encode → data URI
else no SessionStore
Tool-->>Chat: { document: "data:...", filename }
end
Chat->>Outbound: send_media(account, chat, payload, reply_to)
Outbound-->>LLM: (async, fire-and-forget via tokio::spawn)
Note over API: Media serving
API->>Store: read_media(session, filename)
Store-->>API: Vec<u8>
API-->>API: mime_from_extension → Content-Type
API-->>API: media_content_disposition → Content-Disposition
API-->>LLM: file bytes with headers
|
| pub fn container_cli() -> &'static str { | ||
| static CLI: std::sync::OnceLock<&str> = std::sync::OnceLock::new(); | ||
| CLI.get_or_init(|| { | ||
| if is_cli_available("podman") { | ||
| "podman" | ||
| } else { | ||
| "docker" | ||
| } | ||
| }) |
There was a problem hiding this comment.
Blocking subprocess in
OnceLock init from async context
container_cli() calls is_cli_available("podman"), which uses std::process::Command (blocking I/O) inside OnceLock::get_or_init. This function is now called from DockerImageBuilder::new() (a sync constructor) and directly from api_check_packages_handler (an async Axum handler). On the first invocation, it will block whichever Tokio worker thread happens to run it.
CLAUDE.md states "never use block_on inside async context" — the spirit is the same: don't block async workers with synchronous I/O. Consider using tokio::task::spawn_blocking for the detection call, or performing the detection eagerly at server startup (sync context before the runtime starts) rather than lazily inside the lock.
// In async handlers, the call should be deferred:
let cli = tokio::task::spawn_blocking(container_cli).await?;Or, initialize container_cli() from a known-sync startup path (e.g., during prepare_gateway before the first .await) so the OnceLock is already warm before any async context hits it.
There was a problem hiding this comment.
Fixed — warmed the container_cli() OnceLock at startup via spawn_blocking in spawn_post_listener_warmups. By the time any async handler runs, the lock is already initialized and returns instantly.
Replaces the hand-maintained 30-arm match in mime_from_extension with mime_guess::from_ext(), keeping manual overrides only for extensions the crate doesn't cover (text, log, ppm). This broadens format support automatically and reduces maintenance surface. Entire-Checkpoint: cf6ef2abee12
Avoids blocking an async worker thread on the first call to container_cli() by eagerly initializing the OnceLock from spawn_post_listener_warmups. Subsequent calls from async handlers just read the cached value. Entire-Checkpoint: 15dd0bc8c257
| || content_type.starts_with("audio/") | ||
| || matches!( | ||
| content_type, | ||
| "application/pdf" | "text/plain" | "text/html" | "text/csv" | "text/markdown" |
There was a problem hiding this comment.
Serving
text/html inline enables stored XSS
text/html is included in the inline disposition list, which tells the browser to render the file directly from the same application origin. Since the LLM can generate and save arbitrary HTML files via send_document (.html is in the supported extensions), this creates a stored XSS path:
- An adversarial prompt injection tricks the LLM into generating an HTML file with malicious
<script>content. send_documentsaves it to the session media directory.- A user clicks "Open" in the document card — their browser renders the HTML on the app's own origin.
- The injected script runs with full access to
localStorage, cookies, and the session API.
text/csv and text/markdown are less risky because browsers don't execute them, but text/html is the primary XSS attack surface. Consider removing it from the inline list so HTML files always trigger a download:
| "application/pdf" | "text/plain" | "text/html" | "text/csv" | "text/markdown" | |
| "application/pdf" | "text/plain" | "text/csv" | "text/markdown" |
There was a problem hiding this comment.
Fixed — removed text/html from the inline disposition list. HTML files now always trigger a download (attachment) to prevent stored XSS.
| const BLOCKED_EXTENSIONS: &[&str] = &[ | ||
| "exe", "bat", "cmd", "com", "msi", "scr", "pif", // Windows executables | ||
| "sh", "bash", "zsh", "csh", "ksh", // Shell scripts | ||
| "dll", "so", "dylib", // Shared libraries | ||
| "app", "dmg", "pkg", // macOS | ||
| "deb", "rpm", // Linux packages | ||
| "ps1", "psm1", "psd1", // PowerShell | ||
| "vbs", "vbe", "js", "jse", "wsf", "wsh", // Script engines | ||
| "reg", "inf", // Windows registry / setup | ||
| ]; | ||
|
|
||
| /// Document-sending tool. | ||
| #[derive(Default)] |
There was a problem hiding this comment.
Common scripting languages missing from blocked list
The blocked list targets Windows executables, Unix shell scripts, and Windows script engines, but omits several common interpreted languages that are also executable as scripts on many platforms:
py/pyw— Python (directly executable viapython,python3, or#!)php— PHP scriptsrb— Rubypl— Perllua— Luatcl— Tcl
On a typical Linux/macOS host, python script.py or a #!/usr/bin/env python shebang can execute these just as effectively as a .sh file. If the intent of this list is to prevent sharing executable/scriptable files that could be run on the recipient's machine, these formats should also be blocked.
| const BLOCKED_EXTENSIONS: &[&str] = &[ | |
| "exe", "bat", "cmd", "com", "msi", "scr", "pif", // Windows executables | |
| "sh", "bash", "zsh", "csh", "ksh", // Shell scripts | |
| "dll", "so", "dylib", // Shared libraries | |
| "app", "dmg", "pkg", // macOS | |
| "deb", "rpm", // Linux packages | |
| "ps1", "psm1", "psd1", // PowerShell | |
| "vbs", "vbe", "js", "jse", "wsf", "wsh", // Script engines | |
| "reg", "inf", // Windows registry / setup | |
| ]; | |
| /// Document-sending tool. | |
| #[derive(Default)] | |
| const BLOCKED_EXTENSIONS: &[&str] = &[ | |
| "exe", "bat", "cmd", "com", "msi", "scr", "pif", // Windows executables | |
| "sh", "bash", "zsh", "csh", "ksh", // Shell scripts | |
| "dll", "so", "dylib", // Shared libraries | |
| "app", "dmg", "pkg", // macOS | |
| "deb", "rpm", // Linux packages | |
| "ps1", "psm1", "psd1", // PowerShell | |
| "vbs", "vbe", "js", "jse", "wsf", "wsh", // Script engines | |
| "reg", "inf", // Windows registry / setup | |
| "py", "pyw", "php", "rb", "pl", "lua", "tcl", // Interpreted scripts | |
| ]; |
There was a problem hiding this comment.
Fixed — added py, pyw, php, rb, pl, lua, tcl to the blocked extensions list.
…tensions - Remove text/html from inline Content-Disposition list so HTML files always trigger a download instead of rendering on our origin. - Add interpreted script extensions (py, pyw, php, rb, pl, lua, tcl) to the send_document blocked list. Entire-Checkpoint: 32a224dd48e0
Entire-Checkpoint: 05f0dd7e8156
The tests were calling events.eventListeners.forEach() treating it as an array, but it's an object keyed by event name. Dispatch through eventListeners["chat"] with proper tool_call_start/tool_call_end payloads matching the real WebSocket frame format. Entire-Checkpoint: 71aae0eb4a43
…nt tests The previous approach dispatched events via eventListeners["chat"] which bypasses the internal eventHandlers map. Use the system-event RPC (same pattern as websocket.spec.js) which goes through the full server dispatch chain including handleChatEvent. Entire-Checkpoint: 308b42036d7a
Summary
send_documenttool for sending local files (PDF, CSV, DOCX, TXT, JSON, ZIP, etc.) to conversation channels (Telegram, Discord, Slack)file_iomodule fromsend_imagefor consistent host/sandbox file reads with size validation and TOCTOU protectionmoltis_media::mime(single source of truth for extension↔MIME mapping)filenamefield toMediaAttachmentso channel outbounds use original filenames instead of generic placeholdersContent-Dispositionheader to media API with proper filename sanitizationcontainer_cli()withOnceLockcache for podman/docker auto-detectionlocalhost/prefix in sandbox image listingsResurrects the good parts from #353, cleanly rebased on current main.
Closes
Validation
Completed
cargo checkpassescargo +nightly-2025-11-30 fmt --all -- --checkpassescargo +nightly-2025-11-30 clippy --workspace --all-targets -- -D warningspassesbiome check --writepassesRemaining
cargo test(full suite)cd crates/web/ui && npx playwright test e2e/specs/send-document.spec.js./scripts/local-validate.shManual QA
/api/sessions/<key>/media/<file>returns correctContent-Dispositionheader