Skip to content

feat(tools): add send_document tool for file sharing to channels#464

Merged
penso merged 7 commits intomainfrom
feat/send-document-v2
Mar 23, 2026
Merged

feat(tools): add send_document tool for file sharing to channels#464
penso merged 7 commits intomainfrom
feat/send-document-v2

Conversation

@penso
Copy link
Copy Markdown
Collaborator

@penso penso commented Mar 23, 2026

Summary

  • Add send_document tool for sending local files (PDF, CSV, DOCX, TXT, JSON, ZIP, etc.) to conversation channels (Telegram, Discord, Slack)
  • Extract shared file_io module from send_image for consistent host/sandbox file reads with size validation and TOCTOU protection
  • Deduplicate MIME maps into moltis_media::mime (single source of truth for extension↔MIME mapping)
  • Add filename field to MediaAttachment so channel outbounds use original filenames instead of generic placeholders
  • Add Content-Disposition header to media API with proper filename sanitization
  • Add container_cli() with OnceLock cache for podman/docker auto-detection
  • Normalize podman localhost/ prefix in sandbox image listings
  • Document card UI component with file-type icon, filename, size, and download/open button

Resurrects the good parts from #353, cleanly rebased on current main.

Closes

Validation

Completed

  • cargo check passes
  • cargo +nightly-2025-11-30 fmt --all -- --check passes
  • cargo +nightly-2025-11-30 clippy --workspace --all-targets -- -D warnings passes
  • biome check --write passes
  • send_document unit tests (21 tests pass)
  • file_io unit tests (7 tests pass)
  • media MIME tests (13 tests pass)
  • common types tests (15 tests pass)
  • web API content_disposition tests (11 tests pass)

Remaining

  • cargo test (full suite)
  • E2E tests: cd crates/web/ui && npx playwright test e2e/specs/send-document.spec.js
  • ./scripts/local-validate.sh

Manual QA

  1. Start gateway, open web UI chat
  2. Ask the LLM to create and send a PDF file — verify document card renders with icon, filename, size, and Open button
  3. Ask the LLM to send a ZIP file — verify Download button (not Open)
  4. Verify documents arrive in connected channels (Telegram/Discord/Slack) with correct filename
  5. Check /api/sessions/<key>/media/<file> returns correct Content-Disposition header

Add a new send_document tool that allows sending local files (PDF, CSV,
DOCX, TXT, JSON, ZIP, etc.) to conversation channels. Key changes:

- New SendDocumentTool with blocked-extension security list and MIME
  validation via moltis_media::mime
- Extract shared file_io module from send_image for host/sandbox reads
  with size validation and TOCTOU protection
- Media-ref persistence path: saves files to session media dir, avoiding
  base64 in JSON/WS/persistence; falls back to data URI without store
- Add filename field to MediaAttachment for original filenames in channel
  outbounds (Discord, Slack, Telegram)
- Document card UI component with icon, filename, size, and
  download/open button
- Deduplicate MIME maps: single source of truth in moltis_media::mime,
  used by api.rs, send_image, and send_document
- Cache container_cli() with OnceLock to avoid repeated subprocess spawns
- Add Content-Disposition header to media API with proper sanitization
- Podman support for image_cache, sandbox listing, swift-bridge, and
  web API check-packages
- E2E tests for document card rendering (PDF, ZIP, CSV)

Closes #337
Closes #332

Entire-Checkpoint: 5b7cc2977dc0
@codspeed-hq
Copy link
Copy Markdown
Contributor

codspeed-hq Bot commented Mar 23, 2026

Merging this PR will not alter performance

✅ 39 untouched benchmarks
⏩ 5 skipped benchmarks1


Comparing feat/send-document-v2 (b1467bc) with main (346e888)

Open in CodSpeed

Footnotes

  1. 5 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@codecov
Copy link
Copy Markdown

codecov Bot commented Mar 23, 2026

Codecov Report

❌ Patch coverage is 54.93671% with 178 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
crates/chat/src/lib.rs 17.64% 140 Missing ⚠️
crates/web/src/api.rs 80.95% 16 Missing ⚠️
crates/gateway/src/server.rs 0.00% 9 Missing ⚠️
crates/slack/src/outbound.rs 0.00% 5 Missing ⚠️
crates/discord/src/outbound.rs 0.00% 4 Missing ⚠️
crates/telegram/src/outbound.rs 0.00% 4 Missing ⚠️

📢 Thoughts on this report? Let us know!

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Mar 23, 2026

Greptile Summary

This PR adds a send_document tool that allows the LLM to share local files (PDF, CSV, DOCX, TXT, JSON, ZIP, etc.) to connected conversation channels (Telegram, Discord, Slack), with a matching document card UI. The implementation is well-engineered: it extracts a shared file_io module from send_image, deduplicates MIME mapping into moltis_media::mime, adds a filename field to MediaAttachment, introduces Content-Disposition handling for the media API, and replaces hard-coded docker references with auto-detected podman/docker CLI selection cached via OnceLock.

Key points from the review:

  • Security – extension blocklist gap: .mjs and .cjs (Node.js ES / CommonJS module files) are absent from BLOCKED_EXTENSIONS despite .js being blocked. Both are directly executable with node and carry identical risk.
  • UX inconsistency – "Open" button for HTML files: The isText check in helpers.js matches text/html, rendering an "Open" button for HTML documents. The server correctly forces Content-Disposition: attachment for HTML, so clicking "Open" triggers a download rather than opening the file — misleading to users.
  • Non-ASCII filenames in Content-Disposition: The filename sanitization strips control characters but leaves Unicode characters in the quoted filename parameter, which violates RFC 7230. Browsers handle non-ASCII in quoted-string filename inconsistently; RFC 5987/8187 filename* encoding should be used for non-ASCII names.
  • The startup warmup of the container_cli() OnceLock via spawn_blocking correctly resolves the prior async-blocking concern.
  • The XSS concern from serving text/html inline has been correctly addressed by removing it from the inline disposition list.

Confidence Score: 4/5

  • Safe to merge with minor fixes; the .mjs/.cjs gap is the only logical issue and has limited practical impact in a controlled deployment.
  • The PR is well-tested (55+ unit tests), cleanly architected, and the previous critical concerns (async blocking, stored XSS) have been fixed. The remaining issues are: one logic gap (missing .mjs/.cjs in the blocklist), one UX mismatch (HTML "Open" button), and one RFC compliance issue (non-ASCII Content-Disposition filenames). None of these are blocking for most deployments but the blocklist gap warrants a quick fix before merge.
  • crates/tools/src/send_document.rs (blocked extensions list), crates/web/src/assets/js/helpers.js (isText check), crates/web/src/api.rs (Content-Disposition filename encoding)

Important Files Changed

Filename Overview
crates/tools/src/send_document.rs New send_document tool: well-structured with good security practices (extension blocklist, size limits, TOCTOU guard, sandbox routing). One gap: .mjs/.cjs (Node.js module extensions) are not blocked alongside .js.
crates/tools/src/file_io.rs New shared file-reading module extracted from send_image. Clean design with size validation, TOCTOU protection, and proper sandbox routing. Shell quoting via shell_single_quote is correct and well-tested.
crates/web/src/api.rs Adds Content-Disposition header with filename sanitization and inline/attachment policy. HTML correctly forced to attachment. Non-ASCII characters in filenames not sanitized per RFC 7230/5987, which could cause browser inconsistencies.
crates/web/src/assets/js/helpers.js Adds renderDocument function for the document card UI. isText check includes text/html, showing a misleading "Open" button for HTML files despite the server sending attachment disposition.
crates/chat/src/lib.rs Adds document dispatch logic alongside the existing screenshot path. Cleanly separates document_ref (media-dir) and legacy document (data URI) paths. Properly uses tokio::spawn for async channel dispatch without blocking the main loop.
crates/tools/src/sandbox/containers.rs Adds container_cli() with OnceLock cache, warmed at startup via spawn_blocking. Podman localhost/ prefix normalization is correct. Previous async-blocking concern has been addressed.
crates/media/src/mime.rs Expands MIME mapping tables and adds mime_from_extension delegating to mime_guess. Good deduplication of the previously scattered MIME maps. Well-tested with round-trip assertions.
crates/common/src/types.rs Adds optional filename field to MediaAttachment with backward-compatible serde defaults. Straightforward and well-tested.

Sequence Diagram

sequenceDiagram
    participant LLM as LLM Agent
    participant Tool as SendDocumentTool
    participant FileIO as file_io
    participant Store as SessionStore
    participant Chat as chat::run_with_tools
    participant Outbound as ChannelOutbound
    participant API as /api/sessions/.../media

    LLM->>Tool: execute({ path, caption })
    Tool->>Tool: validate extension (blocklist + MIME check)
    Tool->>FileIO: read_file_for_session(router, session, path)
    alt sandboxed session
        FileIO->>FileIO: read_sandbox_file (base64 via exec)
    else host session
        FileIO->>FileIO: read_host_file (size check + TOCTOU guard)
    end
    FileIO-->>Tool: Vec<u8>

    alt SessionStore available
        Tool->>Store: save_media(session, uuid_filename, bytes)
        Store-->>Tool: media_ref ("media/main/abc_report.pdf")
        Tool-->>Chat: { document_ref, mime_type, filename, size_bytes }
        Chat->>Store: read_media(session, ref_filename)
        Store-->>Chat: Vec<u8>
        Chat->>Chat: base64-encode → data URI
    else no SessionStore
        Tool-->>Chat: { document: "data:...", filename }
    end

    Chat->>Outbound: send_media(account, chat, payload, reply_to)
    Outbound-->>LLM: (async, fire-and-forget via tokio::spawn)

    Note over API: Media serving
    API->>Store: read_media(session, filename)
    Store-->>API: Vec<u8>
    API-->>API: mime_from_extension → Content-Type
    API-->>API: media_content_disposition → Content-Disposition
    API-->>LLM: file bytes with headers
Loading

Comments Outside Diff (3)

  1. crates/tools/src/send_document.rs, line 1155-1158 (link)

    .mjs and .cjs missing from blocked extensions

    js is correctly blocked, but .mjs (ES module syntax) and .cjs (CommonJS module syntax) are absent from the list. Both are directly executable via node file.mjs / node file.cjs and carry the same risk as a .js file. A recipient who receives a .mjs attachment can run it immediately with Node.js.

  2. crates/web/src/assets/js/helpers.js, line 2227-2236 (link)

    isText matches text/html, showing misleading "Open" button

    (mimeType || "").startsWith("text/") is true for text/html, so HTML documents rendered in the document card will show an "⬆ Open" button with target="_blank". However, the server deliberately sends Content-Disposition: attachment for text/html (to prevent stored XSS). When a user clicks "Open", the browser will trigger a download instead of opening the file — a confusing mismatch between the button label and the actual behavior.

    Consider explicitly excluding text/html from the inline branch:

  3. crates/web/src/api.rs, line 2011-2016 (link)

    Non-ASCII filenames in Content-Disposition are not RFC-compliant

    The current sanitization strips only a few ASCII control characters (", \n, \r, ;, \) but leaves arbitrary non-ASCII (Unicode) characters in the filename quoted-string. RFC 7230 requires HTTP header field values to be composed of visible ASCII characters (or horizontal tab). Non-ASCII bytes embedded in a quoted string are technically invalid and are handled inconsistently across browsers — some garble the name, some drop it, and some crash the parse.

    The RFC 5987 / RFC 8187 filename* parameter with UTF-8''%XX percent-encoding is the correct mechanism for non-ASCII filenames. A minimal fix is to strip (or percent-encode) all non-ASCII characters from the quoted filename parameter, while adding a filename* parameter for Unicode names:

    // ASCII-only fallback filename
    let ascii_name: String = safe_name.chars().filter(|c| c.is_ascii()).collect();
    if ascii_name == safe_name {
        format!("{disposition}; filename=\"{safe_name}\"")
    } else {
        let encoded: String = safe_name
            .bytes()
            .flat_map(|b| {
                if b.is_ascii_alphanumeric() || b".-_~".contains(&b) {
                    vec![b as char]
                } else {
                    format!("%{b:02X}").chars().collect()
                }
            })
            .collect();
        format!(
            "{disposition}; filename=\"{ascii_name}\"; filename*=UTF-8''{encoded}"
        )
    }

Reviews (4): Last reviewed commit: "fix(e2e): use system-event RPC to dispat..." | Re-trigger Greptile

Comment on lines +938 to +946
pub fn container_cli() -> &'static str {
static CLI: std::sync::OnceLock<&str> = std::sync::OnceLock::new();
CLI.get_or_init(|| {
if is_cli_available("podman") {
"podman"
} else {
"docker"
}
})
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Blocking subprocess in OnceLock init from async context

container_cli() calls is_cli_available("podman"), which uses std::process::Command (blocking I/O) inside OnceLock::get_or_init. This function is now called from DockerImageBuilder::new() (a sync constructor) and directly from api_check_packages_handler (an async Axum handler). On the first invocation, it will block whichever Tokio worker thread happens to run it.

CLAUDE.md states "never use block_on inside async context" — the spirit is the same: don't block async workers with synchronous I/O. Consider using tokio::task::spawn_blocking for the detection call, or performing the detection eagerly at server startup (sync context before the runtime starts) rather than lazily inside the lock.

// In async handlers, the call should be deferred:
let cli = tokio::task::spawn_blocking(container_cli).await?;

Or, initialize container_cli() from a known-sync startup path (e.g., during prepare_gateway before the first .await) so the OnceLock is already warm before any async context hits it.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — warmed the container_cli() OnceLock at startup via spawn_blocking in spawn_post_listener_warmups. By the time any async handler runs, the lock is already initialized and returns instantly.

penso added 2 commits March 23, 2026 10:54
Replaces the hand-maintained 30-arm match in mime_from_extension with
mime_guess::from_ext(), keeping manual overrides only for extensions
the crate doesn't cover (text, log, ppm). This broadens format support
automatically and reduces maintenance surface.

Entire-Checkpoint: cf6ef2abee12
Avoids blocking an async worker thread on the first call to
container_cli() by eagerly initializing the OnceLock from
spawn_post_listener_warmups. Subsequent calls from async handlers
just read the cached value.

Entire-Checkpoint: 15dd0bc8c257
Comment thread crates/web/src/api.rs Outdated
|| content_type.starts_with("audio/")
|| matches!(
content_type,
"application/pdf" | "text/plain" | "text/html" | "text/csv" | "text/markdown"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Serving text/html inline enables stored XSS

text/html is included in the inline disposition list, which tells the browser to render the file directly from the same application origin. Since the LLM can generate and save arbitrary HTML files via send_document (.html is in the supported extensions), this creates a stored XSS path:

  1. An adversarial prompt injection tricks the LLM into generating an HTML file with malicious <script> content.
  2. send_document saves it to the session media directory.
  3. A user clicks "Open" in the document card — their browser renders the HTML on the app's own origin.
  4. The injected script runs with full access to localStorage, cookies, and the session API.

text/csv and text/markdown are less risky because browsers don't execute them, but text/html is the primary XSS attack surface. Consider removing it from the inline list so HTML files always trigger a download:

Suggested change
"application/pdf" | "text/plain" | "text/html" | "text/csv" | "text/markdown"
"application/pdf" | "text/plain" | "text/csv" | "text/markdown"

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — removed text/html from the inline disposition list. HTML files now always trigger a download (attachment) to prevent stored XSS.

Comment on lines +31 to +43
const BLOCKED_EXTENSIONS: &[&str] = &[
"exe", "bat", "cmd", "com", "msi", "scr", "pif", // Windows executables
"sh", "bash", "zsh", "csh", "ksh", // Shell scripts
"dll", "so", "dylib", // Shared libraries
"app", "dmg", "pkg", // macOS
"deb", "rpm", // Linux packages
"ps1", "psm1", "psd1", // PowerShell
"vbs", "vbe", "js", "jse", "wsf", "wsh", // Script engines
"reg", "inf", // Windows registry / setup
];

/// Document-sending tool.
#[derive(Default)]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Common scripting languages missing from blocked list

The blocked list targets Windows executables, Unix shell scripts, and Windows script engines, but omits several common interpreted languages that are also executable as scripts on many platforms:

  • py / pyw — Python (directly executable via python, python3, or #!)
  • php — PHP scripts
  • rb — Ruby
  • pl — Perl
  • lua — Lua
  • tcl — Tcl

On a typical Linux/macOS host, python script.py or a #!/usr/bin/env python shebang can execute these just as effectively as a .sh file. If the intent of this list is to prevent sharing executable/scriptable files that could be run on the recipient's machine, these formats should also be blocked.

Suggested change
const BLOCKED_EXTENSIONS: &[&str] = &[
"exe", "bat", "cmd", "com", "msi", "scr", "pif", // Windows executables
"sh", "bash", "zsh", "csh", "ksh", // Shell scripts
"dll", "so", "dylib", // Shared libraries
"app", "dmg", "pkg", // macOS
"deb", "rpm", // Linux packages
"ps1", "psm1", "psd1", // PowerShell
"vbs", "vbe", "js", "jse", "wsf", "wsh", // Script engines
"reg", "inf", // Windows registry / setup
];
/// Document-sending tool.
#[derive(Default)]
const BLOCKED_EXTENSIONS: &[&str] = &[
"exe", "bat", "cmd", "com", "msi", "scr", "pif", // Windows executables
"sh", "bash", "zsh", "csh", "ksh", // Shell scripts
"dll", "so", "dylib", // Shared libraries
"app", "dmg", "pkg", // macOS
"deb", "rpm", // Linux packages
"ps1", "psm1", "psd1", // PowerShell
"vbs", "vbe", "js", "jse", "wsf", "wsh", // Script engines
"reg", "inf", // Windows registry / setup
"py", "pyw", "php", "rb", "pl", "lua", "tcl", // Interpreted scripts
];

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — added py, pyw, php, rb, pl, lua, tcl to the blocked extensions list.

penso added 4 commits March 23, 2026 11:25
…tensions

- Remove text/html from inline Content-Disposition list so HTML files
  always trigger a download instead of rendering on our origin.
- Add interpreted script extensions (py, pyw, php, rb, pl, lua, tcl)
  to the send_document blocked list.

Entire-Checkpoint: 32a224dd48e0
Entire-Checkpoint: 05f0dd7e8156
The tests were calling events.eventListeners.forEach() treating it as
an array, but it's an object keyed by event name. Dispatch through
eventListeners["chat"] with proper tool_call_start/tool_call_end
payloads matching the real WebSocket frame format.

Entire-Checkpoint: 71aae0eb4a43
…nt tests

The previous approach dispatched events via eventListeners["chat"]
which bypasses the internal eventHandlers map. Use the system-event
RPC (same pattern as websocket.spec.js) which goes through the full
server dispatch chain including handleChatEvent.

Entire-Checkpoint: 308b42036d7a
@penso penso merged commit 3e7661e into main Mar 23, 2026
45 of 55 checks passed
@penso penso deleted the feat/send-document-v2 branch March 23, 2026 17:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant