Onboarding: select bundled Telegram channel and auto-install #7
Closed
serrrfirat wants to merge 3 commits into nearai:main
Adds bidirectional WebSocket transport to the web gateway alongside the existing SSE stream. Clients can send messages, approvals, and pings over a single persistent connection at /api/chat/ws.

- Enable axum `ws` feature for built-in WebSocket support
- Add WsClientMessage/WsServerMessage types with tagged JSON protocol
- Add subscribe_raw() to SseManager for non-SSE consumers
- Create ws.rs with connection handler (split sender/receiver tasks)
- Add WsConnectionTracker for active connection counting
- Add /api/gateway/status control plane endpoint (SSE + WS counts)
- 35 new tests covering message types, broadcast, and handler logic

https://claude.ai/code/session_01KEaLN6Xq2j5EeV3SGHQT6b
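A minimal, std-only sketch of how a connection tracker like `WsConnectionTracker` could feed the WS count in `/api/gateway/status`. It is an illustration, not the PR's code: the `track()`/`active()` methods and the RAII `ConnectionGuard` type are assumptions. Each connection holds a guard, and dropping the guard decrements a shared atomic counter, so a connection that ends on an error path is still un-counted.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical connection counter; clones share the same underlying count.
#[derive(Clone, Default)]
pub struct WsConnectionTracker {
    active: Arc<AtomicUsize>,
}

/// Hypothetical RAII guard: the count goes down when the guard is dropped,
/// whether the connection closed cleanly or its task bailed out early.
pub struct ConnectionGuard {
    active: Arc<AtomicUsize>,
}

impl WsConnectionTracker {
    pub fn new() -> Self {
        Self::default()
    }

    /// Register a connection; keep the guard alive for the connection's lifetime.
    pub fn track(&self) -> ConnectionGuard {
        self.active.fetch_add(1, Ordering::SeqCst);
        ConnectionGuard {
            active: Arc::clone(&self.active),
        }
    }

    /// Number of live connections, e.g. for a status endpoint.
    pub fn active(&self) -> usize {
        self.active.load(Ordering::SeqCst)
    }
}

impl Drop for ConnectionGuard {
    fn drop(&mut self) {
        self.active.fetch_sub(1, Ordering::SeqCst);
    }
}
```

Keeping the guard in the connection task's scope ties the count to the socket's lifetime without any explicit decrement call; cloning the tracker into shared state lets the status endpoint read the same counter.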
- Add tokio-tungstenite dev-dependency for WebSocket client in tests
- Update start_server to return actual bound SocketAddr (enables port 0)
- Add 10 e2e tests covering full HTTP upgrade → WebSocket → message flow: ping/pong, message routing to agent, broadcast event delivery, connection tracking, invalid message handling, auth rejection, gateway status endpoint, and multi-event sequencing

https://claude.ai/code/session_01KEaLN6Xq2j5EeV3SGHQT6b
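Binding to port 0 asks the OS for any free port, which is why `start_server` has to hand back the address it actually bound. The std-only sketch below shows the idea; `bind_and_report` is a hypothetical name, and the real server presumably binds through tokio, whose `tokio::net::TcpListener` exposes the same `local_addr()` method.

```rust
use std::io;
use std::net::{SocketAddr, TcpListener};

/// Hypothetical helper: bind `addr` and return the listener together with the
/// address the OS actually assigned. With port 0 the kernel picks a free port,
/// so the returned `SocketAddr` is the only way for a test to learn it.
pub fn bind_and_report(addr: &str) -> io::Result<(TcpListener, SocketAddr)> {
    let listener = TcpListener::bind(addr)?;
    let bound = listener.local_addr()?;
    Ok((listener, bound))
}
```

Because every test gets its own free port, e2e tests can run in parallel without colliding on a fixed port number.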
ilblackdragon added a commit that referenced this pull request on Feb 19, 2026
- Use manifest.name (not crate_name) for installed filenames so discovery, auth, and CLI commands all agree on the stem (#1)
- Add AlreadyInstalled error variant instead of misleading ExtensionNotFound (#2)
- Add DownloadFailed error variant with URL context instead of stuffing URLs into PathBuf (#3)
- Validate HTTP status with error_for_status() before reading response bytes in artifact downloads (#4)
- Switch build_wasm_component to tokio::process::Command with status() so build output streams to the terminal (#6)
- Find WASM artifact by crate_name specifically instead of picking the first .wasm file in the release directory (#7)
- Add is_file() guard in catalog loader to skip directories (#8)
- Detect ambiguous bare-name lookups when both tools/<name> and channels/<name> exist, with get_strict() returning an error (#9)
- Fix wizard step_extensions to check tool.name for installed detection, consistent with the new naming (#11, #12)
- Fix redundant closures and map_or clippy warnings in changed files

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
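The ambiguity check behind `get_strict()` (#9) can be sketched with an in-memory map. This is an illustration only: `Catalog`, `Manifest`, `Kind`, and `LookupError` are hypothetical stand-ins for the registry types, and the example data is made up. A qualified key such as `tools/slack` resolves directly, while a bare name that matches both a tool and a channel returns an error instead of silently picking one.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    Tool,
    Channel,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Manifest {
    pub name: String,
    pub kind: Kind,
}

#[derive(Debug, PartialEq)]
pub enum LookupError {
    NotFound(String),
    /// Both `tools/<name>` and `channels/<name>` exist; the caller must qualify.
    Ambiguous(String),
}

/// Hypothetical catalog keyed by "tools/<name>" or "channels/<name>".
pub struct Catalog {
    entries: HashMap<String, Manifest>,
}

impl Catalog {
    pub fn new(manifests: Vec<Manifest>) -> Self {
        let entries = manifests
            .into_iter()
            .map(|m| {
                let dir = match m.kind {
                    Kind::Tool => "tools",
                    Kind::Channel => "channels",
                };
                (format!("{}/{}", dir, m.name), m)
            })
            .collect();
        Catalog { entries }
    }

    /// Qualified keys resolve directly; a bare name that matches both a tool
    /// and a channel is reported as ambiguous rather than guessed.
    pub fn get_strict(&self, name: &str) -> Result<&Manifest, LookupError> {
        if let Some(m) = self.entries.get(name) {
            return Ok(m);
        }
        let tool = self.entries.get(&format!("tools/{}", name));
        let channel = self.entries.get(&format!("channels/{}", name));
        match (tool, channel) {
            (Some(_), Some(_)) => Err(LookupError::Ambiguous(name.to_string())),
            (Some(m), None) | (None, Some(m)) => Ok(m),
            (None, None) => Err(LookupError::NotFound(name.to_string())),
        }
    }
}
```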
ilblackdragon added a commit that referenced this pull request on Feb 20, 2026
- Use manifest.name (not crate_name) for installed filenames so discovery, auth, and CLI commands all agree on the stem (#1)
- Add AlreadyInstalled error variant instead of misleading ExtensionNotFound (#2)
- Add DownloadFailed error variant with URL context instead of stuffing URLs into PathBuf (#3)
- Validate HTTP status with error_for_status() before reading response bytes in artifact downloads (#4)
- Switch build_wasm_component to tokio::process::Command with status() so build output streams to the terminal (#6)
- Find WASM artifact by crate_name specifically instead of picking the first .wasm file in the release directory (#7)
- Add is_file() guard in catalog loader to skip directories (#8)
- Detect ambiguous bare-name lookups when both tools/<name> and channels/<name> exist, with get_strict() returning an error (#9)
- Fix wizard step_extensions to check tool.name for installed detection, consistent with the new naming (#11, #12)
- Fix redundant closures and map_or clippy warnings in changed files

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
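For #7, a release directory can hold several `.wasm` files, so the artifact is matched by name rather than taken first. The sketch below assumes Cargo's usual output naming for cdylib crates (hyphens in the crate name become underscores, and wasm targets get no `lib` prefix); `find_wasm_artifact` is a hypothetical helper, not the installer's actual function.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: return the `.wasm` built for `crate_name`, not just the
/// first `.wasm` in `release_dir`. Assumes Cargo maps `my-tool` to `my_tool.wasm`.
pub fn find_wasm_artifact(release_dir: &Path, crate_name: &str) -> Option<PathBuf> {
    let candidate = release_dir.join(format!("{}.wasm", crate_name.replace('-', "_")));
    // is_file() also rejects a directory that happens to carry the same name.
    candidate.is_file().then_some(candidate)
}
```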
ilblackdragon added a commit that referenced this pull request on Feb 20, 2026
…tion (#238)

* feat: add extension registry with metadata catalog, CLI, and onboarding integration

Adds a central registry that catalogs all 14 available extensions (10 tools, 4 channels) with their capabilities, auth requirements, and artifact references. The onboarding wizard now shows installable channels from the registry and offers tool installation as a new Step 7.

- registry/ folder with per-extension JSON manifests and bundle definitions
- src/registry/ module: manifest structs, catalog loader, installer
- `ironclaw registry list|info|install|install-defaults` CLI commands
- Setup wizard enhanced: channels from registry, new extensions step (8 steps)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(setup): resolve workspace errors for tool crates and channels-only onboarding

Tool crates in tools-src/ and channels-src/ failed `cargo metadata` during onboard install because Cargo resolved them as part of the root workspace. Add `[workspace]` table to each standalone crate and extend the root `workspace.exclude` list so they build independently.

Channels-only mode (`onboard --channels-only`) failed with "Secrets not configured" and "No database connection" because it skipped database and security setup. Add `reconnect_existing_db()` to establish the DB connection and load saved settings before running channel configuration.

Also improve the tunnel "already configured" display to show full provider details (domain, mode, command) instead of just the provider name.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(registry): address PR review feedback on installer and catalog

- Use manifest.name (not crate_name) for installed filenames so discovery, auth, and CLI commands all agree on the stem (#1)
- Add AlreadyInstalled error variant instead of misleading ExtensionNotFound (#2)
- Add DownloadFailed error variant with URL context instead of stuffing URLs into PathBuf (#3)
- Validate HTTP status with error_for_status() before reading response bytes in artifact downloads (#4)
- Switch build_wasm_component to tokio::process::Command with status() so build output streams to the terminal (#6)
- Find WASM artifact by crate_name specifically instead of picking the first .wasm file in the release directory (#7)
- Add is_file() guard in catalog loader to skip directories (#8)
- Detect ambiguous bare-name lookups when both tools/<name> and channels/<name> exist, with get_strict() returning an error (#9)
- Fix wizard step_extensions to check tool.name for installed detection, consistent with the new naming (#11, #12)
- Fix redundant closures and map_or clippy warnings in changed files

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(setup): restore DB connection fields after settings reload

reconnect_postgres() and reconnect_libsql() called Settings::from_db_map(), which overwrote database_url / libsql_path / libsql_url set from env vars. Also use get_strict() in cmd_info to surface ambiguous bare-name errors.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: fix clippy collapsible_if and print_literal warnings

Collapse nested if-let chains and inline string literals in format macros to satisfy CI clippy lint checks (deny warnings).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(registry): prefer artifacts for install-defaults and improve dir lookup

- InstallDefaults now defaults to downloading pre-built artifacts (matching `registry install` behavior), with --build flag for source builds.
- find_registry_dir() walks up 3 ancestor levels from the exe and adds a CARGO_MANIFEST_DIR fallback, matching load_registry_catalog() logic.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
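The settings-reload bug fixed above is easy to reproduce in miniature: rebuilding settings purely from the DB map discards connection fields that came from env vars, so they have to be carried over afterwards. In this sketch `Settings` is a hypothetical, trimmed-down struct (its `selected_model` field only stands in for a value that does come from the DB), and `reload_preserving_connection` is an illustrative name, not the setup wizard's function.

```rust
use std::collections::HashMap;

/// Hypothetical, trimmed-down settings type; only fields relevant to the bug.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Settings {
    pub database_url: Option<String>,
    pub libsql_path: Option<String>,
    pub libsql_url: Option<String>,
    pub selected_model: Option<String>,
}

impl Settings {
    /// Stand-in for `Settings::from_db_map()`: built only from stored
    /// key/value pairs, so the connection fields come back as `None`.
    pub fn from_db_map(map: &HashMap<String, String>) -> Self {
        Settings {
            selected_model: map.get("selected_model").cloned(),
            ..Default::default()
        }
    }
}

/// Reload from the DB, then restore the connection fields that were set from
/// env vars before the reload (the reload alone would wipe them).
pub fn reload_preserving_connection(current: &Settings, map: &HashMap<String, String>) -> Settings {
    let mut reloaded = Settings::from_db_map(map);
    reloaded.database_url = current.database_url.clone();
    reloaded.libsql_path = current.libsql_path.clone();
    reloaded.libsql_url = current.libsql_url.clone();
    reloaded
}
```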
jaswinder6991 pushed a commit to jaswinder6991/ironclaw that referenced this pull request on Feb 26, 2026
…tion (nearai#238)

* feat: add extension registry with metadata catalog, CLI, and onboarding integration

Adds a central registry that catalogs all 14 available extensions (10 tools, 4 channels) with their capabilities, auth requirements, and artifact references. The onboarding wizard now shows installable channels from the registry and offers tool installation as a new Step 7.

- registry/ folder with per-extension JSON manifests and bundle definitions
- src/registry/ module: manifest structs, catalog loader, installer
- `ironclaw registry list|info|install|install-defaults` CLI commands
- Setup wizard enhanced: channels from registry, new extensions step (8 steps)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(setup): resolve workspace errors for tool crates and channels-only onboarding

Tool crates in tools-src/ and channels-src/ failed `cargo metadata` during onboard install because Cargo resolved them as part of the root workspace. Add `[workspace]` table to each standalone crate and extend the root `workspace.exclude` list so they build independently.

Channels-only mode (`onboard --channels-only`) failed with "Secrets not configured" and "No database connection" because it skipped database and security setup. Add `reconnect_existing_db()` to establish the DB connection and load saved settings before running channel configuration.

Also improve the tunnel "already configured" display to show full provider details (domain, mode, command) instead of just the provider name.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(registry): address PR review feedback on installer and catalog

- Use manifest.name (not crate_name) for installed filenames so discovery, auth, and CLI commands all agree on the stem (nearai#1)
- Add AlreadyInstalled error variant instead of misleading ExtensionNotFound (nearai#2)
- Add DownloadFailed error variant with URL context instead of stuffing URLs into PathBuf (nearai#3)
- Validate HTTP status with error_for_status() before reading response bytes in artifact downloads (nearai#4)
- Switch build_wasm_component to tokio::process::Command with status() so build output streams to the terminal (nearai#6)
- Find WASM artifact by crate_name specifically instead of picking the first .wasm file in the release directory (nearai#7)
- Add is_file() guard in catalog loader to skip directories (nearai#8)
- Detect ambiguous bare-name lookups when both tools/<name> and channels/<name> exist, with get_strict() returning an error (nearai#9)
- Fix wizard step_extensions to check tool.name for installed detection, consistent with the new naming (nearai#11, nearai#12)
- Fix redundant closures and map_or clippy warnings in changed files

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(setup): restore DB connection fields after settings reload

reconnect_postgres() and reconnect_libsql() called Settings::from_db_map(), which overwrote database_url / libsql_path / libsql_url set from env vars. Also use get_strict() in cmd_info to surface ambiguous bare-name errors.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: fix clippy collapsible_if and print_literal warnings

Collapse nested if-let chains and inline string literals in format macros to satisfy CI clippy lint checks (deny warnings).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(registry): prefer artifacts for install-defaults and improve dir lookup

- InstallDefaults now defaults to downloading pre-built artifacts (matching `registry install` behavior), with --build flag for source builds.
- find_registry_dir() walks up 3 ancestor levels from the exe and adds a CARGO_MANIFEST_DIR fallback, matching load_registry_catalog() logic.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
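One reading of the `find_registry_dir()` change, sketched with std only: look for a `registry/` directory next to the executable and in up to N parent directories, then fall back to a directory derived from `CARGO_MANIFEST_DIR`. How the real function counts its three levels is an assumption here, and the fallback is passed in as a parameter (instead of being read from `env!("CARGO_MANIFEST_DIR")`) so the sketch stays testable.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical sketch: check `<dir>/registry` for the executable's directory
/// and up to `max_levels` parents above it, then try `<manifest_dir>/registry`.
pub fn find_registry_dir(exe: &Path, manifest_dir: Option<&Path>, max_levels: usize) -> Option<PathBuf> {
    let start = exe.parent()?;
    // ancestors() yields `start` itself first, then each parent in turn.
    for dir in start.ancestors().take(max_levels + 1) {
        let candidate = dir.join("registry");
        if candidate.is_dir() {
            return Some(candidate);
        }
    }
    manifest_dir
        .map(|d| d.join("registry"))
        .filter(|p| p.is_dir())
}
```

Bounding the walk keeps the lookup from wandering up to unrelated directories such as the filesystem root, while the manifest-dir fallback still covers `cargo run` from the source tree.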
ilblackdragon added a commit that referenced this pull request on Mar 6, 2026
- Switch build script from python3 to jq for JSON parsing, consistent with release.yml and avoids python3 dependency (#1, #7)
- Use dirs::home_dir() instead of HOME env var for portability (#2)
- Filter extensions by manifest "kind" field instead of path (#3)
- Replace .flatten() with explicit error handling in dir iteration (#4, #5)
- Split stub_tool_host_functions into stub_shared_host_functions + tool-only tool-invoke stub, since tool-invoke is not in channel WIT (#6)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
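Review items #4 and #5 concern a common pattern: `fs::read_dir(dir)?.flatten()` silently discards any entry that fails to read. A hedged sketch of the explicit alternative, which also skips directories with an `is_file()` guard; `list_manifests` and the `*.json` filter are illustrative assumptions, not the test suite's actual helper.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: list `*.json` files in `dir`, propagating every I/O
/// error instead of dropping failed entries the way `.flatten()` would.
pub fn list_manifests(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut manifests = Vec::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path(); // a failed entry becomes an error, not a gap
        // Skip subdirectories and anything that is not a JSON file.
        if path.is_file() && path.extension().is_some_and(|ext| ext == "json") {
            manifests.push(path);
        }
    }
    // read_dir order is platform-dependent; sort for deterministic output.
    manifests.sort();
    Ok(manifests)
}
```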
ilblackdragon added a commit that referenced this pull request on Mar 6, 2026
* test: add WIT compatibility tests for all WASM tools and channels

Adds CI and integration tests to catch WIT interface breakage across all 14 WASM extensions (10 tools + 4 channels). Previously, changing wit/tool.wit or wit/channel.wit could silently break guest-side tools that weren't rebuilt until release time.

Three new pieces:

1. scripts/build-wasm-extensions.sh — builds all WASM extensions from source by reading registry manifests. Used by CI and locally.
2. tests/wit_compat.rs — integration tests that compile and instantiate each .wasm binary against the current wasmtime host linker with stubbed host functions. Catches added/removed/renamed WIT functions, signature mismatches, and missing exports. Skips gracefully when artifacts aren't built so `cargo test` still passes standalone.
3. .github/workflows/test.yml — new wasm-wit-compat CI job that builds all extensions then runs instantiation tests on every PR. Added to the branch protection roll-up.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: fix rustfmt formatting in wit_compat tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review feedback on WIT compat tests

- Switch build script from python3 to jq for JSON parsing, consistent with release.yml and avoids python3 dependency (#1, #7)
- Use dirs::home_dir() instead of HOME env var for portability (#2)
- Filter extensions by manifest "kind" field instead of path (#3)
- Replace .flatten() with explicit error handling in dir iteration (#4, #5)
- Split stub_tool_host_functions into stub_shared_host_functions + tool-only tool-invoke stub, since tool-invoke is not in channel WIT (#6)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
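The reason the stubs had to be split (#6) can be modeled without wasmtime: instantiation fails whenever a guest import has no host definition, and `tool-invoke` exists only in the tool WIT. The sketch below uses plain sets as a stand-in for the linker; apart from `tool-invoke`, the host function names are hypothetical placeholders, and `missing_imports` is an illustrative helper rather than part of `tests/wit_compat.rs`.

```rust
use std::collections::BTreeSet;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExtensionKind {
    Tool,
    Channel,
}

/// Hypothetical names for host functions every extension may import.
const SHARED_HOST_FUNCTIONS: &[&str] = &["log", "now-millis", "http-request"];

/// Functions the test "linker" stubs for a kind: the shared set for everyone,
/// plus `tool-invoke` for tools only, since it is not part of the channel WIT.
pub fn stubbed_functions(kind: ExtensionKind) -> BTreeSet<&'static str> {
    let mut funcs: BTreeSet<&'static str> = SHARED_HOST_FUNCTIONS.iter().copied().collect();
    if kind == ExtensionKind::Tool {
        funcs.insert("tool-invoke");
    }
    funcs
}

/// Guest imports with no host definition; in the real test a non-empty
/// result corresponds to an instantiation error (e.g. a renamed WIT function).
pub fn missing_imports<'a>(kind: ExtensionKind, guest_imports: &[&'a str]) -> Vec<&'a str> {
    let provided = stubbed_functions(kind);
    guest_imports
        .iter()
        .copied()
        .filter(|name| !provided.contains(*name))
        .collect()
}
```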
ilblackdragon added a commit that referenced this pull request on Mar 7, 2026
…lity

Security fixes:
- Remove SSRF-prone download() from DocumentExtractionMiddleware (#13)
- Sanitize filenames in workspace path to prevent directory traversal (#11)
- Pre-check file size before reading in WASM wrapper to prevent OOM (#2)
- Percent-encode file_id in Telegram source URLs (#7)

Correctness fixes:
- Clear image_content_parts on turn end to prevent memory leak (#1)
- Find first *successful* transcription instead of first overall (#3)
- Enforce data.len() size limit in document extraction (#10)
- Use UTF-8 safe truncation with char_indices() (#12)

Robustness & code quality:
- Add 120s timeout to OpenAI Whisper HTTP client (#5)
- Trim trailing slash from Whisper base_url (#6)
- Allow ~/.ironclaw/ paths in WASM wrapper (#8)
- Return error from on_broadcast in Slack/Discord/WhatsApp (#9)
- Fix doc comment in HTTP tool (#4)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
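For #12, slicing a `&str` at an arbitrary byte offset panics when the offset falls inside a multi-byte character. A minimal sketch of byte-limited truncation that backs off to the last character boundary using `char_indices()`; the function name and the byte-based limit are assumptions, not the PR's code.

```rust
/// Hypothetical helper: truncate `s` to at most `max_bytes` bytes without
/// cutting a multi-byte UTF-8 character (`&s[..max_bytes]` could panic there).
pub fn truncate_utf8(s: &str, max_bytes: usize) -> &str {
    if s.len() <= max_bytes {
        return s;
    }
    // End offset of the last character that fits entirely within the limit.
    let end = s
        .char_indices()
        .map(|(start, ch)| start + ch.len_utf8())
        .take_while(|&end| end <= max_bytes)
        .last()
        .unwrap_or(0);
    &s[..end]
}
```

If the limit were meant in characters rather than bytes, `s.chars().take(n).collect::<String>()` would be the simpler choice.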
ilblackdragon added a commit that referenced this pull request on Mar 7, 2026
* feat: add inbound attachment support to WASM channel system

Add attachment record to WIT interface and implement inbound media parsing across all four channel implementations (Telegram, Slack, WhatsApp, Discord). Attachments flow from WASM channels through EmittedMessage to IncomingMessage with validation (size limits, MIME allowlist, count caps) at the host boundary.

- Add `attachment` record to `emitted-message` in wit/channel.wit
- Add `IncomingAttachment` struct to channel.rs and re-export
- Add host-side validation (20MB total, 10 max, MIME allowlist)
- Telegram: parse photo, document, audio, video, voice, sticker
- Slack: parse file attachments with url_private
- WhatsApp: parse image, audio, video, document with captions
- Discord: backward-compatible empty attachments
- Update FEATURE_PARITY.md section 7
- Add fixture-based tests per channel and host integration tests

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: integrate outbound attachment support and reconcile WIT types (#409)

Reconcile PR #409's outbound attachment work with our inbound attachment support into a unified design:

WIT type split:
- `inbound-attachment` in channel-host: metadata-only (id, mime_type, filename, size_bytes, source_url, storage_key, extracted_text)
- `attachment` in channel: raw bytes (filename, mime_type, data) on agent-response for outbound sending

Outbound features (from PR #409):
- `on-broadcast` WIT export for proactive messages without prior inbound
- Telegram: multipart sendPhoto/sendDocument with auto photo→document fallback for files >10MB
- wrapper.rs: `call_on_broadcast`, `read_attachments` from disk, attachment params threaded through `call_on_respond`
- HTTP tool: `save_to` param for binary downloads to /tmp/ (50MB limit, path traversal protection, SSRF-safe redirect following)
- Message tool: allow /tmp/ paths for attachments alongside base_dir
- Credential env var fallback in inject_channel_credentials

Channel updates:
- All 4 channels implement on_broadcast (Telegram full, others stub)
- Telegram: polling_enabled config, adjusted poll timeout
- Inbound attachment types renamed to InboundAttachment in all channels

Tests: 1965 passing (9 new), 0 clippy warnings

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add audio transcription pipeline and extensible WIT attachment design

Add host-side transcription middleware (OpenAI Whisper) that detects audio attachments with inline data on incoming messages and transcribes them automatically.

Refactor WIT inbound-attachment to use extras-json and a store-attachment-data host function instead of typed fields, so future attachment properties (dimensions, codec, etc.) don't require WIT changes that invalidate all channel plugins.

- Add src/transcription/ module: TranscriptionProvider trait, TranscriptionMiddleware, AudioFormat enum, OpenAI Whisper provider
- Add src/config/transcription.rs: TRANSCRIPTION_ENABLED/MODEL/BASE_URL
- Wire middleware into agent message loop via AgentDeps
- WIT: replace data + duration-secs with extras-json + store-attachment-data
- Host: parse extras-json for well-known keys, merge stored binary data
- Telegram: download voice files via store-attachment-data, add duration to extras-json, add /file/bot to HTTP allowlist, voice-only placeholder
- Add reqwest multipart feature for Whisper API uploads
- 5 regression tests for transcription middleware

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: wire attachment processing into LLM pipeline with multimodal image support

Attachments on incoming messages are now augmented into user text via XML tags before entering the turn system, and images with data are passed as multimodal content parts (base64 data URIs) to LLM providers. This enables audio transcripts, document text, and image content to reach the LLM without changes to ChatMessage serialization or provider interfaces.
- Add src/agent/attachments.rs with augment_with_attachments() and 9 unit tests
- Add ContentPart/ImageUrl types to llm::provider with OpenAI-compatible serde
- Carry image_content_parts transiently on Turn (skipped in serialization)
- Update nearai_chat and rig_adapter to serialize multimodal content
- Add 3 e2e tests verifying attachments flow through the full agent loop

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: CI failures — formatting, version bumps, and Telegram voice test

- Fix cargo fmt formatting in attachments.rs, nearai_chat.rs, rig_adapter.rs, e2e_attachments.rs
- Bump channel registry versions 0.1.0 → 0.2.0 (discord, slack, telegram, whatsapp) to satisfy version-bump CI check
- Fix Telegram test_extract_attachments_voice: add missing required `duration` field to voice fixture JSON

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: bump WIT channel version to 0.3.0, fix Telegram voice test, add pre-commit hook

- Bump wit/channel.wit package version 0.2.0 → 0.3.0 (interface changed with store-attachment-data)
- Update WIT_CHANNEL_VERSION constant and registry wit_version fields to match
- Fix Telegram test_extract_attachments_voice: gate voice download behind #[cfg(target_arch = "wasm32")] so host functions aren't called in native tests, update assertions for generated filename and extras_json duration
- Add @0.3.0 linker stubs in wit_compat.rs
- Add .githooks/pre-commit hook that runs scripts/check-version-bumps.sh when WIT or extension sources are staged
- Symlink commit-msg regression hook into .githooks/

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: extract voice download from extract_attachments into handle_message

Move download_voice_file + store_attachment_data calls out of extract_attachments into a separate download_and_store_voice function called from handle_message.
This keeps extract_attachments as a pure data-mapping function with no host calls, making it fully testable in native unit tests without #[cfg(target_arch)] gates.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review comments — security, correctness, and code quality

Security fixes:
- Add path validation to read_attachments (restrict to /tmp/) preventing arbitrary file reads from compromised tools
- Escape XML special characters in attachment filenames, MIME types, and extracted text to prevent prompt injection via tag spoofing
- Percent-encode file_id in Telegram getFile URL to prevent query injection
- Clone SecretString directly instead of expose_secret().to_string()

Correctness fixes:
- Fix store_attachment_data overwrite accounting: subtract old entry size before adding new to prevent inflated totals and false rejections
- Use max(reported, stored_size) for attachment size accounting to prevent WASM channels from under-reporting size_bytes to bypass limits
- Add application/octet-stream to MIME allowlist (channels default unknown types to this)

Code quality:
- Extract send_response helper in Telegram, deduplicating on_respond and on_broadcast
- Rename misleading Discord test to test_parse_slash_command_interaction
- Fix .githooks/commit-msg to use relative symlink (portable across machines)

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add tool_upgrade command + fix TOCTOU in save_to path validation

Add `tool_upgrade` — a new extension management tool that automatically detects and reinstalls WASM extensions with outdated WIT versions. Preserves authentication secrets during upgrade. Supports upgrading a single extension by name or all installed WASM tools/channels at once.
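The two accounting fixes in the "Correctness fixes" list of the review-comments commit can be sketched together: subtract an overwritten entry's size before adding the new one, and never let a channel's reported `size_bytes` undercut the bytes actually stored. `AttachmentStore` and its methods are hypothetical; in the PR this logic sits behind the `store_attachment_data` host function.

```rust
use std::collections::HashMap;

/// Hypothetical per-message store for attachment bytes pushed by a channel.
pub struct AttachmentStore {
    data: HashMap<String, Vec<u8>>,
    total_bytes: usize,
    max_total_bytes: usize,
}

impl AttachmentStore {
    pub fn new(max_total_bytes: usize) -> Self {
        AttachmentStore { data: HashMap::new(), total_bytes: 0, max_total_bytes }
    }

    /// Store (or overwrite) bytes for `id`. The old entry's size is subtracted
    /// before the new size is added, so re-storing the same attachment does not
    /// inflate the running total and cause a false rejection.
    pub fn store(&mut self, id: &str, bytes: Vec<u8>) -> Result<(), String> {
        let old = self.data.get(id).map_or(0, |v| v.len());
        let new_total = self.total_bytes - old + bytes.len();
        if new_total > self.max_total_bytes {
            return Err(format!("attachment data would exceed {} bytes", self.max_total_bytes));
        }
        self.total_bytes = new_total;
        self.data.insert(id.to_string(), bytes);
        Ok(())
    }

    /// Size used for limit checks: the larger of the reported and stored sizes,
    /// so a channel cannot under-report `size_bytes` to slip past a limit.
    pub fn effective_size(&self, id: &str, reported_size: usize) -> usize {
        let stored = self.data.get(id).map_or(0, |v| v.len());
        reported_size.max(stored)
    }

    pub fn total_bytes(&self) -> usize {
        self.total_bytes
    }
}
```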
Fix TOCTOU in `validate_save_to_path`: validate the path *before* creating parent directories, so traversal paths like `/tmp/../../etc/` cannot cause filesystem mutations outside /tmp before being rejected.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: unify WIT package version to 0.3.0 across tool.wit and all capabilities

tool.wit and channel.wit share the `near:agent` package namespace, so they must declare the same version. Bumps tool.wit from 0.2.0 to 0.3.0 and updates all capabilities files and registry entries to match.

Fixes `cargo component build` failure: "package identifier near:agent@0.2.0 does not match previous package name of near:agent@0.3.0"

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: move WIT file comments after package declaration

WIT treats `//` comments before `package` as doc comments. When both tool.wit and channel.wit had header comments, the parser rejected them as "doc comments on multiple 'package' items". Move comments after the package declaration in both files.

Also bumps tool registry versions to 0.2.0 to match the WIT 0.3.0 bump.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: display extension versions in gateway Extensions tab

Add version field to InstalledExtension and RegistryEntry types, pipe through the web API (ExtensionInfo, RegistryEntryInfo), and render as a badge in the gateway UI for both installed and available extensions.

For installed WASM extensions, version is read from the capabilities file with a fallback to the registry entry when the local file has no version (old installations).

Bump all extension Cargo.toml and registry JSON versions from 0.1.0 to 0.2.0 to keep them in sync.
[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add document text extraction middleware for PDF, Office, and text files

Extract text from document attachments (PDF, DOCX, PPTX, XLSX, RTF, plain text, code files) so the LLM can reason about uploaded documents. Uses pdf-extract for PDFs, zip+XML parsing for Office XML formats, and UTF-8 decode for text files. Wired into the agent loop after transcription middleware.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: download document files in Telegram channel for text extraction

The DocumentExtractionMiddleware needs file bytes in the attachment `data` field, but only voice files were being downloaded. Document attachments (PDFs, DOCX, etc.) had empty `data` and a source_url with a credential placeholder that only works inside the WASM host's http_request.

Add `download_and_store_documents()` that downloads non-voice, non-image, non-audio attachments via the existing two-step getFile→download flow and stores bytes via `store_attachment_data` for host-side extraction. Also rename `download_voice_file` → `download_telegram_file` since it's generic for any file_id.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: allow Office MIME types and increase file download limit for Telegram

Two issues preventing document extraction from Telegram:

1. PPTX/DOCX/XLSX MIME types (application/vnd.*) were dropped by the WASM host attachment allowlist — add application/vnd., application/msword, and application/rtf prefixes.
2. Telegram file downloads over 10 MB failed with "Response body too large" — set max_response_bytes to 20 MB in Telegram capabilities.
[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: report document extraction errors back to user instead of silently skipping

- Bump max_response_bytes to 50 MB for Telegram file downloads
- When document extraction fails (too large, download error, parse error), set extracted_text to a user-friendly error message instead of leaving it None. This ensures the LLM tells the user what went wrong.
- On Telegram download failure, set extracted_text with the error so the user sees feedback even when the file never reaches the extraction middleware.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: store extracted document text in workspace memory for search/recall

After document extraction succeeds, write the extracted text to workspace memory at `documents/{date}/{filename}`. This enables:
- Full-text and semantic search over past uploaded documents
- Cross-conversation recall ("what did that PDF say?")
- Automatic chunking and embedding via the workspace pipeline

Documents are stored with metadata header (uploader, channel, date, MIME type). Error messages (extraction failures) are not stored — only successful extractions.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: CI failures — formatting, unused assignment warning

- Run cargo fmt on document_extraction and agent_loop modules
- Suppress unused_assignments warning on trace_llm_ref (used only behind #[cfg(feature = "libsql")])

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review comments — security, correctness, and code quality

Security fixes:
- Remove SSRF-prone download() from DocumentExtractionMiddleware (#13)
- Sanitize filenames in workspace path to prevent directory traversal (#11)
- Pre-check file size before reading in WASM wrapper to prevent OOM (#2)
- Percent-encode file_id in Telegram source URLs (#7)

Correctness fixes:
- Clear image_content_parts on turn end to prevent memory leak (#1)
- Find first *successful* transcription instead of first overall (#3)
- Enforce data.len() size limit in document extraction (#10)
- Use UTF-8 safe truncation with char_indices() (#12)

Robustness & code quality:
- Add 120s timeout to OpenAI Whisper HTTP client (#5)
- Trim trailing slash from Whisper base_url (#6)
- Allow ~/.ironclaw/ paths in WASM wrapper (#8)
- Return error from on_broadcast in Slack/Discord/WhatsApp (#9)
- Fix doc comment in HTTP tool (#4)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: formatting — cargo fmt

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address latest PR review — doc comments, error messages, version bumps

- Fix DocumentExtractionMiddleware doc comment (no longer downloads from source_url)
- Fix error message: "no inline data" instead of "no download URL"
- Log error + fallback instead of silent unwrap_or_default on Whisper HTTP client
- Bump all capabilities.json versions from 0.1.0 to 0.2.0 to match Cargo.toml

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove unsupported profile: minimal from CI workflows

[skip-regression-check]

dtolnay/rust-toolchain@stable does not accept the 'profile' input (it was a parameter for the deprecated actions-rs/toolchain action).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: merge with latest main — resolve compilation errors and PR review nits

- Add version: None to RegistryEntry/InstalledExtension test constructors
- Fix MessageContent type mismatches in nearai_chat tests (String → MessageContent::Text)
- Fix .contains() calls on MessageContent — use .as_text().unwrap()
- Remove redundant trace_llm_ref = None assignment in test_rig
- Check data size before clone in document extraction to avoid unnecessary allocation

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
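The directory-traversal fix for the `documents/{date}/{filename}` workspace path (#11) can be sketched as a filename sanitizer. This is one plausible approach, not the PR's implementation: it keeps only the last path component, replaces characters outside a conservative allowlist, and falls back to a placeholder. `sanitize_filename`, `document_path`, and the `unnamed` placeholder are hypothetical; the date segment is assumed to be generated by the host and is not sanitized here.

```rust
/// Hypothetical sanitizer for the `{filename}` segment of a workspace path.
/// A name like `../../secrets.txt` cannot escape `documents/{date}/`.
pub fn sanitize_filename(name: &str) -> String {
    // Keep only the final component, whichever separator the sender used.
    let base = name.rsplit(|c: char| c == '/' || c == '\\').next().unwrap_or("");
    // Replace anything outside a conservative allowlist with '_'.
    let cleaned: String = base
        .chars()
        .map(|c| if c.is_alphanumeric() || matches!(c, '.' | '-' | '_' | ' ') { c } else { '_' })
        .collect();
    // Strip leading/trailing dots and spaces so "." and ".." collapse to empty.
    let trimmed = cleaned.trim_matches(|c: char| c == '.' || c == ' ');
    if trimmed.is_empty() {
        "unnamed".to_string()
    } else {
        trimmed.to_string()
    }
}

/// Hypothetical helper that builds the workspace path for a stored document.
pub fn document_path(date: &str, filename: &str) -> String {
    format!("documents/{}/{}", date, sanitize_filename(filename))
}
```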
ilblackdragon added a commit that referenced this pull request on Mar 25, 2026
GATEWAY_USER_TOKENS never went to production — replaced entirely by DB-backed user management via /api/admin/users and /api/tokens.

Removed:
- UserTokenConfig struct and GATEWAY_USER_TOKENS env var parsing
- user_tokens field from GatewayConfig
- GatewayChannel::new_multi_auth() constructor
- Env-var user migration block in main.rs (~90 lines)
- multi_tenant auto-detection from GATEWAY_USER_TOKENS (now runtime via db.has_any_users() in app.rs)

Review fixes (zmanian):
- User ID generation: UUID instead of display-name derivation (#1)
- Invitation accept moved to public router (no auth needed) (#3)
- libSQL get_invitation_by_hash aligned with postgres: filters status='pending' AND expires_at > now (#4)
- UUID parse: returns DatabaseError::Serialization instead of unwrap_or_default (#7)
- PostgreSQL SELECT * replaced with explicit column lists (#8)
- Sort order aligned (both backends use DESC) (#6)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
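Review fix #4 aligns the libSQL lookup with PostgreSQL so that only pending, unexpired invitations match. The same predicate is shown here over an in-memory slice for illustration; `Invitation`, `InvitationStatus`, and the field names are hypothetical, and the real backends apply the filter in SQL.

```rust
use std::time::SystemTime;

#[derive(Debug, Clone, PartialEq)]
pub enum InvitationStatus {
    Pending,
    Accepted,
}

#[derive(Debug, Clone)]
pub struct Invitation {
    pub token_hash: String,
    pub status: InvitationStatus,
    pub expires_at: SystemTime,
}

/// Hypothetical in-memory mirror of the described filter: matching hash,
/// status = 'pending', and expires_at > now. Anything else is treated as absent.
pub fn get_invitation_by_hash<'a>(
    invitations: &'a [Invitation],
    hash: &str,
    now: SystemTime,
) -> Option<&'a Invitation> {
    invitations.iter().find(|inv| {
        inv.token_hash == hash && inv.status == InvitationStatus::Pending && inv.expires_at > now
    })
}
```

Applying both conditions in the lookup itself, on both backends, means an accepted or expired invitation behaves exactly like a missing one regardless of which database is configured.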
ilblackdragon added a commit that referenced this pull request on Mar 28, 2026
…i-tenant isolation (#1626)

* feat: complete multi-tenant isolation (per-user budgets, model selection, heartbeat cycling)

Finishes the remaining isolation work from phases 2–4 of #59:

Phase 2 (DB scoping): Fix /status and /list commands to use _for_user DB variants instead of global queries that leaked cross-user job data.

Phase 3 (Runtime isolation): Per-user workspace in routine engine's spawn_fire so lightweight routines run in the correct user context. Per-user daily cost tracking in CostGuard with configurable budget via MAX_COST_PER_USER_PER_DAY_CENTS. Multi-user heartbeat that cycles through all users with routines, auto-detected from GATEWAY_USER_TOKENS.

Phase 4 (Provider/tools): Per-user model selection via preferred_model setting, looked up from SettingsStore on first iteration and threaded through ReasoningContext.model_override to CompletionRequest. Works with providers that support per-request model overrides (NearAI).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use selected_model setting key to match /model command persistence

The dispatcher was reading "preferred_model" but the /model command (merged from staging) persists to "selected_model". Since set_setting is already per-user scoped, using the same key makes /model work as the per-user model override in multi-tenant mode.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: heartbeat hygiene, /model multi-tenant guard, RigAdapter model override

Three follow-up fixes for multi-tenant isolation:

1. Multi-user heartbeat now runs memory hygiene per user before each heartbeat check, matching single-user heartbeat behavior.
2. /model command in multi-tenant mode only persists to per-user settings (selected_model) without calling set_model() on the shared LlmProvider. The per-request model_override in the dispatcher reads from the same setting. Added multi_tenant flag to AgentConfig (auto-detected from GATEWAY_USER_TOKENS).
3. RigAdapter now supports per-request model overrides by injecting the model name into rig-core's additional_params. OpenAI/Anthropic/Ollama API servers use last-key-wins for duplicate JSON keys, so the override takes effect via serde's flatten serialization order.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address PR review (cost model attribution, heartbeat concurrency, pruning)

Fixes from review comments on #1614:
- Cost tracking now uses the override model name (not active_model_name) when a per-user model override is active, for accurate attribution.
- Multi-user heartbeat runs per-user checks concurrently via JoinSet instead of sequentially, preventing one slow user from blocking others.
- Per-user failure counts tracked independently; users exceeding max_failures are skipped (matching single-user semantics).
- per_user_daily_cost HashMap pruned on day rollover to prevent unbounded growth in long-lived deployments.
- Doc comment fixed: says "routines" not "active routines".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: /status ownership, model persistence scoping, heartbeat robustness

Addresses second round of PR review on #1614:
- /status <job_id> DB path now validates job.user_id == requesting user before returning data (was missing ownership check, security fix).
- persist_selected_model takes user_id param instead of owner_id, and skips .env/TOML writes in multi-tenant mode (these are shared global files). handle_system_command now receives user_id from caller.
- JoinSet collection handles Err(JoinError) explicitly instead of silently dropping panicked tasks.
- Notification forwarder extracts owner_id from response metadata in multi-tenant mode for per-user routing instead of broadcasting to the agent owner.
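The per-user daily budget and the day-rollover pruning described above can be sketched with a plain `HashMap`. This is a minimal illustration under stated assumptions, not CostGuard's actual API: the `PerUserDailyCost` type, its method names, and the integer day index are hypothetical, and `limit_cents` stands in for the value read from MAX_COST_PER_USER_PER_DAY_CENTS.

```rust
use std::collections::HashMap;

/// Hypothetical per-user daily spend tracker (not the real CostGuard API).
pub struct PerUserDailyCost {
    /// Day index the map currently covers (e.g. days since the Unix epoch).
    day: u64,
    /// Cents spent today, keyed by user id.
    spent_cents: HashMap<String, u64>,
    /// Per-user daily cap, e.g. parsed from MAX_COST_PER_USER_PER_DAY_CENTS.
    limit_cents: Option<u64>,
}

impl PerUserDailyCost {
    pub fn new(day: u64, limit_cents: Option<u64>) -> Self {
        Self { day, spent_cents: HashMap::new(), limit_cents }
    }

    /// On a new day, drop every entry so the map cannot grow without bound.
    fn roll_over(&mut self, today: u64) {
        if today != self.day {
            self.spent_cents.clear();
            self.day = today;
        }
    }

    /// Would spending `cents` more keep `user` within today's budget?
    pub fn check(&mut self, today: u64, user: &str, cents: u64) -> bool {
        self.roll_over(today);
        let spent = self.spent_cents.get(user).copied().unwrap_or(0);
        self.limit_cents.map_or(true, |limit| spent.saturating_add(cents) <= limit)
    }

    /// Record a completed call's cost against `user`.
    pub fn record(&mut self, today: u64, user: &str, cents: u64) {
        self.roll_over(today);
        let entry = self.spent_cents.entry(user.to_string()).or_insert(0);
        *entry = entry.saturating_add(cents);
    }

    /// Number of users currently tracked (only those with spend today).
    pub fn tracked_users(&self) -> usize {
        self.spent_cents.len()
    }
}
```

Clearing the whole map on rollover is what bounds memory in this sketch: only users who spent something on the current day hold an entry.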
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: cost pricing, fire_manual workspace, heartbeat concurrency cap

Round 3 review fixes:
- Cost tracking passes None for cost_per_token when model override is active, letting CostGuard look up pricing by model name instead of using the default provider's rates (serrrfirat).
- fire_manual() now uses per-user workspace, matching spawn_fire() pattern (serrrfirat).
- Removed MULTI_TENANT env var; multi-tenant mode is auto-detected solely from GATEWAY_USER_TOKENS presence (serrrfirat + Copilot).
- Multi-user heartbeat capped at 8 concurrent tasks to avoid flooding the LLM provider (serrrfirat + Copilot).
- Fixed inject_model_override doc comment accuracy (Copilot).
- Added comment explaining multi-tenant notification routing priority (Copilot).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: user-scoped webhook endpoint for multi-tenant isolation

Adds POST /api/webhooks/u/{user_id}/{path}, a user-scoped webhook endpoint that filters the routine lookup by user_id, preventing cross-user webhook triggering when paths collide. The existing /api/webhooks/{path} endpoint remains unchanged for backward compatibility in single-user deployments.

Changes:
- get_webhook_routine_by_path gains user_id: Option<&str> param
- Both postgres and libsql implementations add AND user_id = ? filter when user_id is provided
- New webhook_trigger_user_scoped_handler extracts (user_id, path) from URL and passes to shared fire_webhook_inner logic
- Route registered on public router (webhooks are called by external services that can't send bearer tokens)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(db): add UserStore trait with users, api_tokens, invitations tables

Foundation for DB-backed user management (#1605):
- UserRecord, ApiTokenRecord, InvitationRecord types in db/mod.rs
- UserStore sub-trait (17 methods) added to Database supertrait
- PostgreSQL migration V14__users.sql (users, api_tokens, invitations)
- libSQL schema + incremental migration V14
- Full implementations for both PgBackend (via Store delegation) and LibSqlBackend (direct SQL in libsql/users.rs)
- authenticate_token JOINs api_tokens+users with active/non-revoked checks; has_any_users for bootstrap detection

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(web): DB-backed auth, user/token/invitation API handlers

Adds the web gateway layer for DB-backed user management (#1605):

Auth refactor:
- CombinedAuthState wraps env-var tokens (MultiAuthState) + optional DbAuthenticator for DB-backed token lookup with LRU cache (60s TTL, 1024 max entries)
- auth_middleware tries env-var tokens first, then DB fallback
- From<MultiAuthState> impl for backward compatibility
- main.rs wires with_db_auth when database is available

API handlers (12 new endpoints):
- /api/admin/users: CRUD (create, list, detail, update, suspend, activate)
- /api/tokens: create (returns plaintext once), list, revoke
- /api/invitations: create, list, accept (creates user + first token)

Token creation: 32 random bytes → hex plaintext, SHA-256 hash stored. Invitation accept: validates hash + pending + not expired, creates user record and first API token atomically. All test files updated for CombinedAuthState type change.
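Token issuance and token lookup must hash the same representation of the token. The sketch below follows the scheme described above (32 random bytes, hex plaintext returned once, only a hash stored) and hashes the hex string on both paths, which is also what the later "token hash mismatch" fix in this PR restores. Assumptions: `issue_token`, `authenticate` and `to_hex` are hypothetical names, the random bytes are passed in rather than drawn from a CSPRNG, and std's `DefaultHasher` stands in for SHA-256 only so the example needs no external crates. It is not a cryptographic hash and must not be used for real tokens.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Stand-in for the PR's SHA-256 `hash_token`. DefaultHasher is NOT a
/// cryptographic hash; it is used only so this sketch needs no crates.
pub fn hash_token(token: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    token.as_bytes().hash(&mut hasher);
    hasher.finish()
}

/// Lowercase hex encoding of the raw token bytes.
pub fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{b:02x}")).collect()
}

/// Issues a token: the caller receives the hex plaintext once, and the store
/// keeps only the hash of that same hex string. A real implementation would
/// draw `raw` from a CSPRNG; here it is passed in.
pub fn issue_token(raw: [u8; 32], store: &mut HashMap<u64, String>, user_id: &str) -> String {
    let plaintext = to_hex(&raw);
    store.insert(hash_token(&plaintext), user_id.to_string());
    plaintext
}

/// Authentication hashes the candidate string exactly as issuance did.
pub fn authenticate<'a>(candidate: &str, store: &'a HashMap<u64, String>) -> Option<&'a str> {
    store.get(&hash_token(candidate)).map(String::as_str)
}
```

Hashing the raw bytes at issuance while hashing the hex string at lookup would produce two different digests for the same token, so no issued token could ever authenticate.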
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: startup env-var user migration + UserStore integration tests

Completes the DB-backed user management feature (#1605):
- Startup migration: when GATEWAY_USER_TOKENS is set and the users table is empty, inserts env-var users + hashed tokens into DB. Logs deprecation notice when DB already has users.
- hash_token made pub for reuse in migration code.
- 10 integration tests for UserStore (libsql file-backed):
  - has_any_users bootstrap detection
  - create/get/get_by_email/list/update user lifecycle
  - token create → authenticate → revoke → reject cycle
  - suspended user tokens rejected
  - wrong-user token revoke returns false
  - invitation create → accept → user created
  - record_login and record_token_usage timestamps
- libSQL migration: removed FK constraints from V14 (incompatible with execute_batch inside transactions). Tables in both base SCHEMA and incremental migration for fresh and existing databases.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: remove GATEWAY_USER_TOKENS, fix review feedback

GATEWAY_USER_TOKENS never went to production; it was replaced entirely by DB-backed user management via /api/admin/users and /api/tokens.
Removed:
- UserTokenConfig struct and GATEWAY_USER_TOKENS env var parsing
- user_tokens field from GatewayConfig
- GatewayChannel::new_multi_auth() constructor
- Env-var user migration block in main.rs (~90 lines)
- multi_tenant auto-detection from GATEWAY_USER_TOKENS (now runtime via db.has_any_users() in app.rs)

Review fixes (zmanian):
- User ID generation: UUID instead of display-name derivation (#1)
- Invitation accept moved to public router (no auth needed) (#3)
- libSQL get_invitation_by_hash aligned with postgres: filters status='pending' AND expires_at > now (#4)
- UUID parse: returns DatabaseError::Serialization instead of unwrap_or_default (#7)
- PostgreSQL SELECT * replaced with explicit column lists (#8)
- Sort order aligned (both backends use DESC) (#6)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add role-based access control (admin/member)

Adds a `role` field (admin|member) to user management:

Schema:
- `role TEXT NOT NULL DEFAULT 'member'` added to users table in both PostgreSQL V14 migration and libSQL schema/incremental migration
- UserRecord gains `role: String` field
- UserIdentity gains `role: String` field, populated from DB in DbAuthenticator and defaulting to "admin" for single-user mode

Access control:
- AdminUser extractor: returns 403 Forbidden if role != "admin"
- /api/admin/users/* handlers: require AdminUser (create, list, detail, update, suspend, activate)
- POST /api/invitations: requires AdminUser (only admins can invite)
- User creation accepts optional "role" param (defaults to "member")
- Invitation acceptance creates users with "member" role

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(web): add Users admin tab to web UI

Adds a Users tab to the web gateway UI for managing users, tokens, and roles without needing direct API calls.
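The role check behind the AdminUser extractor from the RBAC commit above amounts to: parse the stored role, pass admins through, and reject everyone else with 403. This framework-free sketch is hypothetical: `Role`, `Identity` and `require_admin` are illustrative names, and the enum replaces the `role: String` field the commit actually adds (a later commit in this PR defers the String-to-enum change).

```rust
use std::str::FromStr;

/// Hypothetical typed role; the commit itself stores roles as strings.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    Admin,
    Member,
}

impl FromStr for Role {
    type Err = String;

    /// Accepts exactly "admin" or "member"; anything else is a validation error.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "admin" => Ok(Role::Admin),
            "member" => Ok(Role::Member),
            other => Err(format!("role must be \"admin\" or \"member\", got {other:?}")),
        }
    }
}

/// Hypothetical authenticated caller.
pub struct Identity {
    pub user_id: String,
    pub role: Role,
}

/// What an AdminUser-style guard does: admins pass, everyone else gets 403.
pub fn require_admin(identity: &Identity) -> Result<&Identity, u16> {
    if identity.role == Role::Admin {
        Ok(identity)
    } else {
        Err(403)
    }
}
```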
Features:
- User list table with ID, name, email, role, status, created date
- Create user form with display name, email, role selector
- Suspend/activate actions per user
- Create API token for any user (shows plaintext once with copy button)
- Role badges (admin highlighted, member muted)
- Non-admin users see "Admin access required" message
- Keyboard shortcut: Cmd/Ctrl+5 switches to Users tab

CSS:
- Reuses routines-table styles for the user list
- Badge, token-display, btn-small, btn-danger, btn-primary components

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: move Users to Settings subtab, bootstrap admin user on first run

- Moved Users from top-level tab to Settings sidebar subtab (under Skills, before Theme toggle)
- On first startup with empty users table, automatically creates an admin user from GATEWAY_USER_ID config with a corresponding API token from GATEWAY_AUTH_TOKEN. This ensures the owner appears in the Users panel immediately.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: user creation shows token, + Token works, no password save popup

Three UI/UX fixes:

1. Create user now generates an initial API token and shows it in a copyable banner instead of triggering the browser's password save dialog. Uses autocomplete="off" and type="text" for email field.
2. "+ Token" button works: exposed createTokenForUser/suspendUser/activateUser on window for inline onclick handlers in dynamically generated table rows. Token creation uses showTokenBanner helper.
3. Admin token creation: POST /api/tokens now accepts optional "user_id" field when the requesting user is admin, allowing token creation for other users from the Users panel.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use event delegation for user action buttons (CSP compliance)

Inline onclick handlers are blocked by the Content-Security-Policy (script-src 'self' without 'unsafe-inline').
Switched to data-action attributes with a delegated click listener on the users table.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add i18n for Users subtab, show login link on user creation

- Added 'settings.users' i18n key for English and Chinese
- Token banner now shows a full login link (domain/?token=xxx) with a Copy Link button, plus the raw token below
- Login link works automatically via existing ?token= auto-auth

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: token hash mismatch (hash hex string, not raw bytes)

Critical auth bug: token creation hashed the raw 32 bytes (hasher.update(token_bytes)) but authentication hashed the hex-encoded string (hash_token(candidate) where candidate is the hex string the user sends). This meant newly created tokens could never authenticate. Fixed all 4 token creation sites (users, tokens, invitations create, invitations accept) to use hash_token(&plaintext_token), which hashes the hex string consistently with the auth lookup path. Removed now-unused sha2::Digest imports from handlers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: remove invitation system

The invitation flow is redundant: admin create user already generates a token and shows a login link. Invitations add complexity without value until email integration exists.

Removed:
- InvitationRecord struct and 4 UserStore trait methods
- invitations table from V14 migration (postgres + both libsql schemas)
- PostgreSQL Store methods (create/get/accept/list invitations)
- libSQL UserStore invitation methods + row_to_invitation helper
- invitations.rs handler file (212 lines)
- /api/invitations routes (create, list, accept)
- test_invitation_lifecycle test

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: user deletion, self-service profile, per-user job limits, usage API

Four multi-tenancy improvements:

1. User deletion cascade (DELETE /api/admin/users/{id}): Deletes user and all data across 11 user-scoped tables (settings, secrets, routines, memory, jobs, conversations, etc.). Admin only.
2. Self-service profile (GET/PATCH /api/profile): Users can read and update their own display_name and metadata without admin privileges.
3. Per-user job concurrency (MAX_JOBS_PER_USER env var): Scheduler checks active_jobs_for(user_id) before dispatch. Prevents one user from exhausting all job slots.
4. Usage reporting (GET /api/admin/usage?user_id=X&period=day|week|month): Aggregates LLM costs from llm_calls via agent_jobs.user_id. Returns per-user, per-model breakdown of calls, tokens, and cost.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add TenantCtx for compile-time tenant isolation

Implements zmanian's architectural proposal from #1614 review: two-tier scoped database access (TenantScope/AdminScope) so handler code cannot accidentally bypass tenant scoping.

TenantScope (default): wraps user_id + Arc<dyn Database>, auto-binds user_id on every operation. ID-based lookups return None for cross-tenant resources. No escape hatch: forgetting to scope is a compile error.

AdminScope (explicit opt-in): cross-tenant access for system-level components (heartbeat, routine engine, self-repair, scheduler, worker).

TenantCtx bundles TenantScope + workspace + cost guard + per-user rate limiting. Constructed once per request in handle_message, threaded through all command handlers and ChatDelegate.
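A reduced sketch of the TenantScope idea described above: the user id is bound once when the scope is constructed, every method filters on it, and a conversation owned by another tenant is reported exactly like a missing one. The in-memory `Store`, the `DbError` type and the method bodies are hypothetical; per the commit, the real TenantScope wraps `Arc<dyn Database>` and covers many more operations.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub enum DbError {
    NotFound,
}

/// Hypothetical backing store: conversation id -> (owner user id, messages).
#[derive(Default)]
pub struct Store {
    conversations: HashMap<u64, (String, Vec<String>)>,
}

/// Tenant-scoped handle: `user_id` is fixed at construction and no method
/// accepts a different one.
pub struct TenantScope<'a> {
    user_id: String,
    store: &'a mut Store,
}

impl<'a> TenantScope<'a> {
    pub fn new(user_id: &str, store: &'a mut Store) -> Self {
        Self { user_id: user_id.to_string(), store }
    }

    /// New conversations are always owned by the scope's user.
    pub fn create_conversation(&mut self, id: u64) {
        let owner = self.user_id.clone();
        self.store.conversations.insert(id, (owner, Vec::new()));
    }

    /// In this sketch, a foreign id yields NotFound, the same as a missing id.
    pub fn add_conversation_message(&mut self, id: u64, text: &str) -> Result<(), DbError> {
        match self.store.conversations.get_mut(&id) {
            Some((owner, messages)) if *owner == self.user_id => {
                messages.push(text.to_string());
                Ok(())
            }
            _ => Err(DbError::NotFound),
        }
    }

    pub fn list_conversation_messages(&self, id: u64) -> Result<&[String], DbError> {
        match self.store.conversations.get(&id) {
            Some((owner, messages)) if *owner == self.user_id => Ok(messages.as_slice()),
            _ => Err(DbError::NotFound),
        }
    }
}
```

In a real crate, keeping `Store`'s fields private to the database module means handlers can reach data only through a scope, and no scoped method takes a caller-supplied user id.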
Key changes:
- New src/tenant.rs (~920 lines): TenantScope, AdminScope, TenantCtx, TenantRateState, TenantRateRegistry
- All command handlers: user_id: &str → ctx: &TenantCtx
- ChatDelegate: cost check/record/settings via self.tenant
- System components: store field changed to AdminScope
- Config: TENANT_MAX_LLM_CONCURRENT, TENANT_MAX_JOBS_CONCURRENT env vars
- Fixes bug: /status <job_id> cross-tenant leak (now auto-filtered)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address PR #1626 review feedback (bounded LRU cache, admin auth, FK cleanup)

- Replace HashMap with lru::LruCache in DbAuthenticator so the token cache is hard-bounded at 1024 entries (evicts LRU, not just expired)
- Gate admin user endpoints (list/detail/update/suspend/activate) with AdminUser extractor so members get 403 instead of full access
- Add api_tokens to libSQL delete_user cleanup list to prevent orphaned tokens (libSQL has no FK cascade)
- Add regression tests for all three fixes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: update CA certificates in runtime Docker image

Ensures the root certificate bundle is current so TLS handshakes to services like Supabase succeed on Railway.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve CI failures (formatting, no-panics check)

- Run cargo fmt on test code
- Replace .expect() with const NonZeroUsize in DbAuthenticator
- Add // safety: comments for test-only code in multi_tenant.rs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: switch PostgreSQL TLS from rustls to native-tls

rustls with rustls-native-certs fails TLS handshake on Railway's slim container (empty or stale root cert store). native-tls delegates to OpenSSL on Linux, which handles system certs more reliably.
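The bounded token cache from the review-feedback fix above combines two eviction rules: entries expire after a TTL (60s per the PR), and once capacity (1024 per the PR) is reached the least recently used entry is evicted. This std-only sketch finds the LRU victim with a linear scan, so eviction is O(n); the `lru` crate named in the commit keeps a linked list for O(1) eviction. `TokenCache`, `CachedIdentity` and the injected `now` parameter are hypothetical, and `invalidate_user` mirrors the behaviour the PR later adds as DbAuthenticator::invalidate_user().

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical cached result of a successful DB token lookup.
#[derive(Debug, Clone, PartialEq)]
pub struct CachedIdentity {
    pub user_id: String,
    pub role: String,
}

/// Token-hash -> identity cache bounded by both a TTL and a capacity.
pub struct TokenCache {
    capacity: usize,
    ttl: Duration,
    /// Monotonic counter used as the "last used" stamp for LRU ordering.
    tick: u64,
    /// token hash -> (identity, inserted_at, last_used_tick)
    entries: HashMap<String, (CachedIdentity, Instant, u64)>,
}

impl TokenCache {
    pub fn new(capacity: usize, ttl: Duration) -> Self {
        Self { capacity, ttl, tick: 0, entries: HashMap::new() }
    }

    /// Returns a live entry and marks it most recently used; expired entries are dropped.
    pub fn get(&mut self, token_hash: &str, now: Instant) -> Option<CachedIdentity> {
        self.tick += 1;
        let (tick, ttl) = (self.tick, self.ttl);
        let (identity, inserted_at, last_used) = self.entries.get_mut(token_hash)?;
        if now.duration_since(*inserted_at) >= ttl {
            self.entries.remove(token_hash);
            return None;
        }
        *last_used = tick;
        Some(identity.clone())
    }

    /// Inserts an entry, evicting the least recently used one when full.
    pub fn insert(&mut self, token_hash: String, identity: CachedIdentity, now: Instant) {
        self.tick += 1;
        if !self.entries.contains_key(&token_hash) && self.entries.len() >= self.capacity {
            let victim = self
                .entries
                .iter()
                .min_by_key(|(_, entry)| entry.2)
                .map(|(key, _)| key.clone());
            if let Some(victim) = victim {
                self.entries.remove(&victim);
            }
        }
        self.entries.insert(token_hash, (identity, now, self.tick));
    }

    /// Evicts every cached token belonging to `user_id` (suspend, role change, revoke).
    pub fn invalidate_user(&mut self, user_id: &str) {
        self.entries.retain(|_, (identity, _, _)| identity.user_id != user_id);
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```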
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Adding user management api

* feat: admin secrets provisioning API + API documentation

- Add PUT/GET/DELETE /api/admin/users/{id}/secrets/{name} endpoints for application backends to provision per-user secrets (AES-256-GCM encrypted)
- Add secrets_store field to GatewayState with builder wiring
- Create docs/USER_MANAGEMENT_API.md with full API spec covering users, secrets, tokens, profile, and usage endpoints
- Update web gateway CLAUDE.md route table

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add CatchPanicLayer to capture handler panics

Without this, panics in async handlers silently drop the connection and the edge proxy returns a generic 503. Now panics are caught, logged, and returned as 500 with the panic message.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address second-round review (transactional delete, overflow, error logging)

- C1: Wrap PostgreSQL delete_user() in a transaction so partial cleanup can't leave users in a half-deleted state
- M2: Add job_events to delete cleanup (both backends); its FK to agent_jobs without CASCADE would otherwise cause an FK violation
- H1/M4: Cap expires_in_days to 36500 before i64 cast (tokens + secrets)
- H2: Validate target user exists before creating admin token to prevent orphan tokens on libSQL
- H3: Log DB errors in DbAuthenticator::authenticate() instead of silently swallowing them as 401

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: revert to rustls with webpki-roots fallback for PostgreSQL TLS

native-tls/OpenSSL caused silent crashes (segfaults in C code) during DB writes on Railway containers. Switch back to rustls but add webpki-roots as a fallback when system certs are missing, which was the original TLS handshake failure on slim container images.
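The expires_in_days overflow fix (H1/M4 above) is easy to state in code: a u64 from the request body must be bounded before it becomes a signed duration, because an unchecked `as i64` cast wraps large values to negative ones. This sketch follows the later form of the fix in this PR (reject values above 36500 with 400 rather than clamping); `expiry_seconds`, the "None means no expiry" convention and the bare `u16` status code are hypothetical simplifications.

```rust
/// Upper bound used by the PR (about 100 years).
pub const MAX_EXPIRES_IN_DAYS: u64 = 36_500;

const SECONDS_PER_DAY: i64 = 86_400;

/// Validates an optional `expires_in_days` and converts it to seconds.
/// In this sketch `Ok(None)` means "no expiry" and `Err(400)` rejects the request.
pub fn expiry_seconds(expires_in_days: Option<u64>) -> Result<Option<i64>, u16> {
    match expires_in_days {
        None => Ok(None),
        Some(days) if days > MAX_EXPIRES_IN_DAYS => Err(400),
        Some(days) => {
            // days <= 36_500 here, so the conversion and multiplication cannot overflow.
            let days = i64::try_from(days).map_err(|_| 400u16)?;
            Ok(Some(days * SECONDS_PER_DAY))
        }
    }
}
```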
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: update Cargo.lock for rustls + webpki-roots

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* debug: add /api/debug/db-write endpoint to diagnose user insert failure

Temporary diagnostic endpoint that tests DB INSERT to users table with full error logging. No auth required. Will be removed after debugging.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* perf: use cargo-chef in Dockerfile for dependency caching

Splits the build into planner/deps/builder stages. Dependencies are only recompiled when Cargo.toml or Cargo.lock change. Source-only changes skip straight to the final build stage.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* debug: add tracing to users_create_handler

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: guard created_by FK in user creation handler

The auth identity user_id (from owner_id scope) may not match any user row in the DB, causing an FK violation on the created_by column. Check that the referenced user exists before setting created_by.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: collapse GATEWAY_USER_ID into IRONCLAW_OWNER_ID

Remove the separate GATEWAY_USER_ID config. The gateway now uses IRONCLAW_OWNER_ID (config.owner_id) directly for auth identity, bootstrap user creation, and workspace scoping.

Previously, with_owner_scope() rebound the auth identity to owner_id while keeping default_sender_id as the gateway user_id. This caused an FK constraint violation when creating users because the auth identity ("default") didn't match any user in the DB ("nearai").
Changes:
- Remove GATEWAY_USER_ID env var and gateway_user_id from settings
- Remove user_id field from GatewayConfig
- Add owner_id parameter to GatewayChannel::new()
- Remove with_owner_scope() method
- Remove default_sender_id from GatewayState
- Remove sender override logic in chat/approval handlers
- Remove debug endpoint and tracing from prior debugging
- Update all tests and E2E fixtures

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: hide Users tab for non-admins, remove auth hint text

- Fetch /api/profile after login and hide the Users settings tab when the user's role is not admin
- Remove the "Enter the GATEWAY_AUTH_TOKEN" hint from the login page since tokens are now managed via the admin panel, not .env files

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address review feedback (auth 503, token expiry, CORS PATCH)

- DB auth errors now return 503 instead of 401 so outages are distinguishable from invalid tokens (serrrfirat H3)
- Cap expires_in_days to 36500 before i64 cast to prevent negative duration from u64 overflow (serrrfirat H1)
- Add PATCH to CORS allowed methods for profile/user update endpoints (Copilot)
- Stop leaking panic details in CatchPanicLayer response body

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: harden multi-tenant isolation (review fixes from #1614)

- Add conversation ownership checks in TenantScope: add_conversation_message, touch_conversation, list_conversation_messages (+ paginated), update_conversation_metadata_field, get_conversation_metadata now return NotFound for conversations not owned by the tenant (cross-tenant data leak)
- Fix multi-user heartbeat: clear notify_user_id per runner so notifications persist to the correct user, not the shared config target
- Move hygiene tasks into bounded JoinSet instead of unbounded tokio::spawn
- Revert send_notification to private visibility (only used within module)
- Use effective_model_name() for cost attribution in dispatcher so providers that ignore per-request model overrides report the actual model used
- Fix inject_model_override doc comment; add 3 unit tests
- Fix heartbeat doc comment ("routines" not "active routines")

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add Jobs, Cost, Last Active columns to admin Users table

Add UserSummaryStats struct and user_summary_stats() batch query to the UserStore trait (both PostgreSQL and libSQL backends). The admin users list endpoint now fetches per-user aggregates (job count, total LLM spend, most recent activity) in a single query and includes them inline in the response. The frontend Users table displays three new columns.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address review comments and CI formatting failures

CI fixes:
- cargo fmt fixes in cli/mod.rs and db/tls.rs

Security/correctness (from Copilot + serrrfirat + pranavraja99 reviews):
- Token create: reject expires_in_days > 36500 with 400 instead of silent clamp
- Token create: return 404 when admin targets non-existent user
- User create: map duplicate email constraint violations to 409 Conflict
- User create: remove unnecessary DB roundtrip for created_by (use AdminUser directly)
- DB auth: log warn on DB lookup failures instead of silently swallowing errors
- libSQL: add FK constraints on users.created_by and api_tokens.user_id

Config fixes:
- agent.multi_tenant: resolve from AGENT_MULTI_TENANT env var instead of hardcoding false
- heartbeat.multi_tenant: fix doc comment to match actual env-var-based behavior

UI fix:
- showTokenBanner: pass correct title ("Token created!" vs "User created!")

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address remaining review comments (round 2)

- Secrets handlers: normalize name to lowercase before store operations, validate target user_id exists (returns 404 if not found)
- libSQL: propagate cost parsing errors instead of unwrap_or_default() in both user_usage_stats and user_summary_stats
- users_list_handler: propagate user_summary_stats DB errors (was silently swallowed with unwrap_or_default)
- loadUsers: distinguish 401/403 (admin required) from other errors
- Docs: fix users.id type (TEXT not UUID), remove "invitation flow" from V14 migration comment

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: i18n for Users tab, atomic user+token creation, transactional delete_user

i18n:
- Add 31 translation keys for all Users tab strings (en + zh-CN)
- Wire data-i18n attributes on HTML elements (headings, buttons, inputs, table headers, empty state)
- Replace all hard-coded strings in app.js with I18n.t() calls

Atomic user+token creation:
- Add create_user_with_token() to UserStore trait
- PostgreSQL: wraps both INSERTs in conn.transaction() with auto-rollback
- libSQL: wraps in explicit BEGIN/COMMIT with ROLLBACK on error
- Handler uses single atomic call instead of two separate operations

Transactional delete_user for libSQL:
- Wrap multi-table DELETE cascade in BEGIN/COMMIT transaction
- ROLLBACK on any error to prevent partial cleanup / inconsistent state
- Matches the PostgreSQL implementation which already used transactions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: revert V14 migration to match deployed checksum [skip-regression-check]

Refinery checksums applied migrations, so editing V14__users.sql after it was already applied causes deployment failures. Revert the cosmetic comment changes (added in df40b22) to restore the original checksum.
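The "auth 503" review fix above separates two failure modes that previously collapsed into 401: an unknown token and an unreachable database. A minimal sketch of that decision, assuming env-var tokens are checked first and a lookup closure stands in for DbAuthenticator returning `Err` on a database failure; `AuthResult` and `authenticate_request` are hypothetical names, not the gateway's middleware.

```rust
/// Hypothetical outcome of authenticating one request.
#[derive(Debug, PartialEq)]
pub enum AuthResult {
    /// Token matched; carries the user id.
    Authenticated(String),
    /// 401: token unknown, revoked, or owned by a suspended user.
    Unauthorized,
    /// 503: the database could not be queried, so validity is unknown.
    Unavailable,
}

impl AuthResult {
    /// 200 here simply means the request is allowed to proceed.
    pub fn status(&self) -> u16 {
        match self {
            AuthResult::Authenticated(_) => 200,
            AuthResult::Unauthorized => 401,
            AuthResult::Unavailable => 503,
        }
    }
}

/// `env_tokens` are (token, user_id) pairs checked first; `db_lookup` stands in
/// for DbAuthenticator and returns Err on a database failure.
pub fn authenticate_request<F>(token: &str, env_tokens: &[(&str, &str)], db_lookup: F) -> AuthResult
where
    F: FnOnce(&str) -> Result<Option<String>, String>,
{
    if let Some((_, user_id)) = env_tokens.iter().find(|(t, _)| *t == token) {
        return AuthResult::Authenticated(user_id.to_string());
    }
    match db_lookup(token) {
        Ok(Some(user_id)) => AuthResult::Authenticated(user_id),
        Ok(None) => AuthResult::Unauthorized,
        // Mapping this to Unauthorized would make an outage look like a bad token.
        Err(_) => AuthResult::Unavailable,
    }
}
```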
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: bootstrap onboarding flow for multi-tenant users

The bootstrap greeting and workspace seeding only ran for the owner workspace at startup, so new users created via the admin API never received the welcome message or identity files (BOOTSTRAP.md, SOUL.md, AGENTS.md, USER.md, etc.).

Three fixes:
- tenant_ctx(): seed per-user workspace on first creation via seed_if_empty(), which writes identity files and sets bootstrap_pending when the workspace is truly fresh
- handle_message(): check take_bootstrap_pending() on the tenant workspace (not the owner workspace) and persist the greeting to the user's own assistant conversation + broadcast via SSE
- WorkspacePool: seed new per-user workspaces in the web gateway so memory tools also see identity files immediately

The existing single-user bootstrap in Agent::run() is preserved for non-multi-tenant deployments.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address remaining PR review comments (round 3)

- Docs: fix metadata description from "merge patch" to "full replacement"
- Secrets: reject expires_in_days > 36500 with 400 (was silently clamped)
- libSQL: CAST(SUM(cost) AS TEXT) in user_usage_stats and user_summary_stats to prevent SQLite numeric coercion from crashing get_text(); this was the root cause of the Copilot "SUM returns numeric type" comments
- Add 3 regression tests: user_summary_stats (empty + with data) and user_usage_stats (multi-model aggregation)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add role change support for users (admin/member toggle)

- Add update_user_role() to UserStore trait + both backends (PostgreSQL and libSQL)
- Extend PATCH /api/admin/users/{id} to accept optional "role" field with validation (must be "admin" or "member")
- Add "Make Admin" / "Make Member" toggle button in Users table actions
- Add i18n keys for role change (en + zh-CN)
- Update API docs to document the role field on PATCH
- Fix test helpers to use fmt_ts() for timestamps (was using SQLite datetime('now'), which produces an incompatible format for string comparison)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: show live LLM spend in Users table instead of only DB-recorded costs [skip-regression-check]

Chat turns record LLM cost in CostGuard (in-memory) but don't create agent_jobs/llm_calls DB rows; those are only written for background jobs. The Users table was querying only from DB, so it showed $0.00 for users who only chatted.

Now supplements DB stats with CostGuard.daily_spend_for_user(), the same source displayed in the status bar token counter. Shows whichever is larger (DB historical total vs live daily spend). Also falls back to last_login_at for "Last Active" when no DB job activity exists.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: persist chat LLM calls to DB and fix usage stats query

Two root causes for zero usage stats:

1. ChatDelegate only recorded LLM costs to CostGuard (in-memory), never to the llm_calls DB table. Added DB persistence via TenantScope.record_llm_call() after each chat LLM call, with job_id=NULL and conversation_id=thread_id.
2. user_summary_stats query only joined agent_jobs→llm_calls, missing chat calls (which have job_id=NULL). Redesigned query to start from llm_calls and resolve user_id via COALESCE(agent_jobs.user_id, conversations.user_id), which covers both job and chat LLM calls.

Both PostgreSQL and libSQL queries updated. TenantScope gets record_llm_call() method. Tests updated for new query semantics.
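The redesigned usage query above attributes each llm_calls row to a user through COALESCE(agent_jobs.user_id, conversations.user_id): the job's owner for background-job calls, otherwise the conversation's owner for chat calls (job_id = NULL). An in-memory equivalent is sketched below; the row type, the lookup maps and the integer cents are hypothetical simplifications of the SQL, and calls that resolve to no user are simply skipped here.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical slice of an llm_calls row.
pub struct LlmCall {
    pub job_id: Option<u32>,
    pub conversation_id: Option<u32>,
    pub cost_cents: u64,
}

/// Total cost per user. Each call belongs to its job's owner if it has a job,
/// otherwise to its conversation's owner, like
/// COALESCE(agent_jobs.user_id, conversations.user_id).
pub fn cost_by_user(
    calls: &[LlmCall],
    job_owner: &HashMap<u32, String>,
    conversation_owner: &HashMap<u32, String>,
) -> BTreeMap<String, u64> {
    let mut totals = BTreeMap::new();
    for call in calls {
        let owner = call
            .job_id
            .and_then(|id| job_owner.get(&id))
            .or_else(|| call.conversation_id.and_then(|id| conversation_owner.get(&id)));
        // Calls that resolve to no user are skipped in this sketch.
        if let Some(user) = owner {
            *totals.entry(user.clone()).or_insert(0) += call.cost_cents;
        }
    }
    totals
}
```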
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address review comments — input validation, cost semantics, panic safety [skip-regression-check]

- Validate display_name: trim whitespace, reject empty strings (create + update)
- Validate metadata: must be a JSON object, return 400 if not (admin + profile)
- secrets_list_handler: verify target user_id exists before listing
- Cost display: use DB total directly (chat calls now persist to DB), remove confusing max(db,live) CostGuard fallback
- CatchPanicLayer: truncate panic payload to 200 chars in log to limit potential sensitive data exposure

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address Copilot round 5 — docs, secrets consistency, token name, provider field [skip-regression-check]

- Docs: users.id note updated to "typically UUID v4 strings (bootstrap admin may use a custom ID)"
- secrets_list_handler: return 503 when DB store is None (was falling through to list secrets without user validation)
- tokens_create: trim + reject empty token name (matching display_name pattern)
- LlmCallRecord.provider: use llm_backend ("nearai","openai") instead of model_name() which returns the model identifier
- user_summary_stats zero-LLM users: acceptable — handler already falls back to 0 cost and last_login_at for missing entries

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: DB auth returns 503 on outage, scheduler counts only blocking jobs

From serrrfirat review:
- DB auth: return Err(()) on database errors so middleware returns 503 instead of silently returning Ok(None) → 401 (auth miss)
- Scheduler: add parallel_blocking_count_for() that uses is_parallel_blocking() (Pending/InProgress/Stuck) instead of is_active() for per-user concurrency — Completed/Submitted jobs no longer count against MAX_JOBS_PER_USER

From Copilot:
- CLAUDE.md: fix secrets route paths from {id} to {user_id}
- token_hash: use .as_slice() instead of .to_vec() to avoid heap allocation on every token auth/creation call

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: immediate auth cache invalidation on security-critical actions (zmanian review #6)

Add DbAuthenticator::invalidate_user() that evicts all cached entries for a user. Called after:
- Suspend user (immediate lockout, was 60s delay)
- Activate user (immediate access restoration)
- Role change (admin↔member takes effect immediately)
- Token revocation (revoked token can't be reused from cache)

The DbAuthenticator is shared (via Clone, which Arc-clones the cache) between the auth middleware and GatewayState, so handlers can evict entries from the same cache the middleware reads.

Also from zmanian's review:
- Items 1-5, 7-11 were already resolved in prior commits
- Item 12 (String→enum for status/role) is deferred as a broader refactor

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: last-admin protection, usage stats for chat calls, UTF-8 safe panic truncation

Last-admin protection:
- Suspend, delete, and role-demotion of the last active admin now return 409 Conflict instead of succeeding and locking out the admin API
- Helper is_last_admin() checks active admin count before destructive ops

Usage stats:
- user_usage_stats() now includes chat LLM calls (job_id=NULL) by joining via conversations.user_id, matching user_summary_stats()
- Both PostgreSQL and libSQL queries updated

Panic handler:
- Use floor_char_boundary(200) instead of byte-index [..200] to prevent panic on multi-byte UTF-8 characters in panic messages

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: workspace seed race, bootstrap atomicity, email trim, secrets upsert response [skip-regression-check]

- WorkspacePool: await seed_if_empty() synchronously after inserting into cache (drop lock first to avoid blocking), so callers see identity files immediately instead of racing a background task
- Bootstrap admin: use create_user_with_token() for atomic user+token creation, matching the admin create endpoint
- Email: trim whitespace, treat empty as None to prevent " " being stored and breaking uniqueness
- Secrets PUT: report "updated" vs "created" based on prior existence
- Last token_hash.to_vec() → .as_slice() in authenticate_token

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: disable unscoped webhook endpoint in multi-tenant mode [skip-regression-check]

The original /api/webhooks/{path} endpoint looks up routines across all users. In multi-tenant mode, anyone who knows the webhook path + secret could trigger another user's routine.

Now returns 410 Gone with a message pointing to the scoped endpoint /api/webhooks/u/{user_id}/{path}.

Detection uses state.db_auth.is_some() — present only when DB-backed auth is enabled (multi-tenant). Single-user deployments are unaffected.

From: standardtoaster review comment

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: webhook multi-tenant check, secrets error propagation, stale doc comment [skip-regression-check]

- Webhook: use workspace_pool.is_some() instead of db_auth.is_some() for multi-tenant detection — db_auth is set for any DB deployment, workspace_pool is only set when has_any_users() was true at startup
- Secrets: propagate exists() errors instead of unwrap_or(false) so backend outages surface as 500 rather than incorrect "created" status
- Config: fix stale workspace_read_scopes comment referencing user_id

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
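For the panic-handler fix above, slicing with `[..200]` panics whenever byte 200 falls inside a multi-byte UTF-8 character, which is why the commit switches to `floor_char_boundary(200)`. A minimal sketch of the same idea using only `str::is_char_boundary`; `truncate_for_log` and `MAX_PANIC_LOG_BYTES` are hypothetical names, and, as with `floor_char_boundary`, the limit is counted in bytes rather than characters:

```rust
/// Hypothetical cap on the logged panic payload, mirroring the 200 limit above.
/// Measured in bytes, like `floor_char_boundary`.
const MAX_PANIC_LOG_BYTES: usize = 200;

/// Returns the longest prefix of `s` that is at most `max_bytes` long and ends
/// on a UTF-8 character boundary, so the final slice can never panic.
fn truncate_for_log(s: &str, max_bytes: usize) -> &str {
    if s.len() <= max_bytes {
        return s;
    }
    // Step back from the byte limit until we land on a char boundary.
    // Index 0 is always a boundary, so this loop terminates.
    let mut end = max_bytes;
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}
```

For example, 100 copies of the 3-byte `€` have boundaries at bytes 198 and 201 but not 200, so the sketch keeps the first 198 bytes (66 characters) instead of panicking.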
bkutasi pushed a commit to bkutasi/ironclaw that referenced this pull request on Mar 28, 2026
…tion (nearai#238)

* feat: add extension registry with metadata catalog, CLI, and onboarding integration

Adds a central registry that catalogs all 14 available extensions (10 tools, 4 channels) with their capabilities, auth requirements, and artifact references. The onboarding wizard now shows installable channels from the registry and offers tool installation as a new Step 7.

- registry/ folder with per-extension JSON manifests and bundle definitions
- src/registry/ module: manifest structs, catalog loader, installer
- `ironclaw registry list|info|install|install-defaults` CLI commands
- Setup wizard enhanced: channels from registry, new extensions step (8 steps)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(setup): resolve workspace errors for tool crates and channels-only onboarding

Tool crates in tools-src/ and channels-src/ failed `cargo metadata` during onboard install because Cargo resolved them as part of the root workspace. Add `[workspace]` table to each standalone crate and extend the root `workspace.exclude` list so they build independently.

Channels-only mode (`onboard --channels-only`) failed with "Secrets not configured" and "No database connection" because it skipped database and security setup. Add `reconnect_existing_db()` to establish the DB connection and load saved settings before running channel configuration.

Also improve the tunnel "already configured" display to show full provider details (domain, mode, command) instead of just the provider name.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(registry): address PR review feedback on installer and catalog

- Use manifest.name (not crate_name) for installed filenames so discovery, auth, and CLI commands all agree on the stem (nearai#1)
- Add AlreadyInstalled error variant instead of misleading ExtensionNotFound (nearai#2)
- Add DownloadFailed error variant with URL context instead of stuffing URLs into PathBuf (nearai#3)
- Validate HTTP status with error_for_status() before reading response bytes in artifact downloads (nearai#4)
- Switch build_wasm_component to tokio::process::Command with status() so build output streams to the terminal (nearai#6)
- Find WASM artifact by crate_name specifically instead of picking the first .wasm file in the release directory (nearai#7)
- Add is_file() guard in catalog loader to skip directories (nearai#8)
- Detect ambiguous bare-name lookups when both tools/<name> and channels/<name> exist, with get_strict() returning an error (nearai#9)
- Fix wizard step_extensions to check tool.name for installed detection, consistent with the new naming (nearai#11, nearai#12)
- Fix redundant closures and map_or clippy warnings in changed files

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(setup): restore DB connection fields after settings reload

reconnect_postgres() and reconnect_libsql() called Settings::from_db_map() which overwrote database_url / libsql_path / libsql_url set from env vars. Also use get_strict() in cmd_info to surface ambiguous bare-name errors.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: fix clippy collapsible_if and print_literal warnings

Collapse nested if-let chains and inline string literals in format macros to satisfy CI clippy lint checks (deny warnings).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(registry): prefer artifacts for install-defaults and improve dir lookup

- InstallDefaults now defaults to downloading pre-built artifacts (matching `registry install` behavior), with --build flag for source builds.
- find_registry_dir() walks up 3 ancestor levels from the exe and adds a CARGO_MANIFEST_DIR fallback, matching load_registry_catalog() logic.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
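The ambiguity check described above (nearai#9) can be sketched as follows, assuming catalog keys of the form `tools/<name>` and `channels/<name>`. Only the name `get_strict()` comes from the commit; the `Catalog` and `LookupError` types and the exact signature are hypothetical:

```rust
use std::collections::HashMap;

/// Hypothetical lookup error for the registry catalog.
#[derive(Debug, PartialEq)]
enum LookupError {
    NotFound(String),
    Ambiguous { name: String, candidates: Vec<String> },
}

/// Hypothetical catalog keyed by qualified names such as "tools/slack".
struct Catalog {
    entries: HashMap<String, String>, // qualified name -> manifest summary (simplified)
}

impl Catalog {
    /// Resolves a qualified ("kind/name") or bare ("name") lookup. A bare name that
    /// matches both a tool and a channel is an error instead of a silent guess.
    fn get_strict(&self, name: &str) -> Result<&String, LookupError> {
        if name.contains('/') {
            return self
                .entries
                .get(name)
                .ok_or_else(|| LookupError::NotFound(name.to_string()));
        }
        let candidates: Vec<String> = ["tools", "channels"]
            .iter()
            .map(|kind| format!("{kind}/{name}"))
            .filter(|key| self.entries.contains_key(key))
            .collect();
        match candidates.len() {
            0 => Err(LookupError::NotFound(name.to_string())),
            1 => Ok(&self.entries[&candidates[0]]),
            _ => Err(LookupError::Ambiguous { name: name.to_string(), candidates }),
        }
    }
}
```

Returning an error instead of picking one entry means a bare `slack` never silently resolves to the wrong kind, while a qualified key such as `tools/slack` is still looked up directly.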
bkutasi pushed a commit to bkutasi/ironclaw that referenced this pull request on Mar 28, 2026
* test: add WIT compatibility tests for all WASM tools and channels

Adds CI and integration tests to catch WIT interface breakage across all 14 WASM extensions (10 tools + 4 channels). Previously, changing wit/tool.wit or wit/channel.wit could silently break guest-side tools that weren't rebuilt until release time.

Three new pieces:

1. scripts/build-wasm-extensions.sh — builds all WASM extensions from source by reading registry manifests. Used by CI and locally.
2. tests/wit_compat.rs — integration tests that compile and instantiate each .wasm binary against the current wasmtime host linker with stubbed host functions. Catches added/removed/renamed WIT functions, signature mismatches, and missing exports. Skips gracefully when artifacts aren't built so `cargo test` still passes standalone.
3. .github/workflows/test.yml — new wasm-wit-compat CI job that builds all extensions then runs instantiation tests on every PR. Added to the branch protection roll-up.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: fix rustfmt formatting in wit_compat tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review feedback on WIT compat tests

- Switch build script from python3 to jq for JSON parsing, consistent with release.yml and avoids python3 dependency (nearai#1, nearai#7)
- Use dirs::home_dir() instead of HOME env var for portability (nearai#2)
- Filter extensions by manifest "kind" field instead of path (nearai#3)
- Replace .flatten() with explicit error handling in dir iteration (nearai#4, nearai#5)
- Split stub_tool_host_functions into stub_shared_host_functions + tool-only tool-invoke stub, since tool-invoke is not in channel WIT (nearai#6)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
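The review item about replacing `.flatten()` (nearai#4, nearai#5) matters because `read_dir(..).flatten()` silently discards entries that fail to read. A minimal sketch of explicit handling combined with the "skip gracefully when artifacts aren't built" behaviour; `collect_wasm_artifacts` is a hypothetical helper, not the code in tests/wit_compat.rs, and treating only `NotFound` as "not built" is an assumption:

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Collects every `.wasm` file directly inside `dir`.
///
/// A missing directory yields `Ok(vec![])` so callers can skip when nothing
/// has been built, but any other I/O error, including a failure on a single
/// entry, is propagated rather than dropped as `.flatten()` would do.
fn collect_wasm_artifacts(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let entries = match fs::read_dir(dir) {
        Ok(entries) => entries,
        Err(e) if e.kind() == io::ErrorKind::NotFound => return Ok(Vec::new()),
        Err(e) => return Err(e),
    };
    let mut found = Vec::new();
    for entry in entries {
        // Explicit `?`: an unreadable entry is an error, not a silent skip.
        let path = entry?.path();
        if path.extension().and_then(|ext| ext.to_str()) == Some("wasm") {
            found.push(path);
        }
    }
    found.sort(); // deterministic order for test output
    Ok(found)
}
```

A test could then return early when the list is empty, while a permissions error still fails loudly.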
bkutasi pushed a commit to bkutasi/ironclaw that referenced this pull request on Mar 28, 2026
)

* feat: add inbound attachment support to WASM channel system

Add attachment record to WIT interface and implement inbound media parsing across all four channel implementations (Telegram, Slack, WhatsApp, Discord). Attachments flow from WASM channels through EmittedMessage to IncomingMessage with validation (size limits, MIME allowlist, count caps) at the host boundary.

- Add `attachment` record to `emitted-message` in wit/channel.wit
- Add `IncomingAttachment` struct to channel.rs and re-export
- Add host-side validation (20MB total, 10 max, MIME allowlist)
- Telegram: parse photo, document, audio, video, voice, sticker
- Slack: parse file attachments with url_private
- WhatsApp: parse image, audio, video, document with captions
- Discord: backward-compatible empty attachments
- Update FEATURE_PARITY.md section 7
- Add fixture-based tests per channel and host integration tests

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: integrate outbound attachment support and reconcile WIT types (nearai#409)

Reconcile PR nearai#409's outbound attachment work with our inbound attachment support into a unified design:

WIT type split:
- `inbound-attachment` in channel-host: metadata-only (id, mime_type, filename, size_bytes, source_url, storage_key, extracted_text)
- `attachment` in channel: raw bytes (filename, mime_type, data) on agent-response for outbound sending

Outbound features (from PR nearai#409):
- `on-broadcast` WIT export for proactive messages without prior inbound
- Telegram: multipart sendPhoto/sendDocument with auto photo→document fallback for files >10MB
- wrapper.rs: `call_on_broadcast`, `read_attachments` from disk, attachment params threaded through `call_on_respond`
- HTTP tool: `save_to` param for binary downloads to /tmp/ (50MB limit, path traversal protection, SSRF-safe redirect following)
- Message tool: allow /tmp/ paths for attachments alongside base_dir
- Credential env var fallback in inject_channel_credentials

Channel updates:
- All 4 channels implement on_broadcast (Telegram full, others stub)
- Telegram: polling_enabled config, adjusted poll timeout
- Inbound attachment types renamed to InboundAttachment in all channels

Tests: 1965 passing (9 new), 0 clippy warnings

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add audio transcription pipeline and extensible WIT attachment design

Add host-side transcription middleware (OpenAI Whisper) that detects audio attachments with inline data on incoming messages and transcribes them automatically.

Refactor WIT inbound-attachment to use extras-json and a store-attachment-data host function instead of typed fields, so future attachment properties (dimensions, codec, etc.) don't require WIT changes that invalidate all channel plugins.

- Add src/transcription/ module: TranscriptionProvider trait, TranscriptionMiddleware, AudioFormat enum, OpenAI Whisper provider
- Add src/config/transcription.rs: TRANSCRIPTION_ENABLED/MODEL/BASE_URL
- Wire middleware into agent message loop via AgentDeps
- WIT: replace data + duration-secs with extras-json + store-attachment-data
- Host: parse extras-json for well-known keys, merge stored binary data
- Telegram: download voice files via store-attachment-data, add duration to extras-json, add /file/bot to HTTP allowlist, voice-only placeholder
- Add reqwest multipart feature for Whisper API uploads
- 5 regression tests for transcription middleware

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: wire attachment processing into LLM pipeline with multimodal image support

Attachments on incoming messages are now augmented into user text via XML tags before entering the turn system, and images with data are passed as multimodal content parts (base64 data URIs) to LLM providers. This enables audio transcripts, document text, and image content to reach the LLM without changes to ChatMessage serialization or provider interfaces.

- Add src/agent/attachments.rs with augment_with_attachments() and 9 unit tests
- Add ContentPart/ImageUrl types to llm::provider with OpenAI-compatible serde
- Carry image_content_parts transiently on Turn (skipped in serialization)
- Update nearai_chat and rig_adapter to serialize multimodal content
- Add 3 e2e tests verifying attachments flow through the full agent loop

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: CI failures — formatting, version bumps, and Telegram voice test

- Fix cargo fmt formatting in attachments.rs, nearai_chat.rs, rig_adapter.rs, e2e_attachments.rs
- Bump channel registry versions 0.1.0 → 0.2.0 (discord, slack, telegram, whatsapp) to satisfy version-bump CI check
- Fix Telegram test_extract_attachments_voice: add missing required `duration` field to voice fixture JSON

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: bump WIT channel version to 0.3.0, fix Telegram voice test, add pre-commit hook

- Bump wit/channel.wit package version 0.2.0 → 0.3.0 (interface changed with store-attachment-data)
- Update WIT_CHANNEL_VERSION constant and registry wit_version fields to match
- Fix Telegram test_extract_attachments_voice: gate voice download behind #[cfg(target_arch = "wasm32")] so host functions aren't called in native tests, update assertions for generated filename and extras_json duration
- Add @0.3.0 linker stubs in wit_compat.rs
- Add .githooks/pre-commit hook that runs scripts/check-version-bumps.sh when WIT or extension sources are staged
- Symlink commit-msg regression hook into .githooks/

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: extract voice download from extract_attachments into handle_message

Move download_voice_file + store_attachment_data calls out of extract_attachments into a separate download_and_store_voice function called from handle_message. This keeps extract_attachments as a pure data-mapping function with no host calls, making it fully testable in native unit tests without #[cfg(target_arch)] gates.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review comments — security, correctness, and code quality

Security fixes:
- Add path validation to read_attachments (restrict to /tmp/) preventing arbitrary file reads from compromised tools
- Escape XML special characters in attachment filenames, MIME types, and extracted text to prevent prompt injection via tag spoofing
- Percent-encode file_id in Telegram getFile URL to prevent query injection
- Clone SecretString directly instead of expose_secret().to_string()

Correctness fixes:
- Fix store_attachment_data overwrite accounting: subtract old entry size before adding new to prevent inflated totals and false rejections
- Use max(reported, stored_size) for attachment size accounting to prevent WASM channels from under-reporting size_bytes to bypass limits
- Add application/octet-stream to MIME allowlist (channels default unknown types to this)

Code quality:
- Extract send_response helper in Telegram, deduplicating on_respond and on_broadcast
- Rename misleading Discord test to test_parse_slash_command_interaction
- Fix .githooks/commit-msg to use relative symlink (portable across machines)

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add tool_upgrade command + fix TOCTOU in save_to path validation

Add `tool_upgrade` — a new extension management tool that automatically detects and reinstalls WASM extensions with outdated WIT versions. Preserves authentication secrets during upgrade. Supports upgrading a single extension by name or all installed WASM tools/channels at once.
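Detecting an "outdated WIT version" for `tool_upgrade` comes down to comparing an extension's recorded `wit_version` with the host's. A minimal sketch under stated assumptions: versions are plain `major.minor.patch` strings, `HOST_WIT_VERSION` is a hypothetical stand-in for constants like WIT_CHANNEL_VERSION, and a missing or unparseable installed version is treated as outdated:

```rust
/// Hypothetical host-side WIT package version (the commits above bump it to 0.3.0).
const HOST_WIT_VERSION: &str = "0.3.0";

/// Parses "major.minor.patch" into a tuple that compares numerically.
fn parse_version(v: &str) -> Option<(u64, u64, u64)> {
    let mut parts = v.trim().split('.').map(|p| p.parse::<u64>().ok());
    let major = parts.next()??;
    let minor = parts.next()??;
    let patch = parts.next()??;
    if parts.next().is_some() {
        return None; // reject extra components such as "0.3.0.1"
    }
    Some((major, minor, patch))
}

/// Decides whether an installed extension should be reinstalled.
/// Assumption: a missing or malformed installed version counts as outdated.
fn needs_wit_upgrade(installed: Option<&str>, host: &str) -> bool {
    match (installed.and_then(parse_version), parse_version(host)) {
        (Some(ext), Some(host)) => ext < host,
        (None, _) => true,
        (Some(_), None) => false, // malformed host version: don't churn installs
    }
}
```

Comparing numeric tuples rather than raw strings avoids ordering bugs such as `"0.10.0" < "0.9.0"` under lexicographic comparison.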
Fix TOCTOU in `validate_save_to_path`: validate the path *before* creating parent directories, so traversal paths like `/tmp/../../etc/` cannot cause filesystem mutations outside /tmp before being rejected.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: unify WIT package version to 0.3.0 across tool.wit and all capabilities

tool.wit and channel.wit share the `near:agent` package namespace, so they must declare the same version. Bumps tool.wit from 0.2.0 to 0.3.0 and updates all capabilities files and registry entries to match.

Fixes `cargo component build` failure: "package identifier near:agent@0.2.0 does not match previous package name of near:agent@0.3.0"

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: move WIT file comments after package declaration

WIT treats `//` comments before `package` as doc comments. When both tool.wit and channel.wit had header comments, the parser rejected them as "doc comments on multiple 'package' items". Move comments after the package declaration in both files.

Also bumps tool registry versions to 0.2.0 to match the WIT 0.3.0 bump.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: display extension versions in gateway Extensions tab

Add version field to InstalledExtension and RegistryEntry types, pipe through the web API (ExtensionInfo, RegistryEntryInfo), and render as a badge in the gateway UI for both installed and available extensions.

For installed WASM extensions, version is read from the capabilities file with a fallback to the registry entry when the local file has no version (old installations).

Bump all extension Cargo.toml and registry JSON versions from 0.1.0 to 0.2.0 to keep them in sync.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add document text extraction middleware for PDF, Office, and text files

Extract text from document attachments (PDF, DOCX, PPTX, XLSX, RTF, plain text, code files) so the LLM can reason about uploaded documents. Uses pdf-extract for PDFs, zip+XML parsing for Office XML formats, and UTF-8 decode for text files. Wired into the agent loop after transcription middleware.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: download document files in Telegram channel for text extraction

The DocumentExtractionMiddleware needs file bytes in the attachment `data` field, but only voice files were being downloaded. Document attachments (PDFs, DOCX, etc.) had empty `data` and a source_url with a credential placeholder that only works inside the WASM host's http_request.

Add `download_and_store_documents()` that downloads non-voice, non-image, non-audio attachments via the existing two-step getFile→download flow and stores bytes via `store_attachment_data` for host-side extraction.

Also rename `download_voice_file` → `download_telegram_file` since it's generic for any file_id.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: allow Office MIME types and increase file download limit for Telegram

Two issues preventing document extraction from Telegram:

1. PPTX/DOCX/XLSX MIME types (application/vnd.*) were dropped by the WASM host attachment allowlist — add application/vnd., application/msword, and application/rtf prefixes.
2. Telegram file downloads over 10 MB failed with "Response body too large" — set max_response_bytes to 20 MB in Telegram capabilities.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: report document extraction errors back to user instead of silently skipping

- Bump max_response_bytes to 50 MB for Telegram file downloads
- When document extraction fails (too large, download error, parse error), set extracted_text to a user-friendly error message instead of leaving it None. This ensures the LLM tells the user what went wrong.
- On Telegram download failure, set extracted_text with the error so the user sees feedback even when the file never reaches the extraction middleware.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: store extracted document text in workspace memory for search/recall

After document extraction succeeds, write the extracted text to workspace memory at `documents/{date}/{filename}`. This enables:
- Full-text and semantic search over past uploaded documents
- Cross-conversation recall ("what did that PDF say?")
- Automatic chunking and embedding via the workspace pipeline

Documents are stored with metadata header (uploader, channel, date, MIME type). Error messages (extraction failures) are not stored — only successful extractions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: CI failures — formatting, unused assignment warning

- Run cargo fmt on document_extraction and agent_loop modules
- Suppress unused_assignments warning on trace_llm_ref (used only behind #[cfg(feature = "libsql")])

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review comments — security, correctness, and code quality

Security fixes:
- Remove SSRF-prone download() from DocumentExtractionMiddleware (nearai#13)
- Sanitize filenames in workspace path to prevent directory traversal (nearai#11)
- Pre-check file size before reading in WASM wrapper to prevent OOM (nearai#2)
- Percent-encode file_id in Telegram source URLs (nearai#7)

Correctness fixes:
- Clear image_content_parts on turn end to prevent memory leak (nearai#1)
- Find first *successful* transcription instead of first overall (nearai#3)
- Enforce data.len() size limit in document extraction (nearai#10)
- Use UTF-8 safe truncation with char_indices() (nearai#12)

Robustness & code quality:
- Add 120s timeout to OpenAI Whisper HTTP client (nearai#5)
- Trim trailing slash from Whisper base_url (nearai#6)
- Allow ~/.ironclaw/ paths in WASM wrapper (nearai#8)
- Return error from on_broadcast in Slack/Discord/WhatsApp (nearai#9)
- Fix doc comment in HTTP tool (nearai#4)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: formatting — cargo fmt

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address latest PR review — doc comments, error messages, version bumps

- Fix DocumentExtractionMiddleware doc comment (no longer downloads from source_url)
- Fix error message: "no inline data" instead of "no download URL"
- Log error + fallback instead of silent unwrap_or_default on Whisper HTTP client
- Bump all capabilities.json versions from 0.1.0 to 0.2.0 to match Cargo.toml

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove unsupported profile: minimal from CI workflows [skip-regression-check]

dtolnay/rust-toolchain@stable does not accept the 'profile' input (it was a parameter for the deprecated actions-rs/toolchain action).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: merge with latest main — resolve compilation errors and PR review nits

- Add version: None to RegistryEntry/InstalledExtension test constructors
- Fix MessageContent type mismatches in nearai_chat tests (String → MessageContent::Text)
- Fix .contains() calls on MessageContent — use .as_text().unwrap()
- Remove redundant trace_llm_ref = None assignment in test_rig
- Check data size before clone in document extraction to avoid unnecessary allocation

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
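The host-side attachment checks quoted across these commits (20MB total, at most 10 attachments, a MIME prefix allowlist, and sizing by `max(reported, stored_size)`) can be sketched as a single filter. Assumptions: offending attachments are dropped rather than the whole message being rejected, 20MB is read as 20 MiB, and only the `application/*` allowlist entries come from the commit text; the other prefixes and all names are illustrative:

```rust
/// Limits quoted in the commits above (20MB read here as 20 MiB).
const MAX_ATTACHMENTS: usize = 10;
const MAX_TOTAL_BYTES: u64 = 20 * 1024 * 1024;

/// Only the application/* entries come from the commit text; the rest are guesses.
const MIME_PREFIX_ALLOWLIST: &[&str] = &[
    "image/", "audio/", "video/", "text/", "application/pdf",
    "application/vnd.", "application/msword", "application/rtf",
    "application/octet-stream",
];

/// Simplified stand-in for an inbound attachment as seen by the host.
#[derive(Debug)]
struct Attachment {
    mime_type: String,
    reported_size: u64,   // size_bytes claimed by the WASM channel
    stored_data_len: u64, // bytes actually stored via store-attachment-data
}

fn mime_allowed(mime: &str) -> bool {
    let mime = mime.to_ascii_lowercase();
    MIME_PREFIX_ALLOWLIST.iter().any(|p| mime.starts_with(*p))
}

/// Keeps attachments in order while the caps hold. Sizing uses the larger of the
/// reported and stored sizes, so a channel cannot under-report to slip past the limit.
fn validate_attachments(input: Vec<Attachment>) -> Vec<Attachment> {
    let mut kept = Vec::new();
    let mut total: u64 = 0;
    for att in input {
        if kept.len() >= MAX_ATTACHMENTS {
            break;
        }
        if !mime_allowed(&att.mime_type) {
            continue;
        }
        let effective = att.reported_size.max(att.stored_data_len);
        if total.saturating_add(effective) > MAX_TOTAL_BYTES {
            continue;
        }
        total += effective;
        kept.push(att);
    }
    kept
}
```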
serrrfirat pushed a commit that referenced this pull request on Mar 29, 2026
GATEWAY_USER_TOKENS never went to production — replaced entirely by DB-backed user management via /api/admin/users and /api/tokens.

Removed:
- UserTokenConfig struct and GATEWAY_USER_TOKENS env var parsing
- user_tokens field from GatewayConfig
- GatewayChannel::new_multi_auth() constructor
- Env-var user migration block in main.rs (~90 lines)
- multi_tenant auto-detection from GATEWAY_USER_TOKENS (now runtime via db.has_any_users() in app.rs)

Review fixes (zmanian):
- User ID generation: UUID instead of display-name derivation (#1)
- Invitation accept moved to public router (no auth needed) (#3)
- libSQL get_invitation_by_hash aligned with postgres: filters status='pending' AND expires_at > now (#4)
- UUID parse: returns DatabaseError::Serialization instead of unwrap_or_default (#7)
- PostgreSQL SELECT * replaced with explicit column lists (#8)
- Sort order aligned (both backends use DESC) (#6)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
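The aligned invitation lookup (#4 above: `status='pending' AND expires_at > now`) reduces to one predicate that both backends must agree on. A minimal sketch with a hypothetical, simplified `Invitation` type; the real check lives in each backend's SQL, and the non-pending status value used in the usage example is an assumption:

```rust
use std::time::SystemTime;

/// Hypothetical, simplified invitation row.
#[derive(Debug)]
struct Invitation {
    status: String,
    expires_at: SystemTime,
}

/// Same rule as the SQL filter: still pending AND strictly not yet expired.
/// An invitation expiring exactly at `now` is rejected.
fn invitation_is_acceptable(inv: &Invitation, now: SystemTime) -> bool {
    inv.status == "pending" && inv.expires_at > now
}
```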
DougAnderson444 pushed a commit to DougAnderson444/ironclaw that referenced this pull request on Mar 29, 2026
…i-tenant isolation (nearai#1626)

* feat: complete multi-tenant isolation — per-user budgets, model selection, heartbeat cycling

Finishes the remaining isolation work from phases 2–4 of nearai#59:

Phase 2 (DB scoping): Fix /status and /list commands to use _for_user DB variants instead of global queries that leaked cross-user job data.

Phase 3 (Runtime isolation): Per-user workspace in routine engine's spawn_fire so lightweight routines run in the correct user context. Per-user daily cost tracking in CostGuard with configurable budget via MAX_COST_PER_USER_PER_DAY_CENTS. Multi-user heartbeat that cycles through all users with routines, auto-detected from GATEWAY_USER_TOKENS.

Phase 4 (Provider/tools): Per-user model selection via preferred_model setting — looked up from SettingsStore on first iteration, threaded through ReasoningContext.model_override to CompletionRequest. Works with providers that support per-request model overrides (NearAI).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use selected_model setting key to match /model command persistence

The dispatcher was reading "preferred_model" but the /model command (merged from staging) persists to "selected_model". Since set_setting is already per-user scoped, using the same key makes /model work as the per-user model override in multi-tenant mode.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: heartbeat hygiene, /model multi-tenant guard, RigAdapter model override

Three follow-up fixes for multi-tenant isolation:

1. Multi-user heartbeat now runs memory hygiene per user before each heartbeat check, matching single-user heartbeat behavior.
2. /model command in multi-tenant mode only persists to per-user settings (selected_model) without calling set_model() on the shared LlmProvider. The per-request model_override in the dispatcher reads from the same setting. Added multi_tenant flag to AgentConfig (auto-detected from GATEWAY_USER_TOKENS).
3. RigAdapter now supports per-request model overrides by injecting the model name into rig-core's additional_params. OpenAI/Anthropic/Ollama API servers use last-key-wins for duplicate JSON keys, so the override takes effect via serde's flatten serialization order.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address PR review — cost model attribution, heartbeat concurrency, pruning

Fixes from review comments on nearai#1614:
- Cost tracking now uses the override model name (not active_model_name) when a per-user model override is active, for accurate attribution.
- Multi-user heartbeat runs per-user checks concurrently via JoinSet instead of sequentially, preventing one slow user from blocking others.
- Per-user failure counts tracked independently; users exceeding max_failures are skipped (matching single-user semantics).
- per_user_daily_cost HashMap pruned on day rollover to prevent unbounded growth in long-lived deployments.
- Doc comment fixed: says "routines" not "active routines".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: /status ownership, model persistence scoping, heartbeat robustness

Addresses second round of PR review on nearai#1614:
- /status <job_id> DB path now validates job.user_id == requesting user before returning data (was missing ownership check, security fix).
- persist_selected_model takes user_id param instead of owner_id, and skips .env/TOML writes in multi-tenant mode (these are shared global files). handle_system_command now receives user_id from caller.
- JoinSet collection handles Err(JoinError) explicitly instead of silently dropping panicked tasks.
- Notification forwarder extracts owner_id from response metadata in multi-tenant mode for per-user routing instead of broadcasting to the agent owner.
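The per-user daily budget above (`MAX_COST_PER_USER_PER_DAY_CENTS`, with the `per_user_daily_cost` map pruned on day rollover) can be sketched as a small tracker. This is a hypothetical illustration, not CostGuard's API: days are passed in as days since the Unix epoch, and recording a call's cost before checking the budget is an assumption:

```rust
use std::collections::HashMap;

/// Hypothetical per-user daily budget tracker (not the project's CostGuard).
struct PerUserDailyCost {
    max_cents_per_user_per_day: u64, // e.g. from MAX_COST_PER_USER_PER_DAY_CENTS
    current_day: u64,                // days since the Unix epoch
    spent_cents: HashMap<String, u64>,
}

impl PerUserDailyCost {
    fn new(max_cents_per_user_per_day: u64, today: u64) -> Self {
        Self { max_cents_per_user_per_day, current_day: today, spent_cents: HashMap::new() }
    }

    /// On a new day, drop every entry rather than zeroing them, so users who
    /// were only active on earlier days don't leave stale keys behind forever.
    fn roll_over_if_needed(&mut self, today: u64) {
        if today != self.current_day {
            self.spent_cents.clear();
            self.current_day = today;
        }
    }

    /// Records a call's cost and reports whether the user is still within budget.
    fn record(&mut self, user_id: &str, cents: u64, today: u64) -> bool {
        self.roll_over_if_needed(today);
        let spent = self.spent_cents.entry(user_id.to_string()).or_insert(0);
        *spent += cents;
        *spent <= self.max_cents_per_user_per_day
    }

    fn tracked_users(&self) -> usize {
        self.spent_cents.len()
    }
}
```

Clearing the whole map on rollover, instead of resetting each entry to zero, is what keeps it from growing with every user ever seen.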
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: cost pricing, fire_manual workspace, heartbeat concurrency cap

Round 3 review fixes:
- Cost tracking passes None for cost_per_token when model override is active, letting CostGuard look up pricing by model name instead of using the default provider's rates (serrrfirat).
- fire_manual() now uses per-user workspace, matching spawn_fire() pattern (serrrfirat).
- Removed MULTI_TENANT env var — multi-tenant mode is auto-detected solely from GATEWAY_USER_TOKENS presence (serrrfirat + Copilot).
- Multi-user heartbeat capped at 8 concurrent tasks to avoid flooding the LLM provider (serrrfirat + Copilot).
- Fixed inject_model_override doc comment accuracy (Copilot).
- Added comment explaining multi-tenant notification routing priority (Copilot).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: user-scoped webhook endpoint for multi-tenant isolation

Adds POST /api/webhooks/u/{user_id}/{path} — a user-scoped webhook endpoint that filters the routine lookup by user_id, preventing cross-user webhook triggering when paths collide.

The existing /api/webhooks/{path} endpoint remains unchanged for backward compatibility in single-user deployments.

Changes:
- get_webhook_routine_by_path gains user_id: Option<&str> param
- Both postgres and libsql implementations add AND user_id = ? filter when user_id is provided
- New webhook_trigger_user_scoped_handler extracts (user_id, path) from URL and passes to shared fire_webhook_inner logic
- Route registered on public router (webhooks are called by external services that can't send bearer tokens)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(db): add UserStore trait with users, api_tokens, invitations tables

Foundation for DB-backed user management (nearai#1605):
- UserRecord, ApiTokenRecord, InvitationRecord types in db/mod.rs
- UserStore sub-trait (17 methods) added to Database supertrait
- PostgreSQL migration V14__users.sql (users, api_tokens, invitations)
- libSQL schema + incremental migration V14
- Full implementations for both PgBackend (via Store delegation) and LibSqlBackend (direct SQL in libsql/users.rs)
- authenticate_token JOINs api_tokens+users with active/non-revoked checks; has_any_users for bootstrap detection

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(web): DB-backed auth, user/token/invitation API handlers

Adds the web gateway layer for DB-backed user management (nearai#1605):

Auth refactor:
- CombinedAuthState wraps env-var tokens (MultiAuthState) + optional DbAuthenticator for DB-backed token lookup with LRU cache (60s TTL, 1024 max entries)
- auth_middleware tries env-var tokens first, then DB fallback
- From<MultiAuthState> impl for backward compatibility
- main.rs wires with_db_auth when database is available

API handlers (12 new endpoints):
- /api/admin/users — CRUD: create, list, detail, update, suspend, activate
- /api/tokens — create (returns plaintext once), list, revoke
- /api/invitations — create, list, accept (creates user + first token)

Token creation: 32 random bytes → hex plaintext, SHA-256 hash stored. Invitation accept: validates hash + pending + not expired, creates user record and first API token atomically.

All test files updated for CombinedAuthState type change.
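The DbAuthenticator cache described above (LRU, 60s TTL, 1024 max entries) can be sketched with std types alone. Everything here is hypothetical: the real cache sits behind an `Arc` so that clones share it, while this sketch takes `&mut self` and an explicit `now` to stay simple and testable. `invalidate_user` mirrors the `invalidate_user()` fix from the zmanian review, which evicts a user's entries on suspend, role change, or token revocation:

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

const TTL: Duration = Duration::from_secs(60);
const MAX_ENTRIES: usize = 1024;

#[derive(Debug, Clone, PartialEq)]
struct CachedIdentity {
    user_id: String,
    role: String,
}

struct CacheEntry {
    identity: CachedIdentity,
    inserted_at: Instant, // TTL is measured from insertion
    last_used: Instant,   // drives LRU eviction
}

/// Hypothetical token-hash -> identity cache. Eviction scans for the
/// least-recently-used entry, which is O(n) but fine for ~1k entries.
struct AuthCache {
    entries: HashMap<String, CacheEntry>,
    ttl: Duration,
    capacity: usize,
}

impl AuthCache {
    fn new(ttl: Duration, capacity: usize) -> Self {
        Self { entries: HashMap::new(), ttl, capacity }
    }

    fn get(&mut self, token_hash: &str, now: Instant) -> Option<CachedIdentity> {
        let ttl = self.ttl;
        let fresh = self.entries.get(token_hash).map(|e| now.duration_since(e.inserted_at) < ttl)?;
        if !fresh {
            self.entries.remove(token_hash); // expired: caller falls back to the DB
            return None;
        }
        let entry = self.entries.get_mut(token_hash)?;
        entry.last_used = now; // mark as most recently used
        Some(entry.identity.clone())
    }

    fn insert(&mut self, token_hash: String, identity: CachedIdentity, now: Instant) {
        if !self.entries.contains_key(&token_hash) && self.entries.len() >= self.capacity {
            let lru_key = self
                .entries
                .iter()
                .min_by_key(|(_, e)| e.last_used)
                .map(|(k, _)| k.clone());
            if let Some(key) = lru_key {
                self.entries.remove(&key);
            }
        }
        self.entries.insert(token_hash, CacheEntry { identity, inserted_at: now, last_used: now });
    }

    /// Drops every cached entry belonging to `user_id`.
    fn invalidate_user(&mut self, user_id: &str) {
        self.entries.retain(|_, e| e.identity.user_id != user_id);
    }
}
```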
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: startup env-var user migration + UserStore integration tests

Completes the DB-backed user management feature (nearai#1605):
- Startup migration: when GATEWAY_USER_TOKENS is set and the users table is empty, inserts env-var users + hashed tokens into DB. Logs deprecation notice when DB already has users.
- hash_token made pub for reuse in migration code.
- 10 integration tests for UserStore (libsql file-backed):
  - has_any_users bootstrap detection
  - create/get/get_by_email/list/update user lifecycle
  - token create → authenticate → revoke → reject cycle
  - suspended user tokens rejected
  - wrong-user token revoke returns false
  - invitation create → accept → user created
  - record_login and record_token_usage timestamps
- libSQL migration: removed FK constraints from V14 (incompatible with execute_batch inside transactions). Tables in both base SCHEMA and incremental migration for fresh and existing databases.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: remove GATEWAY_USER_TOKENS, fix review feedback

GATEWAY_USER_TOKENS never went to production — replaced entirely by DB-backed user management via /api/admin/users and /api/tokens.

Removed:
- UserTokenConfig struct and GATEWAY_USER_TOKENS env var parsing
- user_tokens field from GatewayConfig
- GatewayChannel::new_multi_auth() constructor
- Env-var user migration block in main.rs (~90 lines)
- multi_tenant auto-detection from GATEWAY_USER_TOKENS (now runtime via db.has_any_users() in app.rs)

Review fixes (zmanian):
- User ID generation: UUID instead of display-name derivation (nearai#1)
- Invitation accept moved to public router (no auth needed) (nearai#3)
- libSQL get_invitation_by_hash aligned with postgres: filters status='pending' AND expires_at > now (nearai#4)
- UUID parse: returns DatabaseError::Serialization instead of unwrap_or_default (nearai#7)
- PostgreSQL SELECT * replaced with explicit column lists (nearai#8)
- Sort order aligned (both backends use DESC) (nearai#6)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add role-based access control (admin/member)

Adds a `role` field (admin|member) to user management:

Schema:
- `role TEXT NOT NULL DEFAULT 'member'` added to users table in both PostgreSQL V14 migration and libSQL schema/incremental migration
- UserRecord gains `role: String` field
- UserIdentity gains `role: String` field, populated from DB in DbAuthenticator and defaulting to "admin" for single-user mode

Access control:
- AdminUser extractor: returns 403 Forbidden if role != "admin"
- /api/admin/users/* handlers: require AdminUser (create, list, detail, update, suspend, activate)
- POST /api/invitations: requires AdminUser (only admins can invite)
- User creation accepts optional "role" param (defaults to "member")
- Invitation acceptance creates users with "member" role

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(web): add Users admin tab to web UI

Adds a Users tab to the web gateway UI for managing users, tokens, and roles without needing direct API calls.

Features:
- User list table with ID, name, email, role, status, created date
- Create user form with display name, email, role selector
- Suspend/activate actions per user
- Create API token for any user (shows plaintext once with copy button)
- Role badges (admin highlighted, member muted)
- Non-admin users see "Admin access required" message
- Keyboard shortcut: Cmd/Ctrl+5 switches to Users tab

CSS:
- Reuses routines-table styles for the user list
- Badge, token-display, btn-small, btn-danger, btn-primary components

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: move Users to Settings subtab, bootstrap admin user on first run

- Moved Users from top-level tab to Settings sidebar subtab (under Skills, before Theme toggle)
- On first startup with empty users table, automatically creates an admin user from GATEWAY_USER_ID config with a corresponding API token from GATEWAY_AUTH_TOKEN. This ensures the owner appears in the Users panel immediately.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: user creation shows token, + Token works, no password save popup

Three UI/UX fixes:

1. Create user now generates an initial API token and shows it in a copy-able banner instead of triggering the browser's password save dialog. Uses autocomplete="off" and type="text" for email field.
2. "+ Token" button works: exposed createTokenForUser/suspendUser/activateUser on window for inline onclick handlers in dynamically generated table rows. Token creation uses showTokenBanner helper.
3. Admin token creation: POST /api/tokens now accepts optional "user_id" field when the requesting user is admin, allowing token creation for other users from the Users panel.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use event delegation for user action buttons (CSP compliance)

Inline onclick handlers are blocked by the Content-Security-Policy (script-src 'self' without 'unsafe-inline').
Switched to data-action attributes with a delegated click listener on the users table. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: add i18n for Users subtab, show login link on user creation - Added 'settings.users' i18n key for English and Chinese - Token banner now shows a full login link (domain/?token=xxx) with a Copy Link button, plus the raw token below - Login link works automatically via existing ?token= auto-auth Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: token hash mismatch — hash hex string, not raw bytes Critical auth bug: token creation hashed the raw 32 bytes (hasher.update(token_bytes)) but authentication hashed the hex-encoded string (hash_token(candidate) where candidate is the hex string the user sends). This meant newly created tokens could never authenticate. Fixed all 4 token creation sites (users, tokens, invitations create, invitations accept) to use hash_token(&plaintext_token) which hashes the hex string consistently with the auth lookup path. Removed now-unused sha2::Digest imports from handlers. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: remove invitation system The invitation flow is redundant — admin create user already generates a token and shows a login link. Invitations add complexity without value until email integration exists. Removed: - InvitationRecord struct and 4 UserStore trait methods - invitations table from V14 migration (postgres + both libsql schemas) - PostgreSQL Store methods (create/get/accept/list invitations) - libSQL UserStore invitation methods + row_to_invitation helper - invitations.rs handler file (212 lines) - /api/invitations routes (create, list, accept) - test_invitation_lifecycle test Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: user deletion, self-service profile, per-user job limits, usage API Four multi-tenancy improvements: 1. 
User deletion cascade (DELETE /api/admin/users/{id}): Deletes user and all data across 11 user-scoped tables (settings, secrets, routines, memory, jobs, conversations, etc.). Admin only. 2. Self-service profile (GET/PATCH /api/profile): Users can read and update their own display_name and metadata without admin privileges. 3. Per-user job concurrency (MAX_JOBS_PER_USER env var): Scheduler checks active_jobs_for(user_id) before dispatch. Prevents one user from exhausting all job slots. 4. Usage reporting (GET /api/admin/usage?user_id=X&period=day|week|month): Aggregates LLM costs from llm_calls via agent_jobs.user_id. Returns per-user, per-model breakdown of calls, tokens, and cost. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add TenantCtx for compile-time tenant isolation Implements zmanian's architectural proposal from nearai#1614 review: two-tier scoped database access (TenantScope/AdminScope) so handler code cannot accidentally bypass tenant scoping. TenantScope (default): wraps user_id + Arc<dyn Database>, auto-binds user_id on every operation. ID-based lookups return None for cross- tenant resources. No escape hatch — forgetting to scope is a compile error. AdminScope (explicit opt-in): cross-tenant access for system-level components (heartbeat, routine engine, self-repair, scheduler, worker). TenantCtx bundles TenantScope + workspace + cost guard + per-user rate limiting. Constructed once per request in handle_message, threaded through all command handlers and ChatDelegate. 
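A minimal sketch of the `TenantScope` idea, assuming an in-memory job table in place of `Arc<dyn Database>`. The user ID is bound once at construction, every query filters on it, and an ID lookup for another tenant's job returns `None`, so it is indistinguishable from a missing job. `Job`, `JobStore`, and the method names are hypothetical. The real type lives in `src/tenant.rs` and covers many more operations.

```rust
use std::collections::HashMap;
use std::sync::Arc;

#[derive(Clone, Debug, PartialEq)]
struct Job {
    id: u64,
    user_id: String,
    title: String,
}

/// Hypothetical stand-in for the shared database handle.
struct JobStore {
    jobs: HashMap<u64, Job>,
}

/// Tenant-bound view: handlers receive this, never the raw store.
struct TenantScope {
    user_id: String,
    store: Arc<JobStore>,
}

impl TenantScope {
    fn new(user_id: impl Into<String>, store: Arc<JobStore>) -> Self {
        Self { user_id: user_id.into(), store }
    }

    /// ID-based lookup: another tenant's job looks exactly like a missing one.
    fn get_job(&self, id: u64) -> Option<&Job> {
        self.store.jobs.get(&id).filter(|job| job.user_id == self.user_id)
    }

    /// List queries bind the tenant's user_id automatically.
    fn list_jobs(&self) -> Vec<&Job> {
        let mut jobs: Vec<&Job> = self
            .store
            .jobs
            .values()
            .filter(|job| job.user_id == self.user_id)
            .collect();
        jobs.sort_by_key(|job| job.id);
        jobs
    }
}

fn main() {
    let mut jobs = HashMap::new();
    jobs.insert(1, Job { id: 1, user_id: "alice".into(), title: "report".into() });
    jobs.insert(2, Job { id: 2, user_id: "bob".into(), title: "backup".into() });
    let store = Arc::new(JobStore { jobs });

    let alice = TenantScope::new("alice", Arc::clone(&store));
    assert!(alice.get_job(1).is_some());
    assert!(alice.get_job(2).is_none()); // bob's job is invisible to alice
    println!("alice sees: {}", alice.list_jobs()[0].title);
}
```

This is the shape that closes the `/status <job_id>` leak mentioned below: the lookup cannot be written without the tenant filter.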
Key changes: - New src/tenant.rs (~920 lines): TenantScope, AdminScope, TenantCtx, TenantRateState, TenantRateRegistry - All command handlers: user_id: &str → ctx: &TenantCtx - ChatDelegate: cost check/record/settings via self.tenant - System components: store field changed to AdminScope - Config: TENANT_MAX_LLM_CONCURRENT, TENANT_MAX_JOBS_CONCURRENT env vars - Fixes bug: /status <job_id> cross-tenant leak (now auto-filtered) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: address PR nearai#1626 review feedback — bounded LRU cache, admin auth, FK cleanup - Replace HashMap with lru::LruCache in DbAuthenticator so the token cache is hard-bounded at 1024 entries (evicts LRU, not just expired) - Gate admin user endpoints (list/detail/update/suspend/activate) with AdminUser extractor so members get 403 instead of full access - Add api_tokens to libSQL delete_user cleanup list to prevent orphaned tokens (libSQL has no FK cascade) - Add regression tests for all three fixes Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: update CA certificates in runtime Docker image Ensures the root certificate bundle is current so TLS handshakes to services like Supabase succeed on Railway. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: resolve CI failures — formatting, no-panics check - Run cargo fmt on test code - Replace .expect() with const NonZeroUsize in DbAuthenticator - Add // safety: comments for test-only code in multi_tenant.rs Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: switch PostgreSQL TLS from rustls to native-tls rustls with rustls-native-certs fails TLS handshake on Railway's slim container (empty or stale root cert store). native-tls delegates to OpenSSL on Linux which handles system certs more reliably. 
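The CI fix above replaces a runtime `.expect()` with a `const NonZeroUsize` (recent versions of the `lru` crate take a `NonZeroUsize` capacity in `LruCache::new`). A minimal std-only sketch of the pattern: the constant name is hypothetical, and 1024 is the bound stated above. The `match` is evaluated during constant evaluation, so a zero literal would fail the build instead of panicking at runtime.

```rust
use std::num::NonZeroUsize;

/// Hypothetical name for the token-cache bound (1024 entries).
/// The `None` arm can only ever trigger as a compile-time error.
const TOKEN_CACHE_CAPACITY: NonZeroUsize = match NonZeroUsize::new(1024) {
    Some(n) => n,
    None => panic!("token cache capacity must be non-zero"),
};

fn main() {
    println!("token cache capacity: {}", TOKEN_CACHE_CAPACITY);
}
```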
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Adding user management api * feat: admin secrets provisioning API + API documentation - Add PUT/GET/DELETE /api/admin/users/{id}/secrets/{name} endpoints for application backends to provision per-user secrets (AES-256-GCM encrypted) - Add secrets_store field to GatewayState with builder wiring - Create docs/USER_MANAGEMENT_API.md with full API spec covering users, secrets, tokens, profile, and usage endpoints - Update web gateway CLAUDE.md route table Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: add CatchPanicLayer to capture handler panics Without this, panics in async handlers silently drop the connection and the edge proxy returns a generic 503. Now panics are caught, logged, and returned as 500 with the panic message. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: address second-round review — transactional delete, overflow, error logging - C1: Wrap PostgreSQL delete_user() in a transaction so partial cleanup can't leave users in a half-deleted state - M2: Add job_events to delete cleanup (both backends) — FK to agent_jobs without CASCADE would cause FK violation - H1/M4: Cap expires_in_days to 36500 before i64 cast (tokens + secrets) - H2: Validate target user exists before creating admin token to prevent orphan tokens on libSQL - H3: Log DB errors in DbAuthenticator::authenticate() instead of silently swallowing them as 401 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: revert to rustls with webpki-roots fallback for PostgreSQL TLS native-tls/OpenSSL caused silent crashes (segfaults in C code) during DB writes on Railway containers. Switch back to rustls but add webpki-roots as a fallback when system certs are missing, which was the original TLS handshake failure on slim container images. 
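The overflow fix above guards the `u64` to `i64` conversion of `expires_in_days`. A later commit in this list tightens it from a silent clamp to a 400 rejection. A minimal sketch of the reject variant, assuming the handler computes expiry in Unix seconds. `expiry_timestamp` and `ExpiryError` are hypothetical names.

```rust
/// Upper bound taken from the commit messages (about 100 years).
const MAX_EXPIRES_IN_DAYS: u64 = 36_500;
const SECONDS_PER_DAY: i64 = 86_400;

#[derive(Debug, PartialEq)]
enum ExpiryError {
    /// The handler would map this to HTTP 400.
    TooLong(u64),
}

/// Absolute expiry in Unix seconds, or None for a non-expiring token.
fn expiry_timestamp(now_unix: i64, expires_in_days: Option<u64>) -> Result<Option<i64>, ExpiryError> {
    let Some(days) = expires_in_days else {
        return Ok(None);
    };
    if days > MAX_EXPIRES_IN_DAYS {
        return Err(ExpiryError::TooLong(days));
    }
    // days <= 36_500, so neither the cast nor the multiplication can overflow i64.
    let offset = days as i64 * SECONDS_PER_DAY;
    Ok(Some(now_unix.saturating_add(offset)))
}

fn main() {
    // The bug being guarded against: a plain cast wraps to a negative value.
    assert_eq!(u64::MAX as i64, -1);
    let now = 1_700_000_000;
    assert_eq!(expiry_timestamp(now, Some(1)), Ok(Some(now + 86_400)));
    assert_eq!(expiry_timestamp(now, Some(u64::MAX)), Err(ExpiryError::TooLong(u64::MAX)));
}
```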
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: update Cargo.lock for rustls + webpki-roots Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * debug: add /api/debug/db-write endpoint to diagnose user insert failure Temporary diagnostic endpoint that tests DB INSERT to users table with full error logging. No auth required. Will be removed after debugging. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * perf: use cargo-chef in Dockerfile for dependency caching Splits the build into planner/deps/builder stages. Dependencies are only recompiled when Cargo.toml or Cargo.lock change. Source-only changes skip straight to the final build stage. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * debug: add tracing to users_create_handler Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: guard created_by FK in user creation handler The auth identity user_id (from owner_id scope) may not match any user row in the DB, causing a FK violation on the created_by column. Check that the referenced user exists before setting created_by. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: collapse GATEWAY_USER_ID into IRONCLAW_OWNER_ID Remove the separate GATEWAY_USER_ID config. The gateway now uses IRONCLAW_OWNER_ID (config.owner_id) directly for auth identity, bootstrap user creation, and workspace scoping. Previously, with_owner_scope() rebound the auth identity to owner_id while keeping default_sender_id as the gateway user_id. This caused a FK constraint violation when creating users because the auth identity ("default") didn't match any user in the DB ("nearai"). 
Changes: - Remove GATEWAY_USER_ID env var and gateway_user_id from settings - Remove user_id field from GatewayConfig - Add owner_id parameter to GatewayChannel::new() - Remove with_owner_scope() method - Remove default_sender_id from GatewayState - Remove sender override logic in chat/approval handlers - Remove debug endpoint and tracing from prior debugging - Update all tests and E2E fixtures Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: hide Users tab for non-admins, remove auth hint text - Fetch /api/profile after login and hide the Users settings tab when the user's role is not admin - Remove the "Enter the GATEWAY_AUTH_TOKEN" hint from the login page since tokens are now managed via the admin panel, not .env files Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: address review feedback (auth 503, token expiry, CORS PATCH) - DB auth errors now return 503 instead of 401 so outages are distinguishable from invalid tokens (serrrfirat H3) - Cap expires_in_days to 36500 before i64 cast to prevent negative duration from u64 overflow (serrrfirat H1) - Add PATCH to CORS allowed methods for profile/user update endpoints (Copilot) - Stop leaking panic details in CatchPanicLayer response body Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: harden multi-tenant isolation — review fixes from nearai#1614 - Add conversation ownership checks in TenantScope: add_conversation_message, touch_conversation, list_conversation_messages (+ paginated), update_conversation_metadata_field, get_conversation_metadata now return NotFound for conversations not owned by the tenant (cross-tenant data leak) - Fix multi-user heartbeat: clear notify_user_id per runner so notifications persist to the correct user, not the shared config target - Move hygiene tasks into bounded JoinSet instead of unbounded tokio::spawn - Revert send_notification to private visibility (only used within module) - Use 
effective_model_name() for cost attribution in dispatcher so providers that ignore per-request model overrides report the actual model used - Fix inject_model_override doc comment; add 3 unit tests - Fix heartbeat doc comment ("routines" not "active routines") Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add Jobs, Cost, Last Active columns to admin Users table Add UserSummaryStats struct and user_summary_stats() batch query to the UserStore trait (both PostgreSQL and libSQL backends). The admin users list endpoint now fetches per-user aggregates (job count, total LLM spend, most recent activity) in a single query and includes them inline in the response. The frontend Users table displays three new columns. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: address review comments and CI formatting failures CI fixes: - cargo fmt fixes in cli/mod.rs and db/tls.rs Security/correctness (from Copilot + serrrfirat + pranavraja99 reviews): - Token create: reject expires_in_days > 36500 with 400 instead of silent clamp - Token create: return 404 when admin targets non-existent user - User create: map duplicate email constraint violations to 409 Conflict - User create: remove unnecessary DB roundtrip for created_by (use AdminUser directly) - DB auth: log warn on DB lookup failures instead of silently swallowing errors - libSQL: add FK constraints on users.created_by and api_tokens.user_id Config fixes: - agent.multi_tenant: resolve from AGENT_MULTI_TENANT env var instead of hardcoding false - heartbeat.multi_tenant: fix doc comment to match actual env-var-based behavior UI fix: - showTokenBanner: pass correct title ("Token created!" 
vs "User created!") Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: address remaining review comments (round 2) - Secrets handlers: normalize name to lowercase before store operations, validate target user_id exists (returns 404 if not found) - libSQL: propagate cost parsing errors instead of unwrap_or_default() in both user_usage_stats and user_summary_stats - users_list_handler: propagate user_summary_stats DB errors (was silently swallowed with unwrap_or_default) - loadUsers: distinguish 401/403 (admin required) from other errors - Docs: fix users.id type (TEXT not UUID), remove "invitation flow" from V14 migration comment Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: i18n for Users tab, atomic user+token creation, transactional delete_user i18n: - Add 31 translation keys for all Users tab strings (en + zh-CN) - Wire data-i18n attributes on HTML elements (headings, buttons, inputs, table headers, empty state) - Replace all hard-coded strings in app.js with I18n.t() calls Atomic user+token creation: - Add create_user_with_token() to UserStore trait - PostgreSQL: wraps both INSERTs in conn.transaction() with auto-rollback - libSQL: wraps in explicit BEGIN/COMMIT with ROLLBACK on error - Handler uses single atomic call instead of two separate operations Transactional delete_user for libSQL: - Wrap multi-table DELETE cascade in BEGIN/COMMIT transaction - ROLLBACK on any error to prevent partial cleanup / inconsistent state - Matches the PostgreSQL implementation which already used transactions Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: revert V14 migration to match deployed checksum [skip-regression-check] Refinery checksums applied migrations — editing V14__users.sql after it was already applied causes deployment failures. Revert the cosmetic comment changes (added in df40b22) to restore the original checksum. 
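The review fixes above pin down specific status codes for user and token creation: 409 for a duplicate email, 400 for an out-of-range `expires_in_days`, 404 when an admin targets a user that does not exist, and 500 when a backend call fails. A minimal sketch of keeping that mapping in one place. The `CreateError` variants and the `status_code` method are hypothetical, and a real handler would return an axum response rather than a bare `u16`.

```rust
/// Hypothetical error type shared by the user/token creation handlers.
#[derive(Debug)]
enum CreateError {
    /// e.g. expires_in_days > 36500, empty display_name
    InvalidInput(String),
    /// admin targeted a user_id that does not exist
    UserNotFound(String),
    /// unique constraint on users.email violated
    DuplicateEmail(String),
    /// any other backend failure
    Database(String),
}

impl CreateError {
    /// HTTP status the handler should return for each failure.
    fn status_code(&self) -> u16 {
        match self {
            CreateError::InvalidInput(_) => 400,
            CreateError::UserNotFound(_) => 404,
            CreateError::DuplicateEmail(_) => 409,
            CreateError::Database(_) => 500,
        }
    }

    fn message(&self) -> &str {
        match self {
            CreateError::InvalidInput(m)
            | CreateError::UserNotFound(m)
            | CreateError::DuplicateEmail(m)
            | CreateError::Database(m) => m,
        }
    }
}

fn main() {
    let err = CreateError::DuplicateEmail("alice@example.com".into());
    println!("{} {}", err.status_code(), err.message());
}
```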
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: bootstrap onboarding flow for multi-tenant users The bootstrap greeting and workspace seeding only ran for the owner workspace at startup, so new users created via the admin API never received the welcome message or identity files (BOOTSTRAP.md, SOUL.md, AGENTS.md, USER.md, etc.). Three fixes: - tenant_ctx(): seed per-user workspace on first creation via seed_if_empty(), which writes identity files and sets bootstrap_pending when the workspace is truly fresh - handle_message(): check take_bootstrap_pending() on the tenant workspace (not the owner workspace) and persist the greeting to the user's own assistant conversation + broadcast via SSE - WorkspacePool: seed new per-user workspaces in the web gateway so memory tools also see identity files immediately The existing single-user bootstrap in Agent::run() is preserved for non-multi-tenant deployments. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: address remaining PR review comments (round 3) - Docs: fix metadata description from "merge patch" to "full replacement" - Secrets: reject expires_in_days > 36500 with 400 (was silently clamped) - libSQL: CAST(SUM(cost) AS TEXT) in user_usage_stats and user_summary_stats to prevent SQLite numeric coercion from crashing get_text() — this was the root cause of the Copilot "SUM returns numeric type" comments - Add 3 regression tests: user_summary_stats (empty + with data) and user_usage_stats (multi-model aggregation) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add role change support for users (admin/member toggle) - Add update_user_role() to UserStore trait + both backends (PostgreSQL and libSQL) - Extend PATCH /api/admin/users/{id} to accept optional "role" field with validation (must be "admin" or "member") - Add "Make Admin" / "Make Member" toggle button in Users table actions - Add i18n keys for role change (en + zh-CN) - Update API 
docs to document the role field on PATCH - Fix test helpers to use fmt_ts() for timestamps (was using SQLite datetime('now') which produces incompatible format for string comparison) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: show live LLM spend in Users table instead of only DB-recorded costs [skip-regression-check] Chat turns record LLM cost in CostGuard (in-memory) but don't create agent_jobs/llm_calls DB rows — those are only written for background jobs. The Users table was querying only from DB, so it showed $0.00 for users who only chatted. Now supplements DB stats with CostGuard.daily_spend_for_user() — the same source displayed in the status bar token counter. Shows whichever is larger (DB historical total vs live daily spend). Also falls back to last_login_at for "Last Active" when no DB job activity exists. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: persist chat LLM calls to DB and fix usage stats query Two root causes for zero usage stats: 1. ChatDelegate only recorded LLM costs to CostGuard (in-memory) — never to the llm_calls DB table. Added DB persistence via TenantScope.record_llm_call() after each chat LLM call, with job_id=NULL and conversation_id=thread_id. 2. user_summary_stats query only joined agent_jobs→llm_calls, missing chat calls (which have job_id=NULL). Redesigned query to start from llm_calls and resolve user_id via COALESCE(agent_jobs.user_id, conversations.user_id) — covers both job and chat LLM calls. Both PostgreSQL and libSQL queries updated. TenantScope gets record_llm_call() method. Tests updated for new query semantics. 
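The role-change support above accepts only `"admin"` or `"member"`. A minimal sketch of that validation as a parse step, assuming a `Role` enum. The commits keep the role as a `String`, and moving status/role to enums is noted as deferred, so this type is hypothetical. Unknown values are rejected before any database call, which lets the handler answer 400 early.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical enum; the commits above store role as a plain String.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Role {
    Admin,
    Member,
}

impl FromStr for Role {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "admin" => Ok(Role::Admin),
            "member" => Ok(Role::Member),
            other => Err(format!("invalid role {other:?}: must be \"admin\" or \"member\"")),
        }
    }
}

impl fmt::Display for Role {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Same strings as the `role TEXT` column, so values round-trip.
        f.write_str(match self {
            Role::Admin => "admin",
            Role::Member => "member",
        })
    }
}

fn main() {
    let role: Role = "admin".parse().unwrap();
    assert_eq!(role.to_string(), "admin");
    assert!("owner".parse::<Role>().is_err());
}
```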
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: address review comments — input validation, cost semantics, panic safety [skip-regression-check] - Validate display_name: trim whitespace, reject empty strings (create + update) - Validate metadata: must be a JSON object, return 400 if not (admin + profile) - secrets_list_handler: verify target user_id exists before listing - Cost display: use DB total directly (chat calls now persist to DB), remove confusing max(db,live) CostGuard fallback - CatchPanicLayer: truncate panic payload to 200 chars in log to limit potential sensitive data exposure Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: address Copilot round 5 — docs, secrets consistency, token name, provider field [skip-regression-check] - Docs: users.id note updated to "typically UUID v4 strings (bootstrap admin may use a custom ID)" - secrets_list_handler: return 503 when DB store is None (was falling through to list secrets without user validation) - tokens_create: trim + reject empty token name (matching display_name pattern) - LlmCallRecord.provider: use llm_backend ("nearai","openai") instead of model_name() which returns the model identifier - user_summary_stats zero-LLM users: acceptable — handler already falls back to 0 cost and last_login_at for missing entries Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: DB auth returns 503 on outage, scheduler counts only blocking jobs From serrrfirat review: - DB auth: return Err(()) on database errors so middleware returns 503 instead of silently returning Ok(None) → 401 (auth miss) - Scheduler: add parallel_blocking_count_for() that uses is_parallel_blocking() (Pending/InProgress/Stuck) instead of is_active() for per-user concurrency — Completed/Submitted jobs no longer count against MAX_JOBS_PER_USER From Copilot: - CLAUDE.md: fix secrets route paths from {id} to {user_id} - token_hash: use .as_slice() instead of .to_vec() to 
avoid heap allocation on every token auth/creation call Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: immediate auth cache invalidation on security-critical actions (zmanian review nearai#6) Add DbAuthenticator::invalidate_user() that evicts all cached entries for a user. Called after: - Suspend user (immediate lockout, was 60s delay) - Activate user (immediate access restoration) - Role change (admin↔member takes effect immediately) - Token revocation (revoked token can't be reused from cache) The DbAuthenticator is shared (via Clone, which Arc-clones the cache) between the auth middleware and GatewayState, so handlers can evict entries from the same cache the middleware reads. Also from zmanian's review: - Items 1-5, 7-11 were already resolved in prior commits - Item 12 (String→enum for status/role) is deferred as a broader refactor Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: last-admin protection, usage stats for chat calls, UTF-8 safe panic truncation Last-admin protection: - Suspend, delete, and role-demotion of the last active admin now return 409 Conflict instead of succeeding and locking out the admin API - Helper is_last_admin() checks active admin count before destructive ops Usage stats: - user_usage_stats() now includes chat LLM calls (job_id=NULL) by joining via conversations.user_id, matching user_summary_stats() - Both PostgreSQL and libSQL queries updated Panic handler: - Use floor_char_boundary(200) instead of byte-index [..200] to prevent panic on multi-byte UTF-8 characters in panic messages Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: workspace seed race, bootstrap atomicity, email trim, secrets upsert response [skip-regression-check] - WorkspacePool: await seed_if_empty() synchronously after inserting into cache (drop lock first to avoid blocking), so callers see identity files immediately instead of racing a background task - Bootstrap admin: use 
create_user_with_token() for atomic user+token creation, matching the admin create endpoint - Email: trim whitespace, treat empty as None to prevent " " being stored and breaking uniqueness - Secrets PUT: report "updated" vs "created" based on prior existence - Last token_hash.to_vec() → .as_slice() in authenticate_token Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: disable unscoped webhook endpoint in multi-tenant mode [skip-regression-check] The original /api/webhooks/{path} endpoint looks up routines across all users. In multi-tenant mode, anyone who knows the webhook path + secret could trigger another user's routine. Now returns 410 Gone with a message pointing to the scoped endpoint /api/webhooks/u/{user_id}/{path}. Detection uses state.db_auth.is_some() — present only when DB-backed auth is enabled (multi-tenant). Single-user deployments are unaffected. From: standardtoaster review comment Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: webhook multi-tenant check, secrets error propagation, stale doc comment [skip-regression-check] - Webhook: use workspace_pool.is_some() instead of db_auth.is_some() for multi-tenant detection — db_auth is set for any DB deployment, workspace_pool is only set when has_any_users() was true at startup - Secrets: propagate exists() errors instead of unwrap_or(false) so backend outages surface as 500 rather than incorrect "created" status - Config: fix stale workspace_read_scopes comment referencing user_id Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
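Two of the input-hygiene fixes above are easy to get subtly wrong. Slicing `&msg[..200]` panics when byte 200 falls inside a multi-byte character, which is why the panic handler moved to `floor_char_boundary(200)`. An email of `" "` has to become `None` before it reaches a unique column. A minimal std-only sketch of both, with hypothetical helper names. The truncation walks back with `str::is_char_boundary`, which gives the same result as `floor_char_boundary` for toolchains where that method is unavailable.

```rust
/// Longest prefix of `s` that is at most `max_bytes` long and ends on a char boundary.
fn truncate_utf8(s: &str, max_bytes: usize) -> &str {
    if s.len() <= max_bytes {
        return s;
    }
    let mut end = max_bytes;
    // Index 0 is always a boundary, so this loop terminates.
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}

/// Trim an optional email; whitespace-only input becomes None.
fn normalize_email(raw: Option<&str>) -> Option<String> {
    raw.map(str::trim).filter(|e| !e.is_empty()).map(String::from)
}

fn main() {
    let msg = format!("a{}", "é".repeat(150)); // 301 bytes; each 'é' is 2 bytes
    // &msg[..200] would panic: byte 200 is the second byte of an 'é'.
    assert_eq!(truncate_utf8(&msg, 200).len(), 199);
    assert_eq!(normalize_email(Some("  ")), None);
    assert_eq!(normalize_email(Some(" a@b.io ")), Some("a@b.io".to_string()));
}
```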
ilblackdragon added a commit that referenced this pull request Mar 30, 2026
…, SSE broadcast Three hardening fixes from fragility audit: 1. **Clear v2 pending_auth from API path** (#5/#6): The /api/chat/auth-token endpoint now calls `clear_engine_pending_auth()` after storing credentials. Without this, the next chat message would be intercepted as a token retry even though auth was completed via the API endpoint. 2. **Fix E2E binary path resolution** (#1): conftest.py now resolves the actual cargo target-dir from ~/.cargo/config.toml instead of hardcoding `target/debug/`. Also adds `crates/` to the mtime check inputs so engine crate changes trigger rebuilds. 3. **Send AuthCompleted SSE from chat message path** (#7): The chat-message token submission path now broadcasts AuthCompleted via SSE (same as the API path), so the frontend dismisses the auth card immediately. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
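Fix 1 above is a small state-machine bug. The engine remembers that it is waiting for a token, and unless the API path clears that flag, the next ordinary chat message is consumed as a token retry. A minimal sketch, assuming a single pending-auth slot per session. `Session`, `ChatOutcome`, and the method names are hypothetical stand-ins for the engine state that `clear_engine_pending_auth()` resets. The SSE broadcasts (fix 3) are modeled as strings pushed onto an event list.

```rust
/// Hypothetical per-session state.
#[derive(Default)]
struct Session {
    /// Extension currently waiting for a credential, if any.
    pending_auth: Option<String>,
    /// Events that would be broadcast over SSE.
    events: Vec<String>,
}

#[derive(Debug, PartialEq)]
enum ChatOutcome {
    /// Message was consumed as a token for the pending extension.
    TokenSubmitted(String),
    /// Message goes to the agent as normal chat.
    Chat(String),
}

impl Session {
    fn request_auth(&mut self, extension: &str) {
        self.pending_auth = Some(extension.to_string());
    }

    /// Chat path: while auth is pending, the next message is treated as the token.
    fn handle_chat(&mut self, text: &str) -> ChatOutcome {
        match self.pending_auth.take() {
            Some(ext) => {
                // Fix 3: the chat path also announces completion.
                self.events.push(format!("AuthCompleted:{ext}"));
                ChatOutcome::TokenSubmitted(ext)
            }
            None => ChatOutcome::Chat(text.to_string()),
        }
    }

    /// API path: store the credential (elided), then clear the pending flag
    /// so it cannot intercept a later chat message (fix 1).
    fn submit_token_via_api(&mut self, extension: &str) {
        self.pending_auth = None;
        self.events.push(format!("AuthCompleted:{extension}"));
    }
}

fn main() {
    let mut s = Session::default();
    s.request_auth("telegram");
    s.submit_token_via_api("telegram");
    // Without the clear, this greeting would have been consumed as a token.
    assert_eq!(s.handle_chat("hello"), ChatOutcome::Chat("hello".into()));
    assert_eq!(s.events, vec!["AuthCompleted:telegram".to_string()]);
}
```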
ilblackdragon added a commit that referenced this pull request Apr 7, 2026
Document the core design principle from #2049 in two places so future contributors (human and AI) discover it during development: - CLAUDE.md: new "Everything Goes Through Tools" section near the "Adding a New Channel" guide. Includes the rule, the rationale (audit trail, safety pipeline parity, channel-agnostic surface, agent parity), and a pointer to the detailed rule file. - .claude/rules/tools.md: full pattern with required/forbidden examples, the list of layers that ARE exempt (Worker::execute_tool, v2 EffectBridgeAdapter, tool implementations themselves, background engine jobs, read-aggregation queries), and how to annotate intentional exceptions. Also extends `paths` to cover src/channels/** and src/cli/** so it surfaces when those files are edited. Enforce with a new pre-commit safety check (#7) in scripts/pre-commit-safety.sh: - Scans newly added lines under src/channels/web/handlers/*.rs and src/cli/*.rs for direct touches of state.{store, workspace, workspace_pool, extension_manager, skill_registry, session_manager}. - Suppress with a trailing `// dispatch-exempt: <reason>` comment on the same line, matching the existing `// safety:` convention. - Only checks added lines (`+` in the diff), so existing untouched handlers don't trip the check during incremental migration. The check fires only for new code: handlers that haven't been migrated yet (52 existing direct accesses across 12 handler files) won't break unmodified, but any new line that bypasses the dispatcher will be flagged at commit time. Refs: #2049 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
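The new pre-commit check is a line filter over `git diff` output. The real check is a shell script (`scripts/pre-commit-safety.sh`). The Rust sketch below only illustrates the rule as described, using a hypothetical `flags_line` helper. An added line (leading `+`, excluding the `+++` file header) is flagged when it touches one of the listed `state.*` fields and lacks a `// dispatch-exempt:` comment. Path filtering to `src/channels/web/handlers/*.rs` and `src/cli/*.rs` is omitted, and the match is a plain substring test, so it errs on the side of flagging.

```rust
/// Fields that handlers should reach through the tool dispatcher instead.
const GUARDED_FIELDS: [&str; 6] = [
    "store",
    "workspace",
    "workspace_pool",
    "extension_manager",
    "skill_registry",
    "session_manager",
];

/// Returns true if a unified-diff line should fail the check.
fn flags_line(diff_line: &str) -> bool {
    if diff_line.starts_with("+++") {
        return false; // file header, not code
    }
    let Some(code) = diff_line.strip_prefix('+') else {
        return false; // context or removed line: untouched code is not checked
    };
    if code.contains("// dispatch-exempt:") {
        return false; // intentional, annotated exception
    }
    GUARDED_FIELDS
        .iter()
        .any(|field| code.contains(format!("state.{field}").as_str()))
}

fn main() {
    let diff = "\
+++ b/src/channels/web/handlers/jobs.rs
+    let jobs = state.store.list_jobs(&user.id).await?;
+    let ws = state.workspace.clone(); // dispatch-exempt: read-only aggregation
-    let old = state.store.get(id);
     let unchanged = state.store.count();";
    let flagged: Vec<&str> = diff.lines().filter(|l| flags_line(l)).collect();
    assert_eq!(flagged.len(), 1);
    println!("{}", flagged[0]);
}
```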
drchirag1991 pushed a commit to drchirag1991/ironclaw that referenced this pull request Apr 8, 2026
) * feat: add inbound attachment support to WASM channel system Add attachment record to WIT interface and implement inbound media parsing across all four channel implementations (Telegram, Slack, WhatsApp, Discord). Attachments flow from WASM channels through EmittedMessage to IncomingMessage with validation (size limits, MIME allowlist, count caps) at the host boundary. - Add `attachment` record to `emitted-message` in wit/channel.wit - Add `IncomingAttachment` struct to channel.rs and re-export - Add host-side validation (20MB total, 10 max, MIME allowlist) - Telegram: parse photo, document, audio, video, voice, sticker - Slack: parse file attachments with url_private - WhatsApp: parse image, audio, video, document with captions - Discord: backward-compatible empty attachments - Update FEATURE_PARITY.md section 7 - Add fixture-based tests per channel and host integration tests [skip-regression-check] Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: integrate outbound attachment support and reconcile WIT types (nearai#409) Reconcile PR nearai#409's outbound attachment work with our inbound attachment support into a unified design: WIT type split: - `inbound-attachment` in channel-host: metadata-only (id, mime_type, filename, size_bytes, source_url, storage_key, extracted_text) - `attachment` in channel: raw bytes (filename, mime_type, data) on agent-response for outbound sending Outbound features (from PR nearai#409): - `on-broadcast` WIT export for proactive messages without prior inbound - Telegram: multipart sendPhoto/sendDocument with auto photo→document fallback for files >10MB - wrapper.rs: `call_on_broadcast`, `read_attachments` from disk, attachment params threaded through `call_on_respond` - HTTP tool: `save_to` param for binary downloads to /tmp/ (50MB limit, path traversal protection, SSRF-safe redirect following) - Message tool: allow /tmp/ paths for attachments alongside base_dir - Credential env var fallback in 
inject_channel_credentials Channel updates: - All 4 channels implement on_broadcast (Telegram full, others stub) - Telegram: polling_enabled config, adjusted poll timeout - Inbound attachment types renamed to InboundAttachment in all channels Tests: 1965 passing (9 new), 0 clippy warnings [skip-regression-check] Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add audio transcription pipeline and extensible WIT attachment design Add host-side transcription middleware (OpenAI Whisper) that detects audio attachments with inline data on incoming messages and transcribes them automatically. Refactor WIT inbound-attachment to use extras-json and a store-attachment-data host function instead of typed fields, so future attachment properties (dimensions, codec, etc.) don't require WIT changes that invalidate all channel plugins. - Add src/transcription/ module: TranscriptionProvider trait, TranscriptionMiddleware, AudioFormat enum, OpenAI Whisper provider - Add src/config/transcription.rs: TRANSCRIPTION_ENABLED/MODEL/BASE_URL - Wire middleware into agent message loop via AgentDeps - WIT: replace data + duration-secs with extras-json + store-attachment-data - Host: parse extras-json for well-known keys, merge stored binary data - Telegram: download voice files via store-attachment-data, add duration to extras-json, add /file/bot to HTTP allowlist, voice-only placeholder - Add reqwest multipart feature for Whisper API uploads - 5 regression tests for transcription middleware Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: wire attachment processing into LLM pipeline with multimodal image support Attachments on incoming messages are now augmented into user text via XML tags before entering the turn system, and images with data are passed as multimodal content parts (base64 data URIs) to LLM providers. 
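The XML augmentation step described above can be sketched in a few lines of Rust. This is a minimal illustration under assumptions, not the actual `src/agent/attachments.rs` code. The `AttachmentMeta` struct, the `augment_user_text` function, and the `<attachment filename=... mime_type=...>` tag layout are all hypothetical. The sketch already applies the XML escaping that a later commit in this PR adds, so a filename such as `x</attachment>.txt` cannot close the tag early and smuggle text outside it.

```rust
/// Hypothetical host-side view of one attachment (not the PR's real type).
pub struct AttachmentMeta {
    pub filename: String,
    pub mime_type: String,
    pub extracted_text: Option<String>,
}

/// Escape the five XML special characters so attachment metadata cannot
/// close or spoof the surrounding tags.
pub fn escape_xml(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&apos;"),
            _ => out.push(c),
        }
    }
    out
}

/// Append one <attachment> element per attachment after the user's text.
/// The tag layout here is an assumption made for illustration.
pub fn augment_user_text(user_text: &str, attachments: &[AttachmentMeta]) -> String {
    let mut out = String::from(user_text);
    for a in attachments {
        out.push_str(&format!(
            "\n<attachment filename=\"{}\" mime_type=\"{}\">",
            escape_xml(&a.filename),
            escape_xml(&a.mime_type)
        ));
        if let Some(text) = &a.extracted_text {
            out.push_str(&escape_xml(text));
        }
        out.push_str("</attachment>");
    }
    out
}
```

Because the escaping happens at the point where text enters the prompt, channels do not need to be trusted to sanitize filenames themselves.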
This enables audio transcripts, document text, and image content to reach the LLM without changes to ChatMessage serialization or provider interfaces. - Add src/agent/attachments.rs with augment_with_attachments() and 9 unit tests - Add ContentPart/ImageUrl types to llm::provider with OpenAI-compatible serde - Carry image_content_parts transiently on Turn (skipped in serialization) - Update nearai_chat and rig_adapter to serialize multimodal content - Add 3 e2e tests verifying attachments flow through the full agent loop Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: CI failures — formatting, version bumps, and Telegram voice test - Fix cargo fmt formatting in attachments.rs, nearai_chat.rs, rig_adapter.rs, e2e_attachments.rs - Bump channel registry versions 0.1.0 → 0.2.0 (discord, slack, telegram, whatsapp) to satisfy version-bump CI check - Fix Telegram test_extract_attachments_voice: add missing required `duration` field to voice fixture JSON Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: bump WIT channel version to 0.3.0, fix Telegram voice test, add pre-commit hook - Bump wit/channel.wit package version 0.2.0 → 0.3.0 (interface changed with store-attachment-data) - Update WIT_CHANNEL_VERSION constant and registry wit_version fields to match - Fix Telegram test_extract_attachments_voice: gate voice download behind #[cfg(target_arch = "wasm32")] so host functions aren't called in native tests, update assertions for generated filename and extras_json duration - Add @0.3.0 linker stubs in wit_compat.rs - Add .githooks/pre-commit hook that runs scripts/check-version-bumps.sh when WIT or extension sources are staged - Symlink commit-msg regression hook into .githooks/ [skip-regression-check] Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor: extract voice download from extract_attachments into handle_message Move download_voice_file + store_attachment_data calls out of extract_attachments into a separate 
download_and_store_voice function called from handle_message. This keeps extract_attachments as a pure data-mapping function with no host calls, making it fully testable in native unit tests without #[cfg(target_arch)] gates. [skip-regression-check] Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: address PR review comments — security, correctness, and code quality Security fixes: - Add path validation to read_attachments (restrict to /tmp/) preventing arbitrary file reads from compromised tools - Escape XML special characters in attachment filenames, MIME types, and extracted text to prevent prompt injection via tag spoofing - Percent-encode file_id in Telegram getFile URL to prevent query injection - Clone SecretString directly instead of expose_secret().to_string() Correctness fixes: - Fix store_attachment_data overwrite accounting: subtract old entry size before adding new to prevent inflated totals and false rejections - Use max(reported, stored_size) for attachment size accounting to prevent WASM channels from under-reporting size_bytes to bypass limits - Add application/octet-stream to MIME allowlist (channels default unknown types to this) Code quality: - Extract send_response helper in Telegram, deduplicating on_respond and on_broadcast - Rename misleading Discord test to test_parse_slash_command_interaction - Fix .githooks/commit-msg to use relative symlink (portable across machines) [skip-regression-check] Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add tool_upgrade command + fix TOCTOU in save_to path validation Add `tool_upgrade` — a new extension management tool that automatically detects and reinstalls WASM extensions with outdated WIT versions. Preserves authentication secrets during upgrade. Supports upgrading a single extension by name or all installed WASM tools/channels at once. 
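The review commit above includes an overwrite-accounting fix: subtract the old entry's size before adding the new one. That rule is easy to get wrong, so here is a hedged sketch. `AttachmentStore` and its methods are hypothetical stand-ins for whatever the host keeps behind `store-attachment-data`; only the accounting rule comes from the commit text.

```rust
use std::collections::HashMap;

/// Hypothetical per-message store for binary data written by a WASM channel.
pub struct AttachmentStore {
    max_total_bytes: usize,
    total_bytes: usize,
    entries: HashMap<String, Vec<u8>>,
}

impl AttachmentStore {
    pub fn new(max_total_bytes: usize) -> Self {
        Self { max_total_bytes, total_bytes: 0, entries: HashMap::new() }
    }

    /// Store `data` under `id`. If `id` already exists, its old size is
    /// subtracted first, so an overwrite does not inflate the running total.
    pub fn store(&mut self, id: &str, data: Vec<u8>) -> Result<(), String> {
        let old_len = self.entries.get(id).map_or(0, |d| d.len());
        // total_bytes always includes old_len, so this cannot underflow.
        let new_total = self.total_bytes - old_len + data.len();
        if new_total > self.max_total_bytes {
            return Err(format!("attachment data would exceed {} bytes", self.max_total_bytes));
        }
        self.total_bytes = new_total;
        self.entries.insert(id.to_string(), data);
        Ok(())
    }

    pub fn total_bytes(&self) -> usize {
        self.total_bytes
    }
}
```

Without the subtraction, writing 8 bytes and then 9 bytes to the same id would count 17 bytes against a 10-byte budget and reject a legitimate overwrite.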
Fix TOCTOU in `validate_save_to_path`: validate the path *before* creating parent directories, so traversal paths like `/tmp/../../etc/` cannot cause filesystem mutations outside /tmp before being rejected. [skip-regression-check] Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: unify WIT package version to 0.3.0 across tool.wit and all capabilities tool.wit and channel.wit share the `near:agent` package namespace, so they must declare the same version. Bumps tool.wit from 0.2.0 to 0.3.0 and updates all capabilities files and registry entries to match. Fixes `cargo component build` failure: "package identifier near:agent@0.2.0 does not match previous package name of near:agent@0.3.0" [skip-regression-check] Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: move WIT file comments after package declaration WIT treats `//` comments before `package` as doc comments. When both tool.wit and channel.wit had header comments, the parser rejected them as "doc comments on multiple 'package' items". Move comments after the package declaration in both files. Also bumps tool registry versions to 0.2.0 to match the WIT 0.3.0 bump. [skip-regression-check] Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: display extension versions in gateway Extensions tab Add version field to InstalledExtension and RegistryEntry types, pipe through the web API (ExtensionInfo, RegistryEntryInfo), and render as a badge in the gateway UI for both installed and available extensions. For installed WASM extensions, version is read from the capabilities file with a fallback to the registry entry when the local file has no version (old installations). Bump all extension Cargo.toml and registry JSON versions from 0.1.0 to 0.2.0 to keep them in sync. 
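The TOCTOU fix is mostly a matter of ordering. First decide whether the path is acceptable using pure path logic, and only then touch the filesystem. This is a sketch under stated assumptions: `check_save_path` is a hypothetical helper, not the PR's `validate_save_to_path`. It normalizes `..` lexically with `std::path::Component` and does not resolve symlinks, which a production check would also need to consider.

```rust
use std::path::{Component, Path, PathBuf};

/// Lexically normalize `requested` and require it to stay strictly under `root`.
/// No filesystem access happens here, so a rejected path never causes
/// directories to be created.
pub fn check_save_path(requested: &str, root: &Path) -> Result<PathBuf, String> {
    let path = Path::new(requested);
    if !path.is_absolute() {
        return Err(format!("path must be absolute: {requested}"));
    }
    let mut normalized = PathBuf::new();
    for component in path.components() {
        match component {
            Component::RootDir | Component::Prefix(_) => normalized.push(component.as_os_str()),
            Component::CurDir => {}
            Component::ParentDir => {
                // Refuse to climb above the filesystem root.
                if !normalized.pop() {
                    return Err(format!("invalid path: {requested}"));
                }
            }
            Component::Normal(part) => normalized.push(part),
        }
    }
    // Path::starts_with compares whole components, so "/tmpfoo" is not under "/tmp".
    if !normalized.starts_with(root) || normalized == root {
        return Err(format!("path escapes {}: {requested}", root.display()));
    }
    Ok(normalized)
}
```

A caller would then run `std::fs::create_dir_all` on the returned path's parent. That way `/tmp/../../etc/x` is rejected before any directory exists.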
[skip-regression-check] Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add document text extraction middleware for PDF, Office, and text files Extract text from document attachments (PDF, DOCX, PPTX, XLSX, RTF, plain text, code files) so the LLM can reason about uploaded documents. Uses pdf-extract for PDFs, zip+XML parsing for Office XML formats, and UTF-8 decode for text files. Wired into the agent loop after transcription middleware. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: download document files in Telegram channel for text extraction The DocumentExtractionMiddleware needs file bytes in the attachment `data` field, but only voice files were being downloaded. Document attachments (PDFs, DOCX, etc.) had empty `data` and a source_url with a credential placeholder that only works inside the WASM host's http_request. Add `download_and_store_documents()` that downloads non-voice, non-image, non-audio attachments via the existing two-step getFile→download flow and stores bytes via `store_attachment_data` for host-side extraction. Also rename `download_voice_file` → `download_telegram_file` since it's generic for any file_id. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: allow Office MIME types and increase file download limit for Telegram Two issues preventing document extraction from Telegram: 1. PPTX/DOCX/XLSX MIME types (application/vnd.*) were dropped by the WASM host attachment allowlist — add application/vnd., application/msword, and application/rtf prefixes. 2. Telegram file downloads over 10 MB failed with "Response body too large" — set max_response_bytes to 20 MB in Telegram capabilities. 
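The host-side limits from these commits can be combined into a single filter:

- at most 10 attachments;
- 20 MB total;
- a MIME prefix allowlist;
- sizing by the larger of the reported and stored byte counts.

A hedged sketch follows, with every assumption labeled in the code:

- 20 MB is read as `20 * 1024 * 1024` bytes.
- Disallowed or over-budget attachments are dropped rather than failing the whole message.
- Of the allowlist entries, only `application/octet-stream`, `application/vnd.`, `application/msword`, and `application/rtf` are named in the commits. The rest are illustrative.
- `InboundMeta` and `filter_attachments` are hypothetical.

```rust
/// Hypothetical inbound attachment as seen by the host after a channel emits it.
pub struct InboundMeta {
    pub mime_type: String,
    pub reported_size: u64,
    pub stored_len: u64,
}

pub const MAX_ATTACHMENTS: usize = 10;
/// Assumption: "20MB" interpreted as 20 MiB.
pub const MAX_TOTAL_BYTES: u64 = 20 * 1024 * 1024;

/// Allowlisted MIME prefixes. The first five entries are assumptions for this sketch.
const ALLOWED_PREFIXES: &[&str] = &[
    "image/", "audio/", "video/", "text/", "application/pdf",
    "application/octet-stream", "application/vnd.", "application/msword", "application/rtf",
];

pub fn mime_allowed(mime: &str) -> bool {
    let mime = mime.to_ascii_lowercase();
    ALLOWED_PREFIXES.iter().any(|p| mime.starts_with(p))
}

/// Keep allowlisted attachments, in order, until the count or byte budget runs out.
pub fn filter_attachments(attachments: Vec<InboundMeta>) -> Vec<InboundMeta> {
    let mut kept = Vec::new();
    let mut total: u64 = 0;
    for a in attachments {
        if kept.len() == MAX_ATTACHMENTS {
            break;
        }
        if !mime_allowed(&a.mime_type) {
            continue;
        }
        // Use the larger size so a channel cannot under-report to slip past the limit.
        let size = a.reported_size.max(a.stored_len);
        if total.saturating_add(size) > MAX_TOTAL_BYTES {
            continue;
        }
        total += size;
        kept.push(a);
    }
    kept
}
```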
[skip-regression-check] Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: report document extraction errors back to user instead of silently skipping - Bump max_response_bytes to 50 MB for Telegram file downloads - When document extraction fails (too large, download error, parse error), set extracted_text to a user-friendly error message instead of leaving it None. This ensures the LLM tells the user what went wrong. - On Telegram download failure, set extracted_text with the error so the user sees feedback even when the file never reaches the extraction middleware. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: store extracted document text in workspace memory for search/recall After document extraction succeeds, write the extracted text to workspace memory at `documents/{date}/{filename}`. This enables: - Full-text and semantic search over past uploaded documents - Cross-conversation recall ("what did that PDF say?") - Automatic chunking and embedding via the workspace pipeline Documents are stored with metadata header (uploader, channel, date, MIME type). Error messages (extraction failures) are not stored — only successful extractions. 
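The `documents/{date}/{filename}` path embeds a user-supplied filename, which is why a later commit sanitizes it. The minimal sketch below assumes a simple policy: replace anything other than alphanumerics, `.`, `-`, and `_` with `_`, then strip leading dots. `sanitize_filename` and `document_memory_path` are hypothetical names, and the PR's actual rules may differ.

```rust
/// Reduce an uploaded filename to a single safe path segment.
/// The replacement policy is an assumption for illustration.
pub fn sanitize_filename(name: &str) -> String {
    let cleaned: String = name
        .chars()
        .map(|c| if c.is_alphanumeric() || matches!(c, '.' | '-' | '_') { c } else { '_' })
        .collect();
    // Strip leading dots so ".", ".." and hidden-file names cannot appear as the segment.
    let trimmed = cleaned.trim_start_matches('.');
    if trimmed.is_empty() {
        "unnamed".to_string()
    } else {
        trimmed.to_string()
    }
}

/// Build the workspace path used to store extracted document text.
pub fn document_memory_path(date: &str, filename: &str) -> String {
    format!("documents/{}/{}", date, sanitize_filename(filename))
}
```

Since `/` and `\` are always replaced, the result can never add extra path segments, whatever the uploader named the file.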
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: CI failures — formatting, unused assignment warning - Run cargo fmt on document_extraction and agent_loop modules - Suppress unused_assignments warning on trace_llm_ref (used only behind #[cfg(feature = "libsql")]) [skip-regression-check] Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: address PR review comments — security, correctness, and code quality Security fixes: - Remove SSRF-prone download() from DocumentExtractionMiddleware (nearai#13) - Sanitize filenames in workspace path to prevent directory traversal (nearai#11) - Pre-check file size before reading in WASM wrapper to prevent OOM (nearai#2) - Percent-encode file_id in Telegram source URLs (nearai#7) Correctness fixes: - Clear image_content_parts on turn end to prevent memory leak (nearai#1) - Find first *successful* transcription instead of first overall (nearai#3) - Enforce data.len() size limit in document extraction (nearai#10) - Use UTF-8 safe truncation with char_indices() (nearai#12) Robustness & code quality: - Add 120s timeout to OpenAI Whisper HTTP client (nearai#5) - Trim trailing slash from Whisper base_url (nearai#6) - Allow ~/.ironclaw/ paths in WASM wrapper (nearai#8) - Return error from on_broadcast in Slack/Discord/WhatsApp (nearai#9) - Fix doc comment in HTTP tool (nearai#4) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: formatting — cargo fmt Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: address latest PR review — doc comments, error messages, version bumps - Fix DocumentExtractionMiddleware doc comment (no longer downloads from source_url) - Fix error message: "no inline data" instead of "no download URL" - Log error + fallback instead of silent unwrap_or_default on Whisper HTTP client - Bump all capabilities.json versions from 0.1.0 to 0.2.0 to match Cargo.toml Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: remove unsupported profile: minimal from CI 
workflows [skip-regression-check] dtolnay/rust-toolchain@stable does not accept the 'profile' input (it was a parameter for the deprecated actions-rs/toolchain action). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: merge with latest main — resolve compilation errors and PR review nits - Add version: None to RegistryEntry/InstalledExtension test constructors - Fix MessageContent type mismatches in nearai_chat tests (String → MessageContent::Text) - Fix .contains() calls on MessageContent — use .as_text().unwrap() - Remove redundant trace_llm_ref = None assignment in test_rig - Check data size before clone in document extraction to avoid unnecessary allocation [skip-regression-check] Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
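The UTF-8 safe truncation fix (nearai#12) matters because slicing a `&str` at a byte index that falls inside a multi-byte character panics. Below is a hedged sketch using `char_indices()`, as the commit describes. `truncate_utf8` is a hypothetical name, and treating the limit as a byte count is an assumption.

```rust
/// Truncate `s` to at most `max_bytes` bytes without splitting a UTF-8 character.
/// A plain `&s[..max_bytes]` would panic if `max_bytes` landed inside a character.
pub fn truncate_utf8(s: &str, max_bytes: usize) -> &str {
    if s.len() <= max_bytes {
        return s;
    }
    // The largest character start that is <= max_bytes is a valid cut point.
    let end = s
        .char_indices()
        .map(|(i, _)| i)
        .take_while(|&i| i <= max_bytes)
        .last()
        .unwrap_or(0);
    &s[..end]
}
```

For example, `"héllo"` truncated to 2 bytes yields `"h"`, because `é` occupies bytes 1..3 and cannot be split.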
drchirag1991
pushed a commit
to drchirag1991/ironclaw
that referenced
this pull request
Apr 8, 2026
…i-tenant isolation (nearai#1626) * feat: complete multi-tenant isolation — per-user budgets, model selection, heartbeat cycling Finishes the remaining isolation work from phases 2–4 of nearai#59: Phase 2 (DB scoping): Fix /status and /list commands to use _for_user DB variants instead of global queries that leaked cross-user job data. Phase 3 (Runtime isolation): Per-user workspace in routine engine's spawn_fire so lightweight routines run in the correct user context. Per-user daily cost tracking in CostGuard with configurable budget via MAX_COST_PER_USER_PER_DAY_CENTS. Multi-user heartbeat that cycles through all users with routines, auto-detected from GATEWAY_USER_TOKENS. Phase 4 (Provider/tools): Per-user model selection via preferred_model setting — looked up from SettingsStore on first iteration, threaded through ReasoningContext.model_override to CompletionRequest. Works with providers that support per-request model overrides (NearAI). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: use selected_model setting key to match /model command persistence The dispatcher was reading "preferred_model" but the /model command (merged from staging) persists to "selected_model". Since set_setting is already per-user scoped, using the same key makes /model work as the per-user model override in multi-tenant mode. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: heartbeat hygiene, /model multi-tenant guard, RigAdapter model override Three follow-up fixes for multi-tenant isolation: 1. Multi-user heartbeat now runs memory hygiene per user before each heartbeat check, matching single-user heartbeat behavior. 2. /model command in multi-tenant mode only persists to per-user settings (selected_model) without calling set_model() on the shared LlmProvider. The per-request model_override in the dispatcher reads from the same setting. Added multi_tenant flag to AgentConfig (auto-detected from GATEWAY_USER_TOKENS). 3. 
RigAdapter now supports per-request model overrides by injecting the model name into rig-core's additional_params. OpenAI/Anthropic/Ollama API servers use last-key-wins for duplicate JSON keys, so the override takes effect via serde's flatten serialization order. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: address PR review — cost model attribution, heartbeat concurrency, pruning Fixes from review comments on nearai#1614: - Cost tracking now uses the override model name (not active_model_name) when a per-user model override is active, for accurate attribution. - Multi-user heartbeat runs per-user checks concurrently via JoinSet instead of sequentially, preventing one slow user from blocking others. - Per-user failure counts tracked independently; users exceeding max_failures are skipped (matching single-user semantics). - per_user_daily_cost HashMap pruned on day rollover to prevent unbounded growth in long-lived deployments. - Doc comment fixed: says "routines" not "active routines". Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: /status ownership, model persistence scoping, heartbeat robustness Addresses second round of PR review on nearai#1614: - /status <job_id> DB path now validates job.user_id == requesting user before returning data (was missing ownership check, security fix). - persist_selected_model takes user_id param instead of owner_id, and skips .env/TOML writes in multi-tenant mode (these are shared global files). handle_system_command now receives user_id from caller. - JoinSet collection handles Err(JoinError) explicitly instead of silently dropping panicked tasks. - Notification forwarder extracts owner_id from response metadata in multi-tenant mode for per-user routing instead of broadcasting to the agent owner. 
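The per-user daily budget with day-rollover pruning can be sketched with a plain `HashMap`. This illustrates the idea under assumptions; it is not `CostGuard` itself. `PerUserDailyCost` is hypothetical, the caller supplies a day number, and pruning clears every entry when the day changes, the simplest form of the pruning the commit mentions.

```rust
use std::collections::HashMap;

/// Hypothetical per-user daily budget tracker. `day` is any increasing day
/// number (for example, days since the Unix epoch) supplied by the caller.
pub struct PerUserDailyCost {
    limit_cents: u64,
    current_day: u64,
    spent_cents: HashMap<String, u64>,
}

impl PerUserDailyCost {
    pub fn new(limit_cents: u64, day: u64) -> Self {
        Self { limit_cents, current_day: day, spent_cents: HashMap::new() }
    }

    /// On a new day, drop all entries so the map cannot grow without bound.
    fn roll_over(&mut self, day: u64) {
        if day != self.current_day {
            self.current_day = day;
            self.spent_cents.clear();
        }
    }

    /// Returns false once `user` has reached today's limit.
    pub fn check(&mut self, user: &str, day: u64) -> bool {
        self.roll_over(day);
        self.spent_cents.get(user).copied().unwrap_or(0) < self.limit_cents
    }

    pub fn record(&mut self, user: &str, day: u64, cents: u64) {
        self.roll_over(day);
        *self.spent_cents.entry(user.to_string()).or_insert(0) += cents;
    }

    pub fn tracked_users(&self) -> usize {
        self.spent_cents.len()
    }
}
```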
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: cost pricing, fire_manual workspace, heartbeat concurrency cap Round 3 review fixes: - Cost tracking passes None for cost_per_token when model override is active, letting CostGuard look up pricing by model name instead of using the default provider's rates (serrrfirat). - fire_manual() now uses per-user workspace, matching spawn_fire() pattern (serrrfirat). - Removed MULTI_TENANT env var — multi-tenant mode is auto-detected solely from GATEWAY_USER_TOKENS presence (serrrfirat + Copilot). - Multi-user heartbeat capped at 8 concurrent tasks to avoid flooding the LLM provider (serrrfirat + Copilot). - Fixed inject_model_override doc comment accuracy (Copilot). - Added comment explaining multi-tenant notification routing priority (Copilot). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: user-scoped webhook endpoint for multi-tenant isolation Adds POST /api/webhooks/u/{user_id}/{path} — a user-scoped webhook endpoint that filters the routine lookup by user_id, preventing cross-user webhook triggering when paths collide. The existing /api/webhooks/{path} endpoint remains unchanged for backward compatibility in single-user deployments. Changes: - get_webhook_routine_by_path gains user_id: Option<&str> param - Both postgres and libsql implementations add AND user_id = ? 
filter when user_id is provided - New webhook_trigger_user_scoped_handler extracts (user_id, path) from URL and passes to shared fire_webhook_inner logic - Route registered on public router (webhooks are called by external services that can't send bearer tokens) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(db): add UserStore trait with users, api_tokens, invitations tables Foundation for DB-backed user management (nearai#1605): - UserRecord, ApiTokenRecord, InvitationRecord types in db/mod.rs - UserStore sub-trait (17 methods) added to Database supertrait - PostgreSQL migration V14__users.sql (users, api_tokens, invitations) - libSQL schema + incremental migration V14 - Full implementations for both PgBackend (via Store delegation) and LibSqlBackend (direct SQL in libsql/users.rs) - authenticate_token JOINs api_tokens+users with active/non-revoked checks; has_any_users for bootstrap detection Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(web): DB-backed auth, user/token/invitation API handlers Adds the web gateway layer for DB-backed user management (nearai#1605): Auth refactor: - CombinedAuthState wraps env-var tokens (MultiAuthState) + optional DbAuthenticator for DB-backed token lookup with LRU cache (60s TTL, 1024 max entries) - auth_middleware tries env-var tokens first, then DB fallback - From<MultiAuthState> impl for backward compatibility - main.rs wires with_db_auth when database is available API handlers (12 new endpoints): - /api/admin/users — CRUD: create, list, detail, update, suspend, activate - /api/tokens — create (returns plaintext once), list, revoke - /api/invitations — create, list, accept (creates user + first token) Token creation: 32 random bytes → hex plaintext, SHA-256 hash stored. Invitation accept: validates hash + pending + not expired, creates user record and first API token atomically. All test files updated for CombinedAuthState type change. 
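The token cache has two bounds: a 60 s TTL and a 1024-entry cap. Both can be sketched with std only. `TokenCache` is hypothetical: it keys on a token-hash string and tracks recency with a `VecDeque`, which costs O(n) per access. A later commit in this PR replaces the original `HashMap` with `lru::LruCache`, which is the better choice in real code.

```rust
use std::collections::{HashMap, VecDeque};
use std::time::{Duration, Instant};

/// Hypothetical token-hash -> user_id cache bounded by both a TTL and an entry cap.
/// Recency lives in a VecDeque (front = least recently used).
pub struct TokenCache {
    ttl: Duration,
    capacity: usize,
    entries: HashMap<String, (String, Instant)>, // key -> (user_id, inserted_at)
    order: VecDeque<String>,
}

impl TokenCache {
    pub fn new(ttl: Duration, capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be at least 1");
        Self { ttl, capacity, entries: HashMap::new(), order: VecDeque::new() }
    }

    fn remove_from_order(&mut self, key: &str) {
        if let Some(pos) = self.order.iter().position(|k| k == key) {
            self.order.remove(pos);
        }
    }

    /// Return the cached user for `key`, treating entries older than the TTL as misses.
    pub fn get(&mut self, key: &str, now: Instant) -> Option<String> {
        let (user_id, inserted_at) = self.entries.get(key)?.clone();
        self.remove_from_order(key);
        if now.saturating_duration_since(inserted_at) >= self.ttl {
            // Expired: drop it so the caller falls back to the database lookup.
            self.entries.remove(key);
            return None;
        }
        // Mark as most recently used.
        self.order.push_back(key.to_string());
        Some(user_id)
    }

    /// Insert or refresh `key`, evicting the least recently used entry when full.
    pub fn insert(&mut self, key: &str, user_id: &str, now: Instant) {
        if self.entries.contains_key(key) {
            self.remove_from_order(key);
        } else if self.entries.len() == self.capacity {
            if let Some(oldest) = self.order.pop_front() {
                self.entries.remove(&oldest);
            }
        }
        self.entries.insert(key.to_string(), (user_id.to_string(), now));
        self.order.push_back(key.to_string());
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```

`TokenCache::new(Duration::from_secs(60), 1024)` would mirror the limits described above.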
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: startup env-var user migration + UserStore integration tests Completes the DB-backed user management feature (nearai#1605): - Startup migration: when GATEWAY_USER_TOKENS is set and the users table is empty, inserts env-var users + hashed tokens into DB. Logs deprecation notice when DB already has users. - hash_token made pub for reuse in migration code. - 10 integration tests for UserStore (libsql file-backed): - has_any_users bootstrap detection - create/get/get_by_email/list/update user lifecycle - token create → authenticate → revoke → reject cycle - suspended user tokens rejected - wrong-user token revoke returns false - invitation create → accept → user created - record_login and record_token_usage timestamps - libSQL migration: removed FK constraints from V14 (incompatible with execute_batch inside transactions). Tables in both base SCHEMA and incremental migration for fresh and existing databases. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: remove GATEWAY_USER_TOKENS, fix review feedback GATEWAY_USER_TOKENS never went to production — replaced entirely by DB-backed user management via /api/admin/users and /api/tokens. 
Removed: - UserTokenConfig struct and GATEWAY_USER_TOKENS env var parsing - user_tokens field from GatewayConfig - GatewayChannel::new_multi_auth() constructor - Env-var user migration block in main.rs (~90 lines) - multi_tenant auto-detection from GATEWAY_USER_TOKENS (now runtime via db.has_any_users() in app.rs) Review fixes (zmanian): - User ID generation: UUID instead of display-name derivation (nearai#1) - Invitation accept moved to public router (no auth needed) (nearai#3) - libSQL get_invitation_by_hash aligned with postgres: filters status='pending' AND expires_at > now (nearai#4) - UUID parse: returns DatabaseError::Serialization instead of unwrap_or_default (nearai#7) - PostgreSQL SELECT * replaced with explicit column lists (nearai#8) - Sort order aligned (both backends use DESC) (nearai#6) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add role-based access control (admin/member) Adds a `role` field (admin|member) to user management: Schema: - `role TEXT NOT NULL DEFAULT 'member'` added to users table in both PostgreSQL V14 migration and libSQL schema/incremental migration - UserRecord gains `role: String` field - UserIdentity gains `role: String` field, populated from DB in DbAuthenticator and defaulting to "admin" for single-user mode Access control: - AdminUser extractor: returns 403 Forbidden if role != "admin" - /api/admin/users/* handlers: require AdminUser (create, list, detail, update, suspend, activate) - POST /api/invitations: requires AdminUser (only admins can invite) - User creation accepts optional "role" param (defaults to "member") - Invitation acceptance creates users with "member" role Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(web): add Users admin tab to web UI Adds a Users tab to the web gateway UI for managing users, tokens, and roles without needing direct API calls. 
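The `AdminUser` extractor idea (members get 403, admins proceed) can be expressed without any web framework by making "is admin" a type. In this sketch, `UserIdentity`, `AdminUser`, and `list_users` are simplified stand-ins, and the status code is a bare `u16`. In an axum app the same check would typically live in a custom extractor, so handlers that take `AdminUser` never run for members.

```rust
use std::convert::TryFrom;

/// Simplified stand-in for the authenticated identity, with the new `role` field.
#[derive(Debug, Clone, PartialEq)]
pub struct UserIdentity {
    pub user_id: String,
    pub role: String,
}

/// Proof that the caller is an admin; only obtainable through `try_from`.
#[derive(Debug)]
pub struct AdminUser(pub UserIdentity);

pub const FORBIDDEN: u16 = 403;

impl TryFrom<UserIdentity> for AdminUser {
    type Error = u16;

    fn try_from(id: UserIdentity) -> Result<Self, Self::Error> {
        if id.role == "admin" {
            Ok(AdminUser(id))
        } else {
            Err(FORBIDDEN)
        }
    }
}

/// Hypothetical admin-only operation: requiring `&AdminUser` makes the check unskippable.
pub fn list_users(_admin: &AdminUser, all: &[UserIdentity]) -> Vec<String> {
    all.iter().map(|u| u.user_id.clone()).collect()
}
```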
Features: - User list table with ID, name, email, role, status, created date - Create user form with display name, email, role selector - Suspend/activate actions per user - Create API token for any user (shows plaintext once with copy button) - Role badges (admin highlighted, member muted) - Non-admin users see "Admin access required" message - Keyboard shortcut: Cmd/Ctrl+5 switches to Users tab CSS: - Reuses routines-table styles for the user list - Badge, token-display, btn-small, btn-danger, btn-primary components Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: move Users to Settings subtab, bootstrap admin user on first run - Moved Users from top-level tab to Settings sidebar subtab (under Skills, before Theme toggle) - On first startup with empty users table, automatically creates an admin user from GATEWAY_USER_ID config with a corresponding API token from GATEWAY_AUTH_TOKEN. This ensures the owner appears in the Users panel immediately. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: user creation shows token, + Token works, no password save popup Three UI/UX fixes: 1. Create user now generates an initial API token and shows it in a copy-able banner instead of triggering the browser's password save dialog. Uses autocomplete="off" and type="text" for email field. 2. "+ Token" button works: exposed createTokenForUser/suspendUser/ activateUser on window for inline onclick handlers in dynamically generated table rows. Token creation uses showTokenBanner helper. 3. Admin token creation: POST /api/tokens now accepts optional "user_id" field when the requesting user is admin, allowing token creation for other users from the Users panel. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: use event delegation for user action buttons (CSP compliance) Inline onclick handlers are blocked by the Content-Security-Policy (script-src 'self' without 'unsafe-inline'). 
Switched to data-action attributes with a delegated click listener on the users table. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: add i18n for Users subtab, show login link on user creation - Added 'settings.users' i18n key for English and Chinese - Token banner now shows a full login link (domain/?token=xxx) with a Copy Link button, plus the raw token below - Login link works automatically via existing ?token= auto-auth Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: token hash mismatch — hash hex string, not raw bytes Critical auth bug: token creation hashed the raw 32 bytes (hasher.update(token_bytes)) but authentication hashed the hex-encoded string (hash_token(candidate) where candidate is the hex string the user sends). This meant newly created tokens could never authenticate. Fixed all 4 token creation sites (users, tokens, invitations create, invitations accept) to use hash_token(&plaintext_token) which hashes the hex string consistently with the auth lookup path. Removed now-unused sha2::Digest imports from handlers. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: remove invitation system The invitation flow is redundant — admin create user already generates a token and shows a login link. Invitations add complexity without value until email integration exists. Removed: - InvitationRecord struct and 4 UserStore trait methods - invitations table from V14 migration (postgres + both libsql schemas) - PostgreSQL Store methods (create/get/accept/list invitations) - libSQL UserStore invitation methods + row_to_invitation helper - invitations.rs handler file (212 lines) - /api/invitations routes (create, list, accept) - test_invitation_lifecycle test Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: user deletion, self-service profile, per-user job limits, usage API Four multi-tenancy improvements: 1. 
User deletion cascade (DELETE /api/admin/users/{id}): Deletes user and all data across 11 user-scoped tables (settings, secrets, routines, memory, jobs, conversations, etc.). Admin only. 2. Self-service profile (GET/PATCH /api/profile): Users can read and update their own display_name and metadata without admin privileges. 3. Per-user job concurrency (MAX_JOBS_PER_USER env var): Scheduler checks active_jobs_for(user_id) before dispatch. Prevents one user from exhausting all job slots. 4. Usage reporting (GET /api/admin/usage?user_id=X&period=day|week|month): Aggregates LLM costs from llm_calls via agent_jobs.user_id. Returns per-user, per-model breakdown of calls, tokens, and cost. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add TenantCtx for compile-time tenant isolation Implements zmanian's architectural proposal from nearai#1614 review: two-tier scoped database access (TenantScope/AdminScope) so handler code cannot accidentally bypass tenant scoping. TenantScope (default): wraps user_id + Arc<dyn Database>, auto-binds user_id on every operation. ID-based lookups return None for cross- tenant resources. No escape hatch — forgetting to scope is a compile error. AdminScope (explicit opt-in): cross-tenant access for system-level components (heartbeat, routine engine, self-repair, scheduler, worker). TenantCtx bundles TenantScope + workspace + cost guard + per-user rate limiting. Constructed once per request in handle_message, threaded through all command handlers and ChatDelegate. 
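The core of `TenantScope` is that the user id is bound once and every query filters on it. Below is a minimal in-memory sketch under assumptions. The PR's version wraps `Arc<dyn Database>`; this one borrows a hypothetical `Store` of `Job` rows and shows only two queries.

```rust
use std::collections::HashMap;

/// Hypothetical job row with an owning user.
#[derive(Debug, Clone, PartialEq)]
pub struct Job {
    pub id: u64,
    pub user_id: String,
    pub title: String,
}

/// Unscoped store that can see every tenant's rows.
#[derive(Default)]
pub struct Store {
    jobs: HashMap<u64, Job>,
}

impl Store {
    pub fn insert(&mut self, job: Job) {
        self.jobs.insert(job.id, job);
    }
}

/// Tenant-bound view: the user id is fixed at construction and applied to every query.
pub struct TenantScope<'a> {
    user_id: String,
    store: &'a Store,
}

impl<'a> TenantScope<'a> {
    pub fn new(user_id: &str, store: &'a Store) -> Self {
        Self { user_id: user_id.to_string(), store }
    }

    /// ID lookups return None for rows owned by another tenant.
    pub fn get_job(&self, id: u64) -> Option<&'a Job> {
        let store: &'a Store = self.store;
        store.jobs.get(&id).filter(|job| job.user_id == self.user_id)
    }

    /// Lists only this tenant's jobs, sorted by id.
    pub fn list_jobs(&self) -> Vec<&'a Job> {
        let store: &'a Store = self.store;
        let mut jobs: Vec<&'a Job> =
            store.jobs.values().filter(|job| job.user_id == self.user_id).collect();
        jobs.sort_by_key(|job| job.id);
        jobs
    }
}
```

Handlers receive a `TenantScope` rather than the `Store`, so there is no unscoped lookup for them to call by mistake. Cross-tenant access requires deliberately holding the store itself, which is the role `AdminScope` plays in the PR.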
Key changes: - New src/tenant.rs (~920 lines): TenantScope, AdminScope, TenantCtx, TenantRateState, TenantRateRegistry - All command handlers: user_id: &str → ctx: &TenantCtx - ChatDelegate: cost check/record/settings via self.tenant - System components: store field changed to AdminScope - Config: TENANT_MAX_LLM_CONCURRENT, TENANT_MAX_JOBS_CONCURRENT env vars - Fixes bug: /status <job_id> cross-tenant leak (now auto-filtered) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: address PR nearai#1626 review feedback — bounded LRU cache, admin auth, FK cleanup - Replace HashMap with lru::LruCache in DbAuthenticator so the token cache is hard-bounded at 1024 entries (evicts LRU, not just expired) - Gate admin user endpoints (list/detail/update/suspend/activate) with AdminUser extractor so members get 403 instead of full access - Add api_tokens to libSQL delete_user cleanup list to prevent orphaned tokens (libSQL has no FK cascade) - Add regression tests for all three fixes Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: update CA certificates in runtime Docker image Ensures the root certificate bundle is current so TLS handshakes to services like Supabase succeed on Railway. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: resolve CI failures — formatting, no-panics check - Run cargo fmt on test code - Replace .expect() with const NonZeroUsize in DbAuthenticator - Add // safety: comments for test-only code in multi_tenant.rs Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: switch PostgreSQL TLS from rustls to native-tls rustls with rustls-native-certs fails TLS handshake on Railway's slim container (empty or stale root cert store). native-tls delegates to OpenSSL on Linux which handles system certs more reliably. 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Adding user management api * feat: admin secrets provisioning API + API documentation - Add PUT/GET/DELETE /api/admin/users/{id}/secrets/{name} endpoints for application backends to provision per-user secrets (AES-256-GCM encrypted) - Add secrets_store field to GatewayState with builder wiring - Create docs/USER_MANAGEMENT_API.md with full API spec covering users, secrets, tokens, profile, and usage endpoints - Update web gateway CLAUDE.md route table Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: add CatchPanicLayer to capture handler panics Without this, panics in async handlers silently drop the connection and the edge proxy returns a generic 503. Now panics are caught, logged, and returned as 500 with the panic message. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: address second-round review — transactional delete, overflow, error logging - C1: Wrap PostgreSQL delete_user() in a transaction so partial cleanup can't leave users in a half-deleted state - M2: Add job_events to delete cleanup (both backends) — FK to agent_jobs without CASCADE would cause FK violation - H1/M4: Cap expires_in_days to 36500 before i64 cast (tokens + secrets) - H2: Validate target user exists before creating admin token to prevent orphan tokens on libSQL - H3: Log DB errors in DbAuthenticator::authenticate() instead of silently swallowing them as 401 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: revert to rustls with webpki-roots fallback for PostgreSQL TLS native-tls/OpenSSL caused silent crashes (segfaults in C code) during DB writes on Railway containers. Switch back to rustls but add webpki-roots as a fallback when system certs are missing, which was the original TLS handshake failure on slim container images. 
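Two problems make the `expires_in_days` cap necessary. Casting a `u64` of 2^63 or more to `i64` produces a negative number, and even in-range values overflow once multiplied into seconds. The sketch assumes the expiry is computed in Unix seconds. `expiry_seconds` and `expires_at` are hypothetical helpers; 36500 is the cap from the commit text.

```rust
/// Cap from the commit text (roughly 100 years).
pub const MAX_EXPIRES_IN_DAYS: u64 = 36_500;

/// Convert a client-supplied day count into a duration in seconds.
/// Without the cap, `days as i64` turns values of 2^63 or more into negative
/// numbers, and large positive values overflow when multiplied by 86_400.
pub fn expiry_seconds(expires_in_days: u64) -> i64 {
    let days = expires_in_days.min(MAX_EXPIRES_IN_DAYS) as i64; // always fits after the cap
    days * 24 * 60 * 60
}

/// Absolute expiry in Unix seconds, or None if the addition would overflow.
pub fn expires_at(now_unix: i64, expires_in_days: u64) -> Option<i64> {
    now_unix.checked_add(expiry_seconds(expires_in_days))
}
```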
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: update Cargo.lock for rustls + webpki-roots

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* debug: add /api/debug/db-write endpoint to diagnose user insert failure

Temporary diagnostic endpoint that tests a DB INSERT into the users table with full error logging. No auth required. Will be removed after debugging.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* perf: use cargo-chef in Dockerfile for dependency caching

Splits the build into planner/deps/builder stages. Dependencies are only recompiled when Cargo.toml or Cargo.lock changes. Source-only changes skip straight to the final build stage.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* debug: add tracing to users_create_handler

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: guard created_by FK in user creation handler

The auth identity user_id (from owner_id scope) may not match any user row in the DB, causing an FK violation on the created_by column. Check that the referenced user exists before setting created_by.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: collapse GATEWAY_USER_ID into IRONCLAW_OWNER_ID

Remove the separate GATEWAY_USER_ID config. The gateway now uses IRONCLAW_OWNER_ID (config.owner_id) directly for auth identity, bootstrap user creation, and workspace scoping.

Previously, with_owner_scope() rebound the auth identity to owner_id while keeping default_sender_id as the gateway user_id. This caused an FK constraint violation when creating users because the auth identity ("default") didn't match any user in the DB ("nearai").
Changes: - Remove GATEWAY_USER_ID env var and gateway_user_id from settings - Remove user_id field from GatewayConfig - Add owner_id parameter to GatewayChannel::new() - Remove with_owner_scope() method - Remove default_sender_id from GatewayState - Remove sender override logic in chat/approval handlers - Remove debug endpoint and tracing from prior debugging - Update all tests and E2E fixtures Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: hide Users tab for non-admins, remove auth hint text - Fetch /api/profile after login and hide the Users settings tab when the user's role is not admin - Remove the "Enter the GATEWAY_AUTH_TOKEN" hint from the login page since tokens are now managed via the admin panel, not .env files Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: address review feedback (auth 503, token expiry, CORS PATCH) - DB auth errors now return 503 instead of 401 so outages are distinguishable from invalid tokens (serrrfirat H3) - Cap expires_in_days to 36500 before i64 cast to prevent negative duration from u64 overflow (serrrfirat H1) - Add PATCH to CORS allowed methods for profile/user update endpoints (Copilot) - Stop leaking panic details in CatchPanicLayer response body Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: harden multi-tenant isolation — review fixes from nearai#1614 - Add conversation ownership checks in TenantScope: add_conversation_message, touch_conversation, list_conversation_messages (+ paginated), update_conversation_metadata_field, get_conversation_metadata now return NotFound for conversations not owned by the tenant (cross-tenant data leak) - Fix multi-user heartbeat: clear notify_user_id per runner so notifications persist to the correct user, not the shared config target - Move hygiene tasks into bounded JoinSet instead of unbounded tokio::spawn - Revert send_notification to private visibility (only used within module) - Use 
effective_model_name() for cost attribution in dispatcher so providers that ignore per-request model overrides report the actual model used - Fix inject_model_override doc comment; add 3 unit tests - Fix heartbeat doc comment ("routines" not "active routines") Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add Jobs, Cost, Last Active columns to admin Users table Add UserSummaryStats struct and user_summary_stats() batch query to the UserStore trait (both PostgreSQL and libSQL backends). The admin users list endpoint now fetches per-user aggregates (job count, total LLM spend, most recent activity) in a single query and includes them inline in the response. The frontend Users table displays three new columns. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: address review comments and CI formatting failures CI fixes: - cargo fmt fixes in cli/mod.rs and db/tls.rs Security/correctness (from Copilot + serrrfirat + pranavraja99 reviews): - Token create: reject expires_in_days > 36500 with 400 instead of silent clamp - Token create: return 404 when admin targets non-existent user - User create: map duplicate email constraint violations to 409 Conflict - User create: remove unnecessary DB roundtrip for created_by (use AdminUser directly) - DB auth: log warn on DB lookup failures instead of silently swallowing errors - libSQL: add FK constraints on users.created_by and api_tokens.user_id Config fixes: - agent.multi_tenant: resolve from AGENT_MULTI_TENANT env var instead of hardcoding false - heartbeat.multi_tenant: fix doc comment to match actual env-var-based behavior UI fix: - showTokenBanner: pass correct title ("Token created!" 
vs "User created!") Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: address remaining review comments (round 2) - Secrets handlers: normalize name to lowercase before store operations, validate target user_id exists (returns 404 if not found) - libSQL: propagate cost parsing errors instead of unwrap_or_default() in both user_usage_stats and user_summary_stats - users_list_handler: propagate user_summary_stats DB errors (was silently swallowed with unwrap_or_default) - loadUsers: distinguish 401/403 (admin required) from other errors - Docs: fix users.id type (TEXT not UUID), remove "invitation flow" from V14 migration comment Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: i18n for Users tab, atomic user+token creation, transactional delete_user i18n: - Add 31 translation keys for all Users tab strings (en + zh-CN) - Wire data-i18n attributes on HTML elements (headings, buttons, inputs, table headers, empty state) - Replace all hard-coded strings in app.js with I18n.t() calls Atomic user+token creation: - Add create_user_with_token() to UserStore trait - PostgreSQL: wraps both INSERTs in conn.transaction() with auto-rollback - libSQL: wraps in explicit BEGIN/COMMIT with ROLLBACK on error - Handler uses single atomic call instead of two separate operations Transactional delete_user for libSQL: - Wrap multi-table DELETE cascade in BEGIN/COMMIT transaction - ROLLBACK on any error to prevent partial cleanup / inconsistent state - Matches the PostgreSQL implementation which already used transactions Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: revert V14 migration to match deployed checksum [skip-regression-check] Refinery checksums applied migrations — editing V14__users.sql after it was already applied causes deployment failures. Revert the cosmetic comment changes (added in df40b22) to restore the original checksum. 
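These commits converge on one `expires_in_days` rule: reject anything above 36500 with a 400 rather than clamping. The bound exists because an unchecked `u64` cast with `as i64` wraps. For example, `u64::MAX as i64` is `-1`, which produces a negative duration. The sketch below also covers two related rules from this PR: lowercase normalization for secret names, and trim-and-reject-empty for display and token names. Function and type names are hypothetical, and `BadRequest` stands in for whatever error the handlers actually map to HTTP 400.

```rust
/// Upper bound used by the handlers: roughly 100 years.
pub const MAX_EXPIRES_IN_DAYS: u64 = 36_500;

/// Stand-in for an error that a handler would turn into HTTP 400.
#[derive(Debug, PartialEq)]
pub struct BadRequest(pub String);

/// Validate `expires_in_days` and convert it to signed seconds for timestamp math.
pub fn expiry_seconds(expires_in_days: Option<u64>) -> Result<Option<i64>, BadRequest> {
    match expires_in_days {
        None => Ok(None), // no expiry requested
        Some(days) if days > MAX_EXPIRES_IN_DAYS => Err(BadRequest(format!(
            "expires_in_days must be at most {}",
            MAX_EXPIRES_IN_DAYS
        ))),
        // Safe: days <= 36_500, so the cast cannot wrap and the product fits in i64.
        Some(days) => Ok(Some(days as i64 * 86_400)),
    }
}

/// Secret names are stored lowercase so lookups are case-insensitive.
pub fn normalize_secret_name(name: &str) -> String {
    name.to_lowercase()
}

/// Trim a user-supplied name and reject it if nothing is left.
pub fn non_empty_trimmed(field: &str, value: &str) -> Result<String, BadRequest> {
    let trimmed = value.trim();
    if trimmed.is_empty() {
        return Err(BadRequest(format!("{} must not be empty", field)));
    }
    Ok(trimmed.to_string())
}
```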
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: bootstrap onboarding flow for multi-tenant users The bootstrap greeting and workspace seeding only ran for the owner workspace at startup, so new users created via the admin API never received the welcome message or identity files (BOOTSTRAP.md, SOUL.md, AGENTS.md, USER.md, etc.). Three fixes: - tenant_ctx(): seed per-user workspace on first creation via seed_if_empty(), which writes identity files and sets bootstrap_pending when the workspace is truly fresh - handle_message(): check take_bootstrap_pending() on the tenant workspace (not the owner workspace) and persist the greeting to the user's own assistant conversation + broadcast via SSE - WorkspacePool: seed new per-user workspaces in the web gateway so memory tools also see identity files immediately The existing single-user bootstrap in Agent::run() is preserved for non-multi-tenant deployments. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: address remaining PR review comments (round 3) - Docs: fix metadata description from "merge patch" to "full replacement" - Secrets: reject expires_in_days > 36500 with 400 (was silently clamped) - libSQL: CAST(SUM(cost) AS TEXT) in user_usage_stats and user_summary_stats to prevent SQLite numeric coercion from crashing get_text() — this was the root cause of the Copilot "SUM returns numeric type" comments - Add 3 regression tests: user_summary_stats (empty + with data) and user_usage_stats (multi-model aggregation) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add role change support for users (admin/member toggle) - Add update_user_role() to UserStore trait + both backends (PostgreSQL and libSQL) - Extend PATCH /api/admin/users/{id} to accept optional "role" field with validation (must be "admin" or "member") - Add "Make Admin" / "Make Member" toggle button in Users table actions - Add i18n keys for role change (en + zh-CN) - Update API 
docs to document the role field on PATCH - Fix test helpers to use fmt_ts() for timestamps (was using SQLite datetime('now') which produces incompatible format for string comparison) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: show live LLM spend in Users table instead of only DB-recorded costs [skip-regression-check] Chat turns record LLM cost in CostGuard (in-memory) but don't create agent_jobs/llm_calls DB rows — those are only written for background jobs. The Users table was querying only from DB, so it showed $0.00 for users who only chatted. Now supplements DB stats with CostGuard.daily_spend_for_user() — the same source displayed in the status bar token counter. Shows whichever is larger (DB historical total vs live daily spend). Also falls back to last_login_at for "Last Active" when no DB job activity exists. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: persist chat LLM calls to DB and fix usage stats query Two root causes for zero usage stats: 1. ChatDelegate only recorded LLM costs to CostGuard (in-memory) — never to the llm_calls DB table. Added DB persistence via TenantScope.record_llm_call() after each chat LLM call, with job_id=NULL and conversation_id=thread_id. 2. user_summary_stats query only joined agent_jobs→llm_calls, missing chat calls (which have job_id=NULL). Redesigned query to start from llm_calls and resolve user_id via COALESCE(agent_jobs.user_id, conversations.user_id) — covers both job and chat LLM calls. Both PostgreSQL and libSQL queries updated. TenantScope gets record_llm_call() method. Tests updated for new query semantics. 
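The redesigned `user_summary_stats` query attributes every `llm_calls` row to a user through `COALESCE(agent_jobs.user_id, conversations.user_id)`. This counts chat turns, which have `job_id = NULL`, alongside background jobs. The sketch below expresses the same attribution rule over in-memory maps. The row type, the integer micro-dollar cost field, and `summarize` are all assumptions, not the PR's schema.

```rust
use std::collections::HashMap;

/// One row of a hypothetical in-memory `llm_calls` table.
pub struct LlmCall {
    pub job_id: Option<u32>,          // Some(..) for background jobs, None for chat turns
    pub conversation_id: Option<u32>, // thread id for chat turns
    pub cost_micros: u64,             // cost in millionths of a dollar (sketch choice)
}

#[derive(Debug, Default, PartialEq)]
pub struct UserTotals {
    pub calls: u32,
    pub cost_micros: u64,
}

/// Attribute each call like COALESCE(agent_jobs.user_id, conversations.user_id):
/// prefer the job's owner, fall back to the conversation's owner.
pub fn summarize(
    calls: &[LlmCall],
    job_owner: &HashMap<u32, String>,
    conversation_owner: &HashMap<u32, String>,
) -> HashMap<String, UserTotals> {
    let mut totals: HashMap<String, UserTotals> = HashMap::new();
    for call in calls {
        let owner = call
            .job_id
            .and_then(|id| job_owner.get(&id))
            .or_else(|| call.conversation_id.and_then(|id| conversation_owner.get(&id)));
        if let Some(user) = owner {
            let entry = totals.entry(user.clone()).or_default();
            entry.calls += 1;
            entry.cost_micros += call.cost_micros;
        }
    }
    totals
}
```

This sketch skips calls whose owner cannot be resolved. The commit does not say how the real query treats such rows.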
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: address review comments — input validation, cost semantics, panic safety [skip-regression-check] - Validate display_name: trim whitespace, reject empty strings (create + update) - Validate metadata: must be a JSON object, return 400 if not (admin + profile) - secrets_list_handler: verify target user_id exists before listing - Cost display: use DB total directly (chat calls now persist to DB), remove confusing max(db,live) CostGuard fallback - CatchPanicLayer: truncate panic payload to 200 chars in log to limit potential sensitive data exposure Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: address Copilot round 5 — docs, secrets consistency, token name, provider field [skip-regression-check] - Docs: users.id note updated to "typically UUID v4 strings (bootstrap admin may use a custom ID)" - secrets_list_handler: return 503 when DB store is None (was falling through to list secrets without user validation) - tokens_create: trim + reject empty token name (matching display_name pattern) - LlmCallRecord.provider: use llm_backend ("nearai","openai") instead of model_name() which returns the model identifier - user_summary_stats zero-LLM users: acceptable — handler already falls back to 0 cost and last_login_at for missing entries Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: DB auth returns 503 on outage, scheduler counts only blocking jobs From serrrfirat review: - DB auth: return Err(()) on database errors so middleware returns 503 instead of silently returning Ok(None) → 401 (auth miss) - Scheduler: add parallel_blocking_count_for() that uses is_parallel_blocking() (Pending/InProgress/Stuck) instead of is_active() for per-user concurrency — Completed/Submitted jobs no longer count against MAX_JOBS_PER_USER From Copilot: - CLAUDE.md: fix secrets route paths from {id} to {user_id} - token_hash: use .as_slice() instead of .to_vec() to 
avoid heap allocation on every token auth/creation call Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: immediate auth cache invalidation on security-critical actions (zmanian review nearai#6) Add DbAuthenticator::invalidate_user() that evicts all cached entries for a user. Called after: - Suspend user (immediate lockout, was 60s delay) - Activate user (immediate access restoration) - Role change (admin↔member takes effect immediately) - Token revocation (revoked token can't be reused from cache) The DbAuthenticator is shared (via Clone, which Arc-clones the cache) between the auth middleware and GatewayState, so handlers can evict entries from the same cache the middleware reads. Also from zmanian's review: - Items 1-5, 7-11 were already resolved in prior commits - Item 12 (String→enum for status/role) is deferred as a broader refactor Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: last-admin protection, usage stats for chat calls, UTF-8 safe panic truncation Last-admin protection: - Suspend, delete, and role-demotion of the last active admin now return 409 Conflict instead of succeeding and locking out the admin API - Helper is_last_admin() checks active admin count before destructive ops Usage stats: - user_usage_stats() now includes chat LLM calls (job_id=NULL) by joining via conversations.user_id, matching user_summary_stats() - Both PostgreSQL and libSQL queries updated Panic handler: - Use floor_char_boundary(200) instead of byte-index [..200] to prevent panic on multi-byte UTF-8 characters in panic messages Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: workspace seed race, bootstrap atomicity, email trim, secrets upsert response [skip-regression-check] - WorkspacePool: await seed_if_empty() synchronously after inserting into cache (drop lock first to avoid blocking), so callers see identity files immediately instead of racing a background task - Bootstrap admin: use 
create_user_with_token() for atomic user+token creation, matching the admin create endpoint - Email: trim whitespace, treat empty as None to prevent " " being stored and breaking uniqueness - Secrets PUT: report "updated" vs "created" based on prior existence - Last token_hash.to_vec() → .as_slice() in authenticate_token Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: disable unscoped webhook endpoint in multi-tenant mode [skip-regression-check] The original /api/webhooks/{path} endpoint looks up routines across all users. In multi-tenant mode, anyone who knows the webhook path + secret could trigger another user's routine. Now returns 410 Gone with a message pointing to the scoped endpoint /api/webhooks/u/{user_id}/{path}. Detection uses state.db_auth.is_some() — present only when DB-backed auth is enabled (multi-tenant). Single-user deployments are unaffected. From: standardtoaster review comment Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: webhook multi-tenant check, secrets error propagation, stale doc comment [skip-regression-check] - Webhook: use workspace_pool.is_some() instead of db_auth.is_some() for multi-tenant detection — db_auth is set for any DB deployment, workspace_pool is only set when has_any_users() was true at startup - Secrets: propagate exists() errors instead of unwrap_or(false) so backend outages surface as 500 rather than incorrect "created" status - Config: fix stale workspace_read_scopes comment referencing user_id Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
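The panic-handler fix in this PR replaced a byte slice `[..200]` with `floor_char_boundary(200)`. Slicing a `&str` at a byte index inside a multi-byte character panics, which is the last thing a panic handler should do. Depending on the toolchain, `str::floor_char_boundary` may not be available on stable Rust. This sketch therefore reimplements the same idea with the stable `str::is_char_boundary`. `floor_boundary`, `truncate_for_log`, and `panic_message` are hypothetical helpers, not the PR's code.

```rust
use std::any::Any;

/// Largest index <= `max` that falls on a UTF-8 character boundary of `s`.
pub fn floor_boundary(s: &str, max: usize) -> usize {
    if max >= s.len() {
        return s.len();
    }
    let mut i = max;
    // Index 0 is always a char boundary, so this loop terminates.
    while !s.is_char_boundary(i) {
        i -= 1;
    }
    i
}

/// Truncate a message for logging without splitting a multi-byte character.
pub fn truncate_for_log(msg: &str, max_bytes: usize) -> &str {
    &msg[..floor_boundary(msg, max_bytes)]
}

/// Extract a printable message from a panic payload (`&'static str` or `String`).
pub fn panic_message(payload: &(dyn Any + Send)) -> &str {
    if let Some(s) = payload.downcast_ref::<&'static str>() {
        *s
    } else if let Some(s) = payload.downcast_ref::<String>() {
        s.as_str()
    } else {
        "non-string panic payload"
    }
}
```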
ilblackdragon added a commit that referenced this pull request on Apr 9, 2026
* feat(workspace): add JSON Schema validation to document metadata Add a `schema` field to `DocumentMetadata` that enables automatic content validation on workspace writes. When a document or its folder `.config` carries a JSON Schema, all write operations (write, append, patch, write_to_layer, append_to_layer) validate content against it before persisting. This is the foundation for typed system state (settings, extension configs, skill manifests) stored as workspace documents. Builds on the metadata infrastructure from #1723 — schema is inherited via the existing `.config` chain (folder → document → defaults). Refs: #640, #1894, #1937 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(tools): add channel-agnostic ToolDispatcher with audit trail Introduce `ToolDispatcher` — a universal entry point for executing tools from any caller (gateway, CLI, routine engine, WASM channels). Creates lightweight system jobs for FK integrity, records ActionRecords, and returns ToolOutput. This is a third entry point alongside v1's Worker::execute_tool() and v2's EffectBridgeAdapter::execute_action(). DispatchSource::Channel(String) is intentionally string-typed — channels are interchangeable extensions that can appear at runtime. Also adds JobContext::system() factory and create_system_job() to both PostgreSQL and libSQL backends. Refs: #640 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(workspace): settings-as-workspace-documents with dual-write adapter Add WorkspaceSettingsAdapter that implements SettingsStore by reading/ writing workspace documents at _system/settings/{key}.json. During migration, dual-writes to both the legacy settings table and workspace. Reads prefer workspace, falling back to the legacy table. Known setting keys (llm_backend, selected_model, tool_permissions.*, etc.) get JSON Schemas stored in document metadata — writes are validated automatically by Phase 0's schema validation. 
Also adds settings_schemas.rs with compile-time schema registry and settings_path() helper. Refs: #640, #1937 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(gateway): wire ToolDispatcher into GatewayState Add tool_dispatcher field to GatewayState with with_tool_dispatcher() builder method. Create and wire the dispatcher in main.rs when both tool_registry and database are available. All 16 GatewayState construction sites updated. Per-handler migration (routing mutations through ToolDispatcher instead of direct DB calls) is deferred to follow-up PRs — each handler has complex ownership checks, cache refresh, and response types. Refs: #640 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(tools): add system introspection tools (tools_list, version) Add SystemToolsListTool and SystemVersionTool as proper Tool implementations that replace hardcoded /tools and /version commands. Registered at startup via register_system_tools(). Available in both v1 and v2 engines — no is_v1_only_tool filter to worry about. Refs: #640 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(workspace): extension and skill state schemas and path helpers Add workspace path helpers and JSON Schemas for storing extension configs, extension state, and skill manifests under _system/extensions/ and _system/skills/. This establishes the workspace document structure that ExtensionManager and SkillRegistry will use as a durable persistence backend (read-through cache pattern). Runtime state (active MCP connections, WASM runtimes) stays in memory. Only durable config and activation state moves to workspace documents. 
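The `settings_path()` helper mentioned above maps a setting key to a workspace document path. The sketch below pairs it with the key rules that a later review round in this PR adds: reject `/`, `\`, `..`, a leading `.`, empty keys, keys longer than 128 bytes, and non-alphanumeric characters. It makes two assumptions. First, `_`, `-` and `.` are treated as allowed, since known keys such as `llm_backend` and `tool_permissions.*` contain them. Second, the prefix uses the later `.system/` spelling rather than this commit's `_system/`. The `KeyError` variants are hypothetical.

```rust
/// Settings prefix after the `_system/` -> `.system/` rename.
pub const SETTINGS_PREFIX: &str = ".system/settings/";
pub const MAX_KEY_LEN: usize = 128;

#[derive(Debug, PartialEq)]
pub enum KeyError {
    Empty,
    TooLong,
    LeadingDot,
    ParentRef,
    Separator,
    BadChar(char),
}

/// Reject keys that could escape the settings folder or produce odd file names.
pub fn validate_setting_key(key: &str) -> Result<(), KeyError> {
    if key.is_empty() {
        return Err(KeyError::Empty);
    }
    if key.len() > MAX_KEY_LEN {
        return Err(KeyError::TooLong);
    }
    if key.starts_with('.') {
        return Err(KeyError::LeadingDot);
    }
    if key.contains("..") {
        return Err(KeyError::ParentRef);
    }
    // Assumption: '_', '-' and '.' are allowed alongside ASCII alphanumerics.
    let bad = key
        .chars()
        .find(|&c| !(c.is_ascii_alphanumeric() || matches!(c, '_' | '-' | '.')));
    if let Some(c) = bad {
        return Err(if c == '/' || c == '\\' { KeyError::Separator } else { KeyError::BadChar(c) });
    }
    Ok(())
}

/// Validated document path for a setting key.
pub fn settings_path(key: &str) -> Result<String, KeyError> {
    validate_setting_key(key)?;
    Ok(format!("{}{}.json", SETTINGS_PREFIX, key))
}
```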
Refs: #640, #1741 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: address PR review feedback and CI failures CI fixes: - deny.toml: allow MIT-0 license required by jsonschema - workspace/document.rs: #[allow(dead_code)] on system path constants pending follow-up phases that consume them - workspace/settings_adapter.rs: remove unused chrono::Utc import - workspace/settings_adapter.rs: collapse nested if into && form Review fixes (gemini-code-assist): - tools/dispatch.rs: await save_action directly instead of fire-and-forget tokio::spawn so short-lived CLI callers cannot drop audit records before they are persisted; surface errors via tracing::warn - tools/dispatch.rs: remove DispatchSource::Agent variant — sequence_num=0 with a reused job_id would violate UNIQUE(job_id, sequence_num). Agent callers must use Worker::execute_tool() which manages sequence numbers atomically against the agent's existing job - workspace/settings_adapter.rs: validate content against the schema BEFORE the first workspace write so the initial document creation cannot bypass schema enforcement (subsequent writes are validated by the workspace resolved-metadata path established after the first write) Refs: #2049 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: unify all machine state under .system/ Rename the workspace prefix from `_system/` to `.system/` (Unix dot-prefix convention for hidden internal state) and migrate v2 engine state from `engine/` to `.system/engine/` so all machine-managed state lives under one root. 
New layout:

.system/
├── settings/ (per-user settings as workspace docs)
├── extensions/ (extension config + activation state)
├── skills/ (skill manifests)
└── engine/
    ├── README.md (auto-generated index)
    ├── knowledge/ (lessons, skills, summaries, specs, issues)
    ├── orchestrator/ (Python orchestrator versions, failures, overlays)
    ├── projects/ (project files + nested missions/)
    └── runtime/ (threads, steps, events, leases, conversations)

The inner `.runtime/` dot-prefix is dropped under `.system/engine/`, since `.system/` itself is the hidden marker and no double-hiding is needed. The `ENGINE_PREFIX` constant in `workspace::document::system_paths` is declared as the canonical convention; bridge `store_adapter` continues to define per-subdirectory constants below it for ergonomic interpolation. No legacy migration code: this is a pre-production rename.

Refs: #2049

* fix(pr-2049): security, correctness, and robustness fixes from review

Critical security:
- dispatch.rs: redact sensitive params before persisting ActionRecord (was leaking plaintext secrets into the audit log for tools with sensitive_params())
- settings_schemas.rs: validate settings keys against path traversal (reject `/`, `\`, `..`, leading `.`, empty, length > 128, non-alphanumeric); wire validation into all settings_adapter read/write/delete paths

Data correctness:
- history/store.rs + libsql/jobs.rs: write status as JobState::Completed.to_string() ('completed' snake_case) instead of 'Completed'; system jobs were round-tripping as Pending in parse_job_state()
- settings_adapter.rs: fix .system/.config metadata to set skip_versioning: false (was true). Descendants inherit this via find_nearest_config, so the previous value silently disabled versioning for ALL `.system/**` documents, contradicting the audit-trail intent
- workspace/mod.rs: add resolve_metadata_in_scope; use it in write_to_layer / append_to_layer so non-primary layer writes resolve schema/indexing/versioning from the target layer's .config chain instead of
the primary user_id's. Also pass &scope (not &self.user_id) to maybe_save_version so versions are attributed to the correct scope Pipeline parity: - dispatch.rs: add SafetyLayer to ToolDispatcher; mirror Worker pipeline (prepare_tool_params -> validator -> redact -> timeout -> sanitize output) so dispatch path gets the same safety guarantees as the agent worker. Sanitized output is now stored in ActionRecord.output_sanitized instead of duplicating raw JSON Robustness: - settings_adapter.rs: propagate update_metadata errors in ensure_system_config and write_to_workspace (was silently ignored via let _ =, leaving schemas/skip_indexing unenforced) - settings_adapter.rs: set_all_settings now collects the first workspace write error and returns it after the legacy write completes, so partial-migration state is observable - settings_schemas.rs: rewrite llm_custom_providers schema to match CustomLlmProviderSettings (id/name/adapter/base_url/default_model/ api_key/builtin instead of stale name/protocol/base_url/model) Build: - Cargo.toml: jsonschema with default-features = false to avoid pulling a second reqwest major version Docs: - db/mod.rs: docstring for create_system_job uses 'completed' snake_case - workspace/document.rs: clarify .system/ versioning ("by default ARE versioned; individual files may opt out via skip_versioning") - settings_adapter.rs: clarify per-key reads prefer workspace, aggregate reads stay on legacy during migration - tools/builtin/system.rs: trim doc to match implemented scope (system_tools_list, system_version) - channels/web/mod.rs: move stale 'sweep tasks managed by with_oauth' comment back to oauth_sweep_shutdown line Refs: #2049 * docs+ci: enforce 'everything goes through tools' principle Document the core design principle from #2049 in two places so future contributors (human and AI) discover it during development: - CLAUDE.md: new "Everything Goes Through Tools" section near the "Adding a New Channel" guide. 
Includes the rule, the rationale (audit trail, safety pipeline parity, channel-agnostic surface, agent parity), and a pointer to the detailed rule file. - .claude/rules/tools.md: full pattern with required/forbidden examples, the list of layers that ARE exempt (Worker::execute_tool, v2 EffectBridgeAdapter, tool implementations themselves, background engine jobs, read-aggregation queries), and how to annotate intentional exceptions. Also extends `paths` to cover src/channels/** and src/cli/** so it surfaces when those files are edited. Enforce with a new pre-commit safety check (#7) in scripts/pre-commit-safety.sh: - Scans newly added lines under src/channels/web/handlers/*.rs and src/cli/*.rs for direct touches of state.{store, workspace, workspace_pool, extension_manager, skill_registry, session_manager}. - Suppress with a trailing `// dispatch-exempt: <reason>` comment on the same line, matching the existing `// safety:` convention. - Only checks added lines (`+` in the diff), so existing untouched handlers don't trip the check during incremental migration. The check fires only for new code: handlers that haven't been migrated yet (52 existing direct accesses across 12 handler files) won't break unmodified, but any new line that bypasses the dispatcher will be flagged at commit time. Refs: #2049 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(pr-2049): address Copilot review on workspace schema layer - workspace::extension_state: extension/skill path helpers now reuse the canonical name validators (`canonicalize_extension_name`, `validate_skill_name`) instead of a weak `replace('/', "_")`. Names containing `..`, `\`, NUL, or other escapes are now rejected at the helper boundary, eliminating a path-traversal foothold for callers. Helpers return `Result<String, PathError>`. Regression tests added. - workspace::settings_adapter::ensure_system_config: now idempotent across upgrades. 
If `.system/.config` already exists with stale metadata (e.g. an older `skip_versioning: true` from before fix #3042846635), it is repaired to the expected inherited values instead of being left silently broken. Regression test added. - workspace::settings_adapter::write_to_workspace: lazily seeds `.system/.config` via a `OnceCell`, so callers no longer need to remember to invoke `ensure_system_config()` at startup before any setting write. Regression test added. - workspace::settings_adapter::delete_setting: workspace delete failures are now logged via `tracing::warn!` instead of being silently dropped. We still don't propagate the error — the legacy table is the source of truth during migration and a stale workspace doc is recoverable on the next write — but partial-delete state is now observable. - workspace::schema: documented why we don't cache compiled validators yet (settings/extension/skill writes are not a hot path; revisit if schema validation moves into a frequent write path). [skip-regression-check] schema.rs change is doc-only. * fix(pr-2049): address 4 remaining review issues 1. tool_dispatcher dropped during gateway startup src/channels/web/mod.rs: rebuild_state was initializing tool_dispatcher to None, so every subsequent with_* call zeroed the dispatcher the first caller injected. Preserve it across rebuild_state like every other field. Regression test: tool_dispatcher_survives_subsequent_with_calls. 2. WorkspaceSettingsAdapter not wired into runtime src/app.rs: Build the adapter in build_all() when workspace+db are both present, eagerly call ensure_system_config(), expose on AppComponents as settings_store, and thread it into init_extensions(...) so register_permission_tools and upgrade_tool_list receive it instead of the raw db. src/main.rs: SIGHUP handler prefers the adapter over raw db. src/workspace/mod.rs: re-export WorkspaceSettingsAdapter. 3. 
changed_by regression on layered writes src/workspace/mod.rs: write_to_layer and append_to_layer were passing the target layer's scope as changed_by, so version history attributed layered edits to the layer name instead of the actor. Pass self.user_id while keeping metadata resolution in the target scope. Regression test: layered_writes_record_actor_in_changed_by. 4. Legacy engine/ paths invisible after upgrade src/bridge/store_adapter.rs: Add migrate_legacy_engine_paths(), called at the start of load_state_from_workspace(), which scans list_all() for engine/... documents and rewrites them to .system/engine/... Idempotent: skips rewrites when the new path already exists, deletes the legacy duplicate either way. Three regression tests in #[cfg(all(test, feature = "libsql"))] module. Quality gate: cargo fmt, cargo clippy --all --all-features zero warnings, cargo test --all-features --lib 4313 passed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(e2e): use PUT for settings write in ownership test test_settings_written_and_readable was sending POST /api/settings/{key} but the route has been PUT since #4 (Feb 2026) — the test was returning 405 Method Not Allowed. Switch to httpx.put() so it matches the current route registration. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(pr-2049): address second round of review feedback Addresses the remaining unresolved PR #2049 review comments from serrrfirat and ilblackdragon. ## Changes ### ToolDispatcher — integration coverage + log level - src/tools/dispatch.rs: add two libsql-gated integration tests for the full dispatch pipeline: (a) persist an ActionRecord with sensitive params redacted in the audit row while the tool still sees the raw value, sanitized output populated; (b) honor the per-tool execution_timeout() and record a failure action. - Tests use a raw-SQL helper to find system-category jobs since list_agent_jobs_for_user intentionally filters them out. 
- Replace warn! with debug! on audit persistence failure — dispatch is reachable from interactive CLI/REPL sessions where warn!/info! output corrupts the terminal UI (CLAUDE.md Code Style → logging). ### WorkspaceSettingsAdapter — log level - src/workspace/settings_adapter.rs: same warn! → debug! fix on the delete_setting workspace failure path, for the same REPL reason. ### Schema validation — surface all errors - src/workspace/schema.rs: switch from jsonschema::validate to validator_for + iter_errors so users fixing a malformed setting see every violation in one round instead of playing whack-a-mole. Also distinguishes "invalid schema" from "invalid content" errors. - Regression tests: multiple_errors_are_all_reported and invalid_schema_is_distinguished_from_invalid_content. ### create_system_job — started_at + row growth docs - src/db/libsql/jobs.rs and src/history/store.rs: include started_at in the INSERT (set to the same instant as created_at/completed_at) so duration queries don't see NULL and "started but not completed" filters don't misclassify these rows. Fixed in both backends. - Add doc comments on both impls warning about row growth per dispatch call. Deleting rows would violate "LLM data is never deleted" (CLAUDE.md); if listing-query performance becomes a concern, prefer a partial index (WHERE category != 'system') over deletion. ### Lib test repair - src/channels/web/server.rs: extensions_setup_submit_handler Err branch now sets resp.activated = Some(false) so clients and the regression test see an explicit `false` rather than `null`. Also rename the test's fake channel to snake_case (test_failing_channel) so it matches the canonicalize-extension-names behavior from PR #2129 — previously the test was passing a dashed name and getting "Capabilities file not found" instead of the intended activation failure. 
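The `migrate_legacy_engine_paths()` step from the fourth item in this round does three things: it rewrites `engine/...` documents to `.system/engine/...`, skips the rewrite when the new path already exists, and deletes the legacy copy either way. It can be sketched over an in-memory map. The `Doc` type and the `BTreeMap` store are stand-ins for the workspace API. The sketch carries metadata along with content, reflecting the later fix that stopped the migration from dropping the `metadata` column.

```rust
use std::collections::BTreeMap;

/// Hypothetical workspace document: content plus serialized metadata.
#[derive(Clone, Debug, PartialEq)]
pub struct Doc {
    pub content: String,
    pub metadata: String,
}

const LEGACY_PREFIX: &str = "engine/";
const NEW_PREFIX: &str = ".system/engine/";

/// Move legacy engine docs under `.system/engine/`; returns how many were moved.
/// Existing targets win, and legacy duplicates are removed either way, so a
/// second run is a no-op.
pub fn migrate_legacy_engine_paths(docs: &mut BTreeMap<String, Doc>) -> usize {
    // Collect first: the map cannot be mutated while iterating over it.
    let legacy: Vec<String> = docs
        .keys()
        .filter(|p| p.starts_with(LEGACY_PREFIX))
        .cloned()
        .collect();
    let mut moved = 0;
    for old_path in legacy {
        let new_path = format!("{}{}", NEW_PREFIX, &old_path[LEGACY_PREFIX.len()..]);
        if let Some(doc) = docs.remove(&old_path) {
            if !docs.contains_key(&new_path) {
                docs.insert(new_path, doc); // content and metadata move together
                moved += 1;
            }
        }
    }
    moved
}
```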
## Not addressed (false positive / deferred)

- dispatch.rs:177 output_raw/output_sanitized swap: verified against ActionRecord::succeed(Option<String>, Value, Duration) and the worker's call site at job.rs:704; argument order is correct.
- settings_adapter.rs:186 TOCTOU window: author self-classified as "Low / completeness" and no other code path writes to .system/settings/** without going through write_to_workspace.
- schema.rs recompilation caching: deferred per earlier review.

## Quality gate

- cargo fmt
- cargo clippy --all --benches --tests --examples --all-features zero warnings
- cargo test --all-features --lib: 4387 passed, 0 failed, 3 ignored

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(pr-2049): address third round of review feedback

Addresses unresolved comments from serrrfirat's "Paranoid Architect Review" and Copilot's third pass on the engine-state migration.

## src/workspace/settings_adapter.rs

### HIGH: Cross-tenant data leak through owner-scoped Workspace

`Workspace` is constructed for a single user_id at AppBuilder time. Without gating, `set_setting("user_B", key, val)` would dual-write into the **owner's** workspace, and a subsequent `user_A.get_setting(...)` would return user_B's value: a real cross-user data leak. Fix:

- Add `gate_user_id` field set to `workspace.user_id()` at construction.
- All `SettingsStore` methods that touch the workspace now check `workspace_allowed_for(user_id)` first; non-owner callers fall through to the legacy table only, preserving their pre-#2049 behavior.
- This matches the long-term plan: per-user settings live in the legacy table until a per-user `WorkspaceSettingsAdapter` (one per WorkspacePool entry) is wired up; admin/global settings go through the workspace-backed path so they pick up schema validation.
Regression test: `workspace_settings_are_owner_gated_in_multi_tenant_mode` asserts (a) owner's workspace doc is not overwritten by a non-owner write, (b) each user reads back their own legacy value, and (c) a non-owner with no legacy entry must NOT see the owner's workspace value bleeding through.

### MEDIUM: Dual-write order

Reverse `set_setting` and `set_all_settings` to write legacy first, workspace second. The legacy table is the source of truth during migration (it backs aggregate `list_settings` reads), so writing it first guarantees those readers always see a consistent value even if the workspace write fails. Failed workspace writes are self-healing on the next per-key read-miss.

### MEDIUM: `ensure_system_config_lazy` double-execution race

Replace the manual `get()`/`set()` pattern with `OnceCell::get_or_try_init`. Two concurrent first-callers no longer both run `ensure_system_config()`. Functionally equivalent (idempotent either way) but no longer wasteful.

## src/bridge/store_adapter.rs

### MEDIUM: Migration drops document metadata (S3)

`migrate_legacy_engine_paths` previously copied only `doc.content`, silently dropping the `metadata` column. Now calls `ws.update_metadata(new_doc.id, &doc.metadata)` after each write to preserve schema/skip_indexing/hygiene flags. Logged-not-fatal: content has already been moved, metadata loss is recoverable. Regression test: `migration_preserves_document_metadata` seeds a doc with custom metadata and asserts it survives the rewrite.

### MEDIUM: `ws.exists()` swallowed transient errors (Copilot)

`unwrap_or(false)` on the existence check could cause the migrator to overwrite an existing `.system/engine/...` doc when storage hiccups. Now propagates the error (counts as failed step + `continue`), per Copilot's exact suggested patch.
### LOW: `list_all()` runs every startup (Copilot)

Add a cheap preflight: `ws.list("engine")` first; only fall through to the recursive `list_all()` discovery when the directory listing returns at least one entry. Steady-state startups (post-migration) skip the full workspace scan entirely. Regression test: `migration_preflight_skips_full_scan_when_no_legacy_paths` asserts unrelated and already-migrated documents are untouched.

### MEDIUM: Counter undercount on `already_present` (S5)

When `already_present` is true the legacy duplicate is still deleted, but the previous code skipped the `migrated += 1` increment, undercounting in debug logs. Fixed: `migrated` now counts every successful path migration including the already-present case.

### Documented: Version-history loss is acceptable scope (C1)

Read-write-delete pattern means `memory_document_versions.document_id ON DELETE CASCADE` drops the legacy doc's version chain. Documented in the function-level doc comment as intentional + bounded:

- v2 engine state is runtime state (rewritten on every mutation), not user-curated data
- v2 was newly introduced in this PR; no production deployment with pre-existing curated history at risk
- A path-preserving rename op would need new trait methods on both backends; out of scope for fix-forward.

If a future caller needs history-preserving rename, it should be added to the storage layer properly, not bolted onto migration.

## Quality gate

- cargo fmt
- cargo clippy --all --benches --tests --examples --all-features zero warnings
- cargo test --all-features --lib: 4390 passed, 0 failed, 3 ignored (+3 new tests on top of round 2)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(pr-2049): address fourth round of review feedback

Two latent issues flagged by serrrfirat in the latest review pass:

1. **Null schema permanently locks documents** (`src/workspace/schema.rs`).
`serde_json` deserializes a metadata field of `"schema": null` as `Some(Value::Null)`, not `None`, so the upstream `if let Some(schema) = &metadata.schema` check passes through to `validate_content_against_schema`. There, `validator_for(Value::Null)` errors out and every subsequent write to that document is blocked: a latent DoS. Added an explicit `schema.is_null()` early-return guard at the top of the validator, plus a regression test (`null_schema_is_treated_as_no_op`) that asserts even non-JSON content passes when the schema is null.
2. **System job titles were raw source labels** (`src/history/store.rs`, `src/db/libsql/jobs.rs`). `create_system_job` set `title = source`, so any UI rendering `agent_jobs.title` would display dispatched system jobs as `channel:gateway` / `system` / etc. instead of a human-readable label. Both PostgreSQL and libSQL backends now write `format!("System: {source}")`. Updated the two dispatch integration tests that pinned the old format.

Schema-recompilation comment (`schema.rs:47`) was acknowledged as "acceptable for now" by the reviewer; existing NOTE in the source already documents the caching trade-off and upgrade path, so no code change.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(pr-2049): address fifth round of review feedback

Eight comments from Copilot + serrrfirat. Real fixes for the load-bearing gaps; doc clarifications for the rest where the existing behavior is intentional.

**Real code changes**

- `src/tools/dispatch.rs`: enforce `tool.parameters_schema()` (JSON Schema) in the dispatch path. Previously the SafetyLayer validator only checked for injection patterns; channel/CLI/routine callers could pass arbitrary shapes and only discover the mismatch (or worse, silently malformed behavior) inside the tool itself.
Now we run `jsonschema::validate(&tool.parameters_schema(), &normalized_params)` after the injection check, with a permissive-empty-schema fast path so tools that haven't yet declared a schema aren't penalised. Regression test `dispatch_rejects_params_violating_tool_schema` asserts a required-field violation is rejected before the tool is invoked.
- `src/workspace/settings_adapter.rs`: `write_to_workspace` now calls `schema_for_key(key)` once and reuses the resolved schema for both pre-write validation and post-write metadata persistence (was called twice). Eliminates duplicate work and removes a theoretical divergence window if the schema registry ever became non-deterministic.
- `src/workspace/settings_adapter.rs`: `ensure_system_config` now also rewrites the `.config` document content when its metadata is repaired, not just the metadata column. The metadata column is the inheritance source of truth, but having the doc's content silently diverge from it confuses anyone reading the doc directly to understand which inherited flags are active.
- `src/error.rs` + `src/workspace/settings_schemas.rs`: new `WorkspaceError::InvalidPath { path, reason }` variant. Path/key rejection (path-traversal, character set, length) now surfaces as `InvalidPath`, not `SchemaValidation`; callers and downstream UIs can distinguish "your settings *key* has bad characters" from "your settings *value* failed JSON-Schema validation" without string-matching error messages. `validate_settings_key` returns the new variant; the one match site in `settings_adapter.rs::write_to_workspace` is updated. Regression test `validate_settings_key_returns_invalid_path_variant`.

**Documentation-only fixes**

- `src/tools/dispatch.rs`: clarify in the `dispatch()` doc-comment that `sanitize_tool_output` runs only against the persisted ActionRecord payload, NOT against the value returned to the caller.
This mirrors `Worker::execute_tool` (the agent loop also receives the raw output so reasoning can be reproduced from history). Channels that forward dispatcher output to end users must run their own boundary sanitization at the channel edge.
- `src/history/store.rs` + `src/db/libsql/jobs.rs`: `create_system_job` doc updated to explicitly state that system job timestamps do NOT reflect tool execution time (the row is INSERTed before the tool runs, with all three timestamps pinned to "now"). Consumers that need execution duration must read `job_actions.duration_ms` for the associated action rows. Restructuring to a two-phase INSERT+UPDATE was rejected: the audit row must be durable even if the dispatcher panics mid-tool, and the second write would double per-dispatch DB cost.
- `src/workspace/schema.rs`: added baseline regression test `moderately_complex_schema_compiles_within_budget` that pins schema compile + validate latency for a moderately deep nested schema at <500ms wall-clock. Guards against orders-of-magnitude regressions from a future `jsonschema` upgrade or accidentally pathological schema construction. Hard limits on schema complexity are deferred (the real defense today is keeping schema-bearing paths under `.system/`, which is system-controlled).

**Acknowledged, no change**

- libSQL `create_system_job` unbounded row growth: already documented as intentional in the existing comment block, with the mitigation path spelled out (partial index on `WHERE category != 'system'` for listing queries). Rate-limiting dispatch would silently drop user-initiated actions, which is worse than unbounded retention. The "LLM data is never deleted" rule (CLAUDE.md) explicitly applies.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
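The legacy-path migration described in this commit (rewrite `engine/...` to `.system/engine/...`, skip the write when the target already exists, delete the legacy copy either way, keep metadata with the content, and count the already-present case) can be sketched against a hypothetical in-memory store. `MemStore`, `Doc`, and `migrate_legacy_paths` are illustrative names, not the repository's `Workspace` API, and the sketch has no error handling because nothing in it can fail.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a workspace document.
#[derive(Clone, Debug, PartialEq)]
struct Doc {
    content: String,
    metadata: String, // stands in for the JSON metadata column
}

/// Hypothetical in-memory document store keyed by path.
#[derive(Default)]
struct MemStore {
    docs: BTreeMap<String, Doc>,
}

const LEGACY_PREFIX: &str = "engine/";
const NEW_PREFIX: &str = ".system/engine/";

/// Moves every `engine/...` document to `.system/engine/...` and returns how
/// many legacy paths were migrated, counting ones whose target already existed.
fn migrate_legacy_paths(store: &mut MemStore) -> usize {
    // Keys are sorted, so all `engine/` keys form one contiguous range.
    let legacy: Vec<String> = store
        .docs
        .range(LEGACY_PREFIX.to_string()..)
        .take_while(|(path, _)| path.starts_with(LEGACY_PREFIX))
        .map(|(path, _)| path.clone())
        .collect();

    let mut migrated = 0;
    for old in legacy {
        let new = format!("{}{}", NEW_PREFIX, &old[LEGACY_PREFIX.len()..]);
        // Idempotent: never overwrite a document already at the new path.
        if !store.docs.contains_key(&new) {
            // Copy the whole document so metadata travels with the content.
            let doc = store.docs[&old].clone();
            store.docs.insert(new, doc);
        }
        // The write above happens before this delete; the legacy duplicate
        // is dropped in both cases and counted either way.
        store.docs.remove(&old);
        migrated += 1;
    }
    migrated
}
```

Because `BTreeMap` keys are sorted, the `range(...).take_while(...)` scan touches only the `engine/` keys, which loosely plays the role of the `ws.list("engine")` preflight; a real storage backend would need its own prefix query. Running the function a second time returns 0, which is the idempotence property the commit relies on.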
ilblackdragon
added a commit
that referenced
this pull request
Apr 9, 2026
- default.py: hoist `CHARS_PER_TOKEN` and `MESSAGE_OVERHEAD_CHARS` above
the `estimate_context_tokens` definition (and the `FINAL(result)`
entry-point call). The constants were previously defined at lines
846-847 — *after* the entry point — so any execution path that ran
`compact_if_needed → estimate_context_tokens` would NameError. The
bug was latent because `enable_compaction` defaults to false in
CI; this commit removes the dead-zone landmine.
- ironclaw_skills validation: add `validate_skill_version` enforcing a
semver-ish character class (`[a-zA-Z0-9._\-+~]{1,32}`) and reject
hostile values at parse time. Wired into `parser.rs` via a new
`SkillParseError::InvalidVersion` variant. Closes the XML attribute
injection vector through `format_skills` in default.py, which
interpolates `version` directly into `<skill version="...">` and
was the only field without character-class validation.
Regression test: `test_parser_rejects_xml_breakout_in_version`.
- orchestrator handle_execute_action + handle_execute_actions_parallel:
replace the non-atomic `find_lease_for_action` + `consume_use` pair
with the atomic `find_and_consume` after the policy check passes.
Mirrors `structured.rs::execute_action_batch_with_results` and
closes the TOCTOU window where two concurrent calls could each
observe a one-use lease and both proceed to execute.
- skill_credential_injection: add the three missing security tests
(#6 LLM auth header rejection on credentialed host, #7 non-auth
header passthrough on credentialed host, #8 LLM auth header
passthrough on unregistered host) that were listed in the file
doc comment but never implemented. Tests drive `HttpTool::execute`
directly so they exercise the actual rejection branch in
`http.rs:503-520`, not just the upstream `requires_approval` gate.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
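The `validate_skill_version` rule from this commit accepts exactly the character class `[a-zA-Z0-9._\-+~]{1,32}`. A minimal std-only sketch of such a check follows; the `VersionError` enum is a hypothetical stand-in for `SkillParseError::InvalidVersion`, and the real parser may report errors differently.

```rust
/// Hypothetical error type standing in for `SkillParseError::InvalidVersion`.
#[derive(Debug, PartialEq)]
enum VersionError {
    Empty,
    TooLong(usize),
    BadChar(char),
}

/// Upper bound taken from the `{1,32}` quantifier in the commit message.
const MAX_VERSION_LEN: usize = 32;

/// Accepts only `[a-zA-Z0-9._\-+~]{1,32}`: enough for semver-style strings,
/// but no quotes, angle brackets, or whitespace that could break out of
/// a `<skill version="...">` attribute.
fn validate_skill_version(version: &str) -> Result<(), VersionError> {
    if version.is_empty() {
        return Err(VersionError::Empty);
    }
    if let Some(bad) = version
        .chars()
        .find(|&c| !(c.is_ascii_alphanumeric() || matches!(c, '.' | '_' | '-' | '+' | '~')))
    {
        return Err(VersionError::BadChar(bad));
    }
    // Every accepted char is ASCII, so byte length equals character count here.
    if version.len() > MAX_VERSION_LEN {
        return Err(VersionError::TooLong(version.len()));
    }
    Ok(())
}
```

Rejecting `"` and `<` outright is what closes the attribute-injection vector: no accepted version string can terminate the `version` attribute or open a new tag, so `format_skills` can interpolate it without escaping.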
serrrfirat
added a commit
that referenced
this pull request
Apr 13, 2026
Critical fixes:
- Use DB-first config system for MissionsConfig instead of raw std::env::var in router.rs (issue #1)
- SessionSummaryHook now uses thread_ids from HookEvent::SessionEnd to summarize the correct conversation instead of guessing via recency; falls back to most-recent for backward compatibility (#2)
- Add per-user rate limiter (10/min, 60/hr) and 15s timeout on reasoning LLM calls in MemorySearchTool to prevent unbounded usage (#3)

Test coverage:
- Caller-level tests for reasoning-augmented recall (LLM wiring, disabled config, and failure fallback paths) (#4)
- SessionSummaryHook LLM failure path test confirming fail-open behavior (#5)
- reasoning_enabled config field tests (default, env, DB override) (#6)
- MissionSettings and SearchSettings round-trip assertions in comprehensive_db_map_round_trip (#11)

Convention fixes:
- Remove double env-var parsing in MissionsConfig::resolve (#7)
- Use ChatMessage::system()/user() constructors in SessionSummaryHook (#8)
- Add TODO comments for inline prompt strings (#9)
- Add timeout on reasoning LLM call (#10)

CI fixes:
- Remove 4 stale wasmtime advisory entries from deny.toml
- Add RUSTSEC-2026-0097 (rand 0.8.5) to advisory ignore list

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
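The commit adds a per-user limiter of 10 calls per minute and 60 per hour on reasoning LLM calls but does not say how it is implemented. One common approach is a sliding-window log per user, sketched below; `UserRateLimiter` and `try_acquire` are hypothetical names, and the actual MemorySearchTool limiter may use fixed windows or a token bucket instead.

```rust
use std::collections::{HashMap, VecDeque};
use std::time::{Duration, Instant};

const MINUTE: Duration = Duration::from_secs(60);
const HOUR: Duration = Duration::from_secs(3600);

/// Hypothetical per-user sliding-window limiter (not the repository's type).
/// A call is allowed only if the user made fewer than `per_minute` calls in
/// the trailing minute and fewer than `per_hour` calls in the trailing hour.
struct UserRateLimiter {
    per_minute: usize,
    per_hour: usize,
    history: HashMap<String, VecDeque<Instant>>,
}

impl UserRateLimiter {
    fn new(per_minute: usize, per_hour: usize) -> Self {
        Self { per_minute, per_hour, history: HashMap::new() }
    }

    /// Returns `true` and records the call if `user` is under both limits.
    /// `now` is passed in so the limiter can be tested deterministically.
    fn try_acquire(&mut self, user: &str, now: Instant) -> bool {
        let log = self.history.entry(user.to_string()).or_default();
        // Forget calls that have aged out of the largest (hourly) window.
        while log.front().map_or(false, |&t| now.duration_since(t) >= HOUR) {
            log.pop_front();
        }
        let last_minute = log.iter().filter(|&&t| now.duration_since(t) < MINUTE).count();
        if last_minute >= self.per_minute || log.len() >= self.per_hour {
            return false; // rejected calls are not recorded
        }
        log.push_back(now);
        true
    }
}
```

Passing `now` explicitly keeps the sketch deterministic under test; a production version would read `Instant::now()` internally and guard the map with a `Mutex`, since tools can be invoked concurrently.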
This was referenced Apr 14, 2026
pranavraja99
added a commit
that referenced
this pull request
Apr 14, 2026
- Deduplicate host_matches_pattern: single source in secrets/types.rs, delete copies in credential_injector.rs and policy.rs
- Make WASM wrappers path-aware: add path_patterns to ResolvedHostCredential and check in inject_host_credentials
- Restore optional field on CredentialMapping (dropped during rebase)
- Add path_patterns to CredentialMappingSchema and SkillCredentialSpec
- Add test for path-scoped WASM credential injection

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
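This commit scopes WASM credential injection by host and, optionally, by path. The matching rules are not spelled out here, so the sketch below assumes one plausible convention: a `*.` host prefix matches any subdomain (but not the bare domain), a trailing `*` in a path pattern means prefix match, and an empty `path_patterns` list means every path on the host. `ScopedCredential` and `path_matches_pattern` are hypothetical; only the `host_matches_pattern` name and the `path_patterns` field name come from the commit, and the repository's versions may behave differently.

```rust
/// Hypothetical host rule: exact (case-insensitive) match, or `*.suffix`
/// matching any subdomain of `suffix` but not `suffix` itself.
fn host_matches_pattern(host: &str, pattern: &str) -> bool {
    let host = host.to_ascii_lowercase();
    let pattern = pattern.to_ascii_lowercase();
    match pattern.strip_prefix("*.") {
        // Require a non-empty label followed by '.', so "evilexample.com"
        // does not match "*.example.com".
        Some(suffix) => host
            .strip_suffix(suffix)
            .and_then(|rest| rest.strip_suffix('.'))
            .map_or(false, |label| !label.is_empty()),
        None => host == pattern,
    }
}

/// Hypothetical path rule: a trailing `*` means prefix match, otherwise exact.
fn path_matches_pattern(path: &str, pattern: &str) -> bool {
    match pattern.strip_suffix('*') {
        Some(prefix) => path.starts_with(prefix),
        None => path == pattern,
    }
}

/// Hypothetical credential scope: one host pattern plus optional path patterns.
struct ScopedCredential {
    host_pattern: String,
    path_patterns: Vec<String>, // empty means "any path on a matching host"
}

impl ScopedCredential {
    /// True if the credential should be injected into a request to `host` + `path`.
    fn applies_to(&self, host: &str, path: &str) -> bool {
        host_matches_pattern(host, &self.host_pattern)
            && (self.path_patterns.is_empty()
                || self.path_patterns.iter().any(|p| path_matches_pattern(path, p)))
    }
}
```

Keeping a single `host_matches_pattern` and layering the path check on top of it is the point of the deduplication: every caller then agrees on which hosts a credential can reach.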
Summary
- `bundled.rs` module that embeds the Telegram WASM channel binary, allowing it to be installed without downloading externally
- `UrlPath` variant for credential injection, supporting URL placeholder replacement (e.g., `{TELEGRAM_BOT_TOKEN}` in API URLs)

Changes
- `src/channels/wasm/bundled.rs`: new module with `bundled_channel_names()` and `install_bundled_channel()`, using `include_bytes!` for the Telegram WASM binary + capabilities
- `src/setup/wizard.rs`: refactored channel selection to merge discovered + bundled channels, auto-install selected bundled channels, and run migrations before channel setup
- `src/tools/wasm/capabilities_schema.rs`: added `UrlPath` credential location variant with JSON parsing support
- `src/tools/wasm/credential_injector.rs`: handle `UrlPath` in injection (deferred to channel/tool wrappers)
- `src/secrets/types.rs`: added `UrlPath` variant to `CredentialLocation` enum

Test plan
- `cargo test` passes for new tests in `bundled.rs`, `wizard.rs`, and `capabilities_schema.rs`

🤖 Generated with Claude Code
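The `UrlPath` credential location replaces placeholders such as `{TELEGRAM_BOT_TOKEN}` inside API URLs, with the actual substitution deferred to the channel/tool wrappers. A minimal sketch of that substitution step, assuming `{NAME}` placeholders and a simple name-to-secret map; `inject_url_path` and `InjectError` are hypothetical, and rejecting secret values that contain `/`, `?`, `#`, or whitespace is an extra safeguard of this sketch, not something the PR states.

```rust
use std::collections::HashMap;

/// Hypothetical errors for URL-path credential substitution.
#[derive(Debug, PartialEq)]
enum InjectError {
    MissingSecret(String),
    UnsafeValue(String),
    Unterminated,
}

/// Replaces each `{NAME}` placeholder in `url` with the secret stored under `NAME`.
fn inject_url_path(url: &str, secrets: &HashMap<&str, &str>) -> Result<String, InjectError> {
    let mut out = String::with_capacity(url.len());
    let mut rest = url;
    while let Some(open) = rest.find('{') {
        // Copy everything before the placeholder verbatim.
        out.push_str(&rest[..open]);
        let after = &rest[open + 1..];
        let close = after.find('}').ok_or(InjectError::Unterminated)?;
        let name = &after[..close];
        let value = secrets
            .get(name)
            .ok_or_else(|| InjectError::MissingSecret(name.to_string()))?;
        // Extra safeguard in this sketch: a secret must not alter the URL's structure.
        if value.chars().any(|c| matches!(c, '/' | '?' | '#') || c.is_whitespace()) {
            return Err(InjectError::UnsafeValue(name.to_string()));
        }
        out.push_str(value);
        rest = &after[close + 1..];
    }
    out.push_str(rest);
    Ok(out)
}
```

For the Telegram Bot API, whose endpoints have the form `https://api.telegram.org/bot<token>/METHOD`, the template `https://api.telegram.org/bot{TELEGRAM_BOT_TOKEN}/getMe` expands to a concrete request URL. Failing on a missing secret, rather than leaving the literal placeholder in place, avoids sending a request with an unresolved token in its path.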