fix(trust-gateway): add 1MB body size limit to readBody #1793
Closed
topcoder1 wants to merge 419 commits into
SSE trigger dispatches to Telegram JID instead of WhatsApp main. Superpilot's own Telegram escalation disabled (NanoClaw replaces it). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ompts The IPC email_trigger handler was calling sendMessage() which sent the raw instruction prompt to Telegram. Now it enqueues a container agent task via the group queue, which processes the emails and sends clean proposals back to Telegram. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…containers The container-runner was checking process.env for these tokens, but readEnvFile() deliberately does NOT populate process.env. This meant tokens defined only in .env never reached containers. Now uses readEnvFile() with process.env fallback, matching how discord.ts resolves the token. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
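The lookup order described in this commit (a value from `.env` first, `process.env` as the fallback) can be sketched as follows. `resolveToken` and `EnvMap` are hypothetical names, and because the real `readEnvFile()` signature is not shown in this PR, the parsed file is passed in as a plain record.

```typescript
// Hypothetical helper, not the project's code: prefer a value parsed from
// .env, and fall back to the process environment when the file has none.
type EnvMap = Record<string, string | undefined>;

function resolveToken(name: string, fileEnv: EnvMap, procEnv: EnvMap): string | undefined {
  const fromFile = fileEnv[name];
  if (fromFile !== undefined && fromFile !== "") return fromFile;
  return procEnv[name];
}

// Usage: resolveToken("DISCORD_BOT_TOKEN", parsedDotEnv, process.env)
```

Passing both maps explicitly keeps the helper free of global state, which also makes the fallback order easy to test.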
Blocks agent invocations and scheduled tasks when estimated daily spend exceeds DAILY_BUDGET_USD. Tasks compute their next run time so they resume the following day. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
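A minimal sketch of such a budget gate is shown below. Only `DAILY_BUDGET_USD` comes from the commit message; `checkDailyBudget` and `BudgetDecision` are hypothetical, and treating "the following day" as the next UTC midnight is an assumption, since the real tasks compute their own next run time.

```typescript
// Hypothetical budget gate. The budget value would come from DAILY_BUDGET_USD.
interface BudgetDecision {
  allowed: boolean;
  resumeAt?: Date; // set only when the run is blocked
}

function checkDailyBudget(spentTodayUsd: number, dailyBudgetUsd: number, now: Date): BudgetDecision {
  // "Exceeds" is read strictly: spending exactly the budget is still allowed.
  if (spentTodayUsd <= dailyBudgetUsd) return { allowed: true };
  // Assumption: the budget window resets at the next UTC midnight.
  const resumeAt = new Date(
    Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate() + 1),
  );
  return { allowed: false, resumeAt };
}
```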
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ject/sender
Three critical bugs:
1. Container agents couldn't write to processed_items (store/ was read-only). Added read-write overlay mount for store/ directory.
2. IPC sendMessage wasn't stripping <internal> tags before sending to Telegram. Added formatOutbound() to IPC and email trigger paths.
3. Empty From/Subject in trigger prompts now show 'unknown sender' and '(no subject)' instead of blank strings.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
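Fixes 2 and 3 above can be illustrated with a small sketch. The real `formatOutbound()` is not shown in this PR, so `stripInternal` and `describeEmail` are hypothetical stand-ins that only demonstrate the behavior the commit describes.

```typescript
// Hypothetical stand-in for the tag stripping that formatOutbound() performs.
function stripInternal(text: string): string {
  // Non-greedy match so several <internal> blocks are removed independently.
  return text.replace(/<internal>[\s\S]*?<\/internal>/g, "").trim();
}

// Hypothetical helper: substitute placeholders for empty From/Subject values.
function describeEmail(from: string | undefined, subject: string | undefined): string {
  const sender = from?.trim() || "unknown sender";
  const subj = subject?.trim() || "(no subject)";
  return `${sender}: ${subj}`;
}
```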
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… JID
Two bugs:
1. DISCORD_BOT_TOKEN and NANOCLAW_SERVICE_TOKEN were only injected into isMain containers (whatsapp_main). Telegram/Discord containers had no tokens. Moved injection out of the isMain guard.
2. Email triggers ran on whatsapp_main but results forwarded to Telegram. When user replied on Telegram, it went to a different container with no context about the proposal. Now triggers run on the Telegram JID directly, so user replies stay in the same session.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ck real SDK cost
- container-runner: if container exits non-zero AFTER streaming a successful result, treat as post-response cleanup (OOM/runtime kill/SDK teardown) and resolve as success. Previously a code-137 reap surfaced a bogus "Email intelligence trigger failed" Telegram message to the user even though the agent's real reply had already been delivered.
- Cost tracking: pipe real total_cost_usd + usage + num_turns from the SDK's result message through ContainerOutput → runAgent/runTask → logSessionCost. Falls back to the old time-based estimate only when the container exits without ever emitting a cost.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
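The two rules in this commit reduce to small decisions, sketched below. `classifyContainerExit` and `sessionCostUsd` are hypothetical names, and treating exit code 0 as success regardless of streamed output is an assumption.

```typescript
type ExitOutcome = "success" | "error";

// Hypothetical: a non-zero exit after a successful result was already
// streamed is treated as post-response cleanup, not as a failure.
function classifyContainerExit(exitCode: number, streamedSuccess: boolean): ExitOutcome {
  if (exitCode === 0) return "success";
  return streamedSuccess ? "success" : "error";
}

// Hypothetical: prefer the cost reported by the SDK; use the time-based
// estimate only when the container never emitted one.
function sessionCostUsd(reportedUsd: number | undefined, estimate: () => number): number {
  return reportedUsd !== undefined ? reportedUsd : estimate();
}
```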
…sion
Three related improvements driven by live bugs seen today:
1. **Email-trigger close-delay (src/index.ts)**: email triggers are single-shot (agent replies, we're done), but previously used the 30-min idle keep-alive path, leaving the container alive long after the result was delivered. This wasted wall-clock time AND gave external reapers (Docker Desktop OOM, etc.) an 18-min window to SIGKILL the container with code 137. Now mirrors task-scheduler: close the stdin 10s after the first result, triggering a clean _close-sentinel exit. Pairs with the earlier fix #1 that prevents post-streaming non-zero exits from surfacing as user errors.
2. **Hallucination guard for briefings (groups/main/CLAUDE.md)**: adds an "Evidence discipline" section enforcing quote-don't-paraphrase for load-bearing claims, distinguishing recommendations from completed actions, and preferring underclaim over overclaim. Driven by today's OVH briefing where the agent turned Dmitrii's cancellation *recommendation* + Yacine's ambiguous "I confirm" into "team confirmed cancellation of all OVH servers". The cancellation had NOT been executed and Jonathan had to re-ask Dmitrii to cancel in the same thread.
3. **Test-fixture suppression (src/email-sse.ts + CLAUDE.md step 2)**: drops SSE triggers with thread_ids matching /^test-approval[-_]/i at the edge, so dev-harness fixtures stop waking the agent for work it can't complete. Defense in depth: CLAUDE.md also tells the agent to skip these if any slip through.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
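The test-fixture filter in item 3 can be sketched with the regular expression quoted in the commit message. The function name `shouldDropTrigger` is hypothetical, and the surrounding SSE handling is not shown.

```typescript
// Pattern quoted in the commit message: case-insensitive, anchored at the
// start, and requiring a "-" or "_" separator after "test-approval".
const TEST_FIXTURE_THREAD = /^test-approval[-_]/i;

// Hypothetical edge filter: true means the SSE trigger is dropped.
function shouldDropTrigger(threadId: string): boolean {
  return TEST_FIXTURE_THREAD.test(threadId);
}
```

Because the pattern has no `g` flag, calling `test()` repeatedly on the shared regex is stateless.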
1. **System-injected acknowledgment (src/index.ts)**: email triggers routinely take 30–90s. Previously the user saw nothing between the SSE event arriving and the first agent result. Now sends a "⏳ New email(s) — processing now…" message immediately when the trigger is enqueued, so the user has instant confirmation that work started. Pairs with the existing typing indicator and the agent's own in-flight messages. 2. **container-runner.test mock fix (src/container-runner.test.ts)**: adds SUPERPILOT_MCP_URL and SUPERPILOT_API_URL to the config.js vi mock. Three pre-existing failures were blocking the test suite from green — unrelated to any recent source change, just mock drift from when those constants were added to container-runner's -e args. Full suite: 401/401 passing (was 398/401 with pre-existing failures). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Upgrades the email-trigger "⏳ working" acknowledgment from a static
message to a live status line that narrates what the agent is currently
doing. On Telegram the single ack message is edited in place (via
editMessageText) whenever the agent invokes a tool, then deleted before
the final answer lands so the chat ends with a single clean reply.
Flow:
- agent-runner detects tool_use blocks in assistant messages and emits
a new `progressLabel` field in ContainerOutput (pretty-printed tool
name, e.g. "Reading Gmail thread", "Generating reply").
- container-runner passes progressLabel through via onOutput as part of
the existing ContainerOutput type.
- host (enqueueEmailTrigger) calls the new Channel.sendProgress() to
get a handle, then updates the handle on every progressLabel event
and clears it before the final onResult.
- types.ts introduces ProgressHandle { update, clear } and
Channel.sendProgress?() — optional, channels that don't support
edit-in-place fall back to sendMessage append-only behavior.
- Telegram channel implements sendProgress using bot.api.editMessageText
and bot.api.deleteMessage. Discord/WhatsApp/Gmail remain append-only
(no implementation required — the optional method is undefined).
Errors in the progress path are swallowed to debug logs so a failing
edit never blocks real agent work.
Tests: 401/401 pass — no existing tests touch the progress path, and
the new code is all optional/fallback-safe.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
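The optional-method fallback described in this commit might look like the sketch below. Only the names `ProgressHandle`, `update`, `clear`, `sendProgress`, and `sendMessage` come from the commit message; the parameter lists, return types, and the `startProgress` helper are assumptions.

```typescript
interface ProgressHandle {
  update(label: string): Promise<void>;
  clear(): Promise<void>;
}

interface Channel {
  sendMessage(jid: string, text: string): Promise<void>;
  // Optional: only channels that can edit a message in place implement it.
  sendProgress?(jid: string, text: string): Promise<ProgressHandle>;
}

// Hypothetical helper: use edit-in-place when available, else append-only.
async function startProgress(channel: Channel, jid: string, text: string): Promise<ProgressHandle> {
  if (channel.sendProgress) {
    try {
      return await channel.sendProgress(jid, text);
    } catch {
      // A failing progress path must never block real agent work.
    }
  }
  await channel.sendMessage(jid, text);
  return {
    update: (label) => channel.sendMessage(jid, label), // each update is a new message
    clear: async () => {}, // nothing to delete in append-only mode
  };
}
```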
Three bugs in the Discord digest pipeline that manifested as the 1:45 PM briefing claim "Digest unavailable — bot token not configured" (an invented reason, not a real diagnostic): 1. **--output-only was unimplemented.** The morning-briefing skill passed --output-only expecting a preview-only mode, but the script ignored sys.argv entirely and always called send_dm(), so every briefing run accidentally double-posted the digest to Discord. Now supports the flag properly: with it, the digest is emitted to stdout only and no DM is sent. 2. **Errors were swallowed.** The skill redirected 2>/dev/null and used `|| echo "Discord digest unavailable"` as a bare fallback, which left the briefing agent with no real error context — it then invented plausible reasons like "bot token not configured" (same hallucination class as today's OVH incident). Script now emits a DIGEST-ERROR: prefix with the literal failure reason on both stdout and stderr, and the skill tells the agent to quote that error verbatim instead of inventing one. 3. **No startup diagnostic.** Added explicit token check that prints "DIGEST-ERROR: DISCORD_BOT_TOKEN environment variable is not set in the container" when missing, plus exception handling around build_digest/send_dm that surfaces the real error type and message. Skill update (container/skills/morning-briefing/SKILL.md): documents the new flag, removes the 2>/dev/null swallow, and references the Evidence discipline rules in CLAUDE.md to enforce quote-not-invent behavior for digest errors specifically. 
Verified manually:
- Without token: exit 2, "DIGEST-ERROR: DISCORD_BOT_TOKEN environment variable is not set in the container" on both streams
- With token + --output-only: digest on stdout, NO DM posted, exit 0
- With token + no flag: preserves the old DM-posting behavior for the scheduled-task code path
The existing commit 81280ca fixed token injection into containers; this one fixes the diagnostic-and-no-double-post gap that was hiding whether injection was actually working.
Tests: 401/401 pass.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
10-task implementation plan covering:
- Restart stuck container (immediate unblock)
- Pre-emptive OAuth token refresh script + TS wrapper + tests
- Wire refresh into email-trigger and scheduled-task spawn paths
- Mount hardening (only mount Gmail dirs with credentials.json)
- GMAIL-DEGRADED skill rule + Evidence discipline rule #7
- Diagnostic script + operator runbook
- Final verification including tomorrow morning briefing rubric
Pre-emptively refreshes Google OAuth access_tokens for all Gmail accounts before each email-intelligence container spawn. Skips accounts that don't need refresh (>5 min until expiry). Atomic write prevents partial-write corruption. Exit codes distinguish missing accounts (expected) from real refresh errors (token revoked, network, etc.). Standalone for now — wired into container spawn paths in next commits.
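The skip rule (no refresh when more than 5 minutes remain before expiry) is a one-line comparison. The refresh script itself is Python; the TypeScript below only illustrates the rule, and `needsRefresh` and `REFRESH_MARGIN_MS` are hypothetical names.

```typescript
// 5-minute margin taken from the commit message.
const REFRESH_MARGIN_MS = 5 * 60 * 1000;

// Hypothetical: true when the token expires within the margin or has expired.
function needsRefresh(expiryMs: number, nowMs: number): boolean {
  return expiryMs - nowMs <= REFRESH_MARGIN_MS;
}
```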
…ve 0600 perms, atomic+race-safe write
Four code-quality fixes from review:
1. Persist rotated refresh_token if Google returns a new one. Without this, a Google-side rotation discards the new token and breaks future refreshes.
2. Preserve original credentials.json mode (0600) instead of inheriting the tmp file's umask-default 0644. OAuth credentials should not be world-readable.
3. Wrap tmp-write+rename in try/finally so a crash between the write and the replace doesn't leak credentials.json.tmp on disk.
4. Use PID-suffixed tmp filenames so two concurrent invocations (e.g., an email-trigger spawn racing a scheduled task) cannot stomp each other's tmp file.
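Fixes 2 through 4 describe a standard atomic-write pattern. The actual script is Python, so the TypeScript below is only a sketch of the same technique using Node's `fs` module; `writeCredentialsAtomically`, the tmp-file naming, and the demo paths are assumptions.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical sketch: write to a PID-suffixed tmp file, keep the original
// mode, rename over the target, and never leave the tmp file behind.
function writeCredentialsAtomically(target: string, contents: string): void {
  // Preserve the existing permission bits; default to 0600 for a new file.
  const mode = fs.existsSync(target) ? fs.statSync(target).mode & 0o777 : 0o600;
  // The PID suffix keeps concurrent processes from sharing a tmp file.
  const tmp = `${target}.${process.pid}.tmp`;
  try {
    fs.writeFileSync(tmp, contents, { mode });
    fs.chmodSync(tmp, mode); // the mode passed at creation is subject to umask
    fs.renameSync(tmp, target); // atomic on the same filesystem
  } finally {
    // After a successful rename the tmp path is gone; force ignores that.
    fs.rmSync(tmp, { force: true });
  }
}

// Demo in a throwaway directory.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "creds-"));
const demoFile = path.join(demoDir, "credentials.json");
fs.writeFileSync(demoFile, "{}", { mode: 0o600 });
writeCredentialsAtomically(demoFile, JSON.stringify({ access_token: "example" }));
```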
Wraps scripts/refresh-gmail-tokens.py with a typed Promise interface. Never throws — all failure modes (script crash, timeout, missing accounts, real refresh errors) collapse to a structured GmailRefreshResult so callers can decide whether to spawn the agent anyway. Tests cover: ok exit, missing exit (code 2), error exit (code 3), script-crash exit, and timeout. 5 new tests, full suite green.
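The "never throws" contract amounts to mapping every process outcome onto one result type. In the sketch below, exit codes 2 and 3 come from the commit message; exit code 0 for success, the field names of `GmailRefreshResult`, and the helper `toRefreshResult` are assumptions.

```typescript
type GmailRefreshStatus = "ok" | "missing" | "error" | "crash" | "timeout";

// Assumed shape; the real GmailRefreshResult fields are not shown in this PR.
interface GmailRefreshResult {
  status: GmailRefreshStatus;
  detail: string;
}

// Hypothetical mapping from a finished (or killed) child process to a result.
function toRefreshResult(exitCode: number | null, timedOut: boolean, stderr: string): GmailRefreshResult {
  if (timedOut) return { status: "timeout", detail: "script did not finish in time" };
  switch (exitCode) {
    case 0:
      return { status: "ok", detail: "" };
    case 2:
      return { status: "missing", detail: stderr }; // expected: no such account
    case 3:
      return { status: "error", detail: stderr }; // real refresh failure
    default:
      return { status: "crash", detail: stderr || `exit ${exitCode}` };
  }
}
```

Callers can then branch on `status` and still spawn the agent when the refresh did not succeed.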
Inserts a refreshGmailTokens() call between the group lookup and the progress-message ack inside enqueueEmailTrigger. Token refresh is fast (<200ms when nothing needs refresh), never blocks the spawn, and on failure logs a warning so the operator can see why the agent degraded. Addresses the symptom Jonathan reported (3 emails unable to process due to Gmail MCP unavailability over a 90-minute window) by preventing the mid-session OAuth token expiry that triggers the gmail-mcp drop.
Mirrors the email-trigger refresh added in the previous commit. The morning briefing runs for >30 min and was the canonical victim of the Gmail MCP mid-session drop — refreshing right before the spawn keeps tokens valid for the entire briefing window.
…tus debug Reviewer nits from Task 5: add group field to the warn context for easier log correlation, and mirror Task 4's symmetry by emitting a debug log for the 'missing' status (helps when scheduled-task gmail accounts are silently unauthorized at 7:30am).
Two related Gmail-MCP reliability fixes:
1. container-runner now requires credentials.json (not just gcp-oauth.keys.json) to mount a Gmail account dir into the container. Mounting a dir with only the OAuth client config but no granted token causes the gmail-mcp to attempt auth flows mid-session and fail confusingly. Adds .gmail-mcp-dev to the candidate list while we're at it.
2. agent-runner now passes explicit GMAIL_MCP_HOME and HOME env vars to the gmail-mcp launch, making credential discovery deterministic instead of relying on the npx-spawned process inheriting the right HOME.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Widen ItemClassifiedEvent source type to accept 'sse-classifier'. Record delegation counter on user-approved guardrailed actions so the guardrail can graduate to auto-approval. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Also exclude test files from tsc build to fix rootDir errors when tests import from container/agent-runner/src/. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use maxOutputTokens instead of deprecated maxTokens, add explicit LanguageModel return type to resolveUtilityModel. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Bump ai dependency from ^4.3.0 to ^6.0.0 in container package.json
- Add --legacy-peer-deps to Dockerfile npm install for zod v3/v4 compat
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- CRITICAL: Add trust gateway check to Vercel tool bridge write-class tools (send_message, schedule, relay_message). Fails open if gateway unreachable.
- HIGH: Use script-enriched prompt variable in Vercel runner dispatch instead of raw containerInput.prompt
- HIGH: Fix embedText() to use correct textEmbeddingModel() API
- MEDIUM: Wire getEscalationModel() fallback so auto-escalation works for groups without explicit escalationModel config
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…sage fields)
QA testing revealed vercel-runner.ts used v5 API names against ai v6:
- CoreMessage → ModelMessage
- maxSteps → stopWhen: stepCountIs(50)
- usage.promptTokens → usage.inputTokens
- usage.completionTokens → usage.outputTokens
- @ai-sdk/openai and @ai-sdk/google bumped from ^1.x to ^3.x
- Regenerated package-lock.json to pin ai v6
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Covers MCP tool bridging, trust gateway pending flow, switch_model IPC, and session message structure preservation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix container vercel-runner for ai v6 API changes
- Add deferred items implementation plan
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Stop flattening tool_use/tool_result content parts to strings when saving sessions. Widen SessionMessage.content to string | unknown[] and pass messages through without transformation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the old checkTrust function with checkTrustWithPolling that handles the full pending approval flow: polls GET /trust/approval/:id until approved/denied/timeout instead of treating "pending" as denial. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
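The polling flow can be sketched as a loop that keeps asking while the status is "pending". This is not the project's `checkTrustWithPolling`: the name `pollApproval`, the option names, and the injected `getStatus` and `sleep` callbacks are assumptions, with `getStatus` standing in for the `GET /trust/approval/:id` request.

```typescript
type ApprovalStatus = "pending" | "approved" | "denied";
type TrustOutcome = "approved" | "denied" | "timeout";

// Hypothetical polling loop. "pending" means keep waiting, not denial.
async function pollApproval(
  getStatus: () => Promise<ApprovalStatus>,
  opts: { intervalMs: number; timeoutMs: number },
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<TrustOutcome> {
  let waitedMs = 0;
  for (;;) {
    const status = await getStatus();
    if (status !== "pending") return status; // approved or denied
    if (waitedMs >= opts.timeoutMs) return "timeout";
    await sleep(opts.intervalMs);
    waitedMs += opts.intervalMs;
  }
}
```

Injecting `sleep` lets tests run the loop without real delays; elapsed time is counted in sleep intervals rather than read from the clock, which is a simplification.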
Allows agents to dynamically switch their LLM provider and model mid-conversation via IPC. Non-main groups can only switch their own model; the main group can switch any group's model. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
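The authorization rule stated here fits in a single predicate. `SwitchModelRequest` and `canSwitchModel` are hypothetical names; the real IPC message shape is not shown in this PR.

```typescript
// Hypothetical request shape for a switch_model IPC call.
interface SwitchModelRequest {
  sourceGroup: string; // group the calling agent belongs to
  targetGroup: string; // group whose model should change
  sourceIsMain: boolean;
}

// Main may switch any group's model; other groups may only switch their own.
function canSwitchModel(req: SwitchModelRequest): boolean {
  return req.sourceIsMain || req.sourceGroup === req.targetGroup;
}
```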
Exposes switch_model as an IPC tool agents can call to escalate or downgrade their LLM provider/model for the next message. Also adds a human-readable label in the Vercel runner progress display. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds mcp-bridge module that builds and connects MCP server configs (nanoclaw IPC, Gmail, Notion, SuperPilot) for non-Claude agents using the @ai-sdk/mcp client. Includes tests covering all registration paths. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Connects MCP servers (nanoclaw IPC, Gmail, Notion, SuperPilot) into the Vercel AI SDK generateText call, giving non-Claude agents access to the same MCP tools as the Claude SDK path. MCP connections are cleaned up after each query. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Hoist mcpConnection variable above try block so it's accessible in catch. Prevents leaked child processes when generateText throws. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
AI SDK v6 generateText accesses tool.inputSchema, not tool.parameters. Using parameters caused asSchema(undefined) which produced an empty JSON Schema, rejected by OpenAI as 'type: "None"'. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ry usage readBody() now tracks accumulated bytes and destroys the request with a 413 response when the body exceeds 1MB, preventing potential DoS via oversized payloads. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
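A minimal sketch of a size-limited body reader is shown below, assuming a plain Node `http` server. It is not the PR's actual `readBody()`: the `BodyAccumulator` helper, the `limit` parameter, the response text, and the choice to destroy the request only after the 413 has been flushed are all assumptions.

```typescript
import type { IncomingMessage, ServerResponse } from "node:http";

const MAX_BODY_BYTES = 1024 * 1024; // 1MB, per the commit message

// Hypothetical helper that tracks accumulated bytes against a limit.
class BodyAccumulator {
  private chunks: Buffer[] = [];
  private total = 0;
  private limit: number;

  constructor(limit: number = MAX_BODY_BYTES) {
    this.limit = limit;
  }

  // Returns false once the accumulated size exceeds the limit.
  push(chunk: Buffer): boolean {
    this.total += chunk.length;
    if (this.total > this.limit) return false;
    this.chunks.push(chunk);
    return true;
  }

  text(): string {
    return Buffer.concat(this.chunks).toString("utf8");
  }
}

function readBody(req: IncomingMessage, res: ServerResponse, limit: number = MAX_BODY_BYTES): Promise<string> {
  return new Promise((resolve, reject) => {
    const body = new BodyAccumulator(limit);
    let rejected = false;
    req.on("data", (chunk: Buffer) => {
      if (rejected || body.push(chunk)) return;
      rejected = true;
      // Reply 413, then tear down the connection so no more data is read.
      res.writeHead(413, { Connection: "close" });
      res.end("Payload Too Large", () => req.destroy());
      reject(new Error("request body exceeds limit"));
    });
    req.on("end", () => {
      if (!rejected) resolve(body.text());
    });
    req.on("error", reject);
  });
}
```

Counting bytes as chunks arrive, rather than trusting `Content-Length`, also covers chunked uploads that never declare a size.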
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Author
Opened against upstream by mistake; re-targeting to fork.
Summary
- Adds a 1MB body size limit to the `readBody()` function in the trust gateway HTTP server.
- … `type` fields: no divergence

Test plan
- … `/trust/evaluate` with a body > 1MB and verify 413 response

🤖 Generated with Claude Code