fix(trust-gateway): add 1MB body size limit to readBody#1793

Closed
topcoder1 wants to merge 419 commits into nanocoai:main from topcoder1:claude/unruffled-euclid

Conversation

@topcoder1

Summary

  • Adds a 1MB size cap to the unbounded readBody() function in the trust gateway HTTP server
  • Returns 413 (Payload Too Large) when exceeded, preventing potential DoS via oversized request bodies
  • Verified eventBus emit keys already align with event .type fields — no divergence
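
The size cap described above can be sketched as follows. This is a minimal illustration, not the gateway's actual code: `MAX_BODY_BYTES`, `PayloadTooLargeError`, and the async-iterable signature are assumptions, and "1MB" is taken to mean 1 MiB.

```typescript
// Hypothetical names; the PR only states that readBody() caps the body at 1MB and answers 413.
const MAX_BODY_BYTES = 1024 * 1024; // assumption: 1MB is read as 1 MiB

class PayloadTooLargeError extends Error {
  readonly statusCode = 413;
  constructor(limit: number) {
    super(`request body exceeds ${limit} bytes`);
    this.name = "PayloadTooLargeError";
  }
}

// Accepts any async iterable of byte chunks; a Node http.IncomingMessage qualifies.
async function readBody(
  req: AsyncIterable<Uint8Array>,
  limit: number = MAX_BODY_BYTES,
): Promise<Uint8Array> {
  const chunks: Uint8Array[] = [];
  let total = 0;
  for await (const chunk of req) {
    total += chunk.byteLength;
    if (total > limit) {
      // Abort as soon as the cap is crossed instead of buffering the rest.
      throw new PayloadTooLargeError(limit);
    }
    chunks.push(chunk);
  }
  // Concatenate the accepted chunks into one contiguous buffer.
  const body = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    body.set(chunk, offset);
    offset += chunk.byteLength;
  }
  return body;
}
```

A caller would catch the error, write `statusCode` as the response status, and destroy the request so the remaining bytes are never read.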

Test plan

  • Send a POST to /trust/evaluate with a body larger than 1MB and verify that the response is 413
  • Send normal requests and verify they still work
  • Pre-existing build errors are unrelated (optional deps: playwright, ai-sdk, pixelmatch)

🤖 Generated with Claude Code

Jonathan Zhang and others added 30 commits April 10, 2026 23:01
SSE trigger dispatches to Telegram JID instead of WhatsApp main.
Superpilot's own Telegram escalation disabled (NanoClaw replaces it).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ompts

The IPC email_trigger handler was calling sendMessage() which sent the
raw instruction prompt to Telegram. Now it enqueues a container agent
task via the group queue, which processes the emails and sends clean
proposals back to Telegram.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…containers

The container-runner was checking process.env for these tokens, but
readEnvFile() deliberately does NOT populate process.env. This meant
tokens defined only in .env never reached containers. Now uses
readEnvFile() with process.env fallback, matching how discord.ts
resolves the token.
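
A rough sketch of the lookup order this commit describes: values from the .env file first, then process.env as the fallback. `resolveToken` and `EnvMap` are hypothetical names, and `fileEnv` merely stands in for whatever readEnvFile() returns, since its real signature is not shown here.

```typescript
type EnvMap = Readonly<Record<string, string | undefined>>;

// fileEnv: assumed to be the parsed .env contents; processEnv: a snapshot of process.env.
function resolveToken(
  name: string,
  fileEnv: EnvMap,
  processEnv: EnvMap,
): string | undefined {
  const fromFile = fileEnv[name];
  if (fromFile !== undefined && fromFile !== "") return fromFile;
  // Fall back to the process environment when the file does not define the token.
  const fromProcess = processEnv[name];
  return fromProcess !== undefined && fromProcess !== "" ? fromProcess : undefined;
}
```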

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Blocks agent invocations and scheduled tasks when estimated daily
spend exceeds DAILY_BUDGET_USD. Tasks compute their next run time
so they resume the following day.
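
One way to picture the guard, as a sketch under stated assumptions: `checkDailyBudget` and `BudgetDecision` are hypothetical names, and resuming at the next UTC midnight is an illustrative simplification (the commit says tasks compute their own next run time).

```typescript
interface BudgetDecision {
  allowed: boolean;
  resumeAt?: Date; // set only when the run is blocked
}

function checkDailyBudget(
  spentTodayUsd: number,
  dailyBudgetUsd: number, // value of DAILY_BUDGET_USD
  now: Date,
): BudgetDecision {
  if (spentTodayUsd <= dailyBudgetUsd) return { allowed: true };
  // Assumption: "the following day" begins at 00:00 UTC.
  const resumeAt = new Date(
    Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate() + 1),
  );
  return { allowed: false, resumeAt };
}
```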

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ject/sender

Three critical bugs:
1. Container agents couldn't write to processed_items (store/ was read-only).
   Added read-write overlay mount for store/ directory.
2. IPC sendMessage wasn't stripping <internal> tags before sending to
   Telegram. Added formatOutbound() to IPC and email trigger paths.
3. Empty From/Subject in trigger prompts now show 'unknown sender' and
   '(no subject)' instead of blank strings.
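
Fixes 2 and 3 might look roughly like the sketch below. The body of `formatOutbound` is a guess at the behavior described (stripping `<internal>` blocks), and `describeEmail` is a hypothetical helper for the fallback labels.

```typescript
// Remove <internal>...</internal> blocks before text leaves the host.
function formatOutbound(text: string): string {
  return text.replace(/<internal>[\s\S]*?<\/internal>/g, "").trim();
}

// Substitute readable placeholders for empty From/Subject headers.
function describeEmail(from: string | undefined, subject: string | undefined): string {
  const sender = from?.trim() || "unknown sender";
  const title = subject?.trim() || "(no subject)";
  return `${sender}: ${title}`;
}
```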

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… JID

Two bugs:
1. DISCORD_BOT_TOKEN and NANOCLAW_SERVICE_TOKEN were only injected into
   isMain containers (whatsapp_main). Telegram/Discord containers had
   no tokens. Moved injection out of the isMain guard.
2. Email triggers ran on whatsapp_main but results forwarded to Telegram.
   When user replied on Telegram, it went to a different container with
   no context about the proposal. Now triggers run on the Telegram JID
   directly, so user replies stay in the same session.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ck real SDK cost

- container-runner: if container exits non-zero AFTER streaming a successful
  result, treat as post-response cleanup (OOM/runtime kill/SDK teardown) and
  resolve as success. Previously a code-137 reap surfaced a bogus
  "Email intelligence trigger failed" Telegram message to the user even
  though the agent's real reply had already been delivered.

- Cost tracking: pipe real total_cost_usd + usage + num_turns from the SDK's
  result message through ContainerOutput → runAgent/runTask → logSessionCost.
  Falls back to the old time-based estimate only when the container exits
  without ever emitting a cost.
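
Both rules reduce to small decisions, sketched here with hypothetical names (`classifyExit`, `resolveCost`) and an assumed per-minute rate for the time-based estimate.

```typescript
type ExitOutcome = "success" | "error";

// A non-zero exit after a successful result was streamed counts as cleanup noise.
function classifyExit(exitCode: number, streamedSuccess: boolean): ExitOutcome {
  if (exitCode === 0) return "success";
  return streamedSuccess ? "success" : "error";
}

// Prefer the cost reported by the SDK; estimate from wall-clock time otherwise.
function resolveCost(
  reportedUsd: number | undefined,
  durationMs: number,
  usdPerMinute: number, // assumption: the old estimate scaled linearly with runtime
): number {
  return reportedUsd ?? (durationMs / 60000) * usdPerMinute;
}
```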

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…sion

Three related improvements driven by live bugs seen today:

1. **Email-trigger close-delay (src/index.ts)**: email triggers are
   single-shot (agent replies, we're done), but previously used the
   30-min idle keep-alive path, leaving the container alive long after
   the result was delivered. This wasted wall-clock time AND gave
   external reapers (Docker Desktop OOM, etc.) an 18-min window to
   SIGKILL the container with code 137. Now mirrors task-scheduler:
   close stdin 10s after the first result, triggering a clean
   _close-sentinel exit. Pairs with the earlier fix #1 that prevents
   post-streaming non-zero exits from surfacing as user errors.

2. **Hallucination guard for briefings (groups/main/CLAUDE.md)**: adds
   an "Evidence discipline" section enforcing quote-don't-paraphrase
   for load-bearing claims, distinguishing recommendations from
   completed actions, and preferring underclaim over overclaim. Driven
   by today's OVH briefing where the agent turned Dmitrii's
   cancellation *recommendation* + Yacine's ambiguous "I confirm" into
   "team confirmed cancellation of all OVH servers" — the cancellation
   had NOT been executed and Jonathan had to re-ask Dmitrii to cancel
   in the same thread.

3. **Test-fixture suppression (src/email-sse.ts + CLAUDE.md step 2)**:
   drops SSE triggers with thread_ids matching /^test-approval[-_]/i
   at the edge, so dev-harness fixtures stop waking the agent for work
   it can't complete. Defense in depth: CLAUDE.md also tells the agent
   to skip these if any slip through.
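
The edge filter from item 3 is small enough to show in full. The pattern is the one quoted in the commit message; the function name `isTestFixtureThread` is hypothetical.

```typescript
const TEST_FIXTURE_THREAD = /^test-approval[-_]/i;

// True for dev-harness thread ids that should be dropped before waking the agent.
function isTestFixtureThread(threadId: string): boolean {
  return TEST_FIXTURE_THREAD.test(threadId);
}
```

Because the pattern is anchored at the start and requires a trailing `-` or `_`, an id such as `thread-test-approval-1` or a bare `test-approval` is not suppressed.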

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1. **System-injected acknowledgment (src/index.ts)**: email triggers
   routinely take 30–90s. Previously the user saw nothing between the
   SSE event arriving and the first agent result. Now sends a "⏳ New
   email(s) — processing now…" message immediately when the trigger is
   enqueued, so the user has instant confirmation that work started.
   Pairs with the existing typing indicator and the agent's own
   in-flight messages.

2. **container-runner.test mock fix (src/container-runner.test.ts)**:
   adds SUPERPILOT_MCP_URL and SUPERPILOT_API_URL to the config.js vi
   mock. Three pre-existing failures were blocking the test suite from
   green — unrelated to any recent source change, just mock drift from
   when those constants were added to container-runner's -e args.

Full suite: 401/401 passing (was 398/401 with pre-existing failures).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Upgrades the email-trigger "⏳ working" acknowledgment from a static
message to a live status line that narrates what the agent is currently
doing. On Telegram the single ack message is edited in place (via
editMessageText) whenever the agent invokes a tool, then deleted before
the final answer lands so the chat ends with a single clean reply.

Flow:
- agent-runner detects tool_use blocks in assistant messages and emits
  a new `progressLabel` field in ContainerOutput (pretty-printed tool
  name, e.g. "Reading Gmail thread", "Generating reply").
- container-runner passes progressLabel through via onOutput as part of
  the existing ContainerOutput type.
- host (enqueueEmailTrigger) calls the new Channel.sendProgress() to
  get a handle, then updates the handle on every progressLabel event
  and clears it before the final onResult.
- types.ts introduces ProgressHandle { update, clear } and
  Channel.sendProgress?() — optional, channels that don't support
  edit-in-place fall back to sendMessage append-only behavior.
- Telegram channel implements sendProgress using bot.api.editMessageText
  and bot.api.deleteMessage. Discord/WhatsApp/Gmail remain append-only
  (no implementation required — the optional method is undefined).

Errors in the progress path are swallowed to debug logs so a failing
edit never blocks real agent work.
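
A sketch of how the optional method and its fallback could fit together. `ProgressHandle { update, clear }` and `Channel.sendProgress?()` come from the commit message; the parameter lists, `startProgress`, and `swallow` are assumptions made for illustration, and the debug logging is left as a comment.

```typescript
interface ProgressHandle {
  update(label: string): Promise<void>;
  clear(): Promise<void>;
}

interface Channel {
  sendMessage(jid: string, text: string): Promise<void>;
  sendProgress?(jid: string, text: string): Promise<ProgressHandle>;
}

// Progress failures must never block agent work, so they are swallowed here.
async function swallow(action: () => Promise<void>): Promise<void> {
  try {
    await action();
  } catch {
    // Real code would write the error to a debug log.
  }
}

async function startProgress(
  channel: Channel,
  jid: string,
  initialText: string,
): Promise<ProgressHandle> {
  if (channel.sendProgress) {
    // Edit-in-place channels (Telegram in this PR) return a live handle.
    const handle = await channel.sendProgress(jid, initialText);
    return {
      update: (label) => swallow(() => handle.update(label)),
      clear: () => swallow(() => handle.clear()),
    };
  }
  // Append-only fallback: each label becomes a new message; nothing to clear.
  await channel.sendMessage(jid, initialText);
  return {
    update: (label) => swallow(() => channel.sendMessage(jid, label)),
    clear: async () => {},
  };
}
```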

Tests: 401/401 pass — no existing tests touch the progress path, and
the new code is all optional/fallback-safe.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three bugs in the Discord digest pipeline that manifested as the 1:45 PM
briefing claim "Digest unavailable — bot token not configured" (an
invented reason, not a real diagnostic):

1. **--output-only was unimplemented.** The morning-briefing skill
   passed --output-only expecting a preview-only mode, but the script
   ignored sys.argv entirely and always called send_dm(), so every
   briefing run accidentally double-posted the digest to Discord.
   Now supports the flag properly: with it, the digest is emitted to
   stdout only and no DM is sent.

2. **Errors were swallowed.** The skill redirected 2>/dev/null and
   used `|| echo "Discord digest unavailable"` as a bare fallback,
   which left the briefing agent with no real error context — it then
   invented plausible reasons like "bot token not configured" (same
   hallucination class as today's OVH incident). Script now emits a
   DIGEST-ERROR: prefix with the literal failure reason on both stdout
   and stderr, and the skill tells the agent to quote that error
   verbatim instead of inventing one.

3. **No startup diagnostic.** Added explicit token check that prints
   "DIGEST-ERROR: DISCORD_BOT_TOKEN environment variable is not set
   in the container" when missing, plus exception handling around
   build_digest/send_dm that surfaces the real error type and message.
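
On the consuming side, the DIGEST-ERROR: prefix makes the failure reason easy to pull out and quote verbatim. The helper below is a hypothetical sketch; the commit does not say any such parser exists, only that the prefix is emitted.

```typescript
const DIGEST_ERROR_PREFIX = "DIGEST-ERROR:";

// Return the literal failure reason from script output, or undefined if none is present.
function extractDigestError(output: string): string | undefined {
  for (const line of output.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (trimmed.startsWith(DIGEST_ERROR_PREFIX)) {
      return trimmed.slice(DIGEST_ERROR_PREFIX.length).trim();
    }
  }
  return undefined;
}
```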

Skill update (container/skills/morning-briefing/SKILL.md): documents
the new flag, removes the 2>/dev/null swallow, and references the
Evidence discipline rules in CLAUDE.md to enforce quote-not-invent
behavior for digest errors specifically.

Verified manually:
- Without token: exit 2, "DIGEST-ERROR: DISCORD_BOT_TOKEN environment
  variable is not set in the container" on both streams
- With token + --output-only: digest on stdout, NO DM posted, exit 0
- With token + no flag: preserves the old DM-posting behavior for
  the scheduled-task code path

The existing commit 81280ca fixed token injection into containers;
this one fixes the diagnostic-and-no-double-post gap that was hiding
whether injection was actually working.

Tests: 401/401 pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
10-task implementation plan covering:
- Restart stuck container (immediate unblock)
- Pre-emptive OAuth token refresh script + TS wrapper + tests
- Wire refresh into email-trigger and scheduled-task spawn paths
- Mount hardening (only mount Gmail dirs with credentials.json)
- GMAIL-DEGRADED skill rule + Evidence discipline rule #7
- Diagnostic script + operator runbook
- Final verification including tomorrow morning briefing rubric
Pre-emptively refreshes Google OAuth access_tokens for all Gmail accounts
before each email-intelligence container spawn. Skips accounts that don't
need refresh (>5 min until expiry). Atomic write prevents partial-write
corruption. Exit codes distinguish missing accounts (expected) from real
refresh errors (token revoked, network, etc.).
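
The skip rule can be expressed as a single comparison. This is a TypeScript sketch of the decision only (the script itself is Python), with hypothetical names and epoch-millisecond timestamps assumed.

```typescript
const REFRESH_MARGIN_MS = 5 * 60 * 1000; // the 5-minute threshold from the commit message

// Refresh when the token expires within the margin (or has already expired).
function needsRefresh(
  expiresAtMs: number,
  nowMs: number,
  marginMs: number = REFRESH_MARGIN_MS,
): boolean {
  return expiresAtMs - nowMs <= marginMs;
}
```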

Standalone for now — wired into container spawn paths in next commits.
…ve 0600 perms, atomic+race-safe write

Four code-quality fixes from review:

1. Persist rotated refresh_token if Google returns a new one. Without this,
   a Google-side rotation discards the new token and breaks future refreshes.

2. Preserve original credentials.json mode (0600) instead of inheriting the
   tmp file's umask-default 0644. OAuth credentials should not be world-readable.

3. Wrap tmp-write+rename in try/finally so a crash between the write and the
   replace doesn't leak credentials.json.tmp on disk.

4. Use PID-suffixed tmp filenames so two concurrent invocations (e.g., an
   email-trigger spawn racing a scheduled task) cannot stomp each other's
   tmp file.
Wraps scripts/refresh-gmail-tokens.py with a typed Promise interface.
Never throws — all failure modes (script crash, timeout, missing accounts,
real refresh errors) collapse to a structured GmailRefreshResult so
callers can decide whether to spawn the agent anyway.

Tests cover: ok exit, missing exit (code 2), error exit (code 3),
script-crash exit, and timeout. 5 new tests, full suite green.
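
The five tested outcomes suggest a mapping along these lines. Exit codes 2 and 3 come from the commit message; treating 0 as the ok exit, and the field names on `GmailRefreshResult`, are assumptions.

```typescript
type GmailRefreshStatus = "ok" | "missing" | "error" | "crash" | "timeout";

interface GmailRefreshResult {
  status: GmailRefreshStatus;
  detail: string; // hypothetical field carrying stderr or a short explanation
}

// Collapse every failure mode into a value so callers never need try/catch.
function toRefreshResult(
  exitCode: number | null,
  timedOut: boolean,
  stderr: string = "",
): GmailRefreshResult {
  if (timedOut) return { status: "timeout", detail: "refresh script timed out" };
  switch (exitCode) {
    case 0:
      return { status: "ok", detail: "" };
    case 2:
      return { status: "missing", detail: stderr };
    case 3:
      return { status: "error", detail: stderr };
    default:
      return { status: "crash", detail: stderr || `unexpected exit code ${exitCode}` };
  }
}
```
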
Inserts a refreshGmailTokens() call between the group lookup and the
progress-message ack inside enqueueEmailTrigger. Token refresh is fast
(<200ms when nothing needs refresh), never blocks the spawn, and on
failure logs a warning so the operator can see why the agent degraded.

Addresses the symptom Jonathan reported (3 emails unable to process due
to Gmail MCP unavailability over a 90-minute window) by preventing the
mid-session OAuth token expiry that triggers the gmail-mcp drop.
Mirrors the email-trigger refresh added in the previous commit. The
morning briefing runs for >30 min and was the canonical victim of the
Gmail MCP mid-session drop — refreshing right before the spawn keeps
tokens valid for the entire briefing window.
…tus debug

Reviewer nits from Task 5: add group field to the warn context for easier
log correlation, and mirror Task 4's symmetry by emitting a debug log
for the 'missing' status (helps when scheduled-task gmail accounts are
silently unauthorized at 7:30am).
Two related Gmail-MCP reliability fixes:

1. container-runner now requires credentials.json (not just gcp-oauth.keys.json)
   to mount a Gmail account dir into the container. Mounting a dir with only
   the OAuth client config but no granted token causes the gmail-mcp to
   attempt auth flows mid-session and fail confusingly. Adds .gmail-mcp-dev
   to the candidate list while we're at it.

2. agent-runner now passes explicit GMAIL_MCP_HOME and HOME env vars to the
   gmail-mcp launch, making credential discovery deterministic instead of
   relying on the npx-spawned process inheriting the right HOME.
Jonathan Zhang and others added 24 commits April 15, 2026 07:24
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Widen ItemClassifiedEvent source type to accept 'sse-classifier'.
Record delegation counter on user-approved guardrailed actions so the
guardrail can graduate to auto-approval.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Also exclude test files from tsc build to fix rootDir errors when
tests import from container/agent-runner/src/.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use maxOutputTokens instead of deprecated maxTokens, add explicit
LanguageModel return type to resolveUtilityModel.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Bump ai dependency from ^4.3.0 to ^6.0.0 in container package.json
- Add --legacy-peer-deps to Dockerfile npm install for zod v3/v4 compat

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- CRITICAL: Add trust gateway check to Vercel tool bridge write-class
  tools (send_message, schedule, relay_message). Fails open if gateway
  unreachable.
- HIGH: Use script-enriched prompt variable in Vercel runner dispatch
  instead of raw containerInput.prompt
- HIGH: Fix embedText() to use correct textEmbeddingModel() API
- MEDIUM: Wire getEscalationModel() fallback so auto-escalation works
  for groups without explicit escalationModel config
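
The fail-open behavior in the CRITICAL item could be sketched as below. The tool names are the ones listed in the commit; `allowWriteTool` and the injected `checkTrust` callback are hypothetical stand-ins for the real bridge code.

```typescript
const WRITE_CLASS_TOOLS = new Set(["send_message", "schedule", "relay_message"]);

// Ask the trust gateway before write-class tools run; allow the call if the gateway is unreachable.
async function allowWriteTool(
  toolName: string,
  checkTrust: (tool: string) => Promise<boolean>,
): Promise<boolean> {
  if (!WRITE_CLASS_TOOLS.has(toolName)) return true; // read-class tools are not gated
  try {
    return await checkTrust(toolName);
  } catch {
    // Fail open: a gateway outage should not block the agent.
    return true;
  }
}
```

Failing open trades safety for availability: during an outage, write-class tools run unchecked, which is worth keeping in mind when reading the approval flow that follows.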

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
)

Adds support for OpenAI, Google, Ollama, Groq, and Together providers
via Vercel AI SDK while preserving claude-agent-sdk for Anthropic models.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…sage fields)

QA testing revealed vercel-runner.ts used v5 API names against ai v6:
- CoreMessage → ModelMessage
- maxSteps → stopWhen: stepCountIs(50)
- usage.promptTokens → usage.inputTokens
- usage.completionTokens → usage.outputTokens
- @ai-sdk/openai and @ai-sdk/google bumped from ^1.x to ^3.x
- Regenerated package-lock.json to pin ai v6

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Covers MCP tool bridging, trust gateway pending flow, switch_model IPC,
and session message structure preservation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix container vercel-runner for ai v6 API changes
- Add deferred items implementation plan

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Stop flattening tool_use/tool_result content parts to strings when
saving sessions. Widen SessionMessage.content to string | unknown[]
and pass messages through without transformation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the old checkTrust function with checkTrustWithPolling that
handles the full pending approval flow: polls GET /trust/approval/:id
until approved/denied/timeout instead of treating "pending" as denial.
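
A minimal polling loop in the spirit of checkTrustWithPolling. The HTTP call to GET /trust/approval/:id is abstracted behind an injected `fetchStatus` callback, and `pollApproval`, `sleep`, and the interval and timeout parameters are assumptions rather than the real signature.

```typescript
type ApprovalStatus = "pending" | "approved" | "denied";
type TrustDecision = "approved" | "denied" | "timeout";

async function pollApproval(
  fetchStatus: () => Promise<ApprovalStatus>, // wraps GET /trust/approval/:id
  sleep: (ms: number) => Promise<void>,
  intervalMs: number,
  timeoutMs: number,
): Promise<TrustDecision> {
  let waitedMs = 0;
  while (true) {
    const status = await fetchStatus();
    // "pending" means keep waiting; it is no longer treated as a denial.
    if (status !== "pending") return status;
    if (waitedMs >= timeoutMs) return "timeout";
    await sleep(intervalMs);
    waitedMs += intervalMs;
  }
}
```

Injecting `sleep` keeps the loop testable without real timers; a production caller would pass a setTimeout-based delay.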

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Allows agents to dynamically switch their LLM provider and model
mid-conversation via IPC. Non-main groups can only switch their own
model; the main group can switch any group's model.
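
The authorization rule stated above amounts to one boolean expression. `canSwitchModel` and its parameters are hypothetical names for illustration.

```typescript
// Main may switch any group's model; every other group may switch only its own.
function canSwitchModel(
  callerGroup: string,
  targetGroup: string,
  callerIsMain: boolean,
): boolean {
  return callerIsMain || callerGroup === targetGroup;
}
```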

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Exposes switch_model as an IPC tool agents can call to escalate or
downgrade their LLM provider/model for the next message. Also adds
a human-readable label in the Vercel runner progress display.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds mcp-bridge module that builds and connects MCP server configs
(nanoclaw IPC, Gmail, Notion, SuperPilot) for non-Claude agents using
the @ai-sdk/mcp client. Includes tests covering all registration paths.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Connects MCP servers (nanoclaw IPC, Gmail, Notion, SuperPilot) into
the Vercel AI SDK generateText call, giving non-Claude agents access
to the same MCP tools as the Claude SDK path. MCP connections are
cleaned up after each query.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Hoist mcpConnection variable above try block so it's accessible in
catch. Prevents leaked child processes when generateText throws.
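
The scoping fix can be illustrated with a small wrapper. Note that this sketch uses `finally` rather than the `catch` the commit mentions, which is a substituted variant that covers both the success and failure paths; `McpConnection` and `runWithMcp` are hypothetical names.

```typescript
interface McpConnection {
  close(): Promise<void>;
}

async function runWithMcp<T>(
  connect: () => Promise<McpConnection>,
  run: (connection: McpConnection) => Promise<T>,
): Promise<T> {
  // Declared outside the try block so the cleanup code can see it.
  let mcpConnection: McpConnection | undefined;
  try {
    mcpConnection = await connect();
    return await run(mcpConnection);
  } finally {
    // Runs on success and on throw, so child processes are not leaked.
    await mcpConnection?.close();
  }
}
```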

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
AI SDK v6 generateText accesses tool.inputSchema, not tool.parameters.
Using parameters caused asSchema(undefined) which produced an empty
JSON Schema, rejected by OpenAI as 'type: "None"'.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ry usage

readBody() now tracks accumulated bytes and destroys the request with a
413 response when the body exceeds 1MB, preventing potential DoS via
oversized payloads.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@topcoder1
Author

Opened against upstream by mistake — re-targeting to fork.


1 participant