fix(container): pre-install Gmail/Notion MCP, log MCP diagnostics #1810

Closed
topcoder1 wants to merge 611 commits into nanocoai:main from topcoder1:fix/gmail-mcp-reliability

Conversation

@topcoder1

Summary

  • Bake @gongrzhe/server-gmail-autoauth-mcp and @notionhq/notion-mcp-server into the container image — eliminates npx -y cold-start on every agent turn, which was timing out under load and leaving the SDK running without Gmail tools.
  • Persist MCP-related stderr lines ([mcp-probe], MCP server, gmail) to the container log even on exit 0, so "Gmail tools offline" complaints have forensic evidence instead of a silent discard.

Root cause

Every container run spawned four Gmail MCP servers via npx -y @gongrzhe/server-gmail-autoauth-mcp. Each invocation hit the npm registry for dependency resolution; on flaky network or registry rate-limits the spawn timed out, the SDK initialized without those tools, and the agent reported "Gmail tools offline." Meanwhile the container logs only captured stdout/stderr on non-zero exits — so these failures left no trail.
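The diagnostics half of the change can be sketched as a small stderr filter. This is a minimal sketch, not the PR's actual container-runner.ts code: the helper name `extractMcpDiagnostics` is hypothetical, the markers are taken from the summary above, and the case-insensitive matching is an assumption.

```typescript
// Hypothetical helper, not the PR's implementation.
// Markers come from the PR summary; matching case-insensitively is an assumption.
const MCP_MARKERS = ["[mcp-probe]", "mcp server", "gmail"];

// Returns a log section containing only MCP-related stderr lines,
// or an empty string when no line matches any marker.
function extractMcpDiagnostics(stderr: string): string {
  const hits = stderr.split("\n").filter((line) => {
    const lower = line.toLowerCase();
    return MCP_MARKERS.some((marker) => lower.includes(marker));
  });
  if (hits.length === 0) return "";
  return ["=== MCP Diagnostics ===", ...hits].join("\n");
}
```

A runner could append the returned section to the container log on every exit code, so a failed probe on an otherwise successful run still leaves a record.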

Test plan

  • nanoclaw-agent:latest rebuilt via ./container/build.sh (14.9s), packages confirmed baked into the global npm prefix.
  • Host npm run build clean; service restarted; new container-runner deployed.
  • Next container turn exercising Gmail should resolve MCP spawn from local cache (sub-second vs. multi-second npm fetch); failures, if any, surface as [mcp-probe] gmail-personal: FAIL … inside a new === MCP Diagnostics === section in groups/*/logs/container-*.log.

🤖 Generated with Claude Code

Jonathan Zhang and others added 30 commits April 15, 2026 05:30
Update telegram.test.ts expectations to match HTML parse_mode change,
add browser module mocks to index.test.ts and routing.test.ts.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds calendar_events (with time-range index), thread_links (composite PK + item index), and idx_tracked_thread on tracked_items. Includes tests for insert/retrieve and upsert/conflict enforcement.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add calendar-poller.ts with storeCalendarEvents, getUpcomingEvents,
getEventsInRange, pollCalendar, startCalendarPoller, stopCalendarPoller,
and cleanupOldEvents. Uses INSERT OR REPLACE upsert semantics, parses
flexible time formats (epoch ms/s, ISO string, Google Calendar dateTime
objects), and emits calendar.synced events via the event bus. 9 tests
covering storage, upsert, range queries, and boundary conditions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
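The flexible time parsing described in this commit might look roughly like the following. It is a sketch under stated assumptions: `toEpochMs` and `CalendarTime` are hypothetical names, and the cutoff used to tell epoch seconds from epoch milliseconds is an assumption, not something the commit message specifies.

```typescript
// Hypothetical input shape: epoch number, ISO string, or a Google Calendar
// style object carrying a dateTime string.
type CalendarTime = number | string | { dateTime?: string };

// Normalizes the supported formats to epoch milliseconds, or null if unparseable.
function toEpochMs(value: CalendarTime): number | null {
  if (typeof value === "number") {
    if (!Number.isFinite(value)) return null;
    // Assumption: values below 1e11 are epoch seconds (1e11 ms is early 1973).
    return value < 1e11 ? value * 1000 : value;
  }
  const text = typeof value === "string" ? value : value.dateTime;
  if (text === undefined) return null;
  const ms = Date.parse(text);
  return Number.isNaN(ms) ? null : ms;
}
```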
Adds thread-correlator module that links TrackedItems to calendar events
via attendee matching and to existing threads via subject normalization
(RE:/FWD: stripping), storing links in thread_links with INSERT OR IGNORE
and emitting thread.correlated events. 11 tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implements scheduling-advisor with findCalendarGaps, isInMeeting, scoreUrgency,
suggestDeliveryTime, and getNextMeetingIn. All 13 tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
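For illustration, gap finding and the in-meeting check can be sketched as a sweep over sorted events. The signatures below are assumptions (the commit message names the functions but not their parameters), and `Interval` is a hypothetical type using epoch-millisecond bounds.

```typescript
// Hypothetical interval type; start is inclusive, end is exclusive.
interface Interval {
  start: number;
  end: number;
}

// Returns the free intervals inside [windowStart, windowEnd) that are at
// least minGapMs long. Overlapping events are handled by tracking a cursor.
function findCalendarGaps(
  events: Interval[],
  windowStart: number,
  windowEnd: number,
  minGapMs = 0,
): Interval[] {
  const sorted = [...events].sort((a, b) => a.start - b.start);
  const gaps: Interval[] = [];
  let cursor = windowStart;
  for (const event of sorted) {
    if (event.end <= cursor) continue; // already covered by an earlier event
    if (event.start >= windowEnd) break; // remaining events are outside the window
    if (event.start > cursor && event.start - cursor >= minGapMs) {
      gaps.push({ start: cursor, end: event.start });
    }
    cursor = Math.max(cursor, event.end);
  }
  if (windowEnd > cursor && windowEnd - cursor >= minGapMs) {
    gaps.push({ start: cursor, end: windowEnd });
  }
  return gaps;
}

// True when `now` falls inside any event.
function isInMeeting(events: Interval[], now: number): boolean {
  return events.some((event) => event.start <= now && now < event.end);
}
```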
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Items sharing a thread_id are now grouped together in the FYI section
of the digest output. Multi-item threads show a normalized title with
count; single-item threads and threadless items continue using existing
source-grouped format.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…duling flow

Exercises the full pipeline: store calendar events, insert TrackedItems,
correlateByAttendee, isInMeeting, scoreUrgency, suggestDeliveryTime, and
PushBuffer hold behavior — both during and outside a meeting.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Dual-runtime architecture: claude-agent-sdk for Claude, Vercel AI SDK
for all other providers. Covers provider config, utility LLM service,
auto-escalation, IPC/MCP tool bridging, and session persistence.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fix normalizeSubject to use while loop for stripping unlimited RE:/FWD:
prefixes. Fix digest-engine test to use valid 'digest' classification
instead of 'fyi'.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
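The while-loop fix matters because a single non-global `replace` removes only the first prefix. A minimal sketch of the idea follows; the exact prefix pattern (including `FW:`) is an assumption and the real `normalizeSubject` may differ in details such as casing.

```typescript
// Sketch only: strips any number of stacked reply/forward prefixes.
// Treating "FW:" like "FWD:" is an assumption, not confirmed by the commit.
function normalizeSubject(subject: string): string {
  const prefix = /^(re|fwd?)\s*:\s*/i;
  let current = subject.trim();
  // Loop until no prefix remains; one replace() call would remove just one.
  while (prefix.test(current)) {
    current = current.replace(prefix, "");
  }
  return current;
}
```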
Covers: provider resolution, auto-escalation, session store, IPC tool
bridge, Vercel AI SDK agent runner, host wiring, utility LLM service,
and container rebuild. TDD throughout.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Introduces `LlmConfig` interface in `src/types.ts` and `resolveModel()` in `src/llm/provider.ts` for host-side LLM provider/model selection before container dispatch.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…PC classification

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements scoreComplexity() with heuristic signals (message length,
code blocks, keywords, question count, file references) to decide
whether to upgrade to a stronger model before agent dispatch.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
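A heuristic scorer over the signals listed in this commit could be sketched as below. Every weight, threshold, and keyword here is an assumption chosen for illustration; the commit message names the signals but not their values, and `shouldEscalate` is a hypothetical wrapper.

```typescript
// Assumed keyword list; the real implementation's keywords are not stated.
const COMPLEX_KEYWORDS = ["refactor", "architecture", "debug", "migrate"];
const CODE_FENCE = "`".repeat(3);

// Sums simple signals into a score; higher means "more likely to need a stronger model".
function scoreComplexity(message: string): number {
  let score = 0;
  if (message.length > 500) score += 2; // long message
  else if (message.length > 150) score += 1;
  if (message.includes(CODE_FENCE)) score += 2; // contains a code block
  const lower = message.toLowerCase();
  score += COMPLEX_KEYWORDS.filter((keyword) => lower.includes(keyword)).length;
  const questions = (message.match(/\?/g) ?? []).length;
  if (questions >= 3) score += 1; // several questions at once
  if (/\b[\w./-]+\.(ts|js|py|json|md)\b/.test(message)) score += 1; // file reference
  return score;
}

// Hypothetical decision helper with an assumed default threshold.
function shouldEscalate(message: string, threshold = 3): boolean {
  return scoreComplexity(message) >= threshold;
}
```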
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements saveSession/loadSession for persisting CoreMessage[] arrays
as JSON files, with 100-message trim and UUID-based session ID generation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
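The 100-message trim can be illustrated without touching the filesystem. In this sketch `ChatMessage` is a simplified stand-in for the SDK's `CoreMessage`, the function names are hypothetical, and keeping the most recent messages (rather than the oldest) is an assumption.

```typescript
// Simplified stand-in for the SDK message type.
interface ChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

const MAX_SESSION_MESSAGES = 100;

// Serializes a session, keeping only the most recent messages (assumed policy).
function serializeSession(messages: ChatMessage[]): string {
  return JSON.stringify(messages.slice(-MAX_SESSION_MESSAGES));
}

// Parses a stored session; anything that is not a JSON array yields an empty session.
function parseSession(json: string): ChatMessage[] {
  const parsed: unknown = JSON.parse(json);
  if (!Array.isArray(parsed)) return [];
  return (parsed as ChatMessage[]).slice(-MAX_SESSION_MESSAGES);
}
```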
…tion

Wire shouldRequireApproval/recordDelegation into handleEvaluate so that
handle_* tools approved by the trust engine are held for user approval
until the delegation counter reaches threshold. Add integration tests
verifying classifyTool mapping and per-class counter independence.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…nters

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Jonathan Zhang and others added 22 commits April 16, 2026 21:27
Capture startMiniAppServer return to get the PendingSendRegistry, pass
eventBus to the server, call registry.shutdown() before queue.shutdown()
during graceful shutdown, and subscribe to email.draft.send_failed to
notify the main Telegram group with Retry/Open-in-Gmail actions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Email alerts remaining: tracks A + C + D
chore: move smoke scripts to scripts/dev/
Two related bugs caused the Telegram "Want me to forward X?" message to
show no action buttons, and Yes/No clicks (when they did appear) to be
cosmetic — the agent never learned the user's answer.

1. Agent-authored IPC messages bypassed classifyAndFormat. The container's
   send_message / relay_message tools write type:"message" IPC files that
   the host delivered via plain channel.sendMessage, skipping question
   detection and inline-keyboard attachment. Add sendAgentMessage to
   IpcDeps, implemented in index.ts to run formatOutbound →
   classifyAndFormat → sendMessageWithActions (falling back to plain send
   when the channel lacks keyboards or no actions were detected). Both
   the type:"message" handler and relay_message now use it.

2. answer:yes/no callbacks only removed a status-bar item. Wire an
   injectUserReply dep that pipes a synthesized reply ("✅ Yes — proceed."
   / "❌ No — do not proceed.") back into the active container via
   queue.sendMessage, or enqueues a message check if no container is
   running. Also clear the buttons after the click so the user sees the
   answer was registered.

Tests: new cases in callback-router.test.ts cover answer:yes, answer:no,
and answer:defer. ipc-relay.test.ts updated to assert the new
sendAgentMessage call site. email-trigger-pipeline and ipc-auth stubs
extended for the new IpcDeps field.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…est matrix

Addresses three gaps uncovered while triaging Telegram bot UX:

1. Retry on Gmail failures. The generic error handler in callback-router
   used to replace the message with "⚠️ expand failed: ..." and strip all
   buttons, leaving the user stuck during a transient outage. For
   expand / archive / confirm_archive failures it now renders
   🔄 Retry + ❌ Dismiss. The new retry:<action>:<entity>[:extra] and
   dismiss_failure:<id> cases re-dispatch the original call or clear the
   keyboard. confirm_archive's existing dedicated retry_archive path is
   unchanged.

2. Email context for agent-authored messages. The container's
   send_message tool now accepts email_id + email_account. The host
   IpcDeps.sendAgentMessage plumbs them through and attaches
   📧 Expand / 🌐 Full Email / 🗄 Archive when provided — the same button
   set email triggers get — so ad-hoc agent messages about a specific
   email (e.g. follow-up questions) carry the same affordances as the
   original notification.

3. Exhaustive callback matrix test. New
   src/__tests__/telegram-callback-matrix.test.ts covers every top-level
   action (archive, confirm_archive, answer yes/no, dismiss, stop,
   unknown), the Gmail-outage retry flow, and guards against retry
   buttons leaking onto non-retryable failures (rsvp). 11 cases, all
   headless — no real bot token needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
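The `retry:<action>:<entity>[:extra]` callback format from item 1 can be parsed with a small helper. This is a sketch: `parseRetryCallback` and `RetryCallback` are hypothetical names, and the retryable set is inferred from the commit message (expand, archive, confirm_archive) rather than copied from the code.

```typescript
// Hypothetical parsed form of a retry callback.
interface RetryCallback {
  action: string;
  entity: string;
  extra?: string;
}

// Actions the commit message describes as retryable; rsvp is deliberately absent.
const RETRYABLE_ACTIONS = new Set(["expand", "archive", "confirm_archive"]);

// Parses callback data; returns null for anything that is not a valid retry.
function parseRetryCallback(data: string): RetryCallback | null {
  const parts = data.split(":");
  if (parts[0] !== "retry" || parts.length < 3) return null;
  const [, action, entity, ...rest] = parts;
  if (!RETRYABLE_ACTIONS.has(action) || entity === "") return null;
  // Any remaining segments are rejoined so the extra field may itself contain colons.
  return rest.length > 0
    ? { action, entity, extra: rest.join(":") }
    : { action, entity };
}
```

Rejecting unknown actions at parse time is one way to keep retry buttons from re-dispatching non-retryable failures.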
…sage

The send_message IPC tool accepts email_id/email_account (added in the
previous commit), but the agent only uses capabilities it's told about.
Update the three instruction surfaces so this becomes habit:

1. groups/main/CLAUDE.md Communication section: add a rule that every
   email-specific message must carry both fields; omit for batch/general
   chat.
2. groups/main/CLAUDE.md Email Intelligence step 7: when reporting
   results via send_message, pass email_id and email_account so the
   user gets Expand / Full Email / Archive buttons.
3. container/skills/capabilities/SKILL.md send_message bullet: same
   guidance for non-email-intelligence flows.
4. src/ipc.ts email-trigger prompt: add the instruction to the system-
   injected trigger prompt so the agent sees it even if the per-group
   CLAUDE.md drifts.

New test in email-trigger-pipeline.test.ts verifies the generated prompt
contains email_id, email_account, and the three button names.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Follow-ups to the previous round of Telegram UX fixes:

#3 Auto-infer email context. Even when the agent forgets to pass
   email_id/email_account, the host scrapes the outgoing text for
   (thread: <id>) and [account] markers and infers context. Instructions
   drift, code doesn't. New src/email-context-inference.ts, plugged into
   sendAgentMessage in index.ts. 9 unit tests cover single-thread,
   multi-thread (returns null), short-match false positives, blocklist
   words like [email]/[internal], and multi-account ambiguity.

#4 Unified retry dispatcher. confirm_archive's failure handler used to
   emit retry_archive:<id> and we had a dedicated case for it. The new
   retry:<action>:<entity>[:extra] dispatcher already covers this, so
   confirm_archive now emits retry:confirm_archive:<id>. retry_archive
   stays as a back-compat alias that re-dispatches through the unified
   path. Single code path for all retries.

#5 Person-name forward. FORWARD_PERSON_PATTERN matches "forward X to
   <2-3 capitalized words>?" (e.g. "Philip Ye") when the email-address
   pattern doesn't. Emits 📨 Forward to <Name> + ❌ No. On click, the
   host injects a reply telling the agent to resolve the name via
   search_contacts and forward — the container already has that tool.
   Single-word names stay ambiguous and fall through to generic Yes/No.

Also adds scripts/dev/smoke-telegram-callbacks.ts — manual live-bot
smoke test that posts real messages through the Bot API to verify
server-side keyboard rendering (HTML escaping, web_app URLs, edit-in-
place, long labels). Not wired into CI.

Host service was restarted to pick up these changes. Container image
rebuild is blocked by a pre-existing broken state in
container/agent-runner (missing @ai-sdk/mcp, ai, @ai-sdk/openai
packages); the new email_id argument to the container send_message
tool will activate when that build is fixed separately.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
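The host-side inference from item #3 can be sketched as two marker scans followed by an ambiguity check. Names, the minimum thread-id length, the id character set, and the choice to require both a thread and an account are all assumptions; only the `(thread: <id>)` and `[account]` markers and the `[email]`/`[internal]` blocklist come from the commit message.

```typescript
// Hypothetical result shape.
interface EmailContext {
  emailId: string;
  account: string;
}

// Blocklisted bracket words named in the commit message.
const ACCOUNT_BLOCKLIST = new Set(["email", "internal"]);
// Assumed minimum id length, to avoid short-match false positives.
const MIN_THREAD_ID_LENGTH = 8;

// Collects capture group 1 of every match; the pattern must carry the g flag.
function collectMatches(pattern: RegExp, text: string): string[] {
  const found: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    found.push(match[1]);
  }
  return found;
}

// Returns a context only when exactly one thread id and one account are present.
function inferEmailContext(text: string): EmailContext | null {
  const threadIds = new Set(
    collectMatches(/\(thread:\s*([A-Za-z0-9_-]+)\)/g, text).filter(
      (id) => id.length >= MIN_THREAD_ID_LENGTH,
    ),
  );
  const accounts = new Set(
    collectMatches(/\[([a-z][a-z0-9_-]*)\]/gi, text)
      .map((name) => name.toLowerCase())
      .filter((name) => !ACCOUNT_BLOCKLIST.has(name)),
  );
  if (threadIds.size !== 1 || accounts.size !== 1) return null;
  return { emailId: Array.from(threadIds)[0], account: Array.from(accounts)[0] };
}
```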
container/agent-runner/src/mcp-bridge.ts and vercel-runner.ts have been
importing @ai-sdk/mcp, @ai-sdk/mcp/mcp-stdio, ai, @ai-sdk/openai, and
@ai-sdk/google since they landed, but none of these packages were listed
in package.json. The container build has been failing ever since, and
src/llm/mcp-bridge.test.ts was failing with "Cannot find package
'@ai-sdk/mcp/mcp-stdio'" on every CI run.

Add the four missing packages at current latest-major versions:
- @ai-sdk/mcp ^1.0.36
- ai ^6.0.168
- @ai-sdk/openai ^3.0.53
- @ai-sdk/google ^3.0.64

npm install brings in 114 packages; the container typecheck and build
now succeed, and the host test suite goes from 1451/1455 to 1455/1455
passing. All 4 previously failing tests were rooted in the same missing
import.

This also activates the email_id / email_account args added to the
send_message MCP tool in the previous commit, because the container now
has the updated binary.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… telemetry

Three follow-ups to the Telegram callback UX work:

#3 Host-side contact lookup for forward_person. When the user taps
   📨 Forward to <Name> and the macOS Contacts DB has exactly one email
   for that name, inject the resolved address directly — skipping the
   agent's search_contacts round-trip. On miss / ambiguity / DB not
   available, fall back to delegating the lookup to the agent (which
   has the container-side search_contacts MCP tool). Read-only:
   copy-to-tmp pattern avoids WAL lock contention. 6 unit tests cover
   missing dir, empty query, dedup, ambiguity, non-email rows, and
   sqlite errors.

#4 Yes/No emoji consistency. Every other button in the system uses an
   emoji prefix (📧 Expand, 🌐 Full Email, 🗄 Archive, 📨 Forward,
   🔄 Retry, ❌ Cancel). The generic yes-no question pair was plain
   text "Yes" / "No" / "Let me think...". Now ✅ Yes / ❌ No /
   ⏳ Let me think…. Matches the visual vocabulary of the rest of the
   keyboard.

#5 Telemetry for email context resolution. sendAgentMessage now logs
   whether the Expand/Full Email/Archive buttons came from an
   agent-explicit email_id or from host-side inference. Lets us answer
   "are instructions sticking?" from the log without sampling message
   bodies.

Full test suite: 1462/1462 passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Eliminates the main source of container-side Gmail flakiness: every
agent turn was cold-starting the Gmail MCP via `npx -y` per account,
which times out under load or flaky network and leaves the SDK
running without Gmail tools.

- Dockerfile: bake @gongrzhe/server-gmail-autoauth-mcp and
  @notionhq/notion-mcp-server into the image so npx resolves from
  local cache instead of the npm registry.
- container-runner.ts: on exit-0 runs, grep stderr for [mcp-probe]
  and MCP/gmail lines and append them as "=== MCP Diagnostics ==="
  in the container log. Previously these were discarded unless the
  container exited non-zero, so "Gmail tools offline" complaints
  had no forensic trail.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
gavrielc pushed a commit that referenced this pull request Apr 24, 2026
Adds /add-gmail-tool — a Utility skill that installs Gmail as an MCP tool
in NanoClaw v2 using OneCLI for credential injection. No raw OAuth tokens
ever reach the container; the gateway swaps the "onecli-managed" stub
bearer for the real token at request time.

Scope (3 files):
- container/Dockerfile: pnpm global-install of
  @gongrzhe/server-gmail-autoauth-mcp@1.1.11, pinned behind GMAIL_MCP_VERSION.
  Also pins zod-to-json-schema@3.22.5 to avoid an ERR_PACKAGE_PATH_NOT_EXPORTED
  crash: the MCP server's loose zod range resolves zod@3.24.x while
  zod-to-json-schema@3.25.x imports the zod/v3 subpath that only exists in
  zod>=3.25.
- container/agent-runner/src/providers/claude.ts: adds 'mcp__gmail__*' to
  TOOL_ALLOWLIST so the agent can invoke the server's tools.
- .claude/skills/add-gmail-tool/SKILL.md: pre-flight checks (OneCLI Gmail app
  connected, stubs present, mount allowlist covers ~/.gmail-mcp, agent
  secret-mode), per-group wiring in container.json (mount + mcpServers),
  verification steps, troubleshooting, removal instructions. Credits to
  gongrzhe for the MCP server and the add-atomic-chat-tool / add-vercel
  skill patterns.

Addresses #1500 (proxy Gmail OAuth through credential proxy) on the Gmail
side. Overlaps in intent with #1810 but stays surgical — no bundled
unrelated changes.

Tested end-to-end on Linux/Docker: CLI and WhatsApp self-chat agents can
list labels, search/read/send mail via OneCLI-injected tokens.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@gavrielc
Collaborator

gavrielc commented May 3, 2026

Closing this one. It looks like a fair amount of personal install state (per-group CLAUDE.md files, fork-specific tooling, local config, etc.) got mixed in with the actual change you wanted to contribute, which makes it hard to review or merge cleanly.

If there is a focused change you would still like to land, please open a new PR with just that diff and we will take a look. Thanks for the interest.

@gavrielc gavrielc closed this May 3, 2026
nv-slang-bot Bot pushed a commit to slang-coworkers/nanoclaw that referenced this pull request May 12, 2026