## Problem

MCP tool schemas are loaded into context all at once at the start of every conversation, regardless of whether they'll be used. Each tool's full JSON schema (name, description, parameter types) must be present because tool calling is an API-level primitive — the model can only generate `tool_use` blocks for tools declared in the request's `tools` array.

This adds up fast. A typical setup with Gmail (31 tools), Google Drive (22 tools), and Playwright (21 tools) costs ~9.6k tokens — nearly 5% of context — on every turn, even when the user is just editing local files and will never touch email or a browser.
## Proposed Solution: Lazy Tool Group Loading

Instead of registering all MCP tools upfront, the CLI could:

- Start with lightweight "tool group" placeholders — e.g., a single meta-tool or annotation like `gmail_tools` (~50 tokens) that tells the model "Gmail tools are available if needed"
- When the model signals it needs a tool group, the CLI dynamically registers those MCP tools with the API
- On the next turn, the model can now generate real tool calls against the fully-registered schemas
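The loop above could be sketched as a small registry that serves cheap placeholders until a group is requested, then swaps in the full schemas for the next request. This is a hypothetical sketch: the schema shapes, token figures, and the `load_tool_group` meta-tool are illustrative assumptions, not an existing CLI or API surface.

```python
# Hypothetical sketch of lazy tool-group loading. The load_tool_group
# meta-tool and schema layout are assumptions for illustration only.

def placeholder(group: str) -> dict:
    """Cheap stub (~50 tokens) advertising that a group can be loaded."""
    return {
        "name": f"{group}_tool_group",
        "description": f"{group} tools exist; call load_tool_group('{group}') to enable them.",
        "input_schema": {"type": "object", "properties": {}},
    }

class LazyToolRegistry:
    def __init__(self, full_schemas: dict):
        self.full_schemas = full_schemas  # group name -> list of full MCP tool schemas
        self.loaded = set()               # groups whose tools are fully registered

    def tools_array(self) -> list:
        """Build the tools array for the next API request: full schemas for
        loaded groups, lightweight placeholders for everything else."""
        tools = [{
            "name": "load_tool_group",
            "description": "Register every tool in the named group for the next turn.",
            "input_schema": {
                "type": "object",
                "properties": {"group": {"type": "string"}},
                "required": ["group"],
            },
        }]
        for group, schemas in self.full_schemas.items():
            tools.extend(schemas if group in self.loaded else [placeholder(group)])
        return tools

    def handle_tool_call(self, name: str, args: dict) -> str:
        if name == "load_tool_group":
            self.loaded.add(args["group"])
            return f"{args['group']} tools registered; usable from the next turn."
        raise KeyError(f"unknown tool: {name}")
```

The CLI would rebuild the request's tools array from `tools_array()` on every turn, so a `load_tool_group` call on turn N makes the real schemas callable on turn N+1.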
## Potential UX Variants

- Explicit meta-tool: Model calls `load_tool_group("gmail")`, CLI registers the tools and re-sends
- Automatic detection: CLI detects from model output that it wants to use an unloaded group and transparently loads it
- Session-level config: User specifies which tool groups to preload vs. lazy-load in settings
## Why This Can't Be Done with Skills Today

Skills are prompt-level expansions — they inject text into the conversation. Even if a skill loaded the full schema text into the prompt, the model still couldn't produce valid `tool_use` blocks because the tools aren't in the API request's `tools` array. The lazy loading must happen at the CLI/runtime layer.
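The distinction can be made concrete with a toy check (a deliberate simplification, not the real API): a `tool_use` block is only valid when its name appears in the request's declared tools, regardless of what schema text the prompt contains.

```python
# Toy illustration, not the real API: tool_use validity depends only on the
# request's tools array. A skill pasting schema text into the prompt (as in
# the message below) does not make an undeclared tool callable.

def can_emit_tool_use(request: dict, tool_name: str) -> bool:
    """A tool_use block is valid only if the tool was declared in `tools`."""
    return any(t["name"] == tool_name for t in request.get("tools", []))

request = {
    "messages": [{"role": "user",
                  "content": "Check my email. (skill text describing gmail_search...)"}],
    "tools": [{"name": "Read"}, {"name": "Edit"}],  # gmail_search never declared
}

print(can_emit_tool_use(request, "Read"))          # True
print(can_emit_tool_use(request, "gmail_search"))  # False, despite the prompt text
```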
## Benefits
- Context savings: ~50-100 tokens per tool group placeholder vs. ~2-4k per fully-loaded group. For a setup with 3 MCP servers, this could save ~8-9k tokens per turn.
- Scales with MCP ecosystem growth: As users add more MCP servers, the current approach becomes increasingly costly. Lazy loading keeps context usage proportional to what's actually needed.
- No capability loss: Tools are still fully available — they just load on demand.
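The back-of-envelope arithmetic behind the savings claim can be checked directly (all figures are the rough estimates quoted in this proposal, not measurements):

```python
# Rough check of the token figures quoted above; all numbers are estimates
# from this proposal, not measurements.
full_cost = 9_600        # Gmail + Drive + Playwright fully loaded, every turn
placeholder_cost = 200   # three lightweight tool-group placeholders
savings = full_cost - placeholder_cost
assert 8_000 <= savings <= 9_600  # consistent with the "~8-9k per turn" claim
print(savings)  # 9400
```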
## Example

Current behavior (every turn):

    tools array: [gmail_list_accounts, gmail_get_profile, gmail_search, ... (31 Gmail tools),
                  drive_list_files, drive_search, ... (22 Drive tools),
                  browser_click, browser_snapshot, ... (21 Playwright tools)]
    → ~9.6k tokens consumed
Proposed behavior:

Turn 1 (user editing code):

    tools array: [Read, Write, Edit, Bash, Grep, Glob, ...,
                  gmail_tool_group (placeholder), gdrive_tool_group (placeholder), playwright_tool_group (placeholder)]
    → ~200 tokens for MCP placeholders

Turn 5 (user asks to check email):

    Model calls: load_tool_group("gmail")
    CLI registers Gmail tools

Turn 6:

    tools array: [..., gmail_list_accounts, gmail_search, ..., gdrive_tool_group (placeholder), playwright_tool_group (placeholder)]
    → ~3.4k tokens for Gmail + ~100 tokens for other placeholders