Feature: Lazy MCP tool group loading to reduce context usage #23508

@JoelCooperPhD

Description

Problem

MCP tool schemas are loaded into context all at once at the start of every conversation, regardless of whether they'll be used. Each tool's full JSON schema (name, description, parameter types) must be present because tool calling is an API-level primitive — the model can only generate tool_use blocks for tools declared in the request's tools array.

This adds up fast. A typical setup with Gmail (31 tools), Google Drive (22 tools), and Playwright (21 tools) costs ~9.6k tokens — nearly 5% of context — on every turn, even when the user is just editing local files and will never touch email or a browser.
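
For scale, the figures above are consistent with an average of roughly 130 tokens per tool schema measured against a 200k-token context window — both assumed reference numbers for this back-of-envelope check, not measurements from the CLI:

```python
# Back-of-envelope check of the figures above. The ~130 tokens/tool
# average and the 200k context window are assumptions, not measured.
tool_counts = {"gmail": 31, "gdrive": 22, "playwright": 21}
tokens_per_tool = 130                  # assumed average schema size
total_tokens = sum(tool_counts.values()) * tokens_per_tool
context_window = 200_000
share = total_tokens / context_window  # fraction of context consumed

print(total_tokens, round(share * 100, 1))  # → 9620 4.8
```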

Proposed Solution: Lazy Tool Group Loading

Instead of registering all MCP tools upfront, the CLI could:

  1. Start with lightweight "tool group" placeholders — e.g., a single meta-tool or annotation like gmail_tools (~50 tokens) that tells the model "Gmail tools are available if needed"
  2. When the model signals it needs a tool group, the CLI dynamically registers those MCP tools with the API
  3. On the next turn, the model can now generate real tool calls against the fully-registered schemas
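
The three steps above could be sketched as a small registry that swaps placeholders for full schemas on demand. Every name here (`ToolGroup`, `ToolRegistry`, `build_tools_array`) is hypothetical, not an existing CLI API:

```python
from dataclasses import dataclass

# Hypothetical sketch of steps 1-3; all names are illustrative.

@dataclass
class ToolGroup:
    name: str             # e.g. "gmail"
    schemas: list         # full JSON tool schemas for this group
    loaded: bool = False  # flipped when the model asks for the group

class ToolRegistry:
    def __init__(self, groups):
        self.groups = {g.name: g for g in groups}

    def load(self, name):
        # Step 2: the model signalled it needs this group, so its real
        # schemas will appear in the next request's tools array.
        self.groups[name].loaded = True

    def build_tools_array(self, builtin_tools):
        # Steps 1 and 3: loaded groups contribute full schemas; unloaded
        # groups contribute one cheap placeholder meta-tool each.
        tools = list(builtin_tools)
        for g in self.groups.values():
            if g.loaded:
                tools.extend(g.schemas)
            else:
                tools.append({
                    "name": f"{g.name}_tool_group",
                    "description": (
                        f"Placeholder. Call load_tool_group('{g.name}') "
                        "to make these tools available next turn."
                    ),
                    "input_schema": {"type": "object", "properties": {}},
                })
        return tools
```

The key property is that `build_tools_array` is re-run for every request, so a group loaded mid-session is reflected automatically on the next turn.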

Potential UX Variants

  • Explicit meta-tool: Model calls load_tool_group("gmail"), CLI registers the tools and re-sends
  • Automatic detection: CLI detects from the model's output that it is trying to use an unloaded group and transparently loads it
  • Session-level config: User specifies which tool groups to preload vs. lazy-load in settings
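
The explicit meta-tool variant is the simplest to sketch: the CLI intercepts `load_tool_group` calls before normal MCP dispatch. `load_tool_group` is the name proposed above; the surrounding plumbing is hypothetical:

```python
# Sketch of the "explicit meta-tool" variant; only the load_tool_group
# name comes from the proposal, everything else is illustrative.

loaded_groups = set()  # groups whose full schemas are in the tools array

def handle_tool_use(tool_use):
    """Intercept load_tool_group in the CLI before normal MCP dispatch."""
    if tool_use["name"] == "load_tool_group":
        group = tool_use["input"]["group"]
        loaded_groups.add(group)
        # The tool_result tells the model the real schemas will be
        # present on the next request, so it should retry then.
        return {
            "type": "tool_result",
            "content": f"{group} tools registered; call them on the next turn.",
        }
    return None  # a regular tool call: forward to the MCP server as usual
```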

Why This Can't Be Done with Skills Today

Skills are prompt-level expansions — they inject text into the conversation. Even if a skill loaded the full schema text into the prompt, the model still couldn't produce valid tool_use blocks because the tools aren't in the API request's tools array. The lazy loading must happen at the CLI/runtime layer.
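
Concretely, the constraint lives in the request body: a tool_use block is valid only for names declared in the request's tools array. A minimal sketch of the request shape (field values are illustrative placeholders):

```python
# Illustrative request shape only; the tools array is the API-level
# declaration the text above refers to.
request = {
    "messages": [{"role": "user", "content": "Find my latest invoices"}],
    "tools": [
        # gmail_search is callable only because it is declared here.
        # Pasting this same schema into the prompt as text (what a
        # skill could do) would not make it callable.
        {
            "name": "gmail_search",
            "description": "Search Gmail messages",
            "input_schema": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    ],
}
callable_names = {t["name"] for t in request["tools"]}
```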

Benefits

  • Context savings: ~50-100 tokens per tool group placeholder vs. ~2-4k per fully loaded group. For a setup with 3 MCP servers, this could save ~8-9k tokens per turn.
  • Scales with MCP ecosystem growth: As users add more MCP servers, the current approach becomes increasingly costly. Lazy loading keeps context usage proportional to what's actually needed.
  • No capability loss: Tools are still fully available — they just load on demand.

Example

Current behavior (every turn):

tools array: [gmail_list_accounts, gmail_get_profile, gmail_search, ... (31 Gmail tools),
              drive_list_files, drive_search, ... (22 Drive tools),
              browser_click, browser_snapshot, ... (21 Playwright tools)]
→ ~9.6k tokens consumed

Proposed behavior:

Turn 1 (user editing code):
  tools array: [Read, Write, Edit, Bash, Grep, Glob, ...,
                gmail_tool_group (placeholder), gdrive_tool_group (placeholder), playwright_tool_group (placeholder)]
  → ~200 tokens for MCP placeholders

Turn 5 (user asks to check email):
  Model calls: load_tool_group("gmail")
  CLI registers Gmail tools

Turn 6:
  tools array: [..., gmail_list_accounts, gmail_search, ..., gdrive_tool_group (placeholder), playwright_tool_group (placeholder)]
  → ~3.4k tokens for Gmail + ~100 tokens for other placeholders
