Feature: Lazy MCP tool group loading to reduce context usage #23508

@JoelCooperPhD

Description

Problem

MCP tool schemas are loaded into context all at once at the start of every conversation, regardless of whether they'll be used. Each tool's full JSON schema (name, description, parameter types) must be present because tool calling is an API-level primitive — the model can only generate tool_use blocks for tools declared in the request's tools array.

This adds up fast. A typical setup with Gmail (31 tools), Google Drive (22 tools), and Playwright (21 tools) costs ~9.6k tokens — nearly 5% of context — on every turn, even when the user is just editing local files and will never touch email or a browser.
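
For scale, the figures above are consistent with an average of roughly 130 tokens per tool schema measured against a 200k-token context window — both assumed reference numbers for this back-of-envelope check, not measurements from the CLI:

```python
# Back-of-envelope check of the figures above. The ~130 tokens/tool
# average and the 200k context window are assumptions, not measured.
tool_counts = {"gmail": 31, "gdrive": 22, "playwright": 21}
tokens_per_tool = 130                  # assumed average schema size
total_tokens = sum(tool_counts.values()) * tokens_per_tool
context_window = 200_000
share = total_tokens / context_window  # fraction of context consumed

print(total_tokens, round(share * 100, 1))  # → 9620 4.8
```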

Proposed Solution: Lazy Tool Group Loading

Instead of registering all MCP tools upfront, the CLI could:

  1. Start with lightweight "tool group" placeholders — e.g., a single meta-tool or annotation like gmail_tools (~50 tokens) that tells the model "Gmail tools are available if needed"
  2. When the model signals it needs a tool group, the CLI dynamically registers those MCP tools with the API
  3. On the next turn, the model can now generate real tool calls against the fully-registered schemas
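
The three steps above could be sketched as a small registry that swaps placeholders for full schemas on demand. Every name here (`ToolGroup`, `ToolRegistry`, `build_tools_array`) is hypothetical, not an existing CLI API:

```python
from dataclasses import dataclass

# Hypothetical sketch of steps 1-3; all names are illustrative.

@dataclass
class ToolGroup:
    name: str             # e.g. "gmail"
    schemas: list         # full JSON tool schemas for this group
    loaded: bool = False  # flipped when the model asks for the group

class ToolRegistry:
    def __init__(self, groups):
        self.groups = {g.name: g for g in groups}

    def load(self, name):
        # Step 2: the model signalled it needs this group, so its real
        # schemas will appear in the next request's tools array.
        self.groups[name].loaded = True

    def build_tools_array(self, builtin_tools):
        # Steps 1 and 3: loaded groups contribute full schemas; unloaded
        # groups contribute one cheap placeholder meta-tool each.
        tools = list(builtin_tools)
        for g in self.groups.values():
            if g.loaded:
                tools.extend(g.schemas)
            else:
                tools.append({
                    "name": f"{g.name}_tool_group",
                    "description": (
                        f"Placeholder. Call load_tool_group('{g.name}') "
                        "to make these tools available next turn."
                    ),
                    "input_schema": {"type": "object", "properties": {}},
                })
        return tools
```

The key property is that `build_tools_array` is re-run for every request, so a group loaded mid-session is reflected automatically on the next turn.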

Potential UX Variants

  • Explicit meta-tool: Model calls load_tool_group("gmail"), CLI registers the tools and re-sends
  • Automatic detection: CLI detects from the model's output that it is trying to use an unloaded group and transparently loads it
  • Session-level config: User specifies which tool groups to preload vs. lazy-load in settings
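
The explicit meta-tool variant is the simplest to sketch: the CLI intercepts `load_tool_group` calls before normal MCP dispatch. `load_tool_group` is the name proposed above; the surrounding plumbing is hypothetical:

```python
# Sketch of the "explicit meta-tool" variant; only the load_tool_group
# name comes from the proposal, everything else is illustrative.

loaded_groups = set()  # groups whose full schemas are in the tools array

def handle_tool_use(tool_use):
    """Intercept load_tool_group in the CLI before normal MCP dispatch."""
    if tool_use["name"] == "load_tool_group":
        group = tool_use["input"]["group"]
        loaded_groups.add(group)
        # The tool_result tells the model the real schemas will be
        # present on the next request, so it should retry then.
        return {
            "type": "tool_result",
            "content": f"{group} tools registered; call them on the next turn.",
        }
    return None  # a regular tool call: forward to the MCP server as usual
```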

Why This Can't Be Done with Skills Today

Skills are prompt-level expansions — they inject text into the conversation. Even if a skill loaded the full schema text into the prompt, the model still couldn't produce valid tool_use blocks because the tools aren't in the API request's tools array. The lazy loading must happen at the CLI/runtime layer.
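
Concretely, the constraint lives in the request body: a tool_use block is valid only for names declared in the request's tools array. A minimal sketch of the request shape (field values are illustrative placeholders):

```python
# Illustrative request shape only; the tools array is the API-level
# declaration the text above refers to.
request = {
    "messages": [{"role": "user", "content": "Find my latest invoices"}],
    "tools": [
        # gmail_search is callable only because it is declared here.
        # Pasting this same schema into the prompt as text (what a
        # skill could do) would not make it callable.
        {
            "name": "gmail_search",
            "description": "Search Gmail messages",
            "input_schema": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    ],
}
callable_names = {t["name"] for t in request["tools"]}
```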

Benefits

  • Context savings: ~50-100 tokens per tool group placeholder vs. ~2-4k per fully loaded group. For a setup with 3 MCP servers, this could save ~8-9k tokens per turn.
  • Scales with MCP ecosystem growth: As users add more MCP servers, the current approach becomes increasingly costly. Lazy loading keeps context usage proportional to what's actually needed.
  • No capability loss: Tools are still fully available — they just load on demand.

Example

Current behavior (every turn):

tools array: [gmail_list_accounts, gmail_get_profile, gmail_search, ... (31 Gmail tools),
              drive_list_files, drive_search, ... (22 Drive tools),
              browser_click, browser_snapshot, ... (21 Playwright tools)]
→ ~9.6k tokens consumed

Proposed behavior:

Turn 1 (user editing code):
  tools array: [Read, Write, Edit, Bash, Grep, Glob, ...,
                gmail_tool_group (placeholder), gdrive_tool_group (placeholder), playwright_tool_group (placeholder)]
  → ~200 tokens for MCP placeholders

Turn 5 (user asks to check email):
  Model calls: load_tool_group("gmail")
  CLI registers Gmail tools

Turn 6:
  tools array: [..., gmail_list_accounts, gmail_search, ..., gdrive_tool_group (placeholder), playwright_tool_group (placeholder)]
  → ~3.4k tokens for Gmail + ~100 tokens for other placeholders
