What would you like to be added?
Feature Request: Microcompact Context Compression Strategy
Problem
Qwen Code's `chatCompressionService.ts` has a single compression strategy: summarize the conversation via an LLM API call when context usage exceeds 70% of the window, keeping the last 30%. This means every compression event incurs the latency and cost of an additional LLM call, even when much of the bloat comes from large, stale tool results that could be cheaply trimmed without any LLM involvement.
Proposed Solution
Add a microcompact pre-pass that runs before the LLM summarization step:
- Scan the chat history for old tool results from tools known to produce large outputs (`read_file`, `run_shell_command`, `grep_search`, `glob`, etc.)
- Skip the N most recent tool results (default: 5) to preserve the active working context
- Replace older, large tool results (> 500 chars) with a short "cleared" message
- If microcompact alone brings context usage below the threshold, skip the LLM summarization entirely
This is a zero-LLM-call compression strategy that reduces API costs and latency. The LLM summarization remains as a Phase 2 fallback when microcompact alone isn't sufficient.
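The pre-pass described above could look roughly like the sketch below. All names here (`ToolResult`, `microcompact`, the constants) are illustrative stand-ins, not the actual qwen-code API:

```typescript
// Hypothetical shape of one tool-result entry in the chat history.
interface ToolResult {
  toolName: string;
  output: string;
}

// Tools known to produce large outputs (from the proposal above).
const LARGE_OUTPUT_TOOLS = new Set([
  'read_file',
  'run_shell_command',
  'grep_search',
  'glob',
]);

const KEEP_RECENT = 5; // preserve the N most recent tool results
const SIZE_THRESHOLD = 500; // only clear results larger than this (chars)
const CLEARED_MESSAGE = '[tool result cleared by microcompact]';

/**
 * Replace old, large tool results with a short placeholder.
 * Pure and synchronous: no LLM call, no mutation of the input.
 */
function microcompact(history: ToolResult[]): ToolResult[] {
  const cutoff = Math.max(0, history.length - KEEP_RECENT);
  return history.map((entry, i) => {
    const isOld = i < cutoff;
    const isLargeTool = LARGE_OUTPUT_TOOLS.has(entry.toolName);
    const isLarge = entry.output.length > SIZE_THRESHOLD;
    return isOld && isLargeTool && isLarge
      ? { ...entry, output: CLEARED_MESSAGE }
      : entry;
  });
}
```

Because the pass is a pure function over the history, it is cheap to run on every compression check and trivial to unit-test in isolation.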
Scope
- New module: `packages/core/src/services/microcompact.ts`
- Changes to `packages/core/src/services/chatCompressionService.ts` (add Phase 1 microcompact before Phase 2 LLM summarization)
- New `MICROCOMPACTED` status in the `CompressionStatus` enum (`turn.ts`)
- Handle the `MICROCOMPACTED` status in `client.ts` and `CompressionMessage.tsx`
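The two-phase control flow the scope implies could be sketched as follows. This is an assumption-laden outline, not the real `chatCompressionService.ts`: the status values, helper functions, and history type are all hypothetical placeholders:

```typescript
type Entry = { role: string; text: string };
type CompressResult = {
  status: 'NOOP' | 'MICROCOMPACTED' | 'COMPRESSED'; // hypothetical statuses
  history: Entry[];
};

const THRESHOLD = 0.7; // compress when context usage exceeds 70% of the window

// Stub for the Phase 1 pre-pass: clear stale entries larger than 500 chars,
// keeping the 5 most recent (no LLM call involved).
function microcompact(history: Entry[]): Entry[] {
  const cutoff = Math.max(0, history.length - 5);
  return history.map((e, i) =>
    i < cutoff && e.text.length > 500 ? { ...e, text: '[cleared]' } : e,
  );
}

// Stub: fraction of the context window (in chars) the history occupies.
function usageRatio(history: Entry[], windowChars: number): number {
  const used = history.reduce((n, e) => n + e.text.length, 0);
  return used / windowChars;
}

// Stub standing in for the existing LLM summarization (Phase 2).
async function llmSummarize(history: Entry[]): Promise<Entry[]> {
  return [{ role: 'system', text: '[summary of earlier conversation]' }];
}

async function compress(
  history: Entry[],
  windowChars: number,
): Promise<CompressResult> {
  if (usageRatio(history, windowChars) <= THRESHOLD) {
    return { status: 'NOOP', history };
  }
  // Phase 1: cheap trim of stale tool results, zero LLM calls.
  const trimmed = microcompact(history);
  if (usageRatio(trimmed, windowChars) <= THRESHOLD) {
    return { status: 'MICROCOMPACTED', history: trimmed };
  }
  // Phase 2: fall back to LLM summarization of the already-trimmed history.
  return { status: 'COMPRESSED', history: await llmSummarize(trimmed) };
}
```

Note that when Phase 2 does run, it operates on the already-trimmed history, so the summarization prompt itself is smaller and cheaper than under the current single-strategy flow.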
Impact
High (reduces API costs for compression). Medium effort.
Why is this needed?
Qwen Code's `chatCompressionService.ts` has a single compression strategy: summarize the conversation via an LLM API call when context usage exceeds 70% of the window, keeping the last 30%. This means every compression event incurs the latency and cost of an additional LLM call, even when much of the bloat comes from large, stale tool results that could be cheaply trimmed without any LLM involvement.
Additional context
No response