
feat(context): implement observation masking for tool outputs #18389

Merged

abhipatel12 merged 4 commits into main from abhi/masking-prototype on Feb 6, 2026

Conversation

@abhipatel12
Collaborator

Summary

Implement observation masking for tool outputs to prevent context window overflow and mitigate "Context Rot" in long-running agentic loops.

Details

This PR introduces a robust context management system called Observation Masking. Key features include:

  • ObservationMaskingService: Implements a "Hybrid Backward Scanned FIFO" algorithm to identify bulky tool outputs that are safely outside the high-fidelity reasoning window.
  • Thresholds: Defaults to a 50,000 token protection buffer and a 30,000 token hysteresis threshold to minimize cache-invalidating masking events.
  • Offloading: Bulky tool outputs are offloaded to a local artifact store (observations/) and replaced with a structured XML metadata snippet.
  • Metadata Retention: Masked snippets retain high-signal information like exit codes, error messages, and head/tail previews, ensuring the agent maintains a coherent causal chain.
  • User Configuration: Added experimental settings to settingsSchema.ts for fine-tuning thresholds.
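The backward scan described above can be pictured with a small sketch. This is illustrative only: the `Observation` shape, the `selectForMasking` name, and the constants mirror the defaults described in this PR, but none of it is the actual `ObservationMaskingService` API.

```typescript
// Simplified sketch of a "Hybrid Backward Scanned FIFO" over tool outputs.
// All names and shapes here are illustrative, not the real service API.

interface Observation {
  tokens: number;
  masked: boolean;
}

const PROTECTION_BUFFER = 50_000; // most recent tokens are never masked
const HYSTERESIS = 30_000;        // extra slack freed once masking triggers

function selectForMasking(
  history: Observation[],
  totalTokens: number,
  limit: number,
): Observation[] {
  if (totalTokens <= limit) return []; // under budget: do nothing

  // Walk backward from the newest observation, accumulating tokens until
  // we pass the protection buffer; everything older is a masking candidate.
  let recent = 0;
  let i = history.length - 1;
  for (; i >= 0; i--) {
    recent += history[i].tokens;
    if (recent > PROTECTION_BUFFER) break;
  }

  // Mask oldest-first (FIFO) until we are under the limit minus the
  // hysteresis band, so small follow-up turns don't re-trigger masking.
  const target = limit - HYSTERESIS;
  const selected: Observation[] = [];
  let remaining = totalTokens;
  for (let j = 0; j < i && remaining > target; j++) {
    const obs = history[j];
    if (obs.masked) continue;
    selected.push(obs);
    remaining -= obs.tokens;
  }
  return selected;
}
```

The hysteresis band is what keeps masking events rare: once a mask pass runs, the context must grow by roughly another 30k tokens before the next pass fires, which limits cache invalidation.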

Related Issues

Fixes #16955

How to Validate

Automated Tests

npm test packages/core/src/services/observationMaskingService.test.ts

Manual Verification

  1. Enable the feature in .gemini/settings.json:
    {
      "experimental": {
        "observationMasking": {
          "enabled": true
        }
      }
    }
  2. Run a command with massive output (e.g., git log on a large repo).
  3. Continue the conversation and verify the old output is masked in subsequent turns.
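When verifying step 3, the old output in the transcript should be replaced by a structured placeholder. The sketch below shows only the general shape; the tag and field names here are hypothetical, and the real format is produced by `formatMaskedSnippet` in `observationMaskingService.ts`, which carries additional metadata such as exit codes and token counts.

```typescript
// Illustrative sketch of a masked-observation placeholder. Tag names are
// hypothetical; see formatMaskedSnippet for the actual format.
function sketchMaskedSnippet(
  toolName: string,
  artifactPath: string,
  preview: string,
): string {
  return [
    '<observation_masked>',
    `  <tool>${toolName}</tool>`,
    `  <artifact>${artifactPath}</artifact>`,
    `  <preview>${preview}</preview>`,
    '</observation_masked>',
  ].join('\n');
}
```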

Pre-Merge Checklist

  • Updated relevant documentation and README (if needed)
  • Added/updated tests (if needed)
  • Noted breaking changes (if any)
  • Validated on required platforms/methods:
    • macOS
      • npm run

@abhipatel12 abhipatel12 requested a review from a team as a code owner February 5, 2026 18:32
@gemini-code-assist
Contributor

Summary of Changes

Hello @abhipatel12, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant context management feature called "Observation Masking" to prevent context window overflow and mitigate "Context Rot" in long-running agentic loops. By intelligently identifying and masking large tool outputs, the system ensures that the agent's context remains efficient and focused on recent, high-fidelity information, while still providing access to full historical data on demand.

Highlights

  • Observation Masking Implementation: Introduces a new ObservationMaskingService to manage context window overflow by intelligently masking bulky tool outputs.
  • Hybrid Backward Scanned FIFO Algorithm: Employs a sophisticated algorithm to identify and mask tool outputs that are safely outside the high-fidelity reasoning window, prioritizing recent context.
  • Configurable Thresholds: Adds experimental settings for toolProtectionThreshold (default 50,000 tokens) and hysteresisThreshold (default 30,000 tokens) to fine-tune masking behavior and minimize cache-invalidating events.
  • Tool Output Offloading: Bulky tool outputs are offloaded to a local artifact store (observations/) and replaced with a structured XML metadata snippet in the agent's context.
  • High-Signal Metadata Retention: Masked snippets retain crucial information like exit codes, error messages, and head/tail previews, ensuring the agent maintains a coherent causal chain without the full output.
  • Telemetry Integration: Adds telemetry events and keys to log observation masking activities, providing insights into token savings and masking frequency.
Changelog
  • packages/cli/src/config/config.ts
    • Added observationMasking configuration to the CLI's loaded settings.
  • packages/cli/src/config/settingsSchema.ts
    • Introduced new experimental settings for observationMasking, including enabled, toolProtectionThreshold, hysteresisThreshold, and protectLatestTurn, allowing user customization.
  • packages/core/src/config/config.ts
    • Defined the ObservationMaskingConfig interface and integrated observation masking parameters into the core Config class, including default values and getter methods.
  • packages/core/src/core/client.test.ts
    • Updated client tests to mock the new getObservationMaskingEnabled configuration.
  • packages/core/src/core/client.ts
    • Integrated the ObservationMaskingService into the GeminiClient, ensuring that observation masking is applied before token estimation.
  • packages/core/src/services/snapshots/observationMaskingService.test.ts.snap
    • Added a snapshot test for the expected format of a masked tool output.
  • packages/core/src/services/chatRecordingService.ts
    • Refined the type definition for message.type in recordMessage.
  • packages/core/src/services/observationMaskingService.test.ts
    • Added comprehensive unit tests for the new ObservationMaskingService, covering various scenarios like protection of latest turns, global aggregation, tool-aware previews, and handling of already masked content.
  • packages/core/src/services/observationMaskingService.ts
    • Implemented the core logic for ObservationMaskingService, including the mask method, the Hybrid Backward Scanned FIFO algorithm, offloading to local storage, and formatting of masked snippets with previews.
  • packages/core/src/telemetry/clearcut-logger/clearcut-logger.ts
    • Extended the Clearcut logger to include ObservationMaskingEvent for tracking masking activities.
  • packages/core/src/telemetry/clearcut-logger/event-metadata-key.ts
    • Added new telemetry event metadata keys specifically for observation masking.
  • packages/core/src/telemetry/loggers.ts
    • Introduced a new logObservationMasking function to record telemetry for observation masking events.
  • packages/core/src/telemetry/types.ts
    • Defined the ObservationMaskingEvent class for structured telemetry data related to observation masking.
  • schemas/settings.schema.json
    • Updated the main settings JSON schema to reflect the new observationMasking experimental configuration options.
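Pulling the schema changes together, a fully fleshed-out experimental block might look like the following. The option names come from the `settingsSchema.ts` changes listed above; the numeric values are just the documented defaults, and the `protectLatestTurn` value shown is a guess rather than a confirmed default.

```json
{
  "experimental": {
    "observationMasking": {
      "enabled": true,
      "toolProtectionThreshold": 50000,
      "hysteresisThreshold": 30000,
      "protectLatestTurn": true
    }
  }
}
```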
Ignored Files
  • Ignored by pattern: .gemini/** (1)
    • .gemini/settings.json
Activity
  • The pull request introduces a new feature, "Observation Masking," to manage context window overflow.
  • Automated tests for packages/core/src/services/observationMaskingService.test.ts have been added and are expected to pass.
  • Manual verification steps are provided, including enabling the feature in .gemini/settings.json and running commands with massive output to observe masking.
  • The author has validated the changes on macOS using npm run.

@github-actions

github-actions bot commented Feb 5, 2026

Size Change: +14.7 kB (+0.06%)

Total Size: 23.8 MB

Filename Size Change
./bundle/gemini.js 23.7 MB +14.7 kB (+0.06%)
Unchanged files:
Filename Size
./bundle/sandbox-macos-permissive-closed.sb 1.03 kB
./bundle/sandbox-macos-permissive-open.sb 890 B
./bundle/sandbox-macos-permissive-proxied.sb 1.31 kB
./bundle/sandbox-macos-restrictive-closed.sb 3.29 kB
./bundle/sandbox-macos-restrictive-open.sb 3.36 kB
./bundle/sandbox-macos-restrictive-proxied.sb 3.56 kB

compressed-size-action

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a robust observation masking feature to manage context window efficiency by offloading large tool outputs. The implementation is well-structured, with a new ObservationMaskingService, corresponding configurations, and thorough tests, and the integration into the GeminiClient is logical. However, the ObservationMaskingService has two high-severity security vulnerabilities: a Path Traversal flaw in how offloaded tool output files are named, and a Prompt Injection vulnerability in how masked snippets are formatted for the LLM. Both issues stem from the use of untrusted data from tool outputs and model responses without proper sanitization or escaping.
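For context on the path-traversal point raised above: a common mitigation is to sanitize the untrusted name and then verify the resolved path still sits inside the artifact directory. This is a generic sketch of that pattern, not the fix that actually landed in this PR; `safeArtifactPath` is a hypothetical helper.

```typescript
import * as path from 'node:path';

// Sketch: derive a safe artifact filename from an untrusted tool/call name
// so a value like "../../etc/passwd" cannot escape the observations/ dir.
// Illustrative of the reviewed concern, not the PR's actual fix.
function safeArtifactPath(baseDir: string, untrustedName: string): string {
  // Strip path separators and anything else outside a safe character set.
  const sanitized = untrustedName.replace(/[^a-zA-Z0-9._-]/g, '_');
  const resolved = path.resolve(baseDir, sanitized);
  // Defense in depth: reject any result that escapes the base directory.
  if (!resolved.startsWith(path.resolve(baseDir) + path.sep)) {
    throw new Error('artifact path escapes base directory');
  }
  return resolved;
}
```

The prompt-injection side is separate: because the masked snippet interpolates tool output back into the model's context, preview text generally needs escaping so it cannot impersonate the snippet's own structural markers.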

@gemini-cli gemini-cli bot added labels: area/agent (Issues related to Core Agent, Tools, Memory, Sub-Agents, Hooks, Agent Quality), 🔒 maintainer only ⛔ (Do not contribute. Internal roadmap item.) on Feb 5, 2026
  private formatMaskedSnippet(params: MaskedSnippetParams): string {
    const { toolName, filePath, fileSizeMB, totalLines, tokens, preview } =
      params;
    return `[Observation Masked]
Collaborator


Hmm, if the model sees this there's a lot of noise. I'd imagine it doesn't need all of this information, and it may distract it.

Collaborator Author


Totally fair. I flattened the return a little bit.

When I didn't have the preview, there was a drop in quality, so that may be helpful to keep. Tool name and path also seem fine.

Should I remove total tokens, line count, and file size? They were there to give the model a reference if it went back to read the file, like "oh, that file is 40k tokens, I should use grep or ranged reads."

But that can be a follow-up if we actually see the model go and check these files later.

@abhipatel12 abhipatel12 force-pushed the abhi/masking-prototype branch from 541a0bd to 2f61bee on February 5, 2026 23:45
@abhipatel12 abhipatel12 left a comment

Thanks for the review. LMK what you think!


@abhipatel12 abhipatel12 force-pushed the abhi/masking-prototype branch from 2f61bee to 00b7319 on February 6, 2026 00:36
@abhipatel12 abhipatel12 enabled auto-merge February 6, 2026 01:42
@abhipatel12 abhipatel12 added this pull request to the merge queue Feb 6, 2026
Merged via the queue into main with commit 8ec176e Feb 6, 2026
26 checks passed
@abhipatel12 abhipatel12 deleted the abhi/masking-prototype branch February 6, 2026 02:04
aswinashok44 pushed a commit to aswinashok44/gemini-cli that referenced this pull request Feb 9, 2026

Labels

area/agent (Issues related to Core Agent, Tools, Memory, Sub-Agents, Hooks, Agent Quality), 🔒 maintainer only ⛔ (Do not contribute. Internal roadmap item.)


Development

Successfully merging this pull request may close these issues.

[Context Mgmt] Rolling Tool Output Pruning

2 participants