
fix(core): reduce LLM-based loop detection false positives#20701

Merged
SandyTao520 merged 3 commits into main from st/fix/llm-loop-detection-false-positives on Mar 2, 2026
Conversation

@SandyTao520
Contributor

Summary

Reduces false positives in LLM-based loop detection by providing task context, tightening the system prompt, and tuning check frequency thresholds.

Details

The LLM-based loop detector was flagging legitimate batch operations (e.g., cross-file refactoring, adding headers to multiple files) as unproductive loops. Root causes:

  1. No task context: The loop-check LLM received only recent turns with no knowledge of the original user request, making it impossible to distinguish intentional batch work from actual loops.
  2. Vague system prompt: Subjective language like "a decent number of times" biased toward over-detection. The incremental-progress exception only covered same-file edits, not cross-file operations. The prompt also overlapped with the heuristic layer by describing identical-tool-call scenarios that are already caught before the LLM check runs.
  3. Aggressive check timing: The first check fired at turn 30 with a check interval of 3 turns, which was too early and too frequent for complex tasks.
  4. No argument differentiation: The prompt didn't instruct the LLM to compare tool arguments, so calls to the same tool on different files looked like repetition.
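The argument-differentiation point can be sketched as a small comparison helper: two calls to the same tool should only count as repetition when their arguments also match. This is an illustrative stand-in, not the actual gemini-cli code; the function name and argument shape are hypothetical.

```typescript
// Hypothetical helper illustrating "argument analysis": a call is only a
// repeat of another if both the tool name AND its arguments (file paths,
// line numbers, content) are identical.
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

function isIdenticalCall(a: ToolCall, b: ToolCall): boolean {
  // JSON comparison is a simplification; real argument comparison may
  // need order-insensitive or semantic checks.
  return a.name === b.name && JSON.stringify(a.args) === JSON.stringify(b.args);
}
```

Under this rule, `replace` on `a.ts` followed by `replace` on `b.ts` is batch work, not a loop.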

Changes:

  • User prompt context: Pass the original user request to the loop-detection LLM as the first content message so it can evaluate repetition in context of the task goal. Updated reset() signature and callsite in client.ts.
  • System prompt rewrite: Removed overlap with heuristic layer (identical tool calls already caught at layer 1). Clarified "consecutive assistant turns" → "consecutive model actions". Added explicit cross-file/batch exceptions and mandatory argument analysis (file paths, line numbers, content). Removed assumptions about function response visibility in trimmed history.
  • Threshold tuning: LLM_CHECK_AFTER_TURNS 30→40, DEFAULT_LLM_CHECK_INTERVAL 3→10, MIN_LLM_CHECK_INTERVAL 5→7.
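The user-prompt context change above can be sketched as follows. This is a simplified illustration, not the actual loopDetectionService implementation; `Content` and `buildLoopCheckContents` are stand-in names. The `<original_user_request>` XML delimiters mirror the approach described in the commit history.

```typescript
// Minimal sketch: prepend the original user request to the loop-check
// contents so the diagnostic model can judge repetition against the
// task goal. The XML tags delineate user-provided content from the
// loop-detection instructions.
interface Content {
  role: 'user' | 'model';
  parts: { text: string }[];
}

function buildLoopCheckContents(
  recentHistory: Content[],
  userPrompt?: string,
): Content[] {
  if (!userPrompt) {
    // No task context available: fall back to history-only behavior.
    return recentHistory;
  }
  const contextMessage: Content = {
    role: 'user',
    parts: [
      { text: `<original_user_request>\n${userPrompt}\n</original_user_request>` },
    ],
  };
  return [contextMessage, ...recentHistory];
}
```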

Related Issues

Related to #18551

How to Validate

  1. Run the loop detection tests:
    npm test -w @google/gemini-cli-core -- src/services/loopDetectionService.test.ts
  2. Verify all 51 tests pass, including two new tests for user prompt inclusion/exclusion in LLM check contents.
  3. Run client tests to confirm no regressions:
    npm test -w @google/gemini-cli-core -- src/core/client.test.ts

Pre-Merge Checklist

  • Updated relevant documentation and README (if needed)
  • Added/updated tests (if needed)
  • Noted breaking changes (if any)
  • Validated on required platforms/methods:
    • MacOS
      • npm run
      • npx
      • Docker
      • Podman
      • Seatbelt
    • Windows
      • npm run
      • npx
      • Docker
    • Linux
      • npm run
      • npx
      • Docker

@SandyTao520 requested a review from a team as a code owner February 28, 2026 19:04

gemini-cli bot commented Feb 28, 2026

Hi @SandyTao520, thank you so much for your contribution to Gemini CLI! We really appreciate the time and effort you've put into this.

We're making some updates to our contribution process to improve how we track and review changes. Please take a moment to review our recent discussion post: Improving Our Contribution Process & Introducing New Guidelines.

Key Update: Starting January 26, 2026, the Gemini CLI project will require all pull requests to be associated with an existing issue. Any pull requests not linked to an issue by that date will be automatically closed.

Thank you for your understanding and for being a part of our community!

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the accuracy of the LLM-based loop detection mechanism by addressing several root causes of false positives. By providing the LLM with the original user's task context, refining the system prompt with clearer definitions and argument analysis instructions, and tuning the check frequency, the system can now more effectively differentiate between legitimate, productive batch operations and actual unproductive loops, leading to a more robust and less intrusive user experience.

Highlights

  • Task Context for Loop Detection: The original user request is now passed to the LLM-based loop detector, allowing it to evaluate repetition within the context of the overall task goal and distinguish legitimate batch operations from unproductive loops.
  • Refined System Prompt: The system prompt for the loop detection LLM has been significantly rewritten to clarify what constitutes an unproductive state, explicitly define what is NOT a loop (e.g., cross-file batch operations, incremental edits), and emphasize the critical importance of argument analysis (file paths, line numbers, content) when evaluating tool calls.
  • Tuned Check Frequency Thresholds: The timing for LLM-based loop checks has been adjusted to be less aggressive. The initial check is now performed after 40 turns (previously 30), the default interval between checks is increased to 10 turns (previously 3), and the minimum interval is set to 7 turns (previously 5).
Changelog
  • packages/core/src/core/client.ts
    • Imported the partListUnionToString utility.
    • Updated the loopDetector.reset method call to include the string representation of the current request, providing the original user prompt context to the loop detection service.
  • packages/core/src/services/loopDetectionService.test.ts
    • Updated test assertions to reflect the new LLM_CHECK_AFTER_TURNS value of 40, adjusting expected turn counts for initial checks.
    • Modified interval calculation comments and expected turn numbers in tests to align with the new MIN_LLM_CHECK_INTERVAL and DEFAULT_LLM_CHECK_INTERVAL values.
    • Added two new tests to verify that the user prompt is correctly included in the LLM check contents when provided, and excluded when not provided.
  • packages/core/src/services/loopDetectionService.ts
    • Updated constants: LLM_CHECK_AFTER_TURNS from 30 to 40, DEFAULT_LLM_CHECK_INTERVAL from 3 to 10, and MIN_LLM_CHECK_INTERVAL from 5 to 7.
    • Completely rewrote the LOOP_DETECTION_SYSTEM_PROMPT to provide more precise guidelines for identifying unproductive states, explicitly listing what is not a loop, and emphasizing argument analysis.
    • Added a new private member userPrompt to store the original user request.
    • Modified the _checkLlmForLoop method to conditionally prepend the userPrompt to the LLM's input contents if it is available.
    • Updated the reset method signature to accept an optional userPrompt parameter and store it.
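The tuned check-frequency gating can be sketched as below. The constant values (40, 10, 7) come from this PR; the gating function itself is a simplified stand-in for loopDetectionService's real logic, and its name and signature are illustrative.

```typescript
// Sketch of when an LLM-based loop check should fire under the new
// thresholds. Values updated by this PR are noted in the comments.
const LLM_CHECK_AFTER_TURNS = 40; // first LLM check (was 30)
const DEFAULT_LLM_CHECK_INTERVAL = 10; // turns between checks (was 3)
const MIN_LLM_CHECK_INTERVAL = 7; // floor for any adaptive interval (was 5)

function shouldRunLlmCheck(
  turnCount: number,
  lastCheckTurn: number,
  interval: number = DEFAULT_LLM_CHECK_INTERVAL,
): boolean {
  // Never check before the initial threshold is reached.
  if (turnCount < LLM_CHECK_AFTER_TURNS) return false;
  // Clamp the interval so adaptive tuning can't over-check.
  const effectiveInterval = Math.max(interval, MIN_LLM_CHECK_INTERVAL);
  return turnCount - lastCheckTurn >= effectiveInterval;
}
```

With these values, a complex task gets 40 turns of runway before the first check, and at most one check every 7 turns thereafter.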

@gemini-code-assist bot left a comment

Code Review

This pull request effectively addresses the issue of false positives in the LLM-based loop detection by providing more context to the diagnostic model and refining its behavior. The changes, including passing the original user prompt, rewriting the system prompt for clarity, and tuning the check frequency, are well-reasoned and directly target the problems described. The updated system prompt is a particularly great improvement, offering much clearer guidance to the model. The identified high-severity security concern related to passing raw user input to the diagnostic LLM, which could open a vector for prompt injection, remains a valid point of feedback.

github-actions bot commented Feb 28, 2026

Size Change: +2.87 kB (+0.01%)

Total Size: 25.8 MB

Filename | Size | Change
./bundle/gemini.js | 25.3 MB | +2.87 kB (+0.01%)

Unchanged files:

Filename | Size
./bundle/node_modules/@google/gemini-cli-devtools/dist/client/main.js | 221 kB
./bundle/node_modules/@google/gemini-cli-devtools/dist/src/_client-assets.js | 227 kB
./bundle/node_modules/@google/gemini-cli-devtools/dist/src/index.js | 11.5 kB
./bundle/node_modules/@google/gemini-cli-devtools/dist/src/types.js | 132 B
./bundle/sandbox-macos-permissive-open.sb | 890 B
./bundle/sandbox-macos-permissive-proxied.sb | 1.31 kB
./bundle/sandbox-macos-restrictive-open.sb | 3.36 kB
./bundle/sandbox-macos-restrictive-proxied.sb | 3.56 kB
./bundle/sandbox-macos-strict-open.sb | 4.82 kB
./bundle/sandbox-macos-strict-proxied.sb | 5.02 kB
./bundle/sandbox-macos-strict-open.sb | 4.82 kB

@gemini-cli bot added the area/agent (Issues related to Core Agent, Tools, Memory, Sub-Agents, Hooks, Agent Quality) and 🔒 maintainer only ⛔ (Do not contribute. Internal roadmap item.) labels Feb 28, 2026
- Include original user prompt in loop-check context so the LLM can
  distinguish batch work from actual loops
- Rewrite system prompt with concrete thresholds, explicit cross-file
  batch exceptions, and mandatory argument differentiation
- Raise LLM_CHECK_AFTER_TURNS (30→40) and DEFAULT_LLM_CHECK_INTERVAL
  (3→10) to give complex tasks more runway before checking
- Increase MIN_LLM_CHECK_INTERVAL (5→7) to reduce over-checking
…g and build patterns

Address PR feedback: rename pattern categories to emphasize identical
outcomes are required, carve out build re-runs and debugging progress
as legitimate workflow, and rename 'Cognitive loops' to 'Stuck reasoning'
to avoid confusion with repeated command output.
Use <original_user_request> XML tags to clearly delineate user-provided
content from loop detection instructions.
@NTaylorMullen force-pushed the st/fix/llm-loop-detection-false-positives branch from e026684 to f5476d6 on March 1, 2026 02:44
@NTaylorMullen enabled auto-merge March 1, 2026 02:44
@NTaylorMullen added this pull request to the merge queue Mar 1, 2026
@github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 1, 2026
@SandyTao520 added this pull request to the merge queue Mar 1, 2026
@github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 1, 2026
@SandyTao520 added this pull request to the merge queue Mar 2, 2026
@github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 2, 2026
@SandyTao520 added this pull request to the merge queue Mar 2, 2026
Merged via the queue into main with commit 7c9fceb Mar 2, 2026
27 checks passed
@SandyTao520 deleted the st/fix/llm-loop-detection-false-positives branch March 2, 2026 19:23
BryanBradfo pushed a commit to BryanBradfo/gemini-cli that referenced this pull request Mar 5, 2026
struckoff pushed a commit to struckoff/gemini-cli that referenced this pull request Mar 6, 2026
liamhelmer pushed a commit to badal-io/gemini-cli that referenced this pull request Mar 12, 2026
