fix(core): deduplicate GEMINI.md files by device/inode on case-insensitive filesystems (#19904) #19915

Merged
scidomino merged 21 commits into google-gemini:main from Nixxx19:fix/duplicate-gemini-md-case-insensitive
Mar 6, 2026

@Nixxx19 (Contributor) commented Feb 22, 2026

Summary

Fixes duplicate loading of GEMINI.md files on case-insensitive filesystems (e.g., macOS/Darwin) by implementing file identity-based deduplication using device and inode numbers. This prevents the same physical file from being loaded multiple times when discovered through different case variants (e.g., GEMINI.md and gemini.md), which wastes context window space.

Details

On case-insensitive filesystems, the file discovery process can find the same physical file through multiple path variants (e.g., /app/GEMINI.md and /app/gemini.md). Since these are different strings, the existing string-based deduplication (Set<string>) doesn't catch them, leading to duplicate context loading.

Solution:

  • Added deduplicatePathsByFileIdentity() function that uses fs.lstat() to get device (dev) and inode (ino) numbers
  • Deduplicates paths by creating a unique key from dev:ino combination
  • Integrated deduplication into all three code paths that load GEMINI.md files:
    1. loadServerHierarchicalMemory() - Old/non-JIT path
    2. loadJitSubdirectoryMemory() - JIT discover path
    3. ContextManager.loadMemoryContents() - JIT refresh path
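A minimal sketch of the dev:ino idea, assuming Node's fs/promises API. The name dedupeByFileIdentity is hypothetical and this is not the PR's deduplicatePathsByFileIdentity. It deliberately differs from the description in two labeled ways: it uses fs.stat (which follows symlinks, the concern the reviewer raises later in this thread) instead of fs.lstat, and it stats paths one at a time rather than in batches.

```typescript
import { promises as fs } from "node:fs";

// Hypothetical helper: keeps the first path seen for each physical file.
async function dedupeByFileIdentity(paths: string[]): Promise<string[]> {
  const seen = new Set<string>(); // dev:ino keys already kept
  const unique: string[] = [];
  for (const p of paths) {
    let key: string;
    try {
      // bigint: true avoids precision loss for inode numbers above 2^53.
      // stat() follows symlinks, so a link and its target share one key.
      const st = await fs.stat(p, { bigint: true });
      key = `${st.dev}:${st.ino}`;
    } catch {
      // Assumption: paths that cannot be stat'd are skipped, not kept.
      continue;
    }
    if (!seen.has(key)) {
      seen.add(key);
      unique.push(p);
    }
  }
  return unique;
}
```

Because "/app/GEMINI.md" and "/app/gemini.md" are different strings, a Set<string> keeps both; on a case-insensitive volume they stat to the same dev:ino, so only the first survives here.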

Implementation notes:

  • Handles both number and bigint types for dev/ino (Node.js version/platform dependent)
  • Processes files in batches (20 concurrent) to prevent EMFILE errors
  • Gracefully handles files that cannot be stat'd (missing files, permission errors)
  • Includes debug logging when debug mode is enabled
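The number/bigint handling and the batching described in these notes could look roughly like the sketch below. identityKey and statInBatches are hypothetical names; the batch size of 20 comes from the notes, while the use of Promise.allSettled (so one failing path does not abort its batch) is an assumption.

```typescript
import { promises as fs, type Stats, type BigIntStats } from "node:fs";

// Builds a dev:ino key from either Stats (number fields) or BigIntStats
// (bigint fields); template literals stringify both types as plain digits.
function identityKey(st: Stats | BigIntStats): string {
  return `${st.dev}:${st.ino}`;
}

// Bounds concurrency to `batchSize` in-flight stat calls, the pattern the PR
// uses to avoid EMFILE. Each batch is awaited before the next one starts.
async function statInBatches(
  paths: string[],
  batchSize = 20,
): Promise<Map<string, string>> {
  const keys = new Map<string, string>(); // path -> dev:ino
  for (let i = 0; i < paths.length; i += batchSize) {
    const batch = paths.slice(i, i + batchSize);
    const results = await Promise.allSettled(
      batch.map((p) => fs.stat(p, { bigint: true })),
    );
    results.forEach((r, j) => {
      if (r.status === "fulfilled") {
        keys.set(batch[j], identityKey(r.value));
      }
      // Rejected entries (missing file, permission error) are left out.
    });
  }
  return keys;
}
```

Requesting BigIntStats matters because a number-typed ino can silently lose precision once it exceeds Number.MAX_SAFE_INTEGER, which could make two different files compare equal.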

Related Issues

Fixes #19904

How to Validate

Test on macOS (case-insensitive filesystem)

  1. Create a test scenario:

    cd /tmp
    mkdir test-gemini-dedup
    cd test-gemini-dedup
    echo "# Test content" > gemini.md
    # On macOS, both GEMINI.md and gemini.md point to the same file
  2. Verify deduplication works:

    # Run with debug mode to see deduplication logs
    DEBUG=true gemini
    # Check that only one file is loaded (not duplicates)
  3. Check the logs:

    • Look for deduplication debug messages showing which files are kept/skipped
    • Verify that fileCount in memory stats matches actual unique files
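These steps assume the directory sits on a case-insensitive volume (the macOS default, though APFS can also be formatted case-sensitive). A quick probe, sketched below with a hypothetical helper name, confirms whether a given directory actually folds case before you rely on the repro.

```typescript
import { mkdtempSync, writeFileSync, existsSync, statSync, rmSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical probe: creates gemini.md in a scratch directory under baseDir
// and checks whether GEMINI.md resolves to the same device and inode.
function foldsCase(baseDir: string = tmpdir()): boolean {
  const dir = mkdtempSync(join(baseDir, "gemini-case-"));
  try {
    const lower = join(dir, "gemini.md");
    const upper = join(dir, "GEMINI.md");
    writeFileSync(lower, "# Test content\n");
    if (!existsSync(upper)) return false; // case-sensitive (e.g. typical Linux)
    const a = statSync(lower);
    const b = statSync(upper);
    return a.dev === b.dev && a.ino === b.ino;
  } finally {
    rmSync(dir, { recursive: true, force: true }); // clean up the scratch dir
  }
}

console.log(`case-insensitive: ${foldsCase()}`);
```

Pass the project directory (for example foldsCase(process.cwd())) to test the volume you will actually run gemini on.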

Run unit tests

npm run test -- src/utils/memoryDiscovery.test.ts src/services/contextManager.test.ts

Expected: All tests pass, including new deduplication tests:

  • should deduplicate files that point to the same inode (same physical file)
  • should handle case where files have different inodes (different files)
  • should handle files that cannot be stat'd (missing files)
  • should handle multiple duplicates (3+ paths)
  • should deduplicate JIT loaded files
  • should deduplicate files by file identity in case-insensitive filesystems
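On a case-sensitive filesystem such as Linux CI you cannot make GEMINI.md and gemini.md the same file, but a hard link reproduces the condition these tests target: two path strings, one dev:ino identity (the changelog mentions hard-linked files in the new tests). The sketch below is illustrative only, not the PR's test code; hardLinkScenario and alias.md are hypothetical.

```typescript
import { mkdtempSync, writeFileSync, linkSync, statSync, rmSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// True when both paths resolve to the same device and inode.
function sameFile(a: string, b: string): boolean {
  const sa = statSync(a);
  const sb = statSync(b);
  return sa.dev === sb.dev && sa.ino === sb.ino;
}

// Builds a scratch directory with a hard-linked pair plus an unrelated file,
// then reports which pairs share a file identity.
function hardLinkScenario(): { aliasMatches: boolean; otherMatches: boolean } {
  const dir = mkdtempSync(join(tmpdir(), "gemini-dedup-"));
  try {
    const original = join(dir, "GEMINI.md");
    const alias = join(dir, "alias.md"); // hypothetical second name
    const other = join(dir, "other.md");
    writeFileSync(original, "# Test content\n");
    linkSync(original, alias); // second directory entry for the same inode
    writeFileSync(other, "# Other content\n"); // separate file, own inode
    return {
      aliasMatches: sameFile(original, alias),
      otherMatches: sameFile(original, other),
    };
  } finally {
    rmSync(dir, { recursive: true, force: true });
  }
}

console.log(hardLinkScenario()); // expected: aliasMatches true, otherMatches false
```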

Manual verification

  1. Create a project with a single GEMINI.md file, which a case-insensitive filesystem also resolves as gemini.md (same file, different case)
  2. Run gemini and check memory loading
  3. Verify context is loaded only once (not duplicated)
  4. Check that /memory list shows only one file path

Pre-Merge Checklist

  • Updated relevant documentation and README (if needed)
    • No documentation changes needed - this is an internal bug fix
  • Added/updated tests (if needed)
    • Added comprehensive unit tests for deduplication function
    • Added integration test for ContextManager deduplication
    • All existing tests still pass
  • Noted breaking changes (if any)
    • No breaking changes - this is a bug fix that improves behavior
  • Validated on required platforms/methods:
    • MacOS
      • npm run - Tests pass
      • npx - Not applicable for development
      • Docker - Not tested (sandboxing not required for this fix)
      • Podman - Not tested (sandboxing not required for this fix)
      • Seatbelt - Not tested (sandboxing not required for this fix)
    • Windows - Not tested (case-insensitive filesystem, should work similarly)
    • Linux - Not tested (case-sensitive filesystem, less critical but should still work)

@Nixxx19 Nixxx19 requested a review from a team as a code owner February 22, 2026 15:21
@gemini-code-assist (Contributor)

Summary of Changes

Hello @Nixxx19, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request primarily addresses an issue where GEMINI.md files were being loaded multiple times on case-insensitive filesystems if they existed with different casing (e.g., GEMINI.md and gemini.md). By implementing file identity-based deduplication using device and inode numbers, the system now correctly identifies and loads each unique physical file only once, optimizing context window usage. Additionally, a bug preventing the persistence of user-selected preview models during certain authentication refreshes has been fixed, ensuring model selections are retained as expected.

Highlights

  • File Deduplication: Implemented a new mechanism to deduplicate GEMINI.md files based on their unique device and inode numbers, specifically addressing issues on case-insensitive filesystems like macOS.
  • New Utility Function: Introduced deduplicatePathsByFileIdentity in memoryDiscovery.ts to perform the file identity-based deduplication, handling batch processing and error cases.
  • Memory Loading Integration: Integrated the new deduplication logic into all three GEMINI.md memory loading paths: loadServerHierarchicalMemory, loadJitSubdirectoryMemory, and ContextManager.loadMemoryContents.
  • Model Persistence Fix: Resolved a bug where user-selected preview models could be incorrectly reset during authentication refresh, particularly for LOGIN_WITH_GOOGLE auth types.
  • Enhanced Testing: Added comprehensive unit tests for the new deduplication logic and the model persistence bug fix, ensuring robustness and correct behavior.

Changelog
  • packages/core/src/config/config.test.ts
  • packages/core/src/config/config.ts
    • Modified the refreshAuth method to refine the condition for resetting a preview model. The model will now only be reset if hasAccessToPreviewModel is false AND the authType is not LOGIN_WITH_GOOGLE, preventing unintended resets when refreshUserQuota() might not have been called.
  • packages/core/src/services/contextManager.test.ts
    • Mocked the newly introduced deduplicatePathsByFileIdentity function.
    • Added a new test case within the ContextManager describe block to verify that loadMemoryContents correctly deduplicates files based on file identity on case-insensitive filesystems.
  • packages/core/src/services/contextManager.ts
    • Imported the deduplicatePathsByFileIdentity utility function.
    • Integrated deduplicatePathsByFileIdentity into the loadMemoryContents method to perform file identity-based deduplication on all discovered GEMINI.md paths before reading their content.
  • packages/core/src/utils/memoryDiscovery.test.ts
    • Added a new describe block for case-insensitive filesystem deduplication containing four new test cases. These tests cover scenarios like deduplicating hard-linked files, handling distinct files with different inodes, gracefully managing missing files during stat operations, and deduplicating multiple paths pointing to the same physical file.
    • Added a new test case within the loadJitSubdirectoryMemory describe block to ensure that JIT memory loading also correctly deduplicates files based on inode.
  • packages/core/src/utils/memoryDiscovery.ts
    • Implemented and exported the deduplicatePathsByFileIdentity asynchronous function. This function takes an array of file paths, uses fs.lstat to retrieve device and inode numbers, processes paths in batches of 20 to prevent EMFILE errors, and returns a list of unique paths based on their dev:ino identity. It includes debug logging and handles potential lstat errors.
    • Modified loadServerHierarchicalMemory to apply deduplicatePathsByFileIdentity to the collected GEMINI.md paths.
    • Modified loadJitSubdirectoryMemory to apply deduplicatePathsByFileIdentity to the newPaths before reading their content.
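The refreshAuth change in config.ts reduces to a two-part predicate. The sketch below restates the condition from the changelog; shouldResetPreviewModel is a hypothetical name, and LOGIN_WITH_GOOGLE is a placeholder string because the real AuthType value is not shown here. It uses `=== false`, echoing the later merge note about keeping main's `=== false` logic, so an unset (undefined) flag never triggers a reset; the exact merged condition may differ.

```typescript
// Placeholder for the real AuthType.LOGIN_WITH_GOOGLE value (assumption).
const LOGIN_WITH_GOOGLE = "LOGIN_WITH_GOOGLE";

// Reset the preview model only when access is known to be absent AND the
// user is not on Google login, where refreshUserQuota() may not yet have
// populated the access flag.
function shouldResetPreviewModel(
  hasAccessToPreviewModel: boolean | undefined,
  authType: string,
): boolean {
  return hasAccessToPreviewModel === false && authType !== LOGIN_WITH_GOOGLE;
}
```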

Activity
  • The author implemented the core deduplication logic and integrated it across relevant memory loading paths.
  • Comprehensive unit and integration tests were added to validate the new deduplication functionality and ensure the model persistence fix.
  • The author validated the changes on macOS, specifically targeting the case-insensitive filesystem behavior.
  • The pre-merge checklist was updated, confirming no breaking changes and adequate test coverage.

@gemini-cli gemini-cli bot added the area/core label Feb 22, 2026

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

The pull request correctly addresses the issue of duplicate GEMINI.md file loading on case-insensitive filesystems by implementing inode-based deduplication. This is a significant improvement for context window efficiency. However, the current implementation using lstat fails to deduplicate symbolic links pointing to the same physical file. Additionally, the JIT loading path in loadJitSubdirectoryMemory still relies on string-based filtering against already loaded paths, which will fail to prevent duplicates if a file was previously loaded via a different case variant. Addressing these issues will ensure the deduplication logic is robust across all loading scenarios.
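The symlink point is easy to confirm: fs.lstat describes the link itself, which has its own inode, while fs.stat follows the link to its target. The sketch below (hypothetical helper name; creating symlinks may require extra privileges on Windows) shows why an lstat-based key misses a symlinked GEMINI.md.

```typescript
import { mkdtempSync, writeFileSync, symlinkSync, statSync, lstatSync, rmSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Compares dev:ino keys for a file and a symlink pointing at it.
function compareIdentities(): { lstatMatches: boolean; statMatches: boolean } {
  const dir = mkdtempSync(join(tmpdir(), "gemini-symlink-"));
  try {
    const target = join(dir, "GEMINI.md");
    const link = join(dir, "link.md");
    writeFileSync(target, "# Test content\n");
    symlinkSync(target, link);
    const key = (s: { dev: number; ino: number }) => `${s.dev}:${s.ino}`;
    return {
      // lstat reports the symlink's own inode, so the keys differ.
      lstatMatches: key(lstatSync(link)) === key(lstatSync(target)),
      // stat resolves the link, so both keys name the target file.
      statMatches: key(statSync(link)) === key(statSync(target)),
    };
  } finally {
    rmSync(dir, { recursive: true, force: true }); // removes the link, not its target's data elsewhere
  }
}

console.log(compareIdentities()); // expected on Linux/macOS: lstatMatches false, statMatches true
```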

@Nixxx19 Nixxx19 force-pushed the fix/duplicate-gemini-md-case-insensitive branch from aab4e37 to 00b78d6 February 22, 2026 15:38
@gemini-cli gemini-cli bot added the priority/p2 and help wanted labels Feb 26, 2026
@Nixxx19 (Contributor Author) commented Mar 4, 2026

/gemini review

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a mechanism to deduplicate GEMINI.md files based on their file identity (device and inode numbers), which resolves an issue with duplicate context loading on case-insensitive filesystems. The implementation adds a new deduplicatePathsByFileIdentity utility and integrates it into the memory loading process for both hierarchical and JIT contexts. The changes are well-tested.

My review has two main points:

  1. The pull request includes unrelated test changes for a different bug fix (#19864), which should be moved to a separate PR to maintain focus.
  2. There's a performance inefficiency in the new JIT memory loading logic where file stats are fetched multiple times for the same paths. I've suggested a refactoring to address this.

Nixxx19 added 2 commits March 4, 2026 23:59
- Resolved conflicts in config.ts (kept main's cleaner === false logic)
- Removed duplicate test block for google-gemini#19864 from config.test.ts
- Optimized memoryDiscovery.ts to avoid redundant fs.stat calls
…thub.com/Nixxx19/gemini-cli into fix/duplicate-gemini-md-case-insensitive

# Conflicts:
#	packages/core/src/config/config.ts
@Nixxx19 (Contributor Author) commented Mar 4, 2026

/gemini review

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces file identity-based deduplication to prevent loading the same GEMINI.md file multiple times on case-insensitive filesystems, which is a great improvement. However, I've found a critical runtime bug in ContextManager due to incorrect handling of the new deduplication function's return value, which was missed because of corresponding incorrect mocks in the tests. Additionally, there's a significant performance issue in the JIT memory loading logic that could cause slowdowns in projects with many memory files. Please see my detailed comments for suggestions on how to address these issues.

- Fix critical bug: destructure paths from deduplicatePathsByFileIdentity in contextManager
- Optimize performance: cache file identities in ContextManager to avoid redundant fs.stat calls
- Update MemoryLoadResult interface to include fileIdentities
- Update test mocks to match new return type
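The commit above caches file identities in ContextManager and adds a fileIdentities field to MemoryLoadResult, but the field's shape is not shown in this thread. As a rough illustration only, a per-instance cache that stats each path at most once might look like the hypothetical IdentityCache below.

```typescript
import { promises as fs } from "node:fs";

// Hypothetical per-instance cache: each path is stat'd at most once and
// later lookups (e.g. on a refresh) reuse the stored dev:ino key. A path that
// could not be stat'd is cached as null. Entries go stale if a file is
// replaced on disk, so a real cache would need an invalidation policy.
class IdentityCache {
  private readonly keys = new Map<string, string | null>();
  statCalls = 0; // number of fs.stat calls actually made, for illustration

  async get(path: string): Promise<string | null> {
    const cached = this.keys.get(path);
    if (cached !== undefined) return cached; // hit: no filesystem access
    this.statCalls++;
    let key: string | null;
    try {
      const st = await fs.stat(path, { bigint: true });
      key = `${st.dev}:${st.ino}`;
    } catch {
      key = null; // missing file or permission error
    }
    // Note: concurrent first lookups of one path may each stat it; caching
    // the pending promise instead would avoid that.
    this.keys.set(path, key);
    return key;
  }
}
```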
@Nixxx19 (Contributor Author) commented Mar 4, 2026

/gemini review

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a crucial fix for deduplicating "GEMINI.md" files on case-insensitive filesystems by using file identity (device and inode). While the overall implementation is well-structured and includes thorough tests, a significant flaw was found in the concurrency limit implementation within the "loadJitSubdirectoryMemory" function. This critical issue could lead to resource exhaustion, "EMFILE" errors, and application crashes when handling a large number of files.

- Await each batch of fs.stat() calls inside the loop
- Prevents EMFILE errors when fallback path is used (e.g., in tests)
- Matches the pattern used in deduplicatePathsByFileIdentity
@Nixxx19 (Contributor Author) commented Mar 4, 2026

@scidomino PR is ready for review, thank you so much!!

@scidomino (Collaborator)

Also, make sure npm run preflight passes.

Nixxx19 added 2 commits March 5, 2026 08:08
- Remove debugMode conditional check before logger.debug call
- logger.debug already wraps debugLogger which handles debug mode internally
…arameter

- Add 5th parameter (alreadyLoadedIdentities) to loadJitSubdirectoryMemory expectation
- Matches the updated function signature
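The new alreadyLoadedIdentities parameter addresses the first review's point that JIT loading filtered new paths only by string. A hypothetical filter in that spirit compares identity keys instead, so a file already in context under one spelling is not loaded again under another; filterUnloaded and the identityOf callback are illustrative names, not the PR's code.

```typescript
// Keeps only newly discovered paths whose file identity is not already
// loaded, also collapsing duplicates within newPaths itself.
function filterUnloaded(
  newPaths: string[],
  identityOf: (path: string) => string | undefined, // path -> dev:ino, if known
  alreadyLoadedIdentities: ReadonlySet<string>,
): string[] {
  const seen = new Set(alreadyLoadedIdentities); // copy; caller's set untouched
  const out: string[] = [];
  for (const p of newPaths) {
    const key = identityOf(p);
    if (key === undefined) continue; // unknown identity: skipped (assumption)
    if (seen.has(key)) continue; // same physical file, possibly other casing
    seen.add(key);
    out.push(p);
  }
  return out;
}
```

For example, if "/app/GEMINI.md" was loaded earlier with key "1:42", and discovery now yields "/app/gemini.md" (1:42), "/app/sub/GEMINI.md" (1:99) and "/app/sub/gemini.md" (1:99), the filter returns only ["/app/sub/GEMINI.md"].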
@Nixxx19 (Contributor Author) commented Mar 5, 2026

Also, make sure npm run preflight passes.

fixed them, thank you!!

Nixxx19 added 4 commits March 5, 2026 23:46
Use getErrorMessage() instead of introducing a new lint exception
for extracting error messages from Promise.allSettled rejections.
Removed all debugMode parameters and replaced conditional debug logging
with direct debugLogger.debug() calls throughout memoryDiscovery.ts
Remove debugMode parameter from loadServerHierarchicalMemory calls
to match updated function signature
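These commits move the debug-mode check out of call sites and into the logger, and extract messages from Promise.allSettled rejections through a shared helper. The sketch below illustrates that shape with hypothetical stand-ins (sketchDebugLogger, errorMessageOf); the project's real debugLogger and getErrorMessage() signatures are not shown here, and reading the DEBUG environment variable is an assumption based on the validation steps above.

```typescript
// Stand-in logger that decides internally whether to emit, so call sites
// no longer need an `if (debugMode)` guard or a debugMode parameter.
const sketchDebugLogger = {
  enabled: process.env["DEBUG"] === "true", // assumption: env-driven switch
  debug(...args: unknown[]): void {
    if (this.enabled) console.debug("[DEBUG]", ...args);
  },
};

// Rejection reasons from Promise.allSettled are `unknown`; normalize them.
function errorMessageOf(reason: unknown): string {
  return reason instanceof Error ? reason.message : String(reason);
}

// Settles all tasks, logs each failure unconditionally at the call site,
// and returns how many were rejected.
async function logRejections(tasks: Promise<unknown>[]): Promise<number> {
  const results = await Promise.allSettled(tasks);
  let failures = 0;
  for (const r of results) {
    if (r.status === "rejected") {
      failures++;
      sketchDebugLogger.debug(`stat failed: ${errorMessageOf(r.reason)}`);
    }
  }
  return failures;
}
```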
@scidomino (Collaborator)

checks failed

@Nixxx19 (Contributor Author) commented Mar 6, 2026

checks failed

@scidomino fixed, running fine locally, thank you!!

@scidomino scidomino added this pull request to the merge queue Mar 6, 2026
Merged via the queue into google-gemini:main with commit 82316ef Mar 6, 2026
27 checks passed
kunal-10-cloud pushed a commit to kunal-10-cloud/gemini-cli that referenced this pull request Mar 12, 2026
liamhelmer pushed a commit to badal-io/gemini-cli that referenced this pull request Mar 12, 2026
yashodipmore pushed a commit to yashodipmore/geemi-cli that referenced this pull request Mar 21, 2026

Labels

  • area/core: Issues related to User Interface, OS Support, Core Functionality
  • help wanted: We will accept PRs from all issues marked as "help wanted". Thanks for your support!
  • priority/p2: Important but can be addressed in a future release.

Development

Successfully merging this pull request may close these issues.

Gemini CLI loads both GEMINI.md and gemini.md on case-insensitive filesystems, duplicating context

2 participants