fix(core): deduplicate GEMINI.md files by device/inode on case-insensitive filesystems (#19904)#19915
Conversation
Summary of ChangesHello @Nixxx19, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request primarily addresses an issue where GEMINI.md files were being loaded multiple times on case-insensitive filesystems if they existed with different casing (e.g., GEMINI.md and gemini.md). By implementing file identity-based deduplication using device and inode numbers, the system now correctly identifies and loads each unique physical file only once, optimizing context window usage. Additionally, a bug preventing the persistence of user-selected preview models during certain authentication refreshes has been fixed, ensuring model selections are retained as expected. Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
The pull request correctly addresses the issue of duplicate GEMINI.md file loading on case-insensitive filesystems by implementing inode-based deduplication. This is a significant improvement for context window efficiency. However, the current implementation using lstat fails to deduplicate symbolic links pointing to the same physical file. Additionally, the JIT loading path in loadJitSubdirectoryMemory still relies on string-based filtering against already loaded paths, which will fail to prevent duplicates if a file was previously loaded via a different case variant. Addressing these issues will ensure the deduplication logic is robust across all loading scenarios.
aab4e37 to
00b78d6
Compare
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a mechanism to deduplicate GEMINI.md files based on their file identity (device and inode numbers), which resolves an issue with duplicate context loading on case-insensitive filesystems. The implementation adds a new deduplicatePathsByFileIdentity utility and integrates it into the memory loading process for both hierarchical and JIT contexts. The changes are well-tested.
My review has two main points:
- The pull request includes unrelated test changes for a different bug fix (
#19864), which should be moved to a separate PR to maintain focus. - There's a performance inefficiency in the new JIT memory loading logic where file stats are fetched multiple times for the same paths. I've suggested a refactoring to address this.
- Resolved conflicts in config.ts (kept main's cleaner === false logic) - Removed duplicate test block for google-gemini#19864 from config.test.ts - Optimized memoryDiscovery.ts to avoid redundant fs.stat calls
…thub.com/Nixxx19/gemini-cli into fix/duplicate-gemini-md-case-insensitive # Conflicts: # packages/core/src/config/config.ts
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces file identity-based deduplication to prevent loading the same GEMINI.md file multiple times on case-insensitive filesystems, which is a great improvement. However, I've found a critical runtime bug in ContextManager due to incorrect handling of the new deduplication function's return value, which was missed because of corresponding incorrect mocks in the tests. Additionally, there's a significant performance issue in the JIT memory loading logic that could cause slowdowns in projects with many memory files. Please see my detailed comments for suggestions on how to address these issues.
- Fix critical bug: destructure paths from deduplicatePathsByFileIdentity in contextManager - Optimize performance: cache file identities in ContextManager to avoid redundant fs.stat calls - Update MemoryLoadResult interface to include fileIdentities - Update test mocks to match new return type
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a crucial fix for deduplicating "GEMINI.md" files on case-insensitive filesystems by using file identity (device and inode). While the overall implementation is well-structured and includes thorough tests, a significant flaw was found in the concurrency limit implementation within the "loadJitSubdirectoryMemory" function. This critical issue could lead to resource exhaustion, "EMFILE" errors, and application crashes when handling a large number of files.
- Await each batch of fs.stat() calls inside the loop - Prevents EMFILE errors when fallback path is used (e.g., in tests) - Matches the pattern used in deduplicatePathsByFileIdentity
|
@scidomino pr is ready for review, thank you so much!! |
|
Also, make sure npm run preflight passes. |
- Remove debugMode conditional check before logger.debug call - logger.debug already wraps debugLogger which handles debug mode internally
…arameter - Add 5th parameter (alreadyLoadedIdentities) to loadJitSubdirectoryMemory expectation - Matches the updated function signature
fixed them, thank you!! |
Use getErrorMessage() instead of introducing a new lint exception for extracting error messages from Promise.allSettled rejections.
Removed all debugMode parameters and replaced conditional debug logging with direct debugLogger.debug() calls throughout memoryDiscovery.ts
Remove debugMode parameter from loadServerHierarchicalMemory calls to match updated function signature
|
checks failed |
@scidomino fixed, running fine locally, thank you!! |
Summary
Fixes duplicate loading of GEMINI.md files on case-insensitive filesystems (e.g., macOS/Darwin) by implementing file identity-based deduplication using device and inode numbers. This prevents the same physical file from being loaded multiple times when discovered through different case variants (e.g.,
GEMINI.mdandgemini.md), which wastes context window space.Details
On case-insensitive filesystems, the file discovery process can find the same physical file through multiple path variants (e.g.,
/app/GEMINI.mdand/app/gemini.md). Since these are different strings, the existing string-based deduplication (Set<string>) doesn't catch them, leading to duplicate context loading.Solution:
deduplicatePathsByFileIdentity()function that usesfs.lstat()to get device (dev) and inode (ino) numbersdev:inocombinationloadServerHierarchicalMemory()- Old/non-JIT pathloadJitSubdirectoryMemory()- JIT discover pathContextManager.loadMemoryContents()- JIT refresh pathImplementation notes:
numberandbiginttypes fordev/ino(Node.js version/platform dependent)Related Issues
Fixes #19904
How to Validate
Test on macOS (case-insensitive filesystem)
Create a test scenario:
Verify deduplication works:
Check the logs:
fileCountin memory stats matches actual unique filesRun unit tests
npm run test -- src/utils/memoryDiscovery.test.ts src/services/contextManager.test.tsExpected: All tests pass, including new deduplication tests:
should deduplicate files that point to the same inode (same physical file)should handle case where files have different inodes (different files)should handle files that cannot be stat'd (missing files)should handle multiple duplicates (3+ paths)should deduplicate JIT loaded filesshould deduplicate files by file identity in case-insensitive filesystemsManual verification
GEMINI.mdandgemini.md(same file, different case)geminiand check memory loading/memory listshows only one file pathPre-Merge Checklist