Skip to content

fix(core): improve windows sandbox reliability and fix integration tests#24480

Merged
ehedlund merged 10 commits intomainfrom
fix-windows-sandbox-tests
Apr 3, 2026
Merged

fix(core): improve windows sandbox reliability and fix integration tests#24480
ehedlund merged 10 commits intomainfrom
fix-windows-sandbox-tests

Conversation

@ehedlund
Copy link
Copy Markdown
Contributor

@ehedlund ehedlund commented Apr 1, 2026

Summary

This PR re-enables sandbox tests for Windows and fixes critical bugs in the Windows sandboxing implementation. It also stabilizes the test suite for CI by utilizing more reliable Windows-native tools and accounting for OS-specific networking and PTY limitations.

Details

1. Sandbox Helper Fixes (GeminiSandbox.cs)

  • Restored Execution Sync: Fixed a bug where the helper returned immediately with exit code 259 (STILL_ACTIVE) because WaitForSingleObject was missing.
  • Improved Error Reporting: Added Win32 error code reporting for CreateProcessAsUser, AssignProcessToJobObject, and SetInformationJobObject to simplify CI failure diagnosis.
  • Process Synchronization: Added CREATE_SUSPENDED and ResumeThread to guarantee the child process is correctly assigned to the Job Object before it starts.
  • Resource Cleanup: Added TerminateProcess and exhaustive handle closing in a finally block to prevent process orphans or leaks.

2. Sandbox Manager Logic (WindowsSandboxManager.ts)

  • Explicit ACL Grants: Updated icacls logic to explicitly grant Modify permissions to the Low Integrity SID. This ensures that files created within allowed directories inherit the correct labels for sandbox access.
  • Path Handling: Switched to resolveToRealPath and isSubpath utilities to handle symlinks and junction points consistently.
  • Manifest Lifecycle: Moved temporary manifest cleanup into a cleanup callback, ensuring files are removed only after the sandboxed command has fully completed.

3. Integration Test Stabilization (sandboxManager.integration.test.ts)

  • Native Tooling:
    • curl: Switched to native curl.exe to avoid PowerShell's Invoke-WebRequest wrapper and alias issues.
    • touch: Switched to powershell.exe New-Item to avoid cmd.exe redirection operators (>) which are unreliable in sandboxed contexts.
  • OS-Specific Skips:
    • Network Blocking: Added it.skipIf(Platform.isWindows) for loopback blocking tests, as Windows Job Object rate limits exempt 127.0.0.1.
    • Interactive PTYs: Added it.skipIf(Platform.isWindows) for node-pty tests. The Windows sandbox uses pipe interception, which is incompatible with ConPTY handle inheritance.

Related Issues

Related to https://github.com/google-gemini/maintainers-gemini-cli/issues/1533.

How to Validate

  1. Switch to a Windows environment (or observe Windows CI jobs).
  2. Run npm test -w @google/gemini-cli-core -- src/services/sandboxManager.integration.test.ts.
  3. Verify 11 tests pass and 2 tests skip cleanly.
  4. Confirm no regressions on macOS or Linux by running the same command.

Pre-Merge Checklist

  • Updated relevant documentation and README (if needed)
  • Added/updated tests (if needed)
  • Noted breaking changes (if any)
  • Validated on required platforms/methods:
    • MacOS
      • npm run
      • npx
      • Docker
      • Podman
      • Seatbelt
    • Windows
      • npm run
      • npx
      • Docker
    • Linux
      • npm run
      • npx
      • Docker

@ehedlund ehedlund requested a review from a team as a code owner April 1, 2026 22:53
@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the reliability and security of the Windows sandbox implementation. It addresses critical issues related to process isolation, filesystem permission handling, and test environment stability, ensuring that the sandbox behaves correctly in restricted environments and that integration tests are more robust across platforms.

Highlights

  • Windows Sandbox Reliability: Improved process creation by adding the CREATE_BREAKAWAY_FROM_JOB flag and implementing explicit process termination if job assignment fails.
  • Filesystem Permission Management: Refactored path handling to use a 'writableRoots' pattern, ensuring consistent Low Integrity access and resolving issues with creating new files in sandbox environments.
  • Integration Test Robustness: Updated Windows test utilities to use PowerShell for file operations, unified network testing with curl, and increased timeouts to ensure stability.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-cli
Copy link
Copy Markdown
Contributor

gemini-cli Bot commented Apr 1, 2026

Hi @ehedlund, thank you so much for your contribution to Gemini CLI! We really appreciate the time and effort you've put into this.

We're making some updates to our contribution process to improve how we track and review changes. Please take a moment to review our recent discussion post: Improving Our Contribution Process & Introducing New Guidelines.

Key Update: Starting January 26, 2026, the Gemini CLI project will require all pull requests to be associated with an existing issue. Any pull requests not linked to an issue by that date will be automatically closed.

Thank you for your understanding and for being a part of our community!

@google-cla
Copy link
Copy Markdown

google-cla Bot commented Apr 1, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request enhances the Windows sandbox by improving error reporting in the C# component, refining process creation flags, and updating path validation logic to support writing to new files within allowed roots. However, several critical issues were identified: resource leaks in the C# code where kernel handles are not closed during error paths, potential path traversal vulnerabilities caused by a regression in path resolution logic, and a violation of the principle of least privilege by granting 'Full Control' instead of 'Modify' permissions to the low-integrity SID.

Comment thread packages/core/src/sandbox/windows/GeminiSandbox.cs
Comment thread packages/core/src/sandbox/windows/WindowsSandboxManager.ts Outdated
Comment thread packages/core/src/sandbox/windows/WindowsSandboxManager.ts Outdated
Comment thread packages/core/src/sandbox/windows/WindowsSandboxManager.ts Outdated
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 1, 2026

Size Change: +492 B (0%)

Total Size: 34 MB

Filename Size Change
./bundle/chunk-P2SQW7I7.js 0 B -14.8 MB (removed) 🏆
./bundle/chunk-PWZO3PH4.js 0 B -3.15 MB (removed) 🏆
./bundle/core-W7DS4R63.js 0 B -45.1 kB (removed) 🏆
./bundle/devtoolsService-VR7WH6N7.js 0 B -28.4 kB (removed) 🏆
./bundle/interactiveCli-PWYY3XZG.js 0 B -1.63 MB (removed) 🏆
./bundle/oauth2-provider-AGQWXG4J.js 0 B -9.16 kB (removed) 🏆
./bundle/chunk-DNIZACM2.js 3.15 MB +3.15 MB (new file) 🆕
./bundle/chunk-NWAZ7MY6.js 14.8 MB +14.8 MB (new file) 🆕
./bundle/core-JHUZCJYC.js 45.1 kB +45.1 kB (new file) 🆕
./bundle/devtoolsService-THLHDT4D.js 28.4 kB +28.4 kB (new file) 🆕
./bundle/interactiveCli-EEXSGHVF.js 1.63 MB +1.63 MB (new file) 🆕
./bundle/oauth2-provider-23X4KBRE.js 9.16 kB +9.16 kB (new file) 🆕
ℹ️ View Unchanged
Filename Size
./bundle/bundled/third_party/index.js 8 MB
./bundle/chunk-34MYV7JD.js 2.45 kB
./bundle/chunk-5AUYMPVF.js 858 B
./bundle/chunk-5PS3AYFU.js 1.18 kB
./bundle/chunk-664ZODQF.js 124 kB
./bundle/chunk-DAHVX5MI.js 206 kB
./bundle/chunk-IUUIT4SU.js 56.5 kB
./bundle/chunk-RJTRUG2J.js 39.8 kB
./bundle/chunk-SU4IS2NT.js 1.96 MB
./bundle/devtools-36NN55EP.js 696 kB
./bundle/dist-T73EYRDX.js 356 B
./bundle/events-XB7DADIJ.js 418 B
./bundle/gemini.js 550 kB
./bundle/getMachineId-bsd-TXG52NKR.js 1.55 kB
./bundle/getMachineId-darwin-7OE4DDZ6.js 1.55 kB
./bundle/getMachineId-linux-SHIFKOOX.js 1.34 kB
./bundle/getMachineId-unsupported-5U5DOEYY.js 1.06 kB
./bundle/getMachineId-win-6KLLGOI4.js 1.72 kB
./bundle/memoryDiscovery-UJWZC5BA.js 980 B
./bundle/multipart-parser-KPBZEGQU.js 11.7 kB
./bundle/node_modules/@google/gemini-cli-devtools/dist/client/main.js 222 kB
./bundle/node_modules/@google/gemini-cli-devtools/dist/src/_client-assets.js 229 kB
./bundle/node_modules/@google/gemini-cli-devtools/dist/src/index.js 13.4 kB
./bundle/node_modules/@google/gemini-cli-devtools/dist/src/types.js 132 B
./bundle/sandbox-macos-permissive-open.sb 890 B
./bundle/sandbox-macos-permissive-proxied.sb 1.31 kB
./bundle/sandbox-macos-restrictive-open.sb 3.36 kB
./bundle/sandbox-macos-restrictive-proxied.sb 3.56 kB
./bundle/sandbox-macos-strict-open.sb 4.82 kB
./bundle/sandbox-macos-strict-proxied.sb 5.02 kB
./bundle/src-QVCVGIUX.js 47 kB
./bundle/tree-sitter-7U6MW5PS.js 274 kB
./bundle/tree-sitter-bash-34ZGLXVX.js 1.84 MB

compressed-size-action

@ehedlund ehedlund force-pushed the fix-windows-sandbox-tests branch 2 times, most recently from 8df91de to 3b187c1 Compare April 1, 2026 22:58
@gemini-cli gemini-cli Bot added the status/need-issue Pull requests that need to have an associated issue. label Apr 1, 2026
@ehedlund ehedlund requested review from a team as code owners April 2, 2026 18:10
@github-actions

This comment was marked as outdated.

@ehedlund ehedlund force-pushed the fix-windows-sandbox-tests branch 2 times, most recently from 6c4aab7 to 64df5c1 Compare April 2, 2026 18:36
@ehedlund ehedlund removed request for a team April 2, 2026 18:47
@ehedlund ehedlund force-pushed the fix-windows-sandbox-tests branch from e4e2a25 to eb9aad7 Compare April 3, 2026 16:36
@ehedlund ehedlund force-pushed the fix-windows-sandbox-tests branch from eb9aad7 to c2ffff9 Compare April 3, 2026 16:40
Copy link
Copy Markdown
Collaborator

@galz10 galz10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Findings

Scope: Pull Request #24480

The PR significantly improves the reliability and security of the Windows sandbox by addressing native resource management, standardized path handling, and strengthened filesystem permissions. It also re-enables and fixes several unit and integration tests for Windows that were previously skipped or failing.

While the overall direction is excellent and fixes many fundamental issues, there are several concerns regarding resource leaks in the TypeScript code and a race condition in the native C# helper.

Metadata Review

  • Title: fix(core): improve windows sandbox reliability and fix integration tests
    • Adheres to Conventional Commits: Yes (fix(core)).
    • Clear and descriptive: Yes.
  • Body:
    • Provides a comprehensive summary of improvements.
    • Links to related work for CI coverage.
    • Inconsistency noted: The body claims to have implemented "atomic (suspended) process creation," but the code in GeminiSandbox.cs does not appear to use the CREATE_SUSPENDED flag or the necessary ResumeThread call.

Concerns (Action Required)

1. GeminiSandbox.cs: Race Condition in Process Creation

The PR summary mentions "atomic (suspended) process creation," but the diff for GeminiSandbox.cs shows that the process is created without the CREATE_SUSPENDED flag.

  • File: packages/core/src/sandbox/windows/GeminiSandbox.cs
  • Issue: The process is spawned with CREATE_BREAKAWAY_FROM_JOB (0x01000000) but not CREATE_SUSPENDED (0x4). This means there is a race condition where the process can start executing and potentially finish or escape certain job limits before AssignProcessToJobObject is successfully called on line 340.
  • Suggested Improvement: Follow the standard pattern for Job Object assignment:
    1. Add CREATE_SUSPENDED to creationFlags.
    2. Call AssignProcessToJobObject.
    3. Call ResumeThread(pi.hThread).
      This ensures the process is within the sandbox boundaries from its first instruction.

2. Resource Leak: Missing cleanup() Calls in TypeScript

The PR adds a cleanup function to the SandboxedCommand interface, but several key callers of prepareCommand have not been updated to execute it.

  • Files:
    • packages/core/src/services/sandboxedFileSystemService.ts
    • packages/core/src/utils/shell-utils.ts (spawnAsync)
    • packages/core/src/tools/grep.ts
    • packages/core/src/tools/tool-registry.ts
  • Issue: These services/tools call sandboxManager.prepareCommand but do not execute the returned cleanup() function. On Windows, WindowsSandboxManager creates a temporary directory and manifest file in os.tmpdir() for every command. Without calling cleanup(), these temporary files will leak until the OS or a manual cleanup occurs.
  • Suggested Improvement: Ensure all callers of prepareCommand execute cleanup() in a finally block after the spawned process terminates.

3. WindowsSandboxManager.ts: Case-Sensitivity Issue with isWithinRoot

  • File: packages/core/src/sandbox/windows/WindowsSandboxManager.ts
  • Context: Line 323 uses isWithinRoot(resolved, root) to check if a new file path resides within an allowed root.
  • Issue: isWithinRoot (from fileUtils.ts) uses path.resolve and startsWith, which is case-sensitive. On Windows, paths like C:\Workspace and c:\workspace represent the same location. If the casing of the drive letter or folder names differs between the request and the workspace path, isWithinRoot will return false, resulting in an unexpected "Access Denied" error for valid paths.
  • Suggested Improvement: Use isSubpath from packages/core/src/utils/paths.ts instead, as it correctly handles case-insensitivity on Windows by using path.win32.relative.

Nits (Suggestions)

  • GeminiSandbox.cs:

    • Diagnostic Output: The error message for AssignProcessToJobObject failure (Line 341) could include the command line to make debugging easier in complex sandboxed environments.
    • Magic Numbers: The JobObject information class indices (e.g., 9 for JobObjectExtendedLimitInformation, 32 for JobObjectNetRateControlInformation) could be moved to named constants at the top of the class for better readability.
  • WindowsSandboxManager.ts:

    • Constant Placement: LOW_INTEGRITY_SID is defined twice (Lines 43 and 482). It should be defined once as a static constant of the WindowsSandboxManager class.
  • WindowsSandboxManager.test.ts:

    • Mock Brittleness: The mock for fs.existsSync to skip csc.exe compilation (Line 42) is effective but relies on the exact internal constant HELPER_EXE. This is acceptable for unit tests but good to keep in mind if the internal structure changes.

@ehedlund
Copy link
Copy Markdown
Contributor Author

ehedlund commented Apr 3, 2026

  1. Resource Leak: Missing cleanup() Calls in TypeScript
    The PR adds a cleanup function to the SandboxedCommand interface, but several key callers of prepareCommand have not been updated to execute it.

Will address this in a separate PR since this is not specific to Windows. All 3 OSes implement cleanup(), and ShellExecutionService is the only caller using it.

All other comments have been addressed!

@ehedlund ehedlund added this pull request to the merge queue Apr 3, 2026
Merged via the queue into main with commit 370c45d Apr 3, 2026
29 checks passed
@ehedlund ehedlund deleted the fix-windows-sandbox-tests branch April 3, 2026 21:25
afanty2021 pushed a commit to afanty2021/gemini-cli that referenced this pull request Apr 4, 2026
warrenzhu25 pushed a commit to warrenzhu25/gemini-cli that referenced this pull request Apr 9, 2026
HaleTom pushed a commit to HaleTom/gemini-cli that referenced this pull request Apr 11, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

status/need-issue Pull requests that need to have an associated issue.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants