Stabilize Windows cmd-based shell test harnesses #14958
Merged
aibrahim-oai merged 3 commits into main on Mar 17, 2026
Add a test-only shell override so Windows integration tests can pin `cmd.exe`, then make the nested PowerShell read in `apply_patch_cli` deterministic by using an encoded, non-interactive, UTF-8 `ReadAllText` path that suppresses CLIXML progress noise.

Co-authored-by: Codex <noreply@openai.com>
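A test-only shell override can be pictured as an optional value that, when set, bypasses normal shell detection. The Rust sketch below illustrates that idea under stated assumptions: only the name `user_shell_override` comes from this PR, while `ShellKind`, `TestShellConfig`, `select_shell`, and `shell_argv` are hypothetical names, and the real Codex types and wiring are not shown here.

```rust
/// Hypothetical shell kinds; the real Codex types are not shown in this PR.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ShellKind {
    Cmd,
    PowerShell,
}

/// Hypothetical test configuration carrying the override.
#[derive(Default)]
pub struct TestShellConfig {
    /// `None` means "use normal shell detection"; a Windows test sets `Some(ShellKind::Cmd)`.
    pub user_shell_override: Option<ShellKind>,
}

/// The override wins when present; otherwise fall back to the detection callback.
pub fn select_shell(config: &TestShellConfig, detect: impl FnOnce() -> ShellKind) -> ShellKind {
    config.user_shell_override.unwrap_or_else(detect)
}

/// Build the argument vector for running `command` through the selected shell.
pub fn shell_argv(shell: ShellKind, command: &str) -> Vec<String> {
    match shell {
        ShellKind::Cmd => vec!["cmd.exe".into(), "/c".into(), command.into()],
        ShellKind::PowerShell => vec![
            "powershell.exe".into(),
            "-NoProfile".into(),
            "-Command".into(),
            command.into(),
        ],
    }
}
```

Keeping the override on a per-test config value, rather than in process-wide state such as an environment variable, is one way to let tests that pin different shells run in parallel without interfering; whether the actual change does this is not stated in the PR.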
dylan-hurd-oai approved these changes on Mar 17, 2026
What is flaky
The Windows shell-driven integration tests in `codex-rs/core` were intermittently unstable, especially:

- `apply_patch_cli_can_use_shell_command_output_as_patch_input`
- `websocket_test_codex_shell_chain`
- `websocket_v2_test_codex_shell_chain`

Why it was flaky
These tests exercised real shell-tool flows through whichever shell Codex selected on Windows, and the `apply_patch` test also nested a PowerShell read inside `cmd /c`. That setup had multiple independent sources of nondeterminism:

- `cmd.exe /c powershell.exe -Command "..."` is quoting-sensitive; on CI, the read command could end up wrapped as a literal string instead of being executed.
- The `apply_patch` test built a patch directly from shell stdout, so any quoting artifact or progress noise corrupted the patch input.

So the failures were driven by variance in shell startup and output shape, not by the `apply_patch` or websocket logic themselves.

How this PR fixes it
- Add a test-only `user_shell_override` path so Windows integration tests can pin `cmd.exe` explicitly.
- Use that override in the flaky tests, including the `apply_patch` harness.
- Switch the nested PowerShell read in `apply_patch_cli_can_use_shell_command_output_as_patch_input` to a UTF-8 PowerShell `-EncodedCommand` script.
- Invoke PowerShell with `-NonInteractive`, set `$ProgressPreference = 'SilentlyContinue'`, and read the file with `[System.IO.File]::ReadAllText(...)`.

Why this fix fixes the flakiness
The outer harness now runs under a deterministic shell, and the inner PowerShell read no longer depends on fragile `cmd` quoting or on progress output happening to stay quiet. The shell tool returns only the file contents, so patch construction and websocket assertions depend on stable test inputs instead of on runner-specific shell behavior.
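Why an encoded command sidesteps `cmd` quoting: `powershell.exe -EncodedCommand` takes a Base64 string of the script's UTF-16LE bytes, so the payload contains only `A-Z`, `a-z`, `0-9`, `+`, `/`, and `=`, with no quotes or `&`, `|`, `^`, `%` for `cmd /c` to reinterpret. The Rust sketch below builds such an argument list under stated assumptions: the function names are hypothetical, the script body is an illustrative guess at the elided `ReadAllText(...)` call (an explicit UTF-8 encoding plus `[Console]::Out.Write` to avoid a trailing newline), `-NoProfile` is an added assumption, and a hand-rolled Base64 encoder stands in for whatever the real harness uses.

```rust
/// Standard Base64 alphabet (RFC 4648).
const B64: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Minimal padded Base64 encoder (a real harness would likely use a crate).
pub fn base64_encode(bytes: &[u8]) -> String {
    let mut out = String::with_capacity((bytes.len() + 2) / 3 * 4);
    for chunk in bytes.chunks(3) {
        // Pack up to three bytes into a 24-bit group, zero-filling missing bytes.
        let b0 = u32::from(chunk[0]);
        let b1 = u32::from(chunk.get(1).copied().unwrap_or(0));
        let b2 = u32::from(chunk.get(2).copied().unwrap_or(0));
        let group = (b0 << 16) | (b1 << 8) | b2;
        // Emit four 6-bit indices; pad with '=' where input bytes were absent.
        out.push(B64[((group >> 18) & 63) as usize] as char);
        out.push(B64[((group >> 12) & 63) as usize] as char);
        out.push(if chunk.len() > 1 { B64[((group >> 6) & 63) as usize] as char } else { '=' });
        out.push(if chunk.len() > 2 { B64[(group & 63) as usize] as char } else { '=' });
    }
    out
}

/// `-EncodedCommand` expects Base64 over the script's UTF-16LE bytes.
pub fn encode_powershell_command(script: &str) -> String {
    let utf16le: Vec<u8> = script.encode_utf16().flat_map(u16::to_le_bytes).collect();
    base64_encode(&utf16le)
}

/// Hypothetical script: silence progress records, then write the file text to stdout as-is.
pub fn build_read_script(path: &str) -> String {
    // Escape single quotes for a PowerShell single-quoted string literal.
    let quoted = path.replace('\'', "''");
    format!(
        "$ProgressPreference = 'SilentlyContinue'; \
         [Console]::Out.Write([System.IO.File]::ReadAllText('{quoted}', [System.Text.Encoding]::UTF8))"
    )
}

/// Hypothetical argument vector for the nested, non-interactive read.
pub fn read_file_argv(path: &str) -> Vec<String> {
    vec![
        "powershell.exe".to_string(),
        "-NoProfile".to_string(),
        "-NonInteractive".to_string(),
        "-EncodedCommand".to_string(),
        encode_powershell_command(&build_read_script(path)),
    ]
}
```

Because the path is embedded inside the encoded script rather than on the outer command line, only PowerShell's own single-quote escaping applies to it; the outer `cmd` layer sees just an opaque Base64 token.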