Skip to content

Stabilize Windows cmd-based shell test harnesses#14958

Merged
aibrahim-oai merged 3 commits intomainfrom
codex/windows-apply-patch-harness
Mar 17, 2026
Merged

Stabilize Windows cmd-based shell test harnesses#14958
aibrahim-oai merged 3 commits intomainfrom
codex/windows-apply-patch-harness

Conversation

@aibrahim-oai
Copy link
Copy Markdown
Collaborator

What is flaky

The Windows shell-driven integration tests in codex-rs/core were intermittently unstable, especially:

  • apply_patch_cli_can_use_shell_command_output_as_patch_input
  • websocket_test_codex_shell_chain
  • websocket_v2_test_codex_shell_chain

Why it was flaky

These tests were exercising real shell-tool flows through whichever shell Codex selected on Windows, and the apply_patch test also nested a PowerShell read inside cmd /c.

There were multiple independent sources of nondeterminism in that setup:

  • The test harness depended on the model-selected Windows shell instead of pinning the shell it actually meant to exercise.
  • cmd.exe /c powershell.exe -Command "..." is quoting-sensitive; on CI that could leave the read command wrapped as a literal string instead of executing it.
  • Even after getting the quoting right, PowerShell could emit CLIXML progress records like module-initialization output onto stdout.
  • The apply_patch test was building a patch directly from shell stdout, so any quoting artifact or progress noise corrupted the patch input.

So the failures were driven by shell startup and output-shape variance, not by the apply_patch or websocket logic themselves.

How this PR fixes it

  • Add a test-only user_shell_override path so Windows integration tests can pin cmd.exe explicitly.
  • Use that override in the websocket shell-chain tests and in the apply_patch harness.
  • Change the nested Windows file read in apply_patch_cli_can_use_shell_command_output_as_patch_input to a UTF-8 PowerShell -EncodedCommand script.
  • Run that nested PowerShell process with -NonInteractive, set $ProgressPreference = 'SilentlyContinue', and read the file with [System.IO.File]::ReadAllText(...).

Why this fix fixes the flakiness

The outer harness now runs under a deterministic shell, and the inner PowerShell read no longer depends on fragile cmd quoting or on progress output staying quiet by accident. The shell tool returns only the file contents, so patch construction and websocket assertions depend on stable test inputs instead of on runner-specific shell behavior.

Add a test-only shell override so Windows integration tests can pin cmd.exe, then make the nested PowerShell read in apply_patch_cli deterministic with an encoded non-interactive UTF-8 ReadAllText path that suppresses CLIXML progress noise.

Co-authored-by: Codex <noreply@openai.com>
@aibrahim-oai aibrahim-oai enabled auto-merge (squash) March 17, 2026 19:58
@aibrahim-oai aibrahim-oai merged commit b023886 into main Mar 17, 2026
33 checks passed
@aibrahim-oai aibrahim-oai deleted the codex/windows-apply-patch-harness branch March 17, 2026 20:21
@github-actions github-actions bot locked and limited conversation to collaborators Mar 17, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants