Stabilize approval matrix write-file command#14968
Merged
aibrahim-oai merged 1 commit intomainfrom Mar 17, 2026
Merged
Conversation
Co-authored-by: Codex <noreply@openai.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What is flaky
The approval-matrix
WriteFilescenario is flaky. It sometimes fails in CI even though the approval logic is unchanged, because the test delegates the file write and readback to shell parsing instead of deterministic file I/O.Why it was flaky
The test generated a command shaped like
printf ... > file && cat file. That means the scenario depended on shell quoting, redirection, newline handling, and encoding behavior in addition to the approval system it was actually trying to validate. If the shell interpreted the payload differently, the test would report an approval failure even though the product logic was fine.That also made failures hard to diagnose, because the test did not log the exact generated command or the parsed result payload.
How this PR fixes it
This PR replaces the shell-redirection path with a deterministic
python3 -cscript that writes the file withPath.write_text(..., encoding='utf-8')and then reads it back with the same UTF-8 path. It also logs the generated command and the resulting exit code/stdout for the approval scenario so any future failure is directly attributable.Why this fix fixes the flakiness
The scenario no longer depends on shell parsing and redirection semantics. The file contents are produced and read through explicit UTF-8 file I/O, so the approval test is measuring approval behavior instead of shell behavior. The added diagnostics mean a future failure will show the exact command/result pair instead of looking like a generic intermittent mismatch.