[CI] Fix 4-GPU test timeout by using 3 partitions #14287
Merged
Summary
Fixes the unit-test-backend-4-gpu timeout issue introduced by #14222.
PR #14222 moved test_piecewise_cuda_graph.py (estimated 1200s) to the per-commit-4-gpu suite, which caused the LPT partition algorithm to create an unbalanced distribution with only 2 partitions.
This PR increases the number of partitions from 2 to 3.
All partitions now fit within the 20-minute timeout.
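For readers unfamiliar with LPT (longest processing time first), here is a minimal sketch of the greedy heuristic and of why adding a partition lowers the heaviest load. It is not the repository's actual partitioning code. The function name lpt_partition and every test name and duration except the 1200s estimate for test_piecewise_cuda_graph.py are hypothetical placeholders, not the real suite contents.

```python
import heapq

TIMEOUT_SECS = 20 * 60  # the 20-minute job timeout mentioned in the summary


def lpt_partition(durations, num_partitions):
    """Greedy LPT: take tests longest first and give each one to the
    partition with the smallest accumulated load so far."""
    # Min-heap of (current_load, partition_index).
    heap = [(0, i) for i in range(num_partitions)]
    heapq.heapify(heap)
    partitions = [[] for _ in range(num_partitions)]

    # Sort by descending duration; break ties by name for determinism.
    for name, secs in sorted(durations.items(), key=lambda kv: (-kv[1], kv[0])):
        load, idx = heapq.heappop(heap)
        partitions[idx].append(name)
        heapq.heappush(heap, (load + secs, idx))

    loads = [sum(durations[n] for n in part) for part in partitions]
    return partitions, loads


# Only the first entry comes from the PR description; the rest are
# hypothetical values chosen purely for illustration.
example_durations = {
    "test_piecewise_cuda_graph.py": 1200,
    "test_hypothetical_a.py": 600,
    "test_hypothetical_b.py": 500,
    "test_hypothetical_c.py": 400,
    "test_hypothetical_d.py": 300,
}

for n in (2, 3):
    parts, loads = lpt_partition(example_durations, n)
    status = "fits" if max(loads) <= TIMEOUT_SECS else "exceeds timeout"
    print(f"{n} partitions: loads={loads} -> {status}")
```

With these placeholder numbers, one long test dominates: with 2 partitions the remaining tests must pile onto partitions that are already heavy, while a third partition gives the greedy step somewhere lighter to place them.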
Example failure
https://github.com/sgl-project/sglang/actions/runs/19845982270/job/56878342316?pr=14253
Test plan