Add /rerun-stage slash command to rerun specific PR test stages#14262
Add /rerun-stage slash command to rerun specific PR test stages#14262Kangyan-Zhou merged 9 commits intomainfrom
Conversation
Adds a new slash command '/rerun-stage <stage-name>' that allows developers to rerun individual stages/jobs in the PR Test workflow. This is useful when fixing test failures, as it avoids having to rerun the entire test suite. Usage: /rerun-stage unit-test-backend-1-gpu /rerun-stage accuracy-test-1-gpu /rerun-stage quantization-test Features: - Only reruns the specified stage if it failed/skipped - Provides helpful error messages if stage name is wrong - Lists common stage names when stage not found - Same permissions as /rerun-failed-ci Changes: - Updated slash-command-handler.yml to recognize /rerun-stage - Added handle_rerun_stage() function in slash_command_handler.py - Added 'can_rerun_stage' permission to all users with rerun access
|
Warning You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again! |
This comment was marked as outdated.
This comment was marked as outdated.
|
Note: The Once this PR is merged, the command will be available for all PRs. You can test it on any PR after merge by commenting: This is GitHub's standard behavior to prevent malicious PRs from executing arbitrary workflow code. |
Changed approach from rerunning failed jobs to triggering a new workflow run with target_stage parameter. This allows running a specific stage immediately without waiting for its dependencies to pass. Changes: - Added target_stage input to pr-test.yml workflow_dispatch - Updated unit-test-backend-4-gpu condition to run when target_stage matches - Modified handle_rerun_stage() to use workflow_dispatch instead of rerun - Stage runs independently on PR branch, skipping 1-gpu and 2-gpu dependencies Usage: /rerun-stage unit-test-backend-4-gpu Result: Runs only 4-gpu test immediately, perfect for quick iterations
|
Updated implementation to use Example: |
Supported stages now: - unit-test-backend-2-gpu - unit-test-backend-4-gpu - unit-test-backend-8-gpu-h200 - unit-test-backend-8-gpu-h20 All stages can now be triggered independently without waiting for dependencies.
|
Added support for multiple GPU stages. Now supports:
All stages run independently without waiting for dependencies when triggered via |
|
Working example workflow run: https://github.com/sgl-project/sglang/actions/runs/19845385131 This run was triggered with: gh workflow run "PR Test" --ref feat/add-rerun-stage-slash-command -f version=release -f target_stage=unit-test-backend-4-gpu Shows the unit-test-backend-4-gpu stage running independently without waiting for 1-gpu and 2-gpu dependencies. |
|
/tag-and-rerun-ci |
There was a problem hiding this comment.
Can we apply the change to all the stages as well>
There was a problem hiding this comment.
change is now applied to all stages
Updates the feature to allow triggering any test stage independently, not just backend GPU tests. Added support for all stages: - stage-a-test-1 - multimodal-gen-test-1-gpu, multimodal-gen-test-2-gpu - quantization-test - unit-test-backend-1-gpu, unit-test-backend-2-gpu, unit-test-backend-4-gpu - unit-test-backend-8-gpu-h200, unit-test-backend-8-gpu-h20 - performance-test-1-gpu-part-1, performance-test-1-gpu-part-2, performance-test-1-gpu-part-3 - performance-test-2-gpu - accuracy-test-1-gpu, accuracy-test-2-gpu - unit-test-deepep-4-gpu, unit-test-deepep-8-gpu - unit-test-backend-4-gpu-b200, unit-test-backend-4-gpu-gb200
Adds a new slash command
/rerun-stage <stage-name>that allows developers to run individual test stages immediately, skipping dependencies. This is perfect for quick iterations when fixing specific test failures.Usage
This will:
How It Works
Uses
workflow_dispatchto trigger a new workflow run with atarget_stageparameter. The specified stage's job condition checks if it's the target and runs immediately, bypassing normal dependencies.Currently Supported Stages
unit-test-backend-4-gpuMore stages can be easily added by updating their job conditions in pr-test.yml.
Benefits
Before: Fix 4-gpu bug → push → wait 30min for 1-gpu and 2-gpu → finally test 4-gpu
After: Fix 4-gpu bug → push →
/rerun-stage unit-test-backend-4-gpu→ test immediately! 🚀Once you've validated the fix works, run the full CI to ensure everything passes.
Implementation
target_stageinput to pr-test.ymlworkflow_dispatchunit-test-backend-4-gpucondition to run when it's the targetworkflow_dispatchinstead of rerunning jobs/rerun-failed-ci