Add /rerun-stage slash command to rerun specific PR test stages #14262

Merged
Kangyan-Zhou merged 9 commits into main from feat/add-rerun-stage-slash-command
Dec 2, 2025

@alisonshao alisonshao commented Dec 2, 2025

Adds a new slash command /rerun-stage <stage-name> that allows developers to run individual test stages immediately, skipping dependencies. This is perfect for quick iterations when fixing specific test failures.

Usage

/rerun-stage unit-test-backend-4-gpu

This will:

  • ✅ Run only the 4-gpu test stage
  • ✅ Skip waiting for 1-gpu and 2-gpu tests
  • ✅ Run on your PR branch immediately
  • ✅ Perfect for quick iteration cycles

How It Works

Uses workflow_dispatch to trigger a new workflow run with a target_stage parameter. The specified stage's job condition checks if it's the target and runs immediately, bypassing normal dependencies.
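
The condition described here can be modeled in a few lines. This is a minimal sketch in Python, not the actual expression in pr-test.yml: the function name `should_run_stage` and the `dependencies_passed` flag are hypothetical, and it assumes that a targeted run skips every stage other than the target.

```python
from typing import Optional


def should_run_stage(
    job_name: str,
    target_stage: Optional[str],
    dependencies_passed: bool,
) -> bool:
    """Model of the per-job condition (hypothetical, for illustration only)."""
    if target_stage:
        # Targeted run: only the requested stage runs, and it does not
        # wait for its upstream stages.
        return job_name == target_stage
    # Normal run: the stage runs only once its dependencies have passed.
    return dependencies_passed


# Targeted run: the 4-gpu stage runs even though 1-gpu and 2-gpu have not finished.
print(should_run_stage("unit-test-backend-4-gpu", "unit-test-backend-4-gpu", False))
# Targeted run: any other stage is skipped.
print(should_run_stage("unit-test-backend-2-gpu", "unit-test-backend-4-gpu", True))
```

In the workflow itself this logic would live in each job's `if:` condition, comparing the `target_stage` input against the job's own name.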

Currently Supported Stages

  • unit-test-backend-4-gpu

More stages can be easily added by updating their job conditions in pr-test.yml.

Benefits

Before: Fix 4-gpu bug → push → wait 30min for 1-gpu and 2-gpu → finally test 4-gpu
After: Fix 4-gpu bug → push → /rerun-stage unit-test-backend-4-gpu → test immediately! 🚀

Once you've validated the fix works, run the full CI to ensure everything passes.

Implementation

  • Added target_stage input to pr-test.yml workflow_dispatch
  • Updated unit-test-backend-4-gpu condition to run when it's the target
  • Modified slash command handler to trigger workflow_dispatch instead of rerunning jobs
  • Same permissions as /rerun-failed-ci
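
The handler change in the list above might look roughly like the sketch below: parse the comment, then build a request for GitHub's "create a workflow dispatch event" REST endpoint. The helper names `parse_rerun_stage` and `build_dispatch_request` are hypothetical and are not taken from slash_command_handler.py. The sketch assumes `target_stage` is the only input passed, and it only constructs the request without sending it.

```python
import json
from typing import Optional, Tuple


def parse_rerun_stage(comment_body: str) -> Optional[str]:
    """Return the stage name from '/rerun-stage <stage-name>', or None."""
    parts = comment_body.strip().split()
    if len(parts) == 2 and parts[0] == "/rerun-stage":
        return parts[1]
    return None


def build_dispatch_request(
    owner: str, repo: str, workflow_file: str, ref: str, stage: str
) -> Tuple[str, str]:
    """Build the URL and JSON body for a workflow_dispatch POST request."""
    url = (
        f"https://api.github.com/repos/{owner}/{repo}"
        f"/actions/workflows/{workflow_file}/dispatches"
    )
    # 'ref' is the branch to run on; 'inputs' carries the workflow_dispatch inputs.
    body = json.dumps({"ref": ref, "inputs": {"target_stage": stage}})
    return url, body


stage = parse_rerun_stage("/rerun-stage unit-test-backend-4-gpu")
if stage is not None:
    url, body = build_dispatch_request(
        "sgl-project", "sglang", "pr-test.yml",
        "feat/add-rerun-stage-slash-command", stage,
    )
    print(url)
    print(body)
```

A real handler would also need to send the request with an authenticated token and check the commenter's permissions before dispatching.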

Adds a new slash command '/rerun-stage <stage-name>' that allows developers
to rerun individual stages/jobs in the PR Test workflow. This is useful when
fixing test failures, as it avoids having to rerun the entire test suite.

Usage:
  /rerun-stage unit-test-backend-1-gpu
  /rerun-stage accuracy-test-1-gpu
  /rerun-stage quantization-test

Features:
- Only reruns the specified stage if it failed/skipped
- Provides helpful error messages if stage name is wrong
- Lists common stage names when stage not found
- Same permissions as /rerun-failed-ci

Changes:
- Updated slash-command-handler.yml to recognize /rerun-stage
- Added handle_rerun_stage() function in slash_command_handler.py
- Added 'can_rerun_stage' permission to all users with rerun access
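
The error-message behavior listed under Features in this commit message could be sketched as follows. This is an illustration under assumptions, not the code in handle_rerun_stage(): the `KNOWN_STAGES` list holds only the three names from the usage lines above, `check_stage` is a hypothetical helper, and using `difflib` for suggestions is a technique chosen for the example.

```python
import difflib
from typing import List, Tuple

# Hypothetical list, limited to the stage names shown in the usage examples.
KNOWN_STAGES = [
    "unit-test-backend-1-gpu",
    "accuracy-test-1-gpu",
    "quantization-test",
]


def check_stage(stage: str, known: List[str] = KNOWN_STAGES) -> Tuple[bool, str]:
    """Validate a stage name and return (ok, message for the PR comment)."""
    if stage in known:
        return True, f"Rerunning stage '{stage}'."
    # Prefer near matches; fall back to listing every known stage.
    suggestions = difflib.get_close_matches(stage, known, n=3) or known
    return False, f"Stage '{stage}' not found. Try one of: {', '.join(suggestions)}"


ok, message = check_stage("quantization-tset")
print(ok, message)
```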

@alisonshao
Collaborator Author

Note: The /rerun-stage command won't work on this PR yet because issue_comment workflows always run from the default branch (main) for security reasons.

Once this PR is merged, the command will be available for all PRs. You can test it on any PR after merge by commenting:

/rerun-stage unit-test-backend-4-gpu

This is GitHub's standard behavior to prevent malicious PRs from executing arbitrary workflow code.

Changed approach from rerunning failed jobs to triggering a new workflow
run with target_stage parameter. This allows running a specific stage
immediately without waiting for its dependencies to pass.

Changes:
- Added target_stage input to pr-test.yml workflow_dispatch
- Updated unit-test-backend-4-gpu condition to run when target_stage matches
- Modified handle_rerun_stage() to use workflow_dispatch instead of rerun
- Stage runs independently on PR branch, skipping 1-gpu and 2-gpu dependencies

Usage: /rerun-stage unit-test-backend-4-gpu
Result: Runs only 4-gpu test immediately, perfect for quick iterations
@alisonshao
Collaborator Author

Updated implementation to use workflow_dispatch with target_stage parameter. The specified stage now runs immediately without waiting for dependencies.

Example: /rerun-stage unit-test-backend-4-gpu triggers only the 4-gpu test, skipping 1-gpu and 2-gpu.

Supported stages now:
- unit-test-backend-2-gpu
- unit-test-backend-4-gpu
- unit-test-backend-8-gpu-h200
- unit-test-backend-8-gpu-h20

All stages can now be triggered independently without waiting for dependencies.
@alisonshao
Collaborator Author

Added support for multiple GPU stages. Now supports:

  • unit-test-backend-2-gpu
  • unit-test-backend-4-gpu
  • unit-test-backend-8-gpu-h200
  • unit-test-backend-8-gpu-h20

All stages run independently without waiting for dependencies when triggered via /rerun-stage.

@alisonshao
Collaborator Author

Working example workflow run:

https://github.com/sgl-project/sglang/actions/runs/19845385131

This run was triggered with: gh workflow run "PR Test" --ref feat/add-rerun-stage-slash-command -f version=release -f target_stage=unit-test-backend-4-gpu

Shows the unit-test-backend-4-gpu stage running independently without waiting for 1-gpu and 2-gpu dependencies.
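
For scripting the same manual trigger, the command above can be assembled from Python. This is a small sketch: `build_gh_command` is a hypothetical helper, the flags are the ones shown in the command above, and the example prints the command instead of executing it.

```python
import shlex
from typing import List


def build_gh_command(ref: str, target_stage: str, version: str = "release") -> List[str]:
    """Build the argv list for 'gh workflow run' with the inputs used above."""
    return [
        "gh", "workflow", "run", "PR Test",
        "--ref", ref,
        "-f", f"version={version}",
        "-f", f"target_stage={target_stage}",
    ]


cmd = build_gh_command("feat/add-rerun-stage-slash-command", "unit-test-backend-4-gpu")
# shlex.join quotes arguments that contain spaces, such as the workflow name.
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would execute it, assuming the `gh` CLI is installed and authenticated.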

@alisonshao
Collaborator Author

/tag-and-rerun-ci

@github-actions github-actions bot added the run-ci label Dec 2, 2025
Collaborator

Can we apply the change to all the stages as well?

Collaborator Author

The change is now applied to all stages.

alisonshao and others added 4 commits December 1, 2025 19:44
Updates the feature to allow triggering any test stage independently, not just backend GPU tests.

Added support for all stages:
- stage-a-test-1
- multimodal-gen-test-1-gpu, multimodal-gen-test-2-gpu
- quantization-test
- unit-test-backend-1-gpu, unit-test-backend-2-gpu, unit-test-backend-4-gpu
- unit-test-backend-8-gpu-h200, unit-test-backend-8-gpu-h20
- performance-test-1-gpu-part-1, performance-test-1-gpu-part-2, performance-test-1-gpu-part-3
- performance-test-2-gpu
- accuracy-test-1-gpu, accuracy-test-2-gpu
- unit-test-deepep-4-gpu, unit-test-deepep-8-gpu
- unit-test-backend-4-gpu-b200, unit-test-backend-4-gpu-gb200
@Kangyan-Zhou Kangyan-Zhou merged commit 084b06e into main Dec 2, 2025
63 of 75 checks passed
@Kangyan-Zhou Kangyan-Zhou deleted the feat/add-rerun-stage-slash-command branch December 2, 2025 22:23
yingluosanqian pushed a commit to yingluosanqian/sglang that referenced this pull request Dec 4, 2025
tonyluj pushed a commit to openanolis/sglang that referenced this pull request Dec 5, 2025