[Piecewise] Use same global graph memory pool as the main cuda graph … #14044

byjiang1996 · 2025-11-27T06:11:57Z

…runner's memory pool

Motivation

As titled

Accuracy Tests and Benchmarking and Profiling

from sglang.srt.entrypoints.http_server import launch_server
from sglang.srt.server_args import prepare_server_args

if __name__ == "__main__":
    # Simulate CLI arguments (excluding the script name)
    args = [
        "--model-path",
        "Qwen/Qwen3-8B",
        "--attention-backend",
        "fa3",
        # "--disable-cuda-graph",
        "--enable-piecewise-cuda-graph",
        "--tp",
        "2"
    ]
    server_args = prepare_server_args(args)
    launch_server(server_args)

When both decode cuda graph and piecewise cuda graph are enabled

python3 benchmark/gsm8k/bench_sglang.py 

Accuracy: 0.955
Invalid: 0.000
Latency: 5.809 s
Output throughput: 4113.320 token/s

When only piecewise cuda graph is enabled but decode cuda graph is disabled

python3 benchmark/gsm8k/bench_sglang.py 

Accuracy: 0.955
Invalid: 0.000
Latency: 15.619 s
Output throughput: 1538.560 token/s
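For a quick read of the two runs above, the relative slowdown can be computed directly from the reported figures. This is a minimal sketch: the numbers are copied from the benchmark output in this description, the variable names are illustrative only, and each configuration was measured once, so the ratios are rough.

```python
# Figures copied from the two gsm8k runs reported above.
both_graphs = {"latency_s": 5.809, "throughput_tok_s": 4113.320}
piecewise_only = {"latency_s": 15.619, "throughput_tok_s": 1538.560}

# How much longer the run takes when the decode cuda graph is disabled.
latency_ratio = piecewise_only["latency_s"] / both_graphs["latency_s"]

# How much more output throughput the run with both graphs achieves.
throughput_ratio = (
    both_graphs["throughput_tok_s"] / piecewise_only["throughput_tok_s"]
)

print(f"latency ratio: {latency_ratio:.2f}x")
print(f"throughput ratio: {throughput_ratio:.2f}x")
```

Accuracy is 0.955 in both runs, so the difference between the two configurations shows up only in latency and throughput.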

Checklist

Format your code according to the Format code with pre-commit.
Add unit tests according to the Run and add unit tests.
Update documentation according to Write documentations.
Provide accuracy and speed benchmark results according to Test the accuracy and Benchmark the speed.
Follow the SGLang code style guidance.
Work with maintainers to merge your PR. See the PR Merge Process




ByronHsu · 2025-12-01T06:17:12Z

/tag-and-rerun-ci

…a graph" (#14044)

This reverts commit 0f8e539.

The original PR introduced two issues:

1. A bug in set_graph_pool_id where `if _graph_pool_id is not None` prevented the variable from ever being set (since it's initialized to None)
2. Sharing memory pool between piecewise and main cuda graph runners caused uninitialized memory issues on H100, resulting in TestQwen3NextPiecewiseCudaGraph failing with accuracy=0.0

This revert restores separate memory pools for piecewise cuda graph runner.


[Piecewise] Use same global graph memory pool as the main cuda graph …

3762fb1

…runner's memory pool

byjiang1996 requested review from ch-wan, hebiao064, merrymercy and yizhang2077 as code owners November 27, 2025 06:11

hebiao064 and others added 3 commits November 28, 2025 20:53

Merge branch 'main' into piecewisemempool

1a8a1f5

add moe_wna16_marlin_gemm_v2

6f48bbd

Revert "add moe_wna16_marlin_gemm_v2"

eeea208

This reverts commit 6f48bbd.

hebiao064 added the run-ci label Nov 29, 2025

Merge branch 'main' into piecewisemempool

a0c71bf

merrymercy force-pushed the main branch from eeea208 to 0fe74af November 29, 2025 07:06

Merge branch 'main' into piecewisemempool

af4c5a9

ByronHsu merged commit 0f8e539 into sgl-project:main Dec 1, 2025
19 of 30 checks passed

alisonshao mentioned this pull request Dec 2, 2025

Revert PR #14044: Restore separate memory pool for piecewise CUDA graph #14278

Merged

3 tasks

Fridge003 pushed a commit that referenced this pull request Dec 2, 2025

Revert PR #14044: Restore separate memory pool for piecewise CUDA gra…

0141ca3

…ph (#14278)

harvenstar pushed a commit to harvenstar/sglang that referenced this pull request Dec 4, 2025

[Piecewise] Use same global graph memory pool as the main cuda graph … (

780450b

sgl-project#14044) Co-authored-by: Stefan He <[email protected]> Co-authored-by: BBuf <[email protected]>

harvenstar pushed a commit to harvenstar/sglang that referenced this pull request Dec 4, 2025

Revert PR sgl-project#14044: Restore separate memory pool for piecewi…

d8005a7

…se CUDA graph (sgl-project#14278)

yingluosanqian pushed a commit to yingluosanqian/sglang that referenced this pull request Dec 4, 2025

Revert PR sgl-project#14044: Restore separate memory pool for piecewi…

98d1b19

…se CUDA graph (sgl-project#14278)

tonyluj pushed a commit to openanolis/sglang that referenced this pull request Dec 5, 2025

[Piecewise] Use same global graph memory pool as the main cuda graph … (

41c54ab

sgl-project#14044) Co-authored-by: Stefan He <[email protected]> Co-authored-by: BBuf <[email protected]>

tonyluj pushed a commit to openanolis/sglang that referenced this pull request Dec 5, 2025

Revert PR sgl-project#14044: Restore separate memory pool for piecewi…

cf1da38

…se CUDA graph (sgl-project#14278)

yuchengz816-bot pushed a commit to yuchengz816-bot/sglang that referenced this pull request Dec 8, 2025

Revert PR sgl-project#14044: Restore separate memory pool for piecewi…

919b305

…se CUDA graph (sgl-project#14278)

4 participants

[Piecewise] Use same global graph memory pool as the main cuda graph … #14044

[Piecewise] Use same global graph memory pool as the main cuda graph … #14044

Uh oh!

Conversation

byjiang1996 commented Nov 27, 2025

Motivation

Accuracy Tests and Benchmarking and Profiling

When both decode cuda graph and piecewise cuda graph is enabled

When only piecewise cuda graph is enabled but decode cuda graph is disabled

Checklist

Uh oh!

gemini-code-assist bot commented Nov 27, 2025

Uh oh!

ByronHsu commented Dec 1, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants