@byjiang1996

…runner's memory pool

Motivation

As titled

Accuracy Tests and Benchmarking and Profiling

from sglang.srt.entrypoints.http_server import launch_server
from sglang.srt.server_args import prepare_server_args

if __name__ == "__main__":
    # Simulate CLI arguments (excluding the script name)
    args = [
        "--model-path",
        "Qwen/Qwen3-8B",
        "--attention-backend",
        "fa3",
        # "--disable-cuda-graph",
        "--enable-piecewise-cuda-graph",
        "--tp",
        "2"
    ]
    server_args = prepare_server_args(args)
    launch_server(server_args)

When both the decode CUDA graph and the piecewise CUDA graph are enabled:

python3 benchmark/gsm8k/bench_sglang.py

Accuracy: 0.955
Invalid: 0.000
Latency: 5.809 s
Output throughput: 4113.320 token/s

When only the piecewise CUDA graph is enabled and the decode CUDA graph is disabled:

python3 benchmark/gsm8k/bench_sglang.py

Accuracy: 0.955
Invalid: 0.000
Latency: 15.619 s
Output throughput: 1538.560 token/s
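For a quick comparison of the two runs, the ratios can be computed directly from the numbers reported above. This is only a sketch: the dictionary keys and variable names are illustrative and not part of the benchmark script.

```python
# Figures copied from the two gsm8k runs reported above.
both_graphs = {"latency_s": 5.809, "throughput_tok_s": 4113.320}
piecewise_only = {"latency_s": 15.619, "throughput_tok_s": 1538.560}

# Ratio of output throughput (higher is better for the first run).
throughput_ratio = both_graphs["throughput_tok_s"] / piecewise_only["throughput_tok_s"]

# Ratio of end-to-end latency (higher means the second run is slower).
latency_ratio = piecewise_only["latency_s"] / both_graphs["latency_s"]

print(f"throughput ratio: {throughput_ratio:.2f}x")
print(f"latency ratio:    {latency_ratio:.2f}x")
```

Accuracy is identical (0.955) in both runs, so in this measurement the difference between the two configurations is in speed only.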

Checklist

@ByronHsu

ByronHsu commented Dec 1, 2025

/tag-and-rerun-ci

@ByronHsu ByronHsu merged commit 0f8e539 into sgl-project:main Dec 1, 2025
19 of 30 checks passed
alisonshao added a commit that referenced this pull request Dec 2, 2025
…a graph" (#14044)

This reverts commit 0f8e539.

The original PR introduced two issues:
1. A bug in set_graph_pool_id where `if _graph_pool_id is not None` prevented
   the variable from ever being set (since it's initialized to None)
2. Sharing memory pool between piecewise and main cuda graph runners caused
   uninitialized memory issues on H100, resulting in TestQwen3NextPiecewiseCudaGraph
   failing with accuracy=0.0

This revert restores separate memory pools for piecewise cuda graph runner.
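
The first issue in the revert message can be illustrated with a minimal sketch. Only the names `set_graph_pool_id` and `_graph_pool_id` come from the commit message; the `_buggy`/`_fixed` suffixes, the `get_graph_pool_id` helper, and the set-once behavior of the corrected version are assumptions made for illustration, not the actual sglang code.

```python
# Hypothetical module-level state, initialized to None as the commit message describes.
_graph_pool_id = None


def set_graph_pool_id_buggy(pool_id):
    """Guard as described in the revert message: the assignment is unreachable."""
    global _graph_pool_id
    # _graph_pool_id starts as None, so this condition is never true
    # and the variable is never assigned.
    if _graph_pool_id is not None:
        _graph_pool_id = pool_id


def set_graph_pool_id_fixed(pool_id):
    """Assumed intent: set the pool id once, on the first call."""
    global _graph_pool_id
    if _graph_pool_id is None:
        _graph_pool_id = pool_id


def get_graph_pool_id():
    # Illustrative accessor, not taken from the source.
    return _graph_pool_id


set_graph_pool_id_buggy("pool-a")
after_buggy = get_graph_pool_id()   # still None: the guard blocked the assignment

set_graph_pool_id_fixed("pool-a")
after_fixed = get_graph_pool_id()   # now "pool-a"

set_graph_pool_id_fixed("pool-b")
after_second = get_graph_pool_id()  # unchanged: only the first call sets the id
```

With the inverted guard, every caller that reads the pool id sees `None`. The second issue (uninitialized memory on H100 when the pool is shared) is independent of this guard and is not modeled here.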
harvenstar pushed a commit to harvenstar/sglang that referenced this pull request Dec 4, 2025
yingluosanqian pushed a commit to yingluosanqian/sglang that referenced this pull request Dec 4, 2025
tonyluj pushed a commit to openanolis/sglang that referenced this pull request Dec 5, 2025
yuchengz816-bot pushed a commit to yuchengz816-bot/sglang that referenced this pull request Dec 8, 2025