
Allow benchmarking each forward pass in e2e systems#4666

Open
fzyzcjy wants to merge 80 commits into sgl-project:main from fzyzcjy:feat/fine_grained_benchmark

Conversation

@fzyzcjy
Collaborator

@fzyzcjy fzyzcjy commented Mar 22, 2025

Motivation

On one hand, bench_one_batch is great, but it is somewhat buggy in complex scenarios (e.g. DeepSeek V3), partly because it bypasses all logic in the real schedulers. On the other hand, bench_serving is great, but it reports only a single e2e latency number, so we cannot see how that number breaks down. Therefore, I made this tiny PR to let bench_one_batch_server report extra numbers for each individual forward pass.

Example command

SGLANG_FINE_GRAINED_BENCHMARK_DIR=/tmp/sglang_fine_grained_benchmark python -m sglang.bench_one_batch_server --model-path deepseek-ai/DeepSeek-V2-Lite --trust-remote-code --tp 2 --dp 2 --enable-dp-attention --enable-deepep-moe --disable-cuda-graph --batch-size 4 16 64 256 --input-len 1024 --output-len 2 --port 5678

Example output

  forward_mode    throughput   latency  batch_size  num_tokens
0       EXTEND  18538.560266  0.110472           2        2048
1       DECODE     21.296126  0.093914           2           2
2       DECODE     28.449750  0.070299           2           2
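The numbers above come down to timing each forward pass and deriving throughput from token count and latency. Here is a minimal sketch of that idea; the function name `benchmark_forward` and the record fields are illustrative assumptions for this comment, not the PR's actual API (which dumps records under `SGLANG_FINE_GRAINED_BENCHMARK_DIR`):

```python
import time

def benchmark_forward(forward_fn, forward_mode, batch_size, num_tokens, records):
    """Hypothetical wrapper: time one forward pass and append one record.

    throughput is derived as num_tokens / latency (tokens per second),
    matching the columns in the example output table above.
    """
    start = time.perf_counter()
    output = forward_fn()  # run the actual forward pass
    latency = time.perf_counter() - start
    records.append({
        "forward_mode": forward_mode,   # e.g. "EXTEND" or "DECODE"
        "throughput": num_tokens / latency,
        "latency": latency,             # seconds for this single pass
        "batch_size": batch_size,
        "num_tokens": num_tokens,
    })
    return output
```

Collecting one such record per scheduler step, then loading them into a DataFrame, would reproduce a table like the example output.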

This PR is based on #4699, so please subtract that PR's code diff when reviewing this one.

Modifications

Checklist

@fzyzcjy
Collaborator Author

fzyzcjy commented Apr 1, 2025

Ping me when this PR is about to be merged - currently I have only resolved the conflicts in #4068, and will port the conflict resolution back here when pinged.
