[NOT FINAL] add wip DSv4 aggregate and disaggregate recipes #85

Merged
ishandhanani merged 21 commits into main from recipes/dsv4-agg-disagg on Apr 28, 2026
Conversation


@ishandhanani ishandhanani commented Apr 26, 2026

These are a work in progress; don't grab them and try to run them out of the box.

Summary

Creates an upstream NVIDIA/srt-slurm branch that contains the already-merged aggregate DeepSeek-V4 recipe work plus the disaggregated recipe work from fork PR #75.

Because #70 has already merged the aggregate recipes into main, this PR diff is intentionally focused on the remaining disaggregated additions:

  • Adds five GB300 DeepSeek-V4-Pro disaggregated recipes under recipes/gb300-fp4/1k1k-dsv4/.
  • Extends the GB300 DSv4 README with disaggregated topology documentation, NIXL state-buffer caveat, XPYD node semantics, and measured throughput table.
  • Keeps the branch in the upstream repo so follow-up review and iteration no longer depends on the fork branch.

This is intended to supersede the fork-based disagg PR #75. The aggregate recipe portion is already in main via #70, so it is present on this branch but does not reappear in the diff.

Validation

  • uv run srtctl dry-run -f recipes/gb300-fp4/1k1k-dsv4/disagg-*.yaml
  • make check

YAMY1234 and others added 3 commits April 24, 2026 18:04
Adds the dynamo + NIXL disaggregated counterpart to the existing
`gb300-fp4/1k1k-dsv4/agg-*` recipes: 1 prefill node + 1 decode node, each
TP=4 on its own GB300 node, MXFP4 MoE kernels, chunked prefill 4096. Same
DSv4-Pro checkpoint and `dsv4-grace-blackwell` container as the agg
recipes; the nginx fan-in container is pulled from Docker Hub via enroot.

`benchmark.type` is `manual` so the recipe brings the disagg server up
and stops there — pair with sa-bench (custom_tokenizer
`sa_bench_tokenizers.sglang_deepseek_v4.SGLangDeepseekV4Tokenizer` + chat
template) once the server is healthy.

README updated with a `Disaggregated` table to keep the existing agg
matrix intact.

Made-with: Cursor
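For orientation, a minimal sketch of what a recipe with this topology might look like. The field names below are illustrative assumptions, not the actual srt-slurm recipe schema; only the topology and settings (TP=4 per role, chunked prefill 4096, manual benchmark, shared checkpoint and container) come from the description above.

```yaml
# Hypothetical sketch only: field names are assumptions, not the real recipe schema.
model: deepseek-v4-pro              # same DSv4-Pro checkpoint as the agg recipes
container: dsv4-grace-blackwell     # shared container image
frontend: dynamo                    # dynamo + NIXL disaggregated serving path
prefill:
  nodes: 1                          # one GB300 node
  tensor_parallel: 4                # TP=4
  chunked_prefill_size: 4096
decode:
  nodes: 1                          # one GB300 node
  tensor_parallel: 4                # TP=4
benchmark:
  type: manual                      # bring the disagg server up and stop; drive it with sa-bench
```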
Builds on the existing 1P1D TP=4 disagg recipe by adding four more
points along the disagg topology curve, all sharing the same dynamo +
NIXL frontend and the `dsv4-grace-blackwell` container:

- disagg-1p1d-dep4-mega-moe.yaml         (2 nodes,  8 GPU; both DEP=4)
- disagg-1p2d-dep4-to-dep8-mega-moe.yaml (3 nodes, 12 GPU; P DEP=4, D DEP=8)
- disagg-2p2d-dep8-mega-moe.yaml         (4 nodes, 16 GPU; both DEP=8)
- disagg-2p2d-tp8-mxfp4.yaml             (4 nodes, 16 GPU; both TP=8, MXFP4)

DEP recipes use TP+DP+DP-attention+DeepEP (mega_moe / DeepGEMM),
mirroring the agg-balanced-tep / agg-max-tpt-tep topology but split
across prefill and decode roles. Multi-node decode recipes intentionally
do NOT set SGLANG_OPT_USE_CUSTOM_ALL_REDUCE_V2 because CAR_V2 is
single-node only and silently corrupts results across nodes.
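As a sketch of the CAR_V2 placement described above (field names assumed for illustration), an asymmetric recipe where prefill is single-node and decode spans two nodes might shape its env like this:

```yaml
# Hypothetical sketch only: field names are assumptions, not the real recipe schema.
prefill:
  env:
    SGLANG_OPT_USE_CUSTOM_ALL_REDUCE_V2: "1"   # single-node side, so CAR_V2 is safe here
decode:
  env: {}   # multi-node side: CAR_V2 deliberately omitted
            # (single-node only; silently corrupts results across nodes)
```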

Also tightens the existing disagg-1p1d-tp4-mxfp4.yaml: switches from
`benchmark.type: manual` to a low-latency sa-bench sweep (conc 4..128)
and adds the same mrr / cgmb / mfs knobs as the new recipes for
reproducibility.
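A sketch of what the sa-bench sweep stanza might look like, with field names assumed for illustration; only the 4..128 concurrency range, the 1k/1k workload, and the tokenizer class are taken from this PR:

```yaml
# Hypothetical sketch only: field names are assumptions, not the real recipe schema.
benchmark:
  type: sa-bench                    # replaces the previous `type: manual`
  isl: 1024                         # 1k/1k workload, matching the recipe directory
  osl: 1024
  concurrency: [4, 8, 16, 32, 64, 128]   # the low-latency "conc 4..128" sweep
  custom_tokenizer: sa_bench_tokenizers.sglang_deepseek_v4.SGLangDeepseekV4Tokenizer
```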

README gains:
- a prominent NIXL state-buffer-fix prerequisite warning (upstream
  sglang PR pending) so reviewers know what container behaviour the
  recipes assume,
- an XPYD = nodes (not instances) clarification,
- a verified-throughput table from sa-bench runs at isl=osl=1024.

Headline: the asymmetric 1P2D DEP4->DEP8 config delivers the highest
per-GPU total token throughput (5,572 TPS/GPU at conc=2048) because at
1k/1k the workload is decode-heavy, so doubling the decode EP domain
(4 -> 8 GPUs) buys far more than scaling prefill.
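(For scale: the 1P2D DEP4->DEP8 point runs on 12 GPUs across 3 nodes, so if that per-GPU number is normalized over all 12 GPUs it corresponds to roughly 5,572 × 12 ≈ 66.9k total tokens/s for the deployment at conc=2048.)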

Recipes are intentionally free of local mounts / debug paths; pick up
the required nixl/conn.py state-buffer-transfer fix via the container
build process until the upstream sglang fix lands.

Made-with: Cursor
@ishandhanani ishandhanani marked this pull request as ready for review April 26, 2026 22:26
@ishandhanani ishandhanani changed the title from "[codex] add DSv4 aggregate and disaggregate recipes branch" to "[wip] add DSv4 aggregate and disaggregate recipes branch" on Apr 26, 2026
ishandhanani and others added 11 commits April 26, 2026 18:17
SGLANG_DEEPEP_NUM_MAX_DISPATCH_TOKENS_PER_RANK=0 only works when
SGLANG_OPT_USE_DEEPGEMM_MEGA_MOE=1 is also set. Without it, the DeepEP
buffer is too small for cuda-graph-max-bs=1024/2048 and graph capture
hits the deep_ep.cpp:1233 assertion.

Add the full mega_moe env block to all three *-mega-moe.yaml,
plus SGLANG_OPT_USE_CUSTOM_ALL_REDUCE_V2=1 only on single-node sides.
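A sketch of the env pairing this commit describes; the values come from the commit message, while the surrounding layout is assumed:

```yaml
# Hypothetical sketch only: layout is an assumption, values are from the commit message.
env:
  SGLANG_OPT_USE_DEEPGEMM_MEGA_MOE: "1"                 # must be set for the override below to work
  SGLANG_DEEPEP_NUM_MAX_DISPATCH_TOKENS_PER_RANK: "0"   # without mega_moe the DeepEP buffer is too
                                                        # small for cuda-graph-max-bs=1024/2048
```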
* Add DSV4 Pro GB300 high-throughput recipe

* fix wideep oom

* optimize perf
@ishandhanani ishandhanani changed the title from "[wip] add DSv4 aggregate and disaggregate recipes branch" to "[NOT FINAL] add wip DSv4 aggregate and disaggregate recipes" on Apr 28, 2026
@ishandhanani ishandhanani merged commit 1d665f8 into main Apr 28, 2026
6 checks passed