
Chore/upstream srt slurm #1211

Draft
cquil11 wants to merge 16 commits into main from chore/upstream-srt-slurm

Conversation

cquil11 (Collaborator) commented Apr 28, 2026

No description provided.

cquil11 and others added 3 commits April 28, 2026 09:50
Recipes referenced from NVIDIA/srt-slurm@sa-submission-q2-2026 are now tracked
under benchmarks/multi_node/srt-slurm-recipes/, mirroring the upstream
`recipes/` layout. The master-yaml plumbing for selecting one is hoisted out
of `prefill.additional-settings: ["CONFIG_FILE=recipes/..."]` into a
first-class `recipe:` field on the search-space entry, validated against
on-disk paths so unknown recipes fail fast at sweep generation. The benchmark
template resolves it to an absolute scratch-copy path passed to launchers as
CONFIG_FILE, so launcher behavior is unchanged otherwise.
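
The before/after shape of the master-yaml change, as a sketch (the entry layout and the recipe filename are illustrative, not taken from the repo):

```yaml
# Before: recipe selected by smuggling CONFIG_FILE through launcher settings.
prefill:
  additional-settings: ["CONFIG_FILE=recipes/example-recipe.yaml"]

# After: first-class field, validated against on-disk paths under
# benchmarks/multi_node/srt-slurm-recipes/ at sweep-generation time,
# so unknown recipes fail fast instead of at launch.
recipe: "recipes/example-recipe.yaml"
```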

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Six launchers each carried a ~22-line copy of the same git-clone, uv-install,
venv-create, srtctl-install sequence. Lift it into clone_and_install_srtctl()
in benchmarks/benchmark_lib.sh, parameterized by SRT_REPO_URL/SRT_BRANCH and
UV_INSTALL_DIR/UV_VENV_DIR env vars so each launcher can keep its workspace-
vs-NFS-vs-default-HOME placement decisions explicit at the call site.
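
A sketch of the shared helper's shape, assuming nothing beyond what the commit states (the uv install and venv commands are illustrative placeholders, not the real body in benchmarks/benchmark_lib.sh):

```shell
# Hedged sketch of clone_and_install_srtctl(): parameterized by the four env
# vars named in the commit, so each launcher keeps its workspace-vs-NFS-vs-
# default-HOME placement decision explicit at the call site.
clone_and_install_srtctl() {
    git clone --branch "$SRT_BRANCH" "$SRT_REPO_URL" srt-slurm
    # install uv into the caller-chosen directory (illustrative command)
    curl -LsSf https://astral.sh/uv/install.sh | env UV_INSTALL_DIR="$UV_INSTALL_DIR" sh
    "$UV_INSTALL_DIR/uv" venv "$UV_VENV_DIR"
    # install srtctl from the fresh checkout into that venv
    "$UV_INSTALL_DIR/uv" pip install --python "$UV_VENV_DIR/bin/python" ./srt-slurm
}
```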

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lift the `echo "$IMAGE" | sed 's/[/:@#]/_/g'` slug used to name squash files
out of 13 launchers and into sanitize_image_filename() in benchmark_lib.sh.
Cluster-specific separator (h100/h200-dgxc-slurm use '+' instead of '_') is
expressed as the second arg, and the nvcr.io/-prefix-strip variant becomes
`sanitize_image_filename "${IMAGE#nvcr.io/}" +` rather than a sed pipeline.
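
A minimal sketch of the helper, built directly from the sed pattern quoted above (exact argument handling in benchmark_lib.sh may differ):

```shell
# Replace every / : @ # in an image reference with a separator (default '_')
# so the result is safe to use as a squash filename.
sanitize_image_filename() {
    local image="$1" sep="${2:-_}"
    echo "$image" | sed "s|[/:@#]|$sep|g"
}
```

So `sanitize_image_filename nvcr.io/nvidia/foo:1.0` yields `nvcr.io_nvidia_foo_1.0`, and passing `+` as the second arg gives the h100/h200-dgxc-slurm variant.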

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions (Contributor)

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If they are not, please create a PR first before we can merge yours into the master branch. Let's ensure the documentation is first class so that the entire ML community can benefit from your hard work. Thank you!

PR authors are responsible for ensuring that, after merging, all GitHub Actions jobs fully pass. Much of the time, failures are just flakes, and simply re-running the failed jobs will fix them. If re-running failed jobs is attempted, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

cquil11 and others added 13 commits April 28, 2026 10:54
Restructure benchmarks/multi_node/srt-slurm-recipes/ from the upstream's
heterogeneous layout into a uniform tree:

  <model>/<framework>/<hw>-<precision>/<isl>/<osl>/<agg|disagg>/<stp|mtp>/<recipe>.yaml

so a sweep contributor can navigate to dsr1/trtllm/b200-fp4/1k/1k/disagg/mtp/
and immediately see every recipe that fits that cell. The 3 sglang
multi-override files that span both stp and mtp are parked one level
shallower (no trailing stp|mtp/ segment), since their override sections
select the spec mode.

365 files moved, 388 active + 5 commented recipe references rewritten,
schema validation + tests still green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per request, drop the awkward `1k/1k/` two-segment intermediate in the
recipe tree in favor of `1k1k/`. New shape:

  <model>/<framework>/<hw>-<precision>/<isl><osl>/<agg|disagg>/<stp|mtp>/<recipe>.yaml

370 files renamed, 393 recipe references in nvidia-master.yaml rewritten,
schema validation + tests still green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drop the ishandhanani/srt-slurm@sa-submission-q1-2026 fallback in
launch_gb200-nv.sh — every launcher now clones NVIDIA/srt-slurm at the
pinned commit 52e697d (nginx fd-limit fix on origin/main, Apr 2026).
Pinning to a SHA instead of a moving branch keeps benchmark runs
reproducible across upstream churn.

Rename the helper's SRT_BRANCH env var to SRT_REF for accuracy
(it accepts any git ref, not just a branch).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drop env-var override for SRT_REPO_URL / SRT_REF — every benchmark run
must use the same pinned srtctl, no ad-hoc overrides at the call site.
Bumping the pin is now a one-line edit to benchmark_lib.sh.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…of-life)

Stop relying on srt-slurm's bundled `benchmark.type: sa-bench` (which ships
its own copy of bench_serving.py inside the upstream repo) and instead use
`benchmark.type: custom` to run *this* repo's utils/bench_serving against the
already-ready frontend. Avoids dual-maintaining the bench client.

Plumbing:
- benchmarks/multi_node/srt_bench.sh: thin wrapper that mirrors sa-bench's
  per-conc warmup-then-bench loop, writes results to the same
  /logs/sa-bench_isl_<ISL>_osl_<OSL>/results_concurrency_<N>_gpus_<TOT>_ctx_<P>_gen_<D>.json
  layout the launcher result-harvesters already grep, with conc list parsed
  from x-separated env (e.g. "128x256x1024").
- Recipe shape: add `container_mounts: { $INFMAX_WORKSPACE: /infmax-workspace }`
  + replace `benchmark: { type: sa-bench, ... }` with
  `benchmark: { type: custom, command: "bash /infmax-workspace/...", env: {...} }`.

Migrated as proof-of-life:
- dsr1/trtllm/b200-fp4/1k1k mtp ctx1_gen2_dep8_batch64_eplb0_mtp2  (TRT-LLM)
- dsr1/sglang/gb200-fp4/1k1k stp low-latency                       (SGLang)
- dsv4/vllm/gb200-fp4/1k1k stp disagg-gb200-1p1d-dep8-tep8          (vLLM)

Remaining ~360 recipes still use sa-bench; they migrate in a follow-up once
this triplet runs end-to-end on a real cluster.
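
The wrapper's loop shape, sketched from the commit's description (the bench invocation itself is replaced by a placeholder, and the log base directory is a parameter here for illustration; the real script writes under /logs):

```shell
# Parse the x-separated concurrency list (e.g. "128x256x1024") and emit one
# result file per concurrency in the layout the launcher result-harvesters
# already grep.
run_conc_loop() {
    local logbase="$1"
    local outdir="$logbase/sa-bench_isl_${ISL}_osl_${OSL}"
    mkdir -p "$outdir"
    local conc IFS='x'
    for conc in $CONC_LIST; do
        # warmup pass then measured pass would run here (sa-bench-style)
        echo '{}' > "$outdir/results_concurrency_${conc}_gpus_${TOTAL_GPUS}_ctx_${PREFILL_GPUS}_gen_${DECODE_GPUS}.json"
    done
}
```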

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two pieces, one commit:

1. benchmark_lib.sh's run_benchmark_serving() gains optional pass-throughs
   for --tokenizer / --endpoint / --dataset-name / --dataset-path so the
   multi-node srt_bench.sh wrapper can reuse it instead of forking its own
   command-build. (--request-rate stays hardcoded "inf" — no recipe-level
   override.) ~50 lines of duplicated shell deleted from srt_bench.sh.

2. Recipe `benchmark.env` blocks lose every variable that is already
   exported by .github/workflows/benchmark-multinode-tmpl.yml at the
   workflow step level: MODEL, ISL, OSL, CONC_LIST, DISAGG, RANDOM_RANGE_RATIO.
   Those propagate down through srtctl → srun (default --export=ALL) → pyxis
   into the bench container, so srt_bench.sh reads them directly. Recipes
   now only carry per-recipe topology knobs (PREFILL_GPUS / DECODE_GPUS /
   TOTAL_GPUS — used in the result filename) plus the rare overrides.

Tokenizer is hardcoded to /model — srtctl's RuntimeContext.create
unconditionally bind-mounts the local model dir at that path in every
container, so AutoTokenizer.from_pretrained("/model") loads from the same
files the engine is serving. No HF Hub egress, works for HF-id and alias-
only `model:` values alike, no `TOKENIZER_PATH` knob in recipes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…er/--endpoint

Walked back the --dataset-name / --dataset-path additions to
run_benchmark_serving — both default cleanly (random / unset) for every
multi-node throughput sweep we run, so the pass-throughs were dead
weight. srt_bench.sh stops setting DATASET_NAME / DATASET_PATH from env.

Kept --tokenizer (srt_bench points it at /model since --model is the
served-model-name alias, not a HF id) and --endpoint (recipes may need
/v1/chat/completions for chat-template-only request paths).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Same behavior, fewer lines: collapse the two-step suffix split into a
single ${RECIPE#"${RECIPE%%:*}"} parameter expansion. 12 active lines
become 5. No semantic change — verified parsing for plain paths,
:override, and :zip_override_<name>[N] forms.
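
The collapsed split, worked through on a hypothetical value (only the two parameter expansions are from the commit):

```shell
RECIPE="dsr1/example.yaml:override[2]"  # hypothetical reference with a suffix
path="${RECIPE%%:*}"                    # everything before the first ':'
suffix="${RECIPE#"${RECIPE%%:*}"}"      # strip that prefix, leaving ":override[2]"
echo "$path $suffix"
```

For a plain path with no `:`, the prefix is the whole string, so the suffix expands to empty.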

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ression

Upstream commit 52e697d (#108 "fix(nginx): raise file descriptor limit for
nginx workers") prepends `ulimit -n 1048576 &&` to the nginx srun command.
On clusters whose container inherits a sub-1M RLIMIT_NOFILE hard limit
from slurmd/PAM, the bash builtin's setrlimit fails with EPERM (raising
the hard rlimit needs CAP_SYS_RESOURCE in the init user namespace, which
pyxis --container-remap-root does not grant). The `&&` short-circuits and
nginx never starts — caught when re-running dsr1-fp4-gb200-dynamo-sglang.

Pin back to 698590e ("feat(config): cluster-wide default_bash_preamble for
ulimits and the like (#104)"), the immediately prior commit, where nginx
runs without the chained ulimit. Bump forward once upstream softens the
ulimit to `|| true` or makes it opt-in.
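
The failure mode, reproduced in miniature (which branch fires depends on the hard cap the shell inherits, so no fixed output is claimed):

```shell
# The bash builtin calls setrlimit(2); raising the value above the inherited
# hard cap fails without CAP_SYS_RESOURCE, and '&&' then short-circuits the
# command that should have started nginx. Subshell keeps our limits intact.
if (ulimit -n 1048576) 2>/dev/null; then
    echo "hard cap allows 1048576; the chained command would run"
else
    echo "raise refused: hard cap is $(ulimit -Hn); nginx would never start"
fi
```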

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the temporary rollback (698590e) with the upstream fix branch.
425b486 is the tip of NVIDIA/srt-slurm's `ishan-rework-nginx`, which makes
the nginx ulimit + nginx.conf `worker_rlimit_nofile` directive opt-in via
a new `frontend.nginx_raise_ulimit` field (default false). Without us
opting in, nginx runs without the EPERM-prone bump that #108 introduced.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…letions

Upstream sa-bench used `--backend dynamo --endpoint /v1/completions`, but
this repo's benchmark_serving.py doesn't have a `dynamo` backend choice
(it has tgi/vllm/lmdeploy/deepspeed-mii/openai/openai-chat/tensorrt-llm/
scalellm/sglang). The dynamo frontend exposes a generic OpenAI-compatible
API regardless of the underlying engine, so `openai` is the right canonical
default. Recipes that need /v1/chat/completions can override via ENDPOINT.

Also drop the unconditional `--endpoint /v1/completions` — bench_serving.py
already defaults to that, and we now only pass --endpoint when ENDPOINT is
non-empty (matches single-node bench scripts that don't pass it at all).
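
The conditional construction described above, as a sketch (the array name and final echo are illustrative; the real wrapper appends these to the benchmark_serving.py invocation):

```shell
# Default the backend to openai; only add --endpoint when ENDPOINT is set,
# since bench_serving.py already defaults to /v1/completions.
bench_args=(--backend openai)
if [ -n "${ENDPOINT:-}" ]; then
    bench_args+=(--endpoint "$ENDPOINT")
fi
echo "${bench_args[@]}"
```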

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Both fixes we wanted are now on origin/main:
  * #110 nginx-rework-ulimit (Ishan): gates the 1M nofile bump behind opt-in
    frontend.nginx_raise_ulimit. Default off, fixes clusters whose container
    RLIMIT_NOFILE hard cap < 1M.
  * #111 (cam): demotes the per-srun command logger.info to logger.debug so
    the 5KB fingerprint heredoc stops dominating orchestrator logs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Now that the proof-of-life recipe (dsr1-fp4-gb200-dynamo-sglang low-latency,
conc 4/8/32) ran clean end-to-end on a real cluster, sweep the rest of the
tree onto the new shape so all multi-node throughput sweeps drive
utils/bench_serving/benchmark_serving.py via benchmarks/multi_node/srt_bench.sh
instead of srt-slurm's bundled sa-bench client.

Each migrated recipe replaces:

  benchmark:
    type: "sa-bench"
    isl: …
    osl: …
    concurrencies: …
    req_rate: …
    [use_chat_template: false]

with:

  container_mounts:
    "$INFMAX_WORKSPACE": "/infmax-workspace"

  benchmark:
    type: "custom"
    command: "bash /infmax-workspace/benchmarks/multi_node/srt_bench.sh"
    env:
      [MODEL_NAME: "..."]      # only when server's served-model-name diverges
                               # from the master-yaml `model:` value
      PREFILL_GPUS: "..."      # per prefill worker (filename component)
      DECODE_GPUS: "..."       # per decode worker (filename component)
      TOTAL_GPUS: "..."        # sum across all workers (filename component)
      [USE_CHAT_TEMPLATE: "false"]   # only carried over when set in original

GPU counts derived from each recipe's `resources:` block — uses
gpus_per_prefill / gpus_per_decode when set, else falls back to
nodes * gpus_per_node / workers. MODEL_NAME override added on the 59
sglang recipes whose backend.sglang_config.served-model-name is
"deepseek-ai/DeepSeek-R1" while master-yaml `model:` is the more specific
"deepseek-ai/DeepSeek-R1-0528" / "nvidia/DeepSeek-R1-0528-NVFP4-v2"
revision tag.
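
The fallback derivation above, as arithmetic (the function name is mine; the variable names mirror the recipe `resources:` keys described):

```shell
# gpus_per_prefill / gpus_per_decode win when present; otherwise divide the
# node allocation evenly across workers.
derive_gpus_per_worker() {
    local explicit="${1:-}" nodes="$2" gpus_per_node="$3" workers="$4"
    if [ -n "$explicit" ]; then
        echo "$explicit"
    else
        echo $(( nodes * gpus_per_node / workers ))
    fi
}
```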

Skipped:
  - 3 sglang multi-override base files (1k1k.yaml / 8k1k.yaml under
    dsr1/sglang/b200-fp{4,8}/) — their `benchmark:` lives nested under
    `base:` and gets sa-bench-style overrides per `:override[N]` reference.
    Migrating them needs a separate pass that handles the override-merge
    semantics; their 26 master-yaml refs continue to dispatch via srt-slurm's
    bundled sa-bench until then. Tracked as follow-up.

Validation: schema accepts all 81 master-yaml entries, 149/149 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>