
Chore/upstream srt slurm #1211

Draft
cquil11 wants to merge 16 commits into main from chore/upstream-srt-slurm

Conversation

cquil11 (Collaborator) commented Apr 28, 2026

No description provided.

cquil11 and others added 3 commits April 28, 2026 09:50
Recipes referenced from NVIDIA/srt-slurm@sa-submission-q2-2026 are now tracked
under benchmarks/multi_node/srt-slurm-recipes/, mirroring the upstream
`recipes/` layout. The master-yaml plumbing for selecting one is hoisted out
of `prefill.additional-settings: ["CONFIG_FILE=recipes/..."]` into a
first-class `recipe:` field on the search-space entry, validated against
on-disk paths so unknown recipes fail fast at sweep generation. The benchmark
template resolves it to an absolute scratch-copy path passed to launchers as
CONFIG_FILE, so launcher behavior is unchanged otherwise.
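
The before/after shape of the master-yaml change, as a sketch (the entry layout and the recipe filename are illustrative, not taken from the repo):

```yaml
# Before: recipe selected by smuggling CONFIG_FILE through launcher settings.
prefill:
  additional-settings: ["CONFIG_FILE=recipes/example-recipe.yaml"]

# After: first-class field, validated against on-disk paths under
# benchmarks/multi_node/srt-slurm-recipes/ at sweep-generation time,
# so unknown recipes fail fast instead of at launch.
recipe: "recipes/example-recipe.yaml"
```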

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Six launchers each carried a ~22-line copy of the same git-clone, uv-install,
venv-create, srtctl-install sequence. Lift it into clone_and_install_srtctl()
in benchmarks/benchmark_lib.sh, parameterized by SRT_REPO_URL/SRT_BRANCH and
UV_INSTALL_DIR/UV_VENV_DIR env vars so each launcher can keep its workspace-
vs-NFS-vs-default-HOME placement decisions explicit at the call site.
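
A sketch of the shared helper's shape, assuming nothing beyond what the commit states (the uv install and venv commands are illustrative placeholders, not the real body in benchmarks/benchmark_lib.sh):

```shell
# Hedged sketch of clone_and_install_srtctl(): parameterized by the four env
# vars named in the commit, so each launcher keeps its workspace-vs-NFS-vs-
# default-HOME placement decision explicit at the call site.
clone_and_install_srtctl() {
    git clone --branch "$SRT_BRANCH" "$SRT_REPO_URL" srt-slurm
    # install uv into the caller-chosen directory (illustrative command)
    curl -LsSf https://astral.sh/uv/install.sh | env UV_INSTALL_DIR="$UV_INSTALL_DIR" sh
    "$UV_INSTALL_DIR/uv" venv "$UV_VENV_DIR"
    # install srtctl from the fresh checkout into that venv
    "$UV_INSTALL_DIR/uv" pip install --python "$UV_VENV_DIR/bin/python" ./srt-slurm
}
```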

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lift the `echo "$IMAGE" | sed 's/[/:@#]/_/g'` slug used to name squash files
out of 13 launchers and into sanitize_image_filename() in benchmark_lib.sh.
Cluster-specific separator (h100/h200-dgxc-slurm use '+' instead of '_') is
expressed as the second arg, and the nvcr.io/-prefix-strip variant becomes
`sanitize_image_filename "${IMAGE#nvcr.io/}" +` rather than a sed pipeline.
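
A minimal sketch of the helper, built directly from the sed pattern quoted above (exact argument handling in benchmark_lib.sh may differ):

```shell
# Replace every / : @ # in an image reference with a separator (default '_')
# so the result is safe to use as a squash filename.
sanitize_image_filename() {
    local image="$1" sep="${2:-_}"
    echo "$image" | sed "s|[/:@#]|$sep|g"
}
```

So `sanitize_image_filename nvcr.io/nvidia/foo:1.0` yields `nvcr.io_nvidia_foo_1.0`, and passing `+` as the second arg gives the h100/h200-dgxc-slurm variant.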

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions (Contributor)

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If they are not, please create a PR first before we can merge yours into the master branch. Let's ensure the documentation is first class so that the entire ML community can benefit from your hard work. Thank you!

PR authors are responsible for ensuring that, after merging, all GitHub Actions jobs fully pass. Much of the time, failures are just flakes, and simply re-running the failed jobs will fix them. If re-running failed jobs is attempted, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

cquil11 and others added 13 commits April 28, 2026 10:54
Restructure benchmarks/multi_node/srt-slurm-recipes/ from the upstream's
heterogeneous layout into a uniform tree:

  <model>/<framework>/<hw>-<precision>/<isl>/<osl>/<agg|disagg>/<stp|mtp>/<recipe>.yaml

so a sweep contributor can navigate to dsr1/trtllm/b200-fp4/1k/1k/disagg/mtp/
and immediately see every recipe that fits that cell. The 3 sglang
multi-override files that span both stp and mtp are parked one level
shallower (no trailing stp|mtp/ segment), since their override sections
select the spec mode.

365 files moved, 388 active + 5 commented recipe references rewritten,
schema validation + tests still green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per request, drop the awkward `1k/1k/` two-segment intermediate in the
recipe tree in favor of `1k1k/`. New shape:

  <model>/<framework>/<hw>-<precision>/<isl><osl>/<agg|disagg>/<stp|mtp>/<recipe>.yaml

370 files renamed, 393 recipe references in nvidia-master.yaml rewritten,
schema validation + tests still green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drop the ishandhanani/srt-slurm@sa-submission-q1-2026 fallback in
launch_gb200-nv.sh — every launcher now clones NVIDIA/srt-slurm at the
pinned commit 52e697d (nginx fd-limit fix on origin/main, Apr 2026).
Pinning to a SHA instead of a moving branch keeps benchmark runs
reproducible across upstream churn.

Rename the helper's SRT_BRANCH env var to SRT_REF for accuracy
(it accepts any git ref, not just a branch).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drop env-var override for SRT_REPO_URL / SRT_REF — every benchmark run
must use the same pinned srtctl, no ad-hoc overrides at the call site.
Bumping the pin is now a one-line edit to benchmark_lib.sh.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…of-life)

Stop relying on srt-slurm's bundled `benchmark.type: sa-bench` (which ships
its own copy of bench_serving.py inside the upstream repo) and instead use
`benchmark.type: custom` to run *this* repo's utils/bench_serving against the
already-ready frontend. Avoids dual-maintaining the bench client.

Plumbing:
- benchmarks/multi_node/srt_bench.sh: thin wrapper that mirrors sa-bench's
  per-conc warmup-then-bench loop, writes results to the same
  /logs/sa-bench_isl_<ISL>_osl_<OSL>/results_concurrency_<N>_gpus_<TOT>_ctx_<P>_gen_<D>.json
  layout the launcher result-harvesters already grep, with conc list parsed
  from x-separated env (e.g. "128x256x1024").
- Recipe shape: add `container_mounts: { $INFMAX_WORKSPACE: /infmax-workspace }`
  + replace `benchmark: { type: sa-bench, ... }` with
  `benchmark: { type: custom, command: "bash /infmax-workspace/...", env: {...} }`.

Migrated as proof-of-life:
- dsr1/trtllm/b200-fp4/1k1k mtp ctx1_gen2_dep8_batch64_eplb0_mtp2  (TRT-LLM)
- dsr1/sglang/gb200-fp4/1k1k stp low-latency                       (SGLang)
- dsv4/vllm/gb200-fp4/1k1k stp disagg-gb200-1p1d-dep8-tep8          (vLLM)

Remaining ~360 recipes still use sa-bench; they migrate in a follow-up once
this triplet runs end-to-end on a real cluster.
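
The wrapper's loop shape, sketched from the commit's description (the bench invocation itself is replaced by a placeholder, and the log base directory is a parameter here for illustration; the real script writes under /logs):

```shell
# Parse the x-separated concurrency list (e.g. "128x256x1024") and emit one
# result file per concurrency in the layout the launcher result-harvesters
# already grep.
run_conc_loop() {
    local logbase="$1"
    local outdir="$logbase/sa-bench_isl_${ISL}_osl_${OSL}"
    mkdir -p "$outdir"
    local conc IFS='x'
    for conc in $CONC_LIST; do
        # warmup pass then measured pass would run here (sa-bench-style)
        echo '{}' > "$outdir/results_concurrency_${conc}_gpus_${TOTAL_GPUS}_ctx_${PREFILL_GPUS}_gen_${DECODE_GPUS}.json"
    done
}
```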

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two pieces, one commit:

1. benchmark_lib.sh's run_benchmark_serving() gains optional pass-throughs
   for --tokenizer / --endpoint / --dataset-name / --dataset-path so the
   multi-node srt_bench.sh wrapper can reuse it instead of forking its own
   command-build. (--request-rate stays hardcoded "inf" — no recipe-level
   override.) ~50 lines of duplicated shell deleted from srt_bench.sh.

2. Recipe `benchmark.env` blocks lose every variable that is already
   exported by .github/workflows/benchmark-multinode-tmpl.yml at the
   workflow step level: MODEL, ISL, OSL, CONC_LIST, DISAGG, RANDOM_RANGE_RATIO.
   Those propagate down through srtctl → srun (default --export=ALL) → pyxis
   into the bench container, so srt_bench.sh reads them directly. Recipes
   now only carry per-recipe topology knobs (PREFILL_GPUS / DECODE_GPUS /
   TOTAL_GPUS — used in the result filename) plus the rare overrides.

Tokenizer is hardcoded to /model — srtctl's RuntimeContext.create
unconditionally bind-mounts the local model dir at that path in every
container, so AutoTokenizer.from_pretrained("/model") loads from the same
files the engine is serving. No HF Hub egress, works for HF-id and alias-
only `model:` values alike, no `TOKENIZER_PATH` knob in recipes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…er/--endpoint

Walked back the --dataset-name / --dataset-path additions to
run_benchmark_serving — both default cleanly (random / unset) for every
multi-node throughput sweep we run, so the pass-throughs were dead
weight. srt_bench.sh stops setting DATASET_NAME / DATASET_PATH from env.

Kept --tokenizer (srt_bench points it at /model since --model is the
served-model-name alias, not a HF id) and --endpoint (recipes may need
/v1/chat/completions for chat-template-only request paths).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Same behavior, fewer lines: collapse the two-step suffix split into a
single ${RECIPE#"${RECIPE%%:*}"} parameter expansion. 12 active lines
become 5. No semantic change — verified parsing for plain paths,
:override, and :zip_override_<name>[N] forms.
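
The collapsed split, worked through on a hypothetical value (only the two parameter expansions are from the commit):

```shell
RECIPE="dsr1/example.yaml:override[2]"  # hypothetical reference with a suffix
path="${RECIPE%%:*}"                    # everything before the first ':'
suffix="${RECIPE#"${RECIPE%%:*}"}"      # strip that prefix, leaving ":override[2]"
echo "$path $suffix"
```

For a plain path with no `:`, the prefix is the whole string, so the suffix expands to empty.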

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ression

Upstream commit 52e697d (#108 "fix(nginx): raise file descriptor limit for
nginx workers") prepends `ulimit -n 1048576 &&` to the nginx srun command.
On clusters whose container inherits a sub-1M RLIMIT_NOFILE hard limit
from slurmd/PAM, the bash builtin's setrlimit fails with EPERM (raising
the hard rlimit needs CAP_SYS_RESOURCE in the init user namespace, which
pyxis --container-remap-root does not grant). The `&&` short-circuits and
nginx never starts — caught when re-running dsr1-fp4-gb200-dynamo-sglang.

Pin back to 698590e ("feat(config): cluster-wide default_bash_preamble for
ulimits and the like (#104)"), the immediately prior commit, where nginx
runs without the chained ulimit. Bump forward once upstream softens the
ulimit to `|| true` or makes it opt-in.
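
The failure mode, reproduced in miniature (which branch fires depends on the hard cap the shell inherits, so no fixed output is claimed):

```shell
# The bash builtin calls setrlimit(2); raising the value above the inherited
# hard cap fails without CAP_SYS_RESOURCE, and '&&' then short-circuits the
# command that should have started nginx. Subshell keeps our limits intact.
if (ulimit -n 1048576) 2>/dev/null; then
    echo "hard cap allows 1048576; the chained command would run"
else
    echo "raise refused: hard cap is $(ulimit -Hn); nginx would never start"
fi
```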

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the temporary rollback (698590e) with the upstream fix branch.
425b486 is the tip of NVIDIA/srt-slurm's `ishan-rework-nginx`, which makes
the nginx ulimit + nginx.conf `worker_rlimit_nofile` directive opt-in via
a new `frontend.nginx_raise_ulimit` field (default false). Without us
opting in, nginx runs without the EPERM-prone bump that #108 introduced.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…letions

Upstream sa-bench used `--backend dynamo --endpoint /v1/completions`, but
this repo's benchmark_serving.py doesn't have a `dynamo` backend choice
(it has tgi/vllm/lmdeploy/deepspeed-mii/openai/openai-chat/tensorrt-llm/
scalellm/sglang). The dynamo frontend exposes a generic OpenAI-compatible
API regardless of the underlying engine, so `openai` is the right canonical
default. Recipes that need /v1/chat/completions can override via ENDPOINT.

Also drop the unconditional `--endpoint /v1/completions` — bench_serving.py
already defaults to that, and we now only pass --endpoint when ENDPOINT is
non-empty (matches single-node bench scripts that don't pass it at all).
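
The conditional construction described above, as a sketch (the array name and final echo are illustrative; the real wrapper appends these to the benchmark_serving.py invocation):

```shell
# Default the backend to openai; only add --endpoint when ENDPOINT is set,
# since bench_serving.py already defaults to /v1/completions.
bench_args=(--backend openai)
if [ -n "${ENDPOINT:-}" ]; then
    bench_args+=(--endpoint "$ENDPOINT")
fi
echo "${bench_args[@]}"
```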

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Both fixes we wanted are now on origin/main:
  * #110 nginx-rework-ulimit (Ishan): gates the 1M nofile bump behind opt-in
    frontend.nginx_raise_ulimit. Default off, fixes clusters whose container
    RLIMIT_NOFILE hard cap < 1M.
  * #111 (cam): demotes the per-srun command logger.info to logger.debug so
    the 5KB fingerprint heredoc stops dominating orchestrator logs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Now that the proof-of-life recipe (dsr1-fp4-gb200-dynamo-sglang low-latency,
conc 4/8/32) ran clean end-to-end on a real cluster, sweep the rest of the
tree onto the new shape so all multi-node throughput sweeps drive
utils/bench_serving/benchmark_serving.py via benchmarks/multi_node/srt_bench.sh
instead of srt-slurm's bundled sa-bench client.

Each migrated recipe replaces:

  benchmark:
    type: "sa-bench"
    isl: …
    osl: …
    concurrencies: …
    req_rate: …
    [use_chat_template: false]

with:

  container_mounts:
    "$INFMAX_WORKSPACE": "/infmax-workspace"

  benchmark:
    type: "custom"
    command: "bash /infmax-workspace/benchmarks/multi_node/srt_bench.sh"
    env:
      [MODEL_NAME: "..."]      # only when server's served-model-name diverges
                               # from the master-yaml `model:` value
      PREFILL_GPUS: "..."      # per prefill worker (filename component)
      DECODE_GPUS: "..."       # per decode worker (filename component)
      TOTAL_GPUS: "..."        # sum across all workers (filename component)
      [USE_CHAT_TEMPLATE: "false"]   # only carried over when set in original

GPU counts derived from each recipe's `resources:` block — uses
gpus_per_prefill / gpus_per_decode when set, else falls back to
nodes * gpus_per_node / workers. MODEL_NAME override added on the 59
sglang recipes whose backend.sglang_config.served-model-name is
"deepseek-ai/DeepSeek-R1" while master-yaml `model:` is the more specific
"deepseek-ai/DeepSeek-R1-0528" / "nvidia/DeepSeek-R1-0528-NVFP4-v2"
revision tag.
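
The fallback derivation above, as arithmetic (the function name is mine; the variable names mirror the recipe `resources:` keys described):

```shell
# gpus_per_prefill / gpus_per_decode win when present; otherwise divide the
# node allocation evenly across workers.
derive_gpus_per_worker() {
    local explicit="${1:-}" nodes="$2" gpus_per_node="$3" workers="$4"
    if [ -n "$explicit" ]; then
        echo "$explicit"
    else
        echo $(( nodes * gpus_per_node / workers ))
    fi
}
```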

Skipped:
  - 3 sglang multi-override base files (1k1k.yaml / 8k1k.yaml under
    dsr1/sglang/b200-fp{4,8}/) — their `benchmark:` lives nested under
    `base:` and gets sa-bench-style overrides per `:override[N]` reference.
    Migrating them needs a separate pass that handles the override-merge
    semantics; their 26 master-yaml refs continue to dispatch via srt-slurm's
    bundled sa-bench until then. Tracked as follow-up.

Validation: schema accepts all 81 master-yaml entries, 149/149 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>