[vllm] fix: enable multi-node vllm replicas with expert parallelism #5339
Open
guillemgt wants to merge 2 commits into verl-project:main
Conversation
Contributor
Code Review
This pull request correctly removes a restrictive assertion to enable multi-node vLLM deployments with expert parallelism where tensor parallelism spans multiple nodes. The logic to handle tensor_model_parallel_size > gpus_per_node is a good addition. However, I've identified a critical missing validation check that could lead to runtime errors or hangs with specific configurations. I've also suggested some minor improvements to the new assertion messages to enhance clarity for future debugging.
- Add critical validation that `tensor_model_parallel_size` is divisible by `gpus_per_node` when tensor parallelism spans multiple nodes
- Improve assertion messages for clarity by removing misleading "or vice versa" text
- Fix extraneous asterisks in assertion error messages

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
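For reference, a minimal sketch of the kind of validation described in the first bullet, assuming the variable names used elsewhere in this PR (`tensor_model_parallel_size`, `gpus_per_node`); the actual assertion wording in the patch may differ:

```python
# Example values for illustration only.
gpus_per_node = 8
tensor_model_parallel_size = 16

# When TP spans multiple nodes, each TP group must own a whole number
# of nodes, so TP size must be an exact multiple of gpus_per_node.
if tensor_model_parallel_size > gpus_per_node:
    assert tensor_model_parallel_size % gpus_per_node == 0, (
        f"tensor_model_parallel_size ({tensor_model_parallel_size}) must be "
        f"divisible by gpus_per_node ({gpus_per_node}) when tensor "
        "parallelism spans multiple nodes"
    )
```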
What does this PR do?
Removes an overly restrictive assertion that prevented using multi-node vLLM servers when Expert Parallelism (EP) is enabled and tensor parallelism spans multiple nodes. The underlying functionality already worked; this change simply allows it to be used by handling the case where `tensor_model_parallel_size > gpus_per_node`.

Checklist Before Starting
- Title format: `[{modules}] {type}: {description}` (this will be checked by the CI). Here: modules = `rollout`, type = `fix`.

Test
The previous assertion failure occurred when launching multi-node vLLM servers with:

- expert parallelism enabled (`expert_parallel_size > 1`)
- tensor parallelism spanning multiple nodes (`tensor_model_parallel_size > gpus_per_node`)

Testing approach: validated with multi-node vLLM deployments using EP and cross-node TP configurations.
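As a concrete illustration, here is a hypothetical configuration of the kind described above (example numbers, not taken from the PR):

```python
# Hypothetical example values; 8 GPUs per node is an assumption.
gpus_per_node = 8
tensor_model_parallel_size = 16  # one TP group spans two nodes
expert_parallel_size = 16        # expert parallelism enabled

# The removed assertion effectively required TP to fit on a single node:
print(gpus_per_node % tensor_model_parallel_size == 0)  # False -> AssertionError before this PR
```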
API and Usage Example
No API changes. Internal fix only.
Design & Code Changes
The original code assumed `gpus_per_node % tensor_model_parallel_size == 0`, which fails when TP spans multiple nodes. This PR adds conditional logic to compute `data_parallel_start_rank` correctly for both cases (see the sketch after this list):

- TP fits within a node (`data_parallel_size_local > 0`): existing behavior
- TP spans multiple nodes (`data_parallel_size_local == 0`): compute the rank using `node_rank // nodes_per_data_parallel_worker`
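A minimal sketch of that two-case logic, assuming the variable names used in the description above (`node_rank`, `gpus_per_node`, `data_parallel_size_local`, `nodes_per_data_parallel_worker`); the actual code in the patch may differ in detail:

```python
def compute_data_parallel_start_rank(node_rank: int,
                                     gpus_per_node: int,
                                     tensor_model_parallel_size: int) -> int:
    # Case 1: TP fits within a node, so each node hosts one or more
    # data-parallel workers (existing behavior).
    data_parallel_size_local = gpus_per_node // tensor_model_parallel_size
    if data_parallel_size_local > 0:
        return node_rank * data_parallel_size_local
    # Case 2: one TP group (one DP worker) spans several whole nodes,
    # so data-parallel ranks are counted in node-sized units.
    nodes_per_data_parallel_worker = tensor_model_parallel_size // gpus_per_node
    return node_rank // nodes_per_data_parallel_worker

# With 8 GPUs per node and TP=16: nodes 0-1 form DP rank 0, nodes 2-3 DP rank 1.
assert compute_data_parallel_start_rank(0, 8, 16) == 0
assert compute_data_parallel_start_rank(2, 8, 16) == 1
```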
Checklist Before Submitting

- Run `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`.
- Request CI via the `ci-request` channel in the `verl` Slack workspace.
- If the PR changes the `recipe` submodule, also update the reference to the submodule commit via `git submodule update --remote` or `cd recipe && git pull origin main`. (N/A)