[train] Add bundle_label_selector to ScalingConfig #58845
Merged
justinvyu merged 8 commits into ray-project:master on Nov 30, 2025
Signed-off-by: Timothy Seah <tseah@anyscale.com>
Code Review
This pull request introduces a bundle_label_selector to ScalingConfig, enabling more granular control over worker placement by using node labels. The implementation is solid, with appropriate validation and comprehensive unit tests. I've identified a significant issue in the controller logic where creating copies of the bundle selector for multiple workers results in all workers sharing the same dictionary instance. This could lead to hard-to-debug side effects if the selector is modified. I've provided suggestions to correct this by using list comprehensions to create truly independent copies. Apart from this, the changes are well-structured and improve the flexibility of worker placement.
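The aliasing problem described in the review can be illustrated with a small sketch. The controller code under review is not shown here, so this is only an assumption about how such a bug typically arises: list multiplication repeats a reference to one dict, while a list comprehension builds a separate dict per worker. The helper names and the label key are hypothetical.

```python
from typing import Dict, List


def replicate_shared(selector: Dict[str, str], num_workers: int) -> List[Dict[str, str]]:
    # List multiplication repeats the SAME dict object num_workers times.
    return [selector] * num_workers


def replicate_independent(selector: Dict[str, str], num_workers: int) -> List[Dict[str, str]]:
    # A comprehension builds a new dict per worker. A shallow copy is enough
    # here because label selectors are assumed to be flat str -> str mappings.
    return [dict(selector) for _ in range(num_workers)]


base = {"accelerator-type": "A100"}  # hypothetical label key and value

shared = replicate_shared(dict(base), 3)
shared[0]["region"] = "us-west"  # leaks into every worker's selector

independent = replicate_independent(base, 3)
independent[0]["region"] = "us-west"  # affects only worker 0
```

With the shared variant, a mutation intended for one worker silently changes the selector of every worker, which is the kind of side effect the review warns about.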
python/ray/train/v2/_internal/execution/controller/controller.py
…warning Signed-off-by: Timothy Seah <tseah@anyscale.com>
justinvyu approved these changes on Nov 26, 2025
matthewdeng added a commit that referenced this pull request on Dec 13, 2025

…59414)

## Description
Rename `ScalingConfig.bundle_label_selector` to `ScalingConfig.label_selector` for a cleaner API. This matches the `@ray.remote` API, as opposed to the `PlacementGroup` API, which uses `bundle_label_selector`.

## Related issues
API was introduced in #58845.

## Additional information
This change is technically backwards incompatible, but `bundle_label_selector` was just introduced and not part of any minor version releases yet.

Also made the same changes to `WorkerGroupContext`, and renamed local variables in `TrainController` and `TPUReservationCallback`.

Signed-off-by: Matthew Deng <matthew.j.deng@gmail.com>
Yicheng-Lu-llll pushed a commit to Yicheng-Lu-llll/ray that referenced this pull request on Dec 22, 2025
peterxcli pushed a commit to peterxcli/ray that referenced this pull request on Feb 25, 2026

This PR adds a `bundle_label_selector` argument to the `ScalingConfig` that allows Ray Train workers to be placed on nodes with particular labels. The previous workaround, namely using `resources_per_worker`, is less flexible.

`bundle_label_selector` can either be a single dict, in which case it will apply to all the workers, or a list of length `num_workers`, in which case each item in the list will correspond to one of the workers.

I added verification to the controller instead of validating that none of the callbacks have `on_controller_start_worker_group` when `bundle_label_selector` is set because we might change `on_controller_start_worker_group` in the future. We can revisit this issue then.

---------

Signed-off-by: Timothy Seah <tseah@anyscale.com>
Signed-off-by: peterxcli <peterxcli@gmail.com>
peterxcli pushed a commit to peterxcli/ray that referenced this pull request on Feb 25, 2026
Summary
This PR adds a `bundle_label_selector` argument to the `ScalingConfig` that allows Ray Train workers to be placed on nodes with particular labels. The previous workaround, namely using `resources_per_worker`, is less flexible.

`bundle_label_selector` can either be a single dict, in which case it will apply to all the workers, or a list of length `num_workers`, in which case each item in the list will correspond to one of the workers.

I added verification to the controller instead of validating that none of the callbacks have `on_controller_start_worker_group` when `bundle_label_selector` is set because we might change `on_controller_start_worker_group` in the future. We can revisit this issue then.

Testing
Unit tests
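The dict-or-list behavior described in the summary could be sketched roughly as follows. This is not the code from the PR: `normalize_label_selector` is a hypothetical helper, and the error messages and the handling of `None` are assumptions made for illustration only.

```python
from typing import Dict, List, Optional, Union

LabelSelector = Dict[str, str]


def normalize_label_selector(
    selector: Optional[Union[LabelSelector, List[LabelSelector]]],
    num_workers: int,
) -> Optional[List[LabelSelector]]:
    """Expand a selector into one independent dict per worker (hypothetical helper)."""
    if selector is None:
        # Assumption: no selector means no label constraints at all.
        return None
    if isinstance(selector, dict):
        # A single dict applies to every worker; copy it so the per-worker
        # entries do not share one dict instance.
        return [dict(selector) for _ in range(num_workers)]
    if isinstance(selector, list):
        # A list must supply exactly one selector per worker.
        if len(selector) != num_workers:
            raise ValueError(
                f"Expected {num_workers} label selectors, got {len(selector)}."
            )
        return [dict(item) for item in selector]
    raise TypeError("selector must be a dict, a list of dicts, or None.")
```

Normalizing to a per-worker list up front means later code only has to handle one shape, and copying each entry avoids the shared-instance issue raised in the code review.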