[train] Add bundle_label_selector to ScalingConfig #58845

Merged
justinvyu merged 8 commits into ray-project:master from
TimothySeah:tseah/bundle-selector-to-scaling-config
Nov 30, 2025

Conversation

@TimothySeah TimothySeah commented Nov 20, 2025

Summary

This PR adds a bundle_label_selector argument to the ScalingConfig that allows Ray Train workers to be placed on nodes with particular labels. The previous workaround, namely using resources_per_worker, is less flexible.

bundle_label_selector can either be a single dict, in which case it will apply to all the workers, or a list of length num_workers, in which case each item in the list will correspond to one of the workers.
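The dict-or-list behavior described above can be sketched as follows. This is a minimal illustration only, not the code in this PR: `expand_label_selectors` is a hypothetical helper name, and the assumption that a selector maps string label keys to string values is mine.

```python
from typing import Dict, List, Optional, Union

# Assumption: a label selector maps label keys to string values.
LabelSelector = Dict[str, str]


def expand_label_selectors(
    selector: Optional[Union[LabelSelector, List[LabelSelector]]],
    num_workers: int,
) -> Optional[List[LabelSelector]]:
    """Hypothetical helper: return one selector dict per worker."""
    if selector is None:
        # No selector was given, so there is no placement constraint.
        return None
    if isinstance(selector, dict):
        # A single dict applies to every worker; give each worker its own copy.
        return [dict(selector) for _ in range(num_workers)]
    if len(selector) != num_workers:
        # A list must line up one-to-one with the workers.
        raise ValueError(
            f"Expected {num_workers} selectors, got {len(selector)}."
        )
    return [dict(item) for item in selector]
```

For example, a single dict with `num_workers=2` expands to two equal but independent dicts, while a list whose length differs from `num_workers` is rejected.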

I added the verification to the controller rather than validating up front that none of the callbacks implement on_controller_start_worker_group when bundle_label_selector is set, because we might change on_controller_start_worker_group in the future. We can revisit this issue then.

Testing

Unit tests

Signed-off-by: Timothy Seah <tseah@anyscale.com>
@TimothySeah TimothySeah requested a review from a team as a code owner November 20, 2025 03:48

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a bundle_label_selector to ScalingConfig, enabling more granular control over worker placement by using node labels. The implementation is solid, with appropriate validation and comprehensive unit tests. I've identified a significant issue in the controller logic where creating copies of the bundle selector for multiple workers results in all workers sharing the same dictionary instance. This could lead to hard-to-debug side effects if the selector is modified. I've provided suggestions to correct this by using list comprehensions to create truly independent copies. Apart from this, the changes are well-structured and improve the flexibility of worker placement.
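The aliasing pitfall the review describes is general Python behavior and can be reproduced in isolation. The sketch below is not the PR's controller code; both function names are hypothetical, and it assumes flat selector dicts, for which a shallow copy is enough.

```python
from typing import Dict, List


def replicate_shared(selector: Dict[str, str], n: int) -> List[Dict[str, str]]:
    # List multiplication repeats the same reference n times:
    # every element is the very same dict object.
    return [selector] * n


def replicate_copied(selector: Dict[str, str], n: int) -> List[Dict[str, str]]:
    # The comprehension builds a new (shallow) copy on each iteration,
    # so every worker gets an independent dict.
    return [dict(selector) for _ in range(n)]


shared = replicate_shared({"accelerator": "gpu"}, 3)
shared[0]["zone"] = "a"  # mutating one entry is visible through all of them

copied = replicate_copied({"accelerator": "gpu"}, 3)
copied[0]["zone"] = "a"  # only the first entry changes
```

Note that a comprehension alone is not sufficient: `[selector for _ in range(n)]` would still share one object, so the per-iteration copy is what provides the independence.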

Signed-off-by: Timothy Seah <tseah@anyscale.com>
Signed-off-by: Timothy Seah <tseah@anyscale.com>
@ray-gardener ray-gardener bot added the train Ray Train Related Issue label Nov 20, 2025
Signed-off-by: Timothy Seah <tseah@anyscale.com>

@liulehui liulehui left a comment

ty!

@justinvyu justinvyu left a comment

😎

@ryanaoleary ryanaoleary left a comment

LGTM

Signed-off-by: Timothy Seah <tseah@anyscale.com>
…warning

Signed-off-by: Timothy Seah <tseah@anyscale.com>
Signed-off-by: Timothy Seah <tseah@anyscale.com>
@TimothySeah TimothySeah added the go add ONLY when ready to merge, run all tests label Nov 26, 2025

@justinvyu justinvyu left a comment

🚢

Signed-off-by: Timothy Seah <tseah@anyscale.com>
@justinvyu justinvyu merged commit 2ab5d20 into ray-project:master Nov 30, 2025
6 checks passed
matthewdeng added a commit that referenced this pull request Dec 13, 2025
…59414)

## Description
Rename `ScalingConfig.bundle_label_selector` to
`ScalingConfig.label_selector` for a cleaner API.

This matches the `@ray.remote` API, as opposed to the `PlacementGroup`
API which uses `bundle_label_selector`.

## Related issues

API was introduced in #58845.

## Additional information

This change is technically backwards incompatible, but
`bundle_label_selector` was just introduced and not part of any minor
version releases yet.

Also made the same changes to `WorkerGroupContext`, and renamed local
variables in `TrainController` and `TPUReservationCallback`

Signed-off-by: Matthew Deng <matthew.j.deng@gmail.com>
Yicheng-Lu-llll pushed a commit to Yicheng-Lu-llll/ray that referenced this pull request Dec 22, 2025
…ay-project#59414)
Signed-off-by: Matthew Deng <matthew.j.deng@gmail.com>
peterxcli pushed a commit to peterxcli/ray that referenced this pull request Feb 25, 2026
Signed-off-by: Timothy Seah <tseah@anyscale.com>
Signed-off-by: peterxcli <peterxcli@gmail.com>
peterxcli pushed a commit to peterxcli/ray that referenced this pull request Feb 25, 2026
…ay-project#59414)
Signed-off-by: Matthew Deng <matthew.j.deng@gmail.com>
Signed-off-by: peterxcli <peterxcli@gmail.com>