[train] Enable v2 for ray/train/tests #56868

Merged
justinvyu merged 30 commits into ray-project:master from justinvyu:train_enable_v2
Oct 8, 2025


@justinvyu justinvyu commented Sep 24, 2025


Note

Enable Train v2 across CI with new CPU/GPU jobs, migrate tests to v2 with env flags and tag updates, remove legacy Tune examples/utilities, and fix docs/BUILD configurations.

  • CI (Buildkite):
    • Split Train tests into v1 and v2, add dedicated GPU job for train_v2_gpu.
    • Standardize tags (drop gpu_only, use gpu), adjust include/except tag filters.
  • Tests (BUILD and test code):
    • Gate tests with RAY_TRAIN_V2_ENABLED (on/off per test); retag v2 tests (train_v2, train_v2_gpu).
    • Replace deprecated imports/usages; simplify some assertions to just run fit().
    • Remove legacy tests/examples (e.g., Tune Torch/TensorFlow tuning examples, datasets_train, utils), and adjust v2 HF/XGBoost tests (steps/save intervals).
    • Data tests retagged from team:ml/ray_air to team:data.
  • Code cleanup:
    • Remove construct_path from train/_internal/utils.py and its unit test; update pydoclint baseline accordingly.
  • Docs:
    • Fix outdated example links and remove obsolete Tune example pages/links.
  • Bazel BUILD:
    • Widespread tag/env updates; add/remove tests to reflect v2 enablement and GPU tagging changes.

Written by Cursor Bugbot for commit b88db74.

Legacy V1 tests

This PR explicitly marked legacy v1 tests by running them with RAY_TRAIN_V2_ENABLED=0. See the BUILD files.
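
As a minimal sketch of the gating idea (not Ray's actual implementation), the snippet below reads `RAY_TRAIN_V2_ENABLED` from an environment mapping and reports which code path a test process would exercise. The helper names `train_v2_enabled` and `api_version`, the default of "disabled" when the flag is unset, and the strict `"1"` comparison are all assumptions made for illustration; only the flag name and its `0`/`1` values come from this PR.

```python
import os
from typing import Mapping, Optional

ENV_FLAG = "RAY_TRAIN_V2_ENABLED"  # flag name taken from the PR description


def train_v2_enabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical helper: True only when the flag is set to "1".

    Treating an unset flag as disabled is an assumption of this sketch.
    """
    env = os.environ if environ is None else environ
    return env.get(ENV_FLAG, "0") == "1"


def api_version(environ: Optional[Mapping[str, str]] = None) -> str:
    """Label the code path a test process would run under."""
    return "v2" if train_v2_enabled(environ) else "v1"
```

Passing the environment in as a mapping keeps the sketch easy to check without mutating `os.environ`; a test target that sets `RAY_TRAIN_V2_ENABLED=0` in its BUILD entry would land on the `"v1"` branch here.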

Followups

Some test files contain a mix of tests that should be migrated to V2 and tests that are legacy V1 tests. I've marked these files with TODOs in the BUILD files so they can be migrated in a follow-up.

Signed-off-by: Justin Yu <justinvyu@anyscale.com>
@justinvyu justinvyu requested review from a team as code owners September 24, 2025 01:01

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces significant changes to enable Ray Train v2. The modifications include updating CI configurations, migrating numerous tests to use the v2 API by setting RAY_TRAIN_V2_ENABLED=1, and cleaning up obsolete v1 examples and tests. The changes are generally consistent and well-structured.

My main concern is the removal of tests in python/ray/train/tests/test_gpu_auto_transfer.py that verified critical GPU-related functionality. While the tests used v1 APIs, the underlying features are still important for v2. I've left a specific comment on this.

Otherwise, the migration strategy seems sound, with clear use of environment variables to toggle between API versions and TODOs marking areas for future work.

@ray-gardener ray-gardener bot added the `train` (Ray Train Related Issue) label Sep 24, 2025
@justinvyu justinvyu added the `go` (add ONLY when ready to merge, run all tests) label Sep 24, 2025
@justinvyu justinvyu requested a review from a team as a code owner September 25, 2025 21:43
@justinvyu justinvyu merged commit f19664e into ray-project:master Oct 8, 2025
7 checks passed
@justinvyu justinvyu deleted the train_enable_v2 branch October 8, 2025 00:46
justinvyu added a commit that referenced this pull request Oct 8, 2025
#56868 replaced `gpu` tags with
`train_v2_gpu` in order to separate out the tests into 2 CI pipelines.
However, there's a py312 test that only runs in postmerge that filters
out the `gpu` tag. This resulted in GPU tests running on the py312 CPU
test runner. This PR fixes the issue by filtering out the new
`train_v2_gpu` tag as well.

Signed-off-by: Justin Yu <justinvyu@anyscale.com>
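
The failure mode described in this follow-up commit can be illustrated with a small sketch of exclusion-by-tag filtering. This is not the CI's real filtering code: the function `select_targets`, the target names, and the exact-match tag semantics are assumptions for illustration; only the tag names `gpu`, `train_v2`, and `train_v2_gpu` come from the discussion.

```python
from typing import Dict, Iterable, List, Set


def select_targets(
    target_tags: Dict[str, Set[str]], except_tags: Iterable[str]
) -> List[str]:
    """Return targets that carry none of the excluded tags, sorted by name."""
    excluded = set(except_tags)
    return sorted(
        name for name, tags in target_tags.items() if not (tags & excluded)
    )


# Hypothetical targets; only the tag names come from the PR discussion.
targets = {
    "test_cpu_only": {"train_v2"},
    "test_v1_gpu": {"gpu"},
    "test_v2_gpu": {"train_v2_gpu"},
}

# Filtering out only `gpu` lets the retagged GPU test through.
before_fix = select_targets(targets, ["gpu"])
# Filtering out both tags keeps GPU tests off the CPU runner.
after_fix = select_targets(targets, ["gpu", "train_v2_gpu"])
```

Under these assumptions, `before_fix` still contains `test_v2_gpu`, which mirrors GPU tests landing on the py312 CPU runner, while `after_fix` contains only the CPU test.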
justinvyu added a commit that referenced this pull request Oct 16, 2025
Ports over the remaining unit tests that were marked as TODOs from this
series of PRs: #57534, #57256, #56868, #56820, #56816.

Notably:
* `test_new_dataset_config -> test_data_integration`
* `test_backend -> test_torch_trainer, test_worker_group`
* `test_gpu -> test_torch_gpu`

This PR also finishes migrating the Tune LightGBM/Keras examples which
were unblocked by #57042 and
#57121.

---------

Signed-off-by: Justin Yu <justinvyu@anyscale.com>