[train] Enable v2 for ray/train/tests #56868
Signed-off-by: Justin Yu <justinvyu@anyscale.com>
Code Review
This pull request introduces significant changes to enable Ray Train v2. The modifications include updating CI configurations, migrating numerous tests to use the v2 API by setting RAY_TRAIN_V2_ENABLED=1, and cleaning up obsolete v1 examples and tests. The changes are generally consistent and well-structured.
My main concern is the removal of tests in python/ray/train/tests/test_gpu_auto_transfer.py that verified critical GPU-related functionality. While the tests used v1 APIs, the underlying features are still important for v2. I've left a specific comment on this.
Otherwise, the migration strategy seems sound, with clear use of environment variables to toggle between API versions and TODOs marking areas for future work.
#56868 replaced `gpu` tags with `train_v2_gpu` to split the tests into two CI pipelines. However, a py312 test job that runs only in postmerge filters out the `gpu` tag, so the retagged GPU tests ended up running on the py312 CPU test runner. This PR fixes the issue by filtering out the new `train_v2_gpu` tag as well.

Signed-off-by: Justin Yu <justinvyu@anyscale.com>
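The tag-filter problem described above can be illustrated with a minimal sketch. It assumes that excluded tags are compared by exact match against each test's tag set; `select_tests`, the test names, and the tag assignments are all hypothetical and are not taken from the Ray CI tooling.

```python
from typing import Dict, Iterable, List, Set


def select_tests(tests: Dict[str, Set[str]], except_tags: Iterable[str]) -> List[str]:
    """Return the names of tests that carry none of the excluded tags.

    Hypothetical helper: assumes tags are matched exactly, so excluding
    `gpu` does not exclude a test tagged only with `train_v2_gpu`.
    """
    excluded = set(except_tags)
    return sorted(name for name, tags in tests.items() if not (tags & excluded))


# Hypothetical test targets and tags, for illustration only.
TESTS = {
    "test_cpu_only": {"team:ml", "train_v2"},
    "test_legacy_gpu": {"team:ml", "gpu"},
    "test_retagged_gpu": {"team:ml", "train_v2_gpu"},
}

# Before the fix: only `gpu` is filtered out, so the retagged GPU test
# is still selected for the CPU runner.
before_fix = select_tests(TESTS, ["gpu"])

# After the fix: both tags are filtered out, leaving only CPU tests.
after_fix = select_tests(TESTS, ["gpu", "train_v2_gpu"])
```

Under these assumptions, `before_fix` still contains `test_retagged_gpu`, while `after_fix` contains only `test_cpu_only`.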
Enable Train v2 across CI with new CPU/GPU jobs, migrate tests to v2 with env flags and tag updates, remove legacy Tune examples/utilities, and fix docs/BUILD configurations.

Signed-off-by: Justin Yu <justinvyu@anyscale.com>
Ports over the remaining unit tests that were marked as TODOs from this series of PRs: #57534, #57256, #56868, #56820, #56816. Notably:

* `test_new_dataset_config -> test_data_integration`
* `test_backend -> test_torch_trainer, test_worker_group`
* `test_gpu -> test_torch_gpu`

This PR also finishes migrating the Tune LightGBM/Keras examples, which were unblocked by #57042 and #57121.

Signed-off-by: Justin Yu <justinvyu@anyscale.com>
Note
Enable Train v2 across CI with new CPU/GPU jobs, migrate tests to v2 with env flags and tag updates, remove legacy Tune examples/utilities, and fix docs/BUILD configurations.
* ... `train_v2_gpu` ... `gpu_only`, `usegpu`), adjust include/except tag filters.
* ... `RAY_TRAIN_V2_ENABLED` (on/off per test); retag v2 tests (`train_v2`, `train_v2_gpu`).
* ... `.fit()` ...
* ... `team:ml`/`ray_air` to `team:data`.
* ... `construct_path` from `train/_internal/utils.py` and its unit test; update pydoclint baseline accordingly.

Written by Cursor Bugbot for commit b88db74. This will update automatically on new commits.
Legacy V1 tests
This PR explicitly marked legacy v1 tests by running them with RAY_TRAIN_V2_ENABLED=0. See the BUILD files.
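As a rough illustration of the per-test toggle, the sketch below shows how a process could interpret the flag. `train_v2_enabled` is a hypothetical helper, not Ray's implementation, and treating an unset variable as "off" is an assumption made only for this example.

```python
import os
from typing import Mapping

ENV_VAR = "RAY_TRAIN_V2_ENABLED"


def train_v2_enabled(environ: Mapping[str, str] = os.environ) -> bool:
    """Hypothetical helper: report whether the v2 flag is set to "1".

    Assumption for this sketch: an unset variable counts as disabled.
    Ray's actual default may differ.
    """
    return environ.get(ENV_VAR, "0") == "1"


# A legacy v1 test target would run with the flag set to 0,
# while a migrated test target would run with it set to 1.
legacy_env = {ENV_VAR: "0"}
v2_env = {ENV_VAR: "1"}

legacy_uses_v2 = train_v2_enabled(legacy_env)
migrated_uses_v2 = train_v2_enabled(v2_env)
```

Passing the environment in as a mapping keeps the helper easy to exercise without mutating the real process environment.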
Followups
Some test files contain a mix of tests that should be migrated to V2 and tests that are legacy V1 tests. I've marked these with TODOs in the BUILD files to migrate in a follow-up.