[Train] move collective implementations to train_fn_utils #55689
justinvyu merged 6 commits into ray-project:master
Signed-off-by: xgui <xgui@anyscale.com>
Code Review
This pull request refactors collective operations like barrier and broadcast_from_rank_zero by moving their implementations from the public API module ray.train.collective.collectives to the internal TrainFnUtils class. This is a good architectural improvement that cleans up the public API surface and centralizes training-related utilities. The tests have been updated to reflect these changes. The changes are logical and well-executed. I have a couple of minor suggestions to improve code clarity and avoid redundant function calls.
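For readers skimming the thread, the shape of this refactor can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the code in this PR: `TrainFnUtils`, `barrier`, and `broadcast_from_rank_zero` are names taken from the discussion, while `LocalTrainFnUtils`, `set_train_fn_utils`, and `get_train_fn_utils` are hypothetical helpers introduced only for the example. The idea is that the public functions stay as thin wrappers and the behavior lives on whichever utils object is currently installed.

```python
from abc import ABC, abstractmethod
from typing import Any, Optional


class TrainFnUtils(ABC):
    """Stand-in for the internal class named in this PR (interface assumed)."""

    @abstractmethod
    def barrier(self) -> None:
        """Block until every worker has reached this point."""

    @abstractmethod
    def broadcast_from_rank_zero(self, data: Any) -> Any:
        """Return rank 0's value of `data` on every worker."""


class LocalTrainFnUtils(TrainFnUtils):
    """Hypothetical single-process backend: with one worker there is nothing to sync."""

    def barrier(self) -> None:
        return None

    def broadcast_from_rank_zero(self, data: Any) -> Any:
        # The only worker is rank 0, so its own data is the broadcast result.
        return data


# Hypothetical module-level registry holding the active implementation.
_train_fn_utils: Optional[TrainFnUtils] = None


def set_train_fn_utils(utils: TrainFnUtils) -> None:
    global _train_fn_utils
    _train_fn_utils = utils


def get_train_fn_utils() -> TrainFnUtils:
    if _train_fn_utils is None:
        raise RuntimeError("TrainFnUtils has not been initialized.")
    return _train_fn_utils


# Public API wrappers: no collective logic here, only delegation.
def barrier() -> None:
    get_train_fn_utils().barrier()


def broadcast_from_rank_zero(data: Any) -> Any:
    return get_train_fn_utils().broadcast_from_rank_zero(data)
```

With this layout, swapping in a different backend (for example a single-process one) would not require touching the public functions, which is presumably why the move helps the local mode mentioned in the PR description.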
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Xinyuan <43737116+xinyuangui2@users.noreply.github.com>
Signed-off-by: xgui <xgui@anyscale.com>
@@ -0,0 +1,56 @@
import logging
nit: move this to the collective folder, as per @justinvyu's comment.
We discussed this offline, and it made more sense to put it inside `_internal/execution`.
Signed-off-by: xgui <xgui@anyscale.com>
…t#55689) This PR moves the implementations of collectives to `TrainFnUtils`. This would unblock the local mode that is introduced in ray-project#55487 --------- Signed-off-by: xgui <xgui@anyscale.com> Signed-off-by: Xinyuan <43737116+xinyuangui2@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: jugalshah291 <shah.jugal291@gmail.com>
This PR moves the implementations of collectives to `TrainFnUtils`. This would unblock the local mode that is introduced in #55487 --------- Signed-off-by: xgui <xgui@anyscale.com> Signed-off-by: Xinyuan <43737116+xinyuangui2@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Douglas Strodtman <douglas@anyscale.com>
Why are these changes needed?
This PR moves the implementations of collectives to `TrainFnUtils`. This would unblock the local mode that is introduced in #55487.

Related issue number
Checks
- […] `git commit -s`) in this PR.
- […] `scripts/format.sh` to lint the changes in this PR.
- […] method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.