[data] refactor download expression to use inheritance from AbstractOneToOne#56294
Conversation
|
Kicked off release test build to test it here. |
There was a problem hiding this comment.
Code Review
This pull request correctly refactors the Download logical operator to inherit from AbstractOneToOne instead of AbstractMap. This change is logical and improves the code's structure. Additionally, this refactoring also addresses a minor performance inefficiency by switching to TaskPoolStrategy for the stateless download portion of the operation. The changes are well-implemented. I have one suggestion to improve documentation.
| def __init__( | ||
| self, | ||
| input_op: LogicalOperator, | ||
| uri_column_name: str, | ||
| output_bytes_column_name: str, | ||
| ray_remote_args: Optional[Dict[str, Any]] = None, | ||
| ): | ||
| super().__init__( | ||
| "Download", | ||
| input_op, | ||
| ) | ||
| self._uri_column_name = uri_column_name | ||
| self._output_bytes_column_name = output_bytes_column_name | ||
| self._ray_remote_args = ray_remote_args or {} |
There was a problem hiding this comment.
To improve code clarity and maintainability, consider adding a docstring to the __init__ method. This would help developers understand the purpose of its parameters without needing to inspect the implementation details.
def __init__(
self,
input_op: LogicalOperator,
uri_column_name: str,
output_bytes_column_name: str,
ray_remote_args: Optional[Dict[str, Any]] = None,
):
"""Initialize the Download operator.
Args:
input_op: The upstream logical operator.
uri_column_name: The name of the column containing the URIs to download.
output_bytes_column_name: The name of the column to store the downloaded bytes.
ray_remote_args: Ray remote arguments for the download tasks.
"""
super().__init__(
"Download",
input_op,
)
self._uri_column_name = uri_column_name
self._output_bytes_column_name = output_bytes_column_name
self._ray_remote_args = ray_remote_args or {}
bveeramani
left a comment
There was a problem hiding this comment.
LGTM assuming no regression
| def can_modify_num_rows(self) -> bool: | ||
| return False |
There was a problem hiding this comment.
Doesn't this modify the number of rows? Like, you pass in a URI for a CSV, and then possibly get many rows?
There was a problem hiding this comment.
I think it will depend on our implementation of the decode API. Right now every row will return the bytes of the uri, if we allow the decode function to return multiple rows we will need to handle that separately. For now, I think it is most correct to leave this as is.
python/ray/data/_internal/logical/operators/one_to_one_operator.py
Outdated
Show resolved
Hide resolved
…r.py Co-authored-by: Balaji Veeramani <bveeramani@berkeley.edu> Signed-off-by: Matthew Owen <omatthew98@berkeley.edu>
Release test results: |
…OneToOne` (ray-project#56294) ## Why are these changes needed? `Download` logical operator isn't really a map, we should use `AbstractOneToOne` as the base class instead. <!-- Please give a short summary of the change and the problem this solves. --> ## Related issue number <!-- For example: "Closes ray-project#1234" --> ## Checks - [ ] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [ ] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :( --------- Signed-off-by: Matthew Owen <mowen@anyscale.com> Signed-off-by: Matthew Owen <omatthew98@berkeley.edu> Co-authored-by: Balaji Veeramani <bveeramani@berkeley.edu> Signed-off-by: sampan <sampan@anyscale.com>
…OneToOne` (ray-project#56294) ## Why are these changes needed? `Download` logical operator isn't really a map, we should use `AbstractOneToOne` as the base class instead. <!-- Please give a short summary of the change and the problem this solves. --> ## Related issue number <!-- For example: "Closes ray-project#1234" --> ## Checks - [ ] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [ ] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :( --------- Signed-off-by: Matthew Owen <mowen@anyscale.com> Signed-off-by: Matthew Owen <omatthew98@berkeley.edu> Co-authored-by: Balaji Veeramani <bveeramani@berkeley.edu> Signed-off-by: jugalshah291 <shah.jugal291@gmail.com>
…OneToOne` (ray-project#56294) ## Why are these changes needed? `Download` logical operator isn't really a map, we should use `AbstractOneToOne` as the base class instead. <!-- Please give a short summary of the change and the problem this solves. --> ## Related issue number <!-- For example: "Closes ray-project#1234" --> ## Checks - [ ] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [ ] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :( --------- Signed-off-by: Matthew Owen <mowen@anyscale.com> Signed-off-by: Matthew Owen <omatthew98@berkeley.edu> Co-authored-by: Balaji Veeramani <bveeramani@berkeley.edu> Signed-off-by: yenhong.wong <yenhong.wong@grabtaxi.com>
…OneToOne` (ray-project#56294) ## Why are these changes needed? `Download` logical operator isn't really a map, we should use `AbstractOneToOne` as the base class instead. <!-- Please give a short summary of the change and the problem this solves. --> ## Related issue number <!-- For example: "Closes ray-project#1234" --> ## Checks - [ ] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [ ] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :( --------- Signed-off-by: Matthew Owen <mowen@anyscale.com> Signed-off-by: Matthew Owen <omatthew98@berkeley.edu> Co-authored-by: Balaji Veeramani <bveeramani@berkeley.edu> Signed-off-by: zac <zac@anyscale.com>
…OneToOne` (#56294) ## Why are these changes needed? `Download` logical operator isn't really a map, we should use `AbstractOneToOne` as the base class instead. <!-- Please give a short summary of the change and the problem this solves. --> ## Related issue number <!-- For example: "Closes #1234" --> ## Checks - [ ] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [ ] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :( --------- Signed-off-by: Matthew Owen <mowen@anyscale.com> Signed-off-by: Matthew Owen <omatthew98@berkeley.edu> Co-authored-by: Balaji Veeramani <bveeramani@berkeley.edu> Signed-off-by: Douglas Strodtman <douglas@anyscale.com>
…OneToOne` (ray-project#56294) ## Why are these changes needed? `Download` logical operator isn't really a map, we should use `AbstractOneToOne` as the base class instead. <!-- Please give a short summary of the change and the problem this solves. --> ## Related issue number <!-- For example: "Closes ray-project#1234" --> ## Checks - [ ] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [ ] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :( --------- Signed-off-by: Matthew Owen <mowen@anyscale.com> Signed-off-by: Matthew Owen <omatthew98@berkeley.edu> Co-authored-by: Balaji Veeramani <bveeramani@berkeley.edu>
Why are these changes needed?
Downloadlogical operator isn't really a map, we should useAbstractOneToOneas the base class instead.Related issue number
Checks
git commit -s) in this PR.scripts/format.shto lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/under thecorresponding
.rstfile.