Skip to content

[data] refactor download expression to use inheritance from AbstractOneToOne#56294

Merged
bveeramani merged 2 commits intoray-project:masterfrom
omatthew98:mowen/refactor-download-op-inheritance
Sep 5, 2025
Merged

[data] refactor download expression to use inheritance from AbstractOneToOne#56294
bveeramani merged 2 commits intoray-project:masterfrom
omatthew98:mowen/refactor-download-op-inheritance

Conversation

@omatthew98
Copy link
Contributor

Why are these changes needed?

Download logical operator isn't really a map, we should use AbstractOneToOne as the base class instead.

Related issue number

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Matthew Owen <mowen@anyscale.com>
@omatthew98 omatthew98 requested a review from a team as a code owner September 5, 2025 21:55
@omatthew98
Copy link
Contributor Author

Kicked off release test build to test it here.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request correctly refactors the Download logical operator to inherit from AbstractOneToOne instead of AbstractMap. This change is logical and improves the code's structure. Additionally, this refactoring also addresses a minor performance inefficiency by switching to TaskPoolStrategy for the stateless download portion of the operation. The changes are well-implemented. I have one suggestion to improve documentation.

Comment on lines +91 to +104
def __init__(
self,
input_op: LogicalOperator,
uri_column_name: str,
output_bytes_column_name: str,
ray_remote_args: Optional[Dict[str, Any]] = None,
):
super().__init__(
"Download",
input_op,
)
self._uri_column_name = uri_column_name
self._output_bytes_column_name = output_bytes_column_name
self._ray_remote_args = ray_remote_args or {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

To improve code clarity and maintainability, consider adding a docstring to the __init__ method. This would help developers understand the purpose of its parameters without needing to inspect the implementation details.

    def __init__(
        self,
        input_op: LogicalOperator,
        uri_column_name: str,
        output_bytes_column_name: str,
        ray_remote_args: Optional[Dict[str, Any]] = None,
    ):
        """Initialize the Download operator.

        Args:
            input_op: The upstream logical operator.
            uri_column_name: The name of the column containing the URIs to download.
            output_bytes_column_name: The name of the column to store the downloaded bytes.
            ray_remote_args: Ray remote arguments for the download tasks.
        """
        super().__init__(
            "Download",
            input_op,
        )
        self._uri_column_name = uri_column_name
        self._output_bytes_column_name = output_bytes_column_name
        self._ray_remote_args = ray_remote_args or {}

Copy link
Member

@bveeramani bveeramani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM assuming no regression

Comment on lines +106 to +107
def can_modify_num_rows(self) -> bool:
return False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this modify the number of rows? Like, you pass in a URI for a CSV, and then possibly get many rows?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it will depend on our implementation of the decode API. Right now every row will return the bytes of the uri, if we allow the decode function to return multiple rows we will need to handle that separately. For now, I think it is most correct to leave this as is.

…r.py

Co-authored-by: Balaji Veeramani <bveeramani@berkeley.edu>
Signed-off-by: Matthew Owen <omatthew98@berkeley.edu>
@omatthew98 omatthew98 added the go add ONLY when ready to merge, run all tests label Sep 5, 2025
@omatthew98
Copy link
Contributor Author

omatthew98 commented Sep 5, 2025

LGTM assuming no regression

Release test results:

  • Fixed size: 219.08s (build here)
  • Autoscaling: 488.46 (build here)

Previous PR/benchmark:

  • Fixed size: 229.85s (build)
  • Autoscaling: 500.54s (build)

@bveeramani bveeramani enabled auto-merge (squash) September 5, 2025 23:52
@bveeramani bveeramani merged commit ad906af into ray-project:master Sep 5, 2025
7 checks passed
sampan-s-nayak pushed a commit to sampan-s-nayak/ray that referenced this pull request Sep 8, 2025
…OneToOne` (ray-project#56294)

## Why are these changes needed?
`Download` logical operator isn't really a map, we should use
`AbstractOneToOne` as the base class instead.

<!-- Please give a short summary of the change and the problem this
solves. -->

## Related issue number

<!-- For example: "Closes ray-project#1234" -->

## Checks

- [ ] I've signed off every commit(by using the -s flag, i.e., `git
commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a
method in Tune, I've added it in `doc/source/tune/api/` under the
           corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

---------

Signed-off-by: Matthew Owen <mowen@anyscale.com>
Signed-off-by: Matthew Owen <omatthew98@berkeley.edu>
Co-authored-by: Balaji Veeramani <bveeramani@berkeley.edu>
Signed-off-by: sampan <sampan@anyscale.com>
jugalshah291 pushed a commit to jugalshah291/ray_fork that referenced this pull request Sep 11, 2025
…OneToOne` (ray-project#56294)

## Why are these changes needed?
`Download` logical operator isn't really a map, we should use
`AbstractOneToOne` as the base class instead.

<!-- Please give a short summary of the change and the problem this
solves. -->

## Related issue number

<!-- For example: "Closes ray-project#1234" -->

## Checks

- [ ] I've signed off every commit(by using the -s flag, i.e., `git
commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a
method in Tune, I've added it in `doc/source/tune/api/` under the
           corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

---------

Signed-off-by: Matthew Owen <mowen@anyscale.com>
Signed-off-by: Matthew Owen <omatthew98@berkeley.edu>
Co-authored-by: Balaji Veeramani <bveeramani@berkeley.edu>
Signed-off-by: jugalshah291 <shah.jugal291@gmail.com>
wyhong3103 pushed a commit to wyhong3103/ray that referenced this pull request Sep 12, 2025
…OneToOne` (ray-project#56294)

## Why are these changes needed?
`Download` logical operator isn't really a map, we should use
`AbstractOneToOne` as the base class instead.

<!-- Please give a short summary of the change and the problem this
solves. -->

## Related issue number

<!-- For example: "Closes ray-project#1234" -->

## Checks

- [ ] I've signed off every commit(by using the -s flag, i.e., `git
commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a
method in Tune, I've added it in `doc/source/tune/api/` under the
           corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

---------

Signed-off-by: Matthew Owen <mowen@anyscale.com>
Signed-off-by: Matthew Owen <omatthew98@berkeley.edu>
Co-authored-by: Balaji Veeramani <bveeramani@berkeley.edu>
Signed-off-by: yenhong.wong <yenhong.wong@grabtaxi.com>
ZacAttack pushed a commit to ZacAttack/ray that referenced this pull request Sep 24, 2025
…OneToOne` (ray-project#56294)

## Why are these changes needed?
`Download` logical operator isn't really a map, we should use
`AbstractOneToOne` as the base class instead.

<!-- Please give a short summary of the change and the problem this
solves. -->

## Related issue number

<!-- For example: "Closes ray-project#1234" -->

## Checks

- [ ] I've signed off every commit(by using the -s flag, i.e., `git
commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a
method in Tune, I've added it in `doc/source/tune/api/` under the
           corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

---------

Signed-off-by: Matthew Owen <mowen@anyscale.com>
Signed-off-by: Matthew Owen <omatthew98@berkeley.edu>
Co-authored-by: Balaji Veeramani <bveeramani@berkeley.edu>
Signed-off-by: zac <zac@anyscale.com>
dstrodtman pushed a commit that referenced this pull request Oct 6, 2025
…OneToOne` (#56294)

## Why are these changes needed?
`Download` logical operator isn't really a map, we should use
`AbstractOneToOne` as the base class instead.

<!-- Please give a short summary of the change and the problem this
solves. -->

## Related issue number

<!-- For example: "Closes #1234" -->

## Checks

- [ ] I've signed off every commit(by using the -s flag, i.e., `git
commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a
method in Tune, I've added it in `doc/source/tune/api/` under the
           corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

---------

Signed-off-by: Matthew Owen <mowen@anyscale.com>
Signed-off-by: Matthew Owen <omatthew98@berkeley.edu>
Co-authored-by: Balaji Veeramani <bveeramani@berkeley.edu>
Signed-off-by: Douglas Strodtman <douglas@anyscale.com>
landscapepainter pushed a commit to landscapepainter/ray that referenced this pull request Nov 17, 2025
…OneToOne` (ray-project#56294)

## Why are these changes needed?
`Download` logical operator isn't really a map, we should use
`AbstractOneToOne` as the base class instead.

<!-- Please give a short summary of the change and the problem this
solves. -->

## Related issue number

<!-- For example: "Closes ray-project#1234" -->

## Checks

- [ ] I've signed off every commit(by using the -s flag, i.e., `git
commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a
method in Tune, I've added it in `doc/source/tune/api/` under the
           corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

---------

Signed-off-by: Matthew Owen <mowen@anyscale.com>
Signed-off-by: Matthew Owen <omatthew98@berkeley.edu>
Co-authored-by: Balaji Veeramani <bveeramani@berkeley.edu>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

go add ONLY when ready to merge, run all tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants