[Data] [1/n] Predicate Expression Support#56313
[Data] [1/n] Predicate Expression Support#56313alexeykudinkin merged 10 commits intoray-project:masterfrom
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces valuable support for predicate expressions in Ray Data, including new unary and binary operations, the UnaryExpr and PredicateExpr classes, and a where() function. The implementation is well-structured and the new features are integrated cleanly into the existing expression system. The test coverage is extensive, which is great. However, I've identified a few test cases where the expected outcomes for expressions involving NULL values seem to be incorrect due to a misunderstanding of three-valued logic. I've left specific comments on these tests. Addressing these will ensure the test suite is robust and accurately reflects the expression engine's behavior.
Signed-off-by: Goutam V. <goutam@anyscale.com>
Signed-off-by: Goutam V. <goutam@anyscale.com>
7692898 to
c806c0c
Compare
| if isinstance(expr, UnaryExpr): | ||
| return ops[expr.op](_eval_expr_recursive(expr.operand, batch, ops)) |
There was a problem hiding this comment.
In a follow-up:
- Let's create a visitor to work with exprs
- Let's make ops part of the visitor state (to avoid passing it as args)
There was a problem hiding this comment.
Just so that I'm clear, are you referring to an ADT?
Signed-off-by: Goutam V. <goutam@anyscale.com>
Signed-off-by: Goutam V. <goutam@anyscale.com>
Signed-off-by: Goutam V. <goutam@anyscale.com>
| _eval_expr_recursive(expr.right, batch, ops), | ||
| ) | ||
| if isinstance(expr, UnaryExpr): | ||
| # TODO: Use Visitor pattern here and store ops in shared state. |
There was a problem hiding this comment.
wdym by storing ops in a shared state?
There was a problem hiding this comment.
In reference to this comment: #56313 (comment)
| SUB: Subtraction operation (-) | ||
| MUL: Multiplication operation (*) | ||
| DIV: Division operation (/) | ||
| FLOORDIV: Floor division operation (//) |
There was a problem hiding this comment.
Pyarrow doesn't have a native mod kernel. Could be a good follow up
Signed-off-by: Goutam V. <goutam@anyscale.com>
| result_df = result_ds.to_pandas() | ||
|
|
||
| expected_df = pd.DataFrame( | ||
| { | ||
| "product": ["A", "B", "C", "D", "E"], | ||
| "quantity": [10, 5, 20, 15, 3], | ||
| "price": [100, 200, 50, 75, 300], | ||
| "region": ["North", "South", "North", "East", "West"], | ||
| "revenue": [1000, 1000, 1000, 1125, 900], | ||
| "is_high_value": [True, True, True, True, False], | ||
| "is_bulk_order": [True, False, True, True, False], | ||
| "is_premium": [True, True, False, False, True], | ||
| "needs_special_handling": [True, True, True, True, False], | ||
| "is_north_region": [True, False, True, False, False], | ||
| } | ||
| ) | ||
|
|
||
| pd.testing.assert_frame_equal(result_df, expected_df, check_dtype=False) |
There was a problem hiding this comment.
There's no filtering in this test
Signed-off-by: Goutam V. <goutam@anyscale.com>
<!-- Thank you for your contribution! Please review https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before opening a pull request. --> <!-- Please add a reviewer to the assignee section when you create a PR. If you don't have the access to it, we will shortly find a reviewer and assign them to your PR. --> ## Why are these changes needed? Adds support for predicate expressions in Ray Data's Expression System. Involves the following: 1. Support for unary operations (NOT, IS_NULL(), IN and their inverses) 2. Add `PredicateExpr` to expression evaluator ## Related issue number ## Checks - [x] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [x] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [x] Unit tests - [ ] Release tests - [ ] This PR is not tested :( --------- Signed-off-by: Goutam V. <goutam@anyscale.com> Signed-off-by: Zhiqiang Ma <zhiqiang.ma@intel.com>
<!-- Thank you for your contribution! Please review https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before opening a pull request. --> <!-- Please add a reviewer to the assignee section when you create a PR. If you don't have the access to it, we will shortly find a reviewer and assign them to your PR. --> ## Why are these changes needed? Adds support for predicate expressions in Ray Data's Expression System. Involves the following: 1. Support for unary operations (NOT, IS_NULL(), IN and their inverses) 2. Add `PredicateExpr` to expression evaluator ## Related issue number ## Checks - [x] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [x] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [x] Unit tests - [ ] Release tests - [ ] This PR is not tested :( --------- Signed-off-by: Goutam V. <goutam@anyscale.com> Signed-off-by: zac <zac@anyscale.com>
<!-- Thank you for your contribution! Please review https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before opening a pull request. --> <!-- Please add a reviewer to the assignee section when you create a PR. If you don't have the access to it, we will shortly find a reviewer and assign them to your PR. --> ## Why are these changes needed? Adds support for predicate expressions in Ray Data's Expression System. Involves the following: 1. Support for unary operations (NOT, IS_NULL(), IN and their inverses) 2. Add `PredicateExpr` to expression evaluator ## Related issue number ## Checks - [x] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [x] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [x] Unit tests - [ ] Release tests - [ ] This PR is not tested :( --------- Signed-off-by: Goutam V. <goutam@anyscale.com> Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
<!-- Thank you for your contribution! Please review https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before opening a pull request. --> <!-- Please add a reviewer to the assignee section when you create a PR. If you don't have the access to it, we will shortly find a reviewer and assign them to your PR. --> ## Why are these changes needed? Adds support for predicate expressions in Ray Data's Expression System. Involves the following: 1. Support for unary operations (NOT, IS_NULL(), IN and their inverses) 2. Add `PredicateExpr` to expression evaluator ## Related issue number ## Checks - [x] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [x] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [x] Unit tests - [ ] Release tests - [ ] This PR is not tested :( --------- Signed-off-by: Goutam V. <goutam@anyscale.com> Signed-off-by: Marco Stephan <marco@magic.dev>
<!-- Thank you for your contribution! Please review https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before opening a pull request. --> <!-- Please add a reviewer to the assignee section when you create a PR. If you don't have the access to it, we will shortly find a reviewer and assign them to your PR. --> ## Why are these changes needed? Adds support for predicate expressions in Ray Data's Expression System. Involves the following: 1. Support for unary operations (NOT, IS_NULL(), IN and their inverses) 2. Add `PredicateExpr` to expression evaluator ## Related issue number ## Checks - [x] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [x] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [x] Unit tests - [ ] Release tests - [ ] This PR is not tested :( --------- Signed-off-by: Goutam V. <goutam@anyscale.com> Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
<!-- Thank you for your contribution! Please review https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before opening a pull request. --> <!-- Please add a reviewer to the assignee section when you create a PR. If you don't have the access to it, we will shortly find a reviewer and assign them to your PR. --> ## Why are these changes needed? Adds support for predicate expressions in Ray Data's Expression System. Involves the following: 1. Support for unary operations (NOT, IS_NULL(), IN and their inverses) 2. Add `PredicateExpr` to expression evaluator ## Related issue number ## Checks - [x] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [x] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [x] Unit tests - [ ] Release tests - [ ] This PR is not tested :( --------- Signed-off-by: Goutam V. <goutam@anyscale.com> Signed-off-by: Douglas Strodtman <douglas@anyscale.com>
<!-- Thank you for your contribution! Please review https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before opening a pull request. --> <!-- Please add a reviewer to the assignee section when you create a PR. If you don't have the access to it, we will shortly find a reviewer and assign them to your PR. --> ## Why are these changes needed? Adds support for predicate expressions in Ray Data's Expression System. Involves the following: 1. Support for unary operations (NOT, IS_NULL(), IN and their inverses) 2. Add `PredicateExpr` to expression evaluator ## Related issue number ## Checks - [x] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [x] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [x] Unit tests - [ ] Release tests - [ ] This PR is not tested :( --------- Signed-off-by: Goutam V. <goutam@anyscale.com>
Original PR #56313 by goutamvenkat-anyscale Original: ray-project/ray#56313
Merged from original PR #56313 Original: ray-project/ray#56313
<!-- Thank you for your contribution! Please review https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before opening a pull request. --> <!-- Please add a reviewer to the assignee section when you create a PR. If you don't have the access to it, we will shortly find a reviewer and assign them to your PR. --> ## Why are these changes needed? Adds support for predicate expressions in Ray Data's Expression System. Involves the following: 1. Support for unary operations (NOT, IS_NULL(), IN and their inverses) 2. Add `PredicateExpr` to expression evaluator ## Related issue number ## Checks - [x] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [x] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [x] Unit tests - [ ] Release tests - [ ] This PR is not tested :( --------- Signed-off-by: Goutam V. <goutam@anyscale.com>
<!-- Thank you for your contribution! Please review https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before opening a pull request. --> <!-- Please add a reviewer to the assignee section when you create a PR. If you don't have the access to it, we will shortly find a reviewer and assign them to your PR. --> ## Why are these changes needed? Adds support for predicate expressions in Ray Data's Expression System. Involves the following: 1. Support for unary operations (NOT, IS_NULL(), IN and their inverses) 2. Add `PredicateExpr` to expression evaluator ## Related issue number ## Checks - [x] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [x] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [x] Unit tests - [ ] Release tests - [ ] This PR is not tested :( --------- Signed-off-by: Goutam V. <goutam@anyscale.com> Signed-off-by: Future-Outlier <eric901201@gmail.com>
Why are these changes needed?
Adds support for predicate expressions in Ray Data's Expression System.
Involves the following:
PredicateExprto expression evaluatorRelated issue number
Checks
git commit -s) in this PR.scripts/format.shto lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/under thecorresponding
.rstfile.