Conversation
…signated signature
|
Thanks for opening a pull request! If this is not a minor PR. Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project. Then could you also rename the pull request title in the following format? or See also: |
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 91c417c22d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| Expression::Expression(Special special) | ||
| : impl_(std::make_shared<Impl>(std::move(special))) {} |
There was a problem hiding this comment.
Compute hash before storing special expressions
Expression::Special has a cached hash field, but this constructor never initializes it (unlike the Call constructor), so Expression::hash() later reads an indeterminate value. That makes hashing/equality behavior for special forms nondeterministic and invokes undefined behavior when these expressions are used in hashed containers (for example, deduplication in std::unordered_set<Expression>).
Useful? React with 👍 / 👎.
| Take(batch[i], *batch.selection_vector->data(), | ||
| TakeOptions{/*boundcheck=*/false}, exec_context())); |
There was a problem hiding this comment.
Validate selection indices before unbounded Take
This path performs Take(..., boundcheck=false) on batch.selection_vector without any release-mode validation beforehand, so an out-of-range index in a user-provided ExecBatch can lead to out-of-bounds memory access instead of a clean Invalid error. Because CallFunction(..., ExecBatch, ...) is public, this is reachable with malformed inputs and should be guarded by SelectionVector::Validate (or boundcheck=true).
Useful? React with 👍 / 👎.
| if (selection->length() == length) { | ||
| return std::make_shared<AllPassBranchMask>(length); |
There was a problem hiding this comment.
Avoid treating duplicate-filled selections as all-pass
Classifying selection->length() == length as AllPassBranchMask is incorrect when the selection contains duplicates (for example [0, 0, 2]): SelectionVector::Validate currently accepts duplicates, so this branch can incorrectly mark missing rows as selected and evaluate conditional branches on rows that were never in the mask.
Useful? React with 👍 / 👎.
|
@codex thanks for the review, but could you overall rate this change? |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 91c417c22d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
|
||
| ARROW_ASSIGN_OR_RAISE(auto result, | ||
| Scatter(dense_result, *batch.selection_vector->data(), | ||
| ScatterOptions{/*max_index=*/batch.length - 1})); |
There was a problem hiding this comment.
Use unfiltered length when scattering dense fallback results
ScalarExecutor::ExecuteSelectiveDense scatters with ScatterOptions{/*max_index=*/batch.length - 1}, but ExecBatch still documents that length equals the selection-vector length when a selection is present. If callers follow that contract (e.g. one selected row at index 42 with batch.length == 1), the scatter step is given max_index=0 and rejects/incorrectly handles valid selected indices. This makes documented ExecBatch usage fail for non-selective kernels in this new path.
Useful? React with 👍 / 👎.
| if (selection->length() == length) { | ||
| return std::make_shared<AllPassBranchMask>(length); |
There was a problem hiding this comment.
Avoid treating same-length selection vectors as all-pass
BranchMask::FromSelectionVector assumes selection->length() == length means every row is selected and upgrades to AllPassBranchMask, but SelectionVector::Validate currently allows duplicate sorted indices. A vector like [0, 0, 2] for length=3 passes validation yet does not cover all rows, so this shortcut can make conditional special forms evaluate unintended rows and produce incorrect branch results.
Useful? React with 👍 / 👎.
Thanks for opening a pull request!
If this is your first pull request you can find detailed information on how to contribute here:
Please remove this line and the above text before creating your pull request.
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?
This PR includes breaking changes to public APIs. (If there are any breaking changes to public APIs, please explain which changes are breaking. If not, you can remove this.)
This PR contains a "Critical Fix". (If the changes fix either (a) a security vulnerability, (b) a bug that caused incorrect or invalid data to be produced, or (c) a bug that causes a crash (even when the API contract is upheld), please provide explanation. If not, you can remove this.)