Skip to content

If_else special form#64

Open
zanmato1984 wants to merge 71 commits intomainfrom
if-else-special
Open

If_else special form#64
zanmato1984 wants to merge 71 commits intomainfrom
if-else-special

Conversation

@zanmato1984
Copy link
Copy Markdown
Owner

Thanks for opening a pull request!

If this is your first pull request you can find detailed information on how to contribute here:

Please remove this line and the above text before creating your pull request.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

This PR includes breaking changes to public APIs. (If there are any breaking changes to public APIs, please explain which changes are breaking. If not, you can remove this.)

This PR contains a "Critical Fix". (If the changes fix either (a) a security vulnerability, (b) a bug that caused incorrect or invalid data to be produced, or (c) a bug that causes a crash (even when the API contract is upheld), please provide explanation. If not, you can remove this.)

@github-actions
Copy link
Copy Markdown

github-actions bot commented Nov 5, 2025

Thanks for opening a pull request!

If this is not a minor PR. Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

See also:

@zanmato1984
Copy link
Copy Markdown
Owner Author

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 91c417c22d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +81 to +82
Expression::Expression(Special special)
: impl_(std::make_shared<Impl>(std::move(special))) {}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Compute hash before storing special expressions

Expression::Special has a cached hash field, but this constructor never initializes it (unlike the Call constructor), so Expression::hash() later reads an indeterminate value. That makes hashing/equality behavior for special forms nondeterministic and invokes undefined behavior when these expressions are used in hashed containers (for example, deduplication in std::unordered_set<Expression>).

Useful? React with 👍 / 👎.

Comment on lines +894 to +895
Take(batch[i], *batch.selection_vector->data(),
TakeOptions{/*boundcheck=*/false}, exec_context()));
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Validate selection indices before unbounded Take

This path performs Take(..., boundcheck=false) on batch.selection_vector without any release-mode validation beforehand, so an out-of-range index in a user-provided ExecBatch can lead to out-of-bounds memory access instead of a clean Invalid error. Because CallFunction(..., ExecBatch, ...) is public, this is reachable with malformed inputs and should be guarded by SelectionVector::Validate (or boundcheck=true).

Useful? React with 👍 / 👎.

Comment on lines +127 to +128
if (selection->length() == length) {
return std::make_shared<AllPassBranchMask>(length);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid treating duplicate-filled selections as all-pass

Classifying selection->length() == length as AllPassBranchMask is incorrect when the selection contains duplicates (for example [0, 0, 2]): SelectionVector::Validate currently accepts duplicates, so this branch can incorrectly mark missing rows as selected and evaluate conditional branches on rows that were never in the mask.

Useful? React with 👍 / 👎.

@zanmato1984
Copy link
Copy Markdown
Owner Author

@codex thanks for the review, but could you overall rate this change?

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 91c417c22d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".


ARROW_ASSIGN_OR_RAISE(auto result,
Scatter(dense_result, *batch.selection_vector->data(),
ScatterOptions{/*max_index=*/batch.length - 1}));
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Use unfiltered length when scattering dense fallback results

ScalarExecutor::ExecuteSelectiveDense scatters with ScatterOptions{/*max_index=*/batch.length - 1}, but ExecBatch still documents that length equals the selection-vector length when a selection is present. If callers follow that contract (e.g. one selected row at index 42 with batch.length == 1), the scatter step is given max_index=0 and rejects/incorrectly handles valid selected indices. This makes documented ExecBatch usage fail for non-selective kernels in this new path.

Useful? React with 👍 / 👎.

Comment on lines +127 to +128
if (selection->length() == length) {
return std::make_shared<AllPassBranchMask>(length);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid treating same-length selection vectors as all-pass

BranchMask::FromSelectionVector assumes selection->length() == length means every row is selected and upgrades to AllPassBranchMask, but SelectionVector::Validate currently allows duplicate sorted indices. A vector like [0, 0, 2] for length=3 passes validation yet does not cover all rows, so this shortcut can make conditional special forms evaluate unintended rows and produce incorrect branch results.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant