Skip to content

GH-35785: [C++] Support null type non-key columns for join operation#38383

Closed
llama90 wants to merge 4 commits intoapache:mainfrom
llama90:feature/support-null-type
Closed

GH-35785: [C++] Support null type non-key columns for join operation#38383
llama90 wants to merge 4 commits intoapache:mainfrom
llama90:feature/support-null-type

Conversation

@llama90
Copy link
Copy Markdown
Contributor

@llama90 llama90 commented Oct 21, 2023

Rationale for this change

Supports join on null types for non-key columns.

What changes are included in this PR?

  • Separated the check of supported types for key and non-key columns during join execution
  • Updated the logic associated with Null types for non-key columns.
  • Modified the random tests for Hash Join (TEST(HashJoin, Random)) to allow the creation of Null as non-key column values.

Are these changes tested?

Yes

Are there any user-facing changes?

For users utilizing the Join operation

@llama90 llama90 requested a review from westonpace as a code owner October 21, 2023 12:23
@llama90 llama90 changed the title GH-35785: [C++] Support null type columns for join operation GH-35785: [C++] Support null type non-key columns for join operation Oct 21, 2023
@github-actions github-actions bot added the awaiting review Awaiting review label Oct 21, 2023
@github-actions
Copy link
Copy Markdown

⚠️ GitHub issue #35785 has been automatically assigned in GitHub to PR creator.

@llama90
Copy link
Copy Markdown
Contributor Author

llama90 commented Oct 21, 2023

There were policy considerations and a personal lack of understanding regarding the code when it came to supporting Null types for key field columns.

Therefore, although there was an intent to support key columns as well, the initial improvements were made to enable support for non-key columns.

    // Constraints
    RandomDataTypeConstraints key_type_constraints;
    key_type_constraints.Default();
    key_type_constraints.withoutNullColumn();

In the code, when the // key_type_constraints.withoutNullColumn(); part is commented out, the following error can be encountered.

error message

Test 55: RIGHT_OUTER EQ_IS parallel = true bloom_filter = false
/Users/lama/workspace/arrow-2/cpp/src/arrow/acero/hash_join_node_test.cc:1231: Failure
Failed
'_error_or_value45.status()' failed with NotImplemented: Function 'equal' has no kernel matching input types (null, null)
/Users/lama/workspace/arrow-2/cpp/src/arrow/compute/expression.cc:551 call.function->DispatchBest(&types)
/Users/lama/workspace/arrow-2/cpp/src/arrow/acero/hash_join_node.cc:386 filter.Bind(filter_schema, exec_context)
/Users/lama/workspace/arrow-2/cpp/src/arrow/acero/hash_join_node.cc:737 schema_mgr->BindFilter(join_options.filter, left_schema, right_schema, plan->query_context()->exec_context())
/Users/lama/workspace/arrow-2/cpp/src/arrow/acero/exec_plan.cc:587 MakeExecNode(this->factory_name, plan, std::move(inputs), *this->options, registry)
/Users/lama/workspace/arrow-2/cpp/src/arrow/acero/exec_plan.cc:582 std::get(input).AddToPlan(plan, registry)
/Users/lama/workspace/arrow-2/cpp/src/arrow/acero/exec_plan.cc:686 with_sink.AddToPlan(exec_plan.get())
/Users/lama/workspace/arrow-2/cpp/src/arrow/acero/hash_join_node_test.cc:945 DeclarationToExecBatches(std::move(join), parallel)
Google Test trace:
/Users/lama/workspace/arrow-2/cpp/src/arrow/acero/hash_join_node_test.cc:1120: RIGHT_OUTER EQ_IS parallel = true bloom_filter = false

It mentions that an "equal" operator for null, null is needed, which seems to require further investigation (however, I'm actually not sure why a join operation on null key columns would be necessary)

@github-actions
Copy link
Copy Markdown

⚠️ GitHub issue #35785 has been automatically assigned in GitHub to PR creator.

@github-actions github-actions bot added the Status: stale-warning Issues and PRs flagged as stale which are due to be closed if no indication otherwise label Nov 18, 2025
@thisisnic
Copy link
Copy Markdown
Member

Thank you for your contribution. Unfortunately, this
pull request has been marked as stale because it has had no activity in the past 365 days. Please remove the stale label
or comment below, or this PR will be closed in 14 days. Feel free to re-open this if it has been closed in error. If you
do not have repository permissions to reopen the PR, please tag a maintainer.

@llama90
Copy link
Copy Markdown
Contributor Author

llama90 commented Nov 30, 2025

Hello, I don't have enough time to work on this issue/PR at the moment.
So I will close this PR. If anyone is interested, please feel free to continue the work.
Thank you.

@llama90 llama90 closed this Nov 30, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

awaiting review Awaiting review Status: stale-warning Issues and PRs flagged as stale which are due to be closed if no indication otherwise

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Python] Support joinining tables with null columns

2 participants