Enable Parallel-aware Hash Left Anti Semi (Not-In) Join #145

@avamingli

Description

Cloudberry Database version

No response

What happened

We have already enabled a parallel-oblivious Parallel Hash Left Anti Semi (Not-In) Join for NOT IN queries in CBDB.
Because it is parallel-oblivious, every worker builds its own copy of the inner-side hash table, so the inner table is processed repeatedly and the plan only pays off when the inner table is relatively small.
When both sides hold a lot of data, the parallel-oblivious path is dropped and the planner tends to choose a non-parallel LASJ instead.
We need to implement a parallel-aware Hash Left Anti Semi (Not-In) Join so that workers can join in parallel against a shared hash table.
However, this is not easy, because it is hard to tell which batch the NULL values fall into. We must refactor the parallel HashJoin so that it still returns the right results.
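
To illustrate why NULLs and batches interact badly, here is a minimal Python sketch of a batched NOT IN anti join on a single nullable key. It is not CBDB executor code: the function names, the choice to route NULLs to batch 0, and the idea of a single "inner side has a NULL" flag are all assumptions made for illustration only.

```python
from typing import List, Optional

Key = Optional[int]  # None stands in for SQL NULL


def not_in_reference(outer: List[Key], inner: List[Key]) -> List[Key]:
    """NOT IN semantics evaluated over the whole inputs at once."""
    if not inner:
        return list(outer)  # empty inner side: every outer row qualifies
    if any(v is None for v in inner):
        return []  # one inner NULL means NOT IN is never true
    keys = set(inner)
    # An outer NULL never qualifies against a non-empty inner side.
    return [v for v in outer if v is not None and v not in keys]


def batched_not_in(outer: List[Key], inner: List[Key],
                   nbatch: int, null_aware: bool) -> List[Key]:
    """Anti join that probes only the batch an outer key hashes to."""
    def batch_of(v: Key) -> int:
        # Hypothetical routing: NULLs land in batch 0.
        return 0 if v is None else hash(v) % nbatch

    batches = [set() for _ in range(nbatch)]
    for v in inner:
        batches[batch_of(v)].add(v)

    # Global facts that no single batch can answer on its own.
    inner_empty = not inner
    inner_has_null = any(v is None for v in inner)

    result = []
    for v in outer:
        if null_aware and not inner_empty:
            if inner_has_null or v is None:
                continue  # row can never satisfy NOT IN
        if v not in batches[batch_of(v)]:
            result.append(v)
    return result


outer_rows = [1, 2, 3, None]
inner_rows = [2, None]
naive = batched_not_in(outer_rows, inner_rows, nbatch=4, null_aware=False)
aware = batched_not_in(outer_rows, inner_rows, nbatch=4, null_aware=True)
print(naive, aware, not_in_reference(outer_rows, inner_rows))
```

With `null_aware=False`, outer keys that hash to a batch other than the one holding the inner NULL are emitted even though NOT IN should reject them. The null-aware variant only agrees with the reference because it consults information gathered across all batches, which is the kind of cross-batch knowledge a shared hash table would have to expose to every worker.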

All kinds of NOT IN queries would benefit a lot from this feature, no matter what the inner side is.

I am working on a branch for this.

What you think should happen instead

No response

How to reproduce

Run a NOT IN query against large tables.

Operating System

Ubuntu

Anything else

No response

Are you willing to submit PR?

  • Yes, I am willing to submit a PR!
