Description
Cloudberry Database version
No response
What happened
We have already enabled a parallel-oblivious Parallel Hash Left Anti Semi (Not-In) Join for NOT IN queries in CBDB.
However, because it is parallel-oblivious, the inner table is processed redundantly, so it only works well when the inner table is relatively small.
When both sides hold a lot of data, the parallel-oblivious path is dropped and the planner tends to choose a non-parallel LASJ instead.
We need to implement a parallel-aware Hash Left Anti Semi (Not-In) Join so that the join runs in parallel against a shared hash table.
However, this is not easy, because it is hard to tell which batch the NULL values fall into. We must refactor the parallel HashJoin so that it still returns the right results.
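To illustrate why NULL values are a problem once the join is split into batches, here is a minimal Python sketch. It is not CBDB code: the function names, the `batch_of` assignment (NULL keys parked in batch 0) and the two shared flags are all assumptions made up for this illustration. It only models the SQL semantics of `x NOT IN (subquery)`, where a single NULL on the inner side means no outer row qualifies, whichever batch that NULL landed in.

```python
from typing import List, Optional

Key = Optional[int]  # None stands in for SQL NULL


def not_in_reference(outer: List[Key], inner: List[Key]) -> List[Key]:
    """SQL NOT IN semantics evaluated over the whole inner side at once."""
    if not inner:
        return list(outer)          # empty subquery: every outer row qualifies
    if any(v is None for v in inner):
        return []                   # any inner NULL: no outer row qualifies
    keys = set(inner)
    return [v for v in outer if v is not None and v not in keys]


def batch_of(value: Key, nbatch: int) -> int:
    # Hypothetical batch assignment: NULL has no useful hash, so park it in batch 0.
    return 0 if value is None else hash(value) % nbatch


def not_in_naive_batched(outer: List[Key], inner: List[Key], nbatch: int) -> List[Key]:
    """Wrong on purpose: each batch decides using only its own inner partition."""
    parts: List[List[Key]] = [[] for _ in range(nbatch)]
    for v in inner:
        parts[batch_of(v, nbatch)].append(v)
    result: List[Key] = []
    for v in outer:
        result.extend(not_in_reference([v], parts[batch_of(v, nbatch)]))
    return result


def not_in_batched_with_shared_flags(outer: List[Key], inner: List[Key], nbatch: int) -> List[Key]:
    """The build phase records two global facts that every batch can consult."""
    parts = [set() for _ in range(nbatch)]
    inner_seen = False       # assumed shared flag: inner side is non-empty
    inner_has_null = False   # assumed shared flag: inner side contains a NULL
    for v in inner:
        inner_seen = True
        if v is None:
            inner_has_null = True
        else:
            parts[batch_of(v, nbatch)].add(v)
    if not inner_seen:
        return list(outer)
    if inner_has_null:
        return []
    return [v for v in outer
            if v is not None and v not in parts[batch_of(v, nbatch)]]


if __name__ == "__main__":
    outer = [1, 2, 3, 4, None]
    inner = [2, None]
    print(not_in_reference(outer, inner))
    print(not_in_naive_batched(outer, inner, 4))
    print(not_in_batched_with_shared_flags(outer, inner, 4))
```

In this sketch the naive per-batch version wrongly returns rows from batches that never saw the inner NULL, while the version with shared flags agrees with the reference. A real parallel-aware implementation would need some equivalent of that cross-batch knowledge in the shared hash table state; how the branch does this is not shown here.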
With this feature, all kinds of NOT IN queries will benefit a lot, no matter what the inner side is.
I am working on a branch for this.
What you think should happen instead
No response
How to reproduce
Run a NOT IN query against large tables.
Operating System
Ubuntu
Anything else
No response
Are you willing to submit PR?
- Yes, I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct.