Description
Cloudberry Database version
No response
What happened
When enable_parallel is off, we insert into only one AO segfile, even if gp_appendonly_insert_files is greater than 1.
Consider this case: a user sets enable_parallel to on, inserts some data, runs queries, and then resets it to off.
From then on, every insert goes into a single segfile, so a large later load or an online streaming ETL job piles all its data into that one file and skews the data.
That segfile then becomes a bottleneck for our parallel plans.
We should revert this behavior and insert into multiple segfiles according to gp_appendonly_insert_files, regardless of enable_parallel.
In general, we should spread inserts across as many AO segfiles as gp_appendonly_insert_files allows and avoid data skew for users, whether or not they use parallel.
Keeping the default value of gp_appendonly_insert_files at 4 is sufficient.
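To make the skew concrete, here is a toy Python model of the current and proposed policies. It is a sketch under stated assumptions, not Cloudberry's code: the helper names, the round-robin assignment, and the 100-row `CHUNK_ROWS` switch size are all hypothetical. Only the dependence on enable_parallel and gp_appendonly_insert_files comes from this issue.

```python
from collections import Counter

# Hypothetical model of AO segfile selection; NOT Cloudberry's implementation.
# Assumption: an insert allowed to use N segfiles assigns rows round-robin,
# switching to the next segfile every CHUNK_ROWS rows.
CHUNK_ROWS = 100  # hypothetical switch size


def segfiles_for_insert(enable_parallel: bool, insert_files: int, proposed: bool) -> int:
    """Number of segfiles one insert may spread its rows across."""
    if proposed or enable_parallel:
        # Proposed: always honor gp_appendonly_insert_files.
        return max(1, insert_files)
    # Current behavior with enable_parallel = off: a single segfile.
    return 1


def insert_rows(layout: Counter, rows: int, n_files: int) -> None:
    """Append `rows` rows, moving to the next segfile every CHUNK_ROWS rows."""
    chunk = 0
    while rows > 0:
        take = min(CHUNK_ROWS, rows)
        layout[chunk % n_files] += take
        rows -= take
        chunk += 1


def run(proposed: bool, insert_files: int = 4) -> Counter:
    """Rows per segfile after the scenario described in this issue."""
    layout = Counter()
    # Phase 1: enable_parallel = on, a modest load.
    insert_rows(layout, 800, segfiles_for_insert(True, insert_files, proposed))
    # Phase 2: enable_parallel reset to off, then a large load or streaming ETL.
    insert_rows(layout, 8_000, segfiles_for_insert(False, insert_files, proposed))
    return layout


if __name__ == "__main__":
    for proposed in (False, True):
        label = "proposed" if proposed else "current"
        print(label, dict(sorted(run(proposed).items())))
```

Under these assumptions, the current policy leaves segfile 0 with 8,200 rows and the other three with 200 each, while the proposed policy gives each of the four segfiles 2,200 rows.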
What you think should happen instead
No response
How to reproduce
Test cases still need to be created.
Operating System
Ubuntu
Anything else
With this fix, to make the regression tests pass, we need to set the GUC gp_appendonly_insert_files = 0 when deploying CBDB in the CI pipeline. Need help from @sandiandian.
Are you willing to submit PR?
- Yes, I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct.