[Bug] Enable AO/AOCO insert to multiple files even enable_parallel is off #39

@avamingli

Description

Cloudberry Database version

No response

What happened

When enable_parallel is off, we insert into only one AO segfile, even if gp_appendonly_insert_files is greater than 1.

Consider this case: a user sets enable_parallel to on, inserts some data, runs queries, and then resets it to off.

This causes data skew once enable_parallel is off and a lot of data is inserted afterwards, for example by an online streaming ETL job: all of the new data goes into a single segfile.

That skew then becomes a bottleneck for our parallel plans.

We should revert this behavior and insert into multiple files according to gp_appendonly_insert_files, regardless of the enable_parallel setting.

In general, we should try to use as many AO segfiles as gp_appendonly_insert_files allows and avoid data skew for users, whether or not they use parallel plans.

Keeping the default value of gp_appendonly_insert_files at 4 should be enough.
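
To make the skew concrete, here is a small toy model in Python. It is only a sketch under stated assumptions and is not Cloudberry code: the function names, the `ignore_parallel_flag` switch, and the round-robin choice of segfile are all hypothetical and chosen purely for illustration. It compares inserting with enable_parallel off under the current behavior against the behavior proposed here.

```python
from collections import Counter


def choose_segfile(batch_no, insert_files, enable_parallel, ignore_parallel_flag):
    """Pick a segfile number for one insert batch (toy model, not the real logic).

    ignore_parallel_flag is a hypothetical switch standing in for the proposed
    fix: when True, enable_parallel no longer affects the choice.
    """
    if insert_files <= 1:
        return 1
    if not enable_parallel and not ignore_parallel_flag:
        # Behavior described in this issue: everything lands in one segfile.
        return 1
    # Assumption: spread batches round-robin over the allowed segfiles.
    return batch_no % insert_files + 1


def simulate(batches, insert_files, enable_parallel, ignore_parallel_flag):
    """Return {segfile number: total rows} after applying all insert batches."""
    rows = Counter()
    for batch_no, nrows in enumerate(batches):
        segfile = choose_segfile(
            batch_no, insert_files, enable_parallel, ignore_parallel_flag
        )
        rows[segfile] += nrows
    return dict(rows)


batches = [1000] * 8  # eight insert batches of 1000 rows each

current = simulate(batches, insert_files=4, enable_parallel=False,
                   ignore_parallel_flag=False)
proposed = simulate(batches, insert_files=4, enable_parallel=False,
                    ignore_parallel_flag=True)

print(current)   # {1: 8000}
print(proposed)  # {1: 2000, 2: 2000, 3: 2000, 4: 2000}
```

In this model, a later parallel scan over the first layout has a single 8000-row segfile to read, while the second layout offers four evenly sized segfiles.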

What you think should happen instead

No response

How to reproduce

Test cases still need to be created.

Operating System

Ubuntu

Anything else

Once this is fixed, the regression tests will only pass if the GUC gp_appendonly_insert_files is set to 0 when deploying CBDB in the CI pipeline. This needs help from @sandiandian.

Are you willing to submit PR?

  • Yes, I am willing to submit a PR!

Labels

help wanted (Extra attention is needed), type: Bug (Something isn't working)
