Skip to content

Add table property to disable/enable parquet column statistics #12770#12771

Merged
huaxingao merged 6 commits intoapache:mainfrom
huaxiangsun:column-stats-config
May 12, 2025
Merged

Add table property to disable/enable parquet column statistics #12770#12771
huaxingao merged 6 commits intoapache:mainfrom
huaxiangsun:column-stats-config

Conversation

@huaxiangsun
Copy link
Copy Markdown
Contributor

add table properties to disable/enable parquet column statistics

@github-actions github-actions bot added the docs label Apr 15, 2025
@huaxiangsun
Copy link
Copy Markdown
Contributor Author

@pvary I uploaded a new patch to address review comments, mind to take another look? Thanks.

@huaxiangsun
Copy link
Copy Markdown
Contributor Author

@szehon-ho @dramaticlly @aokolnychyi @flyrain Appreciate if you can review the PR, thanks.

   1). Moved unitest to parquet module.
   2). Added documentation for new table properties.
@huaxiangsun huaxiangsun force-pushed the column-stats-config branch from 8abece2 to 2a60dcf Compare April 15, 2025 21:04
Copy link
Copy Markdown
Contributor

@dramaticlly dramaticlly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did my first pass to review, asking for some help from @huaxingao as she did column level bloom filter config in iceberg

@huaxiangsun
Copy link
Copy Markdown
Contributor Author

huaxiangsun commented Apr 22, 2025

Hi Folks, I uploaded a new patch which addresses the comments, please take a look.

Copy link
Copy Markdown
Contributor

@huaxingao huaxingao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Copy Markdown
Member

@szehon-ho szehon-ho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for late review, mostly fine, leaving a few comments.

…et release with fix is updated in Iceberg release.

2. Addressed the comments to remove fieldIdToParquetPath map, replaced with columnNameToParquetPath map.
3. Added missing code for schema update of the new parquet column statistics enable config.
@huaxiangsun
Copy link
Copy Markdown
Contributor Author

@dramaticlly @huaxingao @pvary @szehon-ho
I uploaded a new patch today, which

  1. use column name to parquet column path map.
  2. removed default column stats config as it is pending on parquet release.
  3. Add schema update change which is missing from the initial patch.
    Can you take a look? Thanks

Copy link
Copy Markdown
Contributor

@pvary pvary left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@huaxiangsun
Copy link
Copy Markdown
Contributor Author

Thanks @pvary and @huaxingao.
@dramaticlly and @szehon-ho can you help to review the latest patch?

Copy link
Copy Markdown
Member

@szehon-ho szehon-ho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks @huaxiangsun and @huaxingao ! Thanks for entertaining my suggestion!

Copy link
Copy Markdown
Contributor

@dramaticlly dramaticlly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Do you plan to follow up with default column stats config once parquet change is adopted in iceberg?

@huaxiangsun
Copy link
Copy Markdown
Contributor Author

LGTM! Do you plan to follow up with default column stats config once parquet change is adopted in iceberg?

Yeah, created #13035 to track the default config.

@huaxiangsun
Copy link
Copy Markdown
Contributor Author

@szehon-ho Can you help to merge the patch? Thanks!

@huaxingao huaxingao merged commit d04be9b into apache:main May 12, 2025
42 checks passed
@huaxingao
Copy link
Copy Markdown
Contributor

Thanks @huaxiangsun for the PR! Thanks @pvary @dramaticlly @szehon-ho for the review!

@huaxiangsun
Copy link
Copy Markdown
Contributor Author

Thanks @huaxiangsun for the PR! Thanks @pvary @dramaticlly @szehon-ho for the review!

Thanks @huaxingao!

devendra-nr pushed a commit to devendra-nr/iceberg that referenced this pull request Dec 8, 2025
…#12770 (apache#12771)

* add table properties to disable/enable parquet column statistics

* Address review comments
   1). Moved unitest to parquet module.
   2). Added documentation for new table properties.

* Address more review comments

* Addess review comments about documentation

* 1. Removed default config for column statistics, wait until the parquet release with fix is updated in Iceberg release.
2. Addressed the comments to remove fieldIdToParquetPath map, replaced with columnNameToParquetPath map.
3. Added missing code for schema update of the new parquet column statistics enable config.

* refactor code

---------

Co-authored-by: Huaxiang Sun <huaxiang_sun@apple.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants