Add table property to disable/enable parquet column statistics #12770#12771
Add table property to disable/enable parquet column statistics #12770#12771huaxingao merged 6 commits intoapache:mainfrom
Conversation
spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/data/TestSparkParquetWriter.java
Outdated
Show resolved
Hide resolved
|
@pvary I uploaded a new patch to address review comments, mind to take another look? Thanks. |
|
@szehon-ho @dramaticlly @aokolnychyi @flyrain Appreciate if you can review the PR, thanks. |
1). Moved unitest to parquet module. 2). Added documentation for new table properties.
8abece2 to
2a60dcf
Compare
dramaticlly
left a comment
There was a problem hiding this comment.
Did my first pass to review, asking for some help from @huaxingao as she did column level bloom filter config in iceberg
|
Hi Folks, I uploaded a new patch which addresses the comments, please take a look. |
szehon-ho
left a comment
There was a problem hiding this comment.
Sorry for late review, mostly fine, leaving a few comments.
…et release with fix is updated in Iceberg release. 2. Addressed the comments to remove fieldIdToParquetPath map, replaced with columnNameToParquetPath map. 3. Added missing code for schema update of the new parquet column statistics enable config.
|
@dramaticlly @huaxingao @pvary @szehon-ho
|
|
Thanks @pvary and @huaxingao. |
szehon-ho
left a comment
There was a problem hiding this comment.
Looks great, thanks @huaxiangsun and @huaxingao ! Thanks for entertaining my suggestion!
dramaticlly
left a comment
There was a problem hiding this comment.
LGTM! Do you plan to follow up with default column stats config once parquet change is adopted in iceberg?
Yeah, created #13035 to track the default config. |
|
@szehon-ho Can you help to merge the patch? Thanks! |
|
Thanks @huaxiangsun for the PR! Thanks @pvary @dramaticlly @szehon-ho for the review! |
Thanks @huaxingao! |
…#12770 (apache#12771) * add table properties to disable/enable parquet column statistics * Address review comments 1). Moved unitest to parquet module. 2). Added documentation for new table properties. * Address more review comments * Addess review comments about documentation * 1. Removed default config for column statistics, wait until the parquet release with fix is updated in Iceberg release. 2. Addressed the comments to remove fieldIdToParquetPath map, replaced with columnNameToParquetPath map. 3. Added missing code for schema update of the new parquet column statistics enable config. * refactor code --------- Co-authored-by: Huaxiang Sun <huaxiang_sun@apple.com>
add table properties to disable/enable parquet column statistics