
Add parquet-sampling configuration options #19423

Merged
rapids-bot[bot] merged 8 commits into rapidsai:branch-25.08 from rjzamora:sample-config-options
Jul 20, 2025

Conversation

@rjzamora
Member

Description

Closes #19389

Adds the max_footer_samples and max_row_group_samples configuration options to control metadata/row-group sampling. Although only the streaming executor uses these options, it felt more natural to add them to ParquetOptions, since they are Parquet-specific.
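As a rough illustration of what two such options might look like, here is a minimal sketch. It does not reproduce the cudf-polars implementation: `ParquetSamplingOptions`, `files_to_sample`, the default values, and the strided selection are all hypothetical stand-ins chosen for this example, and only the two option names come from the PR description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParquetSamplingOptions:
    """Hypothetical stand-in for the two sampling fields on ParquetOptions."""

    # Placeholder defaults for illustration; the PR text does not state them.
    max_footer_samples: int = 3
    max_row_group_samples: int = 1

    def __post_init__(self) -> None:
        # Reject non-integer and negative values up front.
        for name in ("max_footer_samples", "max_row_group_samples"):
            value = getattr(self, name)
            if isinstance(value, bool) or not isinstance(value, int):
                raise TypeError(f"{name} must be an int, got {type(value).__name__}")
            if value < 0:
                raise ValueError(f"{name} must be non-negative, got {value}")


def files_to_sample(paths: list[str], options: ParquetSamplingOptions) -> list[str]:
    """Pick at most max_footer_samples paths, evenly strided across the dataset.

    The striding rule is an assumption made for this sketch, not the
    selection logic used by the streaming executor.
    """
    n = options.max_footer_samples
    if n == 0 or not paths:
        return []
    stride = max(1, len(paths) // n)
    return paths[::stride][:n]
```

With `max_footer_samples=2` and five paths, this sketch would inspect the first and third files only. Setting the limit to 0 disables sampling in this sketch.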

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@rjzamora rjzamora self-assigned this Jul 17, 2025
@rjzamora rjzamora requested a review from a team as a code owner July 17, 2025 21:29
@rjzamora rjzamora added the feature request, 2 - In Progress, and non-breaking labels Jul 17, 2025
@rjzamora rjzamora added cudf-polars Issues specific to cudf-polars cudf.polars labels Jul 17, 2025
@github-actions github-actions bot added the Python label Jul 17, 2025
@GPUtester GPUtester moved this to In Progress in cuDF Python Jul 17, 2025
Contributor

@TomAugspurger TomAugspurger left a comment


Thanks. Could you also update test_config_option_from_env in test_config.py to test setting these via environment variables?

@rjzamora
Member Author

> Could you also update test_config_option_from_env in test_config.py to test setting these via environment variables?

Good point - I updated test_parquet_options_from_env (since these are ParquetOptions).
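The kind of check discussed here could look roughly like the sketch below. It is not the actual test_parquet_options_from_env: the `CUDF_POLARS__PARQUET_OPTIONS__...` variable names, the defaults, and the `sampling_options_from_env` helper are assumptions made for illustration and should be verified against the cudf-polars configuration code.

```python
import os

# Assumed naming scheme for the environment variables (not confirmed by this PR).
ENV_PREFIX = "CUDF_POLARS__PARQUET_OPTIONS"

# Placeholder defaults for illustration only.
DEFAULTS = {"max_footer_samples": 3, "max_row_group_samples": 1}


def sampling_options_from_env(environ=None) -> dict:
    """Return both sampling options, letting environment variables override defaults."""
    environ = os.environ if environ is None else environ
    options = {}
    for name, default in DEFAULTS.items():
        raw = environ.get(f"{ENV_PREFIX}__{name.upper()}")
        # Environment values arrive as strings, so convert them to int.
        options[name] = default if raw is None else int(raw)
    return options


def check_env_override() -> None:
    """Hypothetical check: an override is applied, unset options keep defaults."""
    env = {f"{ENV_PREFIX}__MAX_FOOTER_SAMPLES": "10"}
    options = sampling_options_from_env(env)
    assert options["max_footer_samples"] == 10
    assert options["max_row_group_samples"] == DEFAULTS["max_row_group_samples"]
```

Passing the environment as a plain mapping keeps the check independent of the real process environment. Inside a pytest suite, `monkeypatch.setenv` would serve the same purpose against `os.environ`.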

@rjzamora rjzamora added the 5 - Ready to Merge label and removed the 2 - In Progress label Jul 20, 2025
@rjzamora
Member Author

/merge

@rapids-bot rapids-bot bot merged commit 39b0f01 into rapidsai:branch-25.08 Jul 20, 2025
92 checks passed
@github-project-automation github-project-automation bot moved this from In Progress to Done in cuDF Python Jul 20, 2025
@rjzamora rjzamora deleted the sample-config-options branch July 20, 2025 11:55
@rapids-bot
Contributor

rapids-bot bot commented Jul 20, 2025

Failed to merge PR using squash strategy.


Labels

5 - Ready to Merge: Testing and reviews complete, ready to merge
cudf-polars: Issues specific to cudf-polars
feature request: New feature or request
non-breaking: Non-breaking change
Python: Affects Python cuDF API.

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

[FEA] Make max_file_samples and max_rg_samples configurable in cudf-polars

4 participants