Rename "cardinality_factor" configuration to "unique_fraction" #19273
Merged
rapids-bot[bot] merged 4 commits into rapidsai:branch-25.08 on Jul 3, 2025
TomAugspurger approved these changes on Jul 2, 2025
Co-authored-by: Tom Augspurger <tom.augspurger88@gmail.com>
Member (Author): /merge
Contributor: Failed to merge PR using squash strategy.
rapids-bot bot pushed a commit that referenced this pull request on Jul 16, 2025
Probably supersedes #19130

The goal of this PR is to define the classes needed to store column statistics for an `IR` node. Some criteria:

- We need the statistics for a column to contain a reference to the underlying datasource information (e.g. unique-value statistics, row-count, and average storage/file size).
- We want caching for each datasource and column.
- We want the option to perform metadata/data sampling lazily on the datasource.
- We want our Parquet partitioning logic to use the same infrastructure (to avoid redundant sampling).
- We want to record when a specific statistic is "exact" (rather than estimated).

Also related:

- #19258
- #19273

Authors:

- Richard (Rick) Zamora (https://github.com/rjzamora)

Approvers:

- Tom Augspurger (https://github.com/TomAugspurger)
- Matthew Murray (https://github.com/Matt711)

URL: #19276
Description
This PR splits off some of the changes used by the ongoing column-statistics work (e.g. #19130).
"cardinality_factor"to"unique_fraction", because the original name doesn't really make any sense."cardinality_factor", but this PR adds it (just to be safe)._get_unique_fractionsutility to extract the unique-value statistics for a specific subset of columns. This logic is currently repeated several times, and it will be much easier to incorporate sampled statistics (in a follow-up) if the logic is all in one place.Checklist