Generating statistics for columns of type `STRING` or `BINARY`, or for columns not involved in joins, can be time-consuming and increases storage usage.
For large tables, being able to skip stats generation for selected columns is important for improving performance and reducing overhead.
While the `ANALYZE` command lets you specify which columns to collect statistics for, there is currently no way to control this when data is written or updated.
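To illustrate the requested behavior, here is a minimal sketch of write-time stats collection restricted to a per-table column list, similar in spirit to the data-skipping stats settings in the Delta Lake docs referenced below. It is not this project's writer: `collect_stats`, `ColumnStats`, and the `stats_columns` setting are hypothetical names, and only min/max/null-count stats are assumed.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List, Optional, Sequence


@dataclass
class ColumnStats:
    """Per-column stats kept for data skipping (assumed: min, max, null count)."""
    min_value: Any
    max_value: Any
    null_count: int


def collect_stats(
    rows: Sequence[Dict[str, Any]],
    stats_columns: Optional[Iterable[str]] = None,
) -> Dict[str, ColumnStats]:
    """Collect stats for a batch of rows being written.

    stats_columns is a hypothetical per-table setting: None means collect
    stats for every column; otherwise only the listed columns are processed
    and all other columns (e.g. large STRING/BINARY ones) are skipped.
    """
    if not rows:
        return {}
    all_columns: List[str] = list(rows[0].keys())
    if stats_columns is None:
        selected = all_columns
    else:
        wanted = set(stats_columns)  # materialize once; the iterable may be single-use
        selected = [c for c in all_columns if c in wanted]

    stats: Dict[str, ColumnStats] = {}
    for column in selected:
        values = [row.get(column) for row in rows]
        non_null = [v for v in values if v is not None]
        stats[column] = ColumnStats(
            min_value=min(non_null) if non_null else None,
            max_value=max(non_null) if non_null else None,
            null_count=len(values) - len(non_null),
        )
    return stats
```

With `stats_columns=["id"]`, the writer never scans the `name` or `payload` values, which is where the savings for wide `STRING`/`BINARY` columns would come from; omitting the setting keeps today's collect-everything behavior.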
ref: #17057
ref: https://docs.delta.io/latest/optimizations-oss.html#data-skipping