[HUDI-2593][WIP] Enabling virtual keys for the metadata table #3871
manojpec wants to merge 1 commit into apache:master
Conversation
- Meta fields like _hoodie_record_key and _hoodie_commit_time are not needed for the metadata table. Disabling them.
Virtual keys cannot be enabled for the metadata table because the KeyGenerator needed for virtual key generation doesn't differentiate between user data tables and metadata tables; it always looks for the meta fields, which metadata tables don't have.
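To illustrate the limitation described above, here is a minimal, self-contained sketch in plain Java. It is not Hudi's KeyGenerator API: the class `KeyLookupSketch`, both helper methods, the map-based record, and the data field name `key` are all hypothetical, chosen only to show why a lookup that always reads `_hoodie_record_key` finds nothing on a record that does not store that column.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration only; these names are not part of Hudi.
final class KeyLookupSketch {
    // Name of the meta field mentioned in this PR.
    static final String RECORD_KEY_META_FIELD = "_hoodie_record_key";

    // Meta-field lookup: succeeds only if the record physically stores the key column.
    static Optional<String> keyFromMetaField(Map<String, Object> record) {
        return Optional.ofNullable(record.get(RECORD_KEY_META_FIELD)).map(Object::toString);
    }

    // Virtual-key style lookup: derives the key from a regular data field instead.
    static Optional<String> keyFromDataField(Map<String, Object> record, String keyField) {
        return Optional.ofNullable(record.get(keyField)).map(Object::toString);
    }

    public static void main(String[] args) {
        // A record written with meta fields disabled: no _hoodie_record_key column.
        Map<String, Object> metadataRecord = new HashMap<>();
        metadataRecord.put("key", "2021/10/27"); // assumed data field holding the key

        System.out.println(keyFromMetaField(metadataRecord).isPresent());
        System.out.println(keyFromDataField(metadataRecord, "key").orElse("<missing>"));
    }
}
```

Under these assumptions, the meta-field lookup returns an empty result for such a record, while the data-field lookup still recovers the key. A key generator would need the second path for the metadata table.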
@manojpec Can you please give more details on why virtual keys don't work? Is this a limitation of the metadata table schema or of the way virtual key support is implemented? The metadata table records are very small, so the overhead of the Hudi metadata columns is very high. Hence, virtual key support would greatly reduce the size of the metadata table.
Here is the actual reason we punted on it for now.
But the partition paths for the metadata table are hardcoded. Can that be helpful? Removing the fields would save a lot of storage space for the record-level index.
@prashantwason So far we only have |
@prashantwason A WIP PR adding virtual key support for the metadata table is at #3968. Thanks for your patience.
What is the purpose of the pull request
Enable virtual keys for the metadata table. Meta fields like _hoodie_record_key and _hoodie_commit_time are not needed for the metadata table.
Brief change log
The HoodieWriteConfig used for HoodieBackedTableMetadataWriter is now built with the meta fields property disabled.
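As a rough illustration of this change, the sketch below uses plain `java.util.Properties` rather than Hudi's own config builder, so it runs without Hudi on the classpath. The property key `hoodie.populate.meta.fields` is Hudi's table-level switch for meta fields (it defaults to `true`). The class `MetadataWriteConfigSketch` and its methods are hypothetical and do not reflect the actual code in this PR.

```java
import java.util.Properties;

// Hypothetical stand-in for deriving the metadata table's writer config.
final class MetadataWriteConfigSketch {
    // Hudi property that controls whether meta fields are written.
    static final String POPULATE_META_FIELDS = "hoodie.populate.meta.fields";

    // Copy the data table's properties, then turn meta fields off for the metadata table.
    static Properties forMetadataTable(Properties dataTableProps) {
        Properties props = new Properties();
        props.putAll(dataTableProps);
        props.setProperty(POPULATE_META_FIELDS, "false");
        return props;
    }

    // Reads the flag, falling back to "true" when the property is unset.
    static boolean populatesMetaFields(Properties props) {
        return Boolean.parseBoolean(props.getProperty(POPULATE_META_FIELDS, "true"));
    }

    public static void main(String[] args) {
        Properties dataTableProps = new Properties(); // flag unset: meta fields on
        Properties metadataProps = forMetadataTable(dataTableProps);

        System.out.println(populatesMetaFields(dataTableProps));
        System.out.println(populatesMetaFields(metadataProps));
    }
}
```

The data table's own properties are left untouched, so only the derived metadata table config has meta fields disabled.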
Verify this pull request
(Please pick either of the following options)
This pull request is a trivial rework / code cleanup without any test coverage.
(or)
This pull request is already covered by existing tests, such as (please describe tests).
(or)
This change added tests and can be verified as follows:
(example:)
Committer checklist
Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.