[HUDI-2909] Handle logical type in TimestampBasedKeyGenerator#4203
[HUDI-2909] Handle logical type in TimestampBasedKeyGenerator#4203nsivabalan merged 5 commits intoapache:masterfrom
Conversation
|
@codope : is this good to review. If there are any pending work, I can take it up as I am focusing on all issues and jiras. let me know. |
e12df0f to
5b68cad
Compare
@nsivabalan It is now ready to review. Both row-writer and non-row writer path will emit same value for logical timestamp type column if the config is enabled. |
nsivabalan
left a comment
There was a problem hiding this comment.
Let's see if we can avoid changing public apis in KeyGenerator.
| } | ||
|
|
||
| public boolean isConsistentLogicalTimestampEnabled() { | ||
| if (getBoolean(KeyGeneratorOptions.KEYGENERATOR_CONSISTENT_LOGICAL_TIMESTAMP_ENABLED) == null) { |
There was a problem hiding this comment.
wouldn't getBoolean return the default if not found?
|
@nsivabalan @codope |
@YannByron You bring up a good point. Adding another config in future is tedious. However, the intention behind adding a new config was to avoid discrepancy in existing pipelines. @nsivabalan has explained this in more detail on the jira HUDI-2909. I do not expect such changes to be frequent. Nevertheless, i'll try to avoid making incompatible changes to public APIs. |
5b68cad to
2c0565c
Compare
|
@codope : TestGenericRecordAndRowConsistency test failed in CI. Can you please check. |
fde98bb to
cb2d16c
Compare
|
@hudi-bot run azure |
Add config to enable consistent logical timestamp values Address review comments Add an example in the config doc Cleanup spark context in test
cb2d16c to
9c22d6c
Compare
* [HUDI-2909] Handle logical type in TimestampBasedKeyGenerator Timestampbased key generator was returning diff values for row writer and non row writer path. this patch fixes it and is guarded by a config flag (`hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled`)
| */ | ||
| public static String getNestedFieldValAsString(GenericRecord record, String fieldName, boolean returnNullIfNotFound) { | ||
| Object obj = getNestedFieldVal(record, fieldName, returnNullIfNotFound); | ||
| public static String getNestedFieldValAsString(GenericRecord record, String fieldName, boolean returnNullIfNotFound, boolean consistentLogicalTimestampEnabled) { |
There was a problem hiding this comment.
@codope @nsivabalan folks, i don't believe this is the right fix to address the problem: bubbling down this config value we're now exposing every user of this API to be aware of it, which, very likely, will have very little to do with it.
…#4203) * [HUDI-2909] Handle logical type in TimestampBasedKeyGenerator Timestampbased key generator was returning diff values for row writer and non row writer path. this patch fixes it and is guarded by a config flag (`hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled`)
…#4203) * [HUDI-2909] Handle logical type in TimestampBasedKeyGenerator Timestampbased key generator was returning diff values for row writer and non row writer path. this patch fixes it and is guarded by a config flag (`hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled`)
…#4203) * [HUDI-2909] Handle logical type in TimestampBasedKeyGenerator Timestampbased key generator was returning diff values for row writer and non row writer path. this patch fixes it and is guarded by a config flag (`hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled`)
What is the purpose of the pull request
#3944 handled logical timestamp type for complex keygen. This PR fixes HUDI-2909 by handing it for timestamp based keygen.
When the new config
hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabledis set to true, consistent value will be generated for a logical timestamp type column like timestamp-millis and timestamp-micros, irrespective of whether row-writer is enabled. Disabled by default so as not to break the pipeline that deploy either fully row-writer path or non row-writer path. For example, if it is kept disabled then record key of timestamp type with value2016-12-29 09:54:00will be written as timestamp2016-12-29 09:54:00.0in row-writer path, while it will be written as long value1483023240000000in non row-writer path. If enabled, then the timestamp value will be written in both the cases.Brief change log
(for example:)
Verify this pull request
Added UT and manually verified with spark datasource.
Committer checklist
Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.