[SUPPORT] CoW: Hudi Upsert not working when there is a timestamp field in the composite key  #10303

@srinikandi

Hi, we have been facing an issue where Hudi upserts convert a timestamp field that is part of the composite primary key.
The bulk insert on the table works fine and stores the timestamp in a proper timestamp format. But when the same table goes through an upsert operation (Type 2 SCD), the timestamp value inside the _hoodie_record_key of the newly inserted row is converted to epoch. The actual attribute in the table still holds the data in proper timestamp format. This breaks the Type 2 SCD we are trying to achieve, because all subsequent updates are treated as new records.
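
To make the symptom concrete, here is a minimal pure-Python sketch of why a formatted timestamp and an epoch value in the record key never match. It does not call Hudi. The `field:value,field:value` key layout mirrors what a complex key generator produces; the column names `customer_id` and `effective_ts`, the sample values, and the microsecond epoch unit are assumptions for illustration only.

```python
from datetime import datetime, timezone


def complex_key(fields):
    """Build a composite key string in 'name:value,name:value' form."""
    return ",".join(f"{name}:{value}" for name, value in fields.items())


# Hypothetical timestamp that is part of the composite key.
ts = datetime(2023, 1, 1, 12, 0, 0, tzinfo=timezone.utc)

# Key as written by the first load: timestamp rendered as a formatted string.
key_bulk_insert = complex_key(
    {"customer_id": 42, "effective_ts": ts.strftime("%Y-%m-%d %H:%M:%S")}
)

# Key as written by the later upsert: same instant, rendered as an epoch number
# (microseconds are assumed here; the exact unit is not stated in the report).
epoch_micros = int(ts.timestamp()) * 1_000_000
key_upsert = complex_key({"customer_id": 42, "effective_ts": epoch_micros})

# An upsert matches on the key string, so the lookup misses and the row
# is treated as a brand-new record instead of an update.
existing_keys = {key_bulk_insert}
is_update = key_upsert in existing_keys

print(key_bulk_insert)
print(key_upsert)
print(is_update)
```

Both keys describe the same logical row, but because the comparison is on the string form of the key, the second write cannot find the first.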

Steps to reproduce the behavior:

  1. Created a CoW table using bulk_insert, with a timestamp field as part of the complex primary key
  2. Performed upserts on the same table; the timestamp field value in the record key is converted to an INT
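
The two steps above might look roughly like the following PySpark sketch. It is not the original job: the table name, the column names (`customer_id`, `effective_ts`, `updated_at`), and the helper functions are hypothetical, and the write options are the standard Hudi datasource keys as I understand them. The functions only need a Spark DataFrame with those columns, so nothing here requires Spark to be imported.

```python
# Hypothetical names used throughout; adjust to the real table.
TABLE_NAME = "scd2_demo"
RECORD_KEY_FIELDS = "customer_id,effective_ts"  # effective_ts is a timestamp column


def hudi_options(operation):
    """Return Hudi write options for the given operation (bulk_insert or upsert)."""
    return {
        "hoodie.table.name": TABLE_NAME,
        "hoodie.datasource.write.table.type": "COPY_ON_WRITE",
        "hoodie.datasource.write.operation": operation,
        "hoodie.datasource.write.recordkey.field": RECORD_KEY_FIELDS,
        "hoodie.datasource.write.keygenerator.class": "org.apache.hudi.keygen.ComplexKeyGenerator",
        "hoodie.datasource.write.precombine.field": "updated_at",
    }


def write_hudi(df, base_path, operation, mode):
    """Write a Spark DataFrame to a Hudi table at base_path."""
    df.write.format("hudi").options(**hudi_options(operation)).mode(mode).save(base_path)


def reproduce(initial_df, changes_df, base_path):
    """Step 1: bulk_insert the initial load. Step 2: upsert changes into the same table."""
    write_hudi(initial_df, base_path, "bulk_insert", "overwrite")
    write_hudi(changes_df, base_path, "upsert", "append")
```

After running both steps, selecting _hoodie_record_key alongside the timestamp column from the table should show whether rows from the two writes carry the timestamp in different forms.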

We are using Glue with Hudi 0.12.1

  • Hudi version : 0.12.1

  • Spark version : 3.3

  • Hive version :

  • Hadoop version :

  • Storage (HDFS/S3/GCS..) : S3

  • Running on Docker? (yes/no) : No

Additional context

An issue about this was opened about 2 years ago, and it was closed with no resolution mentioned.
#3313

Labels: area:writer, priority:critical, status:triaged
