[Data] Add include_row_hash to read_parquet #61410

@wingkitlee0

Description

A row hash column would be useful for checkpointing, for example in Ray Data pipelines or Ray Train jobs.

Use case

Ray Data (read-map-write): #59409 requires an existing "id" column. This PR will automatically generate a `row_hash` column, so the dataset no longer has to supply its own.
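
To illustrate the idea, here is a minimal sketch of how a per-row hash can serve as a checkpoint key. It uses only the Python standard library and does not call Ray. The helper names (`row_hash`, `add_row_hash`, `pending_rows`) and the choice of hashing the row contents with SHA-256 are assumptions made for illustration, not the proposed `read_parquet` implementation.

```python
import hashlib
import json


def row_hash(row: dict) -> str:
    """Return a deterministic hex digest of a row's contents (hypothetical helper)."""
    # sort_keys makes the digest independent of column order;
    # default=str stringifies values that JSON cannot encode natively.
    payload = json.dumps(row, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def add_row_hash(rows, column="row_hash"):
    """Attach a hash column to every row, standing in for the requested option."""
    return [{**row, column: row_hash(row)} for row in rows]


def pending_rows(rows, done_hashes, column="row_hash"):
    """Keep only rows whose hash is not in the checkpoint set."""
    return [row for row in rows if row[column] not in done_hashes]


rows = add_row_hash([{"text": "a", "label": 0}, {"text": "b", "label": 1}])

# Pretend the first row was written before the job was interrupted.
done = {rows[0]["row_hash"]}
remaining = pending_rows(rows, done)
```

Hashing only the contents gives identical rows the same hash, so a real implementation might also mix in the file path and row position. That choice is not specified in this issue.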

Ray Train:

Metadata

Assignees

No one assigned

    Labels

    community-backlog
    data: Ray Data-related issues
    enhancement: Request for new feature and/or capability
    train: Ray Train Related Issue
    triage: Needs triage (eg: priority, bug/not-bug, and owning component)
    usability

    Type

    No type

    Projects

    Status

    In progress

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
