Croissant tag missing in some Croissant supported datasets #3135

@fylux

For example, the following dataset:

https://huggingface.co/datasets/allenai/c4

It lacks a Croissant tag, not just in the UI but also when filtering by "library:mlcroissant" with the API. However, the Croissant file is available through the API:

https://huggingface.co/api/datasets/allenai/c4/croissant

Among the 15k most downloaded HF datasets, around 4k lack this tag. In some cases this might be justified by a faulty DatasetInfo, but that's not always the case, as allenai/c4 shows.
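A rough sketch of how the mismatch can be checked for a single dataset, in case it helps with reproducing. It assumes that `https://huggingface.co/api/datasets/<repo_id>` returns JSON with a `tags` list, and that a missing Croissant file shows up as an HTTP error on the `/croissant` endpoint. The helper names (`fetch_json`, `has_croissant_tag`, `is_mismatch`) are made up for illustration and are not part of any library.

```python
import json
import urllib.error
import urllib.request

API = "https://huggingface.co/api/datasets"
CROISSANT_TAG = "library:mlcroissant"  # the filter value quoted in this issue


def fetch_json(url, timeout=30):
    """Return the decoded JSON body, or None if the server answers with an HTTP error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp)
    except urllib.error.HTTPError:
        return None


def has_croissant_tag(info):
    """True if the dataset info carries the Croissant tag.

    Assumption: the info JSON exposes a "tags" list of strings.
    """
    return CROISSANT_TAG in ((info or {}).get("tags") or [])


def is_mismatch(repo_id, fetch=fetch_json):
    """True if a Croissant file is served for repo_id but the tag is absent."""
    croissant = fetch(f"{API}/{repo_id}/croissant")
    info = fetch(f"{API}/{repo_id}")
    return croissant is not None and not has_croissant_tag(info)
```

Calling `is_mismatch("allenai/c4")` performs two HTTP requests. The `fetch` parameter exists only so the logic can be exercised without network access.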

fyi @lhoestq