feat: Add huggingface.LocalHFDataset to kedro-datasets#1373

Open
iwhalen wants to merge 4 commits into kedro-org:main from iwhalen:feat/add-local-hf-dataset

@iwhalen commented Apr 5, 2026

Description

Adds LocalHFDataset for interacting with datasets.Dataset on a filesystem instead of the remote Hugging Face hub.

This has been a big missing feature for me when using huggingface.HFDataset: we can pull datasets from the hub, but we cannot save them to a local file or directory.

Development notes

Added docs and tests, and ran the new dataset in a fresh pipeline.

Iterable and in-memory versions have both been tested as well.

Note

I couldn't figure out a good way to save an IterableDataset without looping through it entirely first.

Maybe there's a better way someone knows about.
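
One possible direction, sketched here with plain Python iterables rather than the `datasets` API: consume the stream in fixed-size batches and write each batch to its own shard, so only one batch is held in memory at a time. This still iterates over the stream once, but it never materializes the whole thing. The function names (`batched`, `save_in_shards`, `load_shards`), the JSON Lines format, and the shard naming are all assumptions made for illustration; none of them come from this PR or from kedro-datasets.

```python
import json
from itertools import islice
from pathlib import Path
from typing import Any, Iterable, Iterator


def batched(rows: Iterable[dict[str, Any]], size: int) -> Iterator[list[dict[str, Any]]]:
    """Yield lists of at most `size` rows, pulling lazily from `rows`."""
    iterator = iter(rows)
    while batch := list(islice(iterator, size)):
        yield batch


def save_in_shards(
    rows: Iterable[dict[str, Any]], out_dir: Path, shard_size: int = 1000
) -> list[Path]:
    """Write `rows` to JSON Lines shards, one batch per file (hypothetical layout)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, batch in enumerate(batched(rows, shard_size)):
        path = out_dir / f"shard-{index:05d}.jsonl"
        with path.open("w", encoding="utf-8") as handle:
            for row in batch:
                handle.write(json.dumps(row) + "\n")
        paths.append(path)
    return paths


def load_shards(out_dir: Path) -> Iterator[dict[str, Any]]:
    """Lazily read rows back, shard by shard, in the order they were written."""
    for path in sorted(out_dir.glob("shard-*.jsonl")):
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                yield json.loads(line)
```

Passing a generator to `save_in_shards` keeps peak memory proportional to `shard_size`, not to the length of the stream. Whether the same pattern maps cleanly onto `IterableDataset` has not been checked here.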

Updated jsonschema/kedro-catalog.1.00.json.

Developer Certificate of Origin

We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a Signed-off-by line in the commit message. See our wiki for guidance.

If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.

Checklist

  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Updated jsonschema/kedro-catalog-X.XX.json if necessary
  • Added a description of this change in the relevant RELEASE.md file
  • Added tests to cover my changes
  • Received approvals from at least half of the TSC (required for adding a new, non-experimental dataset)

iwhalen added 3 commits April 5, 2026 14:51
Signed-off-by: iwhalen <ianpatrickwhalen@gmail.com>
Signed-off-by: iwhalen <ianpatrickwhalen@gmail.com>
Signed-off-by: iwhalen <ianpatrickwhalen@gmail.com>
@iwhalen changed the title from "Feat/add local hf dataset" to "feat: Add huggingface.LocalHFDataset to kedro-datasets" Apr 5, 2026
@iwhalen marked this pull request as ready for review April 5, 2026 20:48