typing: accept sequences for Dataset file loaders#8067

Open
biefan wants to merge 1 commit into huggingface:main from biefan:typing/use-sequence-for-path-or-paths-5354

biefan commented Mar 14, 2026

Summary

  • update Dataset.from_csv, from_json, from_parquet, and from_text type hints from list[PathLike] to Sequence[PathLike]
  • normalize non-string sequences to lists before passing paths to readers
  • add path-type tests to cover tuple input for all four loaders
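
The normalization step in the second bullet could look roughly like the sketch below. This is an illustration only, not the code in this PR: `normalize_paths` is a hypothetical helper name, and the local `PathLike` alias is an assumption about how the library defines its path type. The point it shows is that `str` and `bytes` are themselves sequences, so a single path must be checked for first and passed through untouched.

```python
import os
from collections.abc import Sequence
from pathlib import Path
from typing import Union

# Assumed alias for illustration; the library's own PathLike may differ.
PathLike = Union[str, bytes, os.PathLike]


def normalize_paths(
    path_or_paths: Union[PathLike, Sequence[PathLike]],
) -> Union[PathLike, list[PathLike]]:
    """Hypothetical helper: return a single path unchanged, or copy a sequence into a list."""
    # str and bytes are sequences too, so handle single paths before the generic case.
    if isinstance(path_or_paths, (str, bytes, os.PathLike)):
        return path_or_paths
    # Tuples and other non-string sequences become plain lists for the readers.
    return list(path_or_paths)


single = normalize_paths("train.csv")
from_tuple = normalize_paths(("train.csv", Path("extra.csv")))
```

With this shape, readers that already expect either one path or a list of paths need no further changes.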

Why

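Static type checkers treat list as invariant in its element type, so a list[str] argument can be rejected where list[PathLike] is expected, even though str itself is valid path input. Sequence is covariant, so Sequence[PathLike] accepts list[str], tuples, and other read-only sequences, which better matches real usage.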
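
A minimal sketch of the variance issue, for readers who want to reproduce it with a type checker such as mypy or pyright. The two function names and the local `PathLike` alias are hypothetical stand-ins, not the library's actual signatures. Both calls succeed at runtime; only the static check differs.

```python
import os
from collections.abc import Sequence
from typing import Union

# Assumed alias for illustration; the library's own PathLike may differ.
PathLike = Union[str, bytes, os.PathLike]


def load_from_list(paths: list[PathLike]) -> int:
    """Stand-in for a loader annotated with list[PathLike]."""
    return len(paths)


def load_from_sequence(paths: Sequence[PathLike]) -> int:
    """Stand-in for a loader annotated with Sequence[PathLike]."""
    return len(paths)


csv_files: list[str] = ["a.csv", "b.csv"]

# A type checker is expected to flag this call: list is invariant, so
# list[str] is not a list[PathLike]. It still runs fine.
count_list = load_from_list(csv_files)  # type: ignore[arg-type]

# Accepted by a type checker: Sequence is covariant, and str is part of PathLike.
count_seq = load_from_sequence(csv_files)

# Tuples are sequences too, so they pass the static check as well.
count_tuple = load_from_sequence(("a.csv", "b.csv"))
```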

Validation

  • uv run --python 3.11 --with-editable . --with pytest --with setuptools --with absl-py -m pytest tests/test_arrow_dataset.py::test_dataset_from_csv_path_type tests/test_arrow_dataset.py::test_dataset_from_json_path_type tests/test_arrow_dataset.py::test_dataset_from_parquet_path_type tests/test_arrow_dataset.py::test_dataset_from_text_path_type -q
  • uv run --python 3.11 --with ruff --with setuptools ruff check src/datasets/arrow_dataset.py tests/test_arrow_dataset.py

Fixes #5354

Signed-off-by: biefan <70761325+biefan@users.noreply.github.com>

Development

Successfully merging this pull request may close these issues.

Consider using "Sequence" instead of "List"
