Closed
Description
Error thrown when loading penguins dataset:
ValueError: Categorical categories cannot be null
Steps/Code to Reproduce
import openml
openml.datasets.get_dataset('penguins')
Expected Results
Dataset loads without issue.
Actual Results
>>> openml.datasets.get_dataset('penguins')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/suryak/Projects/sandbox/mlgauge/env/lib/python3.8/site-packages/openml/datasets/functions.py", line 519, in get_dataset
    dataset = _create_dataset_from_description(
  File "/home/suryak/Projects/sandbox/mlgauge/env/lib/python3.8/site-packages/openml/datasets/functions.py", line 1132, in _create_dataset_from_description
    return OpenMLDataset(
  File "/home/suryak/Projects/sandbox/mlgauge/env/lib/python3.8/site-packages/openml/datasets/dataset.py", line 241, in __init__
    ) = self._create_pickle_in_cache(data_file)
  File "/home/suryak/Projects/sandbox/mlgauge/env/lib/python3.8/site-packages/openml/datasets/dataset.py", line 526, in _create_pickle_in_cache
    X, categorical, attribute_names = self._parse_data_from_arff(data_file)
  File "/home/suryak/Projects/sandbox/mlgauge/env/lib/python3.8/site-packages/openml/datasets/dataset.py", line 457, in _parse_data_from_arff
    self._unpack_categories(X[column_name], categories_names[column_name])
  File "/home/suryak/Projects/sandbox/mlgauge/env/lib/python3.8/site-packages/openml/datasets/dataset.py", line 686, in _unpack_categories
    raw_cat = pd.Categorical(col, ordered=True, categories=categories)
  File "/home/suryak/Projects/sandbox/mlgauge/env/lib/python3.8/site-packages/pandas/core/arrays/categorical.py", line 304, in __init__
    dtype = CategoricalDtype._from_values_or_dtype(
  File "/home/suryak/Projects/sandbox/mlgauge/env/lib/python3.8/site-packages/pandas/core/dtypes/dtypes.py", line 273, in _from_values_or_dtype
    dtype = CategoricalDtype(categories, ordered)
  File "/home/suryak/Projects/sandbox/mlgauge/env/lib/python3.8/site-packages/pandas/core/dtypes/dtypes.py", line 160, in __init__
    self._finalize(categories, ordered, fastpath=False)
  File "/home/suryak/Projects/sandbox/mlgauge/env/lib/python3.8/site-packages/pandas/core/dtypes/dtypes.py", line 314, in _finalize
    categories = self.validate_categories(categories, fastpath=fastpath)
  File "/home/suryak/Projects/sandbox/mlgauge/env/lib/python3.8/site-packages/pandas/core/dtypes/dtypes.py", line 508, in validate_categories
    raise ValueError("Categorical categories cannot be null")
ValueError: Categorical categories cannot be null
Versions
Linux-5.9.16-1-MANJARO-x86_64-with-glibc2.10
Python 3.8.5 (default, Sep 4 2020, 07:30:14)
[GCC 7.3.0]
Pandas 1.2.2
NumPy 1.20.1
SciPy 1.6.0
Scikit-Learn 0.24.1
OpenML 0.11.0
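
Additional context
The last frames of the traceback are inside pandas, so the error can probably be reproduced without OpenML. The sketch below assumes the category list passed to pd.Categorical contains a null entry. That is a guess about what the penguins ARFF file declares, not something confirmed here. The species values are illustrative placeholders, and build_categorical with its drop_null_categories flag is a hypothetical helper, not part of openml-python.

```python
import pandas as pd


def build_categorical(values, categories, drop_null_categories=False):
    """Hypothetical helper mirroring the pd.Categorical call in the traceback."""
    if drop_null_categories:
        # Assumed workaround: remove None/NaN from the declared categories.
        categories = [c for c in categories if not pd.isna(c)]
    return pd.Categorical(values, ordered=True, categories=categories)


# Placeholder data: a column with a missing value and a category list
# that (by assumption) also lists the missing value as a category.
values = ["Adelie", "Gentoo", None]
categories = ["Adelie", "Gentoo", None]

try:
    build_categorical(values, categories)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)

# With null categories filtered out, the same column builds cleanly and the
# missing value is kept as a missing entry rather than a category.
cleaned = build_categorical(values, categories, drop_null_categories=True)
print(list(cleaned.categories))
```

If this matches what happens for penguins, filtering null categories before the pd.Categorical call might be one way to avoid the error, although whether that is the right fix depends on how the dataset is meant to encode missing values.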