Error with nulls in categorical columns #1036

@SuryaThiru

Description

Loading the penguins dataset raises an error:
ValueError: Categorical categories cannot be null

Steps/Code to Reproduce

import openml
openml.datasets.get_dataset('penguins')

Expected Results

Dataset loads without issue.

Actual Results

>>> openml.datasets.get_dataset('penguins')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/suryak/Projects/sandbox/mlgauge/env/lib/python3.8/site-packages/openml/datasets/functions.py", line 519, in get_dataset
    dataset = _create_dataset_from_description(
  File "/home/suryak/Projects/sandbox/mlgauge/env/lib/python3.8/site-packages/openml/datasets/functions.py", line 1132, in _create_dataset_from_description
    return OpenMLDataset(
  File "/home/suryak/Projects/sandbox/mlgauge/env/lib/python3.8/site-packages/openml/datasets/dataset.py", line 241, in __init__
    ) = self._create_pickle_in_cache(data_file)
  File "/home/suryak/Projects/sandbox/mlgauge/env/lib/python3.8/site-packages/openml/datasets/dataset.py", line 526, in _create_pickle_in_cache
    X, categorical, attribute_names = self._parse_data_from_arff(data_file)
  File "/home/suryak/Projects/sandbox/mlgauge/env/lib/python3.8/site-packages/openml/datasets/dataset.py", line 457, in _parse_data_from_arff
    self._unpack_categories(X[column_name], categories_names[column_name])
  File "/home/suryak/Projects/sandbox/mlgauge/env/lib/python3.8/site-packages/openml/datasets/dataset.py", line 686, in _unpack_categories
    raw_cat = pd.Categorical(col, ordered=True, categories=categories)
  File "/home/suryak/Projects/sandbox/mlgauge/env/lib/python3.8/site-packages/pandas/core/arrays/categorical.py", line 304, in __init__
    dtype = CategoricalDtype._from_values_or_dtype(
  File "/home/suryak/Projects/sandbox/mlgauge/env/lib/python3.8/site-packages/pandas/core/dtypes/dtypes.py", line 273, in _from_values_or_dtype
    dtype = CategoricalDtype(categories, ordered)
  File "/home/suryak/Projects/sandbox/mlgauge/env/lib/python3.8/site-packages/pandas/core/dtypes/dtypes.py", line 160, in __init__
    self._finalize(categories, ordered, fastpath=False)
  File "/home/suryak/Projects/sandbox/mlgauge/env/lib/python3.8/site-packages/pandas/core/dtypes/dtypes.py", line 314, in _finalize
    categories = self.validate_categories(categories, fastpath=fastpath)
  File "/home/suryak/Projects/sandbox/mlgauge/env/lib/python3.8/site-packages/pandas/core/dtypes/dtypes.py", line 508, in validate_categories
    raise ValueError("Categorical categories cannot be null")
ValueError: Categorical categories cannot be null
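
The last openml frame in the traceback calls pd.Categorical(col, ordered=True, categories=categories), and pandas rejects the categories argument. A minimal sketch of what may be happening, assuming the list of categories parsed for one nominal column contains a null entry (the issue does not confirm this). The sample values and the build_categorical helper are hypothetical and not part of openml:

```python
import pandas as pd


def build_categorical(values, categories):
    """Hypothetical helper: drop null categories before building the Categorical."""
    clean = [c for c in categories if not pd.isna(c)]
    return pd.Categorical(values, ordered=True, categories=clean)


# Assumed data: a column with a missing value, and a category list
# that also (incorrectly) lists the null as a category.
values = ["MALE", "FEMALE", None]
categories = ["MALE", "FEMALE", None]

# Same call shape as in the traceback; a null category is rejected by pandas.
try:
    pd.Categorical(values, ordered=True, categories=categories)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)

# With the null category filtered out, the missing value is kept as NaN
# in the data (code -1) instead of being declared as a category.
cat = build_categorical(values, categories)
print(list(cat.categories))
print(cat.codes.tolist())
```

Filtering the null out of the categories is only one possible workaround; whether the null belongs in the dataset's attribute declaration at all is a separate question for the dataset itself.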

Versions

Linux-5.9.16-1-MANJARO-x86_64-with-glibc2.10
Python 3.8.5 (default, Sep 4 2020, 07:30:14)
[GCC 7.3.0]
Pandas 1.2.2
NumPy 1.20.1
SciPy 1.6.0
Scikit-Learn 0.24.1
OpenML 0.11.0
