Merged
88 commits
f1919e1
Using sklearn docstring as flow descriptions for sklearn flows
Neeratyoy Aug 5, 2019
0b5137f
Extracting parameter type and descriptions
Neeratyoy Aug 5, 2019
b0ad048
Handling certain edge cases
Neeratyoy Aug 6, 2019
d90f333
More robust failure checks + improved docstrings
Neeratyoy Aug 7, 2019
6dc4345
Trimming of all strings to be uploaded
Neeratyoy Aug 7, 2019
64fa568
Re-enable unit test as server issue is resolved.
PGijsbers Aug 13, 2019
80e5b33
pass skipna=False explicitly
TwsThomas Aug 19, 2019
3880d9a
Sync master and development (#768)
mfeurer Aug 20, 2019
4a6c980
Bump version number (#769)
mfeurer Aug 20, 2019
3d08c2d
Mark unit test as flaky (#770)
mfeurer Aug 20, 2019
58a6609
Fixing edge cases to pass tests
Neeratyoy Aug 24, 2019
41549b0
Fixing PEP8
Neeratyoy Aug 25, 2019
235ded8
Leaner implementation for parameter docstring
Neeratyoy Aug 26, 2019
1c9f64d
Add #737 (#772)
sahithyaravi Sep 2, 2019
9b5d382
Making suggested changes
Neeratyoy Sep 2, 2019
7cbf428
add missing whitespace in error message
amueller Sep 3, 2019
33db051
Merge pull request #776 from amueller/whitespace_typo
mfeurer Sep 4, 2019
27521ac
Merge pull request #766 from TwsThomas/patch-1
mfeurer Sep 4, 2019
43bf02d
Version handling and warning log
Neeratyoy Sep 5, 2019
579498a
Debugging
Neeratyoy Sep 5, 2019
52cbdb7
Debugging phase 2
Neeratyoy Sep 5, 2019
3b44e86
Fixing test cases
Neeratyoy Sep 9, 2019
6710b40
Handling different sklearn versions in unit testing
Neeratyoy Sep 9, 2019
7d685e1
Replace logging.info by logging.warning
mfeurer Sep 13, 2019
c39b9f7
Merge pull request #756 from openml/fix_175
mfeurer Sep 13, 2019
afc7445
Merge pull request #761 from openml/reenable_unittest
mfeurer Sep 13, 2019
5cc1638
FIX assign study's id to study_id for uniformity. (#782)
PGijsbers Sep 20, 2019
fe218bc
raise a warning, not an error, when not matching version exactly (#744)
amueller Sep 26, 2019
dcac17e
store predictions_url in runs (#783)
amueller Sep 26, 2019
8eac076
[WIP] Restructuring the examples section (#785)
ArlindKadra Sep 30, 2019
de0335c
Fix 779 (#787)
PGijsbers Sep 30, 2019
4e03906
Instructions to publish new extensions (#778)
Neeratyoy Sep 30, 2019
f461732
Add username (#790)
sahithyaravi Oct 1, 2019
8cc302d
Add example (#791)
mfeurer Oct 2, 2019
5a2830c
added example strang, and more filter options (#793)
janvanrijn Oct 2, 2019
4020c1e
Add manual task iteration tutorial (#788)
mfeurer Oct 7, 2019
04a6b65
Improve the usage of dataframes in examples (#789)
mfeurer Oct 7, 2019
f241cde
Address comment from Arlind (#802)
mfeurer Oct 7, 2019
1dd54bf
#799: fix mistake in the docs of openml.datasets.functions (#801)
mfeurer Oct 7, 2019
382959f
Add new convenience function get_flow_id (#792)
mfeurer Oct 7, 2019
20a7b62
Replace %-formatting by f-strings in code examples (#798)
konrad Oct 8, 2019
a32f556
Rename argument to be more intuitive (#796)
mfeurer Oct 8, 2019
e1b1652
extended
janvanrijn Oct 11, 2019
3e23a3b
Add example rijn (#803)
janvanrijn Oct 11, 2019
9041dc6
strang example update
janvanrijn Oct 11, 2019
1e85bb6
[WIP] An example that loads and visualizes the iris dataset (#808)
ArlindKadra Oct 11, 2019
2f11939
Fix failing simple_datasets_tutorial example (#812)
ArlindKadra Oct 11, 2019
77cd94b
Merge pull request #807 from openml/extend_example_strang
janvanrijn Oct 11, 2019
24c4821
make output of rijn example a bit nicer
amueller Oct 14, 2019
5f86908
Unit test enabled for list_runs (#817)
prabhant Oct 14, 2019
9467ed4
Add additional part of OpenML error message to exception message (#811)
mfeurer Oct 14, 2019
b259a34
maybe fix link (#816)
amueller Oct 14, 2019
3e14267
make sure repr workes with blank / fresh datasets (#820)
amueller Oct 14, 2019
b96c564
fix issue #305 by not requiring external version in the flow xml (#818)
mfeurer Oct 14, 2019
ef3e4d1
add validation for strings in datasets (#822)
amueller Oct 14, 2019
4853d7c
Example for study and suite (#810)
mfeurer Oct 14, 2019
5b0d4dc
only check strings for new datasets (#824)
amueller Oct 15, 2019
23d4e6f
Fixing fetching of categorical sparse data (#823)
Neeratyoy Oct 15, 2019
29a023c
don't warn if we can convert to dataframe (#829)
amueller Oct 15, 2019
2796b9a
Adding Perrone example for building surrogate
Neeratyoy Oct 15, 2019
17657ab
Merge pull request #815 from amueller/rijn_example_cleanup
janvanrijn Oct 16, 2019
40799f9
warn if there's an empty flow description (#831)
amueller Oct 16, 2019
1a3f456
Intermediate changes; pipeline additions remain
Neeratyoy Oct 16, 2019
6395cd7
Adding list_evaluations_setups() to API docs
Neeratyoy Oct 16, 2019
78e7032
also check dependencies for sklearn string (#830)
amueller Oct 16, 2019
e35262c
Merge pull request #840 from openml/neeratyoy-patch-1
amueller Oct 16, 2019
34d784a
Better error message (#837)
mfeurer Oct 16, 2019
c40e474
add new example regarding svm hyperparameter plotting (#834)
mfeurer Oct 16, 2019
43596e0
Create OpenMLBase, have most OpenML objects derive from it (#828)
PGijsbers Oct 17, 2019
547901f
Fix typos and grammatical errors in docs and examples. (#845)
tashay Oct 17, 2019
35dd7d3
Replace code health by appveyor badge (#843)
mfeurer Oct 17, 2019
c59c3b8
Fix 838 (#846)
sahithyaravi Oct 17, 2019
b1dae0b
Improve SVM test (#848)
mfeurer Oct 17, 2019
cfba39d
Finishing the whole example design
Neeratyoy Oct 17, 2019
9ca9d87
Making pandas related changes suggested by Matthias
Neeratyoy Oct 17, 2019
a5b35e6
Allow datasets without qualities to be downloaded. (#847)
PGijsbers Oct 17, 2019
cd3ba29
minor reformatting
mfeurer Oct 17, 2019
f6a2a95
add a print statement
mfeurer Oct 17, 2019
56fa7f9
Merge pull request #832 from openml/transfer_learning_example
Neeratyoy Oct 18, 2019
2a25ed3
Remove OpenMLDemo unit tests. (#850)
PGijsbers Oct 18, 2019
f74b73a
Put shared logic of Publish into OpenMLBase (#849)
PGijsbers Oct 18, 2019
433f1e7
Optimizing Perrone example (#853)
Neeratyoy Oct 23, 2019
1c025db
Convert non-str column names to str when creating a dataset. (#851)
PGijsbers Oct 23, 2019
d321aba
Add long description (#856)
PGijsbers Oct 24, 2019
d312da0
MAINT prepare new release (#855)
mfeurer Oct 25, 2019
4a13100
redirect test to live server (#859)
mfeurer Oct 29, 2019
882b06b
Add debug output (#860)
mfeurer Nov 4, 2019
34d54d9
Fix736 (#861)
PGijsbers Nov 5, 2019
12 changes: 6 additions & 6 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,18 +1,18 @@
[![License](https://img.shields.io/badge/License-BSD%203--Clause-blue.svg)](https://opensource.org/licenses/BSD-3-Clause)

A python interface for [OpenML](http://openml.org). You can find the documentation on the [openml-python website](https://openml.github.io/openml-python).

Please commit to the right branches following the gitflow pattern:
http://nvie.com/posts/a-successful-git-branching-model/
A python interface for [OpenML](http://openml.org), an online platform for open science collaboration in machine learning.
It can be used to download or upload OpenML data such as datasets and machine learning experiment results.
You can find the documentation on the [openml-python website](https://openml.github.io/openml-python).
If you wish to contribute to the package, please see our [contribution guidelines](https://github.com/openml/openml-python/blob/develop/CONTRIBUTING.md).

Master branch:

[![Build Status](https://travis-ci.org/openml/openml-python.svg?branch=master)](https://travis-ci.org/openml/openml-python)
[![Code Health](https://landscape.io/github/openml/openml-python/master/landscape.svg)](https://landscape.io/github/openml/openml-python/master)
[![Build status](https://ci.appveyor.com/api/projects/status/blna1eip00kdyr25?svg=true)](https://ci.appveyor.com/project/OpenML/openml-python)
[![Coverage Status](https://coveralls.io/repos/github/openml/openml-python/badge.svg?branch=master)](https://coveralls.io/github/openml/openml-python?branch=master)

Development branch:

[![Build Status](https://travis-ci.org/openml/openml-python.svg?branch=develop)](https://travis-ci.org/openml/openml-python)
[![Code Health](https://landscape.io/github/openml/openml-python/master/landscape.svg)](https://landscape.io/github/openml/openml-python/master)
[![Build status](https://ci.appveyor.com/api/projects/status/blna1eip00kdyr25/branch/develop?svg=true)](https://ci.appveyor.com/project/OpenML/openml-python/branch/develop)
[![Coverage Status](https://coveralls.io/repos/github/openml/openml-python/badge.svg?branch=develop)](https://coveralls.io/github/openml/openml-python?branch=develop)
2 changes: 1 addition & 1 deletion appveyor.yml
@@ -43,4 +43,4 @@ build: false

test_script:
- "cd C:\\projects\\openml-python"
- "%CMD_IN_ENV% pytest -n 4 --timeout=600 --timeout-method=thread -sv --ignore='test_OpenMLDemo.py'"
- "%CMD_IN_ENV% pytest -n 4 --timeout=600 --timeout-method=thread -sv"
9 changes: 6 additions & 3 deletions ci_scripts/install.sh
@@ -36,12 +36,13 @@ pip install -e '.[test]'
python -c "import numpy; print('numpy %s' % numpy.__version__)"
python -c "import scipy; print('scipy %s' % scipy.__version__)"

if [[ "$EXAMPLES" == "true" ]]; then
pip install -e '.[examples]'
fi
if [[ "$DOCTEST" == "true" ]]; then
pip install sphinx_bootstrap_theme
fi
if [[ "$DOCPUSH" == "true" ]]; then
conda install --yes gxx_linux-64 gcc_linux-64 swig
pip install -e '.[examples,examples_unix]'
fi
if [[ "$COVERAGE" == "true" ]]; then
pip install codecov pytest-cov
fi
@@ -52,3 +53,5 @@ fi
# Install scikit-learn last to make sure the openml package installation works
# from a clean environment without scikit-learn.
pip install scikit-learn==$SKLEARN_VERSION

conda list
2 changes: 1 addition & 1 deletion ci_scripts/test.sh
@@ -28,7 +28,7 @@ run_tests() {
PYTEST_ARGS=''
fi

pytest -n 4 --durations=20 --timeout=600 --timeout-method=thread -sv --ignore='test_OpenMLDemo.py' $PYTEST_ARGS $test_dir
pytest -n 4 --durations=20 --timeout=600 --timeout-method=thread -sv $PYTEST_ARGS $test_dir
}

if [[ "$RUN_FLAKE8" == "true" ]]; then
1 change: 1 addition & 0 deletions doc/api.rst
@@ -85,6 +85,7 @@ Modules

list_evaluations
list_evaluation_measures
list_evaluations_setups

:mod:`openml.flows`: Flow Functions
-----------------------------------
74 changes: 68 additions & 6 deletions doc/contributing.rst
@@ -21,20 +21,20 @@ you can use github's assign feature, otherwise you can just leave a comment.
Scope of the package
====================

The scope of the OpenML python package is to provide a python interface to
the OpenML platform which integrates well with pythons scientific stack, most
The scope of the OpenML Python package is to provide a Python interface to
the OpenML platform which integrates well with Python's scientific stack, most
notably `numpy <http://www.numpy.org/>`_ and `scipy <https://www.scipy.org/>`_.
To reduce opportunity costs and demonstrate the usage of the package, it also
implements an interface to the most popular machine learning package written
in python, `scikit-learn <http://scikit-learn.org/stable/index.html>`_.
in Python, `scikit-learn <http://scikit-learn.org/stable/index.html>`_.
Thereby it will automatically be compatible with many machine learning
libraries written in Python.

We aim to keep the package as light-weight as possible and we will try to
keep the number of potential installation dependencies as low as possible.
Therefore, the connection to other machine learning libraries such as
*pytorch*, *keras* or *tensorflow* should not be done directly inside this
package, but in a separate package using the OpenML python connector.
package, but in a separate package using the OpenML Python connector.

.. _issues:

@@ -52,7 +52,7 @@ contains longer-term goals.
How to contribute
=================

There are many ways to contribute to the development of the OpenML python
There are many ways to contribute to the development of the OpenML Python
connector and OpenML in general. We welcome all kinds of contributions,
especially:

@@ -158,5 +158,67 @@ Happy testing!
Connecting new machine learning libraries
=========================================

Coming soon - please stay tuned!
Content of the Library
~~~~~~~~~~~~~~~~~~~~~~

To leverage support from the community and to tap into the potential of OpenML, interfacing
with popular machine learning libraries is essential. However, the OpenML-Python team does
not have the capacity to develop and maintain such interfaces on its own. For this reason, we
have built an extension interface that allows others to contribute back. Building a suitable
extension therefore requires an understanding of the current OpenML-Python support.

`This example <examples/flows_and_runs_tutorial.html>`_
shows how scikit-learn currently works with OpenML-Python as an extension. The *sklearn*
extension packaged with the `openml-python <https://github.com/openml/openml-python>`_
repository can be used as a template/benchmark to build the new extension.


API
+++
* The extension scripts must import the `openml` package and be able to interface with
any function from the OpenML-Python `API <api.html>`_.
* The extension has to be defined as a Python class and must inherit from
:class:`openml.extensions.Extension`.
* This class needs to have all the functions from `class Extension` overloaded as required.
* The redefined functions should have adequate and appropriate docstrings. The
:class:`openml.extensions.sklearn.SklearnExtension` API
is a good benchmark to follow.


Interfacing with OpenML-Python
++++++++++++++++++++++++++++++
Once the new extension class has been defined, the function
:func:`openml.extensions.register_extension` must be called to allow OpenML-Python to
interface with the new extension.


Hosting the library
~~~~~~~~~~~~~~~~~~~

Each extension created should be a stand-alone repository, compatible with the
`OpenML-Python repository <https://github.com/openml/openml-python>`_.
The extension repository should work off-the-shelf with *OpenML-Python* installed.

Create a `public GitHub repo <https://help.github.com/en/articles/create-a-repo>`_ with
the following directory structure:

::

| [repo name]
| |-- [extension name]
| | |-- __init__.py
| | |-- extension.py
| | |-- config.py (optional)
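
As a convenience, the layout above could be scaffolded with a few lines of standard-library Python. This is only a sketch: the function ``create_extension_skeleton`` and the names ``openml-myframework`` and ``myframework`` are made up for illustration, and the files are created empty.

```python
import tempfile
from pathlib import Path


def create_extension_skeleton(root: Path, repo_name: str, extension_name: str) -> Path:
    """Create the recommended repository layout and return the package directory."""
    package_dir = root / repo_name / extension_name
    package_dir.mkdir(parents=True, exist_ok=True)
    # config.py is listed as optional in the layout; it is created here for completeness.
    for filename in ("__init__.py", "extension.py", "config.py"):
        (package_dir / filename).touch()
    return package_dir


# Build the skeleton in a throwaway directory so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    package = create_extension_skeleton(Path(tmp), "openml-myframework", "myframework")
    created = sorted(path.name for path in package.iterdir())
    print(created)
```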



Recommended
~~~~~~~~~~~
* Test cases to keep the extension up to date with the `openml-python` upstream changes.
* Documentation of the extension API, especially if any new functionality is added to OpenML-Python's
extension design.
* Examples to show how the new extension interfaces and works with OpenML-Python.
* Create a PR to add the new extension to the OpenML-Python API documentation.


Happy contributing!
2 changes: 1 addition & 1 deletion doc/index.rst
@@ -38,7 +38,7 @@ Example
# Publish the experiment on OpenML (optional, requires an API key.
# You can get your own API key by signing up to OpenML.org)
run.publish()
print('View the run online: %s/run/%d' % (openml.config.server, run.run_id))
print(f'View the run online: {openml.config.server}/run/{run.run_id}')

You can find more examples in our `examples gallery <examples/index.html>`_.
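
The change in this hunk swaps ``%``-formatting for an f-string without altering the printed text. A minimal check of that equivalence, using a placeholder server URL and run id rather than real OpenML values:

```python
# Placeholder values; in the documented example they come from
# openml.config.server and run.run_id.
server = "https://example.org"
run_id = 1234

old_style = 'View the run online: %s/run/%d' % (server, run_id)
new_style = f'View the run online: {server}/run/{run_id}'

# Both spellings produce the same string.
print(new_style)
print(old_style == new_style)
```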

50 changes: 50 additions & 0 deletions doc/progress.rst
@@ -6,8 +6,57 @@
Changelog
=========

0.10.1
~~~~~~
* ADD #175: Automatically adds the docstring of scikit-learn objects to flow and its parameters.
* ADD #737: New evaluation listing call that includes the hyperparameter settings.
* ADD #744: It is now possible to only issue a warning and not raise an exception if the package
versions for a flow are not met when deserializing it.
* ADD #783: The URL to download the predictions for a run is now stored in the run object.
* ADD #790: Adds the uploader name and id as new filtering options for ``list_evaluations``.
* ADD #792: New convenience function ``openml.flows.get_flow_id``.
* ADD #861: Debug-level log information is now written to a file in the cache directory (at most 2 MB).
* DOC #778: Introduces instructions on how to publish an extension to support other libraries
than scikit-learn.
* DOC #785: The examples section is completely restructured into simple examples, advanced
examples and examples showcasing the use of OpenML-Python to reproduce papers which were done
with OpenML-Python.
* DOC #788: New example on manually iterating through the split of a task.
* DOC #789: Improve the usage of dataframes in the examples.
* DOC #791: New example for the paper *Efficient and Robust Automated Machine Learning* by Feurer
et al. (2015).
* DOC #803: New example for the paper *Don’t Rule Out Simple Models Prematurely:
A Large Scale Benchmark Comparing Linear and Non-linear Classifiers in OpenML* by Benjamin
Strang et al. (2018).
* DOC #808: New example demonstrating basic use cases of a dataset.
* DOC #810: New example demonstrating the use of benchmarking studies and suites.
* DOC #832: New example for the paper *Scalable Hyperparameter Transfer Learning* by
Valerio Perrone et al. (2019).
* DOC #834: New example showing how to plot the loss surface for a support vector machine.
* FIX #305: Do not require the external version in the flow XML when loading an object.
* FIX #734: Better handling of *"old"* flows.
* FIX #736: Attach a StreamHandler to the openml logger instead of the root logger.
* FIX #758: Fixes an error which made the client API crash when loading sparse data with
categorical variables.
* FIX #779: Do not fail on corrupt pickle.
* FIX #782: Assign the study id to the correct class attribute.
* FIX #819: Automatically convert column names to type string when uploading a dataset.
* FIX #820: Make ``__repr__`` work for datasets which do not have an id.
* MAINT #796: Rename an argument to make the function ``list_evaluations`` more consistent.
* MAINT #811: Print the full error message given by the server.
* MAINT #828: Create base class for OpenML entity classes.
* MAINT #829: Reduce the number of data conversion warnings.
* MAINT #831: Warn if there's an empty flow description when publishing a flow.
* MAINT #837: Also print the flow XML if a flow fails to validate.
* FIX #838: Fix list_evaluations_setups to work when the number of evaluations is not a multiple of 100.
* FIX #847: Fixes an issue where the client API would crash when trying to download a dataset
when there are no qualities available on the server.
* MAINT #849: Move logic of most different ``publish`` functions into the base class.
* MAINT #850: Remove outdated test code.
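
The two logging entries above (#736 and #861) describe attaching handlers to a package-level logger rather than the root logger, and keeping a size-bounded debug log on disk. One way to get that behaviour with the standard library is sketched below. The logger name ``openml_sketch``, the temporary directory standing in for the cache directory, and the choice of two files of about 1 MB each are assumptions for illustration, not details confirmed by the changelog.

```python
import logging
import logging.handlers
import os
import tempfile

# Named logger instead of the root logger, so other libraries are unaffected.
logger = logging.getLogger("openml_sketch")  # hypothetical logger name
logger.setLevel(logging.DEBUG)

# Console output: warnings and above only.
console = logging.StreamHandler()
console.setLevel(logging.WARNING)
logger.addHandler(console)

# File output: everything from DEBUG up. One active file plus one backup,
# each capped at about 1 MB, keeps the total at roughly 2 MB.
log_dir = tempfile.mkdtemp()  # stand-in for the cache directory
log_path = os.path.join(log_dir, "openml_sketch.log")
file_handler = logging.handlers.RotatingFileHandler(
    log_path, maxBytes=1_000_000, backupCount=1
)
file_handler.setLevel(logging.DEBUG)
logger.addHandler(file_handler)

logger.debug("written to the file only")
logger.warning("written to the file and the console")
```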

0.10.0
~~~~~~

* ADD #737: Add list_evaluations_setups to return hyperparameters along with list of evaluations.
* FIX #261: Test server is cleared of all files uploaded during unit testing.
* FIX #447: All files created by unit tests no longer persist in local.
@@ -25,6 +74,7 @@ Changelog
* ADD #412: The scikit-learn extension populates the short name field for flows.
* MAINT #726: Update examples to remove deprecation warnings from scikit-learn
* MAINT #752: Update OpenML-Python to be compatible with sklearn 0.21
* ADD #790: Add user ID and name to list_evaluations
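
The #838 entry in the 0.10.1 notes suggests that ``list_evaluations_setups`` processes results in groups of 100. Assuming that is the case, the sketch below shows a batching loop that also handles a final partial group. ``iter_batches`` and the id values are made up, and this is not the library's code.

```python
from typing import Iterator, List, Sequence


def iter_batches(items: Sequence[int], batch_size: int = 100) -> Iterator[List[int]]:
    """Yield consecutive slices of `items`; the last one may be shorter."""
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


setup_ids = list(range(250))  # made-up ids; 250 is deliberately not a multiple of 100
batch_sizes = [len(batch) for batch in iter_batches(setup_ids)]
print(batch_sizes)
```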


0.9.0
10 changes: 5 additions & 5 deletions doc/usage.rst
@@ -21,11 +21,11 @@ Installation & Set up
~~~~~~~~~~~~~~~~~~~~~~

The OpenML Python package is a connector to `OpenML <https://www.openml.org/>`_.
It allows to use and share datasets and tasks, run
It allows you to use and share datasets and tasks, run
machine learning algorithms on them and then share the results online.

The following tutorial gives a short introduction on how to install and set up
the OpenML python connector, followed up by a simple example.
the OpenML Python connector, followed up by a simple example.

* `Introduction <examples/introduction_tutorial.html>`_

@@ -52,7 +52,7 @@ Working with tasks
~~~~~~~~~~~~~~~~~~

You can think of a task as an experimentation protocol, describing how to apply
a machine learning model to a dataset in a way that it is comparable with the
a machine learning model to a dataset in a way that is comparable with the
results of others (more on how to do that further down). Tasks are containers,
defining which dataset to use, what kind of task we're solving (regression,
classification, clustering, etc...) and which column to predict. Furthermore,
@@ -86,7 +86,7 @@ predictions of that run. When a run is uploaded to the server, the server
automatically calculates several metrics which can be used to compare the
performance of different flows to each other.

So far, the OpenML python connector works only with estimator objects following
So far, the OpenML Python connector works only with estimator objects following
the `scikit-learn estimator API <http://scikit-learn.org/dev/developers/contributing.html#apis-of-scikit-learn-objects>`_.
Those can be directly run on a task, and a flow will automatically be created or
downloaded from the server if it already exists.
@@ -114,7 +114,7 @@ requirements and how to download a dataset:
OpenML is about sharing machine learning results and the datasets they were
obtained on. Learn how to share your datasets in the following tutorial:

* `Upload a dataset <examples/create_upload_tutorial.html>`_
* `Upload a dataset <examples/30_extended/create_upload_tutorial.html>`_

~~~~~~~~~~~~~~~~~~~~~~~
Extending OpenML-Python
4 changes: 4 additions & 0 deletions examples/20_basic/README.txt
@@ -0,0 +1,4 @@
Introductory Examples
=====================

Introductory examples to the usage of the OpenML Python connector.
@@ -1,8 +1,8 @@
"""
Introduction
============
Setup
=====

An introduction to OpenML, followed up by a simple example.
An example of how to set up OpenML-Python, followed by a simple example.
"""
############################################################################
# OpenML is an online collaboration platform for machine learning which allows
@@ -61,7 +61,7 @@
openml.config.start_using_configuration_for_example()

############################################################################
# When using the main server, instead make sure your apikey is configured.
# When using the main server instead, make sure your apikey is configured.
# This can be done with the following line of code (uncomment it!).
# Never share your apikey with others.

@@ -96,7 +96,7 @@
# For this tutorial, our configuration publishes to the test server
# as to not crowd the main server with runs created by examples.
myrun = run.publish()
print("kNN on %s: http://test.openml.org/r/%d" % (data.name, myrun.run_id))
print(f"kNN on {data.name}: http://test.openml.org/r/{myrun.run_id}")

############################################################################
openml.config.stop_using_configuration_for_example()
68 changes: 68 additions & 0 deletions examples/20_basic/simple_datasets_tutorial.py
@@ -0,0 +1,68 @@
"""
========
Datasets
========

A basic tutorial on how to list, load and visualize datasets.
"""
############################################################################
# In general, we recommend working with tasks, so that the results can
# be easily reproduced. Furthermore, the results can be compared to existing results
# at OpenML. However, for the purposes of this tutorial, we are going to work with
# the datasets directly.

import openml
############################################################################
# List datasets
# =============

datasets_df = openml.datasets.list_datasets(output_format='dataframe')
print(datasets_df.head(n=10))

############################################################################
# Download a dataset
# ==================

# Iris dataset https://www.openml.org/d/61
dataset = openml.datasets.get_dataset(61)

# Print a summary
print(f"This is dataset '{dataset.name}', the target feature is "
      f"'{dataset.default_target_attribute}'")
print(f"URL: {dataset.url}")
print(dataset.description[:500])

############################################################################
# Load a dataset
# ==============

# X - An array/dataframe where each row represents one example with
# the corresponding feature values.
# y - the classes for each example
# categorical_indicator - an array that indicates which feature is categorical
# attribute_names - the names of the features for the examples (X) and
# target feature (y)
X, y, categorical_indicator, attribute_names = dataset.get_data(
    dataset_format='dataframe',
    target=dataset.default_target_attribute
)
############################################################################
# Visualize the dataset
# =====================

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_style("darkgrid")


def hide_current_axis(*args, **kwds):
    plt.gca().set_visible(False)


# We combine all the data so that we can map the different
# examples to different colors according to the classes.
combined_data = pd.concat([X, y], axis=1)
iris_plot = sns.pairplot(combined_data, hue="class")
iris_plot.map_upper(hide_current_axis)
plt.show()
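
The tutorial above unpacks ``categorical_indicator`` and ``attribute_names`` from ``dataset.get_data`` but does not use them further. A small sketch of how the two lists line up is given below. The feature names and flags are invented placeholders rather than values from the Iris dataset, so that the example runs without a server connection.

```python
# Invented stand-ins for the two lists returned alongside X and y.
attribute_names = ["age", "colour", "height", "country"]
categorical_indicator = [False, True, False, True]

# The lists are aligned by position: entry i of the indicator describes feature i.
categorical_columns = [
    name for name, is_categorical in zip(attribute_names, categorical_indicator)
    if is_categorical
]
numeric_columns = [
    name for name, is_categorical in zip(attribute_names, categorical_indicator)
    if not is_categorical
]

print(categorical_columns)
print(numeric_columns)
```

With a dataframe ``X``, the resulting name lists could then be used to select columns, for example before applying different preprocessing to categorical and numeric features.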