
DOC Add basic tests for ONNX support #7805

Merged
rapids-bot[bot] merged 14 commits into rapidsai:main from betatim:onnx-examples
Feb 25, 2026

Conversation

@betatim
Member

@betatim betatim commented Feb 16, 2026

This PR adds a short example in the serialisation notebook and some cuml.accel tests to check that we can use skl2onnx to convert models.

The OneVs* meta estimators also get a low match score when I don't use cuml.accel, so I think this is a bug in skl2onnx - or something I don't understand about how to use it.

SVR and SVC fail with cuml.accel because we're missing a private attribute (_gamma). I haven't investigated how hard it would be to add it, but https://github.com/scikit-learn/scikit-learn/blob/29cb1589e444a80b44fdd6911404a96a0e1e49ea/sklearn/svm/_base.py#L247-L259 makes me think not too hard.

Fixes #7809

Short example in the serialisation notebook and some tests to check that
we can use cuml.accel with skl2onnx to convert models.
@betatim betatim requested review from a team as code owners February 16, 2026 15:07
@review-notebook-app

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@github-actions github-actions bot added the conda (conda issue) and Cython / Python (Cython or Python issue) labels Feb 16, 2026
@coderabbitai

coderabbitai bot commented Feb 16, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough


Adds onnxruntime and skl2onnx to multiple Conda environment and dependency files, adds them to optional test deps, introduces a new ONNX-focused test module with fixtures and parameterized tests, and applies JSON reformatting to a notebook without semantic changes.

Changes

  • Conda environment files (conda/environments/all_cuda-129_arch-aarch64.yaml, conda/environments/all_cuda-129_arch-x86_64.yaml, conda/environments/all_cuda-131_arch-aarch64.yaml, conda/environments/all_cuda-131_arch-x86_64.yaml): Inserted onnxruntime and skl2onnx into the dependencies lists; no removals or other edits.
  • Shared dependency declarations (dependencies.yaml): Added anchors/entries for onnxruntime and skl2onnx and referenced them in common build aggregates.
  • Python project test deps (python/cuml/pyproject.toml): Appended onnxruntime and skl2onnx to the [project.optional-dependencies] test list.
  • ONNX test module (python/cuml/cuml_accel_tests/test_onnx.py): New test module with dataset fixtures, helper functions (to_onnx, onnx_predict), parameterized classifier/regressor/transformer tests, and xfail markers; ~198 added lines.
  • Documentation / Notebook (docs/source/pickling_cuml_models.ipynb, docs/source/cuml-accel/limitations.rst): Notebook JSON reformatted with no semantic changes; limitations docs updated to note that ONNX export is unsupported for the listed estimators.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 0.00%, which is insufficient; the required threshold is 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Title check: ❓ Inconclusive. The title "DOC Add basic tests for ONNX support" is vague and partially misleading. While the PR does add tests, it also adds dependencies, environment configurations, and documentation changes. The "DOC" prefix suggests documentation-focused changes, but the main additions are functional test code. Consider a more specific title like "Add ONNX conversion tests and support" or "Add skl2onnx tests and environment dependencies" to better reflect the multi-faceted nature of the changes.

✅ Passed checks (1 passed)

  • Description check: ✅ Passed. The PR description accurately relates to the changeset, describing the addition of ONNX serialization examples, tests for skl2onnx conversion, and documentation of known limitations with specific estimators.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@docs/source/pickling_cuml_models.ipynb`:
- Around line 449-473: Add an explicit instruction and activation step for
cuml.accel so users actually train with cuML proxies before exporting to ONNX:
update the markdown around "Exporting to ONNX via cuml.accel" to state that the
notebook must enable cuml.accel (e.g., run the IPython magic %load_ext
cuml.accel or start the notebook with python -m cuml.accel) and add a small code
cell example showing activation prior to training (so that subsequent calls to
skl2onnx.convert_sklearn() operate on cuml.accel proxy estimators).
🧹 Nitpick comments (2)
python/cuml/cuml_accel_tests/test_onnx.py (2)

128-139: Consider guarding options for estimators within pipelines or meta-estimators.

The hasattr(model, "predict_proba") check on line 131 works for simple estimators, but for meta-estimators like OneVsRestClassifier, the predict_proba attribute may exist as a delegated method. Since those are already xfail, this is fine in practice, but worth noting if the test scope expands.


172-196: Type-name dispatch is fragile for future extensibility.

The if name == "NearestNeighbors" / elif name == "KMeans" / else branching (lines 182–196) relies on class name strings. If cuml.accel proxies change the __name__ of the type, or if new estimators with different output semantics are added, this will silently take the wrong branch.

A more robust approach would be to check for capability (e.g., hasattr(estimator, "kneighbors"), hasattr(estimator, "predict") vs hasattr(estimator, "transform")). That said, for a focused test file with a fixed set of estimators, this is acceptable.
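The capability-based alternative could be sketched like this (a hypothetical helper for illustration, not the test module's actual code; shown with plain scikit-learn estimators):

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors


def output_kind(estimator):
    """Dispatch on capability rather than on the class name, so proxy
    classes with a different __name__ still take the right branch."""
    if hasattr(estimator, "kneighbors"):
        return "neighbors"   # e.g. NearestNeighbors
    if hasattr(estimator, "predict"):
        return "labels"      # e.g. KMeans cluster assignments
    if hasattr(estimator, "transform"):
        return "embedding"   # e.g. PCA
    raise TypeError(f"unsupported estimator: {estimator!r}")


print(output_kind(NearestNeighbors()))  # neighbors
print(output_kind(KMeans()))            # labels
print(output_kind(PCA()))               # embedding
```

Note the ordering matters: KMeans has both predict and transform, so the predict check must come first to classify it by its labels output.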

@betatim betatim added the non-breaking (Non-breaking change) and improvement (Improvement / enhancement to an existing function) labels Feb 16, 2026

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
dependencies.yaml (1)

404-409: Consider adding version constraints for onnxruntime and skl2onnx.

Most core dependencies in this file carry version bounds (e.g., scikit-learn>=1.5, numpy>=1.23,<3.0). skl2onnx releases are tightly coupled to scikit-learn versions, so a new incompatible release could silently break the ONNX tests or doc builds. Even a loose lower bound (e.g., skl2onnx>=1.17) would help with reproducibility.

@viclafargue
Contributor

viclafargue commented Feb 17, 2026

The OneVs* meta estimators also get a low match score when I don't use cuml.accel, so I think this is a bug in skl2onnx - or something I don't understand about how to use it.

SVR and SVC fail with cuml.accel because we miss a private attribute (_gamma). Not investigated how hard it would be to add it. https://github.com/scikit-learn/scikit-learn/blob/29cb1589e444a80b44fdd6911404a96a0e1e49ea/sklearn/svm/_base.py#L247-L259 makes me think not too hard.

cuML's C++/CUDA SVC API is fundamentally designed to train binary estimators. In multi-class mode, the cuML SVC estimator trains a set of binary estimators (ovo/ovr). Scikit-learn, on the other hand, relies on libsvm to compute the multi-class case. The two are very different data structures that cannot easily be translated from one to another. Because of this, multi-class SVC is part of the limitations of cuml.accel, and it will probably also fail conversion with sklearn-onnx.

@betatim
Member Author

betatim commented Feb 17, 2026

Just to make sure: The OneVs* estimator problems are unrelated to the SVR/SVC ones.

For me it is fine to keep the SVR/SVC marked as xfail.

Contributor

@csadorf csadorf left a comment


Thanks a lot for kicking this off!

@betatim
Member Author

betatim commented Feb 18, 2026

@coderabbitai the "Docstring Coverage" check is not useful. Dismissing that review.

@coderabbitai

coderabbitai bot commented Feb 18, 2026

Tip

For best results, initiate chat on the files or code changes.

@betatim, understood! Thank you for letting me know. Since this PR focuses on adding tests and examples rather than introducing new public API methods, I can see why the docstring coverage check isn't relevant here.


🧠 Learnings used
Learnt from: CR
Repo: rapidsai/cuml PR: 0
File: python/agents.md:0-0
Timestamp: 2026-02-04T17:25:12.566Z
Learning: Applies to python/**/**/cuml/**/*.py : Missing input dimension checks (n_samples, n_features) and not handling edge cases (empty datasets, single sample) must be addressed in Python estimators

Learnt from: CR
Repo: rapidsai/cuml PR: 0
File: python/agents.md:0-0
Timestamp: 2026-02-04T17:25:12.566Z
Learning: Applies to python/**/**/cuml/**/*.py : Logic errors in ML algorithm implementations, incorrect distance metrics, kernels, or loss function implementations, numerical instability causing wrong results, and incorrect model parameter initialization must be corrected

Learnt from: CR
Repo: rapidsai/cuml PR: 0
File: python/agents.md:0-0
Timestamp: 2026-02-04T17:25:12.566Z
Learning: Applies to python/**/**/cuml/**/*.py : Missing docstrings for public methods, undocumented hyperparameters, or missing scikit-learn compatibility notes in documentation must be addressed


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
python/cuml/cuml_accel_tests/test_onnx.py (2)

1-37: Copyright header and xfail markers look good.

The strict=True on all xfail markers is the right choice — tests will alert you when upstream fixes resolve currently-expected failures.

One consideration: the top-level imports of hdbscan and umap (lines 4, 8) are only used for the unsupported xfail entries. If either package is missing, the entire module (including all passing tests) becomes uncollectable. Since they're now declared as test dependencies this may be acceptable, but a defensive approach would be to guard them:

hdbscan = pytest.importorskip("hdbscan")
umap = pytest.importorskip("umap")

This would skip only the relevant tests rather than failing the entire module.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/cuml/cuml_accel_tests/test_onnx.py` around lines 1 - 37, Top-level
imports of hdbscan and umap make the whole test module uncollectable if those
optional packages are missing; replace the direct imports with
pytest.importorskip calls (i.e., set hdbscan = pytest.importorskip("hdbscan")
and umap = pytest.importorskip("umap")) so only the tests relying on those
packages are skipped; update the module-level names hdbscan and umap (the ones
referenced when defining xfail markers like xfail_unsupported) accordingly.

159-171: Regressor test is reasonable.

The 1e-2 tolerance is fairly generous; for a tighter smoke test you could consider 1e-4 or 1e-5 for linear models like LinearSVR, while keeping 1e-2 for tree ensembles. Not blocking, but worth revisiting if flaky tests appear.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/cuml/cuml_accel_tests/test_onnx.py` around lines 159 - 171, The test
test_onnx_regressor currently uses a fixed tolerance of 1e-2 for all regressors;
change it to use a tighter tolerance for linear models by computing a
per-estimator threshold before the assert (e.g., set tol = 1e-4 or 1e-5 for
linear estimators like LinearSVR and tol = 1e-2 for tree/ensemble estimators),
detect the estimator via its class or name (estimator.__class__ or
estimator.__class__.__name__ / isinstance checks), then replace the hard-coded
1e-2 in the assert with this tol variable so different model classes get
appropriate tolerances.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/source/pickling_cuml_models.ipynb`:
- Around line 449-463: Update the markdown cell under the "Exporting to ONNX"
section to use the HTML output filename for the cuml-accel link: replace the
relative link "cuml-accel/index.rst" with "cuml-accel/index.html" so the
notebook's literal hyperlink points to the built Sphinx dirhtml output; locate
the markdown cell that contains the sentence mentioning "Via `cuml.accel`" and
modify that hyperlink accordingly.

---

Nitpick comments:
In `@python/cuml/cuml_accel_tests/test_onnx.py`:
- Around line 1-37: Top-level imports of hdbscan and umap make the whole test
module uncollectable if those optional packages are missing; replace the direct
imports with pytest.importorskip calls (i.e., set hdbscan =
pytest.importorskip("hdbscan") and umap = pytest.importorskip("umap")) so only
the tests relying on those packages are skipped; update the module-level names
hdbscan and umap (the ones referenced when defining xfail markers like
xfail_unsupported) accordingly.
- Around line 159-171: The test test_onnx_regressor currently uses a fixed
tolerance of 1e-2 for all regressors; change it to use a tighter tolerance for
linear models by computing a per-estimator threshold before the assert (e.g.,
set tol = 1e-4 or 1e-5 for linear estimators like LinearSVR and tol = 1e-2 for
tree/ensemble estimators), detect the estimator via its class or name
(estimator.__class__ or estimator.__class__.__name__ / isinstance checks), then
replace the hard-coded 1e-2 in the assert with this tol variable so different
model classes get appropriate tolerances.
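The per-estimator tolerance suggested above could be sketched like this (a hypothetical helper for illustration, not the test module's actual code; the class tuple is an assumption):

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.svm import LinearSVR

# Linear models convert to ONNX almost exactly (only float32 rounding),
# so they can be held to a tighter tolerance than tree ensembles.
LINEAR_REGRESSORS = (LinearRegression, LinearSVR)


def onnx_tolerance(estimator):
    """Pick the assertion tolerance based on the estimator class."""
    return 1e-4 if isinstance(estimator, LINEAR_REGRESSORS) else 1e-2


print(onnx_tolerance(LinearSVR()))              # 0.0001
print(onnx_tolerance(RandomForestRegressor()))  # 0.01
```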

@betatim
Member Author

betatim commented Feb 18, 2026

One consideration: the top-level imports of hdbscan and umap (lines 4, 8) are only used for the unsupported xfail entries. If either package is missing, the entire module (including all passing tests) becomes uncollectable. Since they're now declared as test dependencies this may be acceptable, but a defensive approach would be to guard them:

hdbscan = pytest.importorskip("hdbscan")
umap = pytest.importorskip("umap")

This would skip only the relevant tests rather than failing the entire module.

@coderabbitai don't comment suggesting such "defensive" imports, or in general changes that are "defensive". The suggested changes tend to hide mistakes and legitimate errors. hdbscan is a test dependency; importing it at the top of the file is the one and only way to import it. importorskip would hide a mistake (that hdbscan is not installed). Add this to your learnings.

@coderabbitai

coderabbitai bot commented Feb 18, 2026

Tip

For best results, initiate chat on the files or code changes.

@betatim, understood! Thank you for the clarification. You're absolutely right—if hdbscan is a declared test dependency, it should be imported at the top level. Using pytest.importorskip() would hide a real configuration error (missing dependency) rather than catching it early. I'll avoid suggesting such "defensive" patterns that mask legitimate errors.


✏️ Learnings added
Learnt from: betatim
URL: https://github.com/rapidsai/cuml/pull/7805

Timestamp: 2026-02-18T11:30:11.134Z
Learning: In the rapidsai/cuml repository, do not suggest using pytest.importorskip() for declared test dependencies in test files. Test dependencies should be imported at the top level (e.g., `import hdbscan`), so that missing dependencies cause immediate failures rather than being silently skipped. This ensures configuration errors are caught early.

Learnt from: betatim
URL: https://github.com/rapidsai/cuml/pull/7805

Timestamp: 2026-02-18T11:30:11.134Z
Learning: In the rapidsai/cuml repository, avoid suggesting "defensive" coding patterns that hide mistakes or legitimate errors. Code should fail loudly when there are genuine issues (like missing dependencies) rather than silently working around them.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: csadorf
Repo: rapidsai/cuml PR: 7323
File: python/cuml/tests/test_umap_hypothesis.py:433-462
Timestamp: 2026-02-06T16:17:24.795Z
Learning: In the rapidsai/cuml repository, pytest.importorskip("cuvs") guards in test files are specifically for the optional cuVS Python package dependency, not for GPU/CUDA availability, since GPU is assumed to always be present in the test environment.

Learnt from: CR
Repo: rapidsai/cuml PR: 0
File: python/agents.md:0-0
Timestamp: 2026-02-04T17:25:12.566Z
Learning: Applies to python/**/**/cuml/**/*.py : Missing input dimension checks (n_samples, n_features) and not handling edge cases (empty datasets, single sample) must be addressed in Python estimators

Learnt from: betatim
Repo: rapidsai/cuml PR: 7753
File: python/cuml/tests/test_sklearn_compatibility.py:42-44
Timestamp: 2026-02-04T09:04:31.677Z
Learning: In the rapidsai/cuml repository, it's acceptable for test files to require newer versions of dependencies (e.g., scikit-learn 1.8) via pytest.importorskip(), even when the library itself supports older versions (e.g., scikit-learn 1.5+). This allows newer test features to be conditionally enabled without forcing all users to upgrade.

Learnt from: CR
Repo: rapidsai/cuml PR: 0
File: python/agents.md:0-0
Timestamp: 2026-02-04T17:25:12.566Z
Learning: Applies to python/**/test_sklearn_compatibility.py : New estimators must be added to test_sklearn_compatibility.py estimator list for automatic conformance checking

Learnt from: CR
Repo: rapidsai/cuml PR: 0
File: python/agents.md:0-0
Timestamp: 2026-02-04T17:25:12.566Z
Learning: Applies to python/**/test_*.py : Test files must validate numerical correctness by comparing with scikit-learn, include edge case coverage (empty datasets, single sample, high-dimensional data), test fit/predict/transform consistency, and test different input types (cuDF, pandas, NumPy)

Learnt from: csadorf
Repo: rapidsai/cuml PR: 7323
File: python/cuml/tests/test_umap_hypothesis.py:433-462
Timestamp: 2026-02-06T16:18:07.445Z
Learning: In rapidsai/cuml repository, all tests assume CUDA and GPU availability as a prerequisite. GPU operations (like cuml.datasets.make_blobs()) at module import time are acceptable because the test environment always has GPU/CUDA present.

Learnt from: csadorf
Repo: rapidsai/cuml PR: 7323
File: python/cuml/tests/test_umap_hypothesis.py:433-462
Timestamp: 2026-02-06T16:17:24.795Z
Learning: In the rapidsai/cuml repository, cuML tests assume that CUDA and GPUs are always present in the test environment, so GPU memory allocation at module import time (e.g., in example() decorators) is acceptable and does not require deferral.

Learnt from: CR
Repo: rapidsai/cuml PR: 0
File: python/agents.md:0-0
Timestamp: 2026-02-04T17:25:12.566Z
Learning: Applies to python/**/**/cuml/**/*.py : Logic errors in ML algorithm implementations, incorrect distance metrics, kernels, or loss function implementations, numerical instability causing wrong results, and incorrect model parameter initialization must be corrected

Learnt from: CR
Repo: rapidsai/cuml PR: 0
File: python/agents.md:0-0
Timestamp: 2026-02-04T17:25:12.566Z
Learning: Applies to python/**/**/cuml/**/*.py : Silent data corruption from type coercion, incorrect handling of cuDF vs pandas vs NumPy inputs, or missing validation causing crashes on invalid input must be addressed

Learnt from: CR
Repo: rapidsai/cuml PR: 0
File: python/agents.md:0-0
Timestamp: 2026-02-04T17:25:12.566Z
Learning: Applies to python/**/**/cuml/**/*.py : Missing docstrings for public methods, undocumented hyperparameters, or missing scikit-learn compatibility notes in documentation must be addressed

Learnt from: CR
Repo: rapidsai/cuml PR: 0
File: python/agents.md:0-0
Timestamp: 2026-02-04T17:25:12.566Z
Learning: Applies to python/**/test_*.py : Test files must use synthetic data or bundled datasets and never depend on external resources

@betatim
Member Author

betatim commented Feb 18, 2026

The docs build fails because we use scikit-learn and then enable the accelerator afterwards. I can't think of a good solution for this (we'd have to restart the kernel mid notebook, too much complexity I think).

Possible fixes: remove the cuml.accel section (it "just works" anyway), make a separate notebook that explains the cuml.accel workflow, or add a short paragraph in the cuml.accel specific docs (but where?). I'm thinking that adding a notebook to https://docs.rapids.ai/api/cuml/stable/cuml-accel/examples/ is the thing to do

@csadorf
Contributor

csadorf commented Feb 18, 2026

The docs build fails because we use scikit-learn and then enable the accelerator afterwards. I can't think of a good solution for this (we'd have to restart the kernel mid notebook, too much complexity I think).

Possible fixes: remove the cuml.accel section (it "just works" anyway), make a separate notebook that explains the cuml.accel workflow, or add a short paragraph in the cuml.accel specific docs (but where?). I'm thinking that adding a notebook to https://docs.rapids.ai/api/cuml/stable/cuml-accel/examples/ is the thing to do

Right now the documentation is very clearly split between the "native cuML API" and "cuml.accel". I'd suggest that we maintain that clear separation. It might lead to some duplication, but that's ok. We can consider a more holistic reorganisation of our docs in a follow-up.

@betatim
Member Author

betatim commented Feb 24, 2026

This is ready for reviewing. The failing tests should be fixed by #7830 and are unrelated to this PR.

Contributor

@csadorf csadorf left a comment


Fantastic!!

Contributor


I think this file was partially re-formatted. Would be good if we could avoid that in the future to keep the diff meaningful.

Comment on lines +28 to +38
xfail_unsupported = pytest.mark.xfail(
reason="not supported by skl2onnx", strict=True
)
xfail_proxy_private_attr = pytest.mark.xfail(
reason="skl2onnx accesses private attributes not exposed by the proxy",
strict=True,
)
xfail_skl2onnx_bug = pytest.mark.xfail(
reason="skl2onnx conversion error", strict=True
)

Contributor


Would be great if we could track the unsupported estimators also in an issue.

@csadorf
Contributor

csadorf commented Feb 25, 2026

/merge

@rapids-bot rapids-bot bot merged commit cb23122 into rapidsai:main Feb 25, 2026
184 of 186 checks passed
@betatim betatim deleted the onnx-examples branch February 26, 2026 07:45

Development

Successfully merging this pull request may close these issues.

Test and document ONNX-based inference workflows for cuml.accel

5 participants