[BLOG] Update on Array API adoption in scikit-learn #946

Merged: rgommers merged 14 commits into Quansight:main from lucyleeow:skl-array-api-2026 on Mar 4, 2026

@lucyleeow (Contributor) commented Feb 13, 2026

Text styling

  • The blog is written with plain language (where relevant).
  • If there are headers, they use the proper header tags (with only one level-one header).
  • All links describe where they link to (for example, check the Quansight labs website).
  • Any kind of styling that the author uses (for example, bold for emphasis) is consistent throughout the blog.

Non-text contents

  • Blog post featured image is in PNG or JPEG format, not SVG.
  • All content is represented as text (for example, images need alt text and videos need captions or descriptive transcripts).
  • If there are emojis, there are not more than three in a row.
  • Don't use flashing gifs or videos.
  • If it were to be read as plain text, the blog still makes sense and no information is missing.

vercel bot commented Feb 13, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| labs | Ready | Preview, Comment | Mar 4, 2026 9:07pm |

@MarcoGorelli (Contributor) left a comment

nice post!

Comment on lines +23 to +27
### Vendoring `array-api-compat` and `array-api-extra`

Scikit-learn now vendors both [`array-api-compat`](https://data-apis.org/array-api-compat/) and [`array-api-extra`](https://data-apis.org/array-api-extra/). `array-api-compat` is a wrapper around common array libraries (e.g., PyTorch, CuPy, JAX) that bridges gaps to ensure compatibility with the standard. It enables adoption of backwards incompatible changes while still allowing array libraries time to adopt the standard slowly. `array-api-extra` provides array functions not included in the standard but deemed useful for array-consuming libraries.

We chose to vendor these now much more mature libraries to avoid the complexity of conditionally handling optional dependencies throughout the codebase. This approach also follows precedent, as SciPy also vendors these packages.
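
A minimal sketch of the namespace-dispatch pattern that `array-api-compat` supports, for readers who have not seen it. The library's real entry point is `array_api_compat.array_namespace`; the `namespace_of` helper below is a simplified, hypothetical stand-in so that the example only needs NumPy, and `standardize` is an illustrative function, not scikit-learn code.

```python
import numpy as np


def namespace_of(x):
    """Hypothetical stand-in for array_api_compat.array_namespace."""
    # Standard-compliant arrays expose their namespace through this method.
    if hasattr(x, "__array_namespace__"):
        return x.__array_namespace__()
    # Assumption: anything else (e.g. older NumPy arrays) is handled by NumPy.
    return np


def standardize(x):
    """Centre and scale columns using only functions from the array's namespace."""
    xp = namespace_of(x)
    mean = xp.mean(x, axis=0)
    std = xp.std(x, axis=0)
    return (x - mean) / std


x = np.asarray([[1.0, 2.0], [3.0, 4.0]])
z = standardize(x)  # the result has the same array type as the input
```

Because `standardize` never names a specific library, the same function body could in principle run on any array type whose namespace provides `mean` and `std`.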

out of interest, have there been any downsides to vendoring?

@lucyleeow (Contributor Author)

I'm afraid I was not around for the full discussion but Olivier summarises the pros and cons quite well here:

scikit-learn/scikit-learn#30340 (comment)

There was also some discussion about the workflow for getting new functions into array-api-extra. This is handled as before: write private functions and every now and again add them to array-api-extra, so there is no need to wait for array-api-extra to get access to the new functions (scikit-learn/scikit-learn#30340 (comment))

@betatim left a comment

Nice blog post!

I left a few comments.


The array API standard adopted DLPack as the recommended [data interchange](https://data-apis.org/array-api/latest/design_topics/data_interchange.html#data-interchange) protocol. This protocol is widely implemented in array libraries and offers an efficient, C ABI-compatible mechanism for array conversion. While this provided us with an easy way to implement these transfers, there were limitations. Cross-device transfer capability was only introduced in DLPack v1, released in September 2024. This means that only the latest PyTorch and CuPy versions support DLPack v1. Moreover, not all array libraries have adopted it yet. We therefore implemented a 'manual' fallback, though this requires conversion via NumPy when the transfer involves two non-NumPy arrays. Additionally, there are no DLPack tests in [array-api-tests](https://github.com/data-apis/array-api-tests), a test suite for verifying standard compliance, which makes DLPack implementation bugs easier to overlook. Despite these challenges, scikit-learn will benefit from future improvements, such as the addition of a C-level API for DLPack exchange that bypasses Python function calls, which offers a significant benefit for GPU applications.
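
The "DLPack first, NumPy as a fallback" idea described above could be sketched roughly as follows. This is not scikit-learn's implementation: `move_to_namespace` is a hypothetical helper, the set of caught exceptions is an assumption, and the library-specific step of copying device memory to the host before the NumPy round trip is deliberately left out.

```python
import numpy as np


def move_to_namespace(array, xp):
    """Hypothetical helper: convert `array` into an array of namespace `xp`."""
    try:
        # Preferred path: exchange the data through the DLPack protocol.
        return xp.from_dlpack(array)
    except (AttributeError, TypeError, BufferError, ValueError, RuntimeError):
        # Assumed failure modes: the source or target lacks DLPack support,
        # or the requested exchange is not possible.
        # 'Manual' fallback: round-trip through a NumPy array on the host.
        return xp.asarray(np.asarray(array))


source = np.arange(6.0).reshape(2, 3)
converted = move_to_namespace(source, np)      # DLPack path
from_list = move_to_namespace([1, 2, 3], np)   # a list has no DLPack support, so the fallback is used
```

Here NumPy plays the role of both source and target namespace only to keep the example self-contained; in practice `xp` would be another library's namespace.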

Beyond the technical considerations, there were also user interface considerations. How should we inform users that these conversions, which incur memory and performance costs, are occurring? We decided against warnings, which risk being ignored or becoming a nuisance, and instead clearly document this behaviour. Further, we need to determine how to gracefully handle device-specific data type limitations; specifically, MPS does not support float64. This requires downcasting, which must be clearly communicated to users.
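
One hypothetical way to make such a downcast explicit is to resolve the dtype up front and report whether it changed, so the caller can decide how to communicate it. The sketch below is illustrative only: the lookup tables and the `resolve_dtype` helper are assumptions, and only the MPS/float64 entry comes from the text above.

```python
# Assumed lookup tables; only the "mps" / "float64" limitation is from the text.
UNSUPPORTED_DTYPES = {"mps": {"float64"}}
DOWNCAST_TARGET = {"float64": "float32"}


def resolve_dtype(dtype, device):
    """Hypothetical helper: return (dtype_to_use, was_downcast) for a device."""
    if dtype in UNSUPPORTED_DTYPES.get(device, set()):
        # The device cannot represent this dtype, so pick the narrower one.
        return DOWNCAST_TARGET[dtype], True
    return dtype, False


dtype, was_downcast = resolve_dtype("float64", "mps")
if was_downcast:
    print(f"float64 is not supported on mps; using {dtype} instead")
```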

The comment about MPS at the end is interesting, but feels like it got tacked on to a paragraph that is vaguely related. It comes without warning and then ends as quickly as it started. Not sure if it is worth making a whole paragraph about this, if not maybe it is also not worth having it at all? Or have a two-three sentence paragraph about this to explain that different devices have different precision and that how to handle this requires a bit of thought?

@lucyleeow (Contributor Author)

Good point, and honestly the MPS downcasting thing is probably still a WIP in how it's handled. I think I will do this:

> Or have a two-three sentence paragraph about this to explain that different devices have different precision and that how to handle this requires a bit of thought?

since this is not completely resolved yet 😬

@ogrisel left a comment

I did another pass and this reads great to me. Thanks @lucyleeow!

@StefanieSenger left a comment

Great article, @lucyleeow!

I've found a few typo/wording issues (see comments).

@kgryte (Member) left a comment

LGTM. Over to you, @rgommers, for final review and merge.

@rgommers (Member) left a comment

Looks great, thank you @lucyleeow and all reviewers!

Co-authored-by: Pavithra Eswaramoorthy <pavithraes@outlook.com>
@rgommers merged commit 59b27cc into Quansight:main on Mar 4, 2026
2 checks passed
@lucyleeow deleted the skl-array-api-2026 branch on March 5, 2026 03:04

@lucyleeow (Contributor Author)

Thanks all! 🙏

@lucyleeow restored the skl-array-api-2026 branch on March 5, 2026 03:04

8 participants