@fcharras fcharras commented Feb 3, 2023

Fixes #886

I've finally investigated in more depth the regression I've seen in our KMeans after #804.

The issue lies somewhere in a complicated kernel and I haven't managed to produce a minimal reproducer yet, but I've at least found a hack that is not too complicated and makes it work again; see this PR.

The bug was not a performance bug but a caching bug. Sorry for misleading you into thinking it might be performance-related. The way it surfaced in the benchmarks was as a general slowdown on GPU: rather than segfaulting, the kernel returned corrupted results that triggered a slower code branch in the KMeans (cluster relocation). On CPU it gave segfaults, meaning the wrong code was executed, which should have pointed me sooner toward a caching issue.

Basically, in sklearn_numba_dpex we factor some code using dpex.func kernel functions that are defined as closures and re-instantiated with different parameters every time they're needed. See https://github.com/soda-inria/sklearn-numba-dpex/blob/main/sklearn_numba_dpex/kmeans/kernels/_base_kmeans_kernel_funcs.py . This is the best solution I've thought of for factoring redundant code (and it only works as long as there is no local memory allocation or barrier in the factored code).

As a result we can have dpex.func functions with the same code, the same __name__ and the same __qualname__, but different globals and id. Those globals can in turn include other dpex.func functions with the same characteristics.
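To make the pattern concrete, here is a minimal pure-Python sketch (no numba_dpex involved; the factory name `make_kernel_func` is hypothetical): each call to the factory returns a fresh closure with identical `__name__`, `__qualname__` and bytecode, but a different `id` and different captured state.

```python
# Pure-Python illustration of the closure-factory pattern described above.
# In the real code the inner function would be decorated with @dpex.func;
# `make_kernel_func` is a hypothetical stand-in for the factories in
# sklearn_numba_dpex/kmeans/kernels/_base_kmeans_kernel_funcs.py.
def make_kernel_func(scale):
    def kernel_func(x):
        # `scale` lives in the closure cell, not in the code object.
        return x * scale
    return kernel_func

f1 = make_kernel_func(2)
f2 = make_kernel_func(3)

# Same code object, same names...
assert f1.__name__ == f2.__name__ == "kernel_func"
assert f1.__qualname__ == f2.__qualname__
assert f1.__code__.co_code == f2.__code__.co_code

# ...but distinct function objects with distinct captured parameters.
assert id(f1) != id(f2)
assert f1(10) == 20 and f2(10) == 30
```

Any cache key built only from attributes shared by `f1` and `f2` cannot tell them apart, even though they compute different things.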

It seems that the cache key does not distinguish all the cases we fall into. Adding id(func) to the key definitely solves the issue.
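To illustrate why, here is a toy compilation cache, a schematic sketch rather than numba_dpex's actual cache implementation: if the key is built only from attributes like `__qualname__` plus argument types, two re-instantiated closures collide and the second silently reuses the first's "compiled" artifact; folding `id(func)` into the key keeps their entries separate.

```python
# Toy cache reproducing the collision; this is NOT the real numba_dpex cache,
# just a minimal model of the behavior described in this PR.
compiled_cache = {}

def compile_func(func, argtypes, include_id=False):
    """Pretend to compile `func`, keying by name (+ optionally id)."""
    key = (func.__qualname__, argtypes)
    if include_id:
        key += (id(func),)  # the fix proposed in this PR
    if key not in compiled_cache:
        compiled_cache[key] = func  # stand-in for the compiled artifact
    return compiled_cache[key]

def make_func(scale):
    def kernel_func(x):
        return x * scale
    return kernel_func

f1, f2 = make_func(2), make_func(3)

# Without id(func) in the key, f2 hits f1's cache entry:
assert compile_func(f1, ("int64",)) is f1
stale = compile_func(f2, ("int64",))
assert stale is f1       # collision: the wrong closure is reused
assert stale(10) == 20   # corrupted result; f2 should give 30

compiled_cache.clear()

# With id(func) in the key, each closure gets its own entry:
assert compile_func(f1, ("int64",), include_id=True) is f1
assert compile_func(f2, ("int64",), include_id=True) is f2
```

This also shows why the symptom looked like corruption rather than a crash: the cached artifact is a valid function, just the wrong one.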

I've called this fix a "hack", but in fact I don't think it's bad behavior: it means that every time a kernel or a func is re-instantiated, its cache is reset. That could be a feature.

For the record:

  • this code used to work without requiring any hacks around caching with numba < 0.19.0.

  • after the bump to numba == 0.19.0 it broke, and I added a similar hack using monkey patching in sklearn_numba_dpex, see https://github.com/soda-inria/sklearn-numba-dpex/blob/main/sklearn_numba_dpex/patches/load_numba_dpex.py#L45

  • after Refactor/kernel interfaces #804 this hack is not necessary anymore, but it must be replaced by the diff shown in this PR

  • Have you provided a meaningful PR description?

  • Have you added a test, reproducer or referred to an issue with a reproducer?

  • Have you tested your changes locally for CPU and GPU devices?

  • Have you made sure that new changes do not introduce compiler warnings?

  • If this PR is a work in progress, are you filing the PR as a draft?

edit: this does not actually fix the issue after all, but it does seem to be a caching issue on dpex.func functions. Continuing investigation...

I don't have a minimal reproducer yet, but I can say that this fixes the issues I've had after IntelPython#804
@fcharras fcharras changed the title Fixing JIT issues after #804 Fixing performance issues after #804 Feb 3, 2023
@fcharras fcharras commented Feb 3, 2023

In fact, this does not solve the issue.

@fcharras fcharras closed this Feb 3, 2023
@fcharras fcharras deleted the patch-1 branch February 3, 2023 14:33

Development

Successfully merging this pull request may close these issues.

Performance regressions introduced by latest changes to main
