Fixing performance issues after #804 #897
Closed
Fixes #886
I've finally been investigating in more depth the regression I've seen on our KMeans after #804.
The issue lies somewhere in a complicated kernel and I haven't managed to build a minimal reproducer yet, but at least I've found a hack that is not too complicated and makes it work again; see this PR.

The bug was not a performance bug but a caching bug. Sorry for misleading you into thinking it might be related to performance. The way it surfaced in the benchmarks was as a general slowdown on GPU: rather than segfaulting, it returned corrupted results, which triggered a slower code branch in the KMeans (cluster relocation). On CPU it gave segfaults, meaning the wrong code was executed, so I could have thought of caching issues sooner.
Basically, in `sklearn_numba_dpex` we factor some code using `dpex.func` kernel functions that are defined as closures and re-instantiated with different parameters every time they're needed. See https://github.com/soda-inria/sklearn-numba-dpex/blob/main/sklearn_numba_dpex/kmeans/kernels/_base_kmeans_kernel_funcs.py . This is the best solution for factoring redundant code that I've thought of (and it only works as long as there is no local memory allocation or barrier in the factored code). So there we can have
`dpex.func` functions with the same code, the same `__name__`, and the same `__qualname__`, but different globals and `id`s. The globals can include yet other `dpex.func` functions with the same characteristics. It seems that the cache key does not cover all the cases we fall into; adding `id(func)` definitely solves the issue.

I've called this fix a "hack", but in fact I don't think it's bad behavior: it means that every time a kernel or a func is re-instantiated, its cache is reset. That could be a feature.
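A minimal sketch of the caching problem described above, using plain Python closures in place of `@dpex.func` functions. The cache, the `get_compiled` helper, and `make_kernel_func` are all hypothetical illustrations, not numba-dpex's actual implementation:

```python
# Hypothetical stand-in for a compilation cache (not numba-dpex's real one).
compiled_cache = {}

def get_compiled(func, use_id=False):
    # A cache key built only from attributes such as __qualname__ cannot
    # tell two closure instances apart; adding id(func) can.
    key = (func.__qualname__, id(func)) if use_id else (func.__qualname__,)
    if key not in compiled_cache:
        # Stand-in for an expensive kernel compilation of `func`.
        compiled_cache[key] = f"compiled(param={func()})"
    return compiled_cache[key]

def make_kernel_func(param):
    # Re-instantiated with a different `param` every time it's needed,
    # like the factored closures in the PR description.
    def kernel_func():
        return param
    return kernel_func

a = make_kernel_func(1)
b = make_kernel_func(2)

# Same code, same __name__ and __qualname__, but distinct objects:
assert a.__name__ == b.__name__ == "kernel_func"
assert a.__qualname__ == b.__qualname__
assert id(a) != id(b)

# Without id(func) in the key, b silently reuses a's cached result:
assert get_compiled(a) == get_compiled(b) == "compiled(param=1)"

# With id(func) in the key, each instance gets its own cache entry:
compiled_cache.clear()
assert get_compiled(a, use_id=True) == "compiled(param=1)"
assert get_compiled(b, use_id=True) == "compiled(param=2)"
```

The silent reuse in the middle block is the analogue of the corrupted results seen on GPU: the wrong cached compilation is returned instead of failing loudly.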
For the record:
- this code used to work without requiring any hacks around caching with `numba < 0.19.0`;
- after the bump to `numba == 0.19.0` it broke, and I added a similar hack using monkey patching from `sklearn_numba_dpex`, see https://github.com/soda-inria/sklearn-numba-dpex/blob/main/sklearn_numba_dpex/patches/load_numba_dpex.py#L45 ;
- after Refactor/kernel interfaces #804 this hack is not necessary anymore, but it must be replaced by the diff shown in this PR.
edit: it does not actually fix the issue. But it does seem to be a caching issue on `dpex.func` functions. Continuing the investigation...