This repository was archived by the owner on May 20, 2025. It is now read-only.

Segfault in multithreaded Cython code in memoryview __dealloc__ in scikit-learn #50

@ogrisel

Here is the original reproducer:

pip install cython numpy scipy pytest
git clone https://github.com/scikit-learn/scikit-learn
cd scikit-learn
python setup.py develop
gdb --ex r --args python -m pytest -svlx -k "test_parallel[RandomForestClassifier]" sklearn/ensemble/tests/test_forest.py
Fatal Python error: Aborted

Stack (most recent call first):
  File "/home/ogrisel/code/scikit-learn/sklearn/tree/_classes.py", line 964 in fit
  File "/home/ogrisel/code/scikit-learn/sklearn/ensemble/_forest.py", line 189 in _parallel_build_trees
  File "/home/ogrisel/code/scikit-learn/sklearn/utils/fixes.py", line 117 in __call__
  File "/home/ogrisel/nogil-venv/lib/python3.9/site-packages/joblib/parallel.py", line 262 in <listcomp>
  File "/home/ogrisel/nogil-venv/lib/python3.9/site-packages/joblib/parallel.py", line 262 in __call__
  File "/home/ogrisel/nogil-venv/lib/python3.9/site-packages/joblib/_parallel_backends.py", line 595 in __call__
  File "/home/ogrisel/code/nogil/Lib/multiprocessing/pool.py", line 125 in worker
  File "/home/ogrisel/code/nogil/Lib/threading.py", line 886 in run
  File "/home/ogrisel/code/nogil/Lib/threading.py", line 935 in _bootstrap_inner
  File "/home/ogrisel/code/nogil/Lib/threading.py", line 906 in _bootstrap

Note that in this test, joblib runs the tree-fitting calls in plain Python-level threads rather than in worker processes, as the threading.py frames in the stack above show.
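To make the thread-versus-process point concrete, here is a minimal sketch using only the standard library. It uses `multiprocessing.pool.ThreadPool`, the same `multiprocessing/pool.py` worker machinery that shows up in the stack above; the function name `worker_identity` and the pool size are made up for illustration, and this is not joblib's actual code path. It only shows that every task runs inside the same process, so all workers share one heap and one set of Python objects.

```python
import os
import threading
from multiprocessing.pool import ThreadPool


def worker_identity(task_id):
    # Report which process and which thread handled this task.
    return task_id, os.getpid(), threading.get_ident()


# Hypothetical pool size; any small number works for the illustration.
with ThreadPool(4) as pool:
    results = pool.map(worker_identity, range(8))

pids = {pid for _, pid, _ in results}
thread_ids = {tid for _, _, tid in results}

# All tasks ran in this very process; only the thread differs.
print("process ids:", pids)
print("distinct worker threads:", len(thread_ids))
```

With process-based workers each task would report a different pid and no memory would be shared, which is why this kind of crash can only show up with the thread-based backend.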

The crash comes with one of two GDB backtraces. Here is the first one:

#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737175123520) at pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140737175123520) at pthread_kill.c:80
#2  __GI___pthread_kill (threadid=140737175123520, signo=signo@entry=6) at pthread_kill.c:91
#3  0x00007ffff7ce6476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007ffff7ccc7b7 in __GI_abort () at abort.c:79
#5  0x00007ffff7d2d606 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7ffff7e7f13d "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#6  0x00007ffff7d44afc in malloc_printerr (str=str@entry=0x7ffff7e81d18 "double free or corruption (fasttop)") at malloc.c:5543
#7  0x00007ffff7d463bb in _int_free (av=0x7ffff7ebdc60 <main_arena>, p=0x5555559e8c80, have_lock=0) at malloc.c:4426
#8  0x00007ffff7d48d05 in __GI___libc_free (mem=<optimized out>) at malloc.c:3278
#9  0x00007ffff0fbb0e2 in __pyx_memoryview___pyx_pf_15View_dot_MemoryView_10memoryview_2__dealloc__ (__pyx_v_self=0x44e5aa10790) at sklearn/tree/_tree.cpp:28305
#10 __pyx_memoryview___dealloc__ (__pyx_v_self=<sklearn.tree._tree.memoryview at remote 0x44e5aa10790>) at sklearn/tree/_tree.cpp:28112
#11 __pyx_tp_dealloc_memoryview (o=<sklearn.tree._tree.memoryview at remote 0x44e5aa10790>) at sklearn/tree/_tree.cpp:39929
#12 0x00007ffff1070177 in _Py_DECREF (op=<optimized out>) at /home/ogrisel/code/nogil/Include/object.h:569
#13 __Pyx_XDEC_MEMVIEW (lineno=24909, have_gil=1, memslice=0x44e5aa00520) at sklearn/tree/_criterion.c:28987
#14 __pyx_tp_dealloc_7sklearn_4tree_10_criterion_Criterion (o=<sklearn.tree._criterion.Gini at remote 0x44e5aa00500>) at sklearn/tree/_criterion.c:24909
#15 __pyx_tp_dealloc_7sklearn_4tree_10_criterion_ClassificationCriterion (o=<sklearn.tree._criterion.Gini at remote 0x44e5aa00500>) at sklearn/tree/_criterion.c:25026

On another run I got this backtrace instead:

#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737191908928) at pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140737191908928) at pthread_kill.c:80
#2  __GI___pthread_kill (threadid=140737191908928, signo=signo@entry=6) at pthread_kill.c:91
#3  0x00007ffff7ce6476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007ffff7ccc7b7 in __GI_abort () at abort.c:79
#5  0x00007ffff7d2d606 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7ffff7e7f13d "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#6  0x00007ffff7d44afc in malloc_printerr (str=str@entry=0x7ffff7e81cc0 "free(): double free detected in tcache 2") at malloc.c:5543
#7  0x00007ffff7d46a4f in _int_free (av=0x7ffff7ebdc60 <main_arena>, p=0x5555559e8520, have_lock=0) at malloc.c:4360
#8  0x00007ffff7d48d05 in __GI___libc_free (mem=<optimized out>) at malloc.c:3278
#9  0x00007ffff102f352 in __pyx_memoryview___pyx_pf_15View_dot_MemoryView_10memoryview_2__dealloc__ (__pyx_v_self=0x4fb1d220850) at sklearn/tree/_splitter.c:16407
#10 __pyx_memoryview___dealloc__ (__pyx_v_self=<sklearn.tree._splitter.memoryview at remote 0x4fb1d220850>) at sklearn/tree/_splitter.c:16214
#11 __pyx_tp_dealloc_memoryview (o=<sklearn.tree._splitter.memoryview at remote 0x4fb1d220850>) at sklearn/tree/_splitter.c:27573
#12 0x00007ffff10318b5 in _Py_DECREF (op=<optimized out>) at /home/ogrisel/code/nogil/Include/object.h:569
#13 __Pyx_XDEC_MEMVIEW (lineno=26682, have_gil=1, memslice=0x4fb1d230400) at sklearn/tree/_splitter.c:29899
#14 __pyx_tp_dealloc_7sklearn_4tree_9_splitter_BaseDenseSplitter (o=<sklearn.tree._splitter.BestSplitter at remote 0x4fb1d230290>) at sklearn/tree/_splitter.c:26682
#15 0x000055555581315c in _PyEval_Fast (ts=0x555555ad8540, initial_acc=Register(as_int64 = 8), initial_pc=0x17 <error: Cannot access memory at address 0x17>) at Python/ceval.c:1125

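So in both cases glibc reports a double free inside the __dealloc__ method of a Cython-managed memoryview, but the memoryview type comes from two different Cython-generated files in the scikit-learn source code: sklearn/tree/_tree.cpp in the first run (reached while deallocating a Criterion) and sklearn/tree/_splitter.c in the second (reached while deallocating a BaseDenseSplitter).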

Note: I installed the nogil wheels for numpy, scipy and Cython with pip:

  • Cython 0.29.26
  • NumPy 1.22.3
  • SciPy 1.7.1

If you want, I can spend some time crafting a minimal reproducer that uses only Cython (and probably NumPy), without scikit-learn.
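As a starting point, the overall shape of such a reproducer might look like the sketch below. It is only a harness outline under loud assumptions: it uses the built-in `memoryview` over a `bytearray` as a stand-in for a Cython typed memoryview, the `Holder` class is a hypothetical placeholder for a `cdef class` with a memoryview attribute (like the Criterion and Splitter objects in the backtraces), and the thread and iteration counts are arbitrary. It is not expected to crash as written; the real reproducer would replace `Holder` with a compiled Cython extension type.

```python
import threading

N_THREADS = 8        # arbitrary, for illustration
N_ITERATIONS = 1000  # arbitrary, for illustration


class Holder:
    """Hypothetical stand-in for a cdef class owning a memoryview attribute."""

    def __init__(self, buffer):
        # Take a view on the shared buffer, as a typed memoryview would.
        self.view = memoryview(buffer)[:16]


def churn(buffer, errors):
    # Repeatedly create and destroy holders so that view deallocation
    # runs concurrently in several threads.
    try:
        for _ in range(N_ITERATIONS):
            holder = Holder(buffer)
            del holder  # drops the last reference to the holder and its view
    except Exception as exc:
        errors.append(exc)


shared = bytearray(64)
errors = []
threads = [
    threading.Thread(target=churn, args=(shared, errors))
    for _ in range(N_THREADS)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("errors:", errors)
```

Running the compiled variant under gdb, as in the original reproducer, would then show whether the double free still appears without scikit-learn in the picture.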
