
Memory leak when using nogil Cython #51

Closed
ogrisel opened this issue Apr 20, 2022 · 7 comments


ogrisel commented Apr 20, 2022

Code to reproduce:

pip install numpy scipy cython psutil
git clone https://github.com/scikit-learn/scikit-learn
cd scikit-learn
pip install -e .
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.datasets import make_classification
import psutil
import gc


X, y = make_classification(n_samples=int(1e4), random_state=42)
print(f"data size: {X.nbytes / 1e6:.1f} MB")

for i in range(5):
    clf = HistGradientBoostingClassifier(max_iter=100).fit(X, y)
    gc.collect()
    print(f"memory usage: {psutil.Process().memory_info().rss / 1e6:.1f} MB")

Using a nogil-Cython build of scikit-learn (nogil CPython):

$ OPENMP_NUM_THREADS=1 python ~/code/sanbox/debug_memleak.py 
data size: 1.6 MB
memory usage: 701.8 MB
memory usage: 1290.5 MB
memory usage: 1878.0 MB
memory usage: 2466.0 MB
memory usage: 3053.4 MB

Using scikit-learn installed from conda-forge:

$ OPENMP_NUM_THREADS=1 python ~/code/sanbox/debug_memleak.py 
data size: 1.6 MB
memory usage: 124.6 MB
memory usage: 124.6 MB
memory usage: 125.1 MB
memory usage: 125.1 MB
memory usage: 125.1 MB

Note: this code uses OpenMP-based threading, but the leak still happens when the OpenMP threading layer is disabled with OPENMP_NUM_THREADS=1, so this problem is probably not related to OpenMP.

Not sure how to debug this. Maybe I could try valgrind.
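Before valgrind, one cheap check (a generic sketch, not part of the original report) is whether the leak is visible to Python's own allocator: if tracemalloc reports a flat heap while the process RSS keeps growing, the leak lives below the Python level, e.g. in a C extension.

```python
import gc
import tracemalloc

def traced_mb():
    """Size of the Python heap currently traced by tracemalloc, in MB."""
    gc.collect()
    current, _peak = tracemalloc.get_traced_memory()
    return current / 1e6

tracemalloc.start()
sizes = []
for i in range(3):
    # Stand-in for clf.fit(X, y): allocate and drop ~1 MB per iteration.
    data = [bytes(1024) for _ in range(1000)]
    del data
    sizes.append(traced_mb())
    print(f"iteration {i}: traced python heap = {sizes[-1]:.1f} MB")
```

A flat curve here, combined with the growing RSS reported by psutil above, would point at an allocation that bypasses the Python memory APIs.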


ogrisel commented Apr 20, 2022

I did a valgrind run with only two outer loop iterations and two inner loop iterations (max_iter=2) because valgrind is much slower.

The result is 26 MB of leaked memory, mostly coming from these two numpy allocations:

$ valgrind --tool=memcheck   --leak-check=full --show-leak-kinds=all --suppressions=/home/ogrisel/code/valgrind-python.supp python ~/code/sanbox/debug_memleak.py &> ~/valgrind.log.txt
==262873== Memcheck, a memory error detector
==262873== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==262873== Using Valgrind-3.17.0 and LibVEX; rerun with -h for copyright info
==262873== Command: python /home/ogrisel/code/sanbox/debug_memleak.py
==262873== 
data size: 1.6 MB
memory usage: 460.5 MB
memory usage: 480.4 MB
==262873== 
==262873== HEAP SUMMARY:
==262873==     in use at exit: 26,476,050 bytes in 1,824 blocks
==262873==   total heap usage: 28,635 allocs, 26,811 frees, 56,494,443 bytes allocated
[...]
==260677== 
==260677== 11,264,000 bytes in 110 blocks are still reachable in loss record 571 of 572
==260677==    at 0x4843839: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==260677==    by 0x55AB192: _npy_alloc_cache (alloc.c:95)
==260677==    by 0x55AB192: default_malloc (alloc.c:337)
==260677==    by 0x55AB7CE: PyDataMem_UserNEW (alloc.c:412)
==260677==    by 0x5611554: PyArray_NewFromDescr_int (ctors.c:843)
==260677==    by 0x5612408: PyArray_NewFromDescrAndBase (ctors.c:957)
==260677==    by 0x5612408: PyArray_NewFromDescr (ctors.c:942)
==260677==    by 0x5612408: PyArray_Empty (ctors.c:3002)
==260677==    by 0x56BF02F: array_empty (multiarraymodule.c:1973)
==260677==    by 0x38004C: cfunction_vectorcall_FASTCALL_KEYWORDS (methodobject.c:464)
==260677==    by 0x1793D1: PyVectorcall_Call (call.c:243)
==260677==    by 0xC7150F1: __pyx_pf_7sklearn_8ensemble_23_hist_gradient_boosting_9histogram_16HistogramBuilder_2compute_histograms_brute (histogram.c:3400)
==260677==    by 0xC7150F1: __pyx_pw_7sklearn_8ensemble_23_hist_gradient_boosting_9histogram_16HistogramBuilder_3compute_histograms_brute (histogram.c:3230)
==260677==    by 0x3C48B3: _PyEval_Fast (ceval.c:833)
==260677==    by 0x25B9F0: _PyEval_Eval (ceval_meta.c:2831)
==260677==    by 0x25B9F0: _PyEval_EvalFunc (ceval_meta.c:2892)
==260677==    by 0x25FB25: PyEval_EvalCode (ceval_meta.c:4244)
==260677== 
==260677== 11,264,000 bytes in 110 blocks are still reachable in loss record 572 of 572
==260677==    at 0x4843839: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==260677==    by 0x55AB192: _npy_alloc_cache (alloc.c:95)
==260677==    by 0x55AB192: default_malloc (alloc.c:337)
==260677==    by 0x55AB7CE: PyDataMem_UserNEW (alloc.c:412)
==260677==    by 0x5611554: PyArray_NewFromDescr_int (ctors.c:843)
==260677==    by 0x5612408: PyArray_NewFromDescrAndBase (ctors.c:957)
==260677==    by 0x5612408: PyArray_NewFromDescr (ctors.c:942)
==260677==    by 0x5612408: PyArray_Empty (ctors.c:3002)
==260677==    by 0x56BF02F: array_empty (multiarraymodule.c:1973)
==260677==    by 0x38004C: cfunction_vectorcall_FASTCALL_KEYWORDS (methodobject.c:464)
==260677==    by 0x1793D1: PyVectorcall_Call (call.c:243)
==260677==    by 0xC710D59: __pyx_pf_7sklearn_8ensemble_23_hist_gradient_boosting_9histogram_16HistogramBuilder_4compute_histograms_subtraction (histogram.c:4222)
==260677==    by 0xC710D59: __pyx_pw_7sklearn_8ensemble_23_hist_gradient_boosting_9histogram_16HistogramBuilder_5compute_histograms_subtraction (histogram.c:4127)
==260677==    by 0x3720AA: method_vectorcall_VARARGS_KEYWORDS (descrobject.c:349)
==260677==    by 0x3C1DE1: _PyEval_Fast (ceval.c:737)
==260677==    by 0x25B9F0: _PyEval_Eval (ceval_meta.c:2831)
==260677==    by 0x25B9F0: _PyEval_EvalFunc (ceval_meta.c:2892)
==260677== 
==260677== LEAK SUMMARY:
==260677==    definitely lost: 56 bytes in 1 blocks
==260677==    indirectly lost: 0 bytes in 0 blocks
==260677==      possibly lost: 2,800 bytes in 7 blocks
==260677==    still reachable: 26,473,194 bytes in 1,816 blocks
==260677==         suppressed: 0 bytes in 0 blocks
==260677== 
==260677== For lists of detected and suppressed errors, rerun with: -s
==260677== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)

Maybe there is a reference-counting problem on those numpy arrays?
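One quick way to probe that hypothesis from Python (a generic sketch with a hypothetical `compute_histograms` stand-in, not the actual Cython builder):

```python
import sys
import numpy as np

def compute_histograms():
    # Hypothetical stand-in for the Cython HistogramBuilder methods: it
    # returns a fresh array whose only owner should be the caller.
    return np.empty((110, 1024))

hist = compute_histograms()
# getrefcount counts its own argument as one extra reference, so a
# well-behaved extension leaves exactly 2 here: the `hist` binding plus
# the temporary argument. A higher number would suggest a leaked reference.
count = sys.getrefcount(hist)
print("refcount:", count)
```

Under the nogil interpreter the absolute count can differ because of biased reference counting, so it is safer to compare against a known-good build than to trust the absolute value.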

ogrisel commented Apr 20, 2022

This Cython method is called from Python code. The leaked returned arrays are explicitly dereferenced with del calls at these lines:

https://github.com/scikit-learn/scikit-learn/blob/2b0d2aad77693cf2bd2d6b32f9337f8f2baba7ab/sklearn/ensemble/_hist_gradient_boosting/grower.py#L571-L579
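For context, the pattern at those lines is roughly the following (a paraphrase with stand-in names and shapes, not the exact grower.py source):

```python
import numpy as np

class TreeNode:
    """Minimal stand-in for a grower tree node holding per-node histograms."""
    def __init__(self):
        # Stand-in for the arrays returned by compute_histograms_brute /
        # compute_histograms_subtraction.
        self.histograms = np.empty((255, 32))

def release_histograms(node):
    # Paraphrase of the linked grower.py lines: once the node is split,
    # its histograms are explicitly dereferenced so the arrays can be
    # reclaimed before the whole tree finishes growing.
    del node.histograms

node = TreeNode()
release_histograms(node)
print(hasattr(node, "histograms"))
```

So on a build without refcounting bugs, nothing should keep those arrays alive after the del.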


ogrisel commented Apr 20, 2022

@colesbury I think I can try to craft a standalone reproducer based on the analysis above if you need.

@colesbury

@ogrisel thanks for the bug report. Looks like I messed up the patch to Cython memory views.

@colesbury

I had accidentally introduced a memory leak when addressing the thread-safety issue in #50. It's fixed now:

colesbury/cython@f5740f0

You'll need to re-install Cython and re-build scikit-learn:

pip uninstall cython
pip cache purge
pip install cython
# then rebuild scikit-learn from the checkout used above:
cd scikit-learn
pip install -e .

There may still be a much smaller memory leak that I haven't tracked down yet.

@ogrisel

ogrisel commented Apr 20, 2022

I confirm it works!

$ OPENMP_NUM_THREADS=1 python ~/code/sanbox/debug_memleak.py 
data size: 1.6 MB
memory usage: 119.3 MB
memory usage: 119.9 MB
memory usage: 119.9 MB
memory usage: 119.9 MB
memory usage: 120.5 MB
memory usage: 120.5 MB
memory usage: 120.5 MB
memory usage: 120.5 MB
memory usage: 120.5 MB
memory usage: 120.5 MB

Thanks very much for the quick fix!
