
Memory leak when using nogil Cython #51

Closed
ogrisel opened this issue Apr 20, 2022 · 7 comments


ogrisel commented Apr 20, 2022

Code to reproduce:

pip install numpy scipy cython psutil
git clone https://github.com/scikit-learn/scikit-learn
cd scikit-learn
pip install -e .
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.datasets import make_classification
import psutil
import gc


X, y = make_classification(n_samples=int(1e4), random_state=42)
print(f"data size: {X.nbytes / 1e6:.1f} MB")

for i in range(5):
    clf = HistGradientBoostingClassifier(max_iter=100).fit(X, y)
    gc.collect()
    print(f"memory usage: {psutil.Process().memory_info().rss / 1e6:.1f} MB")

Using a nogil-Cython build of scikit-learn (nogil CPython):

$ OPENMP_NUM_THREADS=1 python ~/code/sanbox/debug_memleak.py 
data size: 1.6 MB
memory usage: 701.8 MB
memory usage: 1290.5 MB
memory usage: 1878.0 MB
memory usage: 2466.0 MB
memory usage: 3053.4 MB

Using scikit-learn installed from conda-forge:

$ OPENMP_NUM_THREADS=1 python ~/code/sanbox/debug_memleak.py 
data size: 1.6 MB
memory usage: 124.6 MB
memory usage: 124.6 MB
memory usage: 125.1 MB
memory usage: 125.1 MB
memory usage: 125.1 MB

Note: this code uses OpenMP-based threading, but the leak still happens when the OpenMP threading layer is disabled with OPENMP_NUM_THREADS=1, so this problem is probably not related to OpenMP.

Not sure how to debug this. Maybe I could try valgrind.
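Before valgrind, one cheap check (a generic sketch, not part of the original report) is whether the leak is visible to Python's own allocator: if tracemalloc reports a flat heap while the process RSS keeps growing, the leak lives below the Python level, e.g. in a C extension.

```python
import gc
import tracemalloc

def traced_mb():
    """Size of the Python heap currently traced by tracemalloc, in MB."""
    gc.collect()
    current, _peak = tracemalloc.get_traced_memory()
    return current / 1e6

tracemalloc.start()
sizes = []
for i in range(3):
    # Stand-in for clf.fit(X, y): allocate and drop ~1 MB per iteration.
    data = [bytes(1024) for _ in range(1000)]
    del data
    sizes.append(traced_mb())
    print(f"iteration {i}: traced python heap = {sizes[-1]:.1f} MB")
```

A flat curve here, combined with the growing RSS reported by psutil above, would point at an allocation that bypasses the Python memory APIs.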


ogrisel commented Apr 20, 2022

I did a valgrind run with only two outer loop iterations and two inner loop iterations (max_iter=2) because valgrind is much slower.

The result is 26 MB of leaked memory, mostly coming from these two numpy allocations:

$ valgrind --tool=memcheck   --leak-check=full --show-leak-kinds=all --suppressions=/home/ogrisel/code/valgrind-python.supp python ~/code/sanbox/debug_memleak.py &> ~/valgrind.log.txt
==262873== Memcheck, a memory error detector
==262873== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==262873== Using Valgrind-3.17.0 and LibVEX; rerun with -h for copyright info
==262873== Command: python /home/ogrisel/code/sanbox/debug_memleak.py
==262873== 
data size: 1.6 MB
memory usage: 460.5 MB
memory usage: 480.4 MB
==262873== 
==262873== HEAP SUMMARY:
==262873==     in use at exit: 26,476,050 bytes in 1,824 blocks
==262873==   total heap usage: 28,635 allocs, 26,811 frees, 56,494,443 bytes allocated
[...]
==260677== 
==260677== 11,264,000 bytes in 110 blocks are still reachable in loss record 571 of 572
==260677==    at 0x4843839: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==260677==    by 0x55AB192: _npy_alloc_cache (alloc.c:95)
==260677==    by 0x55AB192: default_malloc (alloc.c:337)
==260677==    by 0x55AB7CE: PyDataMem_UserNEW (alloc.c:412)
==260677==    by 0x5611554: PyArray_NewFromDescr_int (ctors.c:843)
==260677==    by 0x5612408: PyArray_NewFromDescrAndBase (ctors.c:957)
==260677==    by 0x5612408: PyArray_NewFromDescr (ctors.c:942)
==260677==    by 0x5612408: PyArray_Empty (ctors.c:3002)
==260677==    by 0x56BF02F: array_empty (multiarraymodule.c:1973)
==260677==    by 0x38004C: cfunction_vectorcall_FASTCALL_KEYWORDS (methodobject.c:464)
==260677==    by 0x1793D1: PyVectorcall_Call (call.c:243)
==260677==    by 0xC7150F1: __pyx_pf_7sklearn_8ensemble_23_hist_gradient_boosting_9histogram_16HistogramBuilder_2compute_histograms_brute (histogram.c:3400)
==260677==    by 0xC7150F1: __pyx_pw_7sklearn_8ensemble_23_hist_gradient_boosting_9histogram_16HistogramBuilder_3compute_histograms_brute (histogram.c:3230)
==260677==    by 0x3C48B3: _PyEval_Fast (ceval.c:833)
==260677==    by 0x25B9F0: _PyEval_Eval (ceval_meta.c:2831)
==260677==    by 0x25B9F0: _PyEval_EvalFunc (ceval_meta.c:2892)
==260677==    by 0x25FB25: PyEval_EvalCode (ceval_meta.c:4244)
==260677== 
==260677== 11,264,000 bytes in 110 blocks are still reachable in loss record 572 of 572
==260677==    at 0x4843839: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==260677==    by 0x55AB192: _npy_alloc_cache (alloc.c:95)
==260677==    by 0x55AB192: default_malloc (alloc.c:337)
==260677==    by 0x55AB7CE: PyDataMem_UserNEW (alloc.c:412)
==260677==    by 0x5611554: PyArray_NewFromDescr_int (ctors.c:843)
==260677==    by 0x5612408: PyArray_NewFromDescrAndBase (ctors.c:957)
==260677==    by 0x5612408: PyArray_NewFromDescr (ctors.c:942)
==260677==    by 0x5612408: PyArray_Empty (ctors.c:3002)
==260677==    by 0x56BF02F: array_empty (multiarraymodule.c:1973)
==260677==    by 0x38004C: cfunction_vectorcall_FASTCALL_KEYWORDS (methodobject.c:464)
==260677==    by 0x1793D1: PyVectorcall_Call (call.c:243)
==260677==    by 0xC710D59: __pyx_pf_7sklearn_8ensemble_23_hist_gradient_boosting_9histogram_16HistogramBuilder_4compute_histograms_subtraction (histogram.c:4222)
==260677==    by 0xC710D59: __pyx_pw_7sklearn_8ensemble_23_hist_gradient_boosting_9histogram_16HistogramBuilder_5compute_histograms_subtraction (histogram.c:4127)
==260677==    by 0x3720AA: method_vectorcall_VARARGS_KEYWORDS (descrobject.c:349)
==260677==    by 0x3C1DE1: _PyEval_Fast (ceval.c:737)
==260677==    by 0x25B9F0: _PyEval_Eval (ceval_meta.c:2831)
==260677==    by 0x25B9F0: _PyEval_EvalFunc (ceval_meta.c:2892)
==260677== 
==260677== LEAK SUMMARY:
==260677==    definitely lost: 56 bytes in 1 blocks
==260677==    indirectly lost: 0 bytes in 0 blocks
==260677==      possibly lost: 2,800 bytes in 7 blocks
==260677==    still reachable: 26,473,194 bytes in 1,816 blocks
==260677==         suppressed: 0 bytes in 0 blocks
==260677== 
==260677== For lists of detected and suppressed errors, rerun with: -s
==260677== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)

Maybe there is a reference-counting problem on those numpy arrays?
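One quick way to probe that hypothesis from Python (a generic sketch with a hypothetical `compute_histograms` stand-in, not the actual Cython builder):

```python
import sys
import numpy as np

def compute_histograms():
    # Hypothetical stand-in for the Cython HistogramBuilder methods: it
    # returns a fresh array whose only owner should be the caller.
    return np.empty((110, 1024))

hist = compute_histograms()
# getrefcount counts its own argument as one extra reference, so a
# well-behaved extension leaves exactly 2 here: the `hist` binding plus
# the temporary argument. A higher number would suggest a leaked reference.
count = sys.getrefcount(hist)
print("refcount:", count)
```

Under the nogil interpreter the absolute count can differ because of biased reference counting, so it is safer to compare against a known-good build than to trust the absolute value.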

ogrisel commented Apr 20, 2022

This Cython method is called from Python code. The leaked returned arrays are explicitly dereferenced with del calls at these lines:

https://github.com/scikit-learn/scikit-learn/blob/2b0d2aad77693cf2bd2d6b32f9337f8f2baba7ab/sklearn/ensemble/_hist_gradient_boosting/grower.py#L571-L579
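For context, the pattern at those lines is roughly the following (a paraphrase with stand-in names and shapes, not the exact grower.py source):

```python
import numpy as np

class TreeNode:
    """Minimal stand-in for a grower tree node holding per-node histograms."""
    def __init__(self):
        # Stand-in for the arrays returned by compute_histograms_brute /
        # compute_histograms_subtraction.
        self.histograms = np.empty((255, 32))

def release_histograms(node):
    # Paraphrase of the linked grower.py lines: once the node is split,
    # its histograms are explicitly dereferenced so the arrays can be
    # reclaimed before the whole tree finishes growing.
    del node.histograms

node = TreeNode()
release_histograms(node)
print(hasattr(node, "histograms"))
```

So on a build without refcounting bugs, nothing should keep those arrays alive after the del.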


ogrisel commented Apr 20, 2022

@colesbury I think I can try to craft a standalone reproducer based on the analysis above if you need.

@colesbury

@ogrisel thanks for the bug report. Looks like I messed up the patch to Cython memory views.

@colesbury

I had accidentally introduced a memory leak when addressing the thread-safety issue in #50. It's fixed now:

colesbury/cython@f5740f0

You'll need to re-install Cython and re-build scikit-learn:

pip uninstall cython
pip cache purge
pip install cython
# then rebuild scikit-learn from the checkout used above:
cd scikit-learn
pip install -e .

There may still be a much smaller memory leak that I haven't tracked down yet.

@ogrisel

ogrisel commented Apr 20, 2022

I confirm it works!

$ OPENMP_NUM_THREADS=1 python ~/code/sanbox/debug_memleak.py 
data size: 1.6 MB
memory usage: 119.3 MB
memory usage: 119.9 MB
memory usage: 119.9 MB
memory usage: 119.9 MB
memory usage: 120.5 MB
memory usage: 120.5 MB
memory usage: 120.5 MB
memory usage: 120.5 MB
memory usage: 120.5 MB
memory usage: 120.5 MB

Thanks very much for the quick fix!
