Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PyThreadState and PyInterpreterState are not freed #99205

Closed
zzzeek opened this issue Nov 7, 2022 · 13 comments
Closed

PyThreadState and PyInterpreterState are not freed #99205

zzzeek opened this issue Nov 7, 2022 · 13 comments
Assignees
Labels
3.11 only security fixes 3.12 bugs and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error

Comments

@zzzeek
Copy link

zzzeek commented Nov 7, 2022

The following test program gains about 10M per second under top, on Python 3.11 only (confirmed for all development stages: 3.11.0a4, 3.11.0b1, 3.11.0rc1, 3.11.0) . 3.10 shows no memory growth.

from threading import Thread

while True:
    t1 = Thread()
    t1.start()

For direct results, here's the same program using the resource module that I just saw used over at #98467:

from threading import Thread
import resource

print("Before", resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024, "MB")

i = 0

while True:
    i += 1
    t1 = Thread()
    t1.start()
    if i % 10000 == 0:
        print(
            f"memory after {i} iterations:",
            resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024,
            "MB",
        )

The above program under Py 3.11.0 prints:

$ .venv-3.11/bin/python test5.py 
Before 8.14453125 MB
memory after 10000 iterations: 11.6875 MB
memory after 20000 iterations: 15.30078125 MB
memory after 30000 iterations: 18.65234375 MB
memory after 40000 iterations: 22.26171875 MB
memory after 50000 iterations: 25.87109375 MB
memory after 60000 iterations: 29.22265625 MB
memory after 70000 iterations: 32.83203125 MB
memory after 80000 iterations: 36.44140625 MB
memory after 90000 iterations: 39.78125 MB
memory after 100000 iterations: 43.390625 MB
...

under Python 3.10 it prints:

$ .venv-3.10/bin/python test5.py 
Before 9.25 MB
memory after 10000 iterations: 9.46484375 MB
memory after 20000 iterations: 9.47265625 MB
memory after 30000 iterations: 9.47265625 MB
memory after 40000 iterations: 9.47265625 MB
memory after 50000 iterations: 9.47265625 MB
memory after 60000 iterations: 9.47265625 MB
memory after 70000 iterations: 9.47265625 MB
memory after 80000 iterations: 9.47265625 MB

the issue looks extremely similar to another one we just fixed in greenlet, over at python-greenlet/greenlet#328, although this one is much more surprising. Issue #98467 might also be related.

Environment:

$ uname -a
Linux photon3 5.17.14-200.fc35.x86_64 #1 SMP PREEMPT Thu Jun 9 14:02:42 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

$ python3
Python 3.11.0 (main, Nov  5 2022, 23:27:22) [GCC 11.3.1 20220421 (Red Hat 11.3.1-3)] on linux
@zzzeek zzzeek added the type-bug An unexpected behavior, bug, or error label Nov 7, 2022
@kumaraditya303
Copy link
Contributor

Seems related to the 3.11 new frame handling.

cc @markshannon @brandtbucher

@kumaraditya303 kumaraditya303 added 3.11 only security fixes 3.12 bugs and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Nov 7, 2022
@brandtbucher
Copy link
Member

I can reproduce this.

At first glance, I'm not certain that its related to the frame stuff: debugging seems to confirm that the allocations and deallocations of stack chunks are balanced, and the issue is present even when changing which allocator we use, etc.

Plus, the leaked memory per thread (~370 bytes) seems too small to be our 16KB chunks. It's almost exactly sizeof(PyThreadState) (360 bytes), though! So my hunch is that we're leaking PyThreadState structs.

@brandtbucher
Copy link
Member

brandtbucher commented Nov 8, 2022

Found it... we don't free the memory if the thread state's _static member is nonzero:

cpython/Python/pystate.c

Lines 754 to 760 in 2eee9d9

static void
free_threadstate(PyThreadState *tstate)
{
if (!tstate->_static) {
PyMem_RawFree(tstate);
}
}

It appears we're not setting it correctly. We initialize each thread by memcpy'ing the initial thread, which is statically allocated and has the member set to 1:

cpython/Python/pystate.c

Lines 847 to 850 in 2eee9d9

// Set to _PyThreadState_INIT.
memcpy(tstate,
&initial._main_interpreter._initial_thread,
sizeof(*tstate));

._initial_thread = _PyThreadState_INIT, \
}
#define _PyThreadState_INIT \
{ \
._static = 1, \

So I think the fix is just to make sure that new threads have tstate->_static initialized to 0.

@ericsnowcurrently: is my understanding correct? I can write a patch, if so.

@brandtbucher
Copy link
Member

Also, it looks like PyInterpreterState has a similar leak.

@brandtbucher brandtbucher self-assigned this Nov 8, 2022
@ericsnowcurrently
Copy link
Member

I want to agree, but something seems off. I'll have to take a look tomorrow.

@brandtbucher
Copy link
Member

Confirmed that setting tstate->_static = false; after the memcpy fixes the issue.

@brandtbucher brandtbucher changed the title creating new threads appears to leak memory rapidly in Python 3.11 / linux PyThreadState and PyInterpreterState are not freed Nov 9, 2022
@gpshead
Copy link
Member

gpshead commented Nov 9, 2022

It might be useful to define APIs to copy a PyInterpreterState and PyThreadStats where all of this can be encapsulated? granted there seems to be only one place in the code that is doing this right now so...

@ericsnowcurrently
Copy link
Member

Having looked over the code again, I agree that tstate->_static = false; is all that needs to happen. Basically, we must consider everything that gets set in _PyThreadState_INIT . _static is the only struct member where we want to set a different value in new_threadstate(). It would be worth adding a note to that effect.

The same applies for PyInterpreterState_New() (interp->_static = false; + note pointing to _PyInterpreterState_INIT ).

(What threw me off last night is that we memcpy() the data out of initial, rather than _PyRuntime. I had missed that.)

@kumaraditya303
Copy link
Contributor

Thinking about this more, we could even get rid of the _static flag entirely and rely on the fact that only the main interpreter and main thread state are statically allocated and the rest are heap allocated. Probably something we can do for 3.12 and later @ericsnowcurrently?

@ericsnowcurrently
Copy link
Member

So you mean check interp == &_PyRuntime._main_interpreter instead of interp->_static (and likewise tstate == &tstate->interp->_initial_thread)? That would work. IIRC, I was planning on having additional statically allocated interpreter states and thread states, where the _static member would be more useful, but decided it wasn't worth it. Looks like _static got left behind.

miss-islington pushed a commit to miss-islington/cpython that referenced this issue Nov 9, 2022
…onGH-99268)

(cherry picked from commit 283ab0e)

Co-authored-by: Brandt Bucher <brandtbucher@microsoft.com>
miss-islington added a commit that referenced this issue Nov 9, 2022
(cherry picked from commit 283ab0e)

Co-authored-by: Brandt Bucher <brandtbucher@microsoft.com>
@brandtbucher
Copy link
Member

Fixed, but leaving open to track the above discussion.

@kumaraditya303
Copy link
Contributor

So you mean check interp == &_PyRuntime._main_interpreter instead of interp->_static (and likewise tstate == &tstate->interp->_initial_thread)? That would work. IIRC, I was planning on having additional statically allocated interpreter states and thread states, where the _static member would be more useful, but decided it wasn't worth it. Looks like _static got left behind.

Right, that's what I was thinking, it would make this more obvious when it is statically allocated and yes static allocation only makes sense for main interpreter and thread.

fuzzard pushed a commit to fuzzard/xbmc that referenced this issue Nov 10, 2022
A memleak was found in cpython 3.11.0 that looks to potentially be reasonably serious
Upstream issue is python/cpython#99205
This uses the merged upstream patch python/cpython#99301
until we bump to a newer release (3.11.1+)
CuriousLearner added a commit to CuriousLearner/cpython that referenced this issue Nov 16, 2022
* main: (8272 commits)
  Update Windows readme.txt to clarify Visual Studio required versions (pythonGH-99522)
  pythongh-99460 Emscripten trampolines on optimized METH_O and METH_NOARGS code paths (python#99461)
  pythongh-92647: [Enum] use final status to determine lookup or create (pythonGH-99500)
  pythongh-81057: Move Globals in Core Code to _PyRuntimeState (pythongh-99496)
  Post 3.12.0a2
  pythongh-99300: Use Py_NewRef() in Python/Python-ast.c (python#99499)
  pythongh-93649: Split pytime and datetime tests from _testcapimodule.c (python#99494)
  pythongh-99370: fix test_zippath_from_non_installed_posix (pythonGH-99483)
  pythonGH-99205: remove `_static` field from `PyThreadState` and `PyInterpreterState` (pythonGH-99385)
  pythongh-81057: Move the Remaining Import State Globals to _PyRuntimeState (pythongh-99488)
  pythongh-87604: Avoid publishing list of active per-interpreter audit hooks via the gc module (pythonGH-99373)
  pythongh-93649: Split getargs tests from _testcapimodule.c (python#99346)
  pythongh-81057: Move Global Variables Holding Objects to _PyRuntimeState. (pythongh-99487)
  pythonGH-98219: reduce sleep time in `asyncio` subprocess test (python#99464)
  pythonGH-99388: add `loop_factory` parameter to `asyncio.run` (python#99462)
  pythongh-99300: Use Py_NewRef() in PC/ directory (python#99479)
  pythongh-99300: Use Py_NewRef() in Doc/ directory  (python#99480)
  pythongh-99300: Use Py_NewRef() in Modules/ directory (python#99473)
  pythongh-99300: Use Py_NewRef() in Modules/ directory (python#99469)
  pythongh-99370: Calculate zip path from prefix when in a venv (pythonGH-99371)
  ...
sqlalchemy-bot pushed a commit to sqlalchemy/sqlalchemy that referenced this issue Dec 1, 2022
Py 3.11 is released so I assume we can remove "rc" qualifiers
here

aiosqlite is causing python 3.11 crashes, so I assume this is
due to python/cpython#99205.

I'm only guessing the syntax here, so there may be subsequent
commits if this doesnt work

Change-Id: I6f2ead3e0aca933a972efadf3891edbcdd83501c
@Mekk
Copy link

Mekk commented Jul 31, 2023

I just spent half a day looking for sources of memleaks in py3.11 provided by Ubuntu (22.04 provides python3.11 package containing 3.11.0~rc1-1~22.04 version which seems to contain this bug). Py3.11 final (I tested deadsnakes version) seems to be OK … but with 22.04 fairly widely present chances are I am not the last one.

As a sidenote, in my case problem appeared in code which embeds python and uses PyGILState_Ensure / PyGILState_Release to wrap foreign threads. App which (after simplification):

  • initializes interpreter (Py_Initialize + PyEval_InitThreads)
  • spawns 20 C threads,
  • inside each thread does simple PyGILState_Ensure + PyGILState_Release
  • joins those threads
  • finalizes interpreter (Py_Finalize)

leaked 20 objects of 360 bytes each. Switching number threads to 50 made it leak 50 such objects, etc. ASAN attributed those objecs to PyThreadState_New (called by PyGILState_Ensure).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.11 only security fixes 3.12 bugs and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

6 participants