concurrent.futures.ThreadPoolExecutor does not free memory when shutdown #98467

Open
eMeena opened this issue Oct 19, 2022 · 4 comments
Labels: stdlib (Python modules in the Lib dir) · type-bug (An unexpected behavior, bug, or error)

Comments

eMeena commented Oct 19, 2022

Bug report

Memory allocated in threads is never freed when using a ThreadPoolExecutor. The expected behavior is that when the executor is shut down, all memory allocated in its threads should be freed. The code below demonstrates the leak: memory usage before allocating in the threads is significantly lower than afterwards.

from concurrent.futures import ThreadPoolExecutor, as_completed
import resource

def process_user(x):
    return bytearray(10000000) 

print('Before', resource.getrusage(resource.RUSAGE_SELF).ru_maxrss/1024, 'MB')

def leak_memory():
    with ThreadPoolExecutor(max_workers=20) as executor:
        futures = [executor.submit(process_user, i) for i in range(100)]
        for future in as_completed(futures):
            cur = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss/1024
            print('Step', cur)

leak_memory()

print('After', resource.getrusage(resource.RUSAGE_SELF).ru_maxrss/1024, 'MB')

Your environment

Python 3.9.x and 3.10.x, on macOS and Debian.

@eMeena added the type-bug (An unexpected behavior, bug, or error) label on Oct 19, 2022
@iritkatriel
Member

The process_user function returns the allocated memory. Don’t you want the return value to be available in the main thread?


eMeena commented Oct 20, 2022

Correct, it should be available in the main thread. The primary issue, though, is that the allocated memory is never freed. It should be freed once futures goes out of scope, or when the executor is shut down on exiting the with block, but that doesn't happen.
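One way to separate what the interpreter holds from what the OS reports is tracemalloc, which tracks Python-level allocations only. The sketch below is not from the original report (allocation sizes and counts are scaled down); it checks whether the bytearrays are still alive after the executor shuts down and the futures are released:

```python
# Sketch (scaled-down sizes, not the original repro): tracemalloc tracks
# Python-level allocations, so it can tell us whether the interpreter
# itself still holds the memory after the executor exits.
import tracemalloc
from concurrent.futures import ThreadPoolExecutor

def process_user(x):
    return bytearray(1_000_000)  # 1 MB per task instead of 10 MB

tracemalloc.start()

with ThreadPoolExecutor(max_workers=4) as executor:
    futures = [executor.submit(process_user, i) for i in range(20)]
# the executor is shut down here, but `futures` still references every result

peak = tracemalloc.get_traced_memory()[1]
del futures  # drop the last references to the bytearrays
current = tracemalloc.get_traced_memory()[0]
tracemalloc.stop()

print(f"peak while futures were alive: {peak / 1e6:.1f} MB")
print(f"current after dropping them:   {current / 1e6:.1f} MB")
```

If current drops back near zero while peak is around 20 MB, the interpreter did free the allocations once the futures were released; whether the OS-level RSS reflects that is a separate question about the allocator.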


thomasyang18 commented Oct 31, 2022

This isn't caused by threads; it's normal Python behavior:

import resource

def leak_mem():
    [bytearray(10000000) for i in range(0, 100)]
    return None

print('Before ', resource.getrusage(resource.RUSAGE_SELF).ru_maxrss/1024, 'MB')

leak_mem()

print('After', resource.getrusage(resource.RUSAGE_SELF).ru_maxrss/1024, 'MB')

For example, this Fibonacci program "leaks" as well:

import resource

def get_fib():
    fib = [1, 1]
    for i in range(0, 10**7):
        fib.append((fib[-1] + fib[-2])%100000000)
    return None # never using fib

print('Before ', resource.getrusage(resource.RUSAGE_SELF).ru_maxrss/1024, 'MB')

x = get_fib()

print('After', resource.getrusage(resource.RUSAGE_SELF).ru_maxrss/1024, 'MB')

Edit: I originally thought Python doesn't free objects when they go out of scope. It does (in both the threaded example and mine) if you write:

def process(i):
    bytearray(10000000)
    return None
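The scope-exit behavior can be confirmed directly with a finalizer. This is a sketch, not from the thread; Blob is a hypothetical subclass used only because a plain bytearray cannot carry a weak reference:

```python
# Sketch: attach a finalizer to each allocation to confirm it is freed
# as soon as the worker function returns (CPython frees promptly via
# reference counting). `Blob` exists only because a plain bytearray
# does not support weak references.
import weakref
from concurrent.futures import ThreadPoolExecutor

class Blob(bytearray):
    pass

freed = []

def process(i):
    b = Blob(1_000_000)
    weakref.finalize(b, freed.append, i)  # fires when b is collected
    return None  # b goes out of scope here and is freed immediately

with ThreadPoolExecutor(max_workers=4) as executor:
    list(executor.map(process, range(10)))

print(len(freed))  # → 10: every allocation was freed at scope exit
```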

@JelleZijlstra
Member

I believe the objects are garbage collected, but the memory isn't necessarily returned to the OS because Python uses a layer of abstraction around the system allocator. So you won't see process-level memory usage going back to the previous level.

However, if you keep doing the same thing, memory usage should stabilize and not keep growing.
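A rough experiment (mine, scaled down; exact numbers are allocator- and platform-dependent) is consistent with this. Note that ru_maxrss is a high-water mark, so it can only grow; if memory were truly leaking, it would keep climbing on every round:

```python
# Sketch: repeat the allocate-and-discard cycle and watch ru_maxrss.
# ru_maxrss records the peak resident set size, so it never decreases;
# a real leak would make it grow on every round instead of stabilizing.
import resource

def churn():
    data = [bytearray(1_000_000) for _ in range(50)]  # ~50 MB, then dropped
    return None

peaks = []
for _ in range(5):
    churn()
    peaks.append(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)

print(peaks)  # jumps once on the first round, then stays roughly flat
```

(ru_maxrss is reported in kilobytes on Linux but in bytes on macOS, which is worth remembering when comparing the numbers across platforms.)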

@iritkatriel added the stdlib (Python modules in the Lib dir) label on Nov 26, 2023

4 participants