Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More thread-safe GC #529

Merged
merged 23 commits into from
Aug 2, 2024
Merged
Show file tree
Hide file tree
Changes from 7 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
112 changes: 85 additions & 27 deletions src/GC/GC.jl
Original file line number Diff line number Diff line change
Expand Up @@ -3,77 +3,135 @@

Garbage collection of Python objects.

See `disable` and `enable`.
See [`enable`](@ref), [`disable`](@ref) and [`gc`](@ref).
cjdoris marked this conversation as resolved.
Show resolved Hide resolved
"""
module GC

using ..C: C

const ENABLED = Ref(true)
const QUEUE = C.PyPtr[]
const QUEUE_LOCK = Threads.SpinLock()
cjdoris marked this conversation as resolved.
Show resolved Hide resolved
const HOOK = WeakRef()

"""
PythonCall.GC.disable()

Disable the PythonCall garbage collector.
Do nothing.

This means that whenever a Python object owned by Julia is finalized, it is not immediately
freed but is instead added to a queue of objects to free later when `enable()` is called.
!!! note

Like most PythonCall functions, you must only call this from the main thread.
Historically this would disable the PythonCall garbage collector. This was required
for safety in multi-threaded code but is no longer needed, so this is now a no-op.
"""
function disable()
ENABLED[] = false
return
end
disable() = nothing
cjdoris marked this conversation as resolved.
Show resolved Hide resolved
cjdoris marked this conversation as resolved.
Show resolved Hide resolved

"""
PythonCall.GC.enable()

Re-enable the PythonCall garbage collector.
Do nothing.

This frees any Python objects which were finalized while the GC was disabled, and allows
objects finalized in the future to be freed immediately.
!!! note

Like most PythonCall functions, you must only call this from the main thread.
Historically this would enable the PythonCall garbage collector. This was required
for safety in multi-threaded code but is no longer needed, so this is now a no-op.
"""
function enable()
ENABLED[] = true
if !isempty(QUEUE)
for ptr in QUEUE
if ptr != C.PyNULL
C.Py_DecRef(ptr)
end
enable() = nothing
cjdoris marked this conversation as resolved.
Show resolved Hide resolved
cjdoris marked this conversation as resolved.
Show resolved Hide resolved

"""
PythonCall.GC.gc()

Free any Python objects waiting to be freed.

These are objects that were finalized from a thread that was not holding the Python
GIL at the time.

Like most PythonCall functions, this must only be called from the main thread (i.e. the
thread currently holding the Python GIL.)
"""
function gc()
if C.CTX.is_initialized
unsafe_free_queue()
end
nothing
end

function unsafe_free_queue()
cjdoris marked this conversation as resolved.
Show resolved Hide resolved
lock(QUEUE_LOCK)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer the lock(...) do ... API since it will still unlock in the case of an error.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I considered it but (a) it adds 30ns and (b) the code in between should not throw.

Copy link
Contributor

@MilesCranmer MilesCranmer Aug 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it's the additional function call from the closure. In that case I would try

lock(QUEUE_LOCK)
try
    #= do stuff =#
finally
    unlock(QUEUE_LOCK)
end

And hopefully that's a bit faster.

If anything it's just a good habit. I think there's always a possibility for some kind of error in Julia code (until the day we can actually impose these types of constraints on code), so locks should always be safely handled. e.g., perhaps the user could <Ctrl-C> while the lock is in place, or there's an OOM error, or something else we haven't thought of.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW Base.@lock (which is what I used in my PR) expands to exactly the code you wrote there @MilesCranmer, and is exported on more recent julia versions

for ptr in QUEUE
if ptr != C.PyNULL
C.Py_DecRef(ptr)
end
end
empty!(QUEUE)
return
unlock(QUEUE_LOCK)
nothing
end

function enqueue(ptr::C.PyPtr)
if ptr != C.PyNULL && C.CTX.is_initialized
if ENABLED[]
if C.PyGILState_Check() == 1
C.Py_DecRef(ptr)
if !isempty(QUEUE)
unsafe_free_queue()
end
else
lock(QUEUE_LOCK)
push!(QUEUE, ptr)
unlock(QUEUE_LOCK)
cjdoris marked this conversation as resolved.
Show resolved Hide resolved
end
end
return
nothing
end

function enqueue_all(ptrs)
if C.CTX.is_initialized
if ENABLED[]
if any(ptr -> ptr != C.PYNULL, ptrs) && C.CTX.is_initialized
cjdoris marked this conversation as resolved.
Show resolved Hide resolved
if C.PyGILState_Check() == 1
for ptr in ptrs
if ptr != C.PyNULL
C.Py_DecRef(ptr)
end
end
if !isempty(QUEUE)
unsafe_free_queue()
end
else
lock(QUEUE_LOCK)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer lock(...) do ...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(As above)

append!(QUEUE, ptrs)
unlock(QUEUE_LOCK)
end
end
return
nothing
end

"""
GCHook()

An immortal object which frees any pending Python objects when Julia's GC runs.
cjdoris marked this conversation as resolved.
Show resolved Hide resolved

This works by creating it but not holding any strong reference to it, so it is eligible
to be finalized by Julia's GC. The finalizer empties the PythonCall GC queue if
possible. The finalizer also re-attaches itself, so the object does not actually get
collected and so the finalizer will run again at next GC.
"""
mutable struct GCHook
cjdoris marked this conversation as resolved.
Show resolved Hide resolved
function GCHook()
finalizer(_gchook_finalizer, new())
end
end

function _gchook_finalizer(x)
if C.CTX.is_initialized
finalizer(_gchook_finalizer, x)
if !isempty(QUEUE) && C.PyGILState_Check() == 1
unsafe_free_queue()
end
end
nothing
end

function __init__()
HOOK.value = GCHook()
cjdoris marked this conversation as resolved.
Show resolved Hide resolved
nothing
end

end # module GC
9 changes: 9 additions & 0 deletions test/finalize_test_script.jl
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
using PythonCall

# This would consistently segfault pre-GC-thread-safety
let
pyobjs = map(pylist, 1:100)
Threads.@threads for obj in pyobjs
finalize(obj)
end
end
Loading