
More thread-safe GC #529

Merged: 23 commits merged into main from safer-gc on Aug 2, 2024

Conversation

@cjdoris (Collaborator) commented on Jul 24, 2024:

A draft of an alternative to #520 to ensure Python GC only runs on thread 1.

This version avoids using tasks. Instead, when a Python object is finalized:

  • If GC is enabled and the GIL is currently held, decref the object immediately.
  • Otherwise add it to the queue.

I also added PythonCall.GC.gc() to explicitly clear the queue (if possible).
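
(Editorial sketch of the scheme just described, not the actual PythonCall source: the names ENABLED, QUEUE, QUEUE_LOCK and finalize_pyobject are illustrative, while C.PyGILState_Check and C.Py_DecRef are the real C-API wrappers referred to in this PR, with C = PythonCall.C.)

const ENABLED = Ref(true)              # whether PythonCall's GC is enabled (illustrative)
const QUEUE = C.PyPtr[]                # pointers waiting to be decref'd
const QUEUE_LOCK = Threads.SpinLock()  # finalizer-safe lock (see discussion below)

# Called from the finalizer of a wrapped Python object.
function finalize_pyobject(ptr::C.PyPtr)
    if ENABLED[] && C.PyGILState_Check() == 1
        C.Py_DecRef(ptr)               # GC enabled and GIL held: decref immediately
    else
        lock(QUEUE_LOCK)               # otherwise defer it onto the queue
        push!(QUEUE, ptr)
        unlock(QUEUE_LOCK)
    end
    nothing
end

# Roughly what PythonCall.GC.gc() does: explicitly clear the queue, if the GIL is held.
function gc()
    if ENABLED[] && C.PyGILState_Check() == 1
        lock(QUEUE_LOCK)
        while !isempty(QUEUE)
            C.Py_DecRef(pop!(QUEUE))
        end
        unlock(QUEUE_LOCK)
    end
    nothing
end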

Furthermore, I added an immortal object GCHook() which calls PythonCall.GC.gc() whenever it is finalized, and then adds its own finalizer back on. This means the queue gets cleared periodically whenever Julia GC runs, because Julia keeps trying to finalize GCHook().
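
(Again an editorial sketch of that GCHook mechanism, with illustrative names: the trick is that registering a new finalizer from inside the finalizer resurrects the object, so Julia tries to finalize it again on a later GC pass.)

mutable struct GCHook
    GCHook() = finalizer(_gchook_finalizer, new())
end

function _gchook_finalizer(hook)
    gc()                                # try to clear the queue (as in the sketch above)
    finalizer(_gchook_finalizer, hook)  # add the finalizer back on: the hook is effectively immortal
    nothing
end

# Created once at init time and deliberately not stored anywhere, so every Julia GC run
# attempts to finalize it, which drains the queue and re-arms the hook.
GCHook()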

I used PyGILState_Check() == 1 to check that the GIL is currently held. This should make it safe to interact with Python from any thread in the future, provided you use the C API PyGILState_* and PyThreadState_* functions correctly.
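
(For instance, correct use from a non-main thread might look like the sketch below; this assumes PythonCall.C wraps PyGILState_Ensure and PyGILState_Release, which follow the standard CPython C API, and it is not a supported workflow yet.)

gstate = C.PyGILState_Ensure()    # acquire the GIL on this thread (CPython C API)
try
    x = pyint(123456789)          # any Python interaction; finalizers now see the GIL as held
    println(x)
finally
    C.PyGILState_Release(gstate)  # always release, restoring the previous GIL state
end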

Very much a draft. We still need to use Channels etc. (as in #520) to make this totally thread-safe.

@ericphanson (Contributor):

BTW, could be worth trying this benchmark: main...ericphanson:PythonCall.jl:eph/timing

@cjdoris (Collaborator, Author) commented on Jul 31, 2024:

@ericphanson @lassepe @MilesCranmer I think this is the version I'm going to merge.

I'd be grateful if you could give it a try to check that it solves your issues, and also review the code.

Some timings on my machine:

  • pyint(123456789) (quite a fast operation) takes 45ns on the main branch versus 54ns on this branch if the GIL is held.
  • That slow-down is expected, because the main difference is the addition of a C.PyGILState_Check() call, which is 11ns.
  • If the GIL is not held when the finalizer is called then the finalizer takes an extra 20ns, half being from locking/unlocking the spin lock, the other half being pushing/popping the queue. On main you get a slowdown of 10ns if GC is disabled (from pushing/popping), so this PR adds 10ns (locking/unlocking) in this situation.
  • Using channels instead of vectors and locks adds about 80ns to the no-GIL branch. So I don't use channels.
  • The fastest you can possibly create and free this int (C.Py_DecRef(C.PyLong_FromLongLong(123456789))) takes 19ns.
  • Using pydel! to avoid finalizers (PythonCall.pydel!(pyint(123456789))) takes 30ns (19ns to create/free, 10ns to push/pop).
  • In short, we were 2.37x slower than optimal before and now we are 2.84x slower. Put another way, we could make 22.2M ints per second before, and now we can make 18.5M. This seems acceptable to me given the improvement.
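
(For reference, timings like these can be reproduced roughly as follows, using BenchmarkTools; exact numbers depend on the machine and branch.)

using BenchmarkTools, PythonCall
const C = PythonCall.C

@btime pyint(123456789)                               # high-level API, freed via the finalizer
@btime PythonCall.pydel!(pyint(123456789))            # explicit deletion, skipping the finalizer
@btime C.Py_DecRef(C.PyLong_FromLongLong(123456789))  # raw C API lower bound: create and free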

I just need to copy over the extra tests from #520.

@MilesCranmer (Contributor):

PySR test suite is passing 👍

@MilesCranmer (Contributor):

Cool, this also fixes avik-pal/Wandb.jl#27!

@MilesCranmer (Contributor) left a comment:

I think the docs also need to be updated with the warning about multithreading (and suggestion that GC be turned off).

Also, how does this interact with signal handling? Does the warning in the docs still apply?

src/GC/GC.jl (outdated diff context):

end

function unsafe_free_queue()
    lock(QUEUE_LOCK)

Contributor:

I prefer the lock(...) do ... API since it will still unlock in the case of an error.

@cjdoris (Collaborator, Author):

Yeah I considered it but (a) it adds 30ns and (b) the code in between should not throw.

@MilesCranmer (Contributor) commented on Aug 1, 2024:

I guess it's the additional function call from the closure. In that case I would try

lock(QUEUE_LOCK)
try
    #= do stuff =#
finally
    unlock(QUEUE_LOCK)
end

And hopefully that's a bit faster.

If anything it's just a good habit. I think there's always a possibility for some kind of error in Julia code (until the day we can actually impose these types of constraints on code), so locks should always be handled safely. For example, perhaps the user could press Ctrl-C while the lock is held, or there's an OOM error, or something else we haven't thought of.

@ericphanson (Contributor):

BTW, Base.@lock (which is what I used in my PR) expands to exactly the code you wrote there, @MilesCranmer, and is exported on more recent Julia versions.
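
In other words, the try/finally block above can be written equivalently as:

Base.@lock QUEUE_LOCK begin
    #= do stuff =#
end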

src/GC/GC.jl (outdated diff context):

else
    lock(QUEUE_LOCK)

Contributor:

Prefer lock(...) do ...

Contributor:

(As above)

@ericphanson (Contributor):

I think we need SpinLock since we are in a finalizer! We aren’t allowed to yield like ReentrantLock does
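
(That is, something like the lines below; SpinLock only busy-waits, whereas ReentrantLock can yield to the scheduler while blocked, which is not allowed inside a finalizer.)

const QUEUE_LOCK = Threads.SpinLock()   # finalizer-safe: acquiring it never yields/task-switches
# const QUEUE_LOCK = ReentrantLock()    # not finalizer-safe: may yield while waiting for the lock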

@MilesCranmer (Contributor) left a comment:

Looks great!

@ericphanson (Contributor) left a comment:

One out-of-date comment, otherwise LGTM (though I should say I'm no expert and was just learning as I went in #520).

@MilesCranmer (Contributor) commented on Aug 2, 2024:

Regarding the isempty needing to be locked,

iterate(items) === nothing

This is how Base checks for an array being empty. I think what can happen is: thread A can (1) call iterate, (2) get the pointer to the first array element, and then (3) a separate thread B removes that element from the array, making that pointer point to invalid memory, and (4) thread A accesses the first array element anyway.

In other words, use of isempty could cause undefined behavior if another thread is writing to the collection (and maybe a segfault).

Therefore I think all uses of isempty(QUEUE.items) need to be behind a lock.

Basically anytime QUEUE.items is used in any way it should probably be locked unless we know with absolute certainty there are no races.

@cjdoris (Collaborator, Author) commented on Aug 2, 2024:

Regarding the isempty needing to be locked,

I commented elsewhere that you're looking at the wrong method of isempty. The method for arrays is

isempty(a::AbstractArray) = (length(a) == 0)

which I believe is safe. It might occasionally give a stale answer in the presence of a race condition, but it is memory-safe.
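
(So a cheap unlocked isempty check before taking the lock is acceptable: a stale answer only means the lock is occasionally taken or skipped unnecessarily, and the real work still happens under the lock. A sketch, reusing the illustrative names from the earlier sketch:)

function maybe_clear_queue()
    isempty(QUEUE) && return     # racy, but only reads the length field: cannot touch freed memory
    lock(QUEUE_LOCK)             # all mutation of QUEUE happens under the lock
    while !isempty(QUEUE)
        C.Py_DecRef(pop!(QUEUE)) # (GIL/enabled checks omitted here; see the earlier sketch)
    end
    unlock(QUEUE_LOCK)
    nothing
end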

@MilesCranmer (Contributor):

Oh yeah, you are totally right, sorry.

@MilesCranmer (Contributor):

Ok, no other comments from me. I think it's ready to go 👍

@cjdoris merged commit bcd2bbb into main on Aug 2, 2024 (13 checks passed).
@cjdoris deleted the safer-gc branch on Aug 2, 2024 at 21:11.

@cjdoris (Collaborator, Author) commented on Aug 2, 2024:

Thanks for your help reviewing!

@MilesCranmer (Contributor):

No problem 👍

(By the way, could we get a release so we can fix the Wandb.jl issue?)

@cjdoris (Collaborator, Author) commented on Aug 2, 2024:

Yeah I'll make a release once #535 is merged (going to sleep on it and bikeshed the new function names).

@MilesCranmer (Contributor):

Sounds good.

In case #535 looks like it will take more discussion to settle on an API, I wonder if we could have a patch release first, just so we can get the bug fixes through?

@MilesCranmer (Contributor):

@cjdoris could we have a release before #535? I need the thread safety for something PySR-related and it's slightly urgent (I am teaching a summer school this week and want to have the students using Wandb.jl).

@cjdoris (Collaborator, Author) commented on Aug 7, 2024:

It's released

@MilesCranmer (Contributor):

Thanks!
