Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Replace bare static exception<T> with gil_safe_call_once_and_store. #4897

Merged
merged 1 commit into from
Oct 23, 2023

Conversation

rwgk
Copy link
Collaborator

@rwgk rwgk commented Oct 23, 2023

Description

This is to ensure that Py_DECREF() is not called after the Python interpreter was finalized already:

// 1. Py_DECREF() calls potentially after the Python interpreter was finalized already, and

(Bug noticed by chance while working on #4888.)

Suggested changelog entry:

Removes potential for Undefined Behavior during process teardown.

@rwgk
Copy link
Collaborator Author

rwgk commented Oct 23, 2023

Google global testing ID (passed): OCL:575710443:BASE:575809221:1698070476323:98dd25a0 (using this PR @ ca7bdac)

@rwgk rwgk marked this pull request as ready for review October 23, 2023 14:49
@rwgk rwgk requested a review from EthanSteinberg October 23, 2023 14:49
@rwgk
Copy link
Collaborator Author

rwgk commented Oct 23, 2023

@tkoeppe FYI

I never got to systematically looking if there are more bugs like this BTW.

Copy link
Collaborator

@EthanSteinberg EthanSteinberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Thanks for the fix!

I never got to systematically looking if there are more bugs like this BTW.

I bet we could set up some global variables to track this / throw assertions. Not sure if it would be worth the effort though.

// directly in register_exception, but that makes clang <3.5 segfault - issue #1349).
template <typename CppException>
exception<CppException> &get_exception_object() {
static exception<CppException> ex;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this default constructor call into CPython (and release/reacquire the GIL)?

The deadlock scenario can only arise if there's some non-trivial lock interaction in the initializer of the static variable.

I suppose this change would also deal with some end-of-program lifetime situations, though, e.g. if this object gets destroyed too early or too late.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this default constructor call into CPython (and release/reacquire the GIL)?

No, it just copies nullptr to m_ptr (

PyObject *m_ptr = nullptr;
).

The deadlock scenario can only arise if there's some non-trivial lock interaction in the initializer of the static variable.

Yes, that was on my mind.

I suppose this change would also deal with some end-of-program lifetime situations, though, e.g. if this object gets destroyed too early or too late.

That was the only reason I got into here.

I considered simply changing this to use new (with a boilerplate "intentional leak" comment), but then decided: the universally safe solution

  1. is easy,
  2. inexpensive,
  3. expressive (call once is exactly what we want),
  4. reads nicely,
  5. future proof (in case the exception<T> implementation is changed),
  6. and a great pattern for people to remember and follow,

let's use it!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, makes sense! Using the new facility just for the lifetime is certainly also a good move.

@rwgk
Copy link
Collaborator Author

rwgk commented Oct 23, 2023

Thanks @EthanSteinberg!

I bet we could set up some global variables to track this / throw assertions.

I don't know how we could do that (going completely blank tbh). Just disclosing & an indication of curiosity, but ...

Not sure if it would be worth the effort though.

... yeah, I'm assuming there is only very little left to be had, simply based on how widely used pybind11 is.

I'm slightly surprised that nobody needed this fix before. I'm guessing this bug bites very rarely, and only during process teardown, so it's just some occasional flakiness maybe that's easy to ignore.

@rwgk rwgk merged commit bf88e29 into pybind:master Oct 23, 2023
81 checks passed
@rwgk rwgk deleted the gil_safe_register_exception_impl branch October 23, 2023 19:51
@github-actions github-actions bot added the needs changelog Possibly needs a changelog entry label Oct 23, 2023
@EthanSteinberg
Copy link
Collaborator

I don't know how we could do that (going completely blank tbh). Just disclosing & an indication of curiosity, but ...

I don't know exactly how to set this up (would need to do more research), but my initial guess would be to have a boolean flag that is flipped true by https://docs.python.org/3/library/atexit.html#atexit.register.

We could then flag on any calls to Py_DECREF or something of that nature ...

The timing here is really tricky. I'm not sure atexit would be at the point we need.

@rwgk
Copy link
Collaborator Author

rwgk commented Oct 24, 2023

The timing here is really tricky. I'm not sure atexit would be at the point we need.

I'm thinking, let's not go there (too uncertain, not a lot to gain for it).

But something else crossed my mind (~randomly): How does this static play with embed.h, specifically multiple calls to py::initialize_interpreter() - py::finalize_interpreter() or py::scoped_interpreter guard{}?

Is the stored Exception type still valid after py::finalize_interpreter()?

@tkoeppe
Copy link
Contributor

tkoeppe commented Oct 24, 2023

I don' t know how the CPython API works for multiple interpreters, but just thinking about this on a high level, none of our static approaches can work with multiple interpreters. Instead, the "once" guard would have to be part of the interpreter state, and the once-initializers would need to work on that per-interpreter state, not globally.

@rwgk
Copy link
Collaborator Author

rwgk commented Oct 24, 2023

Thanks Thomas for confirming what I was suspecting! — Sounds like we (someone...) should make this warning much stronger:

.. warning::
The interpreter can be restarted by calling `initialize_interpreter` again.
Modules created using pybind11 can be safely re-initialized. However, Python
itself cannot completely unload binary extension modules and there are several
caveats with regard to interpreter restarting. All the details can be found
in the CPython documentation. In short, not all interpreter memory may be
freed, either due to reference cycles or user-created global data.

@henryiii henryiii changed the title Bug fix: Replace bare static exception<T> with gil_safe_call_once_and_store. fix: Replace bare static exception<T> with gil_safe_call_once_and_store. Nov 15, 2023
@henryiii henryiii removed the needs changelog Possibly needs a changelog entry label Mar 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants