fix: test_factory_constructors.py failure triggered by test_register_duplicate_class #2564
I wonder if this is related to #2558? Edit: Sadly it didn't fix this one.
It would be amazing if it is (and we only need to solve 1 issue instead of 2), but I wouldn't really know why it would be?
https://github.com/pybind/pybind11/pull/2564/checks?check_run_id=1227843279 looks suspicious to me. Notice the second test that failed (though not a segfault).
3752dd6 to d8eb801
That's a nice catch. One word of caution about |
Hmmm, interesting. I'll still give that a look! I did think of this, but it seemed reasonably deterministic, the way it's described in the Python docs (both normal as well as C API):
So I understood it's not linked to the garbage collector running on the weakref, but on the reference object? (Which is fine, because as long as that one's not collected, Python won't recycle the memory anyway.) But I'll look into #856 and try to convince myself that we're good! |
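The point about callback timing can be checked from pure Python. The sketch below is only an illustration and not pybind11 code: `Referent` and the `events` list are hypothetical names. It assumes CPython semantics, where a weakref callback runs when the referent is finalized, and never runs if the weakref object itself is discarded first.

```python
import gc
import weakref


class Referent:
    """Hypothetical stand-in for an object whose lifetime we observe."""


events = []

# Case 1: the weakref object is kept alive, so its callback runs
# once the referent is finalized.
obj = Referent()
kept = weakref.ref(obj, lambda ref: events.append("kept callback"))
del obj        # last strong reference gone
gc.collect()   # not needed on CPython here, but harmless

# Case 2: the weakref object is dropped before the referent dies,
# so its callback is never invoked.
obj = Referent()
dropped = weakref.ref(obj, lambda ref: events.append("dropped callback"))
del dropped
del obj
gc.collect()
```

Only the first callback is recorded, which is why any cleanup that relies on a weakref has to keep the weakref object itself alive somewhere.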
Scratch all that. Should've read the issue first; this is not really related. @wjakob, @henryiii, @EricCousineau-TRI, I still think we should be safe, because we're not too concerned about order, here? Worst case that could happen is some type gets removed from the internals, a bit before that type is actually destructed, but after the last reference to this type is gone? (Except if there's some weird cycle, perhaps, and the type is still used in the destructor? But I'm not sure we could fix that anyway, given the arbitrary destruction order of a cycle?) We could try a similar approach to #856; I thought about that before seeing the
I think this is fine, as it fixes the issue we are seeing, though if something else comes up, we can revisit?
Let's give @EricCousineau-TRI still a chance to review? He said he might have some time tomorrow. And I have a feeling he's been playing in these parts of pybind11 before.
This is in pybind11/include/pybind11/detail/internals.h Lines 97 to 109 in 7c71dd3
What else should we clear out? EDIT(eric): Made it a permalink code ref. |
Thinking a bit further: is that the behavior we actually want? I.e., in this case, once the Python type object (the one of the class_) gets garbage collected, it's unregistered. So it would also mean you can't go from C++ to Python, anymore. So you couldn't register an "anonymous" type with Or, you register a C++ type (To be clear, the above scenario currently results in a segfault, so at least it's better. But the other option would be to keep C++-registered type alive forever.) |
I think this is good for now, as it's at least better, and I don't think deleting class objects is common (except in some tests). And if you go to the trouble of deleting all references to a class, would you expect it to magically reappear if you tried to do a conversion afterwards?
Agreed with Henry. If a user manages to clear out references, then they probably have a reason to do so, so we shouldn't get in the way. Additionally, a user could prevent that behavior by stashing a reference to the module (or type), preventing garbage collection.
This is effectively what happens in nominal Python anyways.
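A minimal pure-Python sketch of the "stash a reference" idea, not taken from the PR: `make_type`, `stash` and `probe` are hypothetical names, and a plain dynamically created class stands in for a pybind11-registered type.

```python
import gc
import weakref


def make_type():
    # Hypothetical stand-in for a module that defines a class in a local scope.
    return type("Local", (), {})


stash = [make_type()]          # strong reference keeps the type alive
probe = weakref.ref(stash[0])  # observe the type without owning it

gc.collect()
alive_while_stashed = probe() is not None

stash.clear()                  # drop the only strong reference
gc.collect()                   # type objects sit in reference cycles
alive_after_clear = probe() is not None
```

As long as the stash holds the type, a collection pass leaves it alone. Once the stash is cleared, the type is collected like any other unreachable object.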
Looks good! Just some minor comments.
True, OK. I was just afraid of unexpected surprises where suddenly a function's return type can't be returned anymore. In Python you'd never have that, because there'd still be a reference to it, when creating the object. But I'm fine with waiting for complaints :-)
After taking another look, I am now convinced that the weak reference will work. That said, it seems far too heavy to create This would imply that the implementation has to be sufficiently generic -- it must e.g. map from the type to |
That's where I got this approach, yes. I got reasonably confused for a while on why things weren't getting cleaned up. It is used here, though, I think:

    const std::vector<detail::type_info *> &all_type_info(PyTypeObject *type) {
        auto ins = all_type_info_get_cache(type);
        if (ins.second)
            // New cache entry: populate it
            all_type_info_populate(type, ins.first->second);
        return ins.first->second;
    }
That is an excellent point, I had forgotten about that. Then how about catching things at a lower level: https://docs.python.org/3/c-api/typeobj.html#c.PyTypeObject.tp_finalize, in this case in the metaclass used to construct pybind11 types?
That is the other option, yes. The detail here is that using a custom metaclass through |
By which I mean: going with |
…tional changes from pybind#2335" This reverts commit ca33a80.
cfada0c to 51978a8
2b8a3f4 to ae96b37
@rwgk I fixed the |
I take it the second approach breaks custom meta classes that don't also call this?
Yes. Then again, they'd only be as broken as the current situation (which has been there for a long time, if not forever, as far as I know?).
I was going to ask if custom metaclasses ever worked in combination with pybind11.
Kind of (there is at least the |
This PR passed the Google-global testing.
Okay, either approach is fine with me then.
Beautiful, thank you for the great work @YannickJadoul. I think that we can likely install a similar callback also for custom metaclasses, but this is a much less important point that is not a blocker for v.2.6.0 (this kind of issue revolving around garbage collection of pybind11-created types is rare enough in practice that we are just bothered by it now, and custom metaclasses are a super-rarely used feature of pybind11). Pr{A ∩ B} ≈ Pr{A}·Pr{B} = tiny number, I hope 🙂.
To reproduce, this fails on the centos:8 docker image (with development tools installed - see ci.yml - but without per se installing nvidia compilers), by running PYTEST_ADDOPTS="-k 'test_register_duplicate_class or test_init_factory_alias'" cmake --build build_centos8 --target pytest
Cross-reference #2335
EDIT: Figured out what's wrong. Short description of the bug:
- test_class.py::test_register_duplicate_class registers py::class_ instances in different scopes, a module and a class (to test whether duplicate class names or types are caught; but that's not really the issue here).
- […] py::class_ constructor are not referenced anymore after the scope of the test, and they go out of scope and get garbage collected. Since the py::class_es are only referenced by these scopes that are getting garbage collected, they also get garbage collected. However, they remain in the internals' registered_types_py.
- […] PyTypeObject pointer in registered_types_py now points to a new Python type, but still associates this with the old type_info (it is likely important that the new type allocated in "recycled" memory is a non-pybind11 type, in the case I debugged; if not, the associated type_info in registered_types_py would be overwritten).
- […] PyTypeObject* in registered_types_py.

Current solution:
- […] weakref from detail::all_type_info_get_cache. I still want to check if there's a better way of solving this, but this solves the problem.

Possible follow-up:
- […] class_. I'll give this some more thought and I'll make a PR to further discuss this.