fix: test_factory_constructors.py failure triggered by test_register_duplicate_class #2564

YannickJadoul · 2020-10-08T19:26:22Z

To reproduce, this fails on the centos:8 docker image (with development tools installed - see ci.yml - but without per se installing nvidia compilers), by running PYTEST_ADDOPTS="-k 'test_register_duplicate_class or test_init_factory_alias'" cmake --build build_centos8 --target pytest

Cross-reference #2335

EDIT: Figured out what's wrong. Short description of the bug:

test_class.py::test_register_duplicate_class registers py::class_ instances in different scopes, a module and a class (to test whether duplicate class names or types are caught; but that's not really the issue here).
In this test, the actual scopes passed to the py::class_ constructor are not referenced anymore after the scope of the test, and they go out of scope and get garbage collected. Since the py::class_es are only referenced by these scopes that are getting garbage collected, they also get garbage collected. However, they remain in the internals' registered_types_py.
When Python is allocating a new type, it reuses the old memory location of the garbage collected type object from before.
So, the same PyTypeObject pointer in registered_types_py now points to a new Python type, but still associates this with the old type_info (it is likely important that the new type allocated in "recycled" memory is a non-pybind11 type, in the case I debugged; if not, the associated type_info in registered_types_py would be overwritten).
pybind11 doesn't understand things anymore, because it's allocating the new type, but there's now a mismatch between the Python and C++ type, because of the stale PyTypeObject* in registered_types_py.

Current solution:

I copied the weakref from detail::all_type_info_get_cache. I still want to check if there's a better way of solving this, but this solves the problem.

Possible follow-up:

Now that stale classes get removed from the internals, we could think about not checking whether we're overwriting an already existing attribute in class_. I'll give this some more thought and I'll make a PR to further discuss this.

henryiii · 2020-10-08T19:37:03Z

I wonder if this is related to #2558 ? Edit: Sadly it didn't fix this one.

YannickJadoul · 2020-10-08T19:46:45Z

I wonder if this is related to #2558 ?

It would be amazing if it is (and we only need to solve 1 issue instead of 2), but I wouldn't really know why it would be?

henryiii · 2020-10-08T19:50:06Z

https://github.com/pybind/pybind11/pull/2564/checks?check_run_id=1227843279 looks suspicious to me. Notice the second test that failed (though not a segfault).

wjakob · 2020-10-12T20:21:41Z

That's a nice catch. One word of caution about py::weakref: the garbage collector is a complex beast, and although the weakref callback will eventually be invoked, it's very difficult to guarantee that this happens at any particular point in time, or with a particular ordering compared to other stuff that is happening. Although it was a really neat trick, we had to get rid of weak references in the implementation of py::keep_alive (#856) for similar reasons. Based on these experiences, it seems to me that you might just be replacing one kind of undefined behavior by another (albeit likely much more rare).

YannickJadoul · 2020-10-12T20:29:46Z

Hmmm, interesting. I'll still give that a look!

I did think of this, but it seemed reasonably deterministic, the way it's described in the Python docs (both normal as well as C API):

The Python docs for weakref.ref (https://docs.python.org/3/library/weakref.html#weakref.ref):

If callback is provided and not None, and the returned weakref object is still alive, the callback will be called when the object is about to be finalized; the weak reference object will be passed as the only parameter to the callback; the referent will no longer be available.
The Python docs for PyWeakref_NewRef (https://docs.python.org/3/c-api/weakref.html#c.PyWeakref_NewRef):

The second parameter, callback, can be a callable object that receives notification when ob is garbage collected; it should accept a single parameter, which will be the weak reference object itself.

So I understood it's not linked to the garbage collector running on the weakref, but on the reference object? (Which is fine, because as long as that one's not collected, Python won't recycle the memory anyway.)

But I'll look into #856 and try to convince myself that we're good!

YannickJadoul · 2020-10-12T21:58:02Z

I did think of this, but it seemed reasonably deterministic, the way it's described in the Python docs (both normal as well as C API):

Scratch all that. Should've read the issue first; this is not really related.

@wjakob, @henryiii, @EricCousineau-TRI, I still think we should be safe, because we're not too concerned about order, here? Worst case that could happen is some type gets removed from the internals, a bit before that type is actually destructed, but áfter the last reference to this type is gone? (Except if there's some weird cycle, perhaps, and the type is still used in the destructor? But I'm not sure we could fix that anyway, given the arbitrary destruction order of a cycle?)

We could try a similar approach to #856; I thought about that before seeing the weakref approach, but it would involve the metaclass, so we'd again bump into potential issues when using custom metaclasses? :-/

henryiii · 2020-10-12T22:02:25Z

I think this is fine, as it fixes the issue we are seeing, though if something else comes up, we can revisit?

YannickJadoul · 2020-10-12T22:03:35Z

I think this is fine, as it fixes the issue we are seeing, though if something else comes up, we can revisit?

Let's give @EricCousineau-TRI still a chance to review? He said he might have some time tomorrow. And I have a feeling he's been playing in these parts of pybind11 before.

YannickJadoul · 2020-10-12T22:06:47Z

This is in internals:

pybind11/include/pybind11/detail/internals.h

Lines 97 to 109 in 7c71dd3

    
           type_map<type_info *> registered_types_cpp; // std::type_index -> pybind11's type information 
        
           std::unordered_map<PyTypeObject *, std::vector<type_info *>> registered_types_py; // PyTypeObject* -> base type_info(s) 
        
           std::unordered_multimap<const void *, instance*> registered_instances; // void * -> instance* 
        
           std::unordered_set<std::pair<const PyObject *, const char *>, override_hash> inactive_override_cache; 
        
           type_map<std::vector<bool (*)(PyObject *, void *&)>> direct_conversions; 
        
           std::unordered_map<const PyObject *, std::vector<PyObject *>> patients; 
        
           std::forward_list<void (*) (std::exception_ptr)> registered_exception_translators; 
        
           std::unordered_map<std::string, void *> shared_data; // Custom data to be shared across extensions 
        
           std::vector<PyObject *> loader_patient_stack; // Used by `loader_life_support` 
        
           std::forward_list<std::string> static_strings; // Stores the std::strings backing detail::c_str() 
        
           PyTypeObject *static_property_type; 
        
           PyTypeObject *default_metaclass; 
        
           PyObject *instance_base;

What else should we clear out? registered_types_cpp as well, right?

EDIT(eric): Made it a permalink code ref.

YannickJadoul · 2020-10-12T22:57:37Z

Thinking a bit further: is that the behavior we actually want?

I.e., in this case, once the Python type object (the one of the class_) gets garbage collected, it's unregistered. So it would also mean you can't go from C++ to Python, anymore. So you couldn't register an "anonymous" type with class_ that you're only returning from C++ to Python (unless you explicitly keep it alive on the C++ side).

Or, you register a C++ type Foo as my_lovely_module.Bar Python type, and a user manage to clear all references to my_lovely_module.Bar, and suddenly, a function returning Foo will not be able to convert its return type. Are we happy with that?

(To be clear, the above scenario currently results in a segfault, so at least it's better. But the other option would be to keep C++-registered type alive forever.)

henryiii · 2020-10-13T14:16:04Z

I think this is good for now, as it's at least better, and I don't think deleting class objects is common (except in some tests). And if you go to the trouble of deleting all references to a class, would you expect it to magically reappear if you tried to do a conversion afterwords?

EricCousineau-TRI · 2020-10-13T14:18:01Z

Are we happy with that?

Agreed with Henry. If a user manages to clear out references, then they probably have a reason to do so, so we shouldn't get in the way.

Additionally, a user could prevent that behavior by stashing a reference to the module (or type), preventing garbage colllection:

m.attr("_deps") = py::make_tuple(
  py::module::import("my_lovely_module"),
  ...
);

This is effectively what happens in nominal Python anyways.

EricCousineau-TRI

Looks good! Just some minor comments.

include/pybind11/pybind11.h

YannickJadoul · 2020-10-13T19:52:21Z

And if you go to the trouble of deleting all references to a class, would you expect it to magically reappear if you tried to do a conversion afterwords?

Additionally, a user could prevent that behavior by stashing a reference to the module (or type), preventing garbage colllection:

True, OK. I was just afraid of unexpected surprises where suddenly a functions return type can't be returned anymore. In Python you'd never have that, because there'd still be a reference to it, when creating the object.

But I'm fine with waiting for complaints :-)

wjakob · 2020-10-15T11:27:10Z

After taking another look, I am now convinced that the weak reference will work. That said, it seems far too heavy to create cpp_function for each type to set up a unique destruction callback. How about creating this callback method once in internals and then just initializing a single weak reference in cpp_type?

This would imply that the implementation has to be sufficiently generic -- it must e.g. map from the type to std::type_index and then figure stuff out from there.

YannickJadoul · 2020-10-15T12:08:47Z

Tangentially related, it seems that there is a function
inline std::pair<decltype(internals::registered_types_py)::iterator, bool> all_type_info_get_cache(PyTypeObject *type) {
that is not used anywhere in pybind11, and which does some suspiciously related things.

That's where I got this approach, yes. I got reasonably confused for a while on whý things weren't getting cleaned up.

It is used here, though, I think:

const std::vector<detail::type_info *> &all_type_info(PyTypeObject *type) {
    auto ins = all_type_info_get_cache(type);
    if (ins.second)
        // New cache entry: populate it
        all_type_info_populate(type, ins.first->second);

    return ins.first->second;
}

wjakob · 2020-10-15T12:29:34Z

@wjakob We do need access to the specific Python type object, though. And as far as I tried, you can't add arbitrary attributes to a weakref?

https://docs.python.org/3/library/weakref.html#weakref.ref says that

If callback is provided and not None, and the returned weakref object is still alive, the callback will be called when the object is about to be finalized; the weak reference object will be passed as the only parameter to the callback; the referent will no longer be available.

That is an excellent point, I had forgotten about that. Then how about catching things at a lower level: https://docs.python.org/3/c-api/typeobj.html#c.PyTypeObject.tp_finalize, in this case in the metaclass used to construct pybind11 types?

YannickJadoul · 2020-10-15T12:47:11Z

That is an excellent point, I had forgotten about that. Then how about catching things at a lower level: https://docs.python.org/3/c-api/typeobj.html#c.PyTypeObject.tp_finalize, in this case in the metaclass used to construct pybind11 types

That is the other option, yes. The detail here is that using a custom metaclass through py::metaclass(...) again will result in issues? And I preferred the non-invasive method of weakref, but maybe it's fine like this, for now?

YannickJadoul · 2020-10-15T12:48:01Z

it's fine like this, for now?

By which I mean: going with tp_finalize, and checking later how we can safely enable custom metaclasses?

YannickJadoul · 2020-10-15T21:19:42Z

But maybe PyWeakref_GET_OBJECT would still work?

This won't: https://github.com/python/cpython/blob/cf693e537dc8aaa14315a7f59baec4a31d1167d3/Include/weakrefobject.h#L70-L80

…hanges from pybind#2335

…tional changes from pybind#2335" This reverts commit ca33a80.

…cope

…metaclass

YannickJadoul · 2020-10-15T22:40:32Z

@wjakob, @rwgk, @EricCousineau-TRI, @henryiii, I moved the cleanup function to tp_dealloc in the metaclass.

Please have a look and compare this PR with and without this last commit ae96b37, to compare the two approaches.

YannickJadoul · 2020-10-15T22:42:27Z

@rwgk I fixed the detail::type_info *tinfo leak you found, in both versions. I've done it in a slightly different way as originally suggested, but I think this is more elegant.

henryiii · 2020-10-16T02:02:01Z

I take it the second approach breaks custom meta classes that don't also call this?

YannickJadoul · 2020-10-16T08:52:00Z

I take it the second approach breaks custom meta classes that don't also call this?

Yes. Then again, they'd only be as broken as the current situation (which has been there for a long time, if not forever, as far as I know?).

bstaletic · 2020-10-16T09:14:55Z

I was going to ask if custom metaclasses ever worked in combination with pybind11.

YannickJadoul · 2020-10-16T09:50:38Z

I was going to ask if custom metaclasses ever worked in combination with pybind11.

Kind of (there is at least the py::metaclass tag), but anything that's defined in our pybind11-metaclass is of course not there by default (as far as I know, these are things like static properties, the extra check of __init__ having been called, ...)

rwgk

This PR passed the Google-global testing.

henryiii · 2020-10-16T14:03:46Z

Then again, they'd only be as broken as the current situation

Okay, either approach is fine with me then.

wjakob · 2020-10-16T14:22:13Z

Beautiful, thank you for the great work @YannickJadoul. I think that we can likely install a similar callback also for custom metaclasses, but this a much less important point that is not a blocker for v.2.6.0 (this kind issue revolving around garbage collection of pybind11-created types is rare enough in practice that we are just bothered by it now, and custom metaclasses are a super-rarely used feature of pybind11). Pr{A ∩ B}≈ Pr{A}·Pr{B}=tiny number, I hope 🙂.

YannickJadoul added the bug label Oct 8, 2020

YannickJadoul force-pushed the factory-invalid-self-argument branch 4 times, most recently from 3752dd6 to d8eb801 Compare October 9, 2020 18:03

YannickJadoul changed the title ~~Demonstrate test_factory_constructors.py failure without functional changes from #2335~~ Fix test_factory_constructors.py failure triggerd by test_register_duplicate_class Oct 9, 2020

YannickJadoul self-assigned this Oct 9, 2020

YannickJadoul changed the title ~~Fix test_factory_constructors.py failure triggerd by test_register_duplicate_class~~ Fix test_factory_constructors.py failure triggered by test_register_duplicate_class Oct 9, 2020

YannickJadoul assigned wjakob Oct 9, 2020

henryiii changed the title ~~Fix test_factory_constructors.py failure triggered by test_register_duplicate_class~~ fix: test_factory_constructors.py failure triggered by test_register_duplicate_class Oct 9, 2020

YannickJadoul mentioned this pull request Oct 11, 2020

Race condition in test_python_to_cpp_to_python_from_process test on 3.9? #2391

Open

henryiii added this to the v2.6.0 milestone Oct 12, 2020

EricCousineau-TRI requested review from EricCousineau-TRI and removed request for EricCousineau-TRI October 12, 2020 19:09

EricCousineau-TRI approved these changes Oct 13, 2020

View reviewed changes

include/pybind11/pybind11.h Outdated Show resolved Hide resolved

include/pybind11/pybind11.h Outdated Show resolved Hide resolved

YannickJadoul mentioned this pull request Oct 14, 2020

Cleanup internals for garbage-collected py::class_ objects #2595

Open

YannickJadoul added 6 commits October 15, 2020 23:50

Demonstrate test_factory_constructors.py failure without functional c…

0e308fe

…hanges from pybind#2335

Revert "Demonstrate test_factory_constructors.py failure without func…

2d8a70c

…tional changes from pybind#2335" This reverts commit ca33a80.

Fix test crash where registered Python type gets garbage collected

55a9914

Clean up some more internal structures when class objects go out of s…

704d6a5

…cope

Reduce length of std::erase_if-in-C++20 comment

35ce747

Clean up code for cleaning up type internals

51978a8

YannickJadoul force-pushed the factory-invalid-self-argument branch from cfada0c to 51978a8 Compare October 15, 2020 22:03

Move cleaning up of type info in internals to tp_dealloc on pybind11_…

ae96b37

…metaclass

YannickJadoul force-pushed the factory-invalid-self-argument branch from 2b8a3f4 to ae96b37 Compare October 15, 2020 22:38

rwgk approved these changes Oct 16, 2020

View reviewed changes

henryiii merged commit 6364b73 into pybind:master Oct 16, 2020

YannickJadoul deleted the factory-invalid-self-argument branch October 17, 2020 14:06

This was referenced Jan 22, 2021

Update 'inactive_override_cache' key #2772

Closed

get_type_overload may encounter false cache hits if derived instances are GC'd #1922

Closed

Child Class Method Not Being Called as Expected, Base Class Method Called Instead #1552

Closed

BetsyMcPhail mentioned this pull request Mar 2, 2021

geometry_test:test_unimplemented_rendering fails sporadically on Mac RobotLocomotion/drake#14686

Closed

rwgk mentioned this pull request Feb 10, 2023

FWD pybind11 google/pybind11clif#2564

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: test_factory_constructors.py failure triggered by test_register_duplicate_class #2564

fix: test_factory_constructors.py failure triggered by test_register_duplicate_class #2564

YannickJadoul commented Oct 8, 2020 •

edited

Loading

henryiii commented Oct 8, 2020 •

edited

Loading

YannickJadoul commented Oct 8, 2020

henryiii commented Oct 8, 2020

wjakob commented Oct 12, 2020

YannickJadoul commented Oct 12, 2020 •

edited

Loading

YannickJadoul commented Oct 12, 2020

henryiii commented Oct 12, 2020

YannickJadoul commented Oct 12, 2020

YannickJadoul commented Oct 12, 2020 •

edited by EricCousineau-TRI

Loading

YannickJadoul commented Oct 12, 2020

henryiii commented Oct 13, 2020

EricCousineau-TRI commented Oct 13, 2020 •

edited

Loading

EricCousineau-TRI left a comment

YannickJadoul commented Oct 13, 2020

wjakob commented Oct 15, 2020

YannickJadoul commented Oct 15, 2020 •

edited

Loading

wjakob commented Oct 15, 2020

YannickJadoul commented Oct 15, 2020

YannickJadoul commented Oct 15, 2020

YannickJadoul commented Oct 15, 2020

YannickJadoul commented Oct 15, 2020

YannickJadoul commented Oct 15, 2020

henryiii commented Oct 16, 2020

YannickJadoul commented Oct 16, 2020

bstaletic commented Oct 16, 2020

YannickJadoul commented Oct 16, 2020

rwgk left a comment

henryiii commented Oct 16, 2020

wjakob commented Oct 16, 2020

fix: test_factory_constructors.py failure triggered by test_register_duplicate_class #2564

fix: test_factory_constructors.py failure triggered by test_register_duplicate_class #2564

Conversation

YannickJadoul commented Oct 8, 2020 • edited Loading

henryiii commented Oct 8, 2020 • edited Loading

YannickJadoul commented Oct 8, 2020

henryiii commented Oct 8, 2020

wjakob commented Oct 12, 2020

YannickJadoul commented Oct 12, 2020 • edited Loading

YannickJadoul commented Oct 12, 2020

henryiii commented Oct 12, 2020

YannickJadoul commented Oct 12, 2020

YannickJadoul commented Oct 12, 2020 • edited by EricCousineau-TRI Loading

YannickJadoul commented Oct 12, 2020

henryiii commented Oct 13, 2020

EricCousineau-TRI commented Oct 13, 2020 • edited Loading

EricCousineau-TRI left a comment

Choose a reason for hiding this comment

YannickJadoul commented Oct 13, 2020

wjakob commented Oct 15, 2020

YannickJadoul commented Oct 15, 2020 • edited Loading

wjakob commented Oct 15, 2020

YannickJadoul commented Oct 15, 2020

YannickJadoul commented Oct 15, 2020

YannickJadoul commented Oct 15, 2020

YannickJadoul commented Oct 15, 2020

YannickJadoul commented Oct 15, 2020

henryiii commented Oct 16, 2020

YannickJadoul commented Oct 16, 2020

bstaletic commented Oct 16, 2020

YannickJadoul commented Oct 16, 2020

rwgk left a comment

Choose a reason for hiding this comment

henryiii commented Oct 16, 2020

wjakob commented Oct 16, 2020

YannickJadoul commented Oct 8, 2020 •

edited

Loading

henryiii commented Oct 8, 2020 •

edited

Loading

YannickJadoul commented Oct 12, 2020 •

edited

Loading

YannickJadoul commented Oct 12, 2020 •

edited by EricCousineau-TRI

Loading

EricCousineau-TRI commented Oct 13, 2020 •

edited

Loading

YannickJadoul commented Oct 15, 2020 •

edited

Loading