Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MRO: Behavior change from 3.10 to 3.11: multiple inheritance issue with C-API (pybind11) #92678

Closed
Skylion007 opened this issue May 11, 2022 · 64 comments
Assignees
Labels
3.11 only security fixes type-feature A feature request or enhancement

Comments

@Skylion007
Copy link

Skylion007 commented May 11, 2022

Bug report

  • We have started testing the 3.11 beta branch in pybind11 and have found an issue where a class constructed using the C-API that inherits from two base classes that have different tpget_set implementations. Interestingly, this issue occurs when the first base does not have dynamic attributes enabled, but another base class does. I tried making the child class also have dynamic attributes (and therefore a larger size to store the dict etc.), but I still get a PyTypeReady failed. We've been encountering this issue since alpha 3, but I have not found any issues on BPO of other people having similar issues. I am wondering what changed and how we can fix our API usage to allow for full support of Python 3.11 as it enters beta.

  • I suspect it's something very subtle with how we are constructing our Python types, but there was nothing in the 3.11 migration guide that flags this issue. Any thoughts on how to fix issue? Is this known behavior or a bug? Or is it something that should be added to the migration guide?

  • @vstinner I know you are very familiar with the C-API and helped us deal with some of the other API changes, any thoughts?

Here is the failing test: pybind/pybind11#3923

  • Your environment
  • CPython versions tested on: 3.11
  • Operating system and architecture: Ubuntu-latest
@Skylion007 Skylion007 added the type-bug An unexpected behavior, bug, or error label May 11, 2022
@JelleZijlstra
Copy link
Member

Do you have a minimal example that reproduces the issue, or failing that an example of a piece of pybind11 code demonstrating the problem?

You say that the issue first appeared in 3.11a3. Are you able to bisect further to a specific commit?

@Skylion007
Copy link
Author

I have looked through the commit list, but was unable to find any suspicious commits. @henryiii @rwgk Any thoughts on which commit might breaking out test?

@Skylion007
Copy link
Author

@JelleZijlstra The failing test can be found here: pybind/pybind11#3923

@vstinner
Copy link
Member

PyTypeReady failed

Do you get an exception? What is the exception?

@vstinner
Copy link
Member

vstinner commented May 11, 2022

@JelleZijlstra The failing test can be found here: pybind/pybind11#3923

Ah, the "Upstream / 🐍 3.11 dev • ubuntu-latest • x64 (pull_request)" failed with:

ImportError while loading conftest '/home/runner/work/pybind11/pybind11/tests/conftest.py'.
conftest.py:16: in <module>
    import pybind11_tests  # noqa: F401
E   ImportError: VanillaDictMix1: PyType_Ready failed (TypeError: mro() returned base with unsuitable layout ('pybind11_tests.multiple_inheritance.WithDict'))!

This error message comes from the mro_check() function of typeobject.c:

        if (!PyType_IsSubtype(solid, solid_base(base))) {
            PyErr_Format(
                PyExc_TypeError,
                "mro() returned base with unsuitable layout ('%.500s')",
                base->tp_name);
            return -1;
        }

with:

static PyTypeObject *
solid_base(PyTypeObject *type)
{
    PyTypeObject *base;

    if (type->tp_base)
        base = solid_base(type->tp_base);
    else
        base = &PyBaseObject_Type;
    if (extra_ivars(type, base))
        return type;
    else
        return base;
}

@vstinner
Copy link
Member

extra_ivars() used by solid_base() is different in Python 3.11, the following code path was removed in 3.11:

    if (type->tp_dictoffset && base->tp_dictoffset == 0 &&                      
        type->tp_dictoffset + sizeof(PyObject *) == t_size &&    
        type->tp_flags & Py_TPFLAGS_HEAPTYPE)
        t_size -= sizeof(PyObject *);

Python 3.11 changes:

I don't understand well the purpose of the extra_ivars(type, base) function.

@vstinner
Copy link
Member

this issue occurs when the first base does not have dynamic attributes enabled, but another base class does.

pybind11 code:


/// Give instances of this type a `__dict__` and opt into garbage collection.
inline void enable_dynamic_attributes(PyHeapTypeObject *heap_type) {
    auto *type = &heap_type->ht_type;
    type->tp_flags |= Py_TPFLAGS_HAVE_GC;
    type->tp_dictoffset = type->tp_basicsize;           // place dict at the end
    type->tp_basicsize += (ssize_t) sizeof(PyObject *); // and allocate enough space for it
    type->tp_traverse = pybind11_traverse;
    type->tp_clear = pybind11_clear;

    static PyGetSetDef getset[] = {
        {const_cast<char *>("__dict__"), pybind11_get_dict, pybind11_set_dict, nullptr, nullptr},
        {nullptr, nullptr, nullptr, nullptr, nullptr}};
    type->tp_getset = getset;
}

It sets type->tp_dictoffset: it may change extra_ivars() result.

@Skylion007
Copy link
Author

@vstinner Hmm, any idea how to ensure the ivars is happy then?

@vstinner
Copy link
Member

@vstinner Hmm, any idea how to ensure the ivars is happy then?

I have no idea. It would help to have a simpler reproducer than pybind11 test suite.

@Skylion007
Copy link
Author

@JelleZijlstra I have a failing portion of PyBind11 test suite now. Any ideas how to debug this further?

@JelleZijlstra
Copy link
Member

Sorry, I am not familiar with this area of the code so I don't think I can be of more help than Victor. As Victor said, it would be helpful to have a self-contained example that doesn't rely on pybind11.

Also, would be good to know how big the impact is on pybind11: is this something a lot of users are likely to run into, or just an obscure edge case that your test suite happens to cover?

@henryiii
Copy link
Contributor

henryiii commented May 16, 2022

Here's a pybind11 reproducer:

#include <pybind11/pybind11.h>

namespace py = pybind11;

struct Vanilla {};

struct WithDict {};
struct VanillaDictMix2 : WithDict, Vanilla {};
struct VanillaDictMix1 : Vanilla, WithDict {};

PYBIND11_MODULE(example, m) {
    py::class_<Vanilla>(m, "Vanilla").def(py::init<>());
    py::class_<WithDict>(m, "WithDict", py::dynamic_attr()).def(py::init<>());
    py::class_<VanillaDictMix2, WithDict, Vanilla>(m, "VanillaDictMix2").def(py::init<>()); // OK
    py::class_<VanillaDictMix1, Vanilla, WithDict>(m, "VanillaDictMix1").def(py::init<>()); // Broken in 3.11
}

(Names chosen to match the test suite, which is why 2 comes before 1 - 2 works, 1 doesn't)

Compile and run (fish syntax):

$ clang++ -Wall -shared -std=c++14 -undefined dynamic_lookup (pipx run --python=~/.pyenv/versions/3.11.0b1/bin/python pybind11 --includes | string split " ") -g example.cpp -o example(~/.pyenv/versions/3.11.0b1/bin/python3-config --extension-suffix)
$ ~/.pyenv/versions/3.11.0b1/bin/python -c "import example"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: VanillaDictMix1: PyType_Ready failed (TypeError: mro() returned base with unsuitable layout ('example.WithDict'))!

Note that WithDict has a __dict__, and Vanilla does not. Both DictMix's should.

@vstinner
Copy link
Member

@markshannon: You made the two commits changing extra_ivars(). Do you have any idea why Python 3.11 behaves differently? Is it a deliberate choice?

@markshannon markshannon added type-feature A feature request or enhancement and removed type-bug An unexpected behavior, bug, or error labels May 17, 2022
@markshannon
Copy link
Member

Strictly speaking, this is a pybind11 bug, not a CPython bug. Happy to help, though 🙂

First all, why is pybind11 doing its own class creation instead of calling the C-API?
If this answer is "there is no C-API for this", then we should fix that.

Regarding extra_ivars, extra_ivars in 3.10 makes space for __dict__ and __weakref__ slots at the end of the object.
In 3.11, the __dict__ slot is implicitly stored before the object, so extra_ivars only makes space for the __weakref__ slot.

If you want to add a __dict__ slot to a class, rather than allocating space at the end, you should set the flag type->tp_flags |= Py_TPFLAGS_MANAGED_DICT; instead.

In 3.12 we might do something similar with the __weakref__ slot as well, so extra_ivars might be removed.

Hope that helps.

@markshannon
Copy link
Member

@Skylion007 does that help?

@henryiii
Copy link
Contributor

henryiii commented May 25, 2022

I tried this locally and it seems to work (it passes the test above, anyway)! We are a little stuck in CI because PySys_SetArgvEx (which is part of the Stable ABI?) was just deprecated and we need to find a way to support Python 3.8+ embedded interpreters without killing 3.6 & 3.7 support. Working on that. Almost working except for not finding local modules and segfaulting. It makes it through the normal (non-embedded) tests, though, so I think the issue above is resolved.

this is a pybind11 bug,

Py_TPFLAGS_MANAGED_DICT was added in 3.11 as far as I can tell, so I'm not sure it can be called a bug in pybind11 unless there was a way to do this in 3.6+ or even 3.10 without resorting to the method we used. ;)

@Fidget-Spinner
Copy link
Member

Fidget-Spinner commented Jul 21, 2022

I got this same error while trying to port mypyc to 3.11. See python/mypy#13206.

# cat rep.py
from typing import Protocol

class Proto(Protocol):
    pass

class A:
    pass

class B(A, Proto):
    pass
# mypyc rep.py
>>> import rep
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "rep.py", line 9, in <module>
    class B(A, Proto):
  File "/home/ken/Documents/GitHub/cpython/Lib/abc.py", line 106, in __new__
    cls = super().__new__(mcls, name, bases, namespace, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: multiple bases have instance lay-out conflict
>>>

Snippets of the generated C API code:

/// Proto Class C API code

PyMemberDef Proto_members[] = {
    {"__dict__", T_OBJECT_EX, sizeof(PyObject), 0, NULL},
    {"__weakref__", T_OBJECT_EX, sizeof(PyObject) + sizeof(PyObject *), 0, NULL},
    {0}
};

...

static PyTypeObject CPyType_Proto_template_ = {
    PyVarObject_HEAD_INIT(NULL, 0)
    .tp_name = "Proto",
    .tp_new = Proto_new,
    .tp_getset = Proto_getseters,
    .tp_methods = Proto_methods,
    .tp_members = Proto_members,
    .tp_basicsize = sizeof(PyObject) + 2*sizeof(PyObject *),
    .tp_dictoffset = sizeof(PyObject),
    .tp_weaklistoffset = sizeof(PyObject) + sizeof(PyObject *),
    .tp_flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_HEAPTYPE | Py_TPFLAGS_BASETYPE,
};
...

/// A is just a normal Python type without __dict__

/// B Class in C API
PyMemberDef B_members[] = {
    {"__dict__", T_OBJECT_EX, sizeof(rep___BObject), 0, NULL},
    {"__weakref__", T_OBJECT_EX, sizeof(rep___BObject) + sizeof(PyObject *), 0, NULL},
    {0}
};

...

static int
B_traverse(rep___BObject *self, visitproc visit, void *arg)
{
    Py_VISIT(*((PyObject **)((char *)self + sizeof(rep___BObject)))); // __dict__
    Py_VISIT(*((PyObject **)((char *)self + sizeof(PyObject *) + sizeof(rep___BObject)))); // __weakref__
    return 0;
}

static int
B_clear(rep___BObject *self)
{
    Py_CLEAR(*((PyObject **)((char *)self + sizeof(rep___BObject)))); // __dict__
    Py_CLEAR(*((PyObject **)((char *)self + sizeof(PyObject *) + sizeof(rep___BObject)))); // __weakref__
    return 0;
}

Is mypyc doing something wrong by setting __dict__ manually at the end? I tried using Py_TPFLAGS_MANAGED_DICT but it can't be GC-ed safely.

@Skylion007
Copy link
Author

@Fidget-Spinner Your B_traverse function needs to also visit the type of the self object after 3.9. Doubt that's the problem though. https://docs.python.org/3/c-api/typeobj.html#c.PyTypeObject.tp_traverse

@Fidget-Spinner
Copy link
Member

Fidget-Spinner commented Jul 21, 2022

@Skylion007 thanks for the catch. I didn't notice that.

I see that Pybind11 uses Py_TPFLAGS_MANAGED_DICT . Do your classes get properly GCed? Unless your classes are constructed from Python's type or types.new_class, they won't implement the proper traversal functions for __dict__ (unless you used private functions from CPython).

In mypyc, we manually called Py_VISIT on the __dict__ pre-3.11. The problem now is that the old manually set __dict__ isn't working with CPython and the new auto created __dict__ (via Py_TPFLAGS_MANAGED_DICT) can't be visited by the GC in the usual fashion (again unless we use private CPython functions).

@Fidget-Spinner
Copy link
Member

Fidget-Spinner commented Jul 25, 2022

@pablogsal can I set this as deferred blocker please? This issue affects breaks mypyc and (previously) pybind11.
(see mypyc/mypyc#943)

I have a fix at #95242.

@pablogsal
Copy link
Member

Yeah, go ahead 👍

Bear in mind that if we mark it as deferred blocker there will be no beta to test this if the fix goes after today's release.

@pablogsal
Copy link
Member

pablogsal commented Jul 25, 2022

Also, seems that @markshannon disagrees that this is a CPython bug. Please check with him before merging anything.

I will try to review the issue and the bug more carefully soon, but today is going to be complicated as I have to do the last beta release.

@Fidget-Spinner
Copy link
Member

Fidget-Spinner commented Jul 25, 2022

Yes it's not a CPython bug in 3.11 per-se. It relies on certain assumptions (see #95242 (comment)). It can also be solved by setting Py_TPFLAGS_MANAGED_DICT. The problem is that this solution in 3.11 isn't viable. In 3.11, the proper GC functions to clean up the managed __dict__ aren't available (they are hidden in private API). So the C extension class' managed __dict__ can never be GC-ed. These private GC functions are only properly set if we go through class creation by calling type. However, C extension classes will almost never do this.

I had a working fix for mypyc which set the flag properly, but it had no GC traversal for the __dict__ which isn't good.

The fix is a compatibility hack, and it should just stay in place till we get better class creation API in 3.12.

@markshannon
Copy link
Member

markshannon commented Jul 25, 2022

Rather than modify the (difficult to understand) internals of typeobject.c, wouldn't we be better exposing functions to visit and clear the managed dictionary? It seems like a safer alternative.

extern int PyObject_VisitManagedDict(PyObject *self, visitproc visit, void *arg);
extern void PyObject_ClearManagedDict(PyObject *self);

markshannon added a commit to faster-cpython/cpython that referenced this issue Aug 3, 2022
* Add test for inheriting explicit __dict__ and weakref.

* Restore 3.10 behavior for multiple inheritance of C extension classes that store their dictionary at the end of the struct.
@pablogsal
Copy link
Member

@Skylion007 could you confirm that #95596 is enough for pybind11 to work with 3.11 before the better API is provided in 3.12?

markshannon added a commit that referenced this issue Aug 4, 2022
* Add test for inheriting explicit __dict__ and weakref.

* Restore 3.10 behavior for multiple inheritance of C extension classes that store their dictionary at the end of the struct.
@rwgk
Copy link

rwgk commented Aug 4, 2022

@Skylion007 could you confirm that #95596 is enough for pybind11 to work with 3.11 before the better API is provided in 3.12?

We have a GHA workflow that uses https://github.com/actions/setup-python, but that's only pulling the current beta:

I searched but didn't get lucky: is there a GHA recipe for git cloning cpython from a branch (3.11), configure, make install?

I'll try that interactively for now.

@pablogsal
Copy link
Member

I searched but didn't get lucky: is there a GHA recipe for git cloning cpython from a branch (3.11), configure, make install?

I'll try that interactively for now.

Given that we merged this today I don't think there will be anything available, unfortunately.

As we are going to release the RC tomorrow, it would be great if you could validate this for us today even if its done manually :)

@rwgk
Copy link

rwgk commented Aug 4, 2022

@Skylion007 could you confirm that #95596 is enough for pybind11 to work with 3.11 before the better API is provided in 3.12?

TL;DR: Yes, we're good (pybind11).

Details:

I built Python 3.11 from scratch (configure; make install) for two versions:

  • "latest" = 5ac3d0f (happened to be HEAD of 3.11 when I ran git pull)
  • "back" = d8df7e0 (just before GH-92678: Fix tp_dictoffset inheritance. (GH-95596) (GH-95604))

I also backed out the Py_TPFLAGS_MANAGED_DICT changes from pybind11 (based on pybind11 master @ pybind/pybind11@ba5ccd8), diff below.

With that:

  • "latest": all unit pybind11 unit tests pass.
  • "back": the PyType_Ready error is back, see below.

For completeness:

  • WithOUT the diff below, both "latest" and "back" pass the pybind11 unit tests.
  • I'm seeing a test_embed ModuleNotFoundError for both "latest" and "back", with and without the diff below, but that's clearly unrelated to this MRO issue. Need to drill down.
diff --git a/include/pybind11/attr.h b/include/pybind11/attr.h
index db7cd8ef..65e223a3 100644
--- a/include/pybind11/attr.h
+++ b/include/pybind11/attr.h
@@ -345,11 +345,7 @@ struct type_record {

         bases.append((PyObject *) base_info->type);

-#if PY_VERSION_HEX < 0x030B0000
         dynamic_attr |= base_info->type->tp_dictoffset != 0;
-#else
-        dynamic_attr |= (base_info->type->tp_flags & Py_TPFLAGS_MANAGED_DICT) != 0;
-#endif

         if (caster) {
             base_info->implicit_casts.emplace_back(type, caster);
diff --git a/include/pybind11/detail/class.h b/include/pybind11/detail/class.h
index 42720f84..3db8429b 100644
--- a/include/pybind11/detail/class.h
+++ b/include/pybind11/detail/class.h
@@ -549,12 +549,8 @@ extern "C" inline int pybind11_clear(PyObject *self) {
 inline void enable_dynamic_attributes(PyHeapTypeObject *heap_type) {
     auto *type = &heap_type->ht_type;
     type->tp_flags |= Py_TPFLAGS_HAVE_GC;
-#if PY_VERSION_HEX < 0x030B0000
     type->tp_dictoffset = type->tp_basicsize;           // place dict at the end
     type->tp_basicsize += (ssize_t) sizeof(PyObject *); // and allocate enough space for it
-#else
-    type->tp_flags |= Py_TPFLAGS_MANAGED_DICT;
-#endif
     type->tp_traverse = pybind11_traverse;
     type->tp_clear = pybind11_clear;

With "back" version of 3.11:

Running tests in directory "/usr/local/google/home/rwgk/forked/pybind11/tests":
ImportError while loading conftest '/usr/local/google/home/rwgk/forked/pybind11/tests/conftest.py'.
conftest.py:16: in <module>
    import pybind11_tests
E   ImportError: VanillaDictMix1: PyType_Ready failed: TypeError: mro() returned base with unsuitable layout ('pybind11_tests.multiple_inheritance.WithDict')

@pablogsal
Copy link
Member

Thanks a lot for checking!

I'm closing this issue, let's reopen if we discover that we missed anything

@vstinner
Copy link
Member

vstinner commented Aug 5, 2022

"latest": all unit pybind11 unit tests pass.

Yeah! That's cool! Thanks for fixing pybind11 ;-)

@rwgk
Copy link

rwgk commented Aug 5, 2022

"latest": all unit pybind11 unit tests pass.

Yeah! That's cool! Thanks for fixing pybind11 ;-)

To clarify (especially for people looking here later):

rwgk pushed a commit to rwgk/pybind11 that referenced this issue Aug 5, 2022
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Aug 9, 2022
…ary offset calculations. (pythonGH-95598)

Co-authored-by: Petr Viktorin <encukou@gmail.com>
Co-authored-by: Stanley <46876382+slateny@users.noreply.github.com>
(cherry picked from commit 8d37c62)

Co-authored-by: Mark Shannon <mark@hotpy.org>
markshannon added a commit that referenced this issue Aug 9, 2022
…fset calculations. (GH-95598)

Co-authored-by: Petr Viktorin <encukou@gmail.com>
Co-authored-by: Stanley <46876382+slateny@users.noreply.github.com>
markshannon added a commit that referenced this issue Aug 9, 2022
…fset calculations. (GH-95598) (GH-95821)

Co-authored-by: Petr Viktorin <encukou@gmail.com>
Co-authored-by: Stanley <46876382+slateny@users.noreply.github.com>
Co-authored-by: Mark Shannon <mark@hotpy.org>
iritkatriel pushed a commit to iritkatriel/cpython that referenced this issue Aug 11, 2022
…ary offset calculations. (pythonGH-95598)

Co-authored-by: Petr Viktorin <encukou@gmail.com>
Co-authored-by: Stanley <46876382+slateny@users.noreply.github.com>
@vstinner
Copy link
Member

See also issue #96046 which might be related.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.11 only security fixes type-feature A feature request or enhancement
Projects
Development

No branches or pull requests