Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PEP 630: add disclaimers re. heap types & conversion, add PyType_GetModuleByDef #2319

Merged
merged 2 commits into from
Feb 14, 2022
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
117 changes: 93 additions & 24 deletions pep-0630.rst
Original file line number Diff line number Diff line change
@@ -21,9 +21,9 @@ describes problems of such per-process state and efforts to make
per-module state, a better default, possible and easy to use.

The document also describes how to switch to per-module state where
possible. The switch involves allocating space for that state, switching
from static types to heap types, and—perhaps most importantly—accessing
per-module state from code.
possible. The switch involves allocating space for that state, potentially
switching from static types to heap types, and—perhaps most
importantly—accessing per-module state from code.

About this document
-------------------
@@ -40,9 +40,15 @@ development, gaps identified in this text can show where to focus
the effort, and the text can be updated as new features are implemented

Whenever this PEP mentions *extension modules*, the advice also
applies to *built-in* modules, such as the C parts of the standard
library. The standard library is expected to switch to per-module state
early.
applies to *built-in* modules.

.. note::
This PEP contains generic advice. When following it, always take into
account the specifics of your project.

For example, while much of this advice applies to the C parts of
Python's standard library, the PEP does not factor in stdlib specifics
(unusual backward compatibility issues, access to private API, etc.).

PEPs related to this effort are:

@@ -101,7 +107,7 @@ testing the isolation, multiple module objects corresponding to a single
extension can even be loaded in a single interpreter.

Per-module state provides an easy way to think about lifetime and
resource ownership: the extension module author will set up when a
resource ownership: the extension module will initialize when a
module object is created, and clean up when it's freed. In this regard,
a module is just like any other ``PyObject *``; there are no “on
interpreter shutdown” hooks to think about—or forget about.
@@ -235,7 +241,8 @@ Set ``PyModuleDef.m_size`` to a positive number to request that many
bytes of storage local to the module. Usually, this will be set to the
size of some module-specific ``struct``, which can store all of the
module's C-level state. In particular, it is where you should put
pointers to classes (including exceptions) and settings (e.g. ``csv``'s
pointers to classes (including exceptions, but excluding static types)
and settings (e.g. ``csv``'s
`field_size_limit <https://docs.python.org/3.8/library/csv.html#csv.field_size_limit>`__)
which the C code needs to function.

@@ -302,9 +309,9 @@ zero. In your own module, you're in control of ``m_size``, so this is
easy to prevent.)

Heap types
~~~~~~~~~~
----------

Traditionally, types defined in C code were *static*, that is,
Traditionally, types defined in C code are *static*, that is,
``static PyTypeObject`` structures defined directly in code and
initialized using ``PyType_Ready()``.

@@ -321,10 +328,27 @@ the Python level: for example, you can't set ``str.myattribute = 123``.
that shares any Python objects across interpreters implicitly depends
on CPython's current, process-wide GIL.

An alternative to static types is *heap-allocated types*, or heap types
Because they are immutable and process-global, static types cannot access
“their” module state.
If any method of such a type requires access to module state,
the type must be converted to a *heap-allocated type*, or *heap type*
for short. These correspond more closely to classes created by Python’s
``class`` statement.

For new modules, using heap types by default is a good rule of thumb.

Static types can be converted to heap types, but note that
the heap type API was not designed for “lossless” conversion
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Lossless" is a little vague IMO. Could you reword this with a clearer language?

from static types -- that is, creating a type that works exactly like a given
static type. Unlike static types, heap type objects are mutable by default.
Also, when rewriting the class definition in a new API,
you are likely to unintentionally change a few details (e.g. pickle-ability
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would reconsider this wording. Take care not to IMO sounds nicer than You are likely to.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, is the pickle issue still an issue? Wasn't that fixed by Serhiy long ago? The bpo issue that mentioned this issue is closed, marked as resolved.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pickle issue was closed, because there's now a mechanism to use if preserving pickle-ability is important for you. In the stdlib, it pretty much always is important, but some third-party libraries won't care.
Same for mutability.
OTOH, there are unexplored issues that stdlib doesn't run into but other libraries do, e.g. ones involving complex inheritance chains or metaclasses.

To keep the PEP general, I don't want to include the specific fixes here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To get back to the original comment, I think changing the behavior is a fact of life rather than something to avoid. At least right now.
(It's different in the stdlib.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OTOH, there are unexplored issues that stdlib doesn't run into but other libraries do, e.g. ones involving complex inheritance chains or metaclasses.

To keep the PEP general, I don't want to include the specific fixes here.

Sounds good to me.

or inherited slots). Always test the details that are important to you.


Defining Heap Types
~~~~~~~~~~~~~~~~~~~

Heap types can be created by filling a ``PyType_Spec`` structure, a
description or “blueprint” of a class, and calling
``PyType_FromModuleAndSpec()`` to construct a new class object.
@@ -370,7 +394,7 @@ is called on a *subclass* of your type, ``Py_TYPE(self)`` will refer to
that subclass, which may be defined in different module than yours.

.. note::
The following Python code. can illustrate the concept.
The following Python code can illustrate the concept.
``Base.get_defining_class`` returns ``Base`` even
if ``type(self) == Sub``::

@@ -424,6 +448,52 @@ For example::
{NULL},
}

Module State Access from Slot Methods, Getters and Setters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. note::

This is new in Python 3.11.

.. After adding to limited API:
encukou marked this conversation as resolved.
Show resolved Hide resolved

If you use the `limited API <https://docs.python.org/3/c-api/stable.html>__,
you must update ``Py_LIMITED_API`` to ``0x030b0000``, losing ABI
compatibility with earlier versions.

Slot methods -- the fast C equivalents for special methods, such as
`nb_add <https://docs.python.org/3/c-api/typeobj.html#c.PyNumberMethods.nb_add>`__
for ``__add__`` or `tp_new <https://docs.python.org/3/c-api/typeobj.html#c.PyTypeObject.tp_new>`__
for initialization -- have a very simple API that doesn't allow
passing in the defining class as in ``PyCMethod``.
The same goes for getters and setters defined with
`PyGetSetDef <https://docs.python.org/3/c-api/structures.html#c.PyGetSetDef>`__.

To access the module state in these cases, use the
`PyType_GetModuleByDef <https://docs.python.org/typeobj.html#c.PyType_GetModuleByDef>`__
function, and pass in the module definition.
Once you have the module, call `PyModule_GetState <https://docs.python.org/3/c-api/module.html?highlight=pymodule_getstate#c.PyModule_GetState>`__
to get the state::

PyObject *module = PyType_GetModuleByDef(Py_TYPE(self), &module_def);
my_struct *state = (my_struct*)PyModule_GetState(module);
if (state === NULL) {
return NULL;
}

``PyType_GetModuleByDef`` works by searching the `MRO <https://docs.python.org/3/glossary.html#term-method-resolution-order>`__
(i.e. all superclasses) for the first superclass that has a corresponding
module.

.. note::

In very exotic cases (inheritance chains spanning multiple modules
created from the same definition), ``PyType_GetModuleByDef`` might not
return the module of the true defining class. However, it will always
return a module with the same definition, ensuring a compatible
C memory layout.


Open Issues
-----------

@@ -432,28 +502,18 @@ Several issues around per-module state and heap types are still open.
Discussions about improving the situation are best held on the `capi-sig
mailing list <https://mail.python.org/mailman3/lists/capi-sig.python.org/>`__.

Module State Access from Slot Methods, Getters and Setters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Currently (as of Python 3.9), there is no API to access the module state
from:

- slot methods (meaning type slots, such as ``tp_new``, ``nb_add`` or
``tp_iternext``)
- getters and setters defined with ``tp_getset``

Type Checking
~~~~~~~~~~~~~

Currently (as of Python 3.9), heap types have no good API to write
Currently (as of Python 3.10), heap types have no good API to write
``Py*_Check`` functions (like ``PyUnicode_Check`` exists for ``str``, a
static type), and so it is not easy to ensure whether instances have a
particular C layout.

Metaclasses
~~~~~~~~~~~

Currently (as of Python 3.9), there is no good API to specify the
Currently (as of Python 3.10), there is no good API to specify the
*metaclass* of a heap type, that is, the ``ob_type`` field of the type
object.

@@ -466,6 +526,15 @@ its variable-size storage is currently consumed by slots. Fixing this
is complicated by the fact that several classes in an inheritance
hierarchy may need to reserve some state.

Lossless conversion to heap types
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The heap type API was not designed for “lossless” conversion from static types,
that is, creating a type that works exactly like a given static type.
The best way to address it would probably be to write a guide that covers
known “gotchas”.


Copyright
---------