C API: Investigate how the PyTypeObject members can be removed from the public C API #105970

Closed
vstinner opened this issue Jun 21, 2023 · 8 comments
Labels
topic-C-API type-bug An unexpected behavior, bug, or error

Comments

@vstinner
Member

I propose to investigate an incompatible C API change: make the PyTypeObject and PyHeapTypeObject structures opaque and remove their members from the public C API (move them to the internal C API). We have to investigate how these members are used in 3rd party C extensions (ex: Cython, pybind11, etc.), and design a smooth migration plan.

The PyTypeObject structure is exposed as part of the public Python C API. For example, Py_TYPE(obj)->tp_name directly gets a type name (as a UTF-8 encoded byte string, char*).

The PyTypeObject members are NOT part of the limited C API (PEP 384).


In Python 3.9 (2020), I reworked the C API to avoid directly accessing PyTypeObject members at the ABI level: issue #84351. For example, Python 3.8 implements PyObject_IS_GC() as a macro:

/* Test if an object has a GC head */
#define PyObject_IS_GC(o) \
    (PyType_IS_GC(Py_TYPE(o)) \
     && (Py_TYPE(o)->tp_is_gc == NULL || Py_TYPE(o)->tp_is_gc(o)))

whereas Python 3.9 only provides an opaque function call:

/* Test if an object implements the garbage collector protocol */
PyAPI_FUNC(int) PyObject_IS_GC(PyObject *obj);

At the ABI level, the direct access to the PyTypeObject.tp_is_gc member became an opaque function call.
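
To make the macro-versus-function difference concrete, here is a minimal sketch in plain C. It does not use Python.h: toy_type, toy_object, TOY_IS_GC_MACRO and toy_is_gc are hypothetical stand-ins for PyTypeObject, PyObject and the two flavors of PyObject_IS_GC(). The macro is expanded inside the caller, so the caller's compiled code depends on the struct layout; the function keeps that knowledge on the library side.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for PyObject / PyTypeObject (not the real layout). */
typedef struct toy_object toy_object;

typedef struct toy_type {
    int have_gc;                     /* models the Py_TPFLAGS_HAVE_GC flag */
    int (*is_gc)(toy_object *);      /* models tp_is_gc, may be NULL */
} toy_type;

struct toy_object {
    toy_type *type;
};

/* Python 3.8 style: the member accesses are compiled into every caller,
 * so the struct layout becomes part of the ABI. */
#define TOY_IS_GC_MACRO(o) \
    ((o)->type->have_gc \
     && ((o)->type->is_gc == NULL || (o)->type->is_gc(o)))

/* Python 3.9 style: callers only see this prototype; the member accesses
 * stay inside the library and the layout can change freely. */
int toy_is_gc(toy_object *o)
{
    assert(o != NULL && o->type != NULL);
    toy_type *t = o->type;
    return t->have_gc && (t->is_gc == NULL || t->is_gc(o));
}

/* A type that has the GC flag but opts this instance out. */
static int never_tracked(toy_object *o) { (void)o; return 0; }

/* Returns 1 if the macro and the function agree on three cases. */
int toy_gc_demo(void)
{
    toy_type no_gc = {0, NULL};
    toy_type gc = {1, NULL};
    toy_type gc_opt_out = {1, never_tracked};
    toy_object a = {&no_gc}, b = {&gc}, c = {&gc_opt_out};

    return TOY_IS_GC_MACRO(&a) == 0 && toy_is_gc(&a) == 0
        && TOY_IS_GC_MACRO(&b) == 1 && toy_is_gc(&b) == 1
        && TOY_IS_GC_MACRO(&c) == 0 && toy_is_gc(&c) == 0;
}
```

Both forms compute the same result; the only difference is where the knowledge of the struct layout lives.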


Changing the PyTypeObject API and ABI caused a lot of trouble in the past. Example:


For many years, there has been work in progress to convert all Python built-in types and types of stdlib extensions from static types to heap types. See for example issue #84258 and PEP 630.

The API and ABI for heap types were also enhanced over the years. Examples:


In the past, other structure members were removed:

  • PyInterpreterState: Python 3.8
  • PyGC_Head: Python 3.9
  • PyFrameObject: Python 3.11

The work was also prepared for:

@vstinner vstinner added the type-bug An unexpected behavior, bug, or error label Jun 21, 2023
@vstinner
Member Author

For the Py_TYPE(obj)->tp_name pattern, I proposed adding a %T format to PyUnicode_FromFormat(): issue GH-78776. See the related python-dev discussion. The change was reverted.

@ronaldoussoren
Contributor

ronaldoussoren commented Jun 22, 2023

One thing I don't know how to do with the spec-based API is adding additional C fields to a type without exposing them to Python. That is something I have a use-case for in PyObjC and can currently do by embedding a PyTypeObject (or rather a PyHeapTypeObject) in a larger struct. Storing the data out of line, for example in some kind of hash table, likely results in additional overhead in what is in the end fairly hot code.

I want to try moving PyObjC to the spec-based API during the 3.13 release cycle if I have enough free time, but that's a fairly big "if". The longer term goal with that is to support subinterpreters.

@vstinner
Member Author

@ronaldoussoren:

One thing I don't know how to do with the spec-based API is adding additional C fields to a type without exposing them to Python

Isn't it the purpose of PEP 697 – Limited C API for Extending Opaque Types? It's said to be fully implemented in Python 3.12.

cc @encukou
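
As a rough illustration of the idea behind PEP 697 (the real API is PyObject_GetTypeData() together with a negative basicsize in PyType_Spec), here is a plain C toy model that does not use Python.h. Every name in it (toy_base, toy_alloc, toy_get_extra_data, my_extra) is hypothetical. The extension never embeds the base struct; it asks the runtime for extra bytes and for a pointer to them, so the base layout can stay private while the data remains inline rather than in a hash table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* What the extension sees: an opaque handle. */
typedef struct toy_base toy_base;

/* Runtime side. In a real design this definition would live in a separate,
 * private translation unit; it is shown here so the file is self-contained. */
struct toy_base {
    long refcnt;
    const char *name;
};

/* Conservative alignment unit: the size of a union of wide scalar types. */
typedef union { long double ld; long long ll; void *p; } toy_max_align;

static size_t toy_align_up(size_t n)
{
    size_t a = sizeof(toy_max_align);
    return (n + a - 1) / a * a;
}

/* Allocate a base object followed by extra_size bytes of extension data. */
toy_base *toy_alloc(const char *name, size_t extra_size)
{
    toy_base *obj = calloc(1, toy_align_up(sizeof(struct toy_base)) + extra_size);
    if (obj != NULL) {
        obj->refcnt = 1;
        obj->name = name;
    }
    return obj;
}

/* Return a pointer to the extension data; the offset is computed by the
 * runtime, never hard-coded by the extension. */
void *toy_get_extra_data(toy_base *obj)
{
    assert(obj != NULL);
    return (char *)obj + toy_align_up(sizeof(struct toy_base));
}

/* Extension side: private C fields, not exposed anywhere else. */
typedef struct {
    int selector_count;
    double hot_value;
} my_extra;

/* Returns 1 if the extra fields can be written and read back, -1 on
 * allocation failure. */
int toy_extra_demo(void)
{
    toy_base *cls = toy_alloc("MyClass", sizeof(my_extra));
    if (cls == NULL) {
        return -1;
    }
    my_extra *extra = toy_get_extra_data(cls);
    extra->selector_count = 3;
    extra->hot_value = 1.5;
    int ok = (extra->selector_count == 3 && extra->hot_value == 1.5);
    free(cls);
    return ok;
}
```

The access is a single pointer addition, which is why this approach should stay cheap in hot code compared with an out-of-line lookup.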

@vstinner
Member Author

vstinner commented Jun 23, 2023

Changing the PyTypeObject and PyHeapTypeObject API is complex because they have 112 members in total.

In general, most members can be read with PyType_GetSlot(), and many can be read and set with PyObject_GetAttrString() and PyObject_SetAttrString(). Below, I list members that have more specialized ways to access them.

Attributes (read-only):

  • tp_base: get "__base__" attribute
  • tp_basicsize: _PyObject_SIZE()
  • tp_doc: get/set "__doc__" attribute
  • tp_flags: PyType_GetFlags(), PyType_HasFeature()
  • tp_itemsize: _PyObject_VAR_SIZE()
  • tp_vectorcall_offset: PyVectorcall_Function(), PyObject_Vectorcall()

Attributes:

  • tp_bases: get/set "__bases__" attribute, _PyType_GetBases()
  • tp_name: PyType_GetName(), PyType_GetQualName(), set "__name__" attribute

Internal functions for the GC:

  • tp_clear: used indirectly by Py_Finalize()
  • tp_traverse: call gc.get_objects() function

Internal attributes and methods:

  • tp_dealloc: _Py_Dealloc(), used by Py_DECREF()
  • tp_members
  • tp_mro (read-only): PyObject_GetAttrString(type, "__mro__")
  • tp_subclasses: call __subclasses__() method, _PyType_GetSubclasses()
  • tp_version_tag: method cache, PyUnstable_Type_AssignVersionTag(), _PyType_Lookup()
  • tp_watched: PyType_Watch(), PyType_Unwatch()
  • tp_weaklist: used by tp_weaklistoffset
  • tp_weaklistoffset: get "__weakrefoffset__" attribute
  • tp_vectorcall: used by tp_vectorcall_offset

Async methods:

  • tp_as_async.am_await: call __await__() method, _PyCoro_GetAwaitableIter()
  • tp_as_async.am_aiter: call __aiter__() method
  • tp_as_async.am_anext: call __anext__() method
  • tp_as_async.am_send: PyIter_Send()

Buffer methods:

  • tp_as_buffer.bf_getbuffer: PyObject_GetBuffer()
  • tp_as_buffer.bf_releasebuffer: PyBuffer_Release()

Mapping methods:

  • tp_as_mapping.mp_length: PyMapping_Size()
  • tp_as_mapping.mp_subscript: PyObject_GetItem()
  • tp_as_mapping.mp_ass_subscript: PyObject_SetItem()

Number methods:

  • tp_as_number.nb_add: PyNumber_Add()
  • tp_as_number.nb_subtract: PyNumber_Subtract()
  • tp_as_number.nb_multiply: PyNumber_Multiply()
  • tp_as_number.nb_remainder: PyNumber_Remainder()
  • tp_as_number.nb_divmod: PyNumber_Divmod()
  • tp_as_number.nb_power: PyNumber_Power()
  • tp_as_number.nb_negative: PyNumber_Negative()
  • tp_as_number.nb_positive: PyNumber_Positive()
  • tp_as_number.nb_absolute: PyNumber_Absolute()
  • tp_as_number.nb_bool: PyObject_IsTrue()
  • tp_as_number.nb_invert: PyNumber_Invert()
  • tp_as_number.nb_lshift: PyNumber_Lshift()
  • tp_as_number.nb_rshift: PyNumber_Rshift()
  • tp_as_number.nb_and: PyNumber_And()
  • tp_as_number.nb_xor: PyNumber_Xor()
  • tp_as_number.nb_or: PyNumber_Or()
  • tp_as_number.nb_int: PyNumber_Long()
  • tp_as_number.nb_float: PyNumber_Float()
  • tp_as_number.nb_inplace_add: PyNumber_InPlaceAdd()
  • tp_as_number.nb_inplace_subtract: PyNumber_InPlaceSubtract()
  • tp_as_number.nb_inplace_multiply: PyNumber_InPlaceMultiply()
  • tp_as_number.nb_inplace_remainder: PyNumber_InPlaceRemainder()
  • tp_as_number.nb_inplace_power: PyNumber_InPlacePower()
  • tp_as_number.nb_inplace_lshift: PyNumber_InPlaceLshift()
  • tp_as_number.nb_inplace_rshift: PyNumber_InPlaceRshift()
  • tp_as_number.nb_inplace_and: PyNumber_InPlaceAnd()
  • tp_as_number.nb_inplace_xor: PyNumber_InPlaceXor()
  • tp_as_number.nb_inplace_or: PyNumber_InPlaceOr()
  • tp_as_number.nb_floor_divide: PyNumber_FloorDivide()
  • tp_as_number.nb_true_divide: PyNumber_TrueDivide()
  • tp_as_number.nb_inplace_floor_divide: PyNumber_InPlaceFloorDivide()
  • tp_as_number.nb_inplace_true_divide: PyNumber_InPlaceTrueDivide()
  • tp_as_number.nb_index: PyNumber_Index()
  • tp_as_number.nb_matrix_multiply: PyNumber_MatrixMultiply()
  • tp_as_number.nb_inplace_matrix_multiply: PyNumber_InPlaceMatrixMultiply()

Sequence methods:

  • tp_as_sequence.sq_length: PySequence_Size()
  • tp_as_sequence.sq_concat: PySequence_Concat()
  • tp_as_sequence.sq_repeat: PySequence_Repeat()
  • tp_as_sequence.sq_item: PySequence_GetItem()
  • tp_as_sequence.sq_ass_item: PySequence_SetItem()
  • tp_as_sequence.sq_contains: PySequence_Contains()
  • tp_as_sequence.sq_inplace_concat: PySequence_InPlaceConcat()
  • tp_as_sequence.sq_inplace_repeat: PySequence_InPlaceRepeat()

Attribute lookup, get/set instance attributes:

  • tp_descr_get
  • tp_descr_set
  • tp_dict: get "__dict__" attribute, PyObject_GenericGetAttr(), PyObject_GenericSetDict(), _PyObject_GetDictPtr(). See also Python 3.10 Py_TPFLAGS_MANAGED_DICT.
  • tp_dictoffset: get "__dictoffset__" attribute
  • tp_getattr: PyObject_GetAttr()
  • tp_getattro: PyObject_GetAttr()
  • tp_setattr: PyObject_SetAttr()
  • tp_setattro: PyObject_SetAttr()
  • tp_getset
  • tp_methods

Methods and how to call them:

  • tp_alloc: PyType_GenericNew()
  • tp_call: PyObject_Call()
  • tp_del: call __del__() method (?), subtype_dealloc()
  • tp_finalize: call PyObject_CallFinalizer()
  • tp_free: get PyType_GetSlot(Py_tp_free)
  • tp_hash: PyObject_Hash()
  • tp_init: get PyType_GetSlot(Py_tp_init), call a type to create an instance, type_call()
  • tp_is_gc: PyObject_IS_GC()
  • tp_iter: PyObject_GetIter()
  • tp_iternext: PyIter_Next()
  • tp_new: get PyType_GetSlot(Py_tp_new), call a type to create an instance, type_call()
  • tp_repr: PyObject_Repr()
  • tp_richcompare: PyObject_RichCompare()
  • tp_str: PyObject_Str()

PyHeapTypeObject internal members:

  • _ht_tpname
  • _spec_cache
  • ht_cached_keys
  • ht_slots

PyHeapTypeObject members:

  • ht_type: ?
  • as_async: copy of PyTypeObject.tp_as_async
  • as_number: copy of PyTypeObject.tp_as_number
  • as_mapping: copy of PyTypeObject.tp_as_mapping
  • as_sequence: copy of PyTypeObject.tp_as_sequence
  • as_buffer: copy of PyTypeObject.tp_as_buffer
  • ht_name: PyType_GetName(), set "__name__" attribute
  • ht_qualname: PyType_GetQualName()
  • ht_module: PyType_GetModule(), PyType_GetModuleByDef()

Unused members:

  • tp_cache
  • tp_as_number.nb_reserved
  • tp_as_sequence.was_sq_slice
  • tp_as_sequence.was_sq_ass_slice

@vstinner
Member Author

Creating a static type PyTypeObject my_type = {...}; requires direct access to all PyTypeObject members. This API, like PyType_Ready(), should be deprecated in favor of heap types.

Creating a heap type can be done with the PyType_Spec structure and the PyType_FromSpec() function. This API doesn't access PyTypeObject or PyHeapTypeObject members on purpose. It helps ABI backward/forward compatibility: the stable ABI uses it, for example.
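
To show why a spec-based API does not leak the struct layout, here is a plain C toy model that does not use Python.h. toy_slot, toy_spec, toy_type_from_spec and toy_type_get_slot are hypothetical analogues of PyType_Slot, PyType_Spec, PyType_FromSpec() and PyType_GetSlot(); the slot IDs are invented for the example. The caller describes the type as a list of (slot ID, function) pairs and reads slots back by ID, so it never names a struct member.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Slot IDs are the only thing shared between caller and runtime. */
enum { TOY_SLOT_END = 0, TOY_SLOT_REPR = 1, TOY_SLOT_HASH = 2, TOY_SLOT_COUNT };

typedef void (*toy_func)(void);          /* generic function pointer */

typedef struct {                          /* models PyType_Slot */
    int slot;
    toy_func pfunc;
} toy_slot;

typedef struct {                          /* models PyType_Spec */
    const char *name;
    const toy_slot *slots;                /* terminated by TOY_SLOT_END */
} toy_spec;

typedef struct toy_type toy_type;         /* opaque to callers */

/* Runtime side: private layout, free to change between versions. */
struct toy_type {
    const char *name;
    toy_func slots[TOY_SLOT_COUNT];
};

toy_type *toy_type_from_spec(const toy_spec *spec)
{
    assert(spec != NULL && spec->slots != NULL);
    toy_type *t = calloc(1, sizeof(*t));
    if (t == NULL) {
        return NULL;
    }
    t->name = spec->name;
    for (const toy_slot *s = spec->slots; s->slot != TOY_SLOT_END; s++) {
        if (s->slot < 0 || s->slot >= TOY_SLOT_COUNT) {
            free(t);                      /* unknown slot ID: reject the spec */
            return NULL;
        }
        t->slots[s->slot] = s->pfunc;
    }
    return t;
}

/* Returns the stored function, or NULL if the slot is unset or unknown. */
toy_func toy_type_get_slot(const toy_type *t, int slot)
{
    if (t == NULL || slot <= 0 || slot >= TOY_SLOT_COUNT) {
        return NULL;
    }
    return t->slots[slot];
}

/* Caller side: one concrete slot implementation. */
typedef long (*toy_hashfunc)(const void *);
static long point_hash(const void *self) { (void)self; return 42; }

/* Builds a type from a spec and calls its hash slot through the getter. */
long toy_spec_demo(void)
{
    static const toy_slot slots[] = {
        {TOY_SLOT_HASH, (toy_func)point_hash},
        {TOY_SLOT_END, NULL},
    };
    toy_spec spec = {"Point", slots};
    toy_type *t = toy_type_from_spec(&spec);
    if (t == NULL) {
        return -1;
    }
    toy_hashfunc h = (toy_hashfunc)toy_type_get_slot(t, TOY_SLOT_HASH);
    long result = (h != NULL) ? h(NULL) : -1;
    int repr_unset = (toy_type_get_slot(t, TOY_SLOT_REPR) == NULL);
    free(t);
    return repr_unset ? result : -2;
}
```

In this model, adding a member to struct toy_type or reordering its fields requires no change and no recompilation on the caller side, as long as the slot IDs stay stable.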

The Python stdlib is a bad example: it still has many static types in Python 3.13 (issue #84258). Moreover, some static types may remain implemented as static types since they are exposed in the C API, such as &PyUnicode_Type (issue #84781).

What changed recently is that much data is now "static": generated at build time and constant. "Static" types are increasingly truly static: for example, it is no longer necessary to release their memory at Python exit (issue #103276).

@vstinner
Member Author

  • tp_dict: get "__dict__" attribute, PyObject_GenericGetAttr(), PyObject_GenericSetDict(), _PyObject_GetDictPtr(). See also Python 3.10 Py_TPFLAGS_MANAGED_DICT.
  • tp_dictoffset: get "__dictoffset__" attribute

PyType_GetDict() is excluded from the limited C API, and _PyObject_GetDictPtr() is part of the internal API.

@vstinner
Member Author

vstinner commented Nov 8, 2023

Using tp_name to format an error message is a very common code pattern. I created issue gh-111696 for that.
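
A minimal sketch of that pattern rewritten against an opaque type, in plain C without Python.h. toy_type, toy_type_get_name and toy_format_type_error are hypothetical names, not CPython API. The error-formatting code goes through an accessor instead of reading a name member directly.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct toy_type toy_type;         /* opaque to callers */

/* Runtime side: private layout (shown here so the file is self-contained). */
struct toy_type {
    const char *name;                     /* models tp_name */
};

/* Accessor that replaces direct reads of the name member. */
const char *toy_type_get_name(const toy_type *t)
{
    assert(t != NULL);
    return t->name;
}

/* Caller side: format a "wrong type" message without touching members.
 * Returns the value of snprintf(). */
int toy_format_type_error(char *buf, size_t size,
                          const char *expected, const toy_type *got)
{
    return snprintf(buf, size, "expected %s, got %s",
                    expected, toy_type_get_name(got));
}

/* Returns 1 if the formatted message matches the expected text. */
int toy_error_demo(void)
{
    toy_type list_type = {"list"};
    char msg[64];

    toy_format_type_error(msg, sizeof(msg), "str", &list_type);
    return strcmp(msg, "expected str, got list") == 0;
}
```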

@vstinner
Member Author

Let's revisit this once PEP 737 is accepted and implemented: formatting type names is one of the most common uses of the PyTypeObject members.
