_decimal: pi benchmark 31-45% slower compared to 3.9 #114682

Closed
skrah opened this issue Jan 28, 2024 · 46 comments
Labels
performance Performance or resource usage topic-subinterpreters type-bug An unexpected behavior, bug, or error

Comments

@skrah
Contributor

skrah commented Jan 28, 2024

Bug report

Bug description:

The pi benchmark in Modules/_decimal/tests/bench.py is up to 31% slower with --enable-gil and up to 45% slower with --disable-gil compared to Python 3.9 as the baseline.

Change the number of iterations to 100000 for more stable benchmarks.

This is heavily OS- and hardware-dependent. The worst case cited above was observed on a six-core i5-8600K.

CPython versions tested on:

CPython main branch

Operating systems tested on:

Linux

Linked PRs

@skrah skrah added the type-bug An unexpected behavior, bug, or error label Jan 28, 2024
@Eclips4 Eclips4 added the performance Performance or resource usage label Jan 28, 2024
@sunmy2019
Member

Can you try --disable-gil with current main?

@skrah
Contributor Author

skrah commented Jan 29, 2024

The figures are almost the same, around 30% (enable) and 43% (disable).

@skrah
Contributor Author

skrah commented Jan 29, 2024

The base slowdown of 30% is of course due to the module state changes. For comparison, and to see that the i5-8600K is not inherently underperforming with threads:

The ideal situation is implemented in libmpdec++: It has a C++11 thread local context, which is also called for every inline operation. The C++11 TLS adds just 4% overhead compared to pure C (where the context is passed directly to the functions).
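For illustration only, a minimal C sketch of the two styles being compared, with made-up names rather than the real libmpdec identifiers: passing the context explicitly versus fetching a C11 thread-local context inside every operation.

/* Illustrative context type, not the real mpd_context_t. */
typedef struct { int prec; unsigned int status; } ctx_t;

/* "Pure C" style: the context is an explicit parameter of every operation. */
static long add_with_ctx(long a, long b, ctx_t *ctx)
{
    (void)ctx;  /* a real operation would consult prec and record status here */
    return a + b;
}

/* TLS style: every operation fetches a per-thread context itself. */
static _Thread_local ctx_t tls_ctx = { .prec = 28, .status = 0 };

static long add_tls(long a, long b)
{
    return add_with_ctx(a, b, &tls_ctx);
}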

This of course would be quite difficult to achieve in CPython. But the thread local context may not even be the primary culprit. In Martin von Loewis' first module state implementation the basic slowdown for _decimal was 20%, and a good part of that was due to the heap types.

@skrah
Contributor Author

skrah commented Jan 29, 2024

The ideal situation is implemented in libmpdec++: It has a C++11 thread local context, which is also called for every inline operation. The C++11 TLS adds just 4% overhead compared to pure C (where the context is passed directly to the functions).

This of course would be quite difficult to achieve in CPython.

The wording was not ideal: The key point is that C++ achieves this without a cached context, which is of course now in the module state.

@ericsnowcurrently
Member

@encukou, any thoughts on this?

@neonene
Contributor

neonene commented Feb 2, 2024

Getting a module state via PyType_GetModuleByDef() seems to be expensive for heavy use.
It would be nice if a heap type could own the module state's address at its creation time to recover more than 10%.

@erlend-aasland
Contributor

PyModule_GetState() via PyType_GetModuleByDef() seems to be expensive for heavy use. It would be nice if a heap type could own the module state's address at its creation time to recover more than 10%.

_PyModule_GetState() is used, not PyModule_GetState(); I'd be surprised if this is the culprit:

static inline void* _PyModule_GetState(PyObject* mod) {
assert(PyModule_Check(mod));
return ((PyModuleObject *)mod)->md_state;
}

OTOH, PyType_GetModuleByDef() can be more expensive. Storing the state pointer in the type struct is a possibility.

@neonene
Contributor

neonene commented Feb 2, 2024

@ericsnowcurrently Can a module state be accessed correctly with the following?

_Py_thread_local module_state *thread_local_state; // set at the exec phase

If that is valid, it would be useful for functions that currently have no way to get the module state. A _Py_thread_local pointer may not be particularly fast to access, but it could also bypass the state struct lookup?

@ericsnowcurrently
Member

Multiple interpreters can run in the same thread, so a thread-local may be ambiguous in that case.


As to storing a reference to the module state on the type object, that shouldn't be necessary. Every heap type has an ht_module field that points to the module, accessible via PyType_GetModule(). From there you call PyModule_GetState(). In fact, we already have PyType_GetModuleState(), which combines those two calls. That's fairly efficient, though obviously not as efficient as a static global.

The only caveat is that PyType_GetModule() (and thus PyType_GetModuleState()) operates relative to the given type. Thus subclasses make things trickier. IIRC, when they're involved, you end up having to walk the MRO one way or another, which is what PyType_GetModuleByDef() does.

FWIW, the common case where PyType_GetModuleByDef() is used is either in "tp slot" functions or intermediate static functions that aren't doing the right thing. In both cases the problem lies in throwing away the expensive information we already have. For tp slot functions the API doesn't pass the module in, even though it could. We're stuck with backward compatibility issues there. For intermediate static functions, I expect it's all too common to figure out the module (or module state) at some "leaf node" function but not pass it through the intermediate functions. Thus the relatively expensive PyType_GetModuleByDef() gets called more than it needs to be.
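As a minimal sketch of the "resolve once, pass it down" pattern just described, with hypothetical names rather than actual _decimal code:

#include <Python.h>

typedef struct { PyObject *default_context; } decimal_state;  /* illustrative */

/* Intermediate and leaf helpers take the state explicitly ... */
static PyObject *
helper(decimal_state *state, PyObject *value)
{
    /* ... instead of each re-deriving it via PyType_GetModuleByDef(). */
    (void)value;
    return Py_NewRef(state->default_context);
}

/* ... so the lookup happens exactly once, at the entry point. */
static PyObject *
entry_point(PyObject *self, PyObject *arg)
{
    decimal_state *state = PyType_GetModuleState(Py_TYPE(self));
    if (state == NULL) {
        return NULL;  /* e.g. the type has no associated module */
    }
    return helper(state, arg);
}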

I suppose another problem case is with callbacks that don't provide a way to pass the module through. I'm not sure that's a common issue though.

Back to the _decimal module, I haven't looked at what usage patterns are resulting in such a significant slowdown.

@erlend-aasland
Contributor

Thus subclasses make things trickier. IIRC, when they're involved, you end up having to walk the MRO one way or another, which is what PyType_GetModuleByDef() does.

Yes; heap types that are not base classes can go the fast route via PyType_GetModuleState() (and _PyType_GetModuleState()).

@encukou

This comment was marked as off-topic.

@hauntsaninja

This comment was marked as resolved.

@skrah
Contributor Author

skrah commented Feb 4, 2024

The float object gets the luxury of its own freelist and avoids the slow PyObject_GC_* functions altogether. The Decimal constructor is a pure function (the context is only used for recording errors) and Decimals are immutable, just like floats.

It is likely that 15 percentage points can be shaved off just by doing the same as in floatobject.c and then focusing on the thread-local context (indeed, PyType_GetModuleByDef() is called in dozens of places).

@erlend-aasland
Contributor

AFAICS, we can trim away a lot of the PyType_GetModuleByDef calls. For example, a lot of them are happening in methods; we can use the METH_METHOD calling convention on practically all of these, so we can access the module state using _PyType_GetModuleState:

static inline void *
_PyType_GetModuleState(PyTypeObject *type)
{
assert(PyType_Check(type));
assert(type->tp_flags & Py_TPFLAGS_HEAPTYPE);
PyHeapTypeObject *et = (PyHeapTypeObject *)type;
assert(et->ht_module);
PyModuleObject *mod = (PyModuleObject *)(et->ht_module);
assert(mod != NULL);
return mod->md_state;
}
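For reference, a hedged sketch of what a METH_METHOD method using it could look like (PEP 573); the names and the state layout are illustrative, not actual _decimal code:

#include <Python.h>

typedef struct { PyObject *default_context; } decimal_state;  /* illustrative */

static PyObject *
dec_example(PyObject *self, PyTypeObject *defining_class,
            PyObject *const *args, Py_ssize_t nargs, PyObject *kwnames)
{
    /* defining_class is the class that defines the method, so the module
       state is reachable without walking the MRO. */
    decimal_state *state = PyType_GetModuleState(defining_class);
    if (state == NULL) {
        return NULL;
    }
    Py_RETURN_NONE;
}

static PyMethodDef dec_methods[] = {
    {"example", (PyCFunction)(void (*)(void))dec_example,
     METH_METHOD | METH_FASTCALL | METH_KEYWORDS, "illustrative method"},
    {NULL, NULL, 0, NULL},
};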

Also (but not performance relevant), the find_state_left_or_right() calls in the Py_tp_richcompare slots can be replaced with get_module_state_by_def() calls.

@erlend-aasland
Contributor

erlend-aasland commented Feb 5, 2024

FTR: I did a quick proof-of-concept for some low-hanging fruit, and I already got a 12% speedup (up from 4% and then 10% in earlier revisions).

@erlend-aasland
Contributor

So, here's my suggested approach. Let's move the relevant parts of bench.py over to pyperformance. That way, it is easier to keep an eye on performance regressions like this; the faster-cpython team is alert to sudden changes in a particular benchmark.

I'll try to tear out as many PyType_GetModuleByDef calls as possible and create a PR, hopefully within this week. The diff is going to be huge, but OTOH it is a mechanical change.

@encukou
Member

encukou commented Feb 5, 2024

Thank you for getting to this so fast!
Let me know if you need any help, and feel free to assign me to reviews.

@erlend-aasland
Contributor

Let me know if you need any help, and feel free to assign me to reviews.

Perhaps you could migrate bench.py over to pyperformance?

For now, I've used this patch for local benchmarking with pyperf¹:

diff --git a/Modules/_decimal/tests/bench.py b/Modules/_decimal/tests/bench.py
index 640290f2ec..6b6117f108 100644
--- a/Modules/_decimal/tests/bench.py
+++ b/Modules/_decimal/tests/bench.py
@@ -11,9 +11,14 @@
 from functools import wraps
 from test.support.import_helper import import_fresh_module
 
+import pyperf
+
+
 C = import_fresh_module('decimal', fresh=['_decimal'])
 P = import_fresh_module('decimal', blocked=['_decimal'])
 
+runner = pyperf.Runner()
+
 #
 # NOTE: This is the pi function from the decimal documentation, modified
 # for benchmarking purposes. Since floats do not have a context, the higher
@@ -81,33 +86,20 @@ def wrapper(*args, **kwargs):
     return _increase_int_max_str_digits
 
 def test_calc_pi():
-    print("\n# ======================================================================")
-    print("#                   Calculating pi, 10000 iterations")
-    print("# ======================================================================\n")
-
     to_benchmark = [pi_float, pi_decimal]
     if C is not None:
         to_benchmark.insert(1, pi_cdecimal)
 
     for prec in [9, 19]:
-        print("\nPrecision: %d decimal digits\n" % prec)
         for func in to_benchmark:
-            start = time.time()
             if C is not None:
                 C.getcontext().prec = prec
             P.getcontext().prec = prec
-            for i in range(10000):
-                x = func()
-            print("%s:" % func.__name__.replace("pi_", ""))
-            print("result: %s" % str(x))
-            print("time: %fs\n" % (time.time()-start))
+            name = f"{func.__name__}, precision: {prec} decimal digits"
+            runner.bench_func(name, func)
 
 @increase_int_max_str_digits(maxdigits=10000000)
 def test_factorial():
-    print("\n# ======================================================================")
-    print("#                               Factorial")
-    print("# ======================================================================\n")
-
     if C is not None:
         c = C.getcontext()
         c.prec = C.MAX_PREC
@@ -147,4 +139,4 @@ def test_factorial():
 
 if __name__ == "__main__":
     test_calc_pi()
-    test_factorial()
+    #test_factorial()

Footnotes

  1. for the pi benchmark only!

@neonene
Contributor

neonene commented Feb 6, 2024

@skrah Do you have a significant regression test case where subclasses are frequently used? 3.12 (single-phase init/static types) was almost as slow as main in the given test for me once D = C.Decimal was replaced with class D(C.Decimal).

@erlend-aasland
Contributor

@skrah Do you have a significant regression test case where subclasses are frequently used? 3.12 (single-phase init/static types) was almost as slow as main in the given test for me once D = C.Decimal was replaced with class D(C.Decimal).

PyType_GetModuleByDef is slower for subclasses. Reducing the number of PyType_GetModuleByDef will have an impact.

@skrah
Contributor Author

skrah commented Feb 6, 2024

@neonene Your test is reasonable! Subclasses are so slow that within 3.9, float is slowed down by 3.3 times and _decimal by 2.2 times.

The slowdown for subclasses within main is something like 4.5 times for float and 1.9 times for decimal. That seems to suggest, as I mentioned earlier, that certain things are better optimized for base class floats than for base class Decimals.

But in general, compared to 3.9, subclasses themselves seem to have gotten faster in main, so all comparisons are a bit messy.

@neonene
Contributor

neonene commented Feb 6, 2024

Regarding module-state access after METH_METHOD (PEP 573) is applied to _decimal's methods, the concern will move to hot tp_* functions, if any. A possible optimization would be:

PyObject *module = PyType_GetModule(type);
if (!module || _PyModule_GetDef(module) != module_def) {
    PyErr_Clear();
    // Subclasses can ignore the overhead for now? Otherwise,
    // the type should cache the module state (refcount-free) with the def.
    module = PyType_GetModuleByDef(type, module_def);
}
module_state *state = _PyModule_GetState(module);

Currently, a faster version of PyType_GetModule() is not offered, right?

@vstinner
Member

vstinner commented Feb 7, 2024

Regarding module-state access after METH_METHOD (PEP 573) is applied to _decimal's methods, the concern will move to hot tp_* functions, if any.

In 2018, I wrote a change to use FASTCALL calling convention which made the telco benchmark 22% faster. But Stefan Krah was the module maintainer and was against the idea, so I gave up. See issue gh-73487.

@vstinner
Member

vstinner commented Feb 7, 2024

PyType_GetModuleByDef is slower for subclasses. Reducing the number of PyType_GetModuleByDef will have an impact.

If PyType_GetModuleByDef() overhead (compared to the previous version which didn't use it) is too high, we can cache directly important data (such as decimal_state *state) in instances (ex: PyDecObject), rather than getting it from the instance type.
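A minimal sketch of that idea, assuming a simplified object layout rather than the real PyDecObject, and glossing over the subclass case:

#include <Python.h>

typedef struct { PyObject *default_context; } decimal_state;  /* illustrative */

typedef struct {
    PyObject_HEAD
    decimal_state *state;   /* cached once, at allocation time */
    /* ... the mpd_t payload would follow here ... */
} DecObjectSketch;

static PyObject *
dec_new_sketch(PyTypeObject *type)
{
    DecObjectSketch *self = (DecObjectSketch *)type->tp_alloc(type, 0);
    if (self == NULL) {
        return NULL;
    }
    /* Later operations read self->state directly instead of calling
       PyType_GetModuleByDef(); a subclass would still need the by-def
       lookup here, but only once, at creation time. */
    self->state = PyType_GetModuleState(type);
    return (PyObject *)self;
}

The trade-off is one extra pointer per instance.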

@erlend-aasland
Contributor

[...] we can cache directly important data (such as decimal_state *state) in instances (ex: PyDecObject), rather than getting it from the instance type.

Yes, I already suggested it; we do this in the sqlite3 extension module with great success.

@erlend-aasland
Contributor

In 2018, I wrote a change to use FASTCALL calling convention which made the telco benchmark 22% faster. But Stefan Krah was the module maintainer and was against the idea, so I gave up. See issue gh-73487.

I see Serhiy's post points to a bpo post where Stefan disapproved of applying Argument Clinic only. Well, I used Argument Clinic extensively in my patch for speeding up _decimal; it did not break a single test ;)

@encukou
Member

encukou commented Feb 7, 2024

Currently, a faster version of PyType_GetModule() is not offered, right?

That would be using PyHeapTypeObject.ht_module directly. If we know it's a heap type, the check for Py_TPFLAGS_HEAPTYPE can become an assert.
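A short sketch of such an accessor (the helper name is made up); the caller must guarantee that type really is a heap type with ht_module set:

#include <Python.h>

static inline void *
fast_get_module_state(PyTypeObject *type)
{
    assert(PyType_HasFeature(type, Py_TPFLAGS_HEAPTYPE));
    PyObject *mod = ((PyHeapTypeObject *)type)->ht_module;
    assert(mod != NULL);
    return PyModule_GetState(mod);
}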

@skrah
Contributor Author

skrah commented Feb 7, 2024

That is an extraordinary comment from Mr. Stinner. It is entirely irrelevant to this issue:

  • This issue is about the slowdown introduced by the module state changes.

  • If AC speeds up _decimal in the fix, that speedup just camouflages the module state slowdown and makes honest benchmarks for this issue impossible!

  • In the linked FASTCALL issue, both @rhettinger and ultimately Mr. Stinner himself concurred with me or at least acquiesced.

  • At that time the new FASTCALL convention was already on the horizon and would have required another change!

  • At that time I planned to use the new FASTCALL convention directly.

  • I was not the "module maintainer", but the author. Big difference.

In short, bringing up a heavily editorialized version of the actual events just distracts from the problems at hand.

@erlend-aasland
Contributor

I withdraw myself from this discussion; I am no longer working on this issue.

@neonene
Contributor

neonene commented Feb 7, 2024

If PyType_GetModuleByDef() overhead (compared to the previous version which didn't use it) is too high, we can cache directly important data (such as decimal_state *state) in instances (ex: PyDecObject), rather than getting it from the instance type.

The cache in a known object is mainly used to reduce the number of function arguments in the call chains, right? Indeed, many PyType_GetModuleByDef() calls can be removed there.

I think a cache in the type object could be used in the tp slot functions, where PyType_GetModuleByDef() currently has to be called in order to check the given object's exact type through the module state; that does not seem to help the currently very slow subclasses, though.

I think we can measure/consider the boost by FASTCALL separately.

@vstinner
Member

vstinner commented Feb 7, 2024

That is an extraordinary comment from Mr. Stinner. It is entirely irrelevant to this issue

I reacted to a comment about METH_METHOD. For me, a common way to use METH_METHOD is to use Argument Clinic with cls: defining_class.

@neonene
Contributor

neonene commented Feb 9, 2024

@skrah METH_METHOD|METH_FASTCALL might be (worse than) neutral on the pi test. I'm leaving an experimental PR #115196, which was my first AC experience.

@skrah
Contributor Author

skrah commented Feb 9, 2024

@neonene Thanks for doing this! AC does not seem to affect the number methods (nb_add etc.) used in the pi benchmark. So it is indeed entirely irrelevant to this issue (as you hinted at).

I cannot reproduce the stated 22% improvement for the telco benchmark (telco.py full) either. I get something around 10%.

If you want to work on a patch without AC, I can review it. The strategy suggested earlier of caching the state in the decimal object seems to be the most fruitful (intuitively, the actual patch will require some experimentation).

Generally, it would be good to have a reference module in Python core that does not use AC. AC is unused/not available in the vast majority of C-extensions. Many modules in the scientific ecosystem may want to use module state and vectorcall but not AC.

@neonene

This comment was marked as resolved.

@neonene
Contributor

neonene commented Feb 10, 2024

I hit a crash at the METH_METHOD method whose defining_class argument was NULL

I take that back: the *_impl(self, cls) functions have several callers in the module which previously passed NULL as a dummy.

@neonene
Contributor

neonene commented Feb 10, 2024

I got no performance change by switching to _PyType_GetModuleState() in the *_impl functions: 1eeb531. As to PEP 573, applying METH_METHOD only to hot methods would be enough, if needed at all.

@neonene
Contributor

neonene commented Feb 10, 2024

def pi():
    import _decimal
    D = _decimal.Decimal
    for i in range(10000):
        lasts, t, s, n, na, d, da = D(0), D(3), D(3), D(1), D(0), D(0), D(24)
        while s != lasts:
            lasts = s
            n, na = n+na, na+8
            d, da = d+da, da+32
            t = (t * n) / d
            s += t

Functions possibly related to module state access:

* python313.dll / pi  (Windows)
                                     entry  static       dynamic     %     run
Function Name                        count   instr         instr  total   total
PyType_GetModuleByDef             17520026      28     350400520   13.2   29.7
PyContextVar_Get                   3440000      66     106639996    4.0   37.8
PyObject_IsTrue                     431188      44       3058140    0.1   96.0
PyBool_FromLong                     432423       6       2174167    0.1   96.8
PyArg_ParseTupleAndKeywords          70160      33       1683905    0.1   97.2
PyObject_TypeCheck                   70845      10        425126    0.0   98.8


* python313.dll / test_decimal
                                     entry  static       dynamic     %     run
Function Name                        count   instr         instr  total   total
PyType_GetModuleByDef               758275      28      18817245    0.1   74.5
PyObject_IsTrue                    1241005      44      17488210    0.1   76.3


* _decimal.pyd / pi
                                     entry  static       dynamic     %     run
Function Name                        count   instr         instr  total   total
convert_op                         5880000      46     119280000    4.8   10.9
nm_mpd_qadd                        2100000      66     105000000    4.3   19.6
PyDecType_New                      3850000      28      84700000    3.4   34.8
get_module_state_by_def           14150005       5      70750025    2.9   40.6
dec_addstatus                      3850000      43      53900000    2.2   55.4
current_context                    3440000      16      41280002    1.7   66.8
PyObject_TypeCheck                 6380000      10      41010000    1.7   68.4
PyDecType_FromLongExact             910000      54      35490000    1.4   74.3
find_state_left_or_right           3370000      13      30330000    1.2   82.4
dec_from_long                       910000      52      29640000    1.2   83.6
nm_mpd_qdiv                         420000      66      21000000    0.9   87.8
nm_mpd_qmul                         420000      66      21000000    0.9   88.6
dec_richcompare                     430000     104      19780000    0.8   91.9
convert_op_cmp                      430000     148      13330000    0.5   94.5
_dec_settriple                      910000      12      10920000    0.4   97.4
dec_new                              70000      46       2380000    0.1   99.8
PyDecType_FromObjectExact            70000      79       1890000    0.1   99.9


* _decimal.pyd / test_decimal
                                     entry  static       dynamic     %     run
Function Name                        count   instr         instr  total   total
exception_as_flag                   389224      16      14121748    1.9   59.6
numeric_as_ascii                     27009      80       4834675    0.7   74.8
context_getattr                     216906      30       4254322    0.6   77.8
signaldict_getitem                  199787      31       3864061    0.5   78.9
get_module_state_by_def             730003       5       3650015    0.5   79.9
list_as_flags                        31123      23       1991184    0.3   89.7
PyDecType_New                        86927      28       1912610    0.3   90.5
dec_new                              54158      46       1793903    0.2   91.0
signaldict_setitem                   65001      44       1624971    0.2   91.5
context_setattrs                     16192     106       1387404    0.2   92.6
PyDecType_FromObjectExact            54156      79       1357695    0.2   92.8
signals_as_list                      15564      26       1353918    0.2   93.0
dec_as_integer_ratio                 12005     149       1304050    0.2   93.4
dec_addstatus                        91578      43       1285473    0.2   93.5
getround                             17391      37       1280117    0.2   93.9
current_context                      92795      16       1113546    0.2   94.6
PyDecType_FromLongExact              28166      54       1098474    0.2   94.7
dec_from_long                        28178      52        929611    0.1   95.1
PyDecType_FromCStringExact           26629      43        905482    0.1   95.3
PyObject_TypeCheck                  119689      10        884052    0.1   95.5
dec_as_long                          14946      96        881027    0.1   95.6
context_init                         16130      52        806500    0.1   95.8
dec_dealloc                          86927       9        782343    0.1   95.9
convert_op                           37315      46        712770    0.1   96.1
context_new                          16130      63        677475    0.1   96.2
dec_richcompare                      14168     104        652109    0.1   96.4
nm_mpd_qdiv                          12017      66        600801    0.1   96.7
PyDecType_FromUnicodeExactWS         26599      20        531980    0.1   96.9
convert_op_cmp                       14168     148        441960    0.1   97.3

@skrah
Contributor Author

skrah commented Feb 11, 2024

I'm getting a smaller speedup by using PyType_GetModuleState() in the number methods and a larger one by using PyObject_New for the decimal constructor.

The _threadmodule itself uses PyObject_New in new_thread_handle.

@ericsnowcurrently, is there a reason why PyObject_New cannot be used for the immutable Decimal type?
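For illustration, a minimal sketch of the allocation path being asked about, with made-up names and a simplified payload. On current main, PyObject_Init() (called by PyObject_New) takes a strong reference to heap types, so the matching dealloc must release it:

#include <Python.h>

typedef struct {
    PyObject_HEAD
    long value;   /* stand-in for the immutable payload */
} DecSketchObject;

static PyObject *
dec_sketch_new(PyTypeObject *type, long value)
{
    /* No GC header, no PyObject_GC_Track()/PyObject_GC_UnTrack() bookkeeping. */
    DecSketchObject *self = PyObject_New(DecSketchObject, type);
    if (self == NULL) {
        return NULL;
    }
    self->value = value;
    return (PyObject *)self;
}

static void
dec_sketch_dealloc(PyObject *self)
{
    PyTypeObject *tp = Py_TYPE(self);
    PyObject_Free(self);
    Py_DECREF(tp);   /* drop the reference the instance holds on its heap type */
}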

@vstinner
Member

@erlend-aasland:

I withdraw myself from this discussion; I am no longer working on this issue.

@skrah:

That is an extraordinary comment from Mr. Stinner. It is entirely irrelevant to this issue

Well, same as Erlend for me: I unsubscribe. I'm not interested in working on an issue with such a tone in the discussion.

@ericsnowcurrently
Member

is there a reason why PyObject_New cannot be used for the immutable Decimal type?

I'm not aware of any reason. It isn't a GC type, right?

@skrah
Contributor Author

skrah commented Feb 12, 2024

I'm not aware of any reason. It isn't a GC type, right?

Yes, up to 3.12 it wasn't a GC type. It has been made a GC type in main during the module state changes.

I now found some explanations in PEP-630, but I still do not understand it:

Instances of heap types hold a reference to their type.
This ensures that the type isn't destroyed before all its instances are,
but may result in reference cycles that need to be broken by the
garbage collector.

How would one create a cycle?

>>> x = Decimal("3.222")
>>> x.y = x
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
    x.y = x
    ^^^
AttributeError: 'decimal.Decimal' object has no attribute 'y' and no __dict__ for setting new attributes
>>> x = Decimal("3.222")
>>> t = type(x)
>>> t.y = x
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
    t.y = x
    ^^^
TypeError: cannot set 'y' attribute of immutable type 'decimal.Decimal'
>>> x.adjusted = 10
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
    x.adjusted = 10
    ^^^^^^^^^^
AttributeError: 'decimal.Decimal' object attribute 'adjusted' is read-only

@neonene
Contributor

neonene commented Feb 13, 2024

The strategy suggested earlier of caching the state in the decimal object seems to be the most fruitful (intuitively, the actual patch will require some experimentation).

@skrah I have posted a draft: gh-115401

@ericsnowcurrently
Member

Yes, up to 3.12 it wasn't a GC type. It has been made a GC type in main during the module state changes.

Ah, there are definitely some subtleties involved with GC types, especially when switching from a non-GC type. I don't recall the specifics, but I'm sure @encukou does.

@encukou
Member

encukou commented Feb 14, 2024

It might be more interesting to start with static (non-GC) types: these are “immortal” (statically allocated), but they do have a bunch of dynamically allocated, interpreter-specific data (e.g. subclasses, weakrefs). Getting this data properly cleaned up requires rather special hacks, appropriate for core types that really are created/destroyed with the interpreter itself, rather than extensions loaded on demand (see code around static_types[] in Objects/object.c for details).
On the other hand, GC types (heap types) use reference counting like any other Python object. Not saying that getting rid of all the assumptions of static-ness/immortality is/was smooth, but at least the model is much clearer: we have a good idea of what the details should be, even though getting there is a bumpy road full of backcompat issues.


Instances of heap types hold a reference to their type.
This ensures that the type isn't destroyed before all its instances are,
but may result in reference cycles that need to be broken by the
garbage collector.

This is a generic statement. A class with no __dict__ nor user-settable attributes probably can't create a reference cycle.
While we could omit the instance→type reference only in cases where one can prove this, I don't think such a special case would be worth it.

@neonene
Contributor

neonene commented Feb 16, 2024

How would one create a cycle?

PEP 573 and type_traverse() say:

Usually, creating a class with ht_module set will create
a reference cycle involving the class and the module.

cpython/Objects/typeobject.c

Lines 5526 to 5534 in 20eaf4d

Py_VISIT(type->tp_dict);
Py_VISIT(type->tp_cache);
Py_VISIT(type->tp_mro);
Py_VISIT(type->tp_bases);
Py_VISIT(type->tp_base);
Py_VISIT(((PyHeapTypeObject *)type)->ht_module);
/* There's no need to visit others because they can't be involved
in cycles:

To make the Decimal type a non-GC heap type, at least the module argument needs to be NULL in PyType_FromMetaclass(), as for thread_handle_type in _threadmodule?

I agree that PyObject_GC_*() has some overhead, but I'm not sure how to hold a module safely outside the Decimal type.
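A hedged sketch of that variant, with illustrative names and spec contents: passing module=NULL to PyType_FromMetaclass() leaves ht_module unset, and omitting Py_TPFLAGS_HAVE_GC from the spec flags keeps instances out of the garbage collector.

#include <Python.h>

typedef struct {
    PyObject_HEAD
    long value;
} DecSketchObject;

static PyType_Slot dec_sketch_slots[] = {
    {Py_tp_doc, "illustrative immutable type"},
    {0, NULL},
};

static PyType_Spec dec_sketch_spec = {
    .name = "_decimal_sketch.Decimal",
    .basicsize = sizeof(DecSketchObject),
    /* no Py_TPFLAGS_HAVE_GC: instances are not tracked by the collector */
    .flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_IMMUTABLETYPE,
    .slots = dec_sketch_slots,
};

static PyObject *
make_sketch_type(void)
{
    /* module == NULL: the type keeps no reference to the module, so it also
       cannot reach the module state via PyType_GetModule(). */
    return PyType_FromMetaclass(NULL, NULL, &dec_sketch_spec, NULL);
}

That last point is exactly the open question above: without ht_module, the type needs some other safe way back to its module state.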

@gpshead
Member

gpshead commented Mar 12, 2024

FYI - skrah has been banned from python spaces, quoting details from my message posted on a duplicate-ish issue:
#114922 (comment)

"""
Procedural comment: For all involved on this issue, @skrah was banned from Python spaces per a Conduct Working Group recommendation to the Steering Council in 2020. A configuration oversight on our part in 2020 made that not so on Github. The intended ban of @skrah was reinstated on 2024-02-22. Our apologies to all for the consequences of this mixup (it was the first core dev ban done on our modern systems, we now understand how and why mistakes were made).

Stefan was informed of this privately via email and re-offered the opportunity to apologize for past actions and to accept our code of conduct going forwards per the original 2020 ban's terms and notification. We have not received a reply.

I'm posting this here as we view replying in the same forum where issues recently surfaced a reasonable way to notify past, present, and potential future participants of an incident response.


This says nothing for or against the validity of this issue. A new issue should be opened if people wish to proceed. I'm locking the comments on this one. For information on what the Github concept of a ban means, read the Github org ban docs.

-- The Python Steering Council, 2024 edition
""

@gpshead gpshead closed this as not planned Won't fix, can't repro, duplicate, stale Mar 12, 2024
@github-project-automation github-project-automation bot moved this from Todo to Done in Subinterpreters Mar 12, 2024
@python python locked as too heated and limited conversation to collaborators Mar 12, 2024