
Executors might ignore instrumentation. #109369

Closed
markshannon opened this issue Sep 13, 2023 · 3 comments · Fixed by #111657
Labels
3.13 bugs and security fixes deferred-blocker type-bug An unexpected behavior, bug, or error

Comments

@markshannon
Member

markshannon commented Sep 13, 2023

Bug report

Code handed to the optimizer may not include instrumentation. If instrumentation is added later, the executor does not see it.
We remove all ENTER_EXECUTOR instructions when instrumenting, but that doesn't fix the problem of executors that are still running.

Example

A loop that calls foo:

while large_number > 0:
    foo()
    large_number -= 1

The loop gets turned into an executor. Sometime before large_number reaches zero, a debugger gets attached and, in some callee of foo, turns on monitoring of calls. We would expect all subsequent calls to foo() to be monitored, but they will not be, as the executor knows nothing about the call to foo() being instrumented.
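The failure mode can be modeled in plain Python. This is a hypothetical sketch of the bug, not CPython's executor machinery: the "executor" bakes in the monitoring state that existed when it was created and never re-checks it.

```python
# Hypothetical model of the bug: an "executor" snapshots the monitoring
# hook at specialization time and never re-checks whether monitoring was
# enabled afterwards. All names here are illustrative.
calls_seen = []

def foo():
    pass

def monitor_call(fn):
    calls_seen.append(fn.__name__)

monitoring_hook = None  # installed later, e.g. by an attached debugger

def make_executor(fn, hook):
    # The executor captures the hook that existed when it was created.
    def run():
        if hook is not None:  # checked once, at creation time only
            hook(fn)
        fn()
    return run

executor = make_executor(foo, monitoring_hook)  # hook is still None
monitoring_hook = monitor_call                  # debugger attaches
executor()                                      # call is NOT monitored
assert calls_seen == []                         # the new hook was ignored
```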

Let's not rely on the executor/optimizer to handle this.

We could add a complex de-optimization strategy, so that executors are invalidated when instrumentation occurs, but that is the sort of thing we want to be doing for maximum performance, not for correctness.

It is much safer, and ultimately no slower (once we implement fancy de-optimization), to add extra checks for instrumentation, so that unoptimized traces are correct.

The solution

The solution is quite simple: add a check for instrumentation after every call.

We can make this less slow (and less simple) by combining the eval-breaker check and the instrumentation check into one. This makes the check slightly more complex, but it should speed up RESUME by more than it slows down each call, since every call already has an eval-breaker check.

Combining the two checks reduces the number of bits available for versions from 64 to 24 (we need the other bits for the eval-breaker, GC, async exceptions, etc.). A 64-bit counter never overflows (computers don't last long enough), but a 24-bit counter does.
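A minimal sketch of the bit layout, with illustrative field positions (not CPython's actual layout): a 24-bit version shares one word with the eval-breaker flag bits, so bumping the version must wrap within its field without disturbing the flags.

```python
# Sketch of packing a 24-bit instrumentation version together with
# eval-breaker flag bits in one word. Field widths and flag positions
# are illustrative assumptions, not CPython's real layout.
VERSION_BITS = 24
VERSION_MASK = (1 << VERSION_BITS) - 1
FLAG_GC = 1 << VERSION_BITS           # hypothetical flag positions
FLAG_ASYNC_EXC = 1 << (VERSION_BITS + 1)

def bump_version(word):
    # A 24-bit version wraps around, unlike a 64-bit counter.
    version = (word + 1) & VERSION_MASK
    return (word & ~VERSION_MASK) | version

word = FLAG_GC | VERSION_MASK         # a flag set, version at its maximum
word = bump_version(word)
assert word & VERSION_MASK == 0       # version wrapped to zero
assert word & FLAG_GC                 # flag bits are untouched
```

Tools that bump the version millions of times will eventually exhaust the 24-bit space, which is the breakage described above.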

This will have three main effects, beyond the hopefully very small performance impact for all code:

  • It removes the need for a complete call stack scan of all threads when instrumenting.
  • Tools that set or change monitoring many times will see significant performance improvements, as we no longer need to traverse all stacks to re-instrument the whole call stack whenever monitoring changes.
  • Tools that set or change monitoring many millions of times will break, as we run out of versions. It is likely that these tools were already broken or had performance so bad as to be totally unusable.

Making the check explicit in the bytecode

We can simplify all the call instructions by removing the CHECK_EVAL_BREAKER check at the end and adding an explicit RESUME instruction after every CALL.

Although this has the disadvantage of making the bytecode larger and adding dispatch overhead, it does have the following advantages:

  • Allows tier 1 to optimize the RESUME to RESUME_CHECK, which might cancel out the additional dispatch overhead.
  • Makes the check explicit, making it feasible for tier 2 optimizations to remove it.
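The idea of moving the check out of CALL and into its own instruction can be sketched with a toy dispatch loop. The opcode names and the interpreter itself are illustrative, not CPython's implementation:

```python
# Toy interpreter sketch: instead of folding the eval-breaker and
# instrumentation check into the CALL opcode, a separate CHECK
# instruction follows each CALL. Because the check is its own
# instruction, an optimizer can see it and choose to remove it.
# Names are illustrative, not CPython's.
def run(code, state):
    pc = 0
    while pc < len(code):
        op, arg = code[pc]
        if op == "CALL":
            arg()                        # perform the call, no check here
        elif op == "CHECK":
            if state["instrumented"]:    # explicit post-call check
                state["events"].append(pc)
        pc += 1

state = {"instrumented": True, "events": []}
run([("CALL", lambda: None), ("CHECK", None)], state)
assert state["events"] == [1]            # the check ran after the call
```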

Linked PRs

@markshannon
Member Author

markshannon commented Oct 5, 2023

Because instrumentation can be turned on in a finalizer, and each Py_DECREF() could potentially turn on instrumentation, it actually makes sense to rely on the de-optimization machinery, as it is too easy to miss a check otherwise.

With that in mind, we need to do the following:

  • Add a valid flag to executors, and machinery to unset it if a dependency changes.
  • Add a micro-op to check the validity bit, and insert it after every impure instruction.
  • Optimize the executor implementation: improve the hash functions and store executors in a tree, not a list.
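The first two items above can be sketched as follows. This is a hypothetical model of the plan (the `Executor` and `DependencyTable` classes and their methods are invented for illustration), not CPython's actual executor API:

```python
# Sketch of executor invalidation via a validity flag. When a
# dependency changes (e.g. instrumentation is enabled for a code
# object), every executor watching that dependency has its valid
# flag unset; a _CHECK_VALIDITY micro-op at the top of the trace
# would then bail out to tier 1. All names are illustrative.
class Executor:
    def __init__(self, trace):
        self.trace = trace
        self.valid = True          # unset when a dependency changes

class DependencyTable:
    def __init__(self):
        self.watchers = {}         # dependency -> executors relying on it

    def watch(self, dep, executor):
        self.watchers.setdefault(dep, []).append(executor)

    def invalidate(self, dep):
        # e.g. instrumentation was just turned on for this code object
        for ex in self.watchers.pop(dep, []):
            ex.valid = False

deps = DependencyTable()
ex = Executor(trace=["_CHECK_VALIDITY", "_CALL"])
deps.watch("code_obj_version", ex)
deps.invalidate("code_obj_version")
assert not ex.valid                # _CHECK_VALIDITY would now deoptimize
```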

@gvanrossum
Member

Do I read between the lines that we are not proceeding with the plan to reimplement DECREF by putting the objects to be deleted in some kind of queue that is processed e.g. when the eval breaker goes off?

@markshannon
Member Author

That is still the plan, but we don't want to be relying on it for correctness.

Regardless of whether Py_DECREF can have arbitrary side effects or not, there will be a set of micro-ops that can have arbitrary side effects, so we need to handle them.

faster-cpython/ideas#582 will make that set of micro-ops smaller, meaning fewer guards and better optimizations.

markshannon added a commit that referenced this issue Oct 23, 2023
FullteaR pushed a commit to FullteaR/cpython that referenced this issue Nov 3, 2023
aisk pushed a commit to aisk/cpython that referenced this issue Feb 11, 2024
Glyphack pushed a commit to Glyphack/cpython that referenced this issue Sep 2, 2024