
Executors might ignore instrumentation. #109369

Closed
markshannon opened this issue Sep 13, 2023 · 3 comments · Fixed by #111657
Labels
3.13 bugs and security fixes deferred-blocker type-bug An unexpected behavior, bug, or error

Comments

@markshannon
Member

markshannon commented Sep 13, 2023

Bug report

Code handed to the optimizer may not include instrumentation. If instrumentation is added later, the executor does not see it.
We remove all ENTER_EXECUTOR instructions when instrumenting, but that doesn't fix the problem of executors that are still running.

Example

A loop that calls foo:

while large_number > 0:
    foo()
    large_number -= 1

The loop gets turned into an executor. Sometime before large_number reaches zero, a debugger gets attached and, in some callee of foo, turns on monitoring of calls. We would expect all subsequent calls to foo() to be monitored, but they will not be, as the executor knows nothing about the call to foo() being instrumented.
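The failure mode can be modeled in plain Python. This is a hypothetical sketch of the bug, not CPython's executor machinery: the "executor" bakes in the monitoring state that existed when it was created and never re-checks it.

```python
# Hypothetical model of the bug: an "executor" snapshots the monitoring
# hook at specialization time and never re-checks whether monitoring was
# enabled afterwards. All names here are illustrative.
calls_seen = []

def foo():
    pass

def monitor_call(fn):
    calls_seen.append(fn.__name__)

monitoring_hook = None  # installed later, e.g. by an attached debugger

def make_executor(fn, hook):
    # The executor captures the hook that existed when it was created.
    def run():
        if hook is not None:  # checked once, at creation time only
            hook(fn)
        fn()
    return run

executor = make_executor(foo, monitoring_hook)  # hook is still None
monitoring_hook = monitor_call                  # debugger attaches
executor()                                      # call is NOT monitored
assert calls_seen == []                         # the new hook was ignored
```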

Let's not rely on the executor/optimizer to handle this.

We could add a complex de-optimization strategy, so that executors are invalidated when instrumentation occurs, but that is the sort of thing we want to be doing for maximum performance, not for correctness.

It is much safer, and ultimately no slower (once we implement fancy de-optimization), to add extra checks for instrumentation, so that unoptimized traces are correct.

The solution

The solution is quite simple: add a check for instrumentation after every call.

We can make this less slow (and less simple) by combining the eval-breaker check and the instrumentation check into one. This makes the check slightly more complex, but it should speed up RESUME by more than it slows down each call, since every call already has an eval-breaker check.

Combining the two checks reduces the number of bits available for versions from 64 to 24 (we need the other bits for the eval-breaker, GC, async exceptions, etc.). A 64-bit counter never overflows (computers don't last long enough), but a 24-bit counter does.
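A minimal sketch of the bit layout, with illustrative field positions (not CPython's actual layout): a 24-bit version shares one word with the eval-breaker flag bits, so bumping the version must wrap within its field without disturbing the flags.

```python
# Sketch of packing a 24-bit instrumentation version together with
# eval-breaker flag bits in one word. Field widths and flag positions
# are illustrative assumptions, not CPython's real layout.
VERSION_BITS = 24
VERSION_MASK = (1 << VERSION_BITS) - 1
FLAG_GC = 1 << VERSION_BITS           # hypothetical flag positions
FLAG_ASYNC_EXC = 1 << (VERSION_BITS + 1)

def bump_version(word):
    # A 24-bit version wraps around, unlike a 64-bit counter.
    version = (word + 1) & VERSION_MASK
    return (word & ~VERSION_MASK) | version

word = FLAG_GC | VERSION_MASK         # a flag set, version at its maximum
word = bump_version(word)
assert word & VERSION_MASK == 0       # version wrapped to zero
assert word & FLAG_GC                 # flag bits are untouched
```

Tools that bump the version millions of times will eventually exhaust the 24-bit space, which is the breakage described above.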

This will have three main effects, beyond the hopefully very small performance impact for all code:

  • It removes the need for a complete call stack scan of all threads when instrumenting.
  • Tools that set or change monitoring many times will see significant performance improvements, as we no longer need to traverse all stacks to re-instrument the whole call stack whenever monitoring changes.
  • Tools that set or change monitoring many millions of times will break, as we run out of versions. It is likely that these tools were already broken or had performance so bad as to be totally unusable.

Making the check explicit in the bytecode

We can simplify all the call instructions by removing the CHECK_EVAL_BREAKER check at the end and adding an explicit RESUME instruction after every CALL.

Although this has the disadvantage of making the bytecode larger and adding dispatch overhead, it does have the following advantages:

  • Allows tier 1 to optimize the RESUME to RESUME_CHECK, which might cancel out the additional dispatch overhead.
  • Makes the check explicit, making it feasible for tier 2 optimizations to remove it.
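The idea of moving the check out of CALL and into its own instruction can be sketched with a toy dispatch loop. The opcode names and the interpreter itself are illustrative, not CPython's implementation:

```python
# Toy interpreter sketch: instead of folding the eval-breaker and
# instrumentation check into the CALL opcode, a separate CHECK
# instruction follows each CALL. Because the check is its own
# instruction, an optimizer can see it and choose to remove it.
# Names are illustrative, not CPython's.
def run(code, state):
    pc = 0
    while pc < len(code):
        op, arg = code[pc]
        if op == "CALL":
            arg()                        # perform the call, no check here
        elif op == "CHECK":
            if state["instrumented"]:    # explicit post-call check
                state["events"].append(pc)
        pc += 1

state = {"instrumented": True, "events": []}
run([("CALL", lambda: None), ("CHECK", None)], state)
assert state["events"] == [1]            # the check ran after the call
```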

Linked PRs

@markshannon
Member Author

markshannon commented Oct 5, 2023

Because instrumentation can be turned on in a finalizer, and each Py_DECREF() could potentially turn on instrumentation, it actually makes sense to rely on the de-optimization machinery, as it is too easy to miss a check otherwise.

With that in mind, we need to do the following:

  • Add a valid flag to executors, and machinery to unset it if a dependency changes.
  • Add a micro-op to check the validity bit, and insert it after every impure instruction.
  • Optimize the executor implementation: improve the hash functions and store executors in a tree, not a list.
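The first two items above can be sketched as follows. This is a hypothetical model of the plan (the `Executor` and `DependencyTable` classes and their methods are invented for illustration), not CPython's actual executor API:

```python
# Sketch of executor invalidation via a validity flag. When a
# dependency changes (e.g. instrumentation is enabled for a code
# object), every executor watching that dependency has its valid
# flag unset; a _CHECK_VALIDITY micro-op at the top of the trace
# would then bail out to tier 1. All names are illustrative.
class Executor:
    def __init__(self, trace):
        self.trace = trace
        self.valid = True          # unset when a dependency changes

class DependencyTable:
    def __init__(self):
        self.watchers = {}         # dependency -> executors relying on it

    def watch(self, dep, executor):
        self.watchers.setdefault(dep, []).append(executor)

    def invalidate(self, dep):
        # e.g. instrumentation was just turned on for this code object
        for ex in self.watchers.pop(dep, []):
            ex.valid = False

deps = DependencyTable()
ex = Executor(trace=["_CHECK_VALIDITY", "_CALL"])
deps.watch("code_obj_version", ex)
deps.invalidate("code_obj_version")
assert not ex.valid                # _CHECK_VALIDITY would now deoptimize
```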

@gvanrossum
Member

Do I read between the lines that we are not proceeding with the plan to reimplement DECREF by putting the objects to be deleted in some kind of queue that is processed e.g. when the eval breaker goes off?

@markshannon
Member Author

That is still the plan, but we don't want to be relying on it for correctness.

Regardless of whether Py_DECREF can have arbitrary side effects or not, there will be a set of micro-ops that can have arbitrary side effects, so we need to handle them.

faster-cpython/ideas#582 will make that set of micro-ops smaller, meaning fewer guards and better optimizations.

markshannon added a commit that referenced this issue Oct 23, 2023
FullteaR pushed a commit to FullteaR/cpython that referenced this issue Nov 3, 2023
aisk pushed a commit to aisk/cpython that referenced this issue Feb 11, 2024
Glyphack pushed a commit to Glyphack/cpython that referenced this issue Sep 2, 2024