[RFC] Refactor cancellation for great justice #910
Relevant to python-trio#886, python-trio#606, python-trio#285, python-trio#147, python-trio#70, python-trio#58, maybe others.

I was continuing my effort to shoehorn linked cancel scopes and graceful cancellation into `CancelScope` earlier today and it was feeling too much of a mess, so I decided to explore other options. This PR is the result. It makes major changes to Trio's cancellation internals, but barely any to Trio's cancellation semantics -- all tests pass except for one that is especially persnickety about `cancel_called`. No new tests or docs yet as I wanted to get feedback on the approach before polishing.

An overview:

* New class `CancelBinding` manages a single lexical context (a `with` block or a task) that might get a different cancellation treatment than its surroundings. "All plumbing, no policy."
* Each cancel binding has an effective deadline, a _single_ task, and links to parent and child bindings. Each parent lexically encloses its children. The only cancel bindings with multiple children are the ones immediately surrounding nurseries, and they have one child binding per nursery child task plus maybe one in the nested child.
* Each cancel binding calculates its effective deadline based on its parent's effective deadline and some additional data. The actual calculation is performed by an associated `CancelLogic` instance (a small ABC).
* `CancelScope` now implements `CancelLogic`, providing the deadline/shield semantics we know and love. It manages potentially-multiple `CancelBinding`s.
* Cancel stacks are gone. Instead, each task has an "active" (innermost) cancel binding, which changes as the task moves in and out of cancellation regions. The active cancel binding's effective deadline directly determines whether and when `Cancelled` is raised in the task.
* `Runner.deadlines` stores tasks instead of cancel scopes. There is no longer a meaningful state of "deadline is in the past but scope isn't cancelled yet" (this is what the sole failing test doesn't like). If the effective deadline of a task's active cancel binding is non-infinite and in the future, it goes in `Runner.deadlines`. If it's in the past, the task has a pending cancellation by definition.

Potential advantages:

* Cancellation becomes extensible without changes to `_core`, via users writing their own `CancelLogic` and wrapping a core `CancelBinding`(s) around it. We could even move `CancelScope` out of `_core` if we want to make a point.
* `Nursery.start()` is much simpler.
* Splitting shielding into a separate object from cancellation becomes trivial (they'd be two kinds of `CancelLogic`).
* Most operations that are performed frequently take constant time: checking whether you're cancelled, checking what your deadline is, entering and leaving a cancel binding. I haven't benchmarked, so it's possible we're losing on constant factors or something, but in theory this should be faster than the old approach.
* Since tasks now have well-defined root cancel bindings, I think python-trio#606 becomes straightforward via providing a way to spawn a system task whose cancel binding is a child of something other than the system nursery's cancel binding.

Caveats:

* We call `current_time()` a lot. Not sure if this is worth worrying about, and could probably be cached if so.
* There are probably bugs, because aren't there always?

Current cancel logic:

```
def compute_effective_deadline(
    self, parent_effective_deadline, parent_extra_info, task
):
    incoming_deadline = inf if self._shield else parent_effective_deadline
    my_deadline = -inf if self._cancel_called else self._deadline
    return min(incoming_deadline, my_deadline), parent_extra_info
```

Want to support a grace period? I'm pretty sure it would work with something like

```
def compute_effective_deadline(
    self, parent_effective_deadline, parent_extra_info, task
):
    parent_cleanup_deadline = parent_extra_info.get(
        "effective_cleanup_deadline", parent_effective_deadline
    )
    if self._shield:
        parent_effective_deadline = parent_cleanup_deadline = inf
    my_cleanup_start = min(self._deadline, self._cancel_called_at)
    merged_cleanup_deadline = min(
        parent_cleanup_deadline, my_cleanup_start + self._grace_period
    )
    my_extra_info = parent_extra_info.set(
        "effective_cleanup_deadline", merged_cleanup_deadline
    )
    if self._shield_during_cleanup:
        effective_deadline = merged_cleanup_deadline
    else:
        effective_deadline = min(parent_effective_deadline, my_cleanup_start)
    return effective_deadline, my_extra_info
```

Maybe that's not quite _simple_ but it is miles better than what I was looking at before. :-)
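To make the plumbing/policy split described above concrete, here is a minimal, self-contained sketch. It is not the code in this PR: `ScopeLogic`, `recalculate()`, and `is_cancelled()` are hypothetical names chosen for illustration, and the `parent_extra_info` and `task` arguments are dropped to keep it short. It only assumes what the description states: each binding derives its effective deadline from its parent's via a `CancelLogic`, and a deadline in the past means the binding is cancelled.

```python
import math
from abc import ABC, abstractmethod


class CancelLogic(ABC):
    """Policy: maps the parent's effective deadline to this binding's."""

    @abstractmethod
    def compute_effective_deadline(self, parent_effective_deadline):
        ...


class ScopeLogic(CancelLogic):
    """Hypothetical stand-in for CancelScope's deadline/shield semantics."""

    def __init__(self, deadline=math.inf, shield=False):
        self.deadline = deadline
        self.shield = shield
        self.cancel_called = False

    def compute_effective_deadline(self, parent_effective_deadline):
        # A shield hides the parent's deadline; an explicit cancel is
        # modelled as a deadline of -inf, as in the description.
        incoming = math.inf if self.shield else parent_effective_deadline
        mine = -math.inf if self.cancel_called else self.deadline
        return min(incoming, mine)


class CancelBinding:
    """Plumbing: one lexical context, linked to its parent and children."""

    def __init__(self, logic, parent=None):
        self.logic = logic
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)
        self.effective_deadline = math.inf
        self.recalculate()

    def recalculate(self):
        # Recompute from the parent, then push the result down the tree.
        parent_deadline = (
            math.inf if self.parent is None else self.parent.effective_deadline
        )
        self.effective_deadline = self.logic.compute_effective_deadline(
            parent_deadline
        )
        for child in self.children:
            child.recalculate()

    def is_cancelled(self, now):
        # Constant-time check: no walk up a stack of scopes.
        return self.effective_deadline <= now


outer_logic = ScopeLogic(deadline=10.0)
outer = CancelBinding(outer_logic)
inner = CancelBinding(ScopeLogic(deadline=5.0), parent=outer)
shielded = CancelBinding(ScopeLogic(shield=True), parent=outer)

# Cancelling the outer scope reaches the unshielded child only.
outer_logic.cancel_called = True
outer.recalculate()
```

In this sketch the cost of a change is paid when the change happens (`recalculate()` walks the subtree), which is what lets the frequent operation, checking whether a task is cancelled, stay a single comparison.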
Some more thoughts on this after sleeping on it:
then the shield means the cancellation gets caught by
As usual for my first comment on these large diffs, I have spent a bunch of time thinking about the high-level idea but haven't actually read the diff, so apologies if anything I say is way off. It took me a while to wrap my head around it, but I think the basic idea of having separate objects that track the aggregate cancel status for each extent is pretty good. But, I think it's probably simpler and more performant to keep deadlines separate from the cancel status. So what I'm imagining is something like:
There are two reasons I like this overall approach:
What do you think of this, at a high level? A few more thoughts: Here's an idea for managing deadline entries in the global table: we only put
I hope at this point you've heard enough of my design philosophy that you won't be too surprised when I say this sounds like a negative to me, not a positive :-). (But whatever, we don't have to expose it even if it is nice and clean.)
Yes. Hallelujah.
Many thanks for the detailed feedback!
That was exactly what I was hoping for in this case - the code is still pretty hacked together at this point. The system you've detailed is uncannily close to the one I was imagining / tried to implement, and many of the differences (especially wrt tracking cancelled-ness explicitly as opposed to well-I-guess-this-deadline-is-in-the-past-ly) are changes I was planning to make as well. (Your
I think tracking graceful cancellation will require more state than this. For example:
So far I've been imagining each cancel status tracking a tuple (effective deadline, effective cleanup deadline), and the type of cancel status inserted by Apart from that detail I really like this approach, and I agree with you about the simplicity of reasoning and the algorithmic probably-ideality.
This is the same thing we're doing for
I suspected you would say that, but I had to try :P. I'm fine not exposing the flexibility; I think even having the internal architecture be flexible will make it a lot easier to prototype improvements to the cancellation system in the future.
Well, this is borrowing trouble, since we don't know if we'll implement soft cancels yet :-). But my intuition would be: if we do, then we extend
Yeah, I guess I just thought of it because with the PR's approach, using the deadline itself to track cancellation, then you sort of automagically get the deadline/cancel-status in sync all the time, so I was wondering if there was any way to keep that on the outside while using simpler state tracking internally.
Yeah, for sure.
This synthesizes the ideas that arose in the discussion on python-trio#910. Each CancelScope `with` block now creates a CancelStatus object (not exposed publicly); the CancelStatus objects know their parent/child relationships in the lexical nesting tree of CancelScope contexts, and communicate to propagate cancellation information eagerly. The upshot is that the question "is this task in a cancelled scope right now?" can now be answered in O(1) time, eliminating a notable inefficiency in Trio's run loop. As a nice side benefit, manipulations of the cancellation tree such as are required by `nursery.start()` become much easier to reason about.
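The eager propagation described in that commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not Trio's internal class: the attribute names (`cancel_requested`, `effectively_cancelled`) and the `_recalculate()` helper are hypothetical, deadlines are left out entirely, and shielding is modelled as a simple per-node flag.

```python
class CancelStatus:
    """Hypothetical sketch of a node in the cancel-scope nesting tree."""

    def __init__(self, parent=None, shield=False):
        self.parent = parent
        self.children = []
        self.shield = shield
        self.cancel_requested = False  # cancel() was called on this node
        self.effectively_cancelled = False  # cached answer for O(1) reads
        if parent is not None:
            parent.children.append(self)
        # A node created under an already-cancelled parent starts cancelled.
        self._recalculate()

    def _parent_cancellation_visible(self):
        # A shield blocks cancellation coming from enclosing scopes.
        return (
            self.parent is not None
            and not self.shield
            and self.parent.effectively_cancelled
        )

    def _recalculate(self):
        new_state = self.cancel_requested or self._parent_cancellation_visible()
        if new_state != self.effectively_cancelled:
            self.effectively_cancelled = new_state
            # Propagate eagerly, but only into subtrees whose state changes.
            for child in self.children:
                child._recalculate()

    def cancel(self):
        self.cancel_requested = True
        self._recalculate()


root = CancelStatus()
child = CancelStatus(parent=root)
shielded = CancelStatus(parent=root, shield=True)
grandchild = CancelStatus(parent=shielded)

root.cancel()
```

After `root.cancel()`, asking "is this task in a cancelled scope right now?" is a read of `effectively_cancelled` on the task's innermost node; the shielded subtree stays uncancelled until `shielded.cancel()` is called on it directly.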
Closing this in favor of the more polished and mild-mannered #958.