gh-91887: Store strong references to pending tasks #121264

alexhartl · 2024-07-02T12:46:55Z

This adds a _pending_tasks set to BaseEventLoop. On Task creation, a (strong) reference to the task is added to this set in _register_task. When a task completes, the respective reference is removed from _pending_tasks in _unregister_task. See the discussion at #91887.
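As a rough illustration of the bookkeeping described above (not the PR's actual diff), the sketch below uses a hypothetical `TrackingLoop` class standing in for `BaseEventLoop`. Only the names `_pending_tasks`, `_register_task`, and `_unregister_task` come from the description. Wiring the removal through `add_done_callback` is an assumption made here for brevity.

```python
import asyncio


class TrackingLoop:
    """Hypothetical stand-in for BaseEventLoop; not the real class."""

    def __init__(self):
        # Strong references: a task stays alive while it is in this set.
        self._pending_tasks = set()

    def _register_task(self, task):
        self._pending_tasks.add(task)
        # Assumption: the reference is dropped from a done callback.
        task.add_done_callback(self._unregister_task)

    def _unregister_task(self, task):
        self._pending_tasks.discard(task)


async def demo():
    tracker = TrackingLoop()
    task = asyncio.get_running_loop().create_task(asyncio.sleep(0))
    tracker._register_task(task)
    pending_before = len(tracker._pending_tasks)
    await task
    # Done callbacks are scheduled with call_soon, so yield once more
    # to be sure the unregister callback has had a chance to run.
    await asyncio.sleep(0)
    return pending_before, len(tracker._pending_tasks)


result = asyncio.run(demo())
```

The set holds the task only between registration and completion, so a finished task does not stay referenced.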

Issue: asyncio: Use strong references for free-flying tasks #91887

cpython-cla-bot · 2024-07-02T12:46:57Z

All commit authors signed the Contributor License Agreement.

ZeroIntensity

Very nice PR! Just a few nitpicks.

Misc/NEWS.d/next/Library/2024-07-02-14-07-32.gh-issue-91887.eWzc5E.rst

ZeroIntensity · 2024-07-10T22:15:54Z

Lib/asyncio/futures.py

        return True

-    def __schedule_callbacks(self):
-        """Internal: Ask the event loop to call all callbacks.
+    def _finish_execution(self):


For clarification, what's the reason for this name change? This is still, effectively, scheduling all callbacks -- "finish execution" is a bit more ambiguous to me.

Yes, in Future, "_schedule_callbacks" is perfectly fine to describe what the function does. When overriding this function in Task, I think _finish_execution is more meaningful to indicate that it will be called when the task completes.

ZeroIntensity · 2024-07-10T22:25:59Z

Lib/test/test_asyncio/test_tasks.py

It might be a good idea to add a test to make sure that this actually fixes #91887, to make sure that someone doesn't accidentally break this in the future.

Likely would be something like:

    def test_strong_task_references(self):
        called = False

        async def coro():
            nonlocal called
            called = True

        async def main():
            asyncio.create_task(coro())

        loop = asyncio.new_event_loop()
        try:
            loop.run_until_complete(main())
        finally:
            loop.close()
        self.assertTrue(called)

Co-authored-by: Peter Bierma <zintensitydev@gmail.com>

ZeroIntensity

LGTM

1st1 · 2024-09-12T03:29:19Z

I can ponder on this during the core sprint (and think how this will play with uvloop).

freakboy3742 · 2024-09-27T16:53:03Z

@alexhartl I'm at the CPython core team sprint, so I've taken the liberty of merging with main so I can discuss this with @1st1 and others. There were some conflicts introduced as a result of #120974.

alexhartl · 2024-09-27T20:28:35Z

Thank you for picking this up @freakboy3742 ! Yes, the last time I checked, the asyncio C code appeared to be in the middle of some restructuring. Let me know if I can help with anything.

Store strong references into a global set as well. Hopefully can get removed one day, as python/cpython#121264 was merged just this week :)

1st1

I'm blocking this PR from being merged to ponder on it (feel free to dismiss the block in a week). For one, I'm really not sure I like the event loop having new APIs; I think this is better solved at the asyncio level, in an event-loop-independent way.

cc @ambv @pablogsal

bedevere-app · 2024-10-09T03:00:46Z

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

1st1 · 2024-10-09T03:05:53Z

@alexhartl have you considered just making _scheduled_tasks a regular set instead of using weakref.WeakSet()? What's the downside of that? IMO it doesn't matter much where a strong reference is stored -- in the event loop or in the asyncio module. Either way a proper cleanup is required, but using a regular set would be a trivial change and everything in the ecosystem would just work.
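To make the difference between the two containers concrete, here is a small sketch that does not use asyncio at all. The `Token` class is a hypothetical placeholder standing in for a task, and the registry names are made up for this example.

```python
import gc
import weakref


class Token:
    """Hypothetical placeholder object standing in for a task."""


weak_registry = weakref.WeakSet()
strong_registry = set()

weak_only = Token()
weak_registry.add(weak_only)

kept = Token()
strong_registry.add(kept)

# Drop the local names so the registries hold the only references.
del weak_only, kept
gc.collect()  # CPython's refcounting already frees the object; this is a safeguard

weak_count = len(weak_registry)      # the entry vanished with its object
strong_count = len(strong_registry)  # the entry still keeps its object alive
```

With a regular set, the entry has to be removed explicitly, which is why the comment above notes that proper cleanup is required either way.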

gvanrossum · 2024-10-09T05:01:03Z

> @alexhartl have you considered just making _scheduled_tasks a regular set instead of using weakref.WeakSet()? What's the downside of that? IMO it doesn't matter much where a strong reference is stored -- in the event loop or in the asyncio module. Either way a proper cleanup is required, but using a regular set would be a trivial change and everything in the ecosystem would just work.

@1st1 But that would keep the awkward design where all loops share the global "all tasks" set, which is not how it's ever used. (I believe this design was just a historical accident; I've never found an explanation.)

1st1 · 2024-10-09T18:52:01Z

> @1st1 But that would keep the awkward design where all loops share the global "all tasks" set, which is not how it's ever used. (I believe this design was just a historical accident; I've never found an explanation.)

Yeah, but what's awkward about it? There's a low level API that manages the all tasks set and other event loops (at least uvloop) already use that API. Adding yet another tracking API for running tasks is harder than switching the tracking API we already have to just use a regular set. The additional tracking will only introduce additional, albeit minuscule, overhead. At least this is how I see this. I do think it would be quite elegant to make the existing APIs asyncio._register_task and asyncio._unregister_task have some additional functionality.

Adding new API (like what this PR is doing) means that other loops will have to always implement it (or be broken). Which I obviously can do for uvloop, quite easily, but it will grow the API surface which is already huge.

Lastly, a minor point: event loop doesn't have a lot to do with tasks. The loop is mostly concerned with running callbacks. Task is a self-contained primitive that just schedules callbacks to the event loop. So it rubs me the wrong way to introduce tracking to the loop for Tasks, I believe a global threadlocal mapping is a better solution, which is what "all tasks" can be.

alexhartl · 2024-10-10T11:19:32Z

I'd prefer to avoid holding strong references in global scope to reduce the potential for memory leaks. I.e. the loop owns these references and we can be sure that everything is cleaned up at latest when the loop is destructed.

Also, the modifications that @kumaraditya303 did lately were a lot about performance improvements. Registering all tasks in a dict, and unregistering in a done callback is quite likely to negate these improvements to some extent, no matter where this dict is stored. Having this set and weakset separate might give us the opportunity to make this behaviour optional, so that the user is able to retain the performance improvements.

> Adding new API (like what this PR is doing) means that other loops will have to always implement it (or be broken). Which I obviously can do for uvloop, quite easily, but it will grow the API surface which is already huge.

Yes, true.

> Lastly, a minor point: event loop doesn't have a lot to do with tasks. The loop is mostly concerned with running callbacks. Task is a self-contained primitive that just schedules callbacks to the event loop. So it rubs me the wrong way to introduce tracking to the loop for Tasks, I believe a global threadlocal mapping is a better solution, which is what "all tasks" can be.

Technically, that is true. But from a user's perspective it's the loop that basically represents the state of asyncio. I think it would make a lot of sense if tasks are owned by this loop.

1st1 · 2024-10-11T23:56:37Z

> I'd prefer to avoid holding strong references in global scope to reduce the potential for memory leaks. I.e. the loop owns these references and we can be sure that everything is cleaned up at latest when the loop is destructed.

IMO this isn't a strong argument. Actual applications rarely have more than one event loop during the whole program run (be it a short program or a web server). Now, if there's a bug where tasks aren't cleaned up properly, then that bug should actually be fixed; otherwise the fact that the event loop clears its tasks won't help at all.

> Also, the modifications that @kumaraditya303 did lately were a lot about performance improvements.

What are those modifications? Link?

> Registering all tasks in a dict, and unregistering in a done callback is quite likely to negate these improvements to some extent, no matter where this dict is stored.

This is such a minor micro-performance thing in the context of the relatively heavy cost of 'await' and the Task abstraction itself that IMO it's not worth talking about. Adding/removing a Task from a dict will not be detectable even in a micro-benchmark; it will be below the noise floor.
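For readers who want to check the order of magnitude themselves, the following is a hedged sketch that times only a `set.add` plus `set.discard` pair with `timeit`. It uses a plain `object()` rather than a real Task and says nothing about the cost of `await` or task creation, so it is an isolated measurement under those assumptions, not a benchmark of asyncio.

```python
import timeit

registry = set()
item = object()  # stand-in for a task; assumption made for this sketch


def add_and_discard():
    # One registration followed by one unregistration.
    registry.add(item)
    registry.discard(item)


runs = 100_000
total = timeit.timeit(add_and_discard, number=runs)
per_pair_ns = total / runs * 1e9
print(f"add+discard: about {per_pair_ns:.0f} ns per pair")
```

The figure includes the overhead of the Python function call itself and will vary between machines and interpreter versions.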

> Technically, that is true. But from a user's perspective it's the loop that basically represents the state of asyncio. I think it would make a lot of sense if tasks are owned by this loop.

I see the logic here, but I still think that adding this API to the loop does not make sense in light of other APIs to track tasks that already exist in asyncio: _register_task(task), _enter_task(task).

Also I find adding _-leading APIs to the event loop inelegant.

So bottom line, I'm -1 on merging this PR as is. Please let's work together to re-align it and make it better fit with the existing asyncio APIs.

I suggest we keep in place the part of this PR that deterministically calls _unregister_task. _scheduled_tasks becomes a set(). That's it.
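A minimal sketch of what that suggestion could look like, assuming simplified stand-ins for the module-level helpers: here `_scheduled_tasks` is a plain `set()` and `_unregister_task` is invoked from a done callback. The real asyncio helpers, and the point at which unregistration would actually be triggered, may differ; the `spawn` helper is hypothetical.

```python
import asyncio

# Simplified stand-in for the module-level registry discussed above.
_scheduled_tasks = set()


def _register_task(task):
    _scheduled_tasks.add(task)


def _unregister_task(task):
    _scheduled_tasks.discard(task)


def spawn(coro):
    """Hypothetical helper: create a task and track it until it finishes."""
    task = asyncio.get_running_loop().create_task(coro)
    _register_task(task)
    # Assumption: deterministic unregistration via a done callback.
    task.add_done_callback(_unregister_task)
    return task


async def main():
    results = []

    async def worker():
        results.append("ran")

    spawn(worker())  # the caller keeps no reference of its own
    tracked_while_pending = len(_scheduled_tasks)
    await asyncio.sleep(0)  # let the worker run
    await asyncio.sleep(0)  # let its done callback run
    return results, tracked_while_pending, len(_scheduled_tasks)


outcome = asyncio.run(main())
```

Because the registry lives at module level rather than on the loop, nothing in this sketch depends on which event loop implementation is running.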

alexhartl · 2024-12-01T10:39:03Z

Sure. Are there other opinions? Otherwise, I can create a PR on this suggestion.

alexhartl added 2 commits July 2, 2024 10:00

Store strong references to pending tasks

c229526

NEWS, ACKs

bbdbce1

alexhartl requested review from 1st1, asvetlov, gvanrossum, kumaraditya303 and willingc as code owners July 2, 2024 12:46

bedevere-app bot mentioned this pull request Jul 2, 2024

asyncio: Use strong references for free-flying tasks #91887

Open

bedevere-app bot added the awaiting review label Jul 2, 2024

kumaraditya303 added the DO-NOT-MERGE label Jul 3, 2024

ZeroIntensity requested changes Jul 10, 2024

bedevere-app bot added awaiting core review and removed awaiting review labels Jul 10, 2024

alexhartl and others added 2 commits July 11, 2024 08:30

Improved news item

9a5ada0

Co-authored-by: Peter Bierma <zintensitydev@gmail.com>

Add test code to ensure python#91887 is fixed

5d2b57e

ZeroIntensity approved these changes Jul 12, 2024

hugovk added the sprint label Sep 12, 2024

Merge branch 'main' into strong-refs-for-bg-tasks

eb89d4c

freakboy3742 force-pushed the strong-refs-for-bg-tasks branch from 9c5073b to eb89d4c Compare September 27, 2024 16:54

1st1 requested changes Oct 9, 2024

bedevere-app bot removed the awaiting core review label Oct 9, 2024

bedevere-app bot added the awaiting changes label Oct 9, 2024

1st1 mentioned this pull request Oct 9, 2024

gh-124309: fix staggered race on eager tasks #124847

Merged

ZeroIntensity mentioned this pull request Oct 20, 2024

Adding an internal C-API for dynamic arrays #125543

Closed

jakkdl mentioned this pull request Nov 17, 2024

New asyncio rule: directly passing coroutine to gather, shield, wait_for, wait, or as_completed python-trio/flake8-async#319

Open
