gh-104144: Optimize gather to finish eagerly when all futures complete eagerly #104138

itamaro · 2023-05-03T17:51:44Z

gh-97696 introduced eager tasks factory, which speeds up some async-heavy workloads by up to 50% when opted in.

installing the eager tasks factory applies out-of-the-box when gathering futures (asyncio.gather(...)), e.g.:

asyncio.get_event_loop().set_task_factory(asyncio.eager_task_factory)
await asyncio.gather(coro1, coro2, coro3)

coro{1,2,3} will eagerly execute the first step, and potentially complete without scheduling to the event loop if the coros don't block.

the implementation of eager uses callbacks internally that end up getting scheduled to the event loop even if all the futures were able to finish synchronously, and blocking the coroutine in which gather() was awaited, preventing the task from completing eagerly even if otherwise it could.

applications that use multiple levels of nested gathers can benefit significantly from eagerly completing multiple levels without blocking, as implemented in this PR by skipping scheduling done callbacks for futures that are already done (e.g. finished eagerly).

Benchmarks

this makes the async pyperformance benchmarks up to 3x faster (!!), using a patch to pyperformance that adds "eager" flavors

3.12-base.20230503.async.4.json
===============================

Performance version: 1.0.7
Python version: 3.12.0a7+ (64-bit) revision da1980afcb
Report on Linux-5.15.0-1033-aws-x86_64-with-glibc2.31
Number of logical CPUs: 72
Start date: 2023-05-03 23:27:23.329046
End date: 2023-05-03 23:46:37.706326

3.12-nogf.20230503.async.2.json
===============================

Performance version: 1.0.7
Python version: 3.12.0a7+ (64-bit) revision 5397cd9f62
Report on Linux-5.15.0-1033-aws-x86_64-with-glibc2.31
Number of logical CPUs: 72
Start date: 2023-05-03 23:05:45.011427
End date: 2023-05-03 23:22:44.908094

+-------------------------------+---------------------------------+---------------------------------+--------------+------------------------+
| Benchmark                     | 3.12-base.20230503.async.4.json | 3.12-nogf.20230503.async.2.json | Change       | Significance           |
+===============================+=================================+=================================+==============+========================+
| async_tree_cpu_io_mixed       | 868 ms                          | 859 ms                          | 1.01x faster | Not significant        |
+-------------------------------+---------------------------------+---------------------------------+--------------+------------------------+
| async_tree_eager              | 391 ms                          | 129 ms                          | 3.03x faster | Significant (t=209.74) |
+-------------------------------+---------------------------------+---------------------------------+--------------+------------------------+
| async_tree_eager_cpu_io_mixed | 756 ms                          | 490 ms                          | 1.54x faster | Significant (t=167.41) |
+-------------------------------+---------------------------------+---------------------------------+--------------+------------------------+
| async_tree_eager_io           | 1.51 sec                        | 1.50 sec                        | 1.00x faster | Not significant        |
+-------------------------------+---------------------------------+---------------------------------+--------------+------------------------+
| async_tree_eager_memoization  | 595 ms                          | 314 ms                          | 1.89x faster | Significant (t=70.25)  |
+-------------------------------+---------------------------------+---------------------------------+--------------+------------------------+
| async_tree_io                 | 1.39 sec                        | 1.40 sec                        | 1.00x slower | Not significant        |
+-------------------------------+---------------------------------+---------------------------------+--------------+------------------------+
| async_tree_memoization        | 677 ms                          | 683 ms                          | 1.01x slower | Not significant        |
+-------------------------------+---------------------------------+---------------------------------+--------------+------------------------+
| async_tree_none               | 575 ms                          | 574 ms                          | 1.00x faster | Not significant        |
+-------------------------------+---------------------------------+---------------------------------+--------------+------------------------+

Issue: Leverage eager tasks to optimize asyncio gather & TaskGroups further #104144

carljm

This looks pretty straightforward and reasonable to me, but would prefer for asyncio experts to take a look.

Misc/NEWS.d/next/Library/2023-05-03-16-50-24.gh-issue-104144.yNkjL8.rst

Co-authored-by: Carl Meyer <carl@oddbird.net>

jbower-fb

If you look at gather._done_callback() you'll see it has a bunch of logic which gets executed at the moment all the futures have finished (starting from if nfinished == nfuts: on line 781). If all args can complete eagerly then this will be executed eagerly too.

This might have a few issues:

At this point outer will be None and this will cause trouble on line 803. (Maybe I missed something because I'm surprised this hasn't come up yet).
Handling for futures that were cancelled during eager execution is processed but the results discarded.
We inefficiently create a result list which we discard and then repeat when creating the eagerly completed future result.

Fortunately, I think an easy fix is to move creation of the eager result future to before the argument processing loop. See my in-line comments for specifics.

I'm not 100% sure how this will affect the issue described in bpo-46672, but it has a test so we'll see.

Lib/asyncio/tasks.py

jbower-fb · 2023-05-04T01:54:39Z

Lib/asyncio/tasks.py

+        outer = futures.Future(loop=loop)
+        outer.set_result([c.result for c in children])
+    else:
+        outer = _GatheringFuture(children, loop=loop)


Suggested change

outer = _GatheringFuture(children, loop=loop)

outer.__self_log_traceback = False

outer = _GatheringFuture(children, loop=loop)

…e loop, after the GatheringFuture (outer) is created

carljm · 2023-05-05T19:04:03Z

The title no longer accurately describes the updated PR, since the _GatheringFuture is always created.

itamaro · 2023-05-05T19:05:35Z

thanks for the review @jbower-fb !

I pushed a new version of the PR based on your suggestions, but not identical.
summary of my changes:

instead of immediately calling the done callbacks for eagerly completed futures in the loop, I add them to a list and call their callbacks only at the end, after creating the outer future
this helps because now outer is defined the same way, and the children list is fully populated, which is also important because the done callback uses the children list
I always create a GatheringFuture so I get the same cancellation treatment, but it will still finish eagerly if all futures completed eagerly thanks to the last done callback right before returning
there shouldn't be any issues with bpo-46672 since outer looks the same as before

itamaro · 2023-05-05T19:06:53Z

The title no longer accurately describes the updated PR, since the _GatheringFuture is always created.

thanks, I updated the title!

gvanrossum

LGTM, will fix the typo. Maybe @carljm can merge when you all are agreed on this. Nice fix!

Misc/NEWS.d/next/Library/2023-05-03-16-50-24.gh-issue-104144.yNkjL8.rst

gvanrossum · 2023-05-06T14:51:39Z

PS. In general there's no need to click the "Update branch" button (or otherwise merge main back into the PR) unless there are fixes/changes that might affect the PR (e.g. if touching the same file).

…omplete eagerly (python#104138)

AlexWaygood added the topic-asyncio label May 3, 2023

gh-NNNN: Skip creating GatheringFuture if all futures finished eagerly

b3e479a

itamaro force-pushed the asyncio-skip-gathering-future branch from 4a5d42c to b3e479a Compare May 3, 2023 20:57

itamaro changed the title ~~gh-NNNN: Skip creating GatheringFuture if all futures finished eagerly~~ gh-104144: Skip creating GatheringFuture if all futures finished eagerly May 3, 2023

bedevere-bot mentioned this pull request May 3, 2023

Leverage eager tasks to optimize asyncio gather & TaskGroups further #104144

Closed

Add NEWS entry

8eeebaf

itamaro marked this pull request as ready for review May 3, 2023 23:52

itamaro requested review from 1st1, asvetlov, gvanrossum, kumaraditya303 and willingc as code owners May 3, 2023 23:52

carljm reviewed May 4, 2023

View reviewed changes

Misc/NEWS.d/next/Library/2023-05-03-16-50-24.gh-issue-104144.yNkjL8.rst Outdated Show resolved Hide resolved

Apply review suggestion to NEWS entry

9289c9e

Co-authored-by: Carl Meyer <carl@oddbird.net>

jbower-fb suggested changes May 4, 2023

View reviewed changes

bedevere-bot added the awaiting core review label May 4, 2023

itamaro added 2 commits May 5, 2023 11:50

Merge branch 'main' into asyncio-skip-gathering-future

edbb04c

Modify implementation to eagerly call done callbacks at the end of th…

9be6e51

…e loop, after the GatheringFuture (outer) is created

itamaro changed the title ~~gh-104144: Skip creating GatheringFuture if all futures finished eagerly~~ gh-104144: Optimize gather to finish eagerly when all futures complete eagerly May 5, 2023

bedevere-bot added awaiting review and removed awaiting core review labels May 5, 2023

itamaro added 2 commits May 5, 2023 14:14

update inline comment and NEWS entry

250b73c

Merge branch 'main' into asyncio-skip-gathering-future

7df90eb

gvanrossum approved these changes May 6, 2023

View reviewed changes

Misc/NEWS.d/next/Library/2023-05-03-16-50-24.gh-issue-104144.yNkjL8.rst Outdated Show resolved Hide resolved

bedevere-bot added awaiting merge and removed awaiting review labels May 6, 2023

Fix NEWS typo

ce52714

AlexWaygood added the performance Performance or resource usage label May 6, 2023

kumaraditya303 approved these changes May 6, 2023

View reviewed changes

kumaraditya303 enabled auto-merge (squash) May 6, 2023 14:55

kumaraditya303 merged commit 263abd3 into python:main May 6, 2023

bedevere-bot removed the awaiting merge label May 6, 2023

itamaro deleted the asyncio-skip-gathering-future branch May 7, 2023 22:16

jbower-fb pushed a commit to jbower-fb/cpython-jbowerfb that referenced this pull request May 8, 2023

pythongh-104144: Optimize gather to finish eagerly when all futures c…

cb70222

…omplete eagerly (python#104138)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

gh-104144: Optimize gather to finish eagerly when all futures complete eagerly #104138

gh-104144: Optimize gather to finish eagerly when all futures complete eagerly #104138

itamaro commented May 3, 2023 •

edited

Loading

carljm left a comment

jbower-fb left a comment •

edited

Loading

jbower-fb May 4, 2023

carljm commented May 5, 2023

itamaro commented May 5, 2023

itamaro commented May 5, 2023

gvanrossum left a comment

gvanrossum commented May 6, 2023

	outer = _GatheringFuture(children, loop=loop)
	outer.__self_log_traceback = False
	outer = _GatheringFuture(children, loop=loop)

gh-104144: Optimize gather to finish eagerly when all futures complete eagerly #104138

gh-104144: Optimize gather to finish eagerly when all futures complete eagerly #104138

Conversation

itamaro commented May 3, 2023 • edited Loading

Benchmarks

carljm left a comment

Choose a reason for hiding this comment

jbower-fb left a comment • edited Loading

Choose a reason for hiding this comment

jbower-fb May 4, 2023

Choose a reason for hiding this comment

carljm commented May 5, 2023

itamaro commented May 5, 2023

itamaro commented May 5, 2023

gvanrossum left a comment

Choose a reason for hiding this comment

gvanrossum commented May 6, 2023

itamaro commented May 3, 2023 •

edited

Loading

jbower-fb left a comment •

edited

Loading