-
-
Notifications
You must be signed in to change notification settings - Fork 30.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-124309: fix staggered race on eager tasks #124847
gh-124309: fix staggered race on eager tasks #124847
Conversation
Co-Authored-By: Peter Bierma <zintensitydev@gmail.com>
…ing work on child tasks
I'm marking this as |
Hopefully you will test with a range of aiohttp versions -- the previous version of this PR seemed to work on some but not on others. |
Yeah, that's the idea. We might not even need this PR if their new implementation turns out to work perfectly. |
Don't we still need to do something to make eager tasks work? Or does that problem only occur in combination with aiohttp? Presumably there's at least one previous aiohttp version where the current main (or 3.12) branch doesn't work with aiohttp (or I don't understand the issue). |
Our implementation (on main and 3.12) doesn't work with eager task factories, and most aiohttp versions (apart from the latest, IIUC, because they made their own) rely on our implementation. |
So we definitely have to change something right? |
Yup, I'm just cautious because I don't want to break prior aiohttp versions on accident. |
running_tasks.append(first_task) | ||
ok_to_start.set() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the first one, should the event be preset so it doesn't negate the benefits of an eager task factory?
Or maybe make the event optional for the first one since we want it to start right away?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no it's still needed because all the coro_fn can return immediately without awaiting anything resulting in none of the tasks being appended to running_tasks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's not terribly important to maintain the benefits of an eager task factory here as it's supposed to be used for happy_eyeballs over the internet where there is always going to be some delay
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was only thinking about the first one since there is a chance the first ip could connect right away (not sure if it can happen synchronously or not)
In aiohttp we don't know if the host is an internet or lan or local host so everything ends up going through this path.
It's also possible the user connecting to localhost has working ipv4 but broken IPv6 and 127.0.0.1 will connect right away and ::1 never will or vise versa
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even connecting to a localhost socket raises BlockingIOError and asyncio needs to wait for it to be writable
Thanks for the PR. I can test in production later this week |
we should copy and adapt the aiohttp tests into cpython |
sorry I pressed the wrong button |
Feel free to copy and adapt any tests in aiohappyeyeballs and consider their relicensed as needed for cpython |
I can get to porting aiohttp's pytest stuff over to us sometime later this week. |
The new implementation we have in aiohappyeyeballs seems to be going well. No new issues for aiohappyeyeballs filed, Home Assistant release went well, and my testing went well. I think I’m ready to start testing this PR today I’ll start by running aiohappyeyeballs test suite against it and if it passes try to write a meaningful benchmark to compare the implementations |
@bdraco how is this playing with aiohttp? |
I didn't test it in production yet because of the comments I posted above: It looks like there is a race where tasks can be garbage collected before they return a result: |
Ok, good to know. |
FWIW it's a different systemic issue that we need to fix asap (and backport to 3.13) IMO this PR shouldn't solve that specific problem and focus on a smaller change. |
@@ -139,3 +139,9 @@ async def staggered_race(coro_fns, delay, *, loop=None):
# Make sure no tasks are left running if we leave this function
for t in running_tasks:
t.cancel()
+ try:
+ await t
+ except (SystemExit, KeyboardInterrupt):
+ raise
+ except BaseException:
+ pass That fixes the warning but still the aiohappyeyeballs cancellation swallow check test is failing |
Yes awaiting a task and suppressing cancellation will do that. You need to use a loop and a future. But we can fix that in another PR |
If aiohttp's implementation has been tested in production (and has all their tests passing), why not just copy theirs over? |
Can you give a link to the final diff of their fix? I can't find it in the few links to different issues I clicked through this discussion. |
Hm, tbqh I think the approach in this PR is more straightforward. Unless you have a specific concern I'd go with it. @graingert feel free to go ahead and merge. |
We do need to backport this though. |
I can't merge, I'll need a core dev to do it for me |
I'm worried about the failing tests on aiohttp's end, but I think Thomas said that he'll address that in another PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll go ahead and approve this to ease stomachs about merging :)
Apologies, I'll go ahead and click the button then. |
Thanks @graingert for the PR, and @1st1 for merging it 🌮🎉.. I'm working now to backport this PR to: 3.12, 3.13. |
This patch is entirely by Thomas and Peter (cherry picked from commit 979c0df) Co-authored-by: Thomas Grainger <tagrain@gmail.com> Co-authored-by: Peter Bierma <zintensitydev@gmail.com>
This patch is entirely by Thomas and Peter (cherry picked from commit 979c0df) Co-authored-by: Thomas Grainger <tagrain@gmail.com> Co-authored-by: Peter Bierma <zintensitydev@gmail.com>
GH-125339 is a backport of this pull request to the 3.13 branch. |
GH-125340 is a backport of this pull request to the 3.12 branch. |
Co-Authored-By: Peter Bierma zintensitydev@gmail.com