Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-124309: fix staggered race on eager tasks #124847

Merged
merged 5 commits into from
Oct 11, 2024

Conversation

graingert
Copy link
Contributor

@graingert graingert commented Oct 1, 2024

graingert and others added 2 commits October 1, 2024 14:39
Co-Authored-By: Peter Bierma <zintensitydev@gmail.com>
@graingert graingert changed the title restore tests from reverted commits gh-124309: fix staggered race on eager tasks Oct 1, 2024
@ZeroIntensity
Copy link
Member

I'm marking this as DO-NOT-MERGE for the time being. aiohttp is planning on trying this sometime this week.

@gvanrossum
Copy link
Member

aiohttp is planning on trying this sometime this week

Hopefully you will test with a range of aiohttp versions -- the previous version of this PR seemed to work on some but not on others.

@ZeroIntensity
Copy link
Member

Yeah, that's the idea. We might not even need this PR if their new implementation turns out to work perfectly.

@gvanrossum
Copy link
Member

Don't we still need to do something to make eager tasks work? Or does that problem only occur in combination with aiohttp? Presumably there's at least one previous aiohttp version where the current main (or 3.12) branch doesn't work with aiohttp (or I don't understand the issue).

@ZeroIntensity
Copy link
Member

Our implementation (on main and 3.12) doesn't work with eager task factories, and most aiohttp versions (apart from the latest, IIUC, because they made their own) rely on our implementation.

@gvanrossum
Copy link
Member

Our implementation (on main and 3.12) doesn't work with eager task factories, and most aiohttp versions (apart from the latest, IIUC, because they made their own) rely on our implementation.

So we definitely have to change something right?

@ZeroIntensity
Copy link
Member

Yup, I'm just cautious because I don't want to break prior aiohttp versions on accident.

running_tasks.append(first_task)
ok_to_start.set()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the first one, should the event be preset so it doesn't negate the benefits of an eager task factory?

Or maybe make the event optional for the first one since we want it to start right away?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no it's still needed because all the coro_fn can return immediately without awaiting anything resulting in none of the tasks being appended to running_tasks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's not terribly important to maintain the benefits of an eager task factory here as it's supposed to be used for happy_eyeballs over the internet where there is always going to be some delay

Copy link
Contributor

@bdraco bdraco Oct 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was only thinking about the first one since there is a chance the first ip could connect right away (not sure if it can happen synchronously or not)

In aiohttp we don't know if the host is an internet or lan or local host so everything ends up going through this path.

It's also possible the user connecting to localhost has working ipv4 but broken IPv6 and 127.0.0.1 will connect right away and ::1 never will or vise versa

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even connecting to a localhost socket raises BlockingIOError and asyncio needs to wait for it to be writable

@bdraco
Copy link
Contributor

bdraco commented Oct 2, 2024

Thanks for the PR. I can test in production later this week

@graingert
Copy link
Contributor Author

we should copy and adapt the aiohttp tests into cpython

@graingert graingert closed this Oct 2, 2024
@graingert
Copy link
Contributor Author

sorry I pressed the wrong button

@graingert graingert reopened this Oct 2, 2024
@bdraco
Copy link
Contributor

bdraco commented Oct 2, 2024

Feel free to copy and adapt any tests in aiohappyeyeballs and consider their relicensed as needed for cpython

@ZeroIntensity
Copy link
Member

I can get to porting aiohttp's pytest stuff over to us sometime later this week.

@bdraco
Copy link
Contributor

bdraco commented Oct 4, 2024

The new implementation we have in aiohappyeyeballs seems to be going well. No new issues for aiohappyeyeballs filed, Home Assistant release went well, and my testing went well. I think I’m ready to start testing this PR today

I’ll start by running aiohappyeyeballs test suite against it and if it passes try to write a meaningful benchmark to compare the implementations

@ZeroIntensity
Copy link
Member

@bdraco how is this playing with aiohttp?

@bdraco
Copy link
Contributor

bdraco commented Oct 8, 2024

@bdraco how is this playing with aiohttp?

I didn't test it in production yet because of the comments I posted above: It looks like there is a race where tasks can be garbage collected before they return a result:

#124847 (comment)

@ZeroIntensity
Copy link
Member

Ok, good to know.

@1st1
Copy link
Member

1st1 commented Oct 9, 2024

I didn't test it in production yet because of the comments I posted above: It looks like there is a race where tasks can be garbage collected before they return a result:

FWIW it's a different systemic issue that we need to fix asap (and backport to 3.13)

#121264 (review)

IMO this PR shouldn't solve that specific problem and focus on a smaller change.

@bdraco
Copy link
Contributor

bdraco commented Oct 9, 2024

@@ -139,3 +139,9 @@ async def staggered_race(coro_fns, delay, *, loop=None):
         # Make sure no tasks are left running if we leave this function
         for t in running_tasks:
             t.cancel()
+            try:
+                await t
+            except (SystemExit, KeyboardInterrupt):
+                raise
+            except BaseException:
+                pass

That fixes the warning but still the aiohappyeyeballs cancellation swallow check test is failing

@graingert
Copy link
Contributor Author

@@ -139,3 +139,9 @@ async def staggered_race(coro_fns, delay, *, loop=None):
         # Make sure no tasks are left running if we leave this function
         for t in running_tasks:
             t.cancel()
+            try:
+                await t
+            except (SystemExit, KeyboardInterrupt):
+                raise
+            except BaseException:
+                pass

That fixes the warning but still the aiohappyeyeballs cancellation swallow check test is failing

Yes awaiting a task and suppressing cancellation will do that. You need to use a loop and a future. But we can fix that in another PR

@ZeroIntensity
Copy link
Member

If aiohttp's implementation has been tested in production (and has all their tests passing), why not just copy theirs over?

@1st1
Copy link
Member

1st1 commented Oct 9, 2024

@ZeroIntensity

If aiohttp's implementation has been tested in production (and has all their tests passing), why not just copy theirs over?

Can you give a link to the final diff of their fix? I can't find it in the few links to different issues I clicked through this discussion.

@ZeroIntensity
Copy link
Member

Here

@1st1
Copy link
Member

1st1 commented Oct 9, 2024

Hm, tbqh I think the approach in this PR is more straightforward. Unless you have a specific concern I'd go with it.

@graingert feel free to go ahead and merge.

@1st1
Copy link
Member

1st1 commented Oct 9, 2024

We do need to backport this though.

@graingert
Copy link
Contributor Author

Hm, tbqh I think the approach in this PR is more straightforward. Unless you have a specific concern I'd go with it.

@graingert feel free to go ahead and merge.

I can't merge, I'll need a core dev to do it for me

@ZeroIntensity
Copy link
Member

Hm, tbqh I think the approach in this PR is more straightforward. Unless you have a specific concern I'd go with it.

I'm worried about the failing tests on aiohttp's end, but I think Thomas said that he'll address that in another PR.

Copy link
Member

@ZeroIntensity ZeroIntensity left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll go ahead and approve this to ease stomachs about merging :)

@ZeroIntensity ZeroIntensity added needs backport to 3.12 bug and security fixes needs backport to 3.13 bugs and security fixes labels Oct 10, 2024
@1st1
Copy link
Member

1st1 commented Oct 11, 2024

I can't merge, I'll need a core dev to do it for me

Apologies, I'll go ahead and click the button then.

@1st1 1st1 merged commit 979c0df into python:main Oct 11, 2024
36 of 39 checks passed
@miss-islington-app
Copy link

Thanks @graingert for the PR, and @1st1 for merging it 🌮🎉.. I'm working now to backport this PR to: 3.12, 3.13.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Oct 11, 2024
This patch is entirely by Thomas and Peter

(cherry picked from commit 979c0df)

Co-authored-by: Thomas Grainger <tagrain@gmail.com>
Co-authored-by: Peter Bierma <zintensitydev@gmail.com>
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Oct 11, 2024
This patch is entirely by Thomas and Peter

(cherry picked from commit 979c0df)

Co-authored-by: Thomas Grainger <tagrain@gmail.com>
Co-authored-by: Peter Bierma <zintensitydev@gmail.com>
@bedevere-app
Copy link

bedevere-app bot commented Oct 11, 2024

GH-125339 is a backport of this pull request to the 3.13 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.13 bugs and security fixes label Oct 11, 2024
@bedevere-app
Copy link

bedevere-app bot commented Oct 11, 2024

GH-125340 is a backport of this pull request to the 3.12 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.12 bug and security fixes label Oct 11, 2024
1st1 pushed a commit that referenced this pull request Oct 12, 2024
)

Co-authored-by: Thomas Grainger <tagrain@gmail.com>
Co-authored-by: Peter Bierma <zintensitydev@gmail.com>
1st1 pushed a commit that referenced this pull request Oct 12, 2024
)

gh-124309: fix staggered race on eager tasks (GH-124847)

This patch is entirely by Thomas and Peter

(cherry picked from commit 979c0df)

Co-authored-by: Thomas Grainger <tagrain@gmail.com>
Co-authored-by: Peter Bierma <zintensitydev@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants