
fix: handle OSError on failure to close socket instead of raising IndexError #114

Merged
7 commits merged into aio-libs:main on Nov 30, 2024

Conversation

todddialpad (Contributor)

What do these changes do?

It is possible for `loop.sock_connect` to fail in such a way that the wrapped socket no longer refers to a valid file descriptor. In that case, the exception handler can fail on the attempt to call `sock.close()`. When this happens, `start_connection` can end up with `sock == None` and `exceptions == []`, which causes an `IndexError` because the code assumes that `exceptions` always contains at least one exception instance.
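
A minimal sketch of the fix pattern (illustrative only; the helper name `_close_failed_sock` is an assumption for this example, not the actual aiohappyeyeballs API):

```python
import socket


def _close_failed_sock(sock: socket.socket, exceptions: list[OSError]) -> None:
    """Close a socket whose connect attempt failed, without losing the error.

    If the file descriptor is already invalid after a failed
    ``loop.sock_connect``, ``close()`` itself can raise ``OSError``.
    Recording that error keeps ``exceptions`` non-empty, so the caller
    never hits ``IndexError`` when it later re-raises ``exceptions[0]``.
    """
    try:
        sock.close()
    except OSError as exc:
        exceptions.append(exc)
```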

Are there changes in behavior for the user?

I experienced crashes when handling connections with short timeouts.

Related issue number

Checklist

  • I think the code is well written
  • Unit tests for the changes exist
  • Documentation reflects the changes

Fix for type checker
@bdraco (Member) commented Nov 20, 2024

Would you please post the full traceback you get when it raises `IndexError` in the opening text?

I'll try to get the code coverage reporting fixed.

@bdraco (Member) commented Nov 20, 2024

I bumped the codecov-action in #115 so hopefully codecov will work again once this PR has tests added

@webknjaz (Member)

Will this fix #112 / #93? If so, they should be linked. Plus, I hope for some tests covering the new code too.

@bdraco (Member) commented Nov 20, 2024

codecov reporting should work now if you update with main

@todddialpad (Contributor, Author)

> Will this fix #112 / #93? If so, they should be linked. Plus, I hope for some tests covering the new code too.

Ah yes, I believe this is the same issue.

This started happening to us in production and the closest I have been able to come to a standalone test case is to generate statistically similar traffic patterns (which in our case means fairly aggressive timeouts). I haven't been able yet to craft a self-contained unit test.

chore: bump codecov-action to 5.0.3 (aio-libs#115)

codecov bot commented Nov 20, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (fd90f56) to head (b2ab123).
Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff            @@
##              main      #114   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            5         5           
  Lines          227       235    +8     
  Branches        60        60           
=========================================
+ Hits           227       235    +8     


@Dreamsorcerer (Member)

> This started happening to us in production and the closest I have been able to come to a standalone test case is to generate statistically similar traffic patterns (which in our case means fairly aggressive timeouts). I haven't been able yet to craft a self-contained unit test.

You should still be able to provide the stacktrace from the logs though? That gives a starting point for us to look at.

@todddialpad (Contributor, Author) commented Nov 30, 2024

> This started happening to us in production and the closest I have been able to come to a standalone test case is to generate statistically similar traffic patterns (which in our case means fairly aggressive timeouts). I haven't been able yet to craft a self-contained unit test.
>
> You should still be able to provide the stacktrace from the logs though? That gives a starting point for us to look at.

I think that when I have seen this error, it has been generated as a side effect of the uvloop issue MagicStack/uvloop#645.

Not sure how helpful the stack trace is:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/aiohappyeyeballs/impl.py", line 174, in _connect_sock
    await loop.sock_connect(sock, address)
  File "uvloop/loop.pyx", line 2633, in sock_connect"

@bdraco (Member) commented Nov 30, 2024

It might make more sense to wait for uvloop to do a new release with MagicStack/uvloop#645 as it feels like this is papering over the problem and we could unintentionally be hiding that uvloop's internal state is incorrect/broken in some way.

@todddialpad (Contributor, Author)

> It might make more sense to wait for uvloop to do a new release with MagicStack/uvloop#645 as it feels like this is papering over the problem and we could unintentionally be hiding that uvloop's internal state is incorrect/broken in some way.

Certainly can, but I don't think this papers over anything, just avoids the extraneous IndexError exception that itself may hide a uvloop issue.

@bdraco (Member) commented Nov 30, 2024

> It might make more sense to wait for uvloop to do a new release with MagicStack/uvloop#645 as it feels like this is papering over the problem and we could unintentionally be hiding that uvloop's internal state is incorrect/broken in some way.
>
> Certainly can, but I don't think this papers over anything, just avoids the extraneous IndexError exception that itself may hide a uvloop issue.

Never mind, I was overthinking it. The exception will still get re-raised at the end, so I think this is OK to merge once we have test coverage.
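
For illustration, a toy sketch of that flow (hypothetical code, not the actual `start_connection` implementation): any `OSError` recorded from the failed close keeps the `exceptions` list non-empty, so the final re-raise still surfaces a real error.

```python
def connect_or_raise(attempt_errors: list[OSError]) -> None:
    """Hypothetical stand-in for start_connection's error aggregation."""
    exceptions: list[OSError] = []
    for exc in attempt_errors:
        # Every failure, including an OSError raised while closing the
        # socket, is appended rather than swallowed...
        exceptions.append(exc)
    if exceptions:
        # ...so the first recorded error is re-raised at the end instead
        # of an IndexError from indexing an empty list.
        raise exceptions[0]
```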

@bdraco changed the title from "Fix IndexError on failure to close socket" to "fix: handle OSError on failure to close socket instead of raising IndexError" on Nov 30, 2024
@bdraco (Member) commented Nov 30, 2024

Don't worry about the individual commit messages. I'll squash it before merging

@bdraco (Member) left a comment


Thanks @todddialpad

@bdraco merged commit c542f68 into aio-libs:main on Nov 30, 2024
27 of 28 checks passed