Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add connect retries #1196

Closed
wants to merge 20 commits into from
Closed

Add connect retries #1196

wants to merge 20 commits into from

Conversation

florimondmanca
Copy link
Member

@florimondmanca florimondmanca commented Aug 19, 2020

Closes #1141

Adds off-by-default "retry for connect errors" behavior, used as httpx.Client(..., retries=<int>).

  • Makes sure that only ConnectError/ConnectTimeout are retried on, and that we only retry on idempotent HTTP verbs (i.e. exclude those that by definition should not be retried on because it could duplicate state changes on the server side).
  • httpx.request(<method>, connect_retries=<int>) is not included, since I'm not sure we want this knob in the high-level API.
  • The httpx.Client(retries=<int>) API is a good start to any future enhancements to retry functionality, such as an httpx.Retry class for encapsulating retries configuration, or providing a custom retries API, while preserving "retries=<int> retries on connect errors" for the most basic use cases.

Rendered docs preview:

Screenshot 2020-09-01 at 23 11 54

@tomchristie
Copy link
Member

Fab. 👍✨

Thoughts...

  • I guess we should probably use the inverse form instead of if request.method in ("POST", "PATCH"), eg. if request.method in IDEMPOTENT_METHODS with IDEMPOTENT_METHODS = ["GET", "HEAD", "PUT", "DELETE", "OPTIONS", "TRACE"], so that requests with nonstandard methods are not treated as idempotent.
  • Wondering if we ought to be keeping the API as narrow as possible, and initially only supporting retries: int = 0? Rather than exposing httpx.Retries() and backoff_factor. Pushing back on the API surface area as long as possible is generally a good idea.
  • It'd be good to review requests/urllib3 behaviour here, and just double check that we're matching up with requests' behaviour when installing an adapter with retries=<int>. Any working through of that would be great.

@tomchristie
Copy link
Member

The requests docs give...

max_retries – The maximum number of retries each connection should attempt. Note, this applies only to failed DNS lookups, socket connections and connection timeouts, never to requests where data has made it to the server. By default, Requests does not retry failed connections. If you need granular control over the conditions under which we retry a request, import urllib3’s Retry class and pass that instead.

However, it looks like if retries != 0 then Retry.from_int(...) is used...

https://github.com/psf/requests/blob/48237afd9d064c639b9c2bcea7e75ab7b717b181/requests/adapters.py#L116-L119

Which I think means you'll end up with read=None on the class (Not the same as read=False), and that read retries will be used... https://github.com/urllib3/urllib3/blob/f0c43419e45db17a3c5f287be80aa0d14e2f58c6/src/urllib3/util/retry.py#L243

So I think the actual behaviour that requests uses isn't quite as documented, and is actually...

  • ConnectError and ConnectTimeout always retry if retries=<positive int>.
  • ReadError retries if the method is idempotent and retries=<positive int>.

?

@florimondmanca
Copy link
Member Author

florimondmanca commented Aug 29, 2020

Based on #1141 (comment), I updated this PR to solely focus on adding support for httpx.Client(connect_retries=<int>).

@tomchristie Should be ready for a new round of reviews…

@florimondmanca florimondmanca requested a review from a team August 29, 2020 19:10
@florimondmanca florimondmanca changed the title Add retries Add connect retries Aug 29, 2020
@tomchristie tomchristie added the enhancement New feature or request label Sep 1, 2020
httpx/_client.py Outdated Show resolved Hide resolved
tomchristie
tomchristie previously approved these changes Sep 1, 2020
Copy link
Member

@tomchristie tomchristie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really nicely done.

I think this probably strikes the right balance for us at the moment.
One comment for us to think about around the parameter naming.

Plus I guess we'd want the docs in before we merge it?

But yeah I think this is pretty fab. 👍👍👍

@florimondmanca
Copy link
Member Author

@tomchristie Changed the connect_retries naming, and added some docs. :-) Leaving this open for a bit in case you or anyone else has any comments on the docs, or remaining comments on the implementation.

@tomchristie tomchristie mentioned this pull request Sep 2, 2020
httpx/_client.py Outdated Show resolved Hide resolved
@florimondmanca
Copy link
Member Author

@tomchristie Addressed latest feedback — we should be good to go on this now? 😄

@tomchristie
Copy link
Member

Okay, so I was leaving this on pause for a bit to try to figure out if connection retries really make sense on the client on at the transport layer, and I think we really do want this on the transport layer, wrapping as close to the connection opening as possible.

The reason here is that if we pull connection retries out to the client layer, then we can end up in a gnarly situation when there's high contention on the connection pool. With connection retries out at the client layer an incoming request with a retry will:

  • Wait to acquire the max connections semaphore.
  • Attempt a connection and fail.
  • Release the max connections semaphore.
  • Wait to acquire the max connections semaphore.
  • Attempt a connection and fail.
  • Release the max connections semaphore.

We'd really prefer a behaviour where once the semaphore is acquired, the connection will perform any connect+retry within that scope, so...

  • Wait to acquire the max connections semaphore.
  • Attempt a connection and fail.
  • Attempt a connection and fail.
  • Release the max connections semaphore.

This means much less thrashing when there's high contention.
Also supposing we have a timeout configuration like... httpx.Timeout(5.0, pool=60.0), and retries=2, then the maximum possible timeout is a more expected 60+5+5+5, rather than a possibly surprising 60+60+60+5+5+5.

Anyways, upshot of this is that I think? I'm keen on us pushing the implementation details of this into the transport layer. Possibly by passing retries as part of the __init__ configuration, and handling the retry behaviour within the Connection._open_socket method?... https://github.com/encode/httpcore/blob/92d3e9bbd3c442c6035860eea9e15a2249e0cad8/httpcore/_async/connection.py#L99

@florimondmanca
Copy link
Member Author

@tomchristie The motivations wrt timeouts and pooling are strong, indeed. Thoughts on my reservations related to testing here? #1141 (comment) On HTTPCore we can't just "swap out a transport", so I'm not sure how we'd test this w/o monkeypatch since it would be right into the socket opening logic…

@tomchristie
Copy link
Member

Thoughts on my reservations related to testing here? #1141 (comment) On HTTPCore we can't just "swap out a transport", so I'm not sure how we'd test this w/o monkeypatch since it would be right into the socket opening logic…

Ah really great point yup.

This is kinda the point at which we need to be digging into exposing backend=<BackendClass>, and having tests that run against mock backends, so that we can do stuff like testing connection retries, and all sorts of other in-detail bits.

@florimondmanca
Copy link
Member Author

Opened encode/httpcore#221 against HTTPCore, closing this one for now!

@florimondmanca florimondmanca deleted the fm/retries branch October 10, 2020 06:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Retries
2 participants