Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite urllib3 retry warnings + add 4 diagnostic connection errors #12818

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

ichard26
Copy link
Member

@ichard26 ichard26 commented Jul 2, 2024

Towards:

This PR does two things:

  • Captures the ugly Retrying (Retry(total=4 ... urllib3 warnings and rewrites them to be understandable if possible
  • Defines four new PipDiagnosticErrors for connection errors that are clearer and provide some basic hints to the user to aid debugging, which are raised on a best effort basis

While this patch isn't pretty and it's admittedly rather hacky (especially as we cannot upgrade to modern urllib3 until 2025 so upstream improvements are impossible), I'd say the usability benefits are worth it here. ✨

Examples

Show screenshots

Screenshot from 2024-07-15 14-06-36
Screenshot from 2024-07-15 14-07-42
Screenshot from 2024-07-15 14-10-48
Screenshot from 2024-07-15 14-13-31
Screenshot from 2024-07-15 14-18-31
Screenshot from 2024-07-15 14-19-40

@ichard26 ichard26 added C: error messages Improving error messages skip news Does not need a NEWS file entry (eg: trivial changes) labels Jul 2, 2024
@ichard26
Copy link
Member Author

ichard26 commented Jul 14, 2024

In an effort to gather some concrete information to push this PR and effort in general forward, I searched for "retrying" in the issue tracker and organised all of the relevant open and closed reports (stopping after the first 125 closed issues aka #7757). I was going to do all of them, but 2020 is plenty old enough and I've spent enough time on this already :)

New Connection Errors

SSL Errors

Timeout Errors

Protocol Errors

Proxy Errors

Not actually a retry related error, but seems to have good links to further discussion on pip + proxies: #9216

I need to go through these issues more carefully, but the takeaways for now are:

  • A lot of issues stem from proxies, ranging from a misconfigured proxy, bugs in our proxy stack, or a system-wide proxy that pip didn't pick up (due to impl. limitations). Yes, there is a proxy category above, but I'm willing to wager that a good chunk of the other reports ultimately relate to proxies (as they can meddle with the TCP/HTTPS connection in numerous ways that result in strange and varied errors)
  • While there has been already a ton of discussion on improving the reporting in this area, I'd say that the eventual write-up on these network issues should probably include a snippet that uses requests to access some PyPI page. I've seen in several threads that getting the reporter to try accessing PyPI from requests directly helped them narrow down the root issue (likely a proxy 🙂)

@pradyunsg
Copy link
Member

pradyunsg commented Jul 15, 2024

FWIW, One of the ideas I had in mind for the error reporting stack in pip is creating a page structure like pip.pypa.io/en/latest/errors/<diagnostic error code here> where we could elaborate on what the error means and what the user can do to explore fixing it.

It would also, let us use versionadded / versionchanged / versionremoved as well as HTTP redirects on the error reporting side of things as well.

Then, each of the diagnostic errors can get a page whose link is printed in some form of "go here to find out more about this thing".

@ichard26 ichard26 force-pushed the errors/network-connection branch from 3c4c844 to a1dac04 Compare July 15, 2024 17:52
@ichard26 ichard26 changed the title Rewrite urllib3 retry warnings and raise a diagnostic error on connection errors Rewrite urllib3 retry warnings + add 4 diagnostic connection errors Jul 15, 2024
@ichard26
Copy link
Member Author

ichard26 commented Jul 15, 2024

Yup! Setting up the error index on pip's documentation is on the road map. IMO it's the only feasible way for us to provide enough advice to end-users on how to resolve their network/connection issues.


Anyway, an update: I polished the code and new diagnostic messaging in light of the research I did yesterday. I also broke up the patch into two commits, one for each feature. I did try patching urllib3, but on further reflection, I don't want to violate our vendoring policy when it's not really necessary. Sure, it makes the warning filtering a bit more robust and clearer, but it's going to break either way the next time we bump urllib3 (although this entirely patch may break when we migrate to urllib 2.x ...).

All that's left is tests. In the meanwhile, I welcome any initial feedback on the implementation approach and new messaging!

@ichard26
Copy link
Member Author

Oh hmm, I forgot that the finder skips pages that result in a network error instead of bailing outright, hence the failing unit tests. It shouldn't be too bad to adapt its exception handling to handle the new errors, but I do wonder if there are other places where raising a new exception may cause behavioural changes. The rabbit hole continues...

Alright, so this is 100% a hack, but we don't really have any
alternatives for improving the truly user-hostile warnings that urllib3
emits when retrying after a connection error. Upstream improvements
aren't possible until we are able to catch up with urllib3 2.x, but
we're stuck on the 1.x series until at least 2025 due to OpenSSL
incompatibilties.

Thus, the approach taken here is to install a logging filter on the
connectionpool.py urllib3 module (don't worry, the hacks continue[^1])
which "sniffs" for the appropriate warnings and rewrites them as best as
possible with the limited context urllib3 gives us. Under verbose mode,
the original error is appended as a "here's all of the context, sorry
that it's unreadable" aid.

[^1]: I did have to resort to parsing the timeout error strings with
  regex in order to get the configured timeout (since logically they're
  set per request, not globally or per session... doing it this way
  results in less buggy behaviour and tests a lot less annoying to write).
@ichard26 ichard26 force-pushed the errors/network-connection branch 2 times, most recently from 9c34c00 to 0eb14c0 Compare July 26, 2024 01:16
ichard26 added 3 commits July 25, 2024 21:22
- ConnectionFailedError
- ConnectionTimeoutError
- SSLVerificationError
- ProxyConnectionError

The first one is the most generic of the bunch. It's the error that
is raised when is no better (or at least defined) error to raise. This
sucks, but unfortunately requests and in particular urllib3 errors are
terrible. They lack so much context that it is challenging to present
specific, actionable errors. For example, there's no way to deduce that
a urllib3 NewConnectionError is caused by DNS unless you inspect the
error chain (for socket.gaierror) or parse the raw error string (which
is too hacky for my taste).

The other three are a lot better, although they are admittedly still
rather vague. The SSLVerificationError can't discern whether it was
caused by an expired or self-signed certificate for example. To work
around this, I've opted to include the original error as additional
context (with some limited rewriting if possible).

Anyway, the main benefit here is that despite the coarseness, we're able
to provide at least some actionable advice through the diagnostic hints.
In addition, these errors are way more readable than whatever requests
or urllib3 spits out (often being caught, stringified and presented in
single-line screaming red by pip).
Frankly, this is essentially guaranteed to break the next time we
upgrade urllib3 (esp. as we're due for a major bump to the 2.x series),
but these unit tests serve to ensure the rewriting filters works *now*
and demonstrates in what situations it does work.
Also refactor two testing utilities:

- I moved the rendered() helper which converts a rich renderable (or str
  equivalent) into pure text from test_exceptions.py to the test
  library. I also enabled soft wrap to make it a bit nicer for testing
  single sentences renderables (although it may make sense to make it
  configurable later on).

- I extracted the "instant close" HTTP server into its own fixture so it
  could be easily reused for a connection error unit test as well.
@ichard26 ichard26 force-pushed the errors/network-connection branch from 0eb14c0 to 36ccb34 Compare July 26, 2024 01:23
@ichard26 ichard26 marked this pull request as ready for review July 26, 2024 01:58
@ichard26
Copy link
Member Author

Yay, CI is passing. This is finally ready for review 🎉

@ichard26
Copy link
Member Author

Hi, I'd like to see some initial feedback on this as the effort-vs-reward ratio is extremely favourable here. I'm scheduling this PR for the 25 release, but this is only a suggestion. Feel free to push this back as needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C: error messages Improving error messages skip news Does not need a NEWS file entry (eg: trivial changes)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants