Resume incomplete download #11180
if total_length is not None and bytes_received < total_length:
    if self._resume_incomplete:
        logger.critical(
            "Failed to download %s after %d resumption attempts.",
            link,
            self._resume_attempts,
        )
    else:
        logger.critical(
            "Failed to download %s."
            " Set --incomplete-downloads=resume to automatically"
            " resume incomplete download.",
            link,
        )
    os.remove(filepath)
    raise RuntimeError("Incomplete download")
Not quite sure what to do here. I don't think throwing an exception (L244) and printing a long (and useless) stack trace is user-friendly, but I can't think of a better alternative. Maybe we should just reuse the same log messages above so it's at least helpful?
I think throwing a (subclassed) DiagnosticPipError here might be a good idea? We can let the user know that:
- the download is incomplete
- the incomplete file has been cleaned up
- they can use --incomplete-downloads=resume to enable the feature if they haven't already
- they can modify the retry limit with --incomplete-download-retries.
pip/src/pip/_internal/exceptions.py
Lines 54 to 63 in e5898ab
class DiagnosticPipError(PipError):
    """An error, that presents diagnostic information to the user.

    This contains a bunch of logic, to enable pretty presentation of our error
    messages. Each error gets a unique reference. Each error can also include
    additional context, a hint and/or a note -- which are presented with the
    main error message in a consistent style.

    This is adapted from the error output styling in `sphinx-theme-builder`.
    """
Implemented the above in a separate commit: a082517.
I saw that a previous version also queries
I wonder (discussions needed!) if we should aggressively set the default to
As per RFC 7233 servers that don't support So to answer your question, checking
I think it might make more sense if we only have a single
Ah yes that makes sense (perhaps with a shorter option name though)
Suggestions are welcome :)
Just use
Could you elaborate? I don't think
I think it’s “just use
Ah I see. Our resume function is conceptually similar to On a side note, part of the problem should get fixed upstream soon (psf/requests#4956, psf/requests#6092), but that doesn't change the fact that pip has to make additional partial requests to resume.
Yeah that’s the problem, we can’t really use the same counter between connection retries and resumes. But I’m guessing it’s not that big a problem and we could intentionally implement the wrong behaviour and find a way to fix that later… For example for
Good point. But that's an easy change so I think we can wait a bit and see what others think.
I think the default 5 connection retries should include resuming, and that resuming should not start from 0 by default, but would not mind reusing the user's
Imo the download shouldn't resume by starting from 0. Let it resume using the already-partially-downloaded file.
The implementation in this PR resumes from the partially downloaded file when possible. There are cases where resuming is not possible, e.g. when the file has changed on the server after we started the download, and we have to start downloading from scratch again. Is that the behavior you want?
Oh right, sorry. I misunderstood exactly what this PR does, and from this new point of view I have nothing to add, as it sounds very useful to the many people who have poor or low bandwidth.
I was having problems not being able to download large files with pip because of my slow internet, but this helped so much. Thank you, it was really useful.
Overview
This PR adds a feature that resumes a download if the downloaded file is incomplete (e.g. when the Internet connection is poor). More specifically, if:
- the initial response includes a Content-Length header,
- fewer bytes are received than the Content-Length header specifies,
- the user passes --incomplete-downloads=resume,

the downloader will make new requests and attempt to resume the download using a Range header. If the initial response includes an ETag (preferred) or Date header, the downloader will ask the server to resume the download only when it is safe (i.e., the file hasn't changed since the initial request) using an If-Range header.

If the server responds with a 200 (e.g. if the server doesn't support partial content or can't check if the file has changed), the downloader will restart the download (i.e. start from the very first byte); if the server responds with a 206 Partial Content, the downloader will resume the download from the partially downloaded file.
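The header logic described above can be sketched roughly as follows. The function names `build_resume_headers` and `next_offset` are hypothetical, the initial response headers are assumed to be a plain dict with canonically cased keys, and details such as weak validators are ignored; this is an illustration of the description, not the PR's implementation.

```python
def build_resume_headers(bytes_received, initial_headers):
    """Build request headers that ask the server for the remaining bytes.

    Hypothetical helper: `initial_headers` is assumed to be a plain dict
    holding the headers of the initial response.
    """
    headers = {"Range": f"bytes={bytes_received}-"}
    # Prefer ETag over Date as the validator, as the description states.
    validator = initial_headers.get("ETag") or initial_headers.get("Date")
    if validator:
        # With If-Range the server sends the partial content only if the
        # file is unchanged; otherwise it sends the whole file with a 200.
        headers["If-Range"] = validator
    return headers


def next_offset(status_code, bytes_received):
    """Return the file offset at which the response body should be written."""
    if status_code == 206:
        return bytes_received  # Partial Content: append to what we have
    if status_code == 200:
        return 0  # full body: start again from the very first byte
    raise ValueError(f"unexpected status code: {status_code}")
```

For example, after receiving 1024 bytes of a file whose initial response carried an ETag of `"abc"`, the sketch produces `Range: bytes=1024-` together with `If-Range: "abc"`.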
Note that if the server always responds with 200, the downloader can potentially get stuck and waste unreasonable amounts of bandwidth downloading the first few bytes over and over again. Therefore, a retry limit is introduced to avoid this case.
If not enough bytes are received and auto resumption is disabled or the retry limit is exceeded, the downloader will clean up the incomplete file and fail with an exception.
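To make the role of the retry limit concrete, here is a small self-contained simulation. Everything in it is hypothetical: `fetch` stands in for an HTTP request and returns a status code plus the bytes that arrived before the connection dropped, and the two fake servers exist only to exercise the loop.

```python
PAYLOAD = b"0123456789"


def flaky_range_server(offset):
    """Fake server that honours ranges but drops after 4 bytes."""
    return (206 if offset else 200), PAYLOAD[offset:offset + 4]


def no_range_server(offset):
    """Fake server that ignores ranges and always drops after 4 bytes."""
    return 200, PAYLOAD[:4]


def download_with_resume(fetch, total_length, max_retries=5):
    """Download via `fetch(offset)`, resuming up to `max_retries` times."""
    _, chunk = fetch(0)  # initial request
    data = bytearray(chunk)
    attempts = 0
    while len(data) < total_length and attempts < max_retries:
        attempts += 1
        status, chunk = fetch(len(data))
        if status == 200:
            # The server restarted from the first byte: discard old data.
            data = bytearray()
        data += chunk
    if len(data) < total_length:
        # Retry limit exceeded; a real downloader would also delete the file.
        raise RuntimeError("Incomplete download")
    return bytes(data), attempts
```

With `flaky_range_server` the loop completes after two resumption attempts, while with `no_range_server` it keeps re-downloading the same first four bytes and gives up once the limit of five is reached, which is the situation the retry limit is meant to bound.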
Flags
To control the auto resumption behavior, two new flags are added:
- --incomplete-downloads=resume/discard controls whether the auto resumption feature is enabled (defaults to discard);
- --incomplete-download-retries limits the maximum number of retries (defaults to 5).

Towards #4796
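For reference, the two flags described under Flags, with the defaults stated there, could be modelled as in the sketch below. It uses `argparse` only so that it runs on its own; this is an assumption made for illustration and is not how pip declares its command-line options.

```python
import argparse


def build_parser():
    """Hypothetical parser that mirrors the two flags and their defaults."""
    parser = argparse.ArgumentParser(prog="downloader-sketch", allow_abbrev=False)
    parser.add_argument(
        "--incomplete-downloads",
        choices=["resume", "discard"],
        default="discard",  # auto resumption is off unless requested
    )
    parser.add_argument(
        "--incomplete-download-retries",
        type=int,
        default=5,  # maximum number of resumption attempts
    )
    return parser


args = build_parser().parse_args(["--incomplete-downloads=resume"])
print(args.incomplete_downloads, args.incomplete_download_retries)
```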