Content-Length is not checked, resulting in short reads with no error #4956
Comments
I should also add that I'm happy to port over #3563, but I'm unclear on why it was reverted or whether the approach applies to requests 2.x.
I have realized it wasn't reverted; I misread the PR that I thought reverted it.
I can confirm that Requests ignores a Content-Length header that does not match the body. For a simple demonstration, first install Flask, copy the following code into a file, and run it:
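The snippet from the original comment is not preserved in this thread. As a stand-in, here is a minimal sketch of a server with the same behaviour, written with Python's standard library rather than Flask. The names `BODY`, `ShortBodyHandler`, and `make_server` are made up for illustration and are not from the original comment. It answers GET /test with a 14-byte body while claiming a Content-Length of 10000, then closes the connection.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical 14-byte payload; the original comment's body text is unknown.
BODY = b"14 bytes long!"


class ShortBodyHandler(BaseHTTPRequestHandler):
    """Serves /test with a Content-Length far larger than the real body."""

    def do_GET(self):
        if self.path != "/test":
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        # Deliberately wrong: the body below is only len(BODY) == 14 bytes.
        self.send_header("Content-Length", "10000")
        self.end_headers()
        self.wfile.write(BODY)
        # The handler speaks HTTP/1.0 by default, so the connection is
        # closed after this method returns, cutting the response short.

    def log_message(self, format, *args):
        # Keep the console quiet; remove this override to see request logs.
        pass


def make_server(port=5000):
    """Build (but do not start) the misbehaving server on localhost."""
    return HTTPServer(("127.0.0.1", port), ShortBodyHandler)
```

Adding `make_server().serve_forever()` as the last line of the file and running it should start the server at http://localhost:5000/test.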
You're now running a server on localhost that, when you make a request to http://localhost:5000/test, returns a response with a 14-byte-long body but with a Content-Length header claiming that the body contains 10000 bytes. Now observe how different tools handle making a request to that endpoint:
Used in the most straightforward possible way:
Whoops. So, why did this happen? Requests is built on top of The answer is that there are possible ways to use both How
It looks like the audit trail was followed for the most part here. We did merge this into Requests in #3563, but it's in a separate branch that was intended for 3.0. Adding the flag is a breaking change for the 2.x branch, so we're unable to resolve this until the next major version.
Just providing an update, I've opened urllib3/urllib3#2514 to change the default here in urllib3 2.0.
It's great that the next versions of both urllib3 and requests have the fixes. But those branches seem to have a very unpredictable timeline before they can be released. "Before PyCon 2020"? Any suggestions on how to work around this effectively for us mortals still on the released versions? Right now we are looking at wrapping every file downloaded with requests, using our own content length and checksum checking to be safe. That is a good idea anyway, but not always possible, since you don't always have this side-channel data. Considering the severity of this bug (silently corrupted files), is there any chance of getting this into the 2.x branch?
I ran into this problem today and found no fix in requests 2.x and no 3.x release to upgrade to. I found this blog article that offers a solution for 2.x. I'm still torn between treating JSON parsing errors the same as transport errors, putting the blog's quick fix into a custom session, or trying out httpx instead.
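For readers on requests 2.x, a check along those lines might look like the sketch below. It is not taken verbatim from the blog post, and the helper name `ensure_complete` is made up for illustration. It assumes it is called after the body has been fully consumed (the default when `stream=False`) and relies on `response.raw.tell()`, which on a urllib3 response reports the number of bytes pulled over the wire. It would give a false alarm for responses that legitimately carry no body despite a Content-Length header, such as replies to HEAD requests, so those need to be skipped by the caller.

```python
def ensure_complete(response):
    """Raise IOError if fewer bytes arrived than Content-Length promised.

    `response` is expected to look like a requests.Response whose body has
    already been read: it needs `.headers` and `.raw.tell()`.
    """
    expected = response.headers.get("Content-Length")
    if expected is None:
        # Nothing to compare against (for example, chunked transfer encoding).
        return response

    # Bytes actually read from the connection, before any content decoding.
    received = response.raw.tell()
    if received < int(expected):
        raise IOError(
            f"incomplete read: got {received} of {expected} bytes"
        )
    return response
```

With requests installed, usage would be roughly `ensure_complete(requests.get(url))`, for instance inside a custom session wrapper.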
I guess the release of urllib3 2.0 fixes this, via urllib3/urllib3#2514? (I'm slightly reluctant to mention this, because if you're happy to let it be fixed in a minor release via urllib3... then it seems to me that you might just as well have fixed it directly in requests long ago. So I'm worried you're now going to feel the need to break it again, for compatibility. Please don't!)
@dimbleby, yes, it is fixed in urllib3 2. We're aware of the change as we made it :) Users are free to upgrade to 2.0, as we made no mandated changes in Requests, which is why this is acceptable within a minor version. Requests is still compatible with 1.26.x, as well as 2.x, which includes this change. I'll close this out now since there's a path forward for users with this issue.
that caused incomplete downloads without errors when the transfer was interrupted. See psf/requests#4956 and https://github.com/urllib3/urllib3/pull/2514/files for details.
Currently, requests does not detect when the length of a response body does not match the Content-Length header. This behavior results in undetected failures when a TCP connection is closed early (for example) and is contrary to the behavior of other tools (e.g. curl) and libraries (e.g. reqwest), which fail if a body length does not match Content-Length [3]. It is also contrary to the HTTP 1.1 RFC [5]. Frustration with this behavior is well documented ([1][2][3][4]) and was first reported as a bug 5 years ago [5]. In [1] it was agreed that the underlying checking should be done in urllib3. This was implemented in urllib3 v1.17 with the enforce_content_length setting [6]. However, the setting defaults to False. At the time this was implemented in urllib3, there was a merge on requests:proposed/3.0.0 [7] that set enforce_content_length to True so that short reads would be detected (it was later reverted; edit: it wasn't reverted), but no such merge happened on the requests 2.x branch.
Requests master now requires urllib3>=1.21, so we are guaranteed to have enforce_content_length. There is a strong argument to be made that enforce_content_length=True should be the default, but at the very least it should be possible for a user to opt into this.
[1] #2833
[2] #2275
[3] https://blog.petrzemek.net/2018/04/22/on-incomplete-http-reads-and-the-requests-library-in-python/
[4] https://news.ycombinator.com/item?id=16896899
[5] #1855
[6] urllib3/urllib3#949
[7] #3563.
Expected Result
HTTP bodies that do not match the HTTP Content-Length header should be detected.
Actual Result
HTTP bodies that do not match the HTTP Content-Length header are not detected.
Reproduction Steps
#2833 (comment)
System Information