
Received "Response payload is not completed" when reading response #4581

Closed
pablogsal opened this issue Feb 17, 2020 · 89 comments
Labels: bug, needs-info (Issue is lacking sufficient information and will be closed if not provided)

@pablogsal

pablogsal commented Feb 17, 2020

🐞 Describe the bug

A ClientPayloadError: Response payload is not completed exception is raised when reading the response of a GET request from the GitHub REST API (this should not be especially relevant). The response seems correct and using curl succeeds all the time.

💡 To Reproduce

Do lots of requests against an endpoint of the GitHub API (it happens from time to time).
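
A minimal reproducer sketch of the above; the endpoint, headers and iteration count are illustrative placeholders, not taken from the original report:

import asyncio

import aiohttp

async def main() -> None:
    async with aiohttp.ClientSession(headers={"User-Agent": "aiohttp-repro"}) as session:
        for _ in range(1000):  # the failure only shows up occasionally
            async with session.get(
                "https://api.github.com/repos/aio-libs/aiohttp/labels"
            ) as resp:
                await resp.read()  # intermittently raises ClientPayloadError

asyncio.run(main())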

💡 Expected behavior
The response is correctly parsed.

📋 Logs/tracebacks

  File "/opt/lib/python3.8/site-packages/gidgethub/aiohttp.py", line 20, in _request
    return response.status, response.headers, await response.read()
  File "/opt/lib/python3.8/site-packages/aiohttp/client_reqrep.py", line 973, in read
    self._body = await self.content.read()
  File "/opt/lib/python3.8/site-packages/aiohttp/streams.py", line 358, in read
    block = await self.readany()
  File "/opt/lib/python3.8/site-packages/aiohttp/streams.py", line 380, in readany
    await self._wait('readany')
  File "/opt/lib/python3.8/site-packages/aiohttp/streams.py", line 296, in _wait
    await waiter
aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed

Some contextual information at the time of the exception at aiohttp/streams.py, line 358, in read:

blocks =  [b'[\n  {\n    "id": 2941521,\n    "node_id": "MDU6TGFiZWwyOTQxNTIx",\n    "url": "https://REDACTED_GITHUB_URL/repos/testorg/test-debian/labels/skip-issue",\n    "name": "skip-issue",\n    "color": "000000",\n    "default": false\n  }\n]\n']

n=-1

As you can see the blocks contain the whole payload (the full JSON) but aiohttp still complains about the payload not being completed.

📋 Your version of Python

$ python --version
Python 3.8.1

📋 Your version of the aiohttp/yarl/multidict distributions

$ python -m pip show aiohttp
Name: aiohttp
Version: 3.6.2
Summary: Async http client/server framework (asyncio)
Home-page: https://github.com/aio-libs/aiohttp
Author: Nikolay Kim
Author-email: fafhrd91@gmail.com
License: Apache 2
Location: .../versions/3.8-dev/envs/medusa/lib/python3.8/site-packages
Requires: async-timeout, attrs, chardet, multidict, yarl
Required-by: raven-aiohttp, pytest-aiohttp, aiojobs
$ python -m pip show multidict
Name: multidict
Version: 4.7.3
Summary: multidict implementation
Home-page: https://github.com/aio-libs/multidict
Author: Andrew Svetlov
Author-email: andrew.svetlov@gmail.com
License: Apache 2
Location: .../versions/3.8-dev/envs/medusa/lib/python3.8/site-packages
Requires:
Required-by: yarl, aiohttp
$ python -m pip show yarl
Name: yarl
Version: 1.4.2
Summary: Yet another URL library
Home-page: https://github.com/aio-libs/yarl/
Author: Andrew Svetlov
Author-email: andrew.svetlov@gmail.com
License: Apache 2
Location: /Users/pgalindo3/.pyenv/versions/3.8-dev/envs/medusa/lib/python3.8/site-packages
Requires: idna, multidict
Required-by: aiohttp

📋 Additional context

I suspect this has to do with this comment:

aiohttp/aiohttp/streams.py, lines 352 to 356 in 1740c99:

# TODO: should be `if` instead of `while`
# because waiter maybe triggered on chunk end,
# without feeding any data
while not self._buffer and not self._eof:
    await self._wait('read')

as this is where it is failing, and the situation seems to be the one described in the TODO.

@pablogsal pablogsal added the bug label Feb 17, 2020
@pablogsal
Author

CC: @socketpair @asvetlov as the TODO was added in 5c4cb82

@webknjaz
Member

I'm curious if it's somehow related to #4549.

@WH-2099
Contributor

WH-2099 commented Jul 20, 2020

I'm curious if it's somehow related to #4549.

I encountered these two problems at the same time. #4549 #4581
Python: 3.8.4
aiohttp: 3.6.2

I think there is some relationship between them.
And I think the core condition is frequent requests to a single domain name.

@WH-2099
Contributor

WH-2099 commented Jul 21, 2020

CC: @socketpair @asvetlov as the TODO was added in 5c4cb82

Changing the while to an if doesn't seem to help.

jogu added a commit to openid-certification/conformance-suite that referenced this issue Jun 8, 2021
Use python's asyncio / await / async to run modules within one test plan
in parallel if they don't have an alias, and to run plans in parallel
where they use no / different aliases.

This should speed up the CIBA tests and OIDC test modules where we use
dcr and hence can proceed without any alias. Unfortunately the CIBA
tests had to have parallelism disabled because Authlete's CIBA simulated
authentication device doesn't seem to cope with multiple parallel.

Reduces the time to run oidcc-basic-certification-test-plan from 313 seconds
to 63 seconds on my local machine - most of which comes from the test
that sleeps for 30 seconds before reusing an auth code.

There was some hassle with python modules/node; I tried switching away
from alpine for the reasons given here:

https://pythonspeed.com/articles/alpine-docker-python/

but ended up switching back because the version of node (10) in Debian
buster is ancient and doesn't support some of the crypto Fillip's
client needs.

We add a retrying http client, as it seems either the parallelisation
or something about the asyncio http client ends up giving weird http
errors - I think we're running into the same bug as here:

aio-libs/aiohttp#4581

Switching to https://www.python-httpx.org might be an option to avoid
that, but that's still in beta.

part of #783
@tildedave

tildedave commented Jun 23, 2021

Hi there, I've received this error too. My use case is a reverse-proxy in front of gunicorn. The other reports of this error on this repo appear to involve poorly behaved HTTP servers, which I think is a separate issue.

My code streams the proxy body back to upstream, using code roughly like:

response = web.StreamResponse(headers=response_headers, status=status)
await response.prepare(request)  # the response must be prepared before writing
async for chunk in gunicorn_response.content.iter_chunked(64 * 1_024):
    await response.write(chunk)
await response.write_eof()
return response

What seems to be happening is that gunicorn returns a Connection: close header and then there is a race condition between aiohttp reading the body and the connection being closed. If I get the data out of aiohttp in time, it works, but sometimes this stack trace is triggered.

Investigation:

  • Calling await gunicorn_response.content.wait_eof() prior to calling iter_chunked reduces the incidence of the error, but it still happens occasionally.
  • Setting the Connection: keep-alive header in the reverse proxy's request to gunicorn resolves this issue. This is an acceptable workaround for me (a sketch follows below).
  • Changing the while to an if in the source code, in conjunction with calling wait_eof() prior to iterating over the body, resolves the issue.
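
A sketch of the Connection: keep-alive workaround from the second bullet above, written as a standalone aiohttp proxy handler; the upstream address and handler wiring are assumptions, not taken from the original comment:

import aiohttp
from aiohttp import web

async def proxy(request: web.Request) -> web.StreamResponse:
    # Assumed upstream address for the gunicorn server; adjust as needed.
    upstream_url = "http://127.0.0.1:8000" + request.rel_url.path_qs
    async with aiohttp.ClientSession() as session:
        # Asking gunicorn to keep the connection alive avoids the race between
        # reading the body and the upstream closing the socket.
        async with session.get(upstream_url, headers={"Connection": "keep-alive"}) as upstream:
            response = web.StreamResponse(status=upstream.status)
            await response.prepare(request)
            async for chunk in upstream.content.iter_chunked(64 * 1_024):
                await response.write(chunk)
            await response.write_eof()
            return response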

@asvetlov asvetlov self-assigned this Dec 14, 2021
@thehesiod
Contributor

we have this happening between an aiohttp 3.8.1 client + server in production, trying to reproduce

@ghost

ghost commented Feb 10, 2022

I've just experienced the same issue via s3fs, downloading a few hundred files from S3.

Python: 3.7.12
aiohttp: 3.8.1
yarl: 1.7.2
multidict: 5.2.0

Traceback (most recent call last):
  File "dataset_io.py", line 336, in download
    filesystem.get(self._location, self._download_location, recursive=True)
  File "/.venv/lib/python3.7/site-packages/fsspec/asyn.py", line 91, in wrapper
    return sync(self.loop, func, *args, **kwargs)
  File "/.venv/lib/python3.7/site-packages/fsspec/asyn.py", line 71, in sync
    raise return_result
  File "/.venv/lib/python3.7/site-packages/fsspec/asyn.py", line 25, in _runner
    result[0] = await coro
  File "/.venv/lib/python3.7/site-packages/fsspec/asyn.py", line 529, in _get
    coros, batch_size=batch_size, callback=callback
  File "/.venv/lib/python3.7/site-packages/fsspec/asyn.py", line 249, in _run_coros_in_chunks
    await asyncio.gather(*chunk, return_exceptions=return_exceptions),
  File "/root/.pyenv/versions/3.7.12/lib/python3.7/asyncio/tasks.py", line 414, in wait_for
    return await fut
  File "/.venv/lib/python3.7/site-packages/s3fs/core.py", line 1002, in _get_file
    chunk = await body.read(2 ** 16)
  File "/.venv/lib/python3.7/site-packages/aiobotocore/response.py", line 53, in read
    chunk = await self.__wrapped__.read(amt if amt is not None else -1)
  File "/.venv/lib/python3.7/site-packages/aiohttp/streams.py", line 385, in read
    await self._wait("read")
  File "/.venv/lib/python3.7/site-packages/aiohttp/streams.py", line 304, in _wait
    await waiter
aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed

Sorry I don't have much more information here, other than the fact that this is occurring on Python 3.7, whereas #4549 has been described as occurring on Python 3.8 only.

@a-gn

a-gn commented May 12, 2022

I got this error on Python 3.9 with aiohttp 3.8.1 on Windows. I haven't been able to reproduce.

My server failed around the same time as the error. The backoff library re-ran the request, and I got the ClientPayloadError. Is this expected? Shouldn't losing the server raise ConnectionError instead?

@aralroca

I have the same issue with Python 3.10

@junbaibai0719

I've just experienced the same issue via s3fs, downloading a few hundred files from S3. [...]

I am also getting that error using iter_chunked when downloading files from S3

martinpitt added a commit to cockpit-project/console.dot that referenced this issue Sep 19, 2022
Instead, pass through the upstream Connection header, like we do with
Upgrade: already.

This helps to avoid aio-libs/aiohttp#4581
martinpitt added a commit to cockpit-project/console.dot that referenced this issue Sep 19, 2022
This is conceptually broken -- the browser should decide when it wants
to upgrade a connection to websocket, and we should just pass that
through (like we do with `Upgrade:` already). This is also what the real
3scale does, so more faithfully reproduces its behaviour.

This also helps to avoid aio-libs/aiohttp#4581
jelly pushed a commit to cockpit-project/console.dot that referenced this issue Sep 20, 2022
hail-ci-robot pushed a commit to hail-is/hail that referenced this issue May 11, 2024
…r retrying (#14545)

The treatment of `ClientPayloadError` as a sometimes transient error was
originally made in response to [an existing
issue](aio-libs/aiohttp#4581) in aiohttp that
can cause transient errors on the client that are difficult to
distinguish from a real broken server. What's in `main` matched exactly
on the error message, but that error message has [since
changed](aio-libs/aiohttp@dc38630)
to include more information, breaking our transient error handling. This
change relaxes the requirement of the error response string to fix
transient error handling for our current version of `aiohttp`.

I wish I had a better approach. `ClientPayloadError` can also be thrown
in the case of malformed data, so I am reticent to treat it as always
transient, but we could perhaps make it a `limited_retries_error` and
avoid inspecting the error message.
@jlarkin-mojo

jlarkin-mojo commented May 13, 2024

Sorry - was there an official fix for this? I am able to reproduce when utilizing aiohttp through aiobotocore.

Basically:

  File "path/Library/Caches/pypoetry/virtualenvs/parlay-pricing-service-OsJas3qw-py3.11/lib/python3.11/site-packages/de_common/aws/s3.py", line 229, in get_object
    body = await response["Body"].read()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "path/Library/Caches/pypoetry/virtualenvs/parlay-pricing-service-OsJas3qw-py3.11/lib/python3.11/site-packages/aiobotocore/response.py", line 56, in read
    chunk = await self.__wrapped__.content.read(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "path/Library/Caches/pypoetry/virtualenvs/parlay-pricing-service-OsJas3qw-py3.11/lib/python3.11/site-packages/aiohttp/streams.py", line 383, in read
    block = await self.readany()
            ^^^^^^^^^^^^^^^^^^^^
  File "path/Library/Caches/pypoetry/virtualenvs/parlay-pricing-service-OsJas3qw-py3.11/lib/python3.11/site-packages/aiohttp/streams.py", line 405, in readany
    await self._wait("readany")
  File "path/Library/Caches/pypoetry/virtualenvs/parlay-pricing-service-OsJas3qw-py3.11/lib/python3.11/site-packages/aiohttp/streams.py", line 312, in _wait
    await waiter
aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed: <ContentLengthError: 400, message='Not enough data for satisfy content length header.'>

Anyone have any suggestions or thoughts? I'm using aiohttp==3.9.5

@toffeegryphon

toffeegryphon commented May 31, 2024

I think I have a reliably reproducible example that is derived from @gtedesco-r7's example
https://gist.github.com/toffeegryphon/6a94a7883923f80c2259bbb297bb0d3b

Python 3.12
aiohttp 3.9.3

A ClientPayloadError(TransferEncodingError) is raised when the server closes the connection (or rather the client realizes the connection is closed) before the client reads the end-of-response 0 size chunk, and a ServerDisconnectedError is raised when the server closes the connection after the end-of-response chunk is read (specifically, when a TCP connector is reused). A 104 Peer Disconnected seems to be raised occasionally, I believe it happens when the client realizes the connection is closed while the response buffer is being read, but I haven't looked too closely at whether this is true or not.
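
A self-contained sketch of the first failure mode described above (not the linked gist): a toy server advertises chunked transfer-encoding but closes the socket before sending the terminating zero-length chunk, so the client raises ClientPayloadError. The local port is an assumption:

import asyncio

import aiohttp

async def broken_handler(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    # Consume the request headers, then send a chunked response that is cut
    # off before the terminating zero-length chunk.
    while await reader.readline() not in (b"\r\n", b"\n", b""):
        pass
    writer.write(
        b"HTTP/1.1 200 OK\r\n"
        b"Transfer-Encoding: chunked\r\n"
        b"\r\n"
        b"5\r\nhello\r\n"  # one chunk, but no final 0-length chunk
    )
    await writer.drain()
    writer.close()

async def main() -> None:
    server = await asyncio.start_server(broken_handler, "127.0.0.1", 8081)
    async with aiohttp.ClientSession() as session:
        try:
            async with session.get("http://127.0.0.1:8081/") as resp:
                await resp.read()
        except aiohttp.ClientPayloadError as exc:
            print("got expected error:", exc)
    server.close()
    await server.wait_closed()

asyncio.run(main())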

@andrewkruse

andrewkruse commented Jun 28, 2024

Was there traction on this (or one of the other incredibly similar errors)? After spending a while reading all sorts of random posts from folks, I suspect I'm getting some form of throttling that is causing things to get weird. I can't seem to get a better error message no matter how I try, probably because one of the libraries in the middle eats it or something.

I'm downloading a 124.5MB file from Google Cloud using the gcloud-aio library and fastapi with a streaming response. And I intermittently get this exception. My download takes roughly 4-5 seconds when working correctly. But if I try it a few times in a row (say like 5 or 6), it errors out instead of working.

aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed: <ContentLengthError: 400, message='Not enough data for satisfy content length header.'>. SSLError(1, '[SSL] record layer failure (_ssl.c:2580)')

I suspect this is a me problem in some manner, but I can't seem to figure out what issue I'm trying to fix.

Here's my fastapi endpoint:

# Imports assumed from context (fastapi + the gcloud-aio storage library);
# bucket and file are defined elsewhere in the application.
import aiohttp
from fastapi import Request
from fastapi.responses import StreamingResponse
from gcloud.aio.storage import Storage

async def download(request: Request) -> StreamingResponse:
    async def generate_response():
        # we need to make a new session inside this generator
        #  because the one on the request context is closed when we return
        #  the streaming response, and then it goes byebye and fails
        #  the streaming download out with a partial file error.
        async with aiohttp.ClientSession(raise_for_status=True) as blob_session:
            new_storage_client = Storage(session=blob_session)
            async with await new_storage_client.download_stream(bucket, file) as stream:
                yield await stream.read()
    return StreamingResponse(generate_response())

@toffeegryphon

For our situation, we've figured out that yeah, it was legitimately the connection being broken on the server side (i.e. no clear bug in aiohttp). We had Nginx as a load balancer/reverse proxy in front of our servers, and whenever a reload happened, the workers would get killed some time after, and any long-running requests would get forcefully terminated (and correctly reported as an error).

@Dreamsorcerer
Member

@webknjaz @bdraco Think one of you could try the reproducer again on master? I'm suspecting it might be fixed by the chunking issue I sorted out recently.

@Dreamsorcerer Dreamsorcerer added the needs-info Issue is lacking sufficient information and will be closed if not provided label Aug 22, 2024
@bdraco
Member

bdraco commented Aug 23, 2024

I can give it a shot this weekend when I'll have the ability to let it run for an extended period (assuming my reattempt at a long travel day tomorrow doesn't go as poorly as my attempt on Wednesday and I actually get to my destination)

@jedrazb

jedrazb commented Aug 23, 2024

Hey! I’m encountering a similar issue where the connection appears to be terminated on the server, resulting in a ClientPayloadError(TransferEncodingError) on the client side. This occurs rarely and is difficult to reproduce.

Would simply retrying the request on the client side be sufficient to handle this error, or would it be necessary to recreate the aiohttp session (or implement any special error-handling logic)? Will aiohttp attempt to reconnect to the server automatically upon a retry?

@Dreamsorcerer
Member

Dreamsorcerer commented Aug 23, 2024

The project you've referenced above appears to be using 3.10.3; the fix I mentioned above is in 3.10.4. If you're still experiencing problems on the latest release, you'll have to try and figure out a reproducer or some way to log the raw chunks. However, you should verify whether you are using the compiled C parser (e.g. wheels from pip) or the Python parser (which can be forced with the AIOHTTP_NO_EXTENSIONS=1 envvar) first.

If the server is sending malformed responses, then the behaviour appears correct, if you send another request to retry, then aiohttp should close the bad connection and create a new connection for the next request. If the responses are correct, then we'd need to find the cause of the issue here.
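
For reference, a minimal retry sketch along those lines; the helper name and attempt count are illustrative, and the same session is reused so aiohttp can discard the broken connection and open a fresh one on the next attempt:

import aiohttp

async def fetch_with_retry(session: aiohttp.ClientSession, url: str, attempts: int = 3) -> bytes:
    for attempt in range(attempts):
        try:
            async with session.get(url) as resp:
                return await resp.read()
        except aiohttp.ClientPayloadError:
            # aiohttp closes the bad connection; the next request gets a new one.
            if attempt == attempts - 1:
                raise
    raise RuntimeError("unreachable")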

@webknjaz
Member

@Dreamsorcerer I'm afraid I'm at capacity these days, so I'll defer to @bdraco for now.

@bdraco
Member

bdraco commented Aug 24, 2024

I started running the test client again. I'll leave it for a few hours and report back

while python3 4581_client.py; do :; done

@bdraco
Member

bdraco commented Aug 24, 2024

No longer reproducible on 3.10.5. I'd say this is fixed

@bdraco bdraco closed this as completed Aug 24, 2024
ntk148v added a commit to iosevka-webfonts/update-centre that referenced this issue Aug 26, 2024
@nihil-admirari

I'm getting this error with the following:

male_doctor = 'Arzt'     # works
FEMALE_DOCTOR = 'Ärztin' # THROWS

async with aiohttp.ClientSession(
    connector=aiohttp.TCPConnector(ttl_dns_cache=300, enable_cleanup_closed=True),
    cookie_jar=aiohttp.DummyCookieJar(),
    raise_for_status=True,
    timeout=aiohttp.ClientTimeout(sock_connect=10)
) as session:
    async with session.get('https://www.linguee.com/english-german/search',
                           params={'query': FEMALE_DOCTOR}) as rsp:
        txt = await rsp.text()

The server returns slightly different responses to successful and unsuccessful queries.

Arzt has Content-Length and no 'Transfer-Encoding': 'chunked':

<ClientResponse(https://www.linguee.com/english-german/search?query=Arzt) [200 None]>
<CIMultiDictProxy('Date': 'Thu, 03 Oct 2024 21:12:17 GMT', 'Content-Type': 'text/html; charset="iso-8859-15"', 'Content-Length': '24220', 'Cache-Control': 'public, max-age=86400', 'x-trace-id': '0f2ae5c129b94c428e4173bc79e1b87f', 'Set-Cookie': 'ForeignLang=DE; SameSite=Strict; Max-Age=63072000; Path=/', 'Vary': 'Accept-Encoding', 'Content-Encoding': 'br', 'Age': '5461', 'x-linguee-cache-status': 'HIT', 'x-cache-status': 'MISS', 'strict-transport-security': 'max-age=63072000; includeSubDomains; preload', 'server-timing': 'l7_lb_tls;dur=363, l7_lb_idle;dur=0, l7_lb_receive;dur=0, l7_lb_total;dur=364', 'Access-Control-Expose-Headers': 'Server-Timing, X-Trace-ID')>

Ärztin doesn't have Content-Length, but has 'Transfer-Encoding': 'chunked':

<ClientResponse(https://www.linguee.com/english-german/search?query=%C3%84rztin) [200 None]>
<CIMultiDictProxy('Date': 'Thu, 03 Oct 2024 21:17:26 GMT', 'Content-Type': 'text/html; charset="iso-8859-15"', 'Transfer-Encoding': 'chunked', 'Cache-Control': 'public, max-age=86400', 'x-trace-id': 'da7d20f6eba742e1b1d801e9ab56b688', 'Set-Cookie': 'ForeignLang=DE; SameSite=Strict; Max-Age=63072000; Path=/', 'Vary': 'Accept-Encoding', 'Content-Encoding': 'br', 'Age': '5769', 'x-linguee-cache-status': 'HIT', 'x-cache-status': 'MISS', 'strict-transport-security': 'max-age=63072000; includeSubDomains; preload', 'server-timing': 'l7_lb_tls;dur=440, l7_lb_idle;dur=0, l7_lb_receive;dur=0, l7_lb_total;dur=445', 'Access-Control-Expose-Headers': 'Server-Timing, X-Trace-ID')>

May be a server problem, but I have no control over it, and browser and curl manage to parse the response successfully.

I'm on Linux.

Python 3.12.3

Name: aiohttp
Version: 3.10.8

Name: multidict
Version: 6.1.0

Name: yarl
Version: 1.13.1

Stacktrace:

---------------------------------------------------------------------------
TransferEncodingError                     Traceback (most recent call last)
File /opt/venv/net/lib/python3.12/site-packages/aiohttp/client_proto.py:92, in ResponseHandler.connection_lost(self, exc)
     91 try:
---> 92     uncompleted = self._parser.feed_eof()
     93 except Exception as underlying_exc:

File /opt/venv/net/lib/python3.12/site-packages/aiohttp/_http_parser.pyx:513, in aiohttp._http_parser.HttpParser.feed_eof()

TransferEncodingError: 400, message:
  Not enough data for satisfy transfer length header.

The above exception was the direct cause of the following exception:

ClientPayloadError                        Traceback (most recent call last)
Cell In[29], line 8
      1 async with aiohttp.ClientSession(
      2         connector=aiohttp.TCPConnector(ttl_dns_cache=300, enable_cleanup_closed=True),
      3         cookie_jar=aiohttp.DummyCookieJar(),
      4         raise_for_status=True,
      5         timeout=aiohttp.ClientTimeout(sock_connect=10)
      6     ) as session:
      7     async with session.get('https://www.linguee.com/english-german/search', params={'query': 'Ärztin'}) as rsp:
----> 8         txt = await rsp.text()

File /opt/venv/net/lib/python3.12/site-packages/aiohttp/client_reqrep.py:1220, in ClientResponse.text(self, encoding, errors)
   1218 """Read response payload and decode."""
   1219 if self._body is None:
-> 1220     await self.read()
   1222 if encoding is None:
   1223     encoding = self.get_encoding()

File /opt/venv/net/lib/python3.12/site-packages/aiohttp/client_reqrep.py:1178, in ClientResponse.read(self)
   1176 if self._body is None:
   1177     try:
-> 1178         self._body = await self.content.read()
   1179         for trace in self._traces:
   1180             await trace.send_response_chunk_received(
   1181                 self.method, self.url, self._body
   1182             )

File /opt/venv/net/lib/python3.12/site-packages/aiohttp/streams.py:386, in StreamReader.read(self, n)
    384 blocks = []
    385 while True:
--> 386     block = await self.readany()
    387     if not block:
    388         break

File /opt/venv/net/lib/python3.12/site-packages/aiohttp/streams.py:408, in StreamReader.readany(self)
    404 # TODO: should be `if` instead of `while`
    405 # because waiter maybe triggered on chunk end,
    406 # without feeding any data
    407 while not self._buffer and not self._eof:
--> 408     await self._wait("readany")
    410 return self._read_nowait(-1)

File /opt/venv/net/lib/python3.12/site-packages/aiohttp/streams.py:315, in StreamReader._wait(self, func_name)
    313 try:
    314     with self._timer:
--> 315         await waiter
    316 finally:
    317     self._waiter = None

ClientPayloadError: Response payload is not completed: <TransferEncodingError: 400, message='Not enough data for satisfy transfer length header.'>

@Dreamsorcerer
Member

The server closes the connection without completing the message.
Browsers at least tend to render data as it's received, without worrying so much about whether they've received the full message or not.

If we wanted to support something like this, we'd need some way to signal to the user that the message is probably not complete, i.e. there is no way for us to know whether what we received is a valid message or only part of one (or, strictly speaking, it is only part of a message).

@Dreamsorcerer
Member

I don't see it working with curl though:

> curl 'https://www.linguee.com/english-german/search?query=%C3%84rztin'
Warning: Binary output can mess up your terminal. Use "--output -" to tell 
Warning: curl to output it to your terminal anyway, or consider "--output 
Warning: <FILE>" to save to a file.
> curl 'https://www.linguee.com/english-german/search?query=Ärztin'
<html><body><h1>400 Bad request</h1>
Your browser sent an invalid request.
</body></html>

@nihil-admirari

Warning: Binary output can mess up your terminal

Had the same response for both Arzt and Ärztin. When I said curl is working I meant that there is no difference between a throwing and a non-throwing response as far as curl is concerned. I haven't played with command line options, probably there was some way to decode the binary output.

Apparently they've fixed the server: I can't reproduce the problem anymore. curl also prints HTML to the console.

If we wanted to support something like this, we'd need some way to signal to the user that the message is probably not complete.

That would be appreciated; apparently the problem pops up from time to time. Maybe change ClientPayloadError: Response payload is not completed to a specific ClientIncompleteResponse: The server closed the connection without completing the message, and then make the partial response available? Someone could catch ClientIncompleteResponse and get whatever was received in rsp.text().

@Dreamsorcerer
Member

Had the same response for both Arzt and Ärztin. When I said curl is working I meant that there is no difference between a throwing and a non-throwing response as far as curl is concerned.

Huh, I didn't test curl with the other one. But, I was pretty sure that it was producing binary output because the message was incomplete. The binary output was the raw brotli encoded bytes, i.e. curl didn't attempt to decompress the response.

Maybe change ClientPayloadError: Response payload is not completed to a specific ClientIncompleteResponse: The server closed the connection without completing the message, and then make the partial response available? Someone could catch ClientIncompleteResponse and get whatever was received in rsp.text().

If you're using a .read() variant that loads the whole body into memory anyway, then that could possibly work. If someone wants to try and implement that, we'll review it. But, given it is literally invalid HTTP behaviour and very rarely encountered, I doubt any maintainers will volunteer to work on it.

@joy13975

joy13975 commented Nov 12, 2024

In case there are sad people like me who still encounter this problem.

If you are also using nginx as a reverse proxy and somewhere in the config you have a proxy_read_timeout that is quite short (in my case 3s), this can be the cause for a backend streaming API (mine: a FastAPI StreamingResponse from an LLM) that happens to always exceed 3s.

Raising this proxy_read_timeout well above the total streaming time solved the problem for me.
