Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable keep-alive when working with proxy #3070

Merged
merged 2 commits into from
Jun 12, 2018
Merged

Conversation

asvetlov
Copy link
Member

Many HTTP proxies has buggy keepalive support.

Let's not reuse connection but close it after processing every response.

@socketpair
Copy link
Contributor

I disagree about that. This is performance issue. Could you provide exact cases where that problem reproduced ?

@asvetlov
Copy link
Member Author

The feature was disabled from the very beginning.
I've enabled it in version 3.3 but we have constant timeout issues on our production servers.

Would be nice to have opt-in flag but it requires another issue/PR (maybe an attribute in ProxyInfo dataclass).

@socketpair
Copy link
Contributor

Maybe bug in your production servers ? does not it ?

@asvetlov
Copy link
Member Author

No.
It is definitely a bug of our proxies (force_close for connector removes all timeout issues).
I recall the same problems 4 years ago (different project and proxy provider).
I suspect that buggy proxies are pretty common.

@codecov-io
Copy link

Codecov Report

Merging #3070 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3070      +/-   ##
==========================================
+ Coverage   98.11%   98.11%   +<.01%     
==========================================
  Files          42       42              
  Lines        7747     7750       +3     
  Branches     1349     1349              
==========================================
+ Hits         7601     7604       +3     
  Misses         51       51              
  Partials       95       95
Impacted Files Coverage Δ
aiohttp/client_proto.py 96% <100%> (+0.05%) ⬆️
aiohttp/connector.py 98.18% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3dfc123...3eba4b3. Read the comment docs.

@asvetlov asvetlov merged commit 8b9924f into master Jun 12, 2018
@asvetlov asvetlov deleted the force-close-proxy branch June 12, 2018 08:39
asvetlov added a commit that referenced this pull request Jun 12, 2018
* Disable keep-alive when working with proxy

* Add changenote
(cherry picked from commit 8b9924f)

Co-authored-by: Andrew Svetlov <andrew.svetlov@gmail.com>
asvetlov added a commit that referenced this pull request Jun 12, 2018
* Disable keep-alive when working with proxy

* Add changenote
(cherry picked from commit 8b9924f)

Co-authored-by: Andrew Svetlov <andrew.svetlov@gmail.com>
kornicameister pushed a commit to kornicameister/korni-stats-collector that referenced this pull request Jun 12, 2018
This PR updates [aiohttp](https://pypi.org/project/aiohttp) from **3.3.1** to **3.3.2**.



<details>
  <summary>Changelog</summary>
  
  
   ### 3.3.2
   ```
   ==================

- Many HTTP proxies has buggy keepalive support. Let&#39;s not reuse connection but
  close it after processing every response. (`3070 &lt;https://github.com/aio-libs/aiohttp/pull/3070&gt;`_)

- Provide vendor source files in tarball (`3076 &lt;https://github.com/aio-libs/aiohttp/pull/3076&gt;`_)
   ```
   
  
</details>


 

<details>
  <summary>Links</summary>
  
  - PyPI: https://pypi.org/project/aiohttp
  - Changelog: https://pyup.io/changelogs/aiohttp/
  - Repo: https://github.com/aio-libs/aiohttp
</details>
shikhir-arora added a commit to gieseladev/discord.py that referenced this pull request Aug 28, 2018
==================

Features
--------

- Add type hints (`3049 &lt;https://github.com/aio-libs/aiohttp/pull/3049&gt;`_)
- Add ``raise_for_status`` request parameter (`3073 &lt;https://github.com/aio-libs/aiohttp/pull/3073&gt;`_)
- Add type hints to HTTP client (`3092 &lt;https://github.com/aio-libs/aiohttp/pull/3092&gt;`_)
- Minor server optimizations (`3095 &lt;https://github.com/aio-libs/aiohttp/pull/3095&gt;`_)
- Preserve the cause when `HTTPException` is raised from another exception. (`3096 &lt;https://github.com/aio-libs/aiohttp/pull/3096&gt;`_)
- Add `close_boundary` option in `MultipartWriter.write` method. Support streaming (`3104 &lt;https://github.com/aio-libs/aiohttp/pull/3104&gt;`_)
- Added a ``remove_slash`` option to the ``normalize_path_middleware`` factory. (`3173 &lt;https://github.com/aio-libs/aiohttp/pull/3173&gt;`_)
- The class `AbstractRouteDef` is importable from `aiohttp.web`. (`3183 &lt;https://github.com/aio-libs/aiohttp/pull/3183&gt;`_)


Bugfixes
--------

- Prevent double closing when client connection is released before the
last ``data_received()`` callback. (`3031 &lt;https://github.com/aio-libs/aiohttp/pull/3031&gt;`_)
- Make redirect with `normalize_path_middleware` work when using url encoded paths. (`3051 &lt;https://github.com/aio-libs/aiohttp/pull/3051&gt;`_)
- Postpone web task creation to connection establishment. (`3052 &lt;https://github.com/aio-libs/aiohttp/pull/3052&gt;`_)
- Fix ``sock_read`` timeout. (`3053 &lt;https://github.com/aio-libs/aiohttp/pull/3053&gt;`_)
- When using a server-request body as the `data=` argument of a client request, iterate over the content with `readany` instead of `readline` to avoid `Line too long` errors. (`3054 &lt;https://github.com/aio-libs/aiohttp/pull/3054&gt;`_)
- fix `UrlDispatcher` has no attribute `add_options`, add `web.options` (`3062 &lt;https://github.com/aio-libs/aiohttp/pull/3062&gt;`_)
- correct filename in content-disposition with multipart body (`3064 &lt;https://github.com/aio-libs/aiohttp/pull/3064&gt;`_)
- Many HTTP proxies has buggy keepalive support.
Let&#39;s not reuse connection but close it after processing every response. (`3070 &lt;https://github.com/aio-libs/aiohttp/pull/3070&gt;`_)
- raise 413 &quot;Payload Too Large&quot; rather than raising ValueError in request.post()
Add helpful debug message to 413 responses (`3087 &lt;https://github.com/aio-libs/aiohttp/pull/3087&gt;`_)
- Fix `StreamResponse` equality, now that they are `MutableMapping` objects. (`3100 &lt;https://github.com/aio-libs/aiohttp/pull/3100&gt;`_)
- Fix server request objects comparison (`3116 &lt;https://github.com/aio-libs/aiohttp/pull/3116&gt;`_)
- Do not hang on `206 Partial Content` response with `Content-Encoding: gzip` (`3123 &lt;https://github.com/aio-libs/aiohttp/pull/3123&gt;`_)
- Fix timeout precondition checkers (`3145 &lt;https://github.com/aio-libs/aiohttp/pull/3145&gt;`_)


Improved Documentation
----------------------

- Add a new FAQ entry that clarifies that you should not reuse response
objects in middleware functions. (`3020 &lt;https://github.com/aio-libs/aiohttp/pull/3020&gt;`_)
- Add FAQ section &quot;Why is creating a ClientSession outside of an event loop dangerous?&quot; (`3072 &lt;https://github.com/aio-libs/aiohttp/pull/3072&gt;`_)
- Fix link to Rambler (`3115 &lt;https://github.com/aio-libs/aiohttp/pull/3115&gt;`_)
- Fix TCPSite documentation on the Server Reference page. (`3146 &lt;https://github.com/aio-libs/aiohttp/pull/3146&gt;`_)
- Fix documentation build configuration file for Windows. (`3147 &lt;https://github.com/aio-libs/aiohttp/pull/3147&gt;`_)
- Remove no longer existing lingering_timeout parameter of Application.make_handler from documentation. (`3151 &lt;https://github.com/aio-libs/aiohttp/pull/3151&gt;`_)
- Mention that ``app.make_handler`` is deprecated, recommend to use runners
API instead. (`3157 &lt;https://github.com/aio-libs/aiohttp/pull/3157&gt;`_)


Deprecations and Removals
-------------------------

- Drop ``loop.current_task()`` from ``helpers.current_task()`` (`2826 &lt;https://github.com/aio-libs/aiohttp/pull/2826&gt;`_)
- Drop ``reader`` parameter from ``request.multipart()``. (`3090 &lt;https://github.com/aio-libs/aiohttp/pull/3090&gt;`_)
kornicameister pushed a commit to kornicameister/korni-stats-collector that referenced this pull request Sep 17, 2018
This PR updates [aiohttp](https://pypi.org/project/aiohttp) from **3.3.2** to **3.4.4**.



<details>
  <summary>Changelog</summary>
  
  
   ### 3.4.4
   ```
   ==================

- Fix installation from sources when compiling toolkit is not available (`3241 &lt;https://github.com/aio-libs/aiohttp/pull/3241&gt;`_)
   ```
   
  
  
   ### 3.4.3
   ```
   ==================

- Add ``app.pre_frozen`` state to properly handle startup signals in sub-applications. (`3237 &lt;https://github.com/aio-libs/aiohttp/pull/3237&gt;`_)
   ```
   
  
  
   ### 3.4.2
   ```
   ==================

- Fix ``iter_chunks`` type annotation (`3230 &lt;https://github.com/aio-libs/aiohttp/pull/3230&gt;`_)
   ```
   
  
  
   ### 3.4.1
   ```
   ==================

- Fix empty header parsing regression. (`3218 &lt;https://github.com/aio-libs/aiohttp/pull/3218&gt;`_)
- Fix BaseRequest.raw_headers doc. (`3215 &lt;https://github.com/aio-libs/aiohttp/pull/3215&gt;`_)
- Fix documentation building on ReadTheDocs (`3221 &lt;https://github.com/aio-libs/aiohttp/pull/3221&gt;`_)
   ```
   
  
  
   ### 3.4.0
   ```
   ==================

Features
--------

- Add type hints (`3049 &lt;https://github.com/aio-libs/aiohttp/pull/3049&gt;`_)
- Add ``raise_for_status`` request parameter (`3073 &lt;https://github.com/aio-libs/aiohttp/pull/3073&gt;`_)
- Add type hints to HTTP client (`3092 &lt;https://github.com/aio-libs/aiohttp/pull/3092&gt;`_)
- Minor server optimizations (`3095 &lt;https://github.com/aio-libs/aiohttp/pull/3095&gt;`_)
- Preserve the cause when `HTTPException` is raised from another exception. (`3096 &lt;https://github.com/aio-libs/aiohttp/pull/3096&gt;`_)
- Add `close_boundary` option in `MultipartWriter.write` method. Support streaming (`3104 &lt;https://github.com/aio-libs/aiohttp/pull/3104&gt;`_)
- Added a ``remove_slash`` option to the ``normalize_path_middleware`` factory. (`3173 &lt;https://github.com/aio-libs/aiohttp/pull/3173&gt;`_)
- The class `AbstractRouteDef` is importable from `aiohttp.web`. (`3183 &lt;https://github.com/aio-libs/aiohttp/pull/3183&gt;`_)


Bugfixes
--------

- Prevent double closing when client connection is released before the
  last ``data_received()`` callback. (`3031 &lt;https://github.com/aio-libs/aiohttp/pull/3031&gt;`_)
- Make redirect with `normalize_path_middleware` work when using url encoded paths. (`3051 &lt;https://github.com/aio-libs/aiohttp/pull/3051&gt;`_)
- Postpone web task creation to connection establishment. (`3052 &lt;https://github.com/aio-libs/aiohttp/pull/3052&gt;`_)
- Fix ``sock_read`` timeout. (`3053 &lt;https://github.com/aio-libs/aiohttp/pull/3053&gt;`_)
- When using a server-request body as the `data=` argument of a client request, iterate over the content with `readany` instead of `readline` to avoid `Line too long` errors. (`3054 &lt;https://github.com/aio-libs/aiohttp/pull/3054&gt;`_)
- fix `UrlDispatcher` has no attribute `add_options`, add `web.options` (`3062 &lt;https://github.com/aio-libs/aiohttp/pull/3062&gt;`_)
- correct filename in content-disposition with multipart body (`3064 &lt;https://github.com/aio-libs/aiohttp/pull/3064&gt;`_)
- Many HTTP proxies has buggy keepalive support.
  Let&#39;s not reuse connection but close it after processing every response. (`3070 &lt;https://github.com/aio-libs/aiohttp/pull/3070&gt;`_)
- raise 413 &quot;Payload Too Large&quot; rather than raising ValueError in request.post()
  Add helpful debug message to 413 responses (`3087 &lt;https://github.com/aio-libs/aiohttp/pull/3087&gt;`_)
- Fix `StreamResponse` equality, now that they are `MutableMapping` objects. (`3100 &lt;https://github.com/aio-libs/aiohttp/pull/3100&gt;`_)
- Fix server request objects comparison (`3116 &lt;https://github.com/aio-libs/aiohttp/pull/3116&gt;`_)
- Do not hang on `206 Partial Content` response with `Content-Encoding: gzip` (`3123 &lt;https://github.com/aio-libs/aiohttp/pull/3123&gt;`_)
- Fix timeout precondition checkers (`3145 &lt;https://github.com/aio-libs/aiohttp/pull/3145&gt;`_)


Improved Documentation
----------------------

- Add a new FAQ entry that clarifies that you should not reuse response
  objects in middleware functions. (`3020 &lt;https://github.com/aio-libs/aiohttp/pull/3020&gt;`_)
- Add FAQ section &quot;Why is creating a ClientSession outside of an event loop dangerous?&quot; (`3072 &lt;https://github.com/aio-libs/aiohttp/pull/3072&gt;`_)
- Fix link to Rambler (`3115 &lt;https://github.com/aio-libs/aiohttp/pull/3115&gt;`_)
- Fix TCPSite documentation on the Server Reference page. (`3146 &lt;https://github.com/aio-libs/aiohttp/pull/3146&gt;`_)
- Fix documentation build configuration file for Windows. (`3147 &lt;https://github.com/aio-libs/aiohttp/pull/3147&gt;`_)
- Remove no longer existing lingering_timeout parameter of Application.make_handler from documentation. (`3151 &lt;https://github.com/aio-libs/aiohttp/pull/3151&gt;`_)
- Mention that ``app.make_handler`` is deprecated, recommend to use runners
  API instead. (`3157 &lt;https://github.com/aio-libs/aiohttp/pull/3157&gt;`_)


Deprecations and Removals
-------------------------

- Drop ``loop.current_task()`` from ``helpers.current_task()`` (`2826 &lt;https://github.com/aio-libs/aiohttp/pull/2826&gt;`_)
- Drop ``reader`` parameter from ``request.multipart()``. (`3090 &lt;https://github.com/aio-libs/aiohttp/pull/3090&gt;`_)
   ```
   
  
</details>


 

<details>
  <summary>Links</summary>
  
  - PyPI: https://pypi.org/project/aiohttp
  - Changelog: https://pyup.io/changelogs/aiohttp/
  - Repo: https://github.com/aio-libs/aiohttp
</details>
@@ -895,6 +895,11 @@ def _get_fingerprint(self, req):
transport, proto = await self._create_direct_connection(
proxy_req, [], timeout, client_error=ClientProxyConnectionError)

# Many HTTP proxies has buggy keepalive support. Let's not
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@asvetlov do you have exact references which proxies do not support keepalive correctly ?

Rejecting keepalive will enlarge latencies, and make problems with frequent requests due to TIME_WAIT and inability to reuse TCP ports.

@asvetlov
Copy link
Member Author

asvetlov commented Jan 5, 2019

I had a bad experience in my previous job.
We used paid proxy fleets from different companies to mask our crawler agents.
Sorry, cannot provide more info.

@webknjaz
Copy link
Member

webknjaz commented Jan 5, 2019

Why not have a switch for it, for users who want to risk it?

@socketpair
Copy link
Contributor

I vote for the switch, with default which ENABLES keep-alive.

@asvetlov
Copy link
Member Author

asvetlov commented Jan 5, 2019

PR is welcome!

@lock
Copy link

lock bot commented Jan 5, 2020

This thread has been automatically locked since there has not been
any recent activity after it was closed. Please open a new issue for
related bugs.

If you feel like there's important points made in this discussion,
please include those exceprts into that new issue.

@lock lock bot added the outdated label Jan 5, 2020
@lock lock bot locked as resolved and limited conversation to collaborators Jan 5, 2020
@psf-chronographer psf-chronographer bot added the bot:chronographer:provided There is a change note present in this PR label Jan 5, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bot:chronographer:provided There is a change note present in this PR outdated
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants