
Expose the response's content buffering limit for lines longer than 128KB #4456

Closed
wants to merge 1 commit

Conversation

@nolar commented Dec 19, 2019

What do these changes do?

Solve the problem with "Line is too long" for JSON-lines streaming responses (Kubernetes API and similar), as described in #4453.

Are there changes in behavior for the user?

An extra optional kwarg is exposed for the requests, which affects the response's content buffering:

        response = await session.get('http://xyz/path', limit=1*1024*1024)

The value of 0 means no limit — which can affect the memory footprint:

        response = await session.get('http://xyz/path', limit=0)

The default is 2**16 (64 KB), as before, which yields an effective buffer size limit of 128 KB (the high-watermark is 2x the limit itself).

PR notes

I have doubts on the following topics — it would be nice to clarify them:

  • Cython & .pyx files — I have no experience with these; I just changed the code following the surrounding style.
  • Should it be named limit= for the users? Or maybe response_buffer_limit=?
  • What's the purpose of the low-watermark, if the actual memory consumed can go up to 2x of that limit, rather than the limit itself as users would expect?
  • Monkey-patching of DEFAULT_LIMIT (if anyone used it) will now break, since the value is imported by-value into multiple modules. I usually use Google-style imports (from pkg import mod; mod.VAL), which cooperate well with monkey-patching, but I didn't find such cases in aiohttp, so I'm not sure this would match the code style.
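To illustrate the by-value vs. by-module import distinction mentioned in the last point, here is a self-contained sketch using a stand-in module (the names below are hypothetical, not aiohttp's):

```python
import types

# A tiny stand-in module, purely for illustration (not aiohttp code).
defaults = types.ModuleType("defaults")
defaults.DEFAULT_LIMIT = 2 ** 16

# By-value import: the constant is copied once, at "import" time,
# so later monkey-patching of the module attribute is invisible here.
DEFAULT_LIMIT = defaults.DEFAULT_LIMIT

def limit_by_value():
    return DEFAULT_LIMIT           # frozen copy; ignores later patches

# By-module ("Google-style") import: the attribute is looked up at
# call time, so monkey-patching the module is observed.
def limit_by_module():
    return defaults.DEFAULT_LIMIT  # live lookup; sees patches

# "Monkey-patch" the module attribute, as a user might do.
defaults.DEFAULT_LIMIT = 0
```

After the patch, `limit_by_value()` still returns the original 2**16 while `limit_by_module()` returns 0 — which is why by-value imports spread across multiple modules break this pattern.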

The code was developed and manually tested against v3.6.2. The master-based version is just a rebase, on the assumption that it works. I could not test the master/v4.0.0a1 code, as it breaks aresponses completely, and the example uses aresponses to run a minimal web server locally.

The tests (make test) on v3.6.2 & v4.0.0a1 fail with ValueError: option names {'--aiohttp-fast'} already added — some help would be needed to get them running.

Related issue number

#4453

Checklist

  • I think the code is well written
  • Unit tests for the changes exist
  • Documentation reflects the changes
  • If you provide code modification, please add yourself to CONTRIBUTORS.txt
    • The format is <Name> <Surname>.
    • Please keep alphabetical order, the file is sorted by names.
  • Add a new news fragment into the CHANGES folder
    • name it <issue_id>.<type> for example (588.bugfix)
    • if you don't have an issue_id change it to the pr id after creating the pr
    • ensure type is one of the following:
      • .feature: Signifying a new feature.
      • .bugfix: Signifying a bug fix.
      • .doc: Signifying a documentation improvement.
      • .removal: Signifying a deprecation or removal of public API.
      • .misc: A ticket has been closed, but it is not of interest to users.
    • Make sure to use full sentences with correct case and punctuation, for example: "Fix issue with non-ascii contents in doctest text files."

@codecov-io commented Dec 19, 2019

Codecov Report

Merging #4456 into master will increase coverage by 0.03%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master    #4456      +/-   ##
==========================================
+ Coverage   97.48%   97.51%   +0.03%     
==========================================
  Files          43       43              
  Lines        8865     8866       +1     
  Branches     1390     1390              
==========================================
+ Hits         8642     8646       +4     
+ Misses         98       96       -2     
+ Partials      125      124       -1
Impacted Files Coverage Δ
aiohttp/streams.py 97.4% <100%> (ø) ⬆️
aiohttp/client_proto.py 96.66% <100%> (ø) ⬆️
aiohttp/http_parser.py 96.88% <100%> (ø) ⬆️
aiohttp/pytest_plugin.py 97.27% <0%> (+2.04%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@webknjaz (Member):
Thanks for the PR! Let's see what Andrew thinks.

@asvetlov (Member):

I think that for exceptional cases like yours you can always read the whole text and split it into lines explicitly if needed.
Passing the line size limit through the whole API is cumbersome; it is done for very little effect and helps too few people.

@nolar (Author) commented Jan 24, 2020

That's the point: I cannot read the whole text. This is a stream of JSON lines, one line per event, emitted as soon as the events happen. The stream never ends and the connection never disconnects (in theory; in practice, there are timeouts and unstable networks).

So I never know in advance how large the response will be, and I must handle the incoming lines immediately, while the connection can remain alive but contentless for hours afterwards.

Currently, I have implemented this with my own line reader (https://github.com/zalando-incubator/kopf/pull/276/files), which is fine for me, but it looks like the job of an HTTP client library: streaming JSON lines is not such a rare use-case, I presume. It is, however, for you to decide whether it should be part of the library.
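For reference, the buffering logic of such a line reader can be sketched as a pure chunk-reassembly loop. This is a simplified illustration, not the actual kopf code; in real use the chunks would come from an async iterator such as aiohttp's `resp.content.iter_chunked(...)`, so the loop would be `async for`:

```python
import json

def iter_jsonlines(chunks):
    """Reassemble newline-delimited JSON objects from arbitrary byte chunks.

    A chunk boundary may fall anywhere, including mid-line, so partial
    lines are buffered until their terminating newline arrives.
    """
    buffer = b''
    for chunk in chunks:
        buffer += chunk
        # Emit every complete line currently in the buffer.
        while b'\n' in buffer:
            line, buffer = buffer.split(b'\n', 1)
            if line.strip():
                yield json.loads(line)
    # A trailing line without a final newline (stream ended mid-object
    # or without a terminator) is emitted last.
    if buffer.strip():
        yield json.loads(buffer)
```

The buffer here grows without bound for a pathologically long line, which is exactly the trade-off the `limit=0` option above would expose at the library level.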

@asvetlov (Member):
Your solution is fine; it is only 20 lines of simple code.
Adding the feature to aiohttp makes the source much more complex and harder to understand, at least with the current design of streams.

@nolar (Author) commented Jan 31, 2020

Okay. Then, it makes sense to close this PR. Thank you for your attention.

@nolar nolar closed this Jan 31, 2020
@nolar nolar deleted the 4453-line-is-too-long branch January 31, 2020 12:05