Allow setting `chunk_size` for `Response.iter_bytes()` etc... #394

sethmlarson · 2019-09-26T13:57:40Z

Requests allows setting chunk_size in .iter_content(), which is currently not an option for our alternatives, .stream() and .stream_text().

For .stream_text() we should go one step further and fix an issue users sometimes run into with this feature: chunk_size should measure the decoded text, not the raw bytes.
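To illustrate the difference between sizing raw bytes and sizing decoded text, here is a minimal sketch. It is not HTTPX code: `iter_text_chunks` is a hypothetical helper, and it assumes the byte stream is decoded incrementally with the standard library's `codecs` module before being cut into pieces of `chunk_size` characters.

```python
import codecs
from typing import Iterable, Iterator


def iter_text_chunks(
    byte_parts: Iterable[bytes], chunk_size: int, encoding: str = "utf-8"
) -> Iterator[str]:
    """Hypothetical helper: yield text in pieces of chunk_size characters.

    The size is measured on the decoded text, so a multi-byte character
    that straddles two byte parts is never split.
    """
    # The incremental decoder keeps incomplete byte sequences between calls.
    decoder = codecs.getincrementaldecoder(encoding)(errors="replace")
    pending = ""
    for part in byte_parts:
        pending += decoder.decode(part)
        while len(pending) >= chunk_size:
            yield pending[:chunk_size]
            pending = pending[chunk_size:]
    # Flush whatever the decoder still holds, then emit the remainder.
    pending += decoder.decode(b"", final=True)
    for start in range(0, len(pending), chunk_size):
        yield pending[start:start + chunk_size]


data = "héllo wörld".encode("utf-8")
# Split the byte stream in the middle of the two-byte "é".
parts = [data[:2], data[2:]]
print(list(iter_text_chunks(parts, chunk_size=4)))
```

Each yielded string has exactly `chunk_size` characters, except possibly the last one.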


tomchristie · 2019-09-27T09:44:31Z

I seem to remember this playing into some primitives around streaming bytes vs text that we never ended up digging into?

tomchristie · 2019-12-20T11:53:02Z

A good first pass at this would be to change the decoder interface slightly, so that instead of e.g. yielding a byte chunk, the decoders yield a list of byte chunks.

On the first refactoring pass, we don't need to actually change the internal implementation much - the decoders can just always yield a list with a single item.

We'd then be able to add a chunk_size argument to the decoders, which would return 0, 1, or many properly-sized chunks on each yield.
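A rough sketch of what such an interface could look like follows. The class name `ChunkingDecoder` and its `decode()`/`flush()` methods are assumptions made for illustration, not the actual HTTPX decoder API. The point is only that `decode()` returns a list, which may hold zero, one, or many chunks of exactly `chunk_size` bytes.

```python
from typing import List


class ChunkingDecoder:
    """Hypothetical decoder whose decode() returns a list of chunks."""

    def __init__(self, chunk_size: int) -> None:
        self._chunk_size = chunk_size
        self._buffer = bytearray()

    def decode(self, data: bytes) -> List[bytes]:
        # Accumulate input and cut off as many full-size chunks as possible.
        self._buffer += data
        chunks = []
        while len(self._buffer) >= self._chunk_size:
            chunks.append(bytes(self._buffer[: self._chunk_size]))
            del self._buffer[: self._chunk_size]
        return chunks

    def flush(self) -> List[bytes]:
        # Called once at the end of the stream to emit the short remainder.
        rest = bytes(self._buffer)
        self._buffer.clear()
        return [rest] if rest else []


decoder = ChunkingDecoder(chunk_size=4)
for incoming in (b"ab", b"cde", b"fghijklmn"):
    print(decoder.decode(incoming))
print(decoder.flush())
```

A caller would then loop over the returned list and yield each item, instead of yielding the return value of `decode()` directly.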

florimondmanca · 2019-12-21T16:15:51Z

Updated the issue title to reflect the current Response.aiter_* API :-) (see #610).

b0g3r · 2020-03-12T17:21:53Z

How could I help with this issue?

florimondmanca · 2020-03-12T17:39:13Z

Hi @b0g3r! I think this is still something we’d like to have, and given discussions in python-gitlab/python-gitlab#1036 it seems like some folks would like to see it too. :)

Ways to move forward would be:

Propose an API for this, with context on the existing API on Requests
Investigate implementation details (ie how are we going to split chunks: buffering, other? Looking at how Requests/urllib3 do this can help)
Draft a PR :)

b0g3r · 2020-03-13T11:10:25Z

Do I understand correctly that we will need to forward chunk_size here?

httpx/httpx/_dispatch/urllib3.py

Lines 112 to 115 in a82adcc

    
    def response_bytes() -> typing.Iterator[bytes]:
        with as_network_error(socket.error):
            for chunk in conn.stream(4096, decode_content=False):
                yield chunk

florimondmanca · 2020-03-13T12:20:29Z

@b0g3r Careful that we're in a sort of transition state w.r.t. urllib3 usage due to #804 (we'll soon use our own sync implementation, though keeping urllib3 as an option). Due to this I wouldn't advise relying on any existing urllib3 functionality — also because we'd want to provide chunk sizing on the async layer too, and it'd be odd to have a different implementation on both sides.

I think we want to look at controlling the chunk size directly from response.iter_bytes()/response.aiter_bytes(), instead…

tomchristie · 2020-03-13T16:38:45Z

@b0g3r So, as with the earlier comment on #394, the right place to start with this would be a pull request to https://github.com/encode/httpx/blob/master/httpx/_decoders.py that changes the interface of the decoders, so that they return a list of bytes rather than bytes.

(And correspondingly, changing the places where the response calls the decoder such as

httpx/httpx/_models.py

Line 915 in a82adcc

yield self.decoder.decode(chunk)

to deal with a list of bytes as a return result.)

I'd start with that as a foundational pull request, which will then make the remaining work much easier. (Adding chunk sizes to the decoder interface, and through to the response methods.)

b0g3r · 2020-03-13T16:59:38Z

(a)iter_raw(self, chunk_size=1)

chunk_size=1, because requests.Response.iter_content defaults to it.
Instead of

for part in self._raw_stream:
    yield part

let's use a bytestring as a buffer:

buffer = b""
for part in self._raw_stream:
    buffer += part
    while len(buffer) >= chunk_size:
        yield buffer[:chunk_size]
        buffer = buffer[chunk_size:]
if buffer:
    yield buffer
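For reference, the buffering fragment above can be wrapped into a standalone generator so it can be run and tested in isolation. This is only a sketch: `iter_fixed_size` is a hypothetical name, and `raw_stream` stands in for `self._raw_stream`. The guard on `chunk_size` is an addition, since with `chunk_size=0` the inner `while` loop would never terminate.

```python
from typing import Iterable, Iterator


def iter_fixed_size(raw_stream: Iterable[bytes], chunk_size: int) -> Iterator[bytes]:
    """Hypothetical helper: re-slice incoming parts into chunk_size pieces."""
    if chunk_size < 1:
        # Added guard: len(buffer) >= 0 is always true, so 0 would loop forever.
        raise ValueError("chunk_size must be a positive integer")
    buffer = b""
    for part in raw_stream:
        buffer += part
        # Emit every full-size chunk currently available.
        while len(buffer) >= chunk_size:
            yield buffer[:chunk_size]
            buffer = buffer[chunk_size:]
    # The final chunk may be shorter than chunk_size.
    if buffer:
        yield buffer


print(list(iter_fixed_size([b"abc", b"defgh", b"i"], chunk_size=4)))
```

Note that repeatedly slicing an immutable bytestring copies the remainder each time, so a `bytearray` may be preferable for large parts and small chunk sizes.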

(a)iter_bytes(self, chunk_size=ITER_CHUNK_SIZE)

chunk_size=ITER_CHUNK_SIZE (512) because requests has it 🌚
calls (a)iter_raw

(a)iter_text(self, chunk_size=ITER_CHUNK_SIZE)

calls (a)iter_bytes

(a)iter_line(self, chunk_size=ITER_CHUNK_SIZE)

calls (a)iter_text
the current code expects each chunk to contain full line(s), but that isn't true (UPD: found the code for splitting in LineDecoder)
requests has an elegant solution
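The concern about chunks not aligning with line boundaries can be illustrated with a small sketch. This is neither the Requests implementation nor HTTPX's LineDecoder: `split_lines` is a hypothetical helper, and it deliberately handles only "\n" as a separator (a real line decoder also has to deal with "\r\n" and "\r").

```python
from typing import Iterable, Iterator


def split_lines(text_chunks: Iterable[str]) -> Iterator[str]:
    """Hypothetical helper: yield full lines from arbitrarily sized text chunks."""
    pending = ""
    for chunk in text_chunks:
        pending += chunk
        # Everything before the last "\n" is complete; the tail is carried over.
        *complete, pending = pending.split("\n")
        for line in complete:
            yield line
    # A trailing line without a final newline is emitted at the end.
    if pending:
        yield pending


print(list(split_lines(["first li", "ne\nsecond\nthi", "rd"])))
```

Because the incomplete tail is kept between chunks, the chunk size used by the underlying text iterator does not affect which lines come out.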

b0g3r · 2020-03-13T17:11:23Z

@tomchristie As I see (a)iter_raw doesn't use any decoder 🤔

piersoh · 2020-05-21T14:14:11Z

It would be good to have a chunk_size=None option so that httpx can return chunks at the HTTP chunk boundaries, as the requests library does; this is useful for apps that require timely delivery.
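One way to picture the requested behaviour is the async sketch below. It is an illustration under assumptions, not the HTTPX implementation: `aiter_chunks`, `fake_stream`, and `collect` are hypothetical names, and `fake_stream` merely simulates parts arriving from the network. With `chunk_size=None` each part is passed through as soon as it arrives; otherwise parts are buffered and re-sliced.

```python
import asyncio
from typing import AsyncIterator, Iterable, List, Optional


async def aiter_chunks(
    raw_stream: AsyncIterator[bytes], chunk_size: Optional[int] = None
) -> AsyncIterator[bytes]:
    """Hypothetical helper: pass parts through, or re-slice them to chunk_size."""
    if chunk_size is None:
        # No buffering: deliver data as soon as it is received.
        async for part in raw_stream:
            yield part
        return
    buffer = bytearray()
    async for part in raw_stream:
        buffer += part
        while len(buffer) >= chunk_size:
            yield bytes(buffer[:chunk_size])
            del buffer[:chunk_size]
    if buffer:
        yield bytes(buffer)


async def fake_stream(parts: Iterable[bytes]) -> AsyncIterator[bytes]:
    # Stand-in for a network stream (assumption for the demo only).
    for part in parts:
        yield part


async def collect(stream: AsyncIterator[bytes]) -> List[bytes]:
    return [chunk async for chunk in stream]


print(asyncio.run(collect(aiter_chunks(fake_stream([b"abc", b"de"])))))
print(asyncio.run(collect(aiter_chunks(fake_stream([b"abc", b"de"]), chunk_size=2))))
```

The trade-off is that a fixed `chunk_size` can delay delivery until enough bytes have accumulated, which is what the pass-through mode avoids.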

sethmlarson added good first issue Good for newcomers requests-compat Issues related to Requests backwards compatibility labels Sep 26, 2019

sethmlarson mentioned this issue Sep 26, 2019

Why switch from HTTPX to Requests? PendragonLore/anido#1

Closed

sethmlarson removed the good first issue Good for newcomers label Sep 29, 2019

florimondmanca changed the title ~~Allow setting chunk_size for Response.stream() and Response.stream_text()~~ Allow setting chunk_size for Response.aiter_*() Dec 21, 2019

vishes-shell mentioned this issue Mar 5, 2020

Async and sync compatible wrapper python-gitlab/python-gitlab#1036

Draft

tomchristie added this to the v1.1 milestone Jul 30, 2020

tomchristie modified the milestones: v1.1, v1.0 Aug 7, 2020

tomchristie changed the title ~~Allow setting chunk_size for Response.aiter_*()~~ Allow setting chunk_size for Response.iter_bytes() etc... Aug 7, 2020

tomchristie changed the title ~~Allow setting chunk_size for Response.iter_bytes() etc...~~ Allow setting chunk_size for Response.iter_bytes() etc... Aug 7, 2020

cdeler added a commit to cdeler/httpx that referenced this issue Sep 9, 2020

Added chunk_size to Response.iter_bytes() and Response.aiter_bytes() (e…

2c74758

…ncode#394)

cdeler added a commit to cdeler/httpx that referenced this issue Sep 9, 2020

Added chunk_size to Response.iter_bytes() and Response.aiter_bytes() (e…

e2f38ba

…ncode#394)

cdeler mentioned this issue Sep 9, 2020

Added chunk_size to Response.iter_bytes() and Response.aiter_bytes() (#394) #1271

Closed

cdeler added a commit to cdeler/httpx that referenced this issue Sep 9, 2020

Added chunk_size to Response.iter_bytes() and Response.aiter_bytes() (e…

3aab30b

…ncode#394)

cdeler added a commit to cdeler/httpx that referenced this issue Sep 9, 2020

When chunk_size=None, Response.(a)iter_bytes returns data as it arriv…

04f11f7

…es in whatever size the chunks are received (encode#394)

cdeler added a commit to cdeler/httpx that referenced this issue Sep 9, 2020

Added chunk_size to Response.iter_bytes() and Response.aiter_bytes() (e…

20934d6

…ncode#394)

cdeler added a commit to cdeler/httpx that referenced this issue Sep 9, 2020

When chunk_size=None, Response.(a)iter_bytes returns data as it arriv…

b63a642

…es in whatever size the chunks are received (encode#394)

cdeler added a commit to cdeler/httpx that referenced this issue Sep 9, 2020

Added chunk_size to Response.iter_bytes() and Response.aiter_bytes() (e…

62f62bc

…ncode#394)

cdeler added a commit to cdeler/httpx that referenced this issue Sep 9, 2020

When chunk_size=None, Response.(a)iter_bytes returns data as it arriv…

aceaa56

…es in whatever size the chunks are received (encode#394)

cdeler added a commit to cdeler/httpx that referenced this issue Sep 10, 2020

Added chunk_size to Response.iter_bytes() and Response.aiter_bytes() (e…

eb3ea9e

…ncode#394)

cdeler added a commit to cdeler/httpx that referenced this issue Sep 10, 2020

When chunk_size=None, Response.(a)iter_bytes returns data as it arriv…

60ea230

…es in whatever size the chunks are received (encode#394)

tomchristie mentioned this issue Sep 22, 2020

Version 0.15.0 #1301

Merged

4 tasks

florimondmanca mentioned this issue Nov 16, 2020

Support for chunk_size #1277

Merged

3 tasks

tomchristie closed this as completed in #1277 Nov 25, 2020

trishankatdatadog mentioned this issue Dec 26, 2020

Fix slow retrieval defense mechanism trishankatdatadog/tuf-on-a-plane#20

Open
