
Add line iterator to StreamingBody #1195

Merged · 5 commits · Jun 27, 2018

Conversation

@sujaymansingh (Contributor)

This way, we can read lines from a streaming body (without having to load
all the bytes into memory!)

This will allow us to use StreamingBody in places where a Python
file-like object is expected.
(E.g. csv.reader!)

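The intended use with csv.reader can be sketched like this (a minimal illustration: io.BytesIO stands in for a StreamingBody, and the decode step is wiring for the example, not part of the PR itself):

```python
import csv
import io

# Stand-in for a StreamingBody: any object whose iterator yields byte lines.
body = io.BytesIO(b"name,qty\napples,3\npears,5\n")

# Decode each byte line so csv.reader can consume it like a text file object.
lines = (line.decode("utf-8") for line in body)
rows = list(csv.reader(lines))
print(rows)  # [['name', 'qty'], ['apples', '3'], ['pears', '5']]
```

The point of the PR is that iterating the body yields one line at a time, so only a chunk's worth of data is ever held in memory.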
@codecov-io

codecov-io commented May 15, 2017

Codecov Report

Merging #1195 into develop will increase coverage by <.01%.
The diff coverage is 100%.


@@             Coverage Diff             @@
##           develop    #1195      +/-   ##
===========================================
+ Coverage    98.01%   98.02%   +<.01%     
===========================================
  Files           45       45              
  Lines         7306     7328      +22     
===========================================
+ Hits          7161     7183      +22     
  Misses         145      145
Impacted Files Coverage Δ
botocore/response.py 92.06% <100%> (+4.25%) ⬆️


Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e9cca4a...6866ab7. Read the comment docs.

  • This was not dealing well when the chunk ended on a newline.
  • There is no need for a StopIteration in a generator; we just don't yield anything more!
  • In particular, catch the case where a chunk ends on a line boundary. We can do this by ensuring that no matter what the chunk size is, we still end up with the same lines.
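The behaviour these commits describe can be sketched as a standalone generator (an illustrative re-implementation, not the PR's actual code): buffer any partial line across chunk boundaries so the output is identical for every chunk size, and simply stop yielding instead of raising StopIteration.

```python
def iter_lines(chunks):
    """Yield complete lines from an iterable of byte chunks."""
    pending = b''
    for chunk in chunks:
        lines = (pending + chunk).splitlines(True)  # keep line endings
        # The last element may be an incomplete line (including the case
        # where the previous chunk ended exactly on a newline): hold it
        # back unless it already ends with a newline.
        if lines and not lines[-1].endswith(b'\n'):
            pending = lines.pop()
        else:
            pending = b''
        for line in lines:
            yield line.rstrip(b'\n')
    if pending:
        yield pending  # final line without a trailing newline

data = b'alpha\nbeta\ngamma'
# Splitting the stream at any chunk size yields the same lines:
one = list(iter_lines([data]))
two = list(iter_lines([data[i:i + 2] for i in range(0, len(data), 2)]))
assert one == two == [b'alpha', b'beta', b'gamma']
```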
    default_chunk_size = 1024
    return self.iter_lines(default_chunk_size)

    def iter_lines(self, chunk_size):
Contributor

I think I would prefer to have this as a private method, normal file objects do not have an iter_lines method.

@sujaymansingh (Contributor Author) commented Jun 2, 2017

Thanks @dstufft. I can change it to _iter_lines. Is that ok with you?

Contributor

Yea that seems fine to me.

Contributor Author

@dstufft, I've made the change in @6866ab7
Hope it helps!

@dstufft (Contributor)

dstufft commented May 24, 2017

Overall this looks good to me, just one small comment.

@mbeacom

mbeacom commented Jun 15, 2017

Is this mergeable with the requested changes in? I think it'd be a great addition until the IOBase/RawIOBase possibilities are explored and tested further.

@mcrowson

Looks like the comment was addressed? There is a group of us really excited for this PR 💯

@mattrasband

mattrasband commented Sep 7, 2017

@dstufft Would it be best to use a bytearray here instead of creating a bunch of new bytes objects? For larger files (as common in s3) it seems like this could cause a bit of resource thrashing.

edit: From http://ze.phyr.us/bytearray/ here is a timeit and writeup on the mutable vs immutable sequences used for buffers:

In [1]: %%timeit x = b''
   ...: x += b'x'
100000 loops, best of 3: 3.02 µs per loop

In [2]: %%timeit x = bytearray()
   ...: x.extend(b'x')
10000000 loops, best of 3: 152 ns per loop
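The buffering pattern this benchmark motivates can be sketched as follows (illustrative only; the buffer names are hypothetical): append incoming chunks to a mutable bytearray rather than rebuilding an immutable bytes object on every iteration.

```python
chunks = [b'x' * 64] * 1000  # simulated stream chunks

# Immutable bytes: each += builds a brand-new object, copying everything
# accumulated so far (quadratic behaviour in the worst case).
buf_bytes = b''
for c in chunks:
    buf_bytes += c

# Mutable bytearray: extend() grows the same buffer in place
# (amortized O(1) per appended byte).
buf_array = bytearray()
for c in chunks:
    buf_array.extend(c)

assert bytes(buf_array) == buf_bytes  # identical contents either way
```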

@stealthycoin added the "enhancement" (This issue requests an improvement to a current feature.) and "medium" labels Sep 14, 2017
@sujaymansingh (Contributor Author)

@mattrasband @dstufft Is the bytearray performance the only concern?

@mattrasband

@sujaymansingh I don't have any authority to approve/merge - I stumbled on the issue as I wrote my own workaround a while ago and wanted to contribute it back (you already had). Eagerly awaiting this myself.

@ranman

ranman commented Feb 28, 2018

This would be super cool to have for all streaming response types

@sburns

sburns commented May 1, 2018

What will it take to get this merged?

@W-Ely

W-Ely commented May 4, 2018

Do it!

@ranman

ranman commented May 4, 2018

I would still love to see this!

@iolairus

iolairus commented Jun 5, 2018

Looking forward to this one, +1

@seddy

seddy commented Jun 5, 2018

Any chance on getting this merged? It would be really useful for us!

@smferguson

+1. it'd be great to have this in!

@billcrook

+1

@sujaymansingh (Contributor Author)

@dstufft @mbeacom What should we do about this? At this point it's been over a year since this PR was raised. Presumably there have been billions of changes upstream.

I could rebase or branch off again, but it's only worth doing that if

  • there is still a need/desire for this functionality from people
  • it's something that is likely to be merged (it'd be terrible to have to wait another year :))

Would you say it's worth doing so? If so, I can create a new branch and submit.
Otherwise, it's probably worth abandoning this PR.

@sburns

sburns commented Jun 18, 2018

there is still a need/desire for this functionality from people

I hope the continued +1 comments are evidence of this.

@sujaymansingh (Contributor Author)

Good point @sburns !

@jamesls (Member)

jamesls commented Jun 25, 2018

Hi everyone, sorry for the delay on this. I'm taking a look at it now; I'll assign it to myself and make sure it gets merged.

@jamesls self-assigned this Jun 25, 2018
    default_chunk_size = 1024
    return self._iter_lines(default_chunk_size)

    def _iter_lines(self, chunk_size):
Member

What are people's thoughts on exposing this method directly as iter_lines() instead of having the default behavior of __iter__ be line based?

This would be similar to what requests exposes via its iter_lines method. Looking at what they provide for __iter__, it appears to be chunks of 128 bytes. That's closer to how I'd expect a streaming response object to behave, though I'd bump up the chunks to maybe 1k or 4k.


+1 to the default iterator being larger chunks (4k, but that's just me) but also being able to get lines explicitly via iter_lines().
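The split being discussed here could look roughly like the following toy class (hypothetical names, not botocore's code): __iter__ delegates to fixed-size chunk iteration, while iter_lines() is the explicit opt-in for line-based iteration.

```python
import io


class FakeStreamingBody:
    """Toy illustration of the proposed interface (not botocore's code)."""

    DEFAULT_CHUNK_SIZE = 1024  # the "1k" suggested above

    def __init__(self, raw):
        self._raw = raw  # any file-like object with .read()

    def __iter__(self):
        # Default iteration yields fixed-size byte chunks.
        return self.iter_chunks(self.DEFAULT_CHUNK_SIZE)

    def iter_chunks(self, chunk_size):
        while True:
            chunk = self._raw.read(chunk_size)
            if not chunk:
                return
            yield chunk

    def iter_lines(self, chunk_size=DEFAULT_CHUNK_SIZE):
        # Line-based view: buffer partial lines across chunk boundaries.
        pending = b''
        for chunk in self.iter_chunks(chunk_size):
            lines = (pending + chunk).split(b'\n')
            pending = lines.pop()  # last piece may be incomplete
            for line in lines:
                yield line
        if pending:
            yield pending


body = FakeStreamingBody(io.BytesIO(b'a\nbb\nccc'))
assert list(body.iter_lines(chunk_size=2)) == [b'a', b'bb', b'ccc']
```

Whatever the defaults end up being, the key design point is that chunk iteration and line iteration are separate, explicitly named entry points.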

@jamesls (Member)

jamesls commented Jun 25, 2018

Given the age of the PR and that this was against your develop branch, I was hesitant to push to your fork so I pulled in your PR and incorporated some of the feedback I had. PR here: #1491

@sujaymansingh (Contributor Author)

That's a great idea @jamesls! Thanks 👍

@jamesls merged commit 6866ab7 into boto:develop Jun 27, 2018
@jamesls (Member)

jamesls commented Jun 27, 2018

Merged via #1491

@sujaymansingh (Contributor Author)

Thanks @jamesls

Labels: enhancement (This issue requests an improvement to a current feature.), incorporating-feedback