Add line iterator to StreamingBody #1195
Conversation
This way, we can read lines from a streaming body without having to load all bytes into memory! This will allow us to use StreamingBody in places where a Python file-like object is expected (e.g. csv.reader!).
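For context, here is a minimal sketch of the kind of usage this enables, assuming the response body ends up exposing a line iterator. The bucket and key names, and the exact `iter_lines()` call shape, are illustrative assumptions rather than the merged API:

```python
import csv

import boto3

s3 = boto3.client('s3')
# 'example-bucket' and 'data.csv' are placeholders, purely for illustration.
response = s3.get_object(Bucket='example-bucket', Key='data.csv')
body = response['Body']  # a botocore StreamingBody

# Stream the object line by line instead of reading it all into memory,
# then hand the decoded lines to csv.reader, which accepts any iterable
# of text lines.
lines = (line.decode('utf-8') for line in body.iter_lines())
for row in csv.reader(lines):
    print(row)
```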
Codecov Report

```diff
@@            Coverage Diff            @@
##           develop    #1195    +/-   ##
==========================================
+ Coverage    98.01%   98.02%   +<.01%
==========================================
  Files           45       45
  Lines         7306     7328      +22
==========================================
+ Hits          7161     7183      +22
  Misses         145      145
```

Continue to review the full report at Codecov.
This was not dealing well with the case where the chunk ended on a newline.
There is no need to raise StopIteration in a generator; we just don't yield anything more!
In particular, catch the case where a chunk ends on a line boundary. We can do this by ensuring that, no matter what the chunk size is, we still end up with the same lines.
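To make the chunk-boundary handling concrete, here is a rough sketch of the technique the commit messages describe (not botocore's actual implementation): buffer whatever partial line is left at the end of each chunk and only yield complete lines, so the output is identical regardless of chunk size.

```python
import io


def iter_lines_from(read, chunk_size=1024):
    """Yield complete lines (as bytes) from a read(n) callable, carrying
    partial lines across chunk boundaries."""
    pending = b''
    while True:
        chunk = read(chunk_size)
        if not chunk:
            break
        lines = (pending + chunk).splitlines()
        if chunk.endswith(b'\n'):
            # The chunk ended exactly on a line boundary: every line is complete.
            pending = b''
        else:
            # Hold back the trailing partial line until more data arrives.
            pending = lines.pop()
        for line in lines:
            yield line
    if pending:
        yield pending


# The property the tests rely on: the chunk size must not affect the result.
data = b'first\nsecond\nthird\n'
for size in (1, 2, 3, 1024):
    stream = io.BytesIO(data)
    assert list(iter_lines_from(stream.read, size)) == [b'first', b'second', b'third']
```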
botocore/response.py
Outdated
```python
        default_chunk_size = 1024
        return self.iter_lines(default_chunk_size)

    def iter_lines(self, chunk_size):
```
I think I would prefer to have this as a private method; normal file objects do not have an iter_lines method.
Thanks @dstufft. I can change it to _iter_lines. Is that OK with you?
Yea that seems fine to me.
@dstufft, I've made the change in 6866ab7.
Hope it helps!
Overall this looks good to me, just one small comment.
Is this mergeable with the requested changes in? I think it'd be a great addition until the …
Looks like the comment was addressed? There is a group of us really excited for this PR 💯
@dstufft Would it be best to use a edit: From http://ze.phyr.us/bytearray/ here is a timeit and writeup on the mutable vs immutable sequences used for buffers:
|
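For reference, a rough micro-benchmark in the spirit of that write-up (exact numbers are machine- and interpreter-dependent): concatenating onto an immutable bytes object copies the whole accumulated buffer each time, while bytearray.extend() appends in place.

```python
import timeit

setup = "chunks = [b'x' * 1024] * 1000"

# Immutable bytes: each += builds a brand-new object, copying everything
# accumulated so far (quadratic in total size).
stmt_bytes = """
buf = b''
for c in chunks:
    buf += c
"""

# Mutable bytearray: extend() appends in place.
stmt_bytearray = """
buf = bytearray()
for c in chunks:
    buf.extend(c)
"""

print('bytes +=         ', timeit.timeit(stmt_bytes, setup, number=50))
print('bytearray.extend ', timeit.timeit(stmt_bytearray, setup, number=50))
```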
@mrasband @dstufft Is the …
@sujaymansingh I don't have any authority to approve/merge - I stumbled on the issue as I wrote my own workaround a while ago and wanted to contribute it back (you already had). Eagerly awaiting this myself.
This would be super cool to have for all streaming response types.
What will it take to get this merged?
Do it!
I would still love to see this!
Looking forward to this one, +1
Any chance of getting this merged? It would be really useful for us!
+1. It'd be great to have this in!
+1
@dstufft @mbeacom What should we do about this? At this point it's been over a year since this PR was raised. Presumably there have been billions of changes upstream. I could rebase or branch off again, but it's only worth doing that if there is still interest in getting it merged.
Would you say it's worth doing so? If so, I can create a new branch and submit.
I hope the continued +1 comments are evidence of this.
Good point @sburns!
Hi everyone, sorry for the delay on this. I'm taking a look at this now. I'll assign this to myself and make sure this gets merged.
```python
        default_chunk_size = 1024
        return self._iter_lines(default_chunk_size)

    def _iter_lines(self, chunk_size):
```
What are people's thoughts on exposing this method directly as iter_lines() instead of having the default behavior of __iter__ be line based?
This would be similar to what requests exposes via its iter_lines method. Looking at what they provide for __iter__, it appears to be chunks of 128 bytes. That's closer to how I'd expect a streaming response object to behave, though I'd bump up the chunks to maybe 1k or 4k.
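For illustration, roughly the interface shape being discussed, sketched under the assumption that __iter__ yields fixed-size byte chunks while iter_lines() is a separate public method. The class name, defaults, and details are illustrative, not botocore's final code:

```python
class StreamingBodySketch:
    """Sketch of the proposed iteration API; not botocore's implementation."""

    _DEFAULT_CHUNK_SIZE = 1024

    def __init__(self, raw_stream):
        self._raw_stream = raw_stream  # anything with a read(amt) method

    def read(self, amt=None):
        return self._raw_stream.read(amt)

    def __iter__(self):
        # Default iteration: fixed-size byte chunks, similar to requests.
        return self.iter_chunks(self._DEFAULT_CHUNK_SIZE)

    def iter_chunks(self, chunk_size=_DEFAULT_CHUNK_SIZE):
        while True:
            chunk = self.read(chunk_size)
            if not chunk:
                return
            yield chunk

    def iter_lines(self, chunk_size=_DEFAULT_CHUNK_SIZE):
        # Explicit line iteration; same partial-line buffering technique
        # as sketched earlier in the thread.
        pending = b''
        for chunk in self.iter_chunks(chunk_size):
            lines = (pending + chunk).splitlines()
            pending = b'' if chunk.endswith(b'\n') else lines.pop()
            for line in lines:
                yield line
        if pending:
            yield pending
```

With that split, `for chunk in body:` streams raw bytes and `for line in body.iter_lines():` opts into line semantics, matching the requests-style behavior described above.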
+1 to the default iterator being larger chunks (4k, but that's just me) but also being able to get lines explicitly via iter_lines().
Given the age of the PR and that this was against your …
That's a great idea @jamesls! Thanks 👍
Merged via #1491
Thanks @jamesls