Use io.IOBase as base class for StreamingBody #879
Comments
Are there any methods in particular that you are looking for? It may be difficult to inherit from IOBase as not all of its methods may be possible to implement (given that we are reading from an unseekable stream from the server underneath). |
@kyleknap, IOBase can certainly work with streaming data. The wrapper below is sufficient for my current use-case, but with a little more work I think it could be fully supported. My main interest was in line-by-line iteration:
from io import RawIOBase

class StreamingBodyIO(RawIOBase):
    """Wrap a boto StreamingBody in the IOBase API."""

    def __init__(self, body):
        self.body = body

    def readable(self):
        return True

    def read(self, n=-1):
        # The io API uses None or a negative size for "read everything"; pass None to the wrapped body.
        n = None if n is None or n < 0 else n
        return self.body.read(n)
|
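A minimal sketch of buffered line-by-line iteration along the same lines, assuming only that the body exposes `read(amt)`. `FakeBody` and `ReadintoAdapter` are hypothetical names used for illustration and are not part of botocore. The adapter implements `readinto()` because CPython's `io.BufferedReader` pulls data from its raw stream through that method.

```python
import io


class FakeBody:
    """Hypothetical stand-in for a streaming body: only read(amt) is available."""

    def __init__(self, data):
        self._buf = io.BytesIO(data)

    def read(self, amt=None):
        return self._buf.read(amt)


class ReadintoAdapter(io.RawIOBase):
    """Expose a read(amt)-only object through readinto() (illustrative, not botocore API)."""

    def __init__(self, body):
        self._body = body

    def readable(self):
        return True

    def readinto(self, b):
        # Ask the body for at most len(b) bytes and copy them into the caller's buffer.
        chunk = self._body.read(len(b))
        n = len(chunk)
        b[:n] = chunk
        return n  # 0 signals end of stream


body = FakeBody(b"alpha\nbeta\ngamma\n")
reader = io.BufferedReader(ReadintoAdapter(body), buffer_size=8)
lines = [line for line in reader]
```

With the buffer in place, iteration reads in chunks rather than one byte at a time.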
...actually, is there any point to keeping |
When returning a file-like object, such as StreamingBody, io.IOBase is a very reasonable abstraction to assume. From the io.IOBase docstring:
E.g., the missing io.IOBase attributes make it very difficult to lazily read a compressed file from S3:
from gzip import GzipFile
import boto3

body_stream_gz = boto3.resource('s3').Object('my_bucket', 'my_object.txt.gz').get()['Body']
with GzipFile(fileobj=body_stream_gz) as body_stream:
    content = body_stream.read(1000)
---------------------------------------------------------------------------
AttributeError: 'StreamingBody' object has no attribute 'tell'
Here's an accepted cpython PR for the same issue: |
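One possible workaround that does not depend on `tell()` at all is to decompress incrementally with `zlib`. The sketch below assumes only that the body has a `read(amt)` method; `iter_gunzip` is a hypothetical helper name, `io.BytesIO` stands in for the S3 body, and the approach handles a single gzip member only.

```python
import gzip
import io
import zlib


def iter_gunzip(body, chunk_size=1024):
    """Yield decompressed byte chunks from a gzip stream that only supports read(amt)."""
    # Adding 16 to the window size tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    while True:
        chunk = body.read(chunk_size)
        if not chunk:
            break
        data = decompressor.decompress(chunk)
        if data:
            yield data
    tail = decompressor.flush()
    if tail:
        yield tail


payload = b"some line of text\n" * 1000
body = io.BytesIO(gzip.compress(payload))  # stand-in for the streaming body
restored = b"".join(iter_gunzip(body, chunk_size=64))
```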
Another problem that this should resolve: text response bodies that one wishes to deal with as such, but still stream. |
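Until the body can be handed to `io.TextIOWrapper`, text can still be streamed with an incremental decoder. This is a minimal sketch under the assumption that the body offers `read(amt)`; `iter_text` is a hypothetical helper, and `io.BytesIO` stands in for the response body.

```python
import codecs
import io


def iter_text(body, encoding="utf-8", chunk_size=1024):
    """Yield decoded text pieces from a bytes stream that only supports read(amt)."""
    # An incremental decoder buffers multi-byte characters split across chunk boundaries.
    decoder = codecs.getincrementaldecoder(encoding)()
    while True:
        chunk = body.read(chunk_size)
        if not chunk:
            break
        text = decoder.decode(chunk)
        if text:
            yield text
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


original = "héllo wörld ✓"
body = io.BytesIO(original.encode("utf-8"))  # stand-in for the streaming body
pieces = list(iter_text(body, chunk_size=3))
```

The small `chunk_size` here deliberately splits multi-byte characters to show that the decoder reassembles them.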
@GFernie Did this already get fixed? I was able to get the following code working:
import json
import gzip
import boto3
from urllib.parse import urlparse

def parse_s3_uri(uri):
    o = urlparse(uri)
    return o.netloc, o.path.lstrip('/')

def stream_file(uri):
    bucket, key = parse_s3_uri(uri)
    fileobj = boto3.resource('s3').Object(bucket, key).get()['Body']
    with gzip.GzipFile(fileobj=fileobj) as f:
        yield from (json.loads(line.decode('utf-8')) for line in f)

if __name__ == "__main__":
    for row in stream_file('s3://mybucket/my/path/key.json.gz'):
        print(row)
It lazily reads from a compressed json.gz file. |
I am also very curious about @AlJohri's question! Is this actually fixed now? |
Would be keen to get it fixed as well. |
Especially since there has been a pull request. |
We've taken a couple stabs at this but both attempts have failed (#2208, #2150) once we do more extensive integration testing across our packages (botocore, boto3, aws-cli, s3transfer). This class currently defines a This behavior seems to be avoidable on Python 3 by stubbing out the implementation of |
@joguSD Could you perhaps implement the finalization behavior yourself, and lie about inheriting from |
@glyph There are likely some more intricate approaches we can take to circumvent the finalization behavior in Python 2. Our initial research suggested it was going to be a fairly deep dive to get this working nicely across all of our packages. Given we're ending Python 2 support in the next 5 months, I'd say it's more likely we'll implement the Python 3 |
@nateprewitt In Twisted, when Python 2 was holding us back in some area, our approach was to implement the Python 3-only version of the feature but not expose it on Python 2, so Python 3 users could start taking advantage of it. It wouldn't be a regression for Python 2 users if you did it today, after all. |
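For context on the finalization behavior discussed in the comments above: `io.IOBase` defines a finalizer that calls `close()` when an unclosed object is collected. The sketch below only illustrates that documented behavior; whether it is exactly what the maintainers ran into is an assumption, and `Tracked` is a hypothetical class.

```python
import gc
import io

events = []


class Tracked(io.IOBase):
    """Hypothetical IOBase subclass that records when close() runs."""

    def close(self):
        if not self.closed:
            events.append("close")
        super().close()


obj = Tracked()
del obj       # dropping the last reference triggers the finalizer on CPython
gc.collect()  # nudge other interpreters that do not use reference counting
```

Any class that starts inheriting from `io.IOBase` picks up this implicit `close()` call, which can change when the underlying stream is released.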
In April I submitted a PR for a short-term fix for this. The PR has not been merged, or closed, or commented on. Can anyone here tell me the status of this repo? I see fresh commits, so it's actively maintained. |
Closing this as the PR linked above was merged. |
StreamingBody is a "file-like object", but file-like objects should probably inherit from IOBase or implement a similar suite of methods. This will provide an easy fix for #767.