still doesn't work with boto3 StreamingBody #17135

uiur · 2017-08-01T01:16:10Z

Example

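import boto3
from pandas import read_csv
# bucket and key are placeholders for your own S3 bucket name and object key
client = boto3.client("s3")
s3_object = client.get_object(Bucket=bucket, Key=key)
result = read_csv(s3_object["Body"])
# ValueError: Invalid file path or buffer object type: <class 'botocore.response.StreamingBody'>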

Problem description

issue: is_file_like requirements are too strict for boto3 S3 objects #16135
pull request: #16150

That issue was closed, but reading a StreamingBody still doesn't work. I've found that the test is skipped and has a bug.
I'll send a pull request to reproduce the behavior.

I think it's because botocore.response.StreamingBody doesn't have __iter__, so is_file_like returns False.
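
To make the suspected cause concrete, here is a minimal sketch of the kind of check described above. `is_file_like_sketch`, `ReadOnlyBody`, and `IterableBody` are hypothetical names invented for this illustration; the function is a simplified stand-in that assumes the check requires `read` or `write` plus `__iter__`, and it is not copied from the pandas source.

```python
def is_file_like_sketch(obj):
    """Simplified stand-in for the check described in this issue (an assumption)."""
    # The object must look like a reader or a writer...
    if not (hasattr(obj, "read") or hasattr(obj, "write")):
        return False
    # ...and it must also be iterable, which is the requirement that fails here.
    return hasattr(obj, "__iter__")


class ReadOnlyBody:
    """Hypothetical stream that exposes read() but defines no __iter__."""

    def __init__(self, data):
        self._data = data

    def read(self):
        return self._data


class IterableBody(ReadOnlyBody):
    """The same stream with line iteration added."""

    def __iter__(self):
        return iter(self._data.splitlines(True))


print(is_file_like_sketch(ReadOnlyBody(b"a,b\n1,2\n")))  # False
print(is_file_like_sketch(IterableBody(b"a,b\n1,2\n")))  # True
```

Under these assumptions, an object that can be read but not iterated is rejected, which matches the ValueError in the example.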

INSTALLED VERSIONS

commit: c8dcf19
python: 2.7.10.final.0
python-bits: 64
OS: Darwin
OS-release: 14.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL:
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.21.0.dev+317.gc8dcf19
pytest: 3.1.3
pip: 7.1.2
setuptools: 36.2.6
Cython: 0.26
numpy: 1.13.1
scipy: None
pyarrow: None
xarray: None
IPython: 5.0.0
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: 0.1.2
pandas_gbq: None
pandas_datareader: None


uiur · 2017-08-01T01:23:24Z

cc @gfyoung

gfyoung · 2017-08-01T06:25:31Z

@uiureo : Thanks for the report! From our conversation back in that issue, we got the impression that the PR would address this issue. Network tests are more difficult to reproduce because they are out of our control, but it's a little unfortunate that the purported patch failed to address this.

gfyoung · 2017-08-01T06:28:49Z

One issue currently is that we are facing a delicate balancing act: accepting as many "valid" IO objects as possible without allowing invalid ones. We check for __iter__ because of #15337, as that is a robust method of rejecting such objects.

For your patch to be robust, you will need to find a way to reject mock.Mock() while allowing StreamingBody. It would be fantastic if you could!
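
The tension described here can be seen with the standard library alone. The sketch below uses `unittest.mock` from Python 3 (on Python 2 the equivalent is the third-party `mock` package); the `probe` helper is a hypothetical name introduced only for this illustration.

```python
import io
from unittest import mock


def probe(obj):
    """Return (has_read, has_iter) for an object; a hypothetical helper."""
    return hasattr(obj, "read"), hasattr(obj, "__iter__")


# A plain Mock fabricates ordinary attributes on demand, so it appears to
# have read(), but it does not fabricate dunder methods such as __iter__.
print(probe(mock.Mock()))          # (True, False)

# A real in-memory text buffer has both.
print(probe(io.StringIO("a,b\n")))  # (True, True)
```

A check based only on `read` would therefore accept `mock.Mock()`, while the `__iter__` requirement rejects it, along with any real stream that lacks `__iter__`.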

uiur · 2017-08-11T04:50:48Z

This issue would be resolved if PR boto/botocore#1195 is merged into botocore.

gfyoung · 2017-08-11T04:52:16Z

@uiureo : Thanks for letting us know! That would definitely be one way to resolve this issue.

stonefly · 2017-09-19T10:45:00Z

It's been widely accepted that as long as an object has a read or write function, it can be seen as a file-like object.

Much more importantly, this was the behavior before pandas 0.20.*.

A lot of code relying on this assumption has now stopped working.

I'm wondering what the point of redefining file-like here is. Is __iter__ actually used in the implementation of read_csv, for example?

klintan · 2018-02-22T23:15:13Z

This is almost 5 months old and still open; it seems neither 0.21.0 nor 0.22.0 lets you use boto3 like this. I had to go all the way back to 0.19.2 for it to work.

Is it a design decision not to include it at this point, or will we see this capability in future versions?
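
For readers stuck on an affected version, one possible workaround (not proposed anywhere in this thread) is to read the body fully and wrap the bytes in an in-memory buffer, which does define `__iter__`. The sketch below is stdlib-only; `FakeStreamingBody` and `as_buffer` are hypothetical names, and the fake class only imitates the `read()` method of the real object.

```python
import io


class FakeStreamingBody:
    """Hypothetical stand-in exposing only read(), like the object in this issue."""

    def __init__(self, data):
        self._data = data

    def read(self):
        return self._data


def as_buffer(body):
    """Read the whole body into memory and return an iterable, seekable buffer."""
    return io.BytesIO(body.read())


buf = as_buffer(FakeStreamingBody(b"a,b\n1,2\n"))
print(hasattr(buf, "read"), hasattr(buf, "__iter__"))  # True True
# With a real S3 response, the buffer would then be passed on, for example:
#     read_csv(as_buffer(s3_object["Body"]))
```

The trade-off is that the whole object is held in memory instead of being streamed, so this may not suit very large files.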

jreback · 2018-02-22T23:24:40Z

well, let's see: we have 2200 issues and only a small number of volunteers - how shall things be prioritized?

submitting a PR would be the fastest way here

kokes · 2018-08-27T11:30:08Z

I can now confirm that this is no longer an issue with a recent boto3 (the above-mentioned PR has been merged). Since boto3 is not a requirement for pandas, there is nothing to be done here, is there?

In [16]: obj=client.get_object(Bucket='my-bucket', Key='my/data/foo.csv')

In [17]: obj['Body']
Out[17]: <botocore.response.StreamingBody at 0x7f26ac400d30>

In [18]: df=pandas.read_csv(obj['Body'])

In [19]: len(df)
Out[19]: 7865

gfyoung · 2018-08-27T12:05:29Z

@kokes : That's great to hear! Might you be able to construct a unit test for us?

kokes · 2018-08-27T13:09:13Z

@gfyoung I tried - it's my first code PR, so I'm not sure about things like adding dev dependencies (should I edit all the travis/circleci metadata as well?). Also, not sure if I can just have a test without asserts, since all I'm testing is that it doesn't fail, so any exception should break the test.

Running this with botocore older than 1.10.47 will result in an error, as expected.

        if not is_file_like(filepath_or_buffer):
            msg = "Invalid file path or buffer object type: {_type}"
>           raise ValueError(msg.format(_type=type(filepath_or_buffer)))
E           ValueError: Invalid file path or buffer object type: <class 'botocore.response.StreamingBody'>

pandas/io/common.py:232: ValueError
================================================== 1 failed, 1 passed in 0.36 seconds ===================================================
root@114ec59e9d98:/pandas#
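
The version dependence mentioned above could be expressed as a small guard that a test consults before deciding whether to run or skip. This is only a sketch: `MIN_BOTOCORE`, `version_tuple`, and `streaming_body_supported` are hypothetical names, the 1.10.47 threshold is taken from the comment above, and the parsing assumes plain numeric dotted versions.

```python
# Threshold quoted in this thread; older botocore raises the ValueError above.
MIN_BOTOCORE = (1, 10, 47)


def version_tuple(version):
    """Turn '1.10.47' into (1, 10, 47); assumes purely numeric components."""
    return tuple(int(part) for part in version.split(".")[:3])


def streaming_body_supported(botocore_version):
    """True if this botocore version should work with read_csv, per the thread."""
    return version_tuple(botocore_version) >= MIN_BOTOCORE


print(streaming_body_supported("1.10.46"))  # False
print(streaming_body_supported("1.10.47"))  # True
```

Comparing tuples of integers avoids the pitfall of comparing version strings lexically, where "1.9.0" would sort after "1.10.47".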

gfyoung · 2018-08-27T18:34:00Z

@kokes : I would suggest writing the test first and not worrying about the dependencies part yet. We can discuss that once you have written the test.

How does that sound?

kokes · 2018-08-27T18:37:10Z

@gfyoung we’ve already iterated once with Tom, see #22520

mroeschke · 2021-05-26T05:03:52Z

Appears this was closed by #22520

uiur mentioned this issue Aug 1, 2017

Fix test for boto3 #17136

Closed

4 tasks

gfyoung added the IO CSV, IO Data, and Regression labels and removed the IO CSV label Aug 1, 2017

jorisvandenbossche mentioned this issue Sep 19, 2017

why is_file_like needs to see __iter__ #17591

Closed

jorisvandenbossche added this to the 0.21.0 milestone Sep 19, 2017

jreback modified the milestones: 0.21.0, Next Major Release Sep 23, 2017

kokes mentioned this issue Aug 27, 2018

TST: Streaming of S3 files #22520

Merged

jreback modified the milestones: Contributions Welcome, 0.24.0 Aug 28, 2018

jreback modified the milestones: 0.24.0, Contributions Welcome Nov 6, 2018

mroeschke added the good first issue label and removed the IO Data and Regression labels Oct 16, 2019

mroeschke added the Needs Tests label Oct 16, 2019

mroeschke closed this as completed May 26, 2021
