-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
COMPAT: reading json with lines=True from s3, xref #17200 #17201
Conversation
@alph486 : Thanks for the PR! You will need to write tests for us to merge this. I would also take a look at what @TomAugspurger suggested in terms of patching this. |
Is there a particular way you guys go about handling s3 in your tests? I see some files in a bucket somewhere, but idk if there are any |
This reverts commit 8255cd0.
Hello @alph486! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on November 27, 2017 at 00:04 Hours UTC |
@gfyoung After going back to clean this up I realized jsonl from URL based file also suffers from the same issue. I have a cleaner fix that I think works in both cases. Can you let me know how I should proceed for getting a JSONL file uploaded to the public s3 bucket for testing this in the test suite? I don't have write access. |
@TomAugspurger @jreback : thoughts? |
if you put the file on a gist somewhere @TomAugspurger or I can upload to the bucket. |
@jreback @TomAugspurger Here's a file that I use in my tests. If you can give me an s3 path where you drop it I'll push the tests. https://gist.github.com/alph486/c7eff9a896d80dccae5a3b3283e75a4e#file-items-jsonl |
bump? |
@alph486 : Are you still waiting on the updated URL to patch the failing tests? |
@gfyoung yeah, i just need a target s3 url to put in my code |
ping @jreback @TomAugspurger |
I don't think I have access to the pandas s3 bucket.
…On Fri, Aug 18, 2017 at 1:59 PM, gfyoung ***@***.***> wrote:
ping @jreback <https://github.com/jreback> @TomAugspurger
<https://github.com/tomaugspurger>
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#17201 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABQHIvlpJ-cAwFzSmyAFeY270PMEynkhks5sZd78gaJpZM4OxISG>
.
|
@jreback ? |
yeah I don't think I have this either. @TomAugspurger IIRC you can login using the Continuum dashboard. |
if that works we should prob change it :> |
Hmm I'm not seeing it in the environment I have access to. I'll ask IT.
Probably best to have it under an account we manage anyways.
…On Mon, Aug 21, 2017 at 6:56 PM, Jeff Reback ***@***.***> wrote:
if that works we should prob change it :>
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#17201 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABQHIjccf2YRr9hMExOy0a3VnV3HTBXBks5sahlCgaJpZM4OxISG>
.
|
sorry to keep nudging here, but any progress from IT @TomAugspurger ? Id love to try to a patched official version out so we can use our feature in production. |
No word yet. Really, we should be using https://github.com/spulec/moto to mock all our S3 calls. Now that s3fs (and boto3) are handling all the network stuff, we don't need to be testing it. Just our text parsing. I'll open an issue about that (and you can look at fixing it if you're interested :) |
I can try to implement my tests using moto... we'll see if that works
Thanks,
Kevin
…On Aug 24, 2017, 6:04 AM -0500, Tom Augspurger ***@***.***>, wrote:
No word yet.
Really, we should be using https://github.com/spulec/moto to mock all our S3 calls. Now that s3fs (and boto3) are handling all the network stuff, we don't need to be testing it. Just our text parsing. I'll open an issue about that (and you can look at fixing it if you're interested :)
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.
|
Thanks, I'd be totally fine with just using moto for your test for now, and following up moving most of the other S3 tests to use moto. You'll need to edit one of the ci files to include moto. Probably |
Ah, thanks for that. I hope to get to trying that locally today, I'll update here when I know.
Thanks,
Kevin
…On Aug 24, 2017, 8:28 AM -0500, Tom Augspurger ***@***.***>, wrote:
Thanks, I'd be totally fine with just using moto for your test for now, and following up moving most of the other S3 tests to use moto.
You'll need to edit one of the ci files to include moto. Probably ci/requirements-3.6.run
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.
|
Any guidance on which requirements files I should change for both release packaging, development, and CI? (I currently have |
you shouldn't add |
Unless I'm misunderstanding: you cannot run any of the test suites that require Edit: For example from my attempt of running
|
@alph486 close! Still a few style things in https://travis-ci.org/pandas-dev/pandas/jobs/279576239#L1938 FYI if you run |
@TomAugspurger Yeah, but part of those unused imports are a backwards way of me leveraging the |
@alph486 sorry I missed your comment earlier. I'll push a commit in a minute that refactors the s3_resource stuff. |
Ok thanks, sorry about it taking so long. I'm not super confident in here. |
OK pushed two commits. One gets you back up to date with master, and 6979fb8 is the actual changes. I added a If you |
@jreback are you okay with the changes made per your review? |
End in newline
End in newline
Should be all good. ping on green @alph486 |
ci/requirements-3.5.pip
Outdated
@@ -1,2 +1,3 @@ | |||
xarray==0.9.1 | |||
pandas-gbq |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this needds to be reverted
ci/requirements_dev.txt
Outdated
@@ -5,4 +5,4 @@ cython | |||
pytest>=3.1.0 | |||
pytest-cov | |||
flake8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
revert this, s3fs is NOT a requirement for dev; we should be robust to not having this installed
ci/requirements_all.txt
Outdated
@@ -26,3 +26,4 @@ sqlalchemy | |||
bottleneck | |||
pymysql |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is ok
pandas/io/json/json.py
Outdated
""" | ||
If PY3 and/or isinstance(json, bytes) | ||
""" | ||
if isinstance(json, bytes): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 line comment
@@ -0,0 +1,2 @@ | |||
{"a": 1, "b": 2} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the purpose of this file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, ok you have to have this named .json
otherwise it won't be picked up by setup.py
(IOW the install test will fail).
@pytest.fixture(scope='module') | ||
def tips_file(): | ||
return os.path.join(tm.get_data_path(), 'tips.csv') | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool
@@ -0,0 +1,2 @@ | |||
{"a": 1, "b": 2} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, ok you have to have this named .json
otherwise it won't be picked up by setup.py
(IOW the install test will fail).
can you rebase / update |
thanks @alph486 |
…andas-dev#17201) (cherry picked from commit 4fd104a)
Attempt to decode the bytes array with
encoding
passed to the call.