ENH: Add transparent compression to json reading/writing #17798
Conversation
Hello @simongibbons! Thanks for updating the PR. Cheers! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on October 06, 2017 at 08:18 UTC
This works in the same way as the argument to ``read_csv`` and ``to_csv``. I've added tests confirming that it works with file paths, as well as file URLs and S3 URLs.
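A minimal sketch of the ``compression`` argument described above (available from pandas 0.21.0 onwards); the file name and frame contents are illustrative:

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': ['x', 'y', 'z']})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'data.json.gz')
    # Compressed transparently on write ...
    df.to_json(path, compression='gzip')
    # ... and decompressed transparently on read.
    roundtrip = pd.read_json(path, compression='gzip')
```

The same pattern works with ``'bz2'`` and ``'xz'``, mirroring the existing ``read_csv``/``to_csv`` behaviour.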
015dd5d to 3ed830c
Codecov Report
@@ Coverage Diff @@
## master #17798 +/- ##
==========================================
- Coverage 91.24% 91.24% -0.01%
==========================================
Files 163 163
Lines 49967 49967
==========================================
- Hits 45593 45590 -3
- Misses 4374 4377 +3
Continue to review full report at Codecov.
Codecov Report
@@ Coverage Diff @@
## master #17798 +/- ##
==========================================
- Coverage 91.24% 91.23% -0.02%
==========================================
Files 163 163
Lines 49967 49971 +4
==========================================
- Hits 45593 45590 -3
- Misses 4374 4381 +7
Continue to review full report at Codecov.
COMPRESSION_TYPES = [None, 'bz2', 'gzip', 'xz']

def test_compress_gzip():
pls parametrize all of this
see how this is done in other compression tests
I used the pattern from the pickle compression tests, where a helper function decompresses files by compression type.
The zip tests will IMO always need to be special-cased: since there isn't a zip writer, we will always need to read from a fixture.
doc/source/whatsnew/v0.21.0.txt
@@ -195,7 +195,7 @@ Other Enhancements
- :func:`read_json` now accepts a ``chunksize`` parameter that can be used when ``lines=True``. If ``chunksize`` is passed, read_json now returns an iterator which reads in ``chunksize`` lines with each iteration. (:issue:`17048`)
- :meth:`DataFrame.assign` will preserve the original order of ``**kwargs`` for Python 3.6+ users instead of sorting the column names
- Improved the import time of pandas by about 2.25x (:issue:`16764`)
- :func:`read_json` and :func:`to_json` now accept a ``compression`` argument which allows them to transparently handle compressed files. (:issue:`XXXXXXX`)
update this
updated
lgtm, thanks for the quick response!
Let me know if you want me to squash this when it's ready to merge.
@simongibbons no need to squash, it's done automatically on merging.
+1.
Does the ZIP file need to be added to MANIFEST.in?
@pytest.mark.parametrize('compression', COMPRESSION_TYPES)
def test_with_s3_url(compression):
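The parametrized style requested in the review can be sketched as below; the test name, frame contents, and use of a temporary directory (rather than pandas' own fixtures) are illustrative:

```python
import os
import tempfile

import pandas as pd
import pytest

COMPRESSION_TYPES = [None, 'bz2', 'gzip', 'xz']

@pytest.mark.parametrize('compression', COMPRESSION_TYPES)
def test_roundtrip_compression(compression):
    """One test body covers every compression type via parametrization."""
    df = pd.DataFrame({'a': [1, 2], 'b': ['x', 'y']})
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'out.json')
        df.to_json(path, compression=compression)
        result = pd.read_json(path, compression=compression)
    pd.testing.assert_frame_equal(result, df)
```

This replaces one hand-written test per format (``test_compress_gzip`` and friends) with a single parametrized case, which is the pattern used elsewhere in pandas' compression tests.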
hmm, I think it might need to be added. https://travis-ci.org/pandas-dev/pandas/jobs/284093329 is our build test, which IS picking this up.
Probably covered by
Thanks @simongibbons!
@TomAugspurger NO it's NOT covered by that. See the failing tests. This NEEDS to be in setup.py
you can change it to
…7798)

* ENH: Add transparent compression to json reading/writing

  This works in the same way as the argument to ``read_csv`` and ``to_csv``. I've added tests confirming that it works with file paths, as well as file URLs and S3 URLs.

* Fix PEP8 violations
* Add PR number to whatsnew entry
* Remove problematic Windows test (The S3 test hits the same edge case)
* Extract decompress file function so that pytest.parametrize can be used cleanly
* Fix typo in whatsnew entry
This works in the same way as the argument to ``read_csv`` and ``to_csv``. I've added tests confirming that it works with both file paths and S3 URLs. (Obviously there will be edge cases I've missed - please let me know if there are important ones that I should add coverage for.)

The implementation is mostly plumbing, using the logic that was in place for the same functionality in ``read_csv``.

git diff upstream/master -u -- "*.py" | flake8 --diff