
ENH: Add transparent compression to json reading/writing #17798

Merged
merged 6 commits into pandas-dev:master on Oct 6, 2017

Conversation

@simongibbons (Contributor) commented Oct 5, 2017

This works in the same way as the ``compression`` argument to ``read_csv`` and ``to_csv``.

I've added tests confirming that it works with both file paths and S3 URLs. (Obviously there will be edge cases I've missed; please let me know if there are important ones that I should add coverage for.)

The implementation is mostly plumbing, using the logic that was in place for the same functionality in read_csv.
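For context, a minimal sketch of the behaviour this PR adds (file name and data are illustrative only, not from the PR's test suite):

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# Write gzip-compressed JSON and read it back; the same call
# shape works for 'bz2' and 'xz' as well.
path = os.path.join(tempfile.mkdtemp(), "data.json.gz")
df.to_json(path, compression="gzip")
roundtripped = pd.read_json(path, compression="gzip")

assert roundtripped.equals(df)
```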

@pep8speaks commented Oct 5, 2017

Hello @simongibbons! Thanks for updating the PR.

Cheers! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on October 06, 2017 at 08:18 UTC

This works in the same way as the argument to ``read_csv``
and ``to_csv``.

I've added tests confirming that it works with file
paths, as well as with file URLs and S3 URLs.
@codecov bot commented Oct 5, 2017

Codecov Report

Merging #17798 into master will decrease coverage by <.01%.
The diff coverage is n/a.


@@            Coverage Diff             @@
##           master   #17798      +/-   ##
==========================================
- Coverage   91.24%   91.24%   -0.01%     
==========================================
  Files         163      163              
  Lines       49967    49967              
==========================================
- Hits        45593    45590       -3     
- Misses       4374     4377       +3
Flag Coverage Δ
#multiple 89.04% <ø> (+0.01%) ⬆️
#single 40.24% <ø> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/io/json/json.py 100% <ø> (ø) ⬆️
pandas/core/generic.py 92.03% <ø> (ø) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.74% <0%> (-0.1%) ⬇️
pandas/core/indexes/datetimes.py 95.48% <0%> (-0.1%) ⬇️
pandas/io/common.py 71.61% <0%> (+2.96%) ⬆️

Continue to review full report at Codecov.

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 22515f5...3ed830c.

@codecov bot commented Oct 5, 2017

Codecov Report

Merging #17798 into master will decrease coverage by 0.01%.
The diff coverage is n/a.


@@            Coverage Diff             @@
##           master   #17798      +/-   ##
==========================================
- Coverage   91.24%   91.23%   -0.02%     
==========================================
  Files         163      163              
  Lines       49967    49971       +4     
==========================================
- Hits        45593    45590       -3     
- Misses       4374     4381       +7
Flag Coverage Δ
#multiple 89.03% <ø> (ø) ⬆️
#single 40.24% <ø> (-0.06%) ⬇️
Impacted Files Coverage Δ
pandas/core/generic.py 92.03% <ø> (ø) ⬆️
pandas/io/json/json.py 100% <ø> (ø) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.74% <0%> (-0.1%) ⬇️
pandas/core/indexes/timedeltas.py 91.19% <0%> (ø) ⬆️
pandas/core/indexes/range.py 92.83% <0%> (ø) ⬆️
pandas/core/indexes/numeric.py 97.18% <0%> (ø) ⬆️
pandas/core/indexes/period.py 92.78% <0%> (ø) ⬆️
pandas/core/indexes/datetimes.py 95.58% <0%> (ø) ⬆️
pandas/core/indexes/multi.py 96.39% <0%> (ø) ⬆️
... and 3 more

Last update 22515f5...402fa11.

COMPRESSION_TYPES = [None, 'bz2', 'gzip', 'xz']


def test_compress_gzip():
Contributor commented:

pls parametrize all of this

see how this is done in other compression tests

Contributor Author replied:

I used the pattern from the tests of compression with pickle, where a helper function decompresses files according to their compression type.
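A sketch of that pattern as applied here (the helper and test names are hypothetical, not the exact code in the PR):

```python
import bz2
import gzip
import lzma

import pytest

COMPRESSION_TYPES = [None, 'bz2', 'gzip', 'xz']


def decompress_file(path, compression):
    """Read the raw contents of ``path``, undoing ``compression``."""
    openers = {None: open, 'gzip': gzip.open, 'bz2': bz2.open, 'xz': lzma.open}
    if compression not in openers:
        raise ValueError("unknown compression %r" % compression)
    with openers[compression](path, 'rb') as fh:
        return fh.read()


@pytest.mark.parametrize('compression', COMPRESSION_TYPES)
def test_to_json_compression(tmp_path, compression):
    import pandas as pd

    df = pd.DataFrame({'a': [1, 2, 3]})
    path = str(tmp_path / 'out.json')
    df.to_json(path, compression=compression)
    # The decompressed bytes should equal the uncompressed JSON payload.
    assert decompress_file(path, compression) == df.to_json().encode('utf-8')
```

Parametrizing over the compression type keeps each case a separate test, which is what the reviewer is asking for here.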

Contributor Author:

The zip tests will, IMO, always need to be special cases: since there isn't a zip writer, we will always need to read from a fixture.
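A sketch of what such a special-cased zip test could look like, building the fixture by hand since ``to_json`` had no zip writer at the time (paths and names are illustrative):

```python
import os
import tempfile
import zipfile

import pandas as pd

tmpdir = tempfile.mkdtemp()
plain = os.path.join(tmpdir, 'data.json')
archive = os.path.join(tmpdir, 'data.zip')

# Build the zipped fixture manually, since there is no zip writer.
df = pd.DataFrame({'a': [1, 2, 3]})
df.to_json(plain)
with zipfile.ZipFile(archive, 'w') as zf:
    zf.write(plain, arcname='data.json')

# read_json can then decompress the archive transparently.
roundtripped = pd.read_json(archive, compression='zip')
assert roundtripped.equals(df)
```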

@@ -195,7 +195,7 @@ Other Enhancements
- :func:`read_json` now accepts a ``chunksize`` parameter that can be used when ``lines=True``. If ``chunksize`` is passed, read_json now returns an iterator which reads in ``chunksize`` lines with each iteration. (:issue:`17048`)
- :meth:`DataFrame.assign` will preserve the original order of ``**kwargs`` for Python 3.6+ users instead of sorting the column names
- Improved the import time of pandas by about 2.25x (:issue:`16764`)

- :func:`read_json` and :func:`to_json` now accept a ``compression`` argument which allows them to transparently handle compressed files. (:issue:`XXXXXXX`)
Contributor commented:

update this

Contributor Author replied:

updated

@jreback jreback added Enhancement IO JSON read_json, to_json, json_normalize labels Oct 6, 2017
@jreback jreback added this to the 0.21.0 milestone Oct 6, 2017
@jreback (Contributor) commented Oct 6, 2017

lgtm, thanks for the quick response!

@TomAugspurger ?

@simongibbons (Contributor Author) commented:
Let me know if you want me to squash this when it's ready to merge.

@jreback (Contributor) commented Oct 6, 2017

@simongibbons no need to squash, its done automatically on merging.

@TomAugspurger (Contributor) left a comment

+1.

Does the ZIP file need to be added to MANIFEST.IN?



@pytest.mark.parametrize('compression', COMPRESSION_TYPES)
def test_with_s3_url(compression):
Contributor commented:

This shares some code with the (to-be-merged) #17201.

I think it's fine for now, but we'll want to clean it up whenever the latter is merged. Since this one is clean at the moment, I think we'll merge it and then refactor this test in #17201.

@jreback (Contributor) commented Oct 6, 2017

Does the ZIP file need to be added to MANIFEST.IN?

Hmm, I think it might need to be added to setup.py.

https://travis-ci.org/pandas-dev/pandas/jobs/284093329 is our build test, which IS picking this up.

@TomAugspurger (Contributor) commented Oct 6, 2017

Probably covered by pandas.tests.io: ['json/data/*.json'] in the setup.py.

@TomAugspurger TomAugspurger merged commit 3b4121b into pandas-dev:master Oct 6, 2017
@TomAugspurger (Contributor):
Thanks @simongibbons!

@jreback (Contributor) commented Oct 6, 2017

@TomAugspurger NO, it's NOT covered by that. See the failing tests. This NEEDS to be in setup.py.

@jreback (Contributor) commented Oct 6, 2017

You can change it to ``'/json/data/*.json*'`` and it will work, I think.
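In other words, the suggested change amounts to widening the package-data glob so compressed fixtures ship too. A hypothetical excerpt, not the real pandas setup.py (and note that setuptools' own glob matching is what ultimately applies):

```python
import fnmatch

# Widened glob: '*.json*' also matches compressed fixtures such as
# data.json.gz, data.json.bz2, data.json.xz and data.json.zip.
package_data = {
    "pandas.tests.io": ["json/data/*.json*"],
}

pattern = package_data["pandas.tests.io"][0]
assert fnmatch.fnmatch("json/data/sample.json", pattern)
assert fnmatch.fnmatch("json/data/sample.json.zip", pattern)
```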

jreback added a commit to jreback/pandas that referenced this pull request Oct 6, 2017
jreback added a commit that referenced this pull request Oct 6, 2017
ghost pushed a commit to reef-technologies/pandas that referenced this pull request Oct 16, 2017
…7798)

* ENH: Add transparent compression to json reading/writing

This works in the same way as the argument to ``read_csv``
and ``to_csv``.

I've added tests confirming that it works with file
paths, as well as with file URLs and S3 URLs.

* Fix PEP8 violations

* Add PR number to whatsnew entry

* Remove problematic Windows test (The S3 test hits the same edge case)

* Extract decompress file function so that pytest.parametrize can be used cleanly

* Fix typo in whatsnew entry
ghost pushed a commit to reef-technologies/pandas that referenced this pull request Oct 16, 2017
alanbato pushed a commit to alanbato/pandas that referenced this pull request Nov 10, 2017
No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017
Labels
Enhancement IO JSON read_json, to_json, json_normalize
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ENH: add gzip/bz2 compression to relevant read_* methods
4 participants