ENH: add gzip/bz2 compression to relevant read_* methods #15644

gfairchild · 2017-03-10T07:52:39Z

This issue is a branch off of #11666, which implemented compression support for read_pickle. There are still a few other read_* methods that could possibly benefit from compression support. Looking at the I/O API reference, these jump out at me:

read_json - This can definitely benefit from compression. I've stored very large gzipped JSON files before. As a general rule, any read_* method that supports any kind of plaintext format should support compression.
read_stata - ~~I don't use Stata, but it looks like a .dta file is not a plaintext file. Is it naturally compressed, or can they be compressed significantly like pickles?~~
read_sas - I've also never used SAS, and like Stata's .dta files, it looks like .xpt and .sas7bdat files are both some binary format. Can they be compressed well?
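
To give a rough sense of the plaintext point above, here is a minimal sketch using only the standard library. The sample records and the helper name `gzip_json_roundtrip` are made up for illustration; actual savings depend on the data.

```python
import gzip
import json


def gzip_json_roundtrip(records):
    """Serialize records to JSON, gzip them, and decode them back.

    Returns (raw_size, compressed_size, decoded_records).
    """
    raw = json.dumps(records).encode("utf-8")
    compressed = gzip.compress(raw)
    decoded = json.loads(gzip.decompress(compressed).decode("utf-8"))
    return len(raw), len(compressed), decoded


# Hypothetical sample data: repetitive keys are typical of JSON records
# and are what makes the format compress well.
records = [{"id": i, "label": "sample", "value": i * 0.5} for i in range(1000)]
raw_size, gz_size, decoded = gzip_json_roundtrip(records)
print(raw_size, gz_size)
```

Until read_json takes a compression argument, one workaround is to open the file with `gzip.open` yourself and hand the resulting file object to read_json, which accepts file-like objects.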


jreback · 2017-03-10T08:59:43Z

most important is json

bashtage · 2017-03-10T13:35:18Z

Stata is not compressed but is just a fairly plain binary file format. That said, I don't think there is much of a reason to add compression methods, since the output file wouldn't be usable in Stata (presumably the reason to output in this format) without manual decompression.

jreback · 2017-03-10T13:55:18Z

IIRC read_sas has internal compression as well? or is it a different file extension?

gfairchild · 2017-03-21T20:08:03Z

In this case, it looks like read_json may be the only method that needs compression support added.

jreback · 2017-03-21T20:16:58Z

@gfairchild want to take a stab at this? should be fairly straightforward as you can pretty much reuse the infrastructure (mainly just passing the compression arg thru). This is really just a couple of tests as well.
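
A minimal sketch of what "passing the compression arg thru" could look like, assuming extension-based inference. The helper names (`infer_compression`, `open_text`, `read_json_records`) and the extension table are hypothetical and standard-library only; this is not pandas' actual internal infrastructure.

```python
import bz2
import gzip
import json
import os

# Hypothetical extension table; the real mapping in pandas may differ.
_EXTENSION_TO_COMPRESSION = {".gz": "gzip", ".bz2": "bz2"}


def infer_compression(path, compression="infer"):
    """Return 'gzip', 'bz2', or None; only guess when asked to infer."""
    if compression != "infer":
        return compression
    _, ext = os.path.splitext(path)
    return _EXTENSION_TO_COMPRESSION.get(ext.lower())


def open_text(path, compression="infer", mode="rt"):
    """Open path in text mode, (de)compressing transparently if needed."""
    method = infer_compression(path, compression)
    if method == "gzip":
        return gzip.open(path, mode, encoding="utf-8")
    if method == "bz2":
        return bz2.open(path, mode, encoding="utf-8")
    if method is None:
        return open(path, mode, encoding="utf-8")
    raise ValueError(f"unrecognized compression: {method!r}")


def read_json_records(path, compression="infer"):
    """Parse a JSON file, forwarding the compression argument unchanged."""
    with open_text(path, compression) as handle:
        return json.load(handle)
```

The reader only forwards `compression` to the shared opener, which is why the change is mostly a matter of plumbing plus tests for each supported format.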

gfairchild · 2017-03-21T20:17:53Z

I'd be happy to. Just got to find the time. Maybe I can do it this weekend.

gfairchild mentioned this issue Mar 10, 2017

ENH: add gzip/bz2 compression to read_pickle() (and perhaps other read_*() methods) #11666

Closed

jreback added the IO Data, Difficulty Intermediate, IO JSON, IO SAS, and IO Stata labels Mar 10, 2017

jreback added this to the Next Major Release milestone Mar 10, 2017

colinhiggins mentioned this issue Jun 21, 2017

ENH: simple patch for read_json compression #16750

Closed


simongibbons mentioned this issue Oct 5, 2017

ENH: Add tranparent compression to json reading/writing #17798

Merged


jreback modified the milestones: Next Major Release, 0.21.0 Oct 6, 2017

TomAugspurger closed this as completed in #17798 Oct 6, 2017

ozak mentioned this issue May 31, 2019

Compression keyword for Stata and others? #26599

Closed
