
ENH/DOC/CLN: Document arguments and reconcile C and Python engines for read_csv #12686

Open
17 of 22 tasks
kawochen opened this issue Mar 22, 2016 · 22 comments
Labels
API - Consistency Internal Consistency of API/Behavior Docs Enhancement IO CSV read_csv, to_csv

Comments

@kawochen
Contributor

kawochen commented Mar 22, 2016

Known differences between Python & C engines

Update here

Features supported in the Python engine only

Features supported in the C engine only

marked as internal on C engine only (maybe we should be a bit louder about this in the internal code)

Undocumented arguments to read_csv

Differences
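This is a minimal sketch of how the engine-specific options listed above could be validated in one place. The option sets are illustrative assumptions and are not pandas' internal tables. Per this thread, `float_precision` is C-only. `low_memory` is a C-engine option, and `skipfooter` is handled only by the Python engine. `check_engine_options` is a hypothetical helper.

```python
# Illustrative option sets only: NOT pandas' internal data structures.
C_ONLY_OPTIONS = {"float_precision", "low_memory"}
PYTHON_ONLY_OPTIONS = {"skipfooter"}


def check_engine_options(engine, **kwargs):
    """Raise ValueError if any passed option is unsupported by `engine`.

    Hypothetical helper: keeping the engine differences in one table makes
    them explicit instead of scattering checks through each parser.
    """
    if engine == "c":
        unsupported = PYTHON_ONLY_OPTIONS
    elif engine == "python":
        unsupported = C_ONLY_OPTIONS
    else:
        raise ValueError(f"unknown engine: {engine!r}")
    # Collect every offending option so the error lists them all at once.
    bad = sorted(name for name in kwargs if name in unsupported)
    if bad:
        raise ValueError(
            f"options not supported with engine={engine!r}: {', '.join(bad)}"
        )


# Accepted: float_precision is a C-engine option.
check_engine_options("c", float_precision="round_trip")

try:
    check_engine_options("python", float_precision="round_trip", skipfooter=1)
except ValueError as exc:
    print(exc)  # options not supported with engine='python': float_precision
```

A table-driven check like this also gives the docs a single source of truth for the "C engine only" and "Python engine only" lists.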

@kawochen
Contributor Author

I think error_bad_lines and warn_bad_lines should be one argument. They seem too intertwined now.
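For reference, later pandas releases did merge the pair into a single `on_bad_lines` argument that takes 'error', 'warn' or 'skip'. The sketch below shows how the two old booleans collapse into one mode. `resolve_bad_lines` is a hypothetical helper and is not pandas code.

```python
def resolve_bad_lines(error_bad_lines=True, warn_bad_lines=True):
    """Collapse the two booleans into a single mode string.

    Hypothetical helper. warn_bad_lines only has an effect when
    error_bad_lines is False, which is why the two flags are intertwined.
    """
    if error_bad_lines:
        return "error"  # malformed rows raise; warn_bad_lines is irrelevant
    return "warn" if warn_bad_lines else "skip"


# Print the full truth table of the old pair of flags.
for err in (True, False):
    for warn in (True, False):
        print(err, warn, "->", resolve_bad_lines(err, warn))
```

Four flag combinations map onto only three behaviours. That redundancy is the argument for a single parameter.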

@jreback jreback added Enhancement Docs API Design IO CSV read_csv, to_csv Master Tracker High level tracker for similar issues labels Mar 22, 2016
@jreback jreback added this to the 0.18.1 milestone Mar 22, 2016
@jreback
Contributor

jreback commented Mar 22, 2016

thanks for the list @kawochen. There are some issues which are relevant for some of the points. Can you link them when you have a chance (put next to the check boxes)

@gfyoung
Member

gfyoung commented Apr 19, 2016

@kawochen : add as_recarray to the undocumented options. This one returns a np.recarray instead of a DataFrame of the data if set to True.

@gfyoung
Member

gfyoung commented May 25, 2016

nrows issue has been fixed! Please check it off @jreback @kawochen

@jreback jreback modified the milestones: 0.19.0, 0.18.2 May 25, 2016
@gfyoung
Member

gfyoung commented May 26, 2016

skip_footer is aliased to skipfooter AFAICT. Not sure why one hasn't been chosen yet over the other. There is a stray #deprecated comment in the code, but it doesn't look like it has been enforced.
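This is a minimal sketch of one way to resolve the duplicate. It keeps `skipfooter`, the spelling current pandas uses, and treats `skip_footer` as a deprecated alias. `normalize_skipfooter` is hypothetical, and the choice of `FutureWarning` is an assumption.

```python
import warnings


def normalize_skipfooter(skipfooter=0, skip_footer=None):
    """Resolve the duplicate skip_footer/skipfooter arguments.

    Hypothetical helper: ``skipfooter`` is canonical and ``skip_footer``
    is accepted as a deprecated alias that emits a warning.
    """
    if skip_footer is None:
        return skipfooter
    if skipfooter != 0:
        # Both spellings given with different intent: refuse to guess.
        raise ValueError("specify only one of 'skipfooter' and 'skip_footer'")
    warnings.warn(
        "'skip_footer' is deprecated; use 'skipfooter' instead",
        FutureWarning,
        stacklevel=2,
    )
    return skip_footer


print(normalize_skipfooter(skipfooter=3))  # 3, no warning
```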

jreback pushed a commit that referenced this issue May 26, 2016
closes #5888,  xref #12686

Author: Chris <cbartak@gmail.com>

Closes #13293 from chris-b1/low-memory-doc and squashes the following commits:

daf9bca [Chris] DOC: low_memory in read_csv
@gfyoung
Member

gfyoung commented May 31, 2016

@kawochen, @jreback : float_precision is very well documented in fact, both in parsers.py and io.rst.

@jreback
Contributor

jreback commented May 31, 2016

updated float_precision & na_filter

@gfyoung
Member

gfyoung commented May 31, 2016

@jreback : no, it's still CParser-only but just move it to the list above with an unchecked box. We would still want to give that functionality to the Python parser.

@jreback
Copy link
Contributor

jreback commented May 31, 2016

I checked the box; it's enough.

@gfyoung
Copy link
Member

gfyoung commented May 31, 2016

Huh? The original classification was that it was undocumented AND only supported in the C engine. The checkbox gives the impression that both issues are resolved.

@jreback
Copy link
Contributor

jreback commented May 31, 2016

@gfyoung better?

@gfyoung
Copy link
Member

gfyoung commented May 31, 2016

Yes! That works. Thanks, @jreback !

jreback pushed a commit that referenced this issue Jun 2, 2016
Title is self-explanatory. xref #12686 - I don't quite understand
why these are marked (if at all) as internal to the C engine only, as
the benefits of having these options accepted for the Python engine
are quite clear based on the documentation I added as well.
The implementation simply calls the already-written function in
`pandas/parsers.pyx`. Because that function isn't specific to the `TextReader` class,
crossing over to grab it from Cython (instead of
duplicating it in pure Python) seems reasonable while maintaining the
separation between the C and Python engines.

Author: gfyoung <gfyoung17@gmail.com>

Closes #13323 from gfyoung/python-engine-compact-ints and squashes the following commits:

95f7ba8 [gfyoung] ENH: Add support for compact_ints and use_unsigned in Python engine
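This is a rough sketch of what `compact_ints`/`use_unsigned` do conceptually: pick the narrowest integer type that can hold a parsed column. It is written in pure Python rather than reusing the Cython function the commit calls. `smallest_int_dtype` is hypothetical. Treating `use_unsigned` as "use an unsigned type only when nothing is negative" is also an assumption.

```python
# (name, min, max) for candidate integer widths, narrowest first.
SIGNED = [("int8", -2**7, 2**7 - 1), ("int16", -2**15, 2**15 - 1),
          ("int32", -2**31, 2**31 - 1), ("int64", -2**63, 2**63 - 1)]
UNSIGNED = [("uint8", 0, 2**8 - 1), ("uint16", 0, 2**16 - 1),
            ("uint32", 0, 2**32 - 1), ("uint64", 0, 2**64 - 1)]


def smallest_int_dtype(values, use_unsigned=False):
    """Return the name of the narrowest integer dtype that holds `values`.

    Hypothetical helper: unsigned types are only considered when
    use_unsigned is set AND the column has no negative values.
    """
    lo, hi = min(values), max(values)
    candidates = UNSIGNED if use_unsigned and lo >= 0 else SIGNED
    for name, tmin, tmax in candidates:
        if tmin <= lo and hi <= tmax:
            return name
    raise OverflowError("values do not fit in a 64-bit integer")


print(smallest_int_dtype([0, 200]))                     # int16
print(smallest_int_dtype([0, 200], use_unsigned=True))  # uint8
```

Nothing in this logic depends on `TextReader` state, which fits the commit's point that the function can be shared by both engines.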
@gfyoung
Member

gfyoung commented Jun 3, 2016

@jreback : can you xref to #13349 for skip_footer? That way we know that it is in fact documented, but there are just duplicate arguments

@jreback
Contributor

jreback commented Jun 3, 2016

I updated

jreback pushed a commit that referenced this issue Jun 6, 2016
So I wasn't 100% correct when I said that `float_precision` was
documented here (in my earlier comment on #12686). It was well documented internally for
`TextParser` and in a section of `io.rst`, but it wasn't listed
formally in the parameters for the `read_csv` documentation.

Author: gfyoung <gfyoung17@gmail.com>

Closes #13377 from gfyoung/float-precision-doc and squashes the following commits:

a9eed16 [gfyoung] DOC: actually document float_precision in read_csv
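Some context on why `float_precision` matters: a fast float parser is not necessarily correctly rounded, whereas `float_precision='round_trip'` requests the round-trip converter. The toy `naive_parse` below is hypothetical and is not pandas' converter. It only shows how scaling by repeated multiplication can lose the last bit that Python's built-in `float()` gets right.

```python
def naive_parse(text):
    """Toy decimal parser: read all digits as an integer, then scale.

    Hypothetical illustration only. Multiplying by 0.1 once per fractional
    digit can introduce rounding error at every step.
    """
    int_part, _, frac_part = text.partition(".")
    value = float(int(int_part + frac_part))
    for _ in frac_part:
        value *= 0.1
    return value


print(float("0.3"))        # 0.3 (correctly rounded)
print(naive_parse("0.3"))  # 0.30000000000000004 (off by one ulp)
```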
@gfyoung
Member

gfyoung commented Jun 6, 2016

@jreback , @kawochen : as_recarray has been fixed now --> check it off! 😄

@gfyoung
Member

gfyoung commented Nov 26, 2016

@jorisvandenbossche : You can check-off the dtype checkbox here too!

@jreback
Contributor

jreback commented Mar 23, 2017

@gfyoung are all of the open items on the check boxes still open? (IOW, have we missed checking anything off?) Is there anything we should just take off (and/or just document)?

@gfyoung
Member

gfyoung commented Mar 23, 2017

All of those are valid differences that should be patched, though the implementation is not straightforward for any of them. It would be worthwhile to double check that they are properly documented for now.

@jreback jreback modified the milestones: 0.21.0, 0.20.0 Mar 23, 2017
@jreback
Contributor

jreback commented Mar 23, 2017

thanks @gfyoung more docs always welcome!

@jreback
Contributor

jreback commented Sep 23, 2017

@gfyoung can you review the top section and see where we are?

@jreback jreback modified the milestones: 0.21.0, Next Major Release Sep 23, 2017
@gfyoung
Member

gfyoung commented Sep 24, 2017

@jreback : At the time of commenting, this list is correct and up-to-date with our progress.

@jreback
Contributor

jreback commented Sep 24, 2017

@gfyoung ok thanks. Feel free to issue PRs to close some of these :>

@jreback jreback modified the milestones: Next Major Release, Admin, High Level Issue Tracking Sep 24, 2017
@TomAugspurger TomAugspurger removed the Master Tracker High level tracker for similar issues label Jul 6, 2018
@TomAugspurger TomAugspurger removed this from the High Level Issue Tracking milestone Jul 6, 2018
@jbrockmendel jbrockmendel added the API - Consistency Internal Consistency of API/Behavior label Sep 22, 2020
6 participants