ENH/DOC/CLN: Document arguments and reconcile C and Python engines for read_csv #12686

kawochen · 2016-03-22T14:11:51Z

Known differences between Python & C engines

Update here

Features supported in the Python engine only

skipfooter / skip_footer: number of lines at the bottom of the file to skip (API: skipfooter or skip_footer? read_csv can't seem to decide #13349)
sniffing (sep=None): deduce the separator (ENH: add read_csv sniffing (sep=None) for C engine #9645)
regex sep: regular expression or multi-character separator
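
A minimal sketch of the three Python-only features listed above, assuming a pandas version in which `skipfooter`, `sep=None`, and regex separators are all accepted with `engine="python"`. The sample data and variable names are made up for illustration and do not come from this issue.

```python
import io

import pandas as pd

# skipfooter: drop the trailing summary line at the bottom of the file.
with_footer = "a,b\n1,2\n3,4\ntotal,6\n"
df_footer = pd.read_csv(io.StringIO(with_footer), skipfooter=1, engine="python")

# sep=None: let the Python engine sniff the delimiter (here ';').
semicolons = "a;b\n1;2\n3;4\n"
df_sniffed = pd.read_csv(io.StringIO(semicolons), sep=None, engine="python")

# Regex sep: a pipe surrounded by optional whitespace.
piped = "a | b\n1 | 2\n3 | 4\n"
df_regex = pd.read_csv(io.StringIO(piped), sep=r"\s*\|\s*", engine="python")

print(df_footer)
print(df_sniffed)
print(df_regex)
```

Passing `engine="python"` explicitly keeps the choice of parser visible in the call instead of relying on an implicit fallback.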

Features supported in the C engine only

marked as internal on C engine only (maybe be a bit louder about this in the internal code)

buffer_lines (DEPR, DOC: Deprecate buffer_lines in read_csv #13360)

Undocumented arguments to `read_csv`

doublequote (DOC: document doublequote in read_csv #13368)
compact_ints (API: Deprecate compact_ints and use_unsigned in read_csv #13323)
use_unsigned (API: Deprecate compact_ints and use_unsigned in read_csv #13323)
as_recarray (DEPR: Deprecate as_recarray in read_csv #13373)
memory_map (IO: memory_map kw in read_csv #7477; DOC, ENH: Support memory_map for Python engine #13381)

Differences

validity of names and its length with respect to usecols (API/DOC: Specification for names parameter in read_csv #16469)
different handling of na_values when converters is also present (Inconsistent Handling of na_values and converters in read_csv #13302)
different handling of columns aggregated to create date columns (API: Inconsistent handling of columns aggregated to create date columns #23845)


kawochen · 2016-03-22T14:16:16Z

I think error_bad_lines and warn_bad_lines should be one argument. They seem too intertwined now.
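
For readers on newer pandas: the sketch below assumes a release that provides the single `on_bad_lines` argument (added after this thread, in place of the two flags discussed here). The sample data is hypothetical; the second data row has one field too many.

```python
import io

import pandas as pd

raw = "a,b\n1,2\n3,4,5\n6,7\n"

# "skip" silently drops the malformed row.
skipped = pd.read_csv(io.StringIO(raw), on_bad_lines="skip")

# "error" (the default) raises on the malformed row instead.
try:
    pd.read_csv(io.StringIO(raw), on_bad_lines="error")
    raised = False
except pd.errors.ParserError:
    raised = True

print(skipped)
print("raised:", raised)
```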

jreback · 2016-03-22T14:23:22Z

thanks for the list @kawochen. There are some issues which are relevant for some of the points. Can you link them when you have a chance (put next to the check boxes)

gfyoung · 2016-04-19T16:53:23Z

@kawochen : add as_recarray to the undocumented options. This one returns a np.recarray instead of a DataFrame of the data if set to True.
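
A rough sketch of how to get the same kind of result without `as_recarray`, which was later deprecated (#13373) and may not be accepted by the pandas version you run. It assumes `DataFrame.to_records` is an acceptable substitute; the sample data is made up.

```python
import io

import numpy as np
import pandas as pd

frame = pd.read_csv(io.StringIO("a,b\n1,2.5\n3,4.5\n"))

# Convert the parsed DataFrame to a NumPy record array, leaving out the index.
records = frame.to_records(index=False)

print(type(records))
print(records.dtype.names)
```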

gfyoung · 2016-05-25T12:30:31Z

nrows issue has been fixed! Please check it off @jreback @kawochen

gfyoung · 2016-05-26T20:34:13Z

skip_footer is aliased to skipfooter AFAICT. Not sure why one hasn't been chosen yet over the other. There is a stray #deprecated comment in the code, but it doesn't look like it has been enforced.

closes #5888, xref #12686 Author: Chris <cbartak@gmail.com> Closes #13293 from chris-b1/low-memory-doc and squashes the following commits: daf9bca [Chris] DOC: low_memory in read_csv

gfyoung · 2016-05-31T13:17:10Z

@kawochen, @jreback : float_precision is very well documented in fact, both in parsers.py and io.rst.

jreback · 2016-05-31T13:17:15Z

updated float_precision & na_filter

gfyoung · 2016-05-31T13:17:39Z

@jreback : no, it's still CParser-only but just move it to the list above with an unchecked box. We would still want to give that functionality to the Python parser.

jreback · 2016-05-31T13:18:50Z

I checked the box; it's enough.

gfyoung · 2016-05-31T13:20:09Z

Huh? The original classification was that it was undocumented AND only supported in the C engine. The checkbox gives the impression that both issues are resolved.

jreback · 2016-05-31T13:22:02Z

@gfyoung better?

gfyoung · 2016-05-31T13:23:04Z

Yes! That works. Thanks, @jreback !

Title is self-explanatory. xref #12686 - I don't quite understand why these are marked (if at all) as internal to the C engine only, as the benefits of having these options accepted for the Python engine are quite clear based on the documentation I added as well. The implementation simply calls the already-written function in `pandas/parsers.pyx`. Since that function isn't specific to the `TextReader` class, crossing over to grab it from Cython (instead of duplicating it in pure Python) seems reasonable while maintaining the separation between the C and Python engines. Author: gfyoung <gfyoung17@gmail.com> Closes #13323 from gfyoung/python-engine-compact-ints and squashes the following commits: 95f7ba8 [gfyoung] ENH: Add support for compact_ints and use_unsigned in Python engine

gfyoung · 2016-06-03T22:06:11Z

@jreback : can you xref to #13349 for skip_footer? That way we know that it is in fact documented, but there are just duplicate arguments

jreback · 2016-06-03T22:09:31Z

I updated

So I wasn't 100% correct when I said in an earlier comment on #12686 that `float_precision` was documented. It was well documented internally for `TextParser` and in a section of `io.rst`, but it wasn't listed formally in the parameters for the `read_csv` documentation. Author: gfyoung <gfyoung17@gmail.com> Closes #13377 from gfyoung/float-precision-doc and squashes the following commits: a9eed16 [gfyoung] DOC: actually document float_precision in read_csv
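
As an illustration of what `float_precision` controls, here is a small sketch assuming the default C engine and the documented values `None`, `"high"`, and `"round_trip"`. The input string is an arbitrary example, not taken from this issue.

```python
import io

import pandas as pd

text = "0.1234567890123456789"
csv_data = "x\n" + text + "\n"

# Parse the same value with each converter offered by the C engine.
default_df = pd.read_csv(io.StringIO(csv_data))
high_df = pd.read_csv(io.StringIO(csv_data), float_precision="high")
round_trip_df = pd.read_csv(io.StringIO(csv_data), float_precision="round_trip")

for name, frame in [("default", default_df), ("high", high_df), ("round_trip", round_trip_df)]:
    print(name, repr(frame["x"].iloc[0]))
```

The `"round_trip"` converter should agree with Python's own `float(text)`; the other two are faster and may differ from it in the last digits.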

gfyoung · 2016-06-06T23:21:25Z

@jreback , @kawochen : as_recarray has been fixed now --> check it off! 😄

gfyoung · 2016-11-26T09:33:33Z

@jorisvandenbossche : You can check-off the dtype checkbox here too!

jreback · 2017-03-23T13:51:42Z

@gfyoung are all of the open items on the check boxes still open? (IOW have we missed checking anything off). anything we should just take off (and/or just document)?

gfyoung · 2017-03-23T20:02:51Z

All of those are valid differences that should be patched, though the implementation is not straightforward for any of them. It would be worthwhile to double check that they are properly documented for now.

jreback · 2017-03-23T22:01:54Z

thanks @gfyoung more docs always welcome!

jreback · 2017-09-23T21:05:31Z

@gfyoung can you review the top section and see where we are?

gfyoung · 2017-09-24T08:32:45Z

@jreback : At the time of commenting, this list is correct and up-to-date with our progress.

jreback · 2017-09-24T13:36:01Z

@gfyoung ok thanks. feel free to issue PR's to close some of these :>

jreback added the Enhancement, Docs, API Design, IO CSV (read_csv, to_csv), and Master Tracker (High level tracker for similar issues) labels Mar 22, 2016

jreback added this to the 0.18.1 milestone Mar 22, 2016

jreback modified the milestones: 0.18.2, 0.18.1 Apr 22, 2016

jreback mentioned this issue Apr 22, 2016

ENH: Python parser now accepts delim_whitespace=True #12958

Closed

jreback modified the milestones: 0.19.0, 0.18.2 May 25, 2016

chris-b1 mentioned this issue May 26, 2016

DOC: low_memory in read_csv #13293

Closed

2 tasks

jreback pushed a commit that referenced this issue May 26, 2016

DOC: low_memory in read_csv

4b05055


This was referenced May 27, 2016

Inconsistent Handling of na_values and converters in read_csv #13302

Open

API: Deprecate compact_ints and use_unsigned in read_csv #13323

Closed

gfyoung mentioned this issue Jun 6, 2016

DOC: actually document float_precision in read_csv #13377

Closed

This was referenced Jul 31, 2016

read_csv dtype argument not working when there is a footer #5232

Closed

read_csv character encoding bug? #2741

Closed

chris-b1 mentioned this issue Sep 24, 2016

API: add dtype= option to python parser #14295

Merged

4 tasks

jreback modified the milestones: 0.21.0, 0.20.0 Mar 23, 2017

gfyoung mentioned this issue Mar 26, 2017

DOC: Explain differences further for sep parameter #15804

Merged

gfyoung mentioned this issue Apr 6, 2017

ENH: Support malformed row handling in Python engine #15925

Merged

jreback modified the milestones: 0.21.0, Next Major Release Sep 23, 2017

jreback modified the milestones: Next Major Release, Admin, High Level Issue Tracking Sep 24, 2017

TomAugspurger removed the Master Tracker High level tracker for similar issues label Jul 6, 2018

TomAugspurger removed this from the High Level Issue Tracking milestone Jul 6, 2018

jbrockmendel added the API - Consistency Internal Consistency of API/Behavior label Sep 22, 2020

mroeschke removed the API Design label Apr 23, 2021
