ENH: add read_csv sniffing (sep=None) for C engine #9645

Open
jorisvandenbossche opened this issue Mar 13, 2015 · 6 comments
Labels
Enhancement IO CSV read_csv, to_csv

Comments

@jorisvandenbossche
Member

If you now do pd.read_csv('myfile.csv', sep=None), you get the warning:

ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support sep=None with delim_whitespace=False; you can avoid this warning by specifying engine='python'

Is this warning correct, i.e. is sep=None really not implemented for the C engine?
If so, support for it could perhaps be added.

@iskandr

iskandr commented Oct 27, 2015

👍 I would also appreciate this feature.

@gfyoung
Member

gfyoung commented May 25, 2016

@jorisvandenbossche : The C engine doesn't support it because we rely on Python's csv library to do the sniffing for us. This could easily be transferred to the C engine, but would that be too much coupling between the CParser and the PythonParser? I am not sure. On the one hand, an implementation in C would be faster; on the other, the library's Python implementation is very thorough in how it determines the sniffed delimiter. Is it worth trying to replicate such an implementation in C?
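
For reference, here is a minimal sketch of the kind of sniffing Python's csv library does. The sample text and the candidate delimiter list are made up for illustration; this is not a copy of what PythonParser does internally.

```python
import csv

# Made-up sample data for illustration only.
sample = "a;b;c\n1;2;3\n4;5;6\n"

# Restricting the candidates is an assumption of this sketch; it keeps
# the guess from landing on a letter or digit in very short samples.
dialect = csv.Sniffer().sniff(sample, delimiters=",;\t|")

print(repr(dialect.delimiter))
```

The sniffer counts how often each candidate character appears per line and picks the one whose count is consistent across lines, which is the part that would have to be replicated in C.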

cc @jreback

@jreback
Contributor

jreback commented May 25, 2016

sniffing is quite cheap and you don't have to implement it at all
just use the sniffer
iirc it only reads a few lines

this can be done before the engine actually opens the file as a preprocessing step (and make the interface uniform between the engines)
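
A rough sketch of what such a preprocessing step might look like, using only the standard library. The helper names `sniff_delimiter` and `read_rows`, the five-line sample size, and the candidate list are all hypothetical; in pandas the detected delimiter would presumably be handed to whichever engine is selected, instead of to `csv.reader` as done here.

```python
import csv
import os
import tempfile
from itertools import islice


def sniff_delimiter(path, n_lines=5, candidates=",;\t|"):
    """Guess the delimiter from the first few lines of a file (hypothetical helper)."""
    with open(path, newline="") as fh:
        sample = "".join(islice(fh, n_lines))
    return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter


def read_rows(path):
    """Sniff first, then parse with an explicit delimiter."""
    delimiter = sniff_delimiter(path)
    with open(path, newline="") as fh:
        return list(csv.reader(fh, delimiter=delimiter))


# Demo with a made-up tab-separated file.
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", delete=False, newline=""
) as tmp:
    tmp.write("x\ty\n1\t2\n3\t4\n")

detected = sniff_delimiter(tmp.name)
rows = read_rows(tmp.name)
os.remove(tmp.name)

print(repr(detected), rows)
```

Because the delimiter is resolved before any parser sees the file, the parsing step only ever receives an explicit separator, which is what would make the interface the same for both engines.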

@gfyoung
Member

gfyoung commented May 25, 2016

@jreback : PythonParser first iterates through all of the skipped rows until it reaches a row that should be read, and then sniffs the delimiter from that line. Let me see if something similar can be done with the C engine.
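
Roughly, the skip-then-sniff order described above could be sketched like this. It is a simplified illustration, not the actual PythonParser code: `sniff_first_data_line` is a hypothetical name, `skiprows` is treated as a plain count, and the input lines are made up.

```python
import csv
from itertools import islice


def sniff_first_data_line(lines, skiprows=0, candidates=",;\t|"):
    """Skip `skiprows` lines, then sniff the delimiter from the next one."""
    remaining = islice(lines, skiprows, None)
    first = next(remaining, None)
    if first is None:
        raise ValueError("no data left after skipping rows")
    # With a single line there is little to compare against, so this
    # sketch restricts the sniffer to a short candidate list.
    return csv.Sniffer().sniff(first, delimiters=candidates).delimiter


# Made-up input: two comment lines followed by pipe-separated data.
raw = ["# exported file\n", "# units: mm\n", "a|b|c\n", "1|2|3\n"]
delimiter = sniff_first_data_line(raw, skiprows=2)

print(repr(delimiter))
```

Sniffing only after the skipped rows matters because header comments often contain commas or semicolons that have nothing to do with the real separator.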

@jreback
Contributor

jreback commented May 25, 2016

that's fine

this is an explicit action

@gfyoung
Member

gfyoung commented May 25, 2016

Hmm...this is not as simple to do without duplicating a good amount of code from the PythonParser. I need to see if there is a nice way to refactor this.

@mroeschke removed this from the Someday milestone on Oct 13, 2022
5 participants