
API: formalize the pandas IO API #15862

Closed
jreback opened this issue Apr 2, 2017 · 4 comments
Labels
API Design, Docs, IO Data

Comments

@jreback
Contributor

jreback commented Apr 2, 2017

#15838 (comment)

we have fairly uniform IO routines of the form

df.to_format(path, **kwargs) (called on a DataFrame, writes it out)
and
pd.read_format(path, **kwargs) (returns DataFrame)
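
For illustration, here is a minimal sketch of that naming convention in current pandas. It assumes only csv and pickle as example formats, and it uses `getattr` purely to show that the writer and reader names line up. It is not a proposed API.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
results = {}

with tempfile.TemporaryDirectory() as tmp:
    # (format name, extra writer kwargs); csv needs index=False to round-trip cleanly
    for fmt, write_kwargs in [("csv", {"index": False}), ("pickle", {})]:
        path = os.path.join(tmp, f"data.{fmt}")
        # writer: a DataFrame method named to_<format>
        getattr(df, f"to_{fmt}")(path, **write_kwargs)
        # reader: a top-level function named read_<format>
        results[fmt] = getattr(pd, f"read_{fmt}")(path)

for fmt, frame in results.items():
    print(fmt, frame.shape)
```

Even in this tiny case the kwargs differ per format (csv needs `index=False`, pickle does not). That is the kind of per-format difference a documented contract would have to spell out.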

so we should document various aspects of this:

  • contract on input path strings
  • file-like objects & is_file_like (ENH: Add file buffer validation to I/O ops #15894)
  • whether we do any encoding / compression (currently only for csv/json)
  • various guarantees on what we are sending in (e.g. no Index, unique string column names, no non-string objects; see the feather/parquet impl).
  • make these more pluggable
  • perhaps allow a specification for block access / chunking.
  • additional args to accept/use: mode (for writing)
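
A rough sketch of how several of these points look today for csv, assuming a recent pandas (`compression="infer"` became the `to_csv` default in 0.24). `validate_frame_for_writer` is a hypothetical helper written for this example. It is only loosely in the spirit of the checks the feather writer does and is not an existing pandas function.

```python
import io
import os
import tempfile

import pandas as pd
from pandas.api.types import is_file_like

df = pd.DataFrame({"a": [1, 2, 3, 4], "b": ["w", "x", "y", "z"]})

# file-like objects: writers and readers accept an open buffer as well as a path
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)  # rewind before reading back
from_buffer = pd.read_csv(buf)
print(is_file_like(buf), is_file_like("data.csv"))  # True False

with tempfile.TemporaryDirectory() as tmp:
    # compression: compression="infer" (the default) picks gzip from the ".gz" suffix
    gz_path = os.path.join(tmp, "data.csv.gz")
    df.to_csv(gz_path, index=False)
    from_gzip = pd.read_csv(gz_path)

    # chunking: chunksize makes read_csv return an iterator of DataFrames
    chunk_lengths = [len(chunk) for chunk in pd.read_csv(gz_path, chunksize=3)]

    # write mode: mode="a" appends to an existing file instead of overwriting it
    csv_path = os.path.join(tmp, "data.csv")
    df.to_csv(csv_path, index=False)
    df.to_csv(csv_path, mode="a", header=False, index=False)
    appended = pd.read_csv(csv_path)


def validate_frame_for_writer(frame):
    """Hypothetical pre-write check: default index, unique string column names."""
    if not frame.index.equals(pd.RangeIndex(len(frame))):
        raise ValueError("only a default index is supported; call .reset_index() first")
    if frame.columns.has_duplicates:
        raise ValueError("column names must be unique")
    if not all(isinstance(name, str) for name in frame.columns):
        raise ValueError("column names must be strings")


validate_frame_for_writer(df)  # passes: default index, unique string columns
```

Each of these behaviours (buffer support, compression inference, `chunksize`, `mode`) currently exists only on some readers and writers. That is why a documented contract per point would help.
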
@h-vetinari
Contributor

This should IMO also extend to generic write methods pd.to_xxx. For example, as of v0.23.1, there are:

  • pd.to_msgpack(path, *args, **kwargs), where *args is an arbitrary number of objects to pack,
  • while pd.to_pickle(obj, path, ...) switches the order around (and it would be possible to pass an arbitrary number of objects that are then wrapped in a tuple),
  • and the other pd.to_xxx do not exist at all.

It may well be that it makes no sense to have a generic method for all formats (like CSV), but those that are around should have a unified interface as well.
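
As a sketch of what "unified" could mean, the hypothetical `write` helper below (not a pandas function) keeps the object-first, path-second order of `pd.to_pickle` and dispatches to the existing `to_<format>` method. `pd.to_msgpack` is left out because it was removed in pandas 1.0.

```python
import os
import tempfile

import pandas as pd


def write(obj, path, fmt, **kwargs):
    """Hypothetical generic writer: object first, then path, like pd.to_pickle."""
    writer = getattr(obj, f"to_{fmt}", None)
    if writer is None:
        raise ValueError(f"{type(obj).__name__} has no to_{fmt} method")
    return writer(path, **kwargs)


s = pd.Series([1.5, 2.5], name="x")
with tempfile.TemporaryDirectory() as tmp:
    p1 = os.path.join(tmp, "a.pkl")
    p2 = os.path.join(tmp, "b.pkl")
    pd.to_pickle(s, p1)     # existing top-level function: (obj, path)
    write(s, p2, "pickle")  # hypothetical helper with the same argument order
    same = pd.read_pickle(p1).equals(pd.read_pickle(p2))

print(same)
```

Dispatching to the methods means the generic function inherits whatever each format supports, instead of growing a second, divergent signature.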

@TomAugspurger
Contributor

@jreback do you think there's anything here that's a blocker for 1.0?

@jreback
Contributor Author

jreback commented Dec 31, 2019

nothing immediately obvious

likely need to audit all of the IO methods for conformity and show a matrix of which APIs we have for each
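
A starting point for such a matrix could be generated from the signatures themselves. This sketch assumes that `inspect.signature` reflects the real parameters, which holds for these functions because pandas' decorators use `functools.wraps`. The target and parameter lists are only an example selection.

```python
import inspect

import pandas as pd

# example selection; extend with every read_*/to_* routine to audit
TARGETS = {
    "read_csv": pd.read_csv,
    "read_json": pd.read_json,
    "read_pickle": pd.read_pickle,
    "read_parquet": pd.read_parquet,
    "DataFrame.to_csv": pd.DataFrame.to_csv,
    "DataFrame.to_json": pd.DataFrame.to_json,
    "DataFrame.to_parquet": pd.DataFrame.to_parquet,
}
PARAMS = ["compression", "encoding", "chunksize", "mode", "storage_options"]


def signature_matrix(targets, params):
    """Return {name: {param: bool}} based on each callable's signature."""
    matrix = {}
    for name, func in targets.items():
        accepted = set(inspect.signature(func).parameters)
        matrix[name] = {p: p in accepted for p in params}
    return matrix


matrix = signature_matrix(TARGETS, PARAMS)
name_width = max(len(name) for name in matrix)
col_width = max(len(p) for p in PARAMS)
print(" " * name_width, *(p.ljust(col_width) for p in PARAMS))
for name, row in matrix.items():
    cells = (("yes" if row[p] else "-").ljust(col_width) for p in PARAMS)
    print(name.ljust(name_width), *cells)
```

This only shows which keywords exist, not whether they behave the same across formats. The semantic part of the audit would still have to be done by hand.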

@TomAugspurger TomAugspurger modified the milestones: 1.0, Contributions Welcome Jan 2, 2020
@mroeschke
Member

I think this is a duplicate of #15008, as they cover similar points. Going to close this issue in favor of that one


4 participants