ENH: categorical dataexport - graceful degradation #8633

Closed
fkaufer opened this issue Oct 25, 2014 · 3 comments · Fixed by #8767

@fkaufer

fkaufer commented Oct 25, 2014

It would be great to apply graceful degradation generally when exporting categorical data, instead of raising exceptions.

Currently this is only the case for to_sql and to_csv, which export the category values, while to_pickle is the only option that persists categorical data as such.

For HDF and Stata, export currently fails:

  • to_hdf: NotImplementedError: cannot store a category dtype
  • to_stata: ValueError: Data type category not currently understood. Please report an error to the developers.

As long as a backend does not support categoricals, or the conversion is not yet implemented, why not export the category values as a general fallback? With the separately discussed decode method (#8628), this would be easy. If the same rigor (the backend either supports the data type natively or fails) were applied to CSV I/O, we could only export string dtypes to CSV.
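A minimal sketch of the proposed fallback, assuming "export categories" means writing each value's label. The helper name `decode_categoricals` is hypothetical (the decode method from #8628 is not assumed to exist); the sketch relies only on `astype(object)`.

```python
import pandas as pd


def decode_categoricals(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with every categorical column replaced by its labels.

    Hypothetical helper; it stands in for the decode idea discussed in #8628.
    """
    out = df.copy()
    for name in out.columns:
        if isinstance(out[name].dtype, pd.CategoricalDtype):
            # astype(object) yields the category labels, not the integer codes;
            # missing values stay missing.
            out[name] = out[name].astype(object)
    return out


df = pd.DataFrame({
    "size": pd.Categorical(["low", "high", "low"], categories=["low", "mid", "high"]),
    "n": [1, 2, 3],
})
plain = decode_categoricals(df)  # now writable by any backend that handles strings
print(plain.dtypes)
```

Unused categories (here "mid") and any ordering do not survive this conversion.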

Thinking one step further, the to_... functions could have an optional parameter named something like convert_cat with options:

  • None: either try to export as a categorical (pickle, potentially HDF, Stata) or raise an exception
  • 'category': export only the category labels (decode method)
  • 'code': export s.cat.codes
  • 'mapping' or 'emulate': export the codes together with the code-to-category mapping, either in one or two columns or as a separate table/frame/...

The last option would probably need additional parameters to control the technical implementation (e.g. a table name for the mapping, or suffixes as in join/merge, ...).
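A sketch of how the proposed convert_cat option might behave. Both `convert_cat` and `apply_convert_cat` are hypothetical names from this proposal, not pandas parameters; for 'mapping' the sketch simply returns one code/category lookup frame per column instead of choosing among the one-column, two-column, and separate-table layouts.

```python
from typing import Dict, Optional, Tuple

import pandas as pd


def apply_convert_cat(
    df: pd.DataFrame, convert_cat: Optional[str] = None
) -> Tuple[pd.DataFrame, Dict[str, pd.DataFrame]]:
    """Hypothetical sketch of the proposed convert_cat option.

    Returns the frame to hand to a writer and, for 'mapping', one
    code -> category lookup frame per categorical column.
    """
    if convert_cat is None:
        # Leave the frame alone: the backend stores categoricals natively or raises.
        return df, {}
    if convert_cat not in ("category", "code", "mapping"):
        raise ValueError(f"unknown convert_cat: {convert_cat!r}")

    out = df.copy()
    mappings: Dict[str, pd.DataFrame] = {}
    for name in df.columns:
        if not isinstance(df[name].dtype, pd.CategoricalDtype):
            continue
        if convert_cat == "category":
            out[name] = df[name].astype(object)  # labels only
        else:
            out[name] = df[name].cat.codes  # integer codes; -1 marks a missing value
            if convert_cat == "mapping":
                cats = list(df[name].cat.categories)
                mappings[name] = pd.DataFrame({"code": range(len(cats)), "category": cats})
    return out, mappings


df = pd.DataFrame({
    "size": pd.Categorical(["low", "high", None], categories=["low", "mid", "high"]),
})
codes, maps = apply_convert_cat(df, "mapping")
print(codes["size"].tolist())  # [0, 2, -1]
print(maps["size"])
```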

@fkaufer fkaufer closed this as completed Oct 25, 2014
@fkaufer fkaufer reopened this Oct 25, 2014
@jreback
Contributor

jreback commented Oct 25, 2014

See #7621 for the master issue.

Of course, user contributions that extend support to these formats are appreciated.
You can simply convert to object if you really need to do this at the moment.
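A sketch of that explicit workaround: convert the categorical column to object before exporting, and keep its `CategoricalDtype` so the categories and ordering can be restored after reading. CSV through an in-memory buffer is used only to keep the example self-contained; the same pattern should apply to other writers.

```python
import io

import pandas as pd

df = pd.DataFrame({
    "size": pd.Categorical(["low", "high", "low"], categories=["low", "mid", "high"], ordered=True),
    "n": [1, 2, 3],
})

# Keep the dtype (categories and ordering) before converting it away.
size_dtype = df["size"].dtype

# Explicit conversion: the categorical column becomes plain object labels.
flat = df.astype({"size": object})

buf = io.StringIO()
flat.to_csv(buf, index=False)
buf.seek(0)

# The labels come back as strings; re-applying the saved dtype restores the categorical.
restored = pd.read_csv(buf)
restored["size"] = restored["size"].astype(size_dtype)
```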

@jreback jreback closed this as completed Oct 25, 2014
@fkaufer
Author

fkaufer commented Oct 25, 2014

Sorry, I searched but somehow missed #7621; otherwise I would have commented there. Nevertheless, I think my point is a bit different: I suggest a rather simple, generic fallback mechanism whenever there is no dedicated backend support.

Yes, of course explicit conversion works, but since that is the natural generic approach, why not apply something like it internally as a last resort instead of raising NotImplementedError?
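A sketch of the "last resort" behavior proposed here: try the native writer, and only if it raises `NotImplementedError` for a frame with categorical columns, retry with those columns converted to object. `write_with_fallback` and `strict_writer` are hypothetical; `strict_writer` merely imitates a backend without categorical support. Emitting a warning is an added assumption, meant to keep the degradation visible rather than silent.

```python
import warnings
from typing import Any, Callable

import pandas as pd


def write_with_fallback(df: pd.DataFrame, writer: Callable[[pd.DataFrame], Any]) -> Any:
    """Try the native writer first; on NotImplementedError, decode categoricals and retry."""
    try:
        return writer(df)
    except NotImplementedError:
        cat_cols = [c for c in df.columns if isinstance(df[c].dtype, pd.CategoricalDtype)]
        if not cat_cols:
            raise  # the failure was not caused by categoricals
        # Warn instead of degrading silently, so the lost dtype stays visible.
        warnings.warn(f"exporting {cat_cols} as plain labels; category dtype not preserved")
        return writer(df.astype({c: object for c in cat_cols}))


def strict_writer(df: pd.DataFrame) -> list:
    """Hypothetical stand-in for a backend without categorical support."""
    for c in df.columns:
        if isinstance(df[c].dtype, pd.CategoricalDtype):
            raise NotImplementedError("cannot store a category dtype")
    return df.to_dict("records")


df = pd.DataFrame({"size": pd.Categorical(["low", "high"]), "n": [1, 2]})
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    rows = write_with_fallback(df, strict_writer)
print(rows)
```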

I would love to contribute, but I barely have time to report issues.

@jreback
Contributor

jreback commented Oct 25, 2014

@fkaufer Well, if I am going to implement graceful degradation, I might as well implement the actual serialization; it requires nearly the same tests and effort. NotImplementedError is just an explicit stop-gap until it works. It's how an unimplemented feature is signaled to the user.
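For comparison, a sketch of what actual serialization needs: the integer codes plus the categories and the ordered flag, from which `pd.Categorical.from_codes` rebuilds the column exactly. JSON stands in for a backend with no native categorical type; the helper names are hypothetical.

```python
import json

import pandas as pd


def categorical_to_json(s: pd.Series) -> str:
    """Serialize a categorical Series as codes plus its dtype metadata (hypothetical helper)."""
    return json.dumps({
        "name": s.name,
        "codes": s.cat.codes.tolist(),            # -1 encodes a missing value
        "categories": s.cat.categories.tolist(),  # includes unused categories
        "ordered": bool(s.cat.ordered),
    })


def categorical_from_json(text: str) -> pd.Series:
    """Rebuild the categorical Series from the JSON payload (hypothetical helper)."""
    payload = json.loads(text)
    cat = pd.Categorical.from_codes(
        payload["codes"], categories=payload["categories"], ordered=payload["ordered"]
    )
    return pd.Series(cat, name=payload["name"])


s = pd.Series(
    pd.Categorical(["low", None, "high"], categories=["low", "mid", "high"], ordered=True),
    name="size",
)
text = categorical_to_json(s)
back = categorical_from_json(text)
```

Unlike the label-only fallback, this round trip keeps unused categories, missing values, and ordering.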

I do appreciate the issue reports. Serialization was deferred for lack of time.

So this is the same issue (#7621), since I don't think degradation is worth it: it just hides the problem from the user, which is not good.

bashtage added a commit to bashtage/pandas that referenced this issue Nov 12, 2014
Add support for exporting DataFrames containing categorical data.

closes pandas-dev#8633
xref pandas-dev#7621
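For reference, current pandas versions can export categorical columns to Stata: `to_stata` writes integer codes with value labels, and `read_stata(convert_categoricals=True)` maps them back. This sketch shows present-day behavior and is not a claim about the exact contents of the commit above.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({
    "size": pd.Categorical(["low", "high", "low", "mid"], categories=["low", "mid", "high"]),
    "n": [1, 2, 3, 4],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.dta")
    # Categorical columns are written as integer codes plus Stata value labels.
    df.to_stata(path, write_index=False)
    # convert_categoricals=True (the default) turns value labels back into a categorical.
    back = pd.read_stata(path, convert_categoricals=True)
```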