ENH: support .astype('category') on DataFrame / aka co-factorization #12860
Comments
I'm not so sure what you are proposing here? That For my usecase My usecase is more:
well, you would oftentimes do this on a sub-set I think, e.g. the reason I bring this up is whether we should form the uniques FIRST before conversions, IOW if As opposed to individually create them per-column.
IMO constructing the categories from all uniques makes sense. [How would one merge these subsets back into the original DF? Dropping the old columns and merging the new ones back in? That sounds like a lot of work which ends up as long as the for loop.]
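One way to avoid the drop-and-merge step raised above is to assign the converted columns back by name. The following is only a sketch: `cofactorize` is a hypothetical helper (not a pandas function), the example frame is made up, and missing values are simply excluded from the shared categories.

```python
import pandas as pd


def cofactorize(df, columns):
    """Hypothetical helper: cast `columns` to one shared categorical dtype."""
    # Form the uniques FIRST, across all selected columns.
    uniques = pd.unique(df[columns].to_numpy().ravel())
    # Categories may not contain missing values, so drop them here.
    uniques = uniques[~pd.isna(uniques)]
    shared = pd.CategoricalDtype(categories=uniques)

    out = df.copy()
    for col in columns:
        # Column-wise assignment; untouched columns keep their dtype.
        out[col] = out[col].astype(shared)
    return out


# Illustrative data only.
df = pd.DataFrame({"A": list("abca"), "B": list("bccd"), "n": [1, 2, 3, 4]})
result = cofactorize(df, ["A", "B"])
```

Because the assignment happens column by column on a copy, no merge back into the original frame is needed; the categories follow order of first appearance rather than sorted order.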
Here is a complete example
Note this can actually be implemented in a more performant way via https://github.com/pandas-dev/pandas/blob/master/pandas/core/reshape/merge.py#L1453
xref to #10696, #8709
We don't allow an astype of a DataFrame to category directly
Instead you can apply the astype per-column.
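As a minimal sketch of the per-column approach (the frame and its column names are made up for illustration, not taken from the original issue):

```python
import pandas as pd

# Hypothetical example frame; column names and values are illustrative.
df = pd.DataFrame({"A": list("abca"), "B": list("bccd")})

per_column = df.copy()
for col in per_column.columns:
    # Each column is factorized on its own, so the categories can differ.
    per_column[col] = per_column[col].astype("category")

cats_a = list(per_column["A"].cat.categories)
cats_b = list(per_column["B"].cat.categories)
```

Here `A` and `B` end up with different category sets, so their dtypes are not interchangeable.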
But if you have 'similar' categories then you would usually do this, automatically
astyping with the same uniques.
This is fairly straightforward to actually implement, and I think is a nice easy way of coding, w/o having to actually support 2D categoricals internally (and we are moving away from internal 2-d structures anyhow).
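A rough sketch of what "astyping with the same uniques" could look like, assuming a small made-up frame with no missing values. This illustrates the idea only; it is not the implementation proposed for pandas itself.

```python
import pandas as pd

# Illustrative data only.
df = pd.DataFrame({"A": list("abca"), "B": list("bccd")})

# Build one set of categories from every value in the frame.
all_values = pd.unique(df.to_numpy().ravel())
shared = pd.CategoricalDtype(categories=sorted(all_values))

# Cast each column to the same dtype; no 2D categorical is involved.
co = pd.DataFrame({col: df[col].astype(shared) for col in df.columns})

# With shared categories, integer codes are comparable across columns.
same_code = co["A"].cat.codes == co["B"].cat.codes
```

Since every column carries the same `CategoricalDtype`, a code such as `2` refers to the same label in `A` and in `B`, which is not guaranteed when each column is converted on its own.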