-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API: deprecate setting of .ordered directly (GH9347, GH9190) #9622
Conversation
LGTM... :-) |
@@ -281,18 +290,26 @@ raised. | |||
s.sort() | |||
except TypeError as e: | |||
print("TypeError: " + str(e)) | |||
s = Series(["a","b","c","a"], dtype="category") # ordered per default! | |||
s = Series(["a","b","c","a"]).astype('category',ordered=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: missing a space before ordered=True
Grammar nits aside, what happened to the idea of using
This probably reflects my ignorance of groupby internals, but to me, this looks more like a bug in
Also, I didn't see a clear answer to @jorisvandenbossche's question about what |
The sort only affects
I think the groupby |
To all, could you also react to my comment here (the previous PR):#9611 (comment) Why not allowing sorting? Because, sorting can also be seen as just the operation of putting the same categories together in the series. And this is something I potentially also want to do with unordered categories, without having to explicitely give an order to the categories. |
ahh so, you want to all sort/argsort on even unordered categoricals, but disallow min/max. its fine by me as it makes less things break. its a bit non-strict, but as long as its clear that we order by categories I don't see any harm. (but THAT is a change yes, not the order of appearance, but in order of the categoricals)? |
@jorisvandenbossche Hmm, interesting. I don't know if I have an educated opinion here. It does seem reasonable to me (especially since it's what R does). |
@shoyer I also have to admit I don't really have an educated opinion. I was just triggered to look at it due to the discussion in the other PR, and then saw the behaviour of R. But the more I think about it, the more it seems natural to want to order a categorical even if is not strictly 'ordered'. |
ok, so bottom line is that we allowe all operations on both ordered/unordered except for min/max? |
cc @jseabold @mwaskom @njsmith (the question is about allowing to sort an unordered categorical, see my comment above #9622 (comment) and here #9611 (comment)) any opinions? |
I updated [here]jreback@7adad3b) |
add set_ordered method for setting ordered default for Categorical is now to NOT order unless explicity specified whatsnew doc updates for categorical api changes add ability to specify keywords to astype for creation defaults fix issue with grouping with sort=True on an unordered Categorical update categorical.rst docs test unsortable when ordered=True v0.16.0.txt / release notes updates clean up check for ordering allow groupby to work on an unordered categorical
API: deprecate setting of .ordered directly (GH9347, GH9190)
we could always change later, but this needs to go in the rc. |
alternate to #9611
closes #9347
closes #9190
closes #9148
This implementes the alternate, and IMHO a nice compromise.
Groupby's will succeed whether ordered or not
ordering ops (sort/argsort/min/max) will still raise.
one though is that we could detect a non-reduction where the ordering matters (maybe we do this in another PR), and show a warning, e.g. imagine
df.groupby('A').head(2)
is technically 'wrong', whiledf.groupby('A').sum()
would have no warning.