-
-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API: deprecate setting of .ordered directly (GH9347, GH9190) #9611
Conversation
f73312b
to
d4e9254
Compare
One point to note. supplying a
|
|
||
In [2]: df['A'].order() | ||
TypeError: Categorical not ordered | ||
you can use .set_ordered(True) to change the Categorical to an ordered one |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the wording suggest that set_ordered(True)
is inplace and it exposes the underlying object name instead of talking about "categorical data". On the other hand, I tried to get a better wording for 5 min, but couldn't find one :-(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
trying to discourage the inplace
option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just saw that set_categories()
has the same issue... ok, I drop my case here :-)
The original discussion in #9347, which the current impl.
@JanSchulz does this look right? |
No, the first example is what #9347 is/was all about and which was kind of rejected by all "option 4" votes:
|
ok, so I should leave the sorting alone, then just set the |
ok, if I change back to what was there before (e.g. this should be option 4)
|
yep |
cat | ||
cat.ordered | ||
|
||
# you can set in the construtor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: "constru_c_tor"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@jreback @JanSchulz I don't have time today to look at this in more detail, so if possible, can you keep it open a bit longer? But already one thing: getting errors with a plain groupby does really not look nice in my opinion. Can't we just sort it in the 'order' of the categories? I know it is not 'ordered' in the sense that the first element is regarded as lower as the last etc, but still, you have the 'order' of the categories in the |
So here is a possible compromise. automatically ordering basically violates the API guarantee that we shouldn't allow sorting (or sort-like operations) on unordered Categoricals. But could simply do an implicit
|
latest commit changes all of the raises to
|
5c61bc5
to
bba7f56
Compare
add set_ordered method for setting ordered default for Categorical is now to NOT order unless explicity specified
update categorical.rst docs test unsortable when ordered=True
show an OrderingWarning but let the operation succeed
I don't like the "warn + change instead of raise"-thing... The old thing (raise in groupby) is not so nice, but at least it's not changing a datatype. The way I think about categorical data is like an user defined integer: you have certain valid values (0...INTMAX), maybe an order (0<1, etc). If we had a "unorderable int", it would be bad if The only place I find that "not so nice" is the groupby and I think that is more of a problem of the default of the groupby. I see several ways to fix that, but all have drawbacks:
My preference would be |
I need to look into this more deeply, but my inclination would be to switch to |
this commit here implementes the and then we have the groupby back to 'working' (its doing what it was doing before we changed the default), but you force
|
If you guys can have a look today would be gr8. Need this for the rc. @JanSchulz made the list above. I think |
I think there is another option that was not included in @JanSchulz his list for possible fixes:
But to put in the other way around: why not sorting an unordered categorical? It is a convenience to be able to do this (see eg the groupby case), and what harm does it do that this works without having to convert it to an ordered categorical? Just sorting the values in the order as it is in the @jreback In the case of using |
@jorisvandenbossche I like your soln. It basically doesn't care on groupby's which is fine; they will be ordered as they are ordered now (and not custom ordered like ordered categoricals). If you think about it we actually have 2 dtypes here (ordered & unorderded) categoricals (kind of like a sub-dtype). |
closing in favor of #9622 |
closes #9347
closes #9190
closes #9148
so this is option 4).
Default is now
ordered=False
(independently of whether you specify categories or not)The ordering is still what factorize does (seen order, if no categories are specified).
Add
set_ordered
to change theordered
flag, which by default returns a new object.Further added the ability for astype to pass on keywords to the constructor
Furthermore I put a warning as seemingly simple operations will fail because the user had an implicity ordered Categorical.