API: deprecate setting of .ordered directly (GH9347, GH9190) #9622

Merged
merged 2 commits into from
Mar 11, 2015

Conversation

jreback
Contributor

@jreback jreback commented Mar 9, 2015

alternate to #9611
closes #9347
closes #9190
closes #9148

This implements the alternative, which is IMHO a nice compromise.

Groupbys will succeed whether the Categorical is ordered or not.
Ordering ops (sort/argsort/min/max) will still raise.

One thought is that we could detect a non-reduction where the ordering matters (maybe we do this in another PR) and show a warning, e.g. imagine df.groupby('A').head(2) is technically 'wrong' and would warn, while df.groupby('A').sum() would have no warning.

In [1]: df = DataFrame({ 'A' : Series(list('aabc')).astype('category'), 'B' : np.arange(4) })

In [2]: df.groupby('A').sum()
Out[2]: 
   B
A   
a  1
b  2
c  3

In [3]: df['A'].order()

TypeError: Categorical is not ordered for operation argsort
you can use .as_ordered() to change the Categorical to an ordered one
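
For readers on a current pandas, here is a minimal sketch of the same split between grouping and order-dependent reductions. It assumes a recent release: `observed=False` is passed only to pin the groupby behaviour across versions, and `min()` stands in for the order-dependent operation because `.order()` no longer exists in later releases.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": pd.Series(list("aabc")).astype("category"),  # unordered by default
    "B": np.arange(4),
})

# Grouping only needs to know which rows share a category,
# so it works on an unordered Categorical.
totals = df.groupby("A", observed=False).sum()
print(totals)

# A reduction that depends on an ordering is refused on an unordered Categorical.
try:
    df["A"].min()
    min_raised = False
except TypeError as exc:
    min_raised = True
    print("TypeError:", exc)

# After opting in to an ordering, the same reduction is allowed.
ordered = df["A"].cat.as_ordered()
print(ordered.min())
```

The exact wording of the error message may differ between versions; only the `TypeError` itself is relied on here.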

@jreback jreback added the Categorical Categorical Data Type label Mar 9, 2015
@jreback jreback added this to the 0.16.0 milestone Mar 9, 2015
@jreback
Contributor Author

jreback commented Mar 9, 2015

@jankatins
Contributor

LGTM... :-)

@jreback jreback changed the title Cat fix API: deprecate setting of .ordered directly (GH9347, GH9190) Mar 10, 2015
@@ -281,18 +290,26 @@ raised.
s.sort()
except TypeError as e:
print("TypeError: " + str(e))
s = Series(["a","b","c","a"], dtype="category") # ordered per default!
s = Series(["a","b","c","a"]).astype('category',ordered=True)
Member


nit: missing a space before ordered=True

@shoyer
Member

shoyer commented Mar 10, 2015

Grammar nits aside, what happened to the idea of using sort=None in groupby? It seems a little bad to let sort=True pass silently for unordered categories.

One thought is that we could detect a non-reduction where the ordering matters (maybe we do this in another PR) and show a warning, e.g. imagine df.groupby('A').head(2) is technically 'wrong' and would warn, while df.groupby('A').sum() would have no warning.

This probably reflects my ignorance of groupby internals, but to me, this looks more like a bug in GroupBy.head -- it doesn't respect sort=True:

In [17]: df = pd.DataFrame({'A': [2, 1, 0, 0], 'B': [0, 1, 2, 3]})

In [18]: df.groupby('A', sort=True).head(1)
Out[18]:
   A  B
0  2  0
1  1  1
2  0  2

Also, I didn't see a clear answer to @jorisvandenbossche's question about what sort=None means for the order of unordered categories. It looks like you're just using the existing categories in that case, which is fine by me -- I don't think this particularly matters. Though again, I think sort=True should raise in that case.

@jreback
Contributor Author

jreback commented Mar 10, 2015

The sort only affects first and the like (my example above was wrong). .head() is a filter.

In [17]: df = DataFrame(np.random.randint(1, 10, (100, 2)),dtype='int64')

In [18]: df.groupby(df[1]).first()
Out[18]: 
   0
1   
1  7
2  6
3  9
4  6
5  7
6  7
7  1
8  9
9  6

In [19]: df.groupby(df[1],sort=False).first()
Out[19]: 
   0
1   
4  6
6  7
5  7
9  6
2  6
8  9
3  9
7  1
1  7

I think the groupby sort=None is not really an elegant solution and introduces some boilerplate.
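
The filter-versus-reduction distinction can be sketched on the small frame from the earlier comment. This is an illustration only, assuming a recent pandas: `head` returns the original rows in their original order, while `first` returns one row per group and so is the place where `sort` becomes visible.

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 1, 0, 0], "B": [0, 1, 2, 3]})

# head() is a filter: it keeps the original index and row order.
head_rows = df.groupby("A", sort=True).head(1)
print(head_rows)

# first() is a reduction: the result is indexed by group key,
# so sort=True orders the keys and sort=False keeps order of appearance.
first_sorted = df.groupby("A", sort=True).first()
first_unsorted = df.groupby("A", sort=False).first()
print(first_sorted.index.tolist())    # [0, 1, 2]
print(first_unsorted.index.tolist())  # [2, 1, 0]
```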

@jorisvandenbossche
Member

To all, could you also react to my comment on the previous PR: #9611 (comment)

Why not allow sorting? Sorting can also be seen as just the operation of putting the same categories together in the series. And this is something I potentially also want to do with unordered categories, without having to explicitly give an order to the categories.

@jreback
Contributor Author

jreback commented Mar 10, 2015

@jorisvandenbossche

Ahh, so you want to allow sort/argsort even on unordered categoricals, but disallow min/max. It's fine by me as it makes fewer things break. It's a bit non-strict, but as long as it's clear that we order by categories I don't see any harm.

(But THAT is a change, yes? Sorting is then not in order of appearance, but in the order of the categories.)
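
A minimal sketch of what "order by categories" means, assuming a recent pandas where sorting an unordered categorical is allowed. The category list `["c", "a", "b"]` is made up for illustration; the point is that the sort follows that list rather than lexical order or order of appearance, while `min` is still refused.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical category order, deliberately not alphabetical.
dtype = CategoricalDtype(categories=["c", "a", "b"], ordered=False)
s = pd.Series(["a", "b", "c", "a"], dtype=dtype)

# Sorting groups equal categories together, following the categories list.
by_categories = s.sort_values().tolist()
print(by_categories)  # ['c', 'a', 'a', 'b']

# Plain lexical sorting of the same values gives a different result.
lexical = sorted(s.tolist())
print(lexical)  # ['a', 'a', 'b', 'c']

# min/max still require an ordered Categorical.
try:
    s.min()
    min_raised = False
except TypeError:
    min_raised = True
print(min_raised)  # True
```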

@shoyer
Member

shoyer commented Mar 10, 2015

@jorisvandenbossche Hmm, interesting. I don't know if I have an educated opinion here. It does seem reasonable to me (especially since it's what R does).

@jorisvandenbossche
Member

@shoyer I also have to admit I don't really have an educated opinion. I was just triggered to look at it due to the discussion in the other PR, and then saw the behaviour of R. But the more I think about it, the more it seems natural to want to order a categorical even if it is not strictly 'ordered'.

@jreback
Contributor Author

jreback commented Mar 10, 2015

OK, so the bottom line is that we allow all operations on both ordered/unordered except for min/max?

@jorisvandenbossche
Member

cc @jseabold @mwaskom @njsmith (the question is whether to allow sorting an unordered categorical; see my comments above at #9622 (comment) and at #9611 (comment)). Any opinions?

@jreback
Contributor Author

jreback commented Mar 10, 2015

I updated here (jreback@7adad3b)

jreback added 2 commits March 11, 2015 06:32
     add set_ordered method for setting ordered
     default for Categorical is now to NOT order unless explicitly specified

whatsnew doc updates for categorical api changes

add ability to specify keywords to astype for creation defaults

fix issue with grouping with sort=True on an unordered Categorical
update categorical.rst docs

test unsortable when ordered=True

v0.16.0.txt / release notes updates

clean up check for ordering

allow groupby to work on an unordered categorical
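
As a rough sketch of the API these commits describe, assuming a recent pandas: a Categorical is unordered unless told otherwise, and `set_ordered` returns a new object instead of mutating `.ordered`. The `CategoricalDtype` form of `astype` is an assumption about later releases, which to my knowledge no longer accept the `astype('category', ordered=True)` keyword form shown in the diff above.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

cat = pd.Categorical(["a", "b", "c", "a"])
print(cat.ordered)  # False

# set_ordered returns a new Categorical; the original is left unchanged.
ordered_cat = cat.set_ordered(True)
print(ordered_cat.ordered)  # True
print(cat.ordered)          # False

# Requesting an ordered categorical at conversion time via a dtype object.
s = pd.Series(["a", "b", "c", "a"]).astype(CategoricalDtype(ordered=True))
print(s.cat.ordered)   # True
print(s.min(), s.max())  # a c
```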
jreback added a commit that referenced this pull request Mar 11, 2015
API: deprecate setting of .ordered directly (GH9347, GH9190)
@jreback jreback merged commit edb0927 into pandas-dev:master Mar 11, 2015
@jreback
Contributor Author

jreback commented Mar 11, 2015

we could always change later, but this needs to go in the rc.

Labels
API Design Categorical Categorical Data Type
Development

Successfully merging this pull request may close these issues.

Feature Request: categorical.reset_order Ordered vs. Unordered Categoricals
5 participants