-
-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Added key option to df/series.sort_values(key=...) and df/series.sort_index(key=...) sorting #27237
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice PR! Can you add support for Index to go with DataFrame / Series as well?
From typing import Callable at the top of the module then just Callable for the annotation
…Sent from my iPhone
On Jul 6, 2019, at 10:41 AM, Jacob Austin ***@***.***> wrote:
@ja3067 commented on this pull request.
In pandas/core/frame.py:
> @@ -4977,6 +4977,7 @@ def sort_values(
inplace=False,
kind="quicksort",
na_position="last",
+ key=None
Would you just like key : typing.Callable?
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub, or mute the thread.
|
Hello @jacobaustin123! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2020-04-27 02:12:02 UTC |
Just pushed a new commit that adds support for Index.sort_values(key=...) and adds a key flag to nargsort(...) and lexsort_indexer(...). I've also added static type hints. I still use Index.map to apply the key function for sort_index functions, because it seems simpler and semantically better to apply the key to the values and not the codes. |
@ja3067 is this still active? |
I will make the requested changes this weekend. The only decision I think needs to be made is how |
Without a |
@WillAyd Agreed. In the case, say, of one numeric column and one string column, should we for instance support a dictionary of {column_name : Callable} so that different keys can be passed, or the key can be passed only to the primary column. Same question for sort_index. |
I would leave that enhancement to a separate PR if requested |
@ja3067 is this still active? |
@WillAyd I'll make the necessary updates this weekend. Sorry for the delay. |
@WillAyd Ok. The final PR is almost done. Last question is about the
What do you think? |
c814ec8
to
47f9751
Compare
@WillAyd I've incorporated the review changes, and it passes tests now. Should be good. |
47f9751
to
4b51227
Compare
Can you merge master and repush? Not sure what 37 error was may be resolved |
4ae2bdd
to
9ce3b26
Compare
Thanks, that sounds good! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few small nitpicks. For the rest looks good!
doc/source/whatsnew/v1.1.0.rst
Outdated
We've added a ``key`` argument to the DataFrame and Series sorting methods, including | ||
:meth:`DataFrame.sort_values`, :meth:`DataFrame.sort_index`, :meth:`Series.sort_values`, | ||
and :meth:`Series.sort_index`. The ``key`` can be any callable function which is applied | ||
to the each column of a DataFrame before sorting is performed (:issue:`27237`). See |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to the each column of a DataFrame before sorting is performed (:issue:`27237`). See | |
to each column of a DataFrame before sorting is performed (:issue:`27237`). See |
Apart from the typo, I find "each column" a bit confusing, as it is of course only applied to those columns that are used for sorting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See if the new version is clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New version looks good!
@WillAyd @jorisvandenbossche @jreback This is badly timed since I know we want to finish this, but I made some changes today that add support for dictionary keys, i.e. df = DataFrame({0: ["Hello", "goodbye"], 1: [0, 1]})
result = df.sort_values([0, 1], key={0: lambda col: col.str.lower()}) So you can specify a different key function for different levels/columns. I have a commit ready I can push now, but if people prefer, we can finalize this and have that as a separate commit, although as part of this addition I refactored the code significantly so everything is pulled into |
let’s not add anything more |
_as = self.argsort() | ||
if not ascending: | ||
_as = _as[::-1] | ||
sorted_index = self.take(_as) | ||
return sorted_index, _as | ||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback do you know what this second branch is here for? I'm tempted to make the argsort
method the default since it's necessary for key
sorting, unless it's an important optimization.
pandas/core/sorting.py
Outdated
if axis == 0: | ||
new_df = DataFrame._from_arrays(new_levels, df.columns, df.index) | ||
else: | ||
new_df = DataFrame._from_arrays(new_levels, df.index, df.columns).transpose() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way to do this (construct a DataFrame along either row or column axis) that's better than transposing it like this?
@jreback Ok. I just reverted those changes but kept some of the refactoring so e.g. Dict keys can be an easy PR on top. |
@jreback gentle bump. would love to finish this up. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jacobaustin123 I would really like to merge this. but you keep changing an enormous amount of code. Please got back to when I make comments I think 10 or 12 days ago.
changed all of the ensure_key_mapped again, this is a mess. Please just go back to the much simpler version.
@jreback ok. I can force revert to the previous version, or just address your new comments here. Let me know what you prefer. The motivation for this was to do as you asked and move all logic to |
well I'ld lke to make the doc changes I have above, and not have 3 different ensure_key_mapped functions. you had 1 or maybe 2 before. Please simplify. I am happy to take changes as a followup, but this is too much now. |
@jreback. Understood. I'll address doc comments and revert. |
thanks. this is just a very large PR and hard to grok what has changed. |
@jreback ok I just reverted to the prior version and updated documentation. I had to force push, but I just reverted to the commit before the major changes and then added two commits on top. |
kk will look soon thanks @jacobaustin123 |
@@ -2986,6 +3039,9 @@ def sort_values( | |||
) | |||
|
|||
def _try_kind_sort(arr): | |||
arr = ensure_key_mapped(arr, key) | |||
arr = getattr(arr, "_values", arr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in followon you can use extract_array
here
@@ -204,15 +218,14 @@ def lexsort_indexer(keys, orders=None, na_position: str = "last"): | |||
elif orders is None: | |||
orders = [True] * len(keys) | |||
|
|||
for key, order in zip(keys, orders): | |||
|
|||
for k, order in zip(keys, orders): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you do this
We've added a ``key`` argument to the DataFrame and Series sorting methods, including | ||
:meth:`DataFrame.sort_values`, :meth:`DataFrame.sort_index`, :meth:`Series.sort_values`, | ||
and :meth:`Series.sort_index`. The ``key`` can be any callable function which is applied | ||
column-by-column to each column used for sorting, before sorting is performed (:issue:`27237`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you also add the orginal issue number here (or rather replace this issue number )
thanks @jacobaustin123 3 followon comments if you would. Also happy to take a refactoring PR in sorting generally. Targeted PRs are best. |
@jreback awesome thank you! I'll address these and maybe do a small refactoring PR. And maybe a second PR to add dictionary key sorting. |
….sort_index(key=...) sorting (pandas-dev#27237)
git diff upstream/master -u -- "*.py" | flake8 --diff
Added a key parameter to the DataFrame/Series.sort_values and sort_index functions matching Python sorted semantics for allowing custom sorting orders. Address open issue #3942.