BUG: Nested sort with NaN #3917

hayd · 2013-06-15T18:48:37Z

Nested sort doesn't seem to work with NaNs, see this SO question.

In [11]: df
Out[11]:
    a   b
0   1   9
1   2 NaN
2 NaN   5
3   1   2
4   6   5
5   8   4
6   4   5

In [12]: df.sort(columns=["a","b"])
Out[12]:
    a   b
3   1   2
0   1   9
1   2 NaN
2 NaN   5
6   4   5
4   6   5
5   8   4

(It works as expected when sorting on a single column.)
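For anyone reading this on a later pandas release: `DataFrame.sort` was subsequently removed in favour of `DataFrame.sort_values`, which takes an `na_position` argument. The snippet below is only a sketch of the same frame under that newer API (it assumes a recent pandas install and is not what the session above ran); the variable names `nan_last` and `nan_first` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Same data as the frame shown above.
df = pd.DataFrame({
    "a": [1, 2, np.nan, 1, 6, 8, 4],
    "b": [9, np.nan, 5, 2, 5, 4, 5],
})

# Nested sort on a, then b, with NaN rows pushed to the end of each key.
nan_last = df.sort_values(by=["a", "b"], na_position="last")

# Same sort, but NaN rows pulled to the front instead.
nan_first = df.sort_values(by=["a", "b"], na_position="first")

print(nan_last)
print(nan_first)
```

With `na_position="last"` the row with `a = NaN` should land at the bottom rather than between `a = 2` and `a = 4`.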


hayd · 2013-06-15T19:14:03Z

And some hacks that also fail:

In [27]: df.sort("a").groupby("a", group_keys=False).apply(lambda x: x.sort("b"))
Out[27]:
   a   b
3  1   2
0  1   9
1  2 NaN
6  4   5
4  6   5
5  8   4
# Row 2 (a = NaN) is missing.

In [28]: df.sort("a").groupby("a", group_keys=False).apply(lambda x: x)
Out[28]:
    a   b
0   1   9
3   1   2
1   2 NaN
6   4   5
4   6   5
5   8   4
2 NaN NaN

cpcloud · 2013-06-15T20:55:42Z

think inf should behave the same way too and respect the ascending param

hayd · 2013-06-15T21:24:49Z

Note that this is the way it works with a single column:

In [19]: df.sort("a")
Out[19]:
    a   b
0   1   9
3   1   2
1   2 NaN
6   4   5
4   6   5
5   8   4
2 NaN   5

Also, the Series order method (pretty much the same as sort) offers an na_last argument:

na_last : boolean (optional, default=True)
    Put NaN's at beginning or end

jcjf · 2013-07-26T16:45:36Z

I was shocked to discover this issue as well. I think the problem is in the Cython function called within the else statement in pandas.core.groupby._indexer_from_factorized:

if max_group > 1e6:
    # Use mergesort to avoid memory errors in counting sort
    indexer = comp_ids.argsort(kind='mergesort')
else:
    indexer, _ = _algos.groupsort_indexer(comp_ids.astype(np.int64),
                                          max_group)

Unfortunately, I don't know enough about debugging Cython code to help out more than this.
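To make the two branches above a bit more concrete, here is a rough NumPy sketch of the general idea: code each column as integers, combine the codes into one group id per row, then sort the ids with a stable counting sort. This is not the pandas Cython routine. `codes_nan_last` and `counting_sort_indexer` are hypothetical helpers written for this comment, and giving NaN the largest code is just one assumed way to get "NaN last".

```python
import numpy as np

def codes_nan_last(values):
    """Hypothetical helper: rank-code values so that NaN gets the largest code."""
    values = np.asarray(values, dtype=float)
    mask = np.isnan(values)
    uniques = np.unique(values[~mask])         # sorted non-NaN values
    codes = np.searchsorted(uniques, values)   # rank of each value
    codes[mask] = len(uniques)                 # NaN sorts after everything else
    return codes.astype(np.int64), len(uniques) + 1

def counting_sort_indexer(comp_ids, ngroups):
    """Hypothetical stand-in for a counting sort: stable order of comp_ids."""
    counts = np.bincount(comp_ids, minlength=ngroups)
    # starts[g] is the first output slot reserved for group g.
    starts = np.concatenate(([0], np.cumsum(counts)[:-1]))
    indexer = np.empty(len(comp_ids), dtype=np.int64)
    for pos, g in enumerate(comp_ids):
        indexer[starts[g]] = pos
        starts[g] += 1
    return indexer

# Columns a and b from the frame at the top of this issue.
a = [1, 2, np.nan, 1, 6, 8, 4]
b = [9, np.nan, 5, 2, 5, 4, 5]

codes_a, n_a = codes_nan_last(a)
codes_b, n_b = codes_nan_last(b)

# One integer per row; ordering by it is the nested (a, b) ordering.
comp_ids = codes_a * n_b + codes_b
indexer = counting_sort_indexer(comp_ids, n_a * n_b)
print(indexer)
```

The counting sort allocates one counter per possible group id, which is presumably why the quoted code switches to `argsort(kind='mergesort')` once `max_group` gets large; both should return the same stable ordering.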

unutbu mentioned this issue Oct 15, 2013

EHN/FIX: Add na_last parameter to DataFrame.sort. Fixes GH3917 #5231

Merged

jreback closed this as completed in #5231 Mar 27, 2014

This was referenced Apr 3, 2014

DEPR: create issues for the current FutureWarnings in pandas #6641

Closed

Remove number of deprecated parameters/functions/classes [fix #6641] #6813

Merged

jreback mentioned this issue Jul 26, 2016

DEPR: deprecations log for removed issues #13777

Closed
