Unexpected output for nlargest function with multiple columns #22752
troels added a commit to troels/pandas that referenced this issue Sep 18, 2018
When asking for the n largest/smallest rows in a dataframe nlargest/nsmallest sometimes failed to differentiate the correct result based on the latter columns.
WillAyd added the Bug, Algos, and DataFrame labels Sep 19, 2018
troels added a commit to troels/pandas that referenced this issue Sep 19, 2018
When asking for the n largest/smallest rows in a dataframe nlargest/nsmallest sometimes failed to differentiate the correct result based on the latter columns.
troels added a commit to troels/pandas that referenced this issue Sep 22, 2018
When asking for the n largest/smallest rows in a dataframe nlargest/nsmallest sometimes failed to differentiate the correct result based on the latter columns.
troels added a commit to troels/pandas that referenced this issue Sep 23, 2018
When asking for the n largest/smallest rows in a dataframe nlargest/nsmallest sometimes failed to differentiate the correct result based on the latter columns.
jreback pushed a commit that referenced this issue Sep 25, 2018
Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this issue Oct 1, 2018
Code Sample
Actual Output
Text within square brackets added to call attention to rows with unexpected output.
Problem description
According to the documentation for nlargest, the nlargest function should behave identically to df.sort_values(columns, ascending=False).head(n) but be more performant, presumably because it does not need to sort the entire dataframe.
I am observing different behavior. In the example above, I expect the first and second dataframes to be the same in both indices and values. (Note that I've sorted the output of the nlargest function to remove sort order as a difference.)
Similar issues, but different enough that I opened a new one:
#21426 - Deals with unsigned ints; this issue uses signed int64s.
#19563 - Different by sort order only; in this issue the rows themselves are a different, non-unique subset of the original rows.
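The documented equivalence described above can be checked with a small sketch. The data, the column names a and b, and the helper matches_sorted_head are hypothetical and are not the code sample from this report; the frame has ties in the first column so that the second column decides which rows make the cut.

```python
import pandas as pd

# Hypothetical data (not the reporter's): column "a" has ties, so column "b"
# must break them when selecting the top rows.
df = pd.DataFrame({
    "a": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "b": [5, 1, 3, 2, 6, 4, 9, 7, 8],
})


def matches_sorted_head(frame, columns, n):
    """Return True if nlargest agrees with sort_values(...).head(n)."""
    expected = frame.sort_values(columns, ascending=False).head(n)
    actual = frame.nlargest(n, columns)
    # Re-sort the nlargest output so row order alone cannot cause a mismatch.
    actual = actual.sort_values(columns, ascending=False)
    # DataFrame.equals compares index labels as well as values.
    return actual.equals(expected)


print(matches_sorted_head(df, ["a", "b"], 4))
```

A result of True means the two selections agree in both index and values. On the pandas version listed in this report (0.23.4), a check of this kind may return False for some inputs, which is the behavior described here.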
Expected Output
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-34-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.23.4
pytest: None
pip: 9.0.3
setuptools: 40.4.1
Cython: None
numpy: 1.15.1
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.5.0
sphinx: None
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.2.3
openpyxl: 2.5.7
xlrd: 1.1.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: 1.2.11
pymysql: None
psycopg2: 2.7.5 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None