DOC: Fix Series nsmallest and nlargest docstring/doctests #22731

Merged: 5 commits merged on Sep 18, 2018
Changes from 2 commits
ci/doctests.sh (2 changes: 1 addition & 1 deletion)
@@ -28,7 +28,7 @@ if [ "$DOCTEST" ]; then
  fi

  pytest --doctest-modules -v pandas/core/series.py \
-     -k"-nlargest -nonzero -nsmallest -reindex -searchsorted -to_dict"
+     -k"-nonzero -reindex -searchsorted -to_dict"

  if [ $? -ne "0" ]; then
      RET=1
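Dropping `-nlargest` and `-nsmallest` from the `-k` expression means pytest no longer deselects the doctests for these two methods. A doctest passes only when the printed output matches the expected text, which is why the old examples built on unseeded `np.random.randn(10**6)` data could not be checked. A minimal sketch with the standard-library `doctest` module; the two example docstrings and the `run_examples` helper are hypothetical illustrations, not pandas code:

```python
import doctest

# Hypothetical docstring examples. The first prints a value fully determined
# by its input; the second prints a fresh random number on every run, like
# the removed np.random.randn(10**6) examples did.
DETERMINISTIC = """
>>> sorted([59000000, 65000000, 434000, 337000], reverse=True)[:2]
[65000000, 59000000]
"""

NON_DETERMINISTIC = """
>>> import random
>>> random.random()
0.5
"""


def run_examples(text):
    """Run the >>> examples in `text`; return TestResults(failed, attempted)."""
    test = doctest.DocTestParser().get_doctest(
        text, globs={}, name="example", filename=None, lineno=0
    )
    runner = doctest.DocTestRunner(verbose=False)
    # Discard the failure report; only the counts matter here.
    return runner.run(test, out=lambda _report: None)


print(run_examples(DETERMINISTIC))      # every example matches its output
print(run_examples(NON_DETERMINISTIC))  # the random example fails
```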
pandas/core/series.py (157 changes: 114 additions & 43 deletions)
@@ -2743,17 +2743,20 @@ def nlargest(self, n=5, keep='first'):

  Parameters
  ----------
- n : int
-     Return this many descending sorted values
- keep : {'first', 'last'}, default 'first'
-     Where there are duplicate values:
-     - ``first`` : take the first occurrence.
-     - ``last`` : take the last occurrence.
+ n : int, default 5
+     Return this many descending sorted values.
+ keep : str, default 'first'
Member
I think it's better to keep {'first', 'last', 'all'}, as I don't think there is any other value allowed. That applies to both docstrings.

+     When there are duplicate values that cannot all fit in a
+     Series of `n` elements:
+     - ``first`` : take the first occurrences based on the index order
+     - ``last`` : take the last occurrences based on the index order
+     - ``all`` : keep all occurrences. This can result in a Series of
+       size larger than `n`.
Member
Is the period here on the last bullet required to pass the docstring validation as-is? Shouldn't be necessary but if that's the intent here just something we should address separately @datapythonista

Contributor Author
I confirm that the validation fails if the last period is not present.
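The question and the reply turn on how the validator reads a multi-line description. A sketch, assuming (as the reply suggests, but not verified against pandas' validation script) that the check inspects only the last character of a parameter's joined description; `description_errors` is a hypothetical stand-in, not the real validator:

```python
def description_errors(name, description_lines):
    """Hypothetical check: a parameter description must end with a period."""
    text = " ".join(line.strip() for line in description_lines).strip()
    if text.endswith("."):
        return []
    return [f"description of {name!r} should end with a period"]


keep_description = [
    "When there are duplicate values that cannot all fit in a",
    "Series of `n` elements:",
    "- ``first`` : take the first occurrences based on the index order",
    "- ``last`` : take the last occurrences based on the index order",
    "- ``all`` : keep all occurrences. This can result in a Series of",
    "size larger than `n`.",
]

# The first two bullets have no period, yet the description as a whole
# passes, because only its final character is inspected.
print(description_errors("keep", keep_description))

# Dropping the period from the final bullet makes the same check fail.
without_final_period = keep_description[:-1] + ["size larger than `n`"]
print(description_errors("keep", without_final_period))
```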


  Returns
  -------
- top_n : Series
-     The n largest values in the Series, in sorted order
+ Series
+     The n largest values in the Series, sorted in decreasing order.

  Notes
  -----
@@ -2762,23 +2765,56 @@ def nlargest(self, n=5, keep='first'):

  See Also
  --------
- Series.nsmallest
+ Series.nsmallest: Get the `n` smallest elements.
Member
I'd add sort_values and head here, as they're mentioned in the notes. In both docstrings.
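The reviewer points out that `sort_values` and `head` are mentioned in the Notes, and the removed example comment says the method "only sorts up to the N requested". As a standard-library analogy only (this is not how pandas' `SelectNSeries` is implemented), Python's `heapq.nlargest` is documented as equivalent to a full descending sort followed by slicing, while keeping a heap of at most `n` candidates instead of sorting everything. The data are the country populations from the new examples:

```python
import heapq

# (label, value) pairs in index order, taken from the new docstring examples.
population = [
    ("Italy", 59000000), ("France", 65000000), ("Malta", 434000),
    ("Maldives", 434000), ("Brunei", 434000), ("Iceland", 337000),
    ("Nauru", 11300), ("Tuvalu", 11300), ("Anguilla", 11300),
    ("Monserat", 5200),
]


def value(pair):
    """Sort key: the numeric value of a (label, value) pair."""
    return pair[1]


# Analogue of s.sort_values(ascending=False).head(3): sort everything, slice.
sort_then_head = sorted(population, key=value, reverse=True)[:3]

# Partial selection: heapq keeps at most 3 candidates while scanning once.
partial = heapq.nlargest(3, population, key=value)

# Both keep tied values in their original order, so the results agree.
print(sort_then_head == partial)
print([label for label, _ in partial])
```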


  Examples
  --------
- >>> s = pd.Series(np.random.randn(10**6))
- >>> s.nlargest(10)  # only sorts up to the N requested
- 219921 4.644710
- 82124 4.608745
- 421689 4.564644
- 425277 4.447014
- 718691 4.414137
- 43154 4.403520
- 283187 4.313922
- 595519 4.273635
- 503969 4.250236
- 121637 4.240952
- dtype: float64
+ >>> countries_population = {"Italy": 59000000, "France": 65000000,
+ ...                         "Malta": 434000, "Maldives": 434000,
+ ...                         "Brunei": 434000, "Iceland": 337000,
+ ...                         "Nauru": 11300, "Tuvalu": 11300,
+ ...                         "Anguilla": 11300, "Monserat": 5200}
+ >>> s = pd.Series(countries_population)
+ >>> s
+ Italy 59000000
+ France 65000000
+ Malta 434000
+ Maldives 434000
+ Brunei 434000
+ Iceland 337000
+ Nauru 11300
+ Tuvalu 11300
+ Anguilla 11300
+ Monserat 5200
+ dtype: int64
+
+ >>> s.nlargest()
+ France 65000000
+ Italy 59000000
+ Malta 434000
+ Maldives 434000
+ Brunei 434000
+ dtype: int64
+
+ >>> s.nlargest(3)
Member
I think it would be nice to have just a quick one-liner to highlight the difference between this and the subsequent example

Contributor Author
Do you mean a comment at the end of the line?

Member
No just some text in between the examples to call out what the user should be looking at

+ France 65000000
+ Italy 59000000
+ Malta 434000
+ dtype: int64
+
+ >>> s.nlargest(3, keep='last')
+ France 65000000
+ Italy 59000000
+ Brunei 434000
+ dtype: int64
+
+ >>> s.nlargest(3, keep='all')
+ France 65000000
+ Italy 59000000
+ Malta 434000
+ Maldives 434000
+ Brunei 434000
+ dtype: int64
  """
  return algorithms.SelectNSeries(self, n=n, keep=keep).nlargest()

@@ -2789,16 +2825,19 @@ def nsmallest(self, n=5, keep='first'):
  Parameters
  ----------
  n : int
Member
can you add the default here?

-     Return this many ascending sorted values
- keep : {'first', 'last'}, default 'first'
-     Where there are duplicate values:
-     - ``first`` : take the first occurrence.
-     - ``last`` : take the last occurrence.
+     Return this many ascending sorted values.
+ keep : str, default 'first'
+     When there are duplicate values that cannot all fit in a
+     Series of `n` elements:
+     - ``first`` : take the first occurrences based on the index order
+     - ``last`` : take the last occurrences based on the index order
+     - ``all`` : keep all occurrences. This can result in a Series of
+       size larger than `n`.

  Returns
  -------
- bottom_n : Series
-     The n smallest values in the Series, in sorted order
+ Series
+     The n smallest values in the Series, sorted in increasing order.

  Notes
  -----
@@ -2807,23 +2846,55 @@ def nsmallest(self, n=5, keep='first'):

  See Also
  --------
- Series.nlargest
+ Series.nlargest: Get the `n` largest elements.

  Examples
  --------
- >>> s = pd.Series(np.random.randn(10**6))
- >>> s.nsmallest(10)  # only sorts up to the N requested
- 288532 -4.954580
- 732345 -4.835960
- 64803 -4.812550
- 446457 -4.609998
- 501225 -4.483945
- 669476 -4.472935
- 973615 -4.401699
- 621279 -4.355126
- 773916 -4.347355
- 359919 -4.331927
- dtype: float64
+ >>> countries_population = {"Italy": 59000000, "France": 65000000,
+ ...                         "Brunei": 434000, "Malta": 434000,
+ ...                         "Maldives": 434000, "Iceland": 337000,
+ ...                         "Nauru": 11300, "Tuvalu": 11300,
+ ...                         "Anguilla": 11300, "Monserat": 5200}
+ >>> s = pd.Series(countries_population)
+ >>> s
+ Italy 59000000
+ France 65000000
+ Brunei 434000
+ Malta 434000
+ Maldives 434000
+ Iceland 337000
+ Nauru 11300
+ Tuvalu 11300
+ Anguilla 11300
+ Monserat 5200
+ dtype: int64
+
+ >>> s.nsmallest()
+ Monserat 5200
+ Nauru 11300
+ Tuvalu 11300
+ Anguilla 11300
+ Iceland 337000
+ dtype: int64
+
+ >>> s.nsmallest(3)
+ Monserat 5200
+ Nauru 11300
+ Tuvalu 11300
+ dtype: int64
+
+ >>> s.nsmallest(3, keep='last')
+ Monserat 5200
+ Anguilla 11300
+ Tuvalu 11300
+ dtype: int64
+
+ >>> s.nsmallest(3, keep='all')
+ Monserat 5200
+ Nauru 11300
+ Tuvalu 11300
+ Anguilla 11300
+ dtype: int64
  """
  return algorithms.SelectNSeries(self, n=n, keep=keep).nsmallest()

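Both docstrings now spell out three `keep` modes, and the examples show how ties are broken. The pure-Python model below, a hypothetical `select_n` helper rather than pandas' `SelectNSeries`, reproduces the label orderings of both sets of examples under one stated assumption: `'last'` behaves like `'first'` applied to the reversed Series, which matches the `keep='last'` outputs (for example `Anguilla` before `Tuvalu`). With `'all'`, every value tied with the `n`-th one is kept, which is how `keep='all'` returns five rows for `n=3`. It uses the country data from the `nlargest` examples; the `nsmallest` examples list Brunei, Malta and Maldives in a different order, which does not affect any result modelled here.

```python
def select_n(pairs, n, keep="first", largest=True):
    """Return the n largest (or smallest) (label, value) pairs.

    Hypothetical model of the documented `keep` rules, not pandas' code:
    'first' breaks ties by index order, 'last' by reverse index order,
    and 'all' also keeps every value tied with the n-th one.
    """
    if keep not in ("first", "last", "all"):
        raise ValueError("keep must be 'first', 'last' or 'all'")
    if n <= 0:
        return []
    # Assumption: 'last' acts like 'first' on the reversed Series.
    ordered = list(reversed(pairs)) if keep == "last" else list(pairs)
    # sorted() is stable (also with reverse=True), so ties keep the
    # order they have in `ordered`.
    ranked = sorted(ordered, key=lambda pair: pair[1], reverse=largest)
    if keep != "all" or n >= len(ranked):
        return ranked[:n]
    cutoff = ranked[n - 1][1]
    if largest:
        return [pair for pair in ranked if pair[1] >= cutoff]
    return [pair for pair in ranked if pair[1] <= cutoff]


countries_population = {"Italy": 59000000, "France": 65000000,
                        "Malta": 434000, "Maldives": 434000,
                        "Brunei": 434000, "Iceland": 337000,
                        "Nauru": 11300, "Tuvalu": 11300,
                        "Anguilla": 11300, "Monserat": 5200}
pairs = list(countries_population.items())  # dicts keep insertion order


def labels(result):
    """Just the index labels, for comparison with the docstring output."""
    return [label for label, _ in result]


for keep in ("first", "last", "all"):
    print(f"nlargest(3, keep={keep!r}):", labels(select_n(pairs, 3, keep)))
    print(f"nsmallest(3, keep={keep!r}):",
          labels(select_n(pairs, 3, keep, largest=False)))
```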