DOC: update the DataFrame.mode method docstring #20241

Closed
wants to merge 5 commits into from
Changes from 4 commits
89 changes: 69 additions & 20 deletions pandas/core/frame.py
@@ -12,8 +12,8 @@
# pylint: disable=E1101,E1103
# pylint: disable=W0212,W0231,W0703,W0622

import functools
import collections
import functools
import itertools
import sys
import types
@@ -111,9 +111,9 @@
by : str or list of str
Name or list of names to sort by.

- if `axis` is 0 or `'index'` then `by` may contain index
- if ``axis`` is 0 or ``'index'`` then `by` may contain index
levels and/or column labels
Contributor

try not to change unrelated things

- if `axis` is 1 or `'columns'` then `by` may contain column
- if ``axis`` is 1 or ``'columns'`` then `by` may contain column
levels and/or index labels

.. versionchanged:: 0.23.0
@@ -5873,35 +5873,84 @@ def _get_agg_axis(self, axis_num):

def mode(self, axis=0, numeric_only=False):
"""
Gets the mode(s) of each element along the axis selected. Adds a row
for each mode per label, fills in gaps with nan.

Note that there could be multiple values returned for the selected
axis (when more than one item share the maximum frequency), which is
the reason why a dataframe is returned. If you want to impute missing
values with the mode in a dataframe ``df``, you can just do this:
``df.fillna(df.mode().iloc[0])``
Get the mode(s) of each element along the axis selected.

Adds a row for each mode per label, filling gaps with NaN.
Contributor

can add a line explaining what mode is
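
One way to illustrate what "mode" means here, as a minimal sketch rather than the wording the PR should adopt: the mode is the value (or values) that occurs most often. The snippet below reuses the data from the docstring's first example; the names `counts`, `manual_modes`, and `pandas_modes` are illustrative only.

```python
import pandas as pd

# Same values as the docstring's first example.
s = pd.Series([1, 2, 1, 2, 1, 2, 3])

# Count how often each value occurs, then keep the values whose count
# equals the highest count. Those values are the modes.
counts = s.value_counts()
manual_modes = sorted(counts[counts == counts.max()].index.tolist())

# Series.mode should agree with the manual computation.
pandas_modes = s.mode().tolist()
print(manual_modes, pandas_modes)
```

Both 1 and 2 occur three times, so there are two modes, which is why the DataFrame method needs one row per mode.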


Parameters
----------
axis : {0 or 'index', 1 or 'columns'}, default 0
* 0 or 'index' : get mode of each column
* 1 or 'columns' : get mode of each row
The axis to iterate over while searching for the mode.
To find the mode for each column, iterate over rows (``axis=0``,
default behaviour).
Contributor

you don't need the parens, it doesn't read well

To find the mode for each row, iterate over columns (``axis=1``).
numeric_only : boolean, default False
if True, only apply to numeric columns
If True, only apply to numeric dimensions.

Returns
-------
modes : DataFrame (sorted)
A DataFrame containing the modes.
If ``axis=0``, there will be one column per column in the original
DataFrame, with as many rows as there are modes.
If ``axis=1``, there will be one row per row in the original
DataFrame, with as many columns as there are modes.

Notes
-----
There may be multiple values returned for the selected
axis (when more than one item share the maximum frequency), which is
Contributor

no parens, just use a comma. capitalize DataFrame.

you can leave off the impute sentence
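
For reference, the imputation idiom that sentence describes can be sketched as follows. The data and the names `first_mode` and `filled` are made up for illustration; only `df.fillna(df.mode().iloc[0])` comes from the docstring.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value per column.
df = pd.DataFrame({
    "Science": [80, 70, 80, np.nan, 80],
    "Math": [70, np.nan, 75, 75, 80],
})

# mode() returns a DataFrame; its first row holds the first mode of
# each column (80 for Science, 75 for Math).
first_mode = df.mode().iloc[0]

# fillna with a Series fills each column using the value whose label
# matches the column name.
filled = df.fillna(first_mode)
print(filled)
```

When a column has several modes, `iloc[0]` picks the smallest one, because the modes are returned sorted.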

the reason why a dataframe is returned. If you want to impute missing
values with the mode in a dataframe ``df``, you can just do this:
``df.fillna(df.mode().iloc[0])``.

See Also
--------
Series.mode : Return the highest frequency value in a Series.
Series.value_counts : Returns a Series with all occurring values as
Contributor

counts of values

indices and the number of occurrences as values.

Examples
--------
>>> df = pd.DataFrame({'A': [1, 2, 1, 2, 1, 2, 3]})
>>> df.mode()
A
0 1
1 2
"""

``mode`` returns a DataFrame with multiple rows if there is more than
Contributor

you don't need this sentence as it's above

one mode. Missing entries are imputed with NaN.

>>> grades = pd.DataFrame({
... 'Science': [80, 70, 80, 75, 80, 75, 85, 90, 80, 70],
... 'Math': [70, 70, 75, 75, 80, 80, 85, 85, 90, 90]
... })
>>> grades.apply(lambda x: x.value_counts())
Science Math
70 2 2
75 2 2
Contributor

put the mode example first

80 4 2
85 1 2
90 1 2
>>> grades.mode()
Science Math
0 80.0 70
1 NaN 75
2 NaN 80
3 NaN 85
4 NaN 90

Use ``axis=1`` to apply mode over columns (get the mode of each row).

>>> student_grades = pd.DataFrame.from_dict({
... 'Alice': [80, 85, 90, 85, 95],
... 'Bob': [70, 80, 80, 75, 90]
... }, 'index')
>>> student_grades
0 1 2 3 4
Alice 80 85 90 85 95
Bob 70 80 80 75 90
>>> student_grades.mode(axis=1)
0
Alice 85
Bob 80
"""

data = self if not numeric_only else self._get_numeric_data()

def f(s):
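
The result shapes described in the proposed Returns section can be checked with a short sketch. It reuses the `grades` and `student_grades` frames from the docstring examples; the names `column_modes` and `row_modes` are illustrative, and `orient="index"` is passed by keyword here instead of positionally.

```python
import pandas as pd

grades = pd.DataFrame({
    "Science": [80, 70, 80, 75, 80, 75, 85, 90, 80, 70],
    "Math": [70, 70, 75, 75, 80, 80, 85, 85, 90, 90],
})

# axis=0 (default): one column per original column, one row per mode.
# Science has a single mode, Math has five, so Science is padded with NaN.
column_modes = grades.mode()
print(column_modes)

student_grades = pd.DataFrame.from_dict({
    "Alice": [80, 85, 90, 85, 95],
    "Bob": [70, 80, 80, 75, 90],
}, orient="index")

# axis=1: one row per original row, one column per mode.
row_modes = student_grades.mode(axis=1)
print(row_modes)
```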