Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updating DataFrame.mode docstring. #22404

Merged
merged 12 commits into from
Sep 30, 2018
83 changes: 66 additions & 17 deletions pandas/core/frame.py
Original file line number Diff line number Diff line change
Expand Up @@ -7213,38 +7213,87 @@ def _get_agg_axis(self, axis_num):

def mode(self, axis=0, numeric_only=False, dropna=True):
"""
Gets the mode(s) of each element along the axis selected. Adds a row
for each mode per label, fills in gaps with nan.
Get the mode(s) of each element along the selected axis.

Note that there could be multiple values returned for the selected
axis (when more than one item share the maximum frequency), which is
the reason why a dataframe is returned. If you want to impute missing
values with the mode in a dataframe ``df``, you can just do this:
``df.fillna(df.mode().iloc[0])``
The mode of a set of values is the value that appears most often.
It can be multiple values.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add an example to show this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first example shows this (wings has two modes)


Parameters
----------
axis : {0 or 'index', 1 or 'columns'}, default 0
* 0 or 'index' : get mode of each column
* 1 or 'columns' : get mode of each row
numeric_only : boolean, default False
if True, only apply to numeric columns
dropna : boolean, default True
The axis to iterate over while searching for the mode.
To find the mode for each column, use ``axis='index'``.
To find the mode for each row, use ``axis='columns'``.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the first line of your summary. However, I would put back the two bullet points.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, fixed. And we should probably define a standard and be consistent when documenting axis (added it to #20298)

numeric_only : bool, default False
If True, only apply to numeric columns.
dropna : bool, default True
Don't consider counts of NaN/NaT.

.. versionadded:: 0.24.0

Returns
-------
modes : DataFrame (sorted)
DataFrame
The modes of each column or row.

See Also
--------
Series.mode : Return the highest frequency value in a Series.
Series.value_counts : Return the counts of values in a Series.

Notes
-----
Every column or row of the resulting DataFrame contains all its modes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May just be me but I don't understand the Notes section - is this necessary or better explained via examples?

And possibly NaN values at the end (if other columns or rows have a
higher number of modes).

Examples
--------
>>> df = pd.DataFrame({'A': [1, 2, 1, 2, 1, 2, 3]})
>>> df = pd.DataFrame([('bird', 2, 2),
... ('mammal', 4, np.nan),
... ('arthropod', 8, 0),
... ('bird', 2, np.nan)],
... index=('falcon', 'horse', 'spider', 'ostrich'),
... columns=('species', 'legs', 'wings'))
>>> df
species legs wings
falcon bird 2 2.0
horse mammal 4 NaN
spider arthropod 8 0.0
ostrich bird 2 NaN

By default, missing values are not considered, and the mode of winds
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think you mean to say wings here and not winds; side note - ostriches do still have wings :-)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NaN means unknown in this case, not zero, and I need missing values to show the behavior. But yeah, agree. ;)

are both 0 and 2. The second row of species and legs contains NaN,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double backticks for NaN?

because they have only one mode, but the DataFrame has two rows.

>>> df.mode()
A
0 1
1 2
species legs wings
0 bird 2.0 0.0
1 NaN NaN 2.0

Setting ``dropna=False`` NaN values are considered and they can be the
mode (like for wings).

>>> df.mode(dropna=False)
species legs wings
0 bird 2 NaN

Setting ``numeric_only=True``, only the mode of numeric columns is
computed, and columns of other types are ignored.

>>> df.mode(numeric_only=True)
legs wings
0 2.0 0.0
1 NaN 2.0

To compute the mode over columns and not rows, use the axis parameter:

>>> df.mode(axis='columns', numeric_only=True)
0 1
falcon 2.0 NaN
horse 4.0 NaN
spider 0.0 8.0
ostrich 2.0 NaN
"""
data = self if not numeric_only else self._get_numeric_data()

Expand Down