Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updating DataFrame.mode docstring. #22404

Merged
merged 12 commits into from
Sep 30, 2018
Merged
74 changes: 59 additions & 15 deletions pandas/core/frame.py
Original file line number Diff line number Diff line change
Expand Up @@ -7248,38 +7248,82 @@ def _get_agg_axis(self, axis_num):

def mode(self, axis=0, numeric_only=False, dropna=True):
"""
Gets the mode(s) of each element along the axis selected. Adds a row
for each mode per label, fills in gaps with nan.
Get the mode(s) of each element along the selected axis.

Note that there could be multiple values returned for the selected
axis (when more than one item share the maximum frequency), which is
the reason why a dataframe is returned. If you want to impute missing
values with the mode in a dataframe ``df``, you can just do this:
``df.fillna(df.mode().iloc[0])``
The mode of a set of values is the value that appears most often.
It can be multiple values.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add an example to show this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first example shows this (wings has two modes)


Parameters
----------
axis : {0 or 'index', 1 or 'columns'}, default 0
The axis to iterate over while searching for the mode:

* 0 or 'index' : get mode of each column
* 1 or 'columns' : get mode of each row
numeric_only : boolean, default False
if True, only apply to numeric columns
dropna : boolean, default True
numeric_only : bool, default False
If True, only apply to numeric columns.
dropna : bool, default True
Don't consider counts of NaN/NaT.

.. versionadded:: 0.24.0

Returns
-------
modes : DataFrame (sorted)
DataFrame
The modes of each column or row.

See Also
--------
Series.mode : Return the highest frequency value in a Series.
Series.value_counts : Return the counts of values in a Series.

Examples
--------
>>> df = pd.DataFrame({'A': [1, 2, 1, 2, 1, 2, 3]})
>>> df = pd.DataFrame([('bird', 2, 2),
... ('mammal', 4, np.nan),
... ('arthropod', 8, 0),
... ('bird', 2, np.nan)],
... index=('falcon', 'horse', 'spider', 'ostrich'),
... columns=('species', 'legs', 'wings'))
>>> df
species legs wings
falcon bird 2 2.0
horse mammal 4 NaN
spider arthropod 8 0.0
ostrich bird 2 NaN

By default, missing values are not considered, and the mode of wings
are both 0 and 2. The second row of species and legs contains ``NaN``,
because they have only one mode, but the DataFrame has two rows.

>>> df.mode()
A
0 1
1 2
species legs wings
0 bird 2.0 0.0
1 NaN NaN 2.0

Setting ``dropna=False`` ``NaN`` values are considered and they can be
the mode (like for wings).

>>> df.mode(dropna=False)
species legs wings
0 bird 2 NaN

Setting ``numeric_only=True``, only the mode of numeric columns is
computed, and columns of other types are ignored.

>>> df.mode(numeric_only=True)
legs wings
0 2.0 0.0
1 NaN 2.0

To compute the mode over columns and not rows, use the axis parameter:

>>> df.mode(axis='columns', numeric_only=True)
0 1
falcon 2.0 NaN
horse 4.0 NaN
spider 0.0 8.0
ostrich 2.0 NaN
"""
data = self if not numeric_only else self._get_numeric_data()

Expand Down