Skip to content

Commit

Permalink
DOC: followup to pandas-dev#20583, observed kwarg for .groupby
Browse files Browse the repository at this point in the history
  • Loading branch information
jreback committed May 3, 2018
1 parent ce4ab82 commit 243d087
Show file tree
Hide file tree
Showing 4 changed files with 25 additions and 28 deletions.
2 changes: 1 addition & 1 deletion doc/source/groupby.rst
Original file line number Diff line number Diff line change
Expand Up @@ -994,7 +994,7 @@ is only interesting over one column (here ``colname``), it may be filtered
Handling of (un)observed Categorical values
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When using a ``Categorical`` grouper (as a single or as part of multipler groupers), the ``observed`` keyword
When using a ``Categorical`` grouper (as a single grouper, or as part of multipler groupers), the ``observed`` keyword
controls whether to return a cartesian product of all possible groupers values (``observed=False``) or only those
that are observed groupers (``observed=True``).

Expand Down
8 changes: 5 additions & 3 deletions doc/source/whatsnew/v0.23.0.txt
Original file line number Diff line number Diff line change
Expand Up @@ -419,9 +419,11 @@ documentation. If you build an extension array, publicize it on our
Categorical Groupers has gained an observed keyword
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In previous versions, grouping by 1 or more categorical columns would result in an index that was the cartesian product of all of the categories for
each grouper, not just the observed values.``.groupby()`` has gained the ``observed`` keyword to toggle this behavior. The default remains backward
compatible (generate a cartesian product). (:issue:`14942`, :issue:`8138`, :issue:`15217`, :issue:`17594`, :issue:`8669`, :issue:`20583`)
Grouping by a categorical includes the unobserved categories in the output.
When grouping with multiple groupers, this means you get the cartesian product of all the
categories, including combinations where there are no observations, which can result in a large
number of groupers. We have added a keyword ``observed`` to control this behavior, it defaults to
``observed=False`` for backward-compatiblity. (:issue:`14942`, :issue:`8138`, :issue:`15217`, :issue:`17594`, :issue:`8669`, :issue:`20583`, :issue:`20902`)


.. ipython:: python
Expand Down
11 changes: 5 additions & 6 deletions pandas/core/generic.py
Original file line number Diff line number Diff line change
Expand Up @@ -6584,7 +6584,7 @@ def clip_lower(self, threshold, axis=None, inplace=False):
axis=axis, inplace=inplace)

def groupby(self, by=None, axis=0, level=None, as_index=True, sort=True,
group_keys=True, squeeze=False, observed=None, **kwargs):
group_keys=True, squeeze=False, observed=False, **kwargs):
"""
Group series using mapper (dict or key function, apply given function
to group, return result as series) or by a series of columns.
Expand Down Expand Up @@ -6617,11 +6617,10 @@ def groupby(self, by=None, axis=0, level=None, as_index=True, sort=True,
squeeze : boolean, default False
reduce the dimensionality of the return type if possible,
otherwise return a consistent type
observed : boolean, default None
if True: only show observed values for categorical groupers.
if False: show all values for categorical groupers.
if None: if any categorical groupers, show a FutureWarning,
default to False.
observed : boolean, default False
This only applies if any if the groupers are Categoricals
If True: only show observed values for categorical groupers.
If False: show all values for categorical groupers.
.. versionadded:: 0.23.0
Expand Down
32 changes: 14 additions & 18 deletions pandas/core/groupby/groupby.py
Original file line number Diff line number Diff line change
Expand Up @@ -556,7 +556,7 @@ class _GroupBy(PandasObject, SelectionMixin):
def __init__(self, obj, keys=None, axis=0, level=None,
grouper=None, exclusions=None, selection=None, as_index=True,
sort=True, group_keys=True, squeeze=False,
observed=None, **kwargs):
observed=False, **kwargs):

self._selection = selection

Expand Down Expand Up @@ -2907,7 +2907,7 @@ class Grouping(object):
"""

def __init__(self, index, grouper=None, obj=None, name=None, level=None,
sort=True, observed=None, in_axis=False):
sort=True, observed=False, in_axis=False):

self.name = name
self.level = level
Expand Down Expand Up @@ -2964,12 +2964,6 @@ def __init__(self, index, grouper=None, obj=None, name=None, level=None,
# a passed Categorical
elif is_categorical_dtype(self.grouper):

# observed can be True/False/None
# we treat None as False. If in the future
# we need to warn if observed is not passed
# then we have this option
# gh-20583

self.all_grouper = self.grouper
self.grouper = self.grouper._codes_for_groupby(
self.sort, observed)
Expand Down Expand Up @@ -3088,7 +3082,7 @@ def groups(self):


def _get_grouper(obj, key=None, axis=0, level=None, sort=True,
observed=None, mutated=False, validate=True):
observed=False, mutated=False, validate=True):
"""
create and return a BaseGrouper, which is an internal
mapping of how to create the grouper indexers.
Expand Down Expand Up @@ -4734,26 +4728,28 @@ def _wrap_agged_blocks(self, items, blocks):

def _reindex_output(self, result):
"""
if we have categorical groupers, then we want to make sure that
If we have categorical groupers, then we want to make sure that
we have a fully reindex-output to the levels. These may have not
participated in the groupings (e.g. may have all been
nan groups)
nan groups);
This can re-expand the output space
"""

# TODO(jreback): remove completely
# when observed parameter is defaulted to True
# gh-20583

if self.observed:
return result

# we need to re-expand the output space to accomadate all values
# whethere observed or not in the cartesian product of our groupes
groupings = self.grouper.groupings
if groupings is None:
return result
elif len(groupings) == 1:
return result

# if we only care about the observed values
# we are done
elif self.observed:
return result

# reindexing only appiles to a Categorical grouper
elif not any(isinstance(ping.grouper, (Categorical, CategoricalIndex))
for ping in groupings):
return result
Expand Down

0 comments on commit 243d087

Please sign in to comment.