Rename categories with Series #17982

Merged
merged 9 commits into from
Oct 26, 2017
31 changes: 30 additions & 1 deletion doc/source/whatsnew/v0.21.0.txt
@@ -239,6 +239,36 @@ Now, to find prices per store/product, we can simply do:
.pipe(lambda grp: grp.Revenue.sum()/grp.Quantity.sum())
.unstack().round(2))


.. _whatsnew_0210.enhancements.rename_categories:

``Categorical.rename_categories`` accepts a dict-like
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:meth:`~Series.cat.rename_categories` now accepts a dict-like argument for
``new_categories``. The previous categories are looked up in the dictionary's
keys and replaced if found. The behavior of missing and extra keys is the same
as in :meth:`DataFrame.rename`.

.. ipython:: python

   c = pd.Categorical(['a', 'a', 'b'])
   c.rename_categories({"a": "eh", "b": "bee"})

.. warning::

    To assist with upgrading pandas, ``rename_categories`` treats ``Series`` as
    list-like. Typically, they are considered to be dict-like, and in a future
    version of pandas ``rename_categories`` will change to treat them as
    dict-like.

    .. ipython:: python
        :okwarning:

        c.rename_categories(pd.Series([0, 1], index=['a', 'c']))

    Follow the warning message's recommendations.

See the :ref:`documentation <groupby.pipe>` for more.

.. _whatsnew_0210.enhancements.other:
@@ -267,7 +297,6 @@ Other Enhancements
- :func:`DataFrame.items` and :func:`Series.items` are now present in both Python 2 and 3 and are lazy in all cases. (:issue:`13918`, :issue:`17213`)
- :func:`Styler.where` has been implemented as a convenience for :func:`Styler.applymap`. (:issue:`17474`)
- :func:`MultiIndex.is_monotonic_decreasing` has been implemented. Previously returned ``False`` in all cases. (:issue:`16554`)
- :func:`Categorical.rename_categories` now accepts a dict-like argument as ``new_categories`` and only updates the categories found in that dict. (:issue:`17336`)
- :func:`read_excel` raises ``ImportError`` with a better message if ``xlrd`` is not installed. (:issue:`17613`)
- :func:`read_json` now accepts a ``chunksize`` parameter that can be used when ``lines=True``. If ``chunksize`` is passed, read_json now returns an iterator which reads in ``chunksize`` lines with each iteration. (:issue:`17048`)
- :meth:`DataFrame.assign` will preserve the original order of ``**kwargs`` for Python 3.6+ users instead of sorting the column names. (:issue:`14207`)
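
The whatsnew entry above says missing and extra keys behave as in `DataFrame.rename`. A minimal sketch of what that means in practice, assuming a pandas version that includes this change (0.21.0 or later); the variable names are illustrative only:

```python
import pandas as pd

c = pd.Categorical(["a", "a", "b"])

# 'b' is absent from the mapping, so it is passed through unchanged.
# 'c' is not an existing category, so that key is ignored.
renamed = c.rename_categories({"a": "A", "c": "C"})

print(list(renamed.categories))  # ['A', 'b']
print(list(renamed))             # ['A', 'A', 'b']

# Without inplace=True the original categorical is left untouched.
print(list(c.categories))        # ['a', 'b']
```

This mirrors the dict-like example added to the docstring in this PR.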
49 changes: 41 additions & 8 deletions pandas/core/categorical.py
@@ -866,11 +866,6 @@ def set_categories(self, new_categories, ordered=None, rename=False,
def rename_categories(self, new_categories, inplace=False):
""" Renames categories.
The new categories can be either a list-like dict-like object.
If it is list-like, all items must be unique and the number of items
in the new categories must be the same as the number of items in the
old categories.
Raises
------
ValueError
@@ -879,15 +874,30 @@ def rename_categories(self, new_categories, inplace=False):
Parameters
----------
new_categories : Index-like or dict-like (>=0.21.0)
The renamed categories.
new_categories : list-like or dict-like
* list-like: all items must be unique and the number of items in
the new categories must match the existing number of categories.
* dict-like: specifies a mapping from
old categories to new. Categories not contained in the mapping
are passed through and extra categories in the mapping are
ignored. *New in version 0.21.0*.
.. warning::
Currently, Series are considered list-like. In a future version
of pandas they'll be considered dict-like.
inplace : boolean (default: False)
Whether or not to rename the categories inplace or return a copy of
this categorical with renamed categories.
Returns
-------
cat : Categorical with renamed categories added or None if inplace.
cat : Categorical or None
With ``inplace=False``, the new categorical is returned.
With ``inplace=True``, there is no return value.
See also
--------
@@ -896,10 +906,33 @@ def rename_categories(self, new_categories, inplace=False):
remove_categories
remove_unused_categories
set_categories
Examples
--------
>>> c = Categorical(['a', 'a', 'b'])
>>> c.rename_categories([0, 1])
[0, 0, 1]
Categories (2, int64): [0, 1]
For dict-like ``new_categories``, extra keys are ignored and
categories not in the dictionary are passed through
>>> c.rename_categories({'a': 'A', 'c': 'C'})
[A, A, b]
Categories (2, object): [A, b]
"""
        inplace = validate_bool_kwarg(inplace, 'inplace')
        cat = self if inplace else self.copy()

        if isinstance(new_categories, ABCSeries):
            msg = ("Treating Series 'new_categories' as a list-like and using "
                   "the values. In a future version, 'rename_categories' will "
                   "treat Series like a dictionary.\n"
                   "For dict-like, use 'new_categories.to_dict()'\n"
                   "For list-like, use 'new_categories.values'.")
            warn(msg, FutureWarning, stacklevel=2)
Member

Maybe convert the Series to an array (list-like), so that the rest of the code does not need to care whether it is a Series or not.

Contributor Author

If we go for "Series -> dict-like" behaviour, this is a breaking change, and we need to use a warning for that.

Sorry I think that was an example I added in the first commit of this PR, before we decided to treat Series as list-like.

Contributor

Why is this a breaking change at all? We simply did not support this before.

Contributor Author

The old behavior required a list-like, and Series are list like. It's not unreasonable for a user to expect

cat.rename_categories(Series([0, 1]))

to work, since it did! But we have a new feature that changes the behavior.

Contributor

OK, I see. I think this was accidentally supported before, so fine on the FutureWarning.
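
For readers following this thread, here is a hedged sketch of the two conversions the warning message recommends (`new_categories.to_dict()` and `new_categories.values`). Converting explicitly states the intent, so the result should not depend on how a given pandas version interprets a bare `Series`. The variable names are illustrative and not part of the PR:

```python
import pandas as pd

c = pd.Categorical(["a", "a", "b"])
s = pd.Series([0, 1], index=["a", "c"])

# Dict-like intent: the index holds old categories, the values hold new ones.
# 'b' has no entry and is passed through; the extra key 'c' is ignored.
by_label = c.rename_categories(s.to_dict())
print(list(by_label.categories))     # [0, 'b']

# List-like intent: the values replace the categories by position and the
# index of the Series plays no role.
by_position = c.rename_categories(s.values)
print(list(by_position.categories))  # [0, 1]
print(list(by_position))             # [0, 0, 1]
```

The two results differ, which is the behavior change the `FutureWarning` is meant to flag.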

            new_categories = list(new_categories)

        if is_dict_like(new_categories):
            cat.categories = [new_categories.get(item, item)
                              for item in cat.categories]
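
The dict-like branch in the hunk above can be illustrated without pandas. The following is a simplified, hypothetical stand-in (`rename_categories_sketch` is not a pandas function): the list-like validation is an assumption based on the docstring's description rather than the collapsed part of the diff, and the `Series` `FutureWarning` branch is omitted.

```python
from collections.abc import Mapping


def rename_categories_sketch(categories, new_categories):
    """Return the renamed categories (illustrative only, not pandas code)."""
    if isinstance(new_categories, Mapping):
        # Dict-like: look up each existing category and fall back to itself.
        # Extra keys are never consulted, so they are effectively ignored.
        return [new_categories.get(item, item) for item in categories]

    # List-like: positional replacement. The checks below are assumed from
    # the docstring (same length, unique items).
    new_categories = list(new_categories)
    if len(new_categories) != len(categories):
        raise ValueError("number of new categories must match the old ones")
    if len(set(new_categories)) != len(new_categories):
        raise ValueError("new categories must be unique")
    return new_categories


print(rename_categories_sketch(["a", "b"], {"a": "A", "c": "C"}))  # ['A', 'b']
print(rename_categories_sketch(["a", "b"], [0, 1]))                # [0, 1]
```

Using `.get(item, item)` is what gives the pass-through behavior for categories missing from the mapping.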
12 changes: 12 additions & 0 deletions pandas/tests/test_categorical.py
@@ -1203,6 +1203,18 @@ def test_rename_categories(self):
        with pytest.raises(ValueError):
            cat.rename_categories([1, 2])

    def test_rename_categories_series(self):
        # https://github.com/pandas-dev/pandas/issues/17981
        c = pd.Categorical(['a', 'b'])
        xpr = "Treating Series 'new_categories' as a list-like "
        with tm.assert_produces_warning(FutureWarning) as rec:
            result = c.rename_categories(pd.Series([0, 1]))

        assert len(rec) == 1
        assert xpr in str(rec[0].message)
        expected = pd.Categorical([0, 1])
        tm.assert_categorical_equal(result, expected)

    def test_rename_categories_dict(self):
        # GH 17336
        cat = pd.Categorical(['a', 'b', 'c', 'd'])
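
The warning check in `test_rename_categories_series` can be approximated with the standard library alone. This is a sketch under stated assumptions: `rename_with_deprecation` is a hypothetical stand-in for the `Series` branch, and `warnings.catch_warnings(record=True)` plays the role of `tm.assert_produces_warning`.

```python
import warnings


def rename_with_deprecation(new_categories):
    # Hypothetical stand-in for the Series branch: warn, then fall back to
    # treating the input as list-like. stacklevel=2 attributes the warning
    # to the caller's line rather than to this function.
    warnings.warn(
        "Treating Series 'new_categories' as a list-like and using the values.",
        FutureWarning,
        stacklevel=2,
    )
    return list(new_categories)


with warnings.catch_warnings(record=True) as rec:
    # Record every warning, even ones that default filters would suppress.
    warnings.simplefilter("always")
    result = rename_with_deprecation((0, 1))

assert len(rec) == 1
assert issubclass(rec[0].category, FutureWarning)
assert "Treating Series 'new_categories' as a list-like" in str(rec[0].message)
assert result == [0, 1]
```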