ENH/BUG: Rename of MultiIndex DataFrames does not work #4160

aschilling · 2013-07-08T13:52:55Z

xref #14139 for empty MI

Hi everybody,

In the current version, renaming of MultiIndex DataFrames does not work. Let's take the following example:

import datetime as DT
import pandas as pd
df = pd.DataFrame({
    'Branch': 'A A A A A B'.split(),
    'Buyer': 'Carl Mark Carl Joe Mark Carl'.split(),
    'Quantity': [1, 3, 5, 8, 9, 3],
    'Date': [
        DT.datetime(2013, 9, 1, 13, 0),
        DT.datetime(2013, 9, 1, 13, 5),
        DT.datetime(2013, 10, 1, 20, 0),
        DT.datetime(2013, 10, 3, 10, 0),
        DT.datetime(2013, 12, 2, 12, 0),
        DT.datetime(2013, 12, 2, 14, 0),
    ]})

and the following query:

test_df = df[df['Buyer'].isin(['Carl', 'Mark'])].set_index('Buyer', append=True)[['Date']].unstack(['Buyer'])

Now, the following renaming does not work:

test_df.rename(columns={('Date', 'Carl'): 'Carl'}, inplace=True)

Thanks in advance

Andy


jreback · 2013-07-08T16:01:45Z

What are you trying to accomplish? Renaming a 2-level column label to a single-level one doesn't make sense.

You can select out the level however; is this what you are after?

In [8]: test_df['Date']
Out[8]: 
Buyer                Carl                Mark
0     2013-09-01 13:00:00                 NaT
1                     NaT 2013-09-01 13:05:00
2     2013-10-01 20:00:00                 NaT
4                     NaT 2013-12-02 12:00:00
5     2013-12-02 14:00:00                 NaT

aschilling · 2013-07-08T17:02:05Z

Hi everybody,

Sorry, the example is not the best. Actually, I did some trend generation today, and after updating to the current pandas branch a lot of my code didn't work anymore, because that rename function used to work as described above.
There are basically two scenarios where I used that hierarchical renaming:

1. When generating stock trends (a short variable name is much easier to use than those tuples).
2. When doing some complex operations as in the SO article, where the original variable (Date) does not matter any more, but only that there is a matching date between Carl and Mark.

hayd · 2013-07-08T17:17:18Z

I favour xs here, bit more explicit:

In [11]: test_df.xs('Date', axis=1)
Out[11]:
Buyer                Carl                Mark
0     2013-09-01 13:00:00                 NaT
1                     NaT 2013-09-01 13:05:00
2     2013-10-01 20:00:00                 NaT
4                     NaT 2013-12-02 12:00:00
5     2013-12-02 14:00:00                 NaT

# or maybe
test_df.columns = test_df.columns.droplevel(0)
# or 
test_df.columns =  test_df.columns.get_level_values('Buyer')

The fact that rename was working before smells like a bug; as @jreback says, it doesn't really make any sense to rename like that...

cpcloud · 2013-07-08T17:43:02Z

Renaming doesn't work for MultiIndexes, period, whether it makes sense or not:

In [7]: df
Out[7]:
  Branch Buyer                Date  Quantity
0      A  Carl 2013-09-01 13:00:00         1
1      A  Mark 2013-09-01 13:05:00         3
2      A  Carl 2013-10-01 20:00:00         5
3      A   Joe 2013-10-03 10:00:00         8
4      A  Mark 2013-12-02 12:00:00         9
5      B  Carl 2013-12-02 14:00:00         3

In [8]: test_df = df[df['Buyer'].isin(['Carl', 'Mark'])].set_index('Buyer', append=True)[['Date']].unstack(['Buyer'])

In [9]: test_df
Out[9]:
                     Date
Buyer                Carl                Mark
0     2013-09-01 13:00:00                 NaT
1                     NaT 2013-09-01 13:05:00
2     2013-10-01 20:00:00                 NaT
4                     NaT 2013-12-02 12:00:00
5     2013-12-02 14:00:00                 NaT

In [10]: test_df.rename(columns={('Date', 'Carl'): ('Care')})
Out[10]:
                     Date
Buyer                Carl                Mark
0     2013-09-01 13:00:00                 NaT
1                     NaT 2013-09-01 13:05:00
2     2013-10-01 20:00:00                 NaT
4                     NaT 2013-12-02 12:00:00
5     2013-12-02 14:00:00                 NaT

In [11]: test_df.rename(columns={('Date', 'Carl'): ('Care', "sdf")})
Out[11]:
                     Date
Buyer                Carl                Mark
0     2013-09-01 13:00:00                 NaT
1                     NaT 2013-09-01 13:05:00
2     2013-10-01 20:00:00                 NaT
4                     NaT 2013-12-02 12:00:00
5     2013-12-02 14:00:00                 NaT

In [12]: test_df.rename(columns={('Date', 'Carl'): ('Care', "sdf")})

hayd · 2013-07-08T18:19:29Z

Now that is a bug/feature request :)

Maybe you ought to be able to replace on each level for a MultiIndex, say using

test_df.rename(columns={'Buyer': {'Carl' : 'sdf'}})

not sure...

jreback · 2013-07-08T18:20:40Z

need to add level arg to rename maybe?

hayd · 2013-07-08T18:23:05Z

Not sure a level argument works/makes sense, since rename allows you to change both index and columns at the same time :s

hayd · 2013-07-08T19:11:44Z

Although perhaps my suggestion doesn't either (if you want to replace a label that is the same as the level name/number)...

jreback · 2013-12-11T21:09:14Z

work-around here http://stackoverflow.com/questions/20529619/renaming-index-values-in-multiindex-dataframe

8one6 · 2014-11-18T16:24:41Z

I think this is still an open issue. It would be great to be able to treat the column labels as tuples and just use rename in the "natural" (at least natural to me) way. For example:

df.rename(columns={c: (str(c[0]) + 'foo', str(c[1]) + 'bar') for c in df.columns})

jreback · 2014-11-18T17:07:48Z

@8one6 this is an open issue currently.

This is still waiting for an API to deal with multi-level renaming.

I am not sure I like the stringifying idea. But haven't thought too much about this.

Maybe an actual example would help

8one6 · 2014-11-18T17:19:53Z

import numpy as np
import pandas as pd

rows = pd.Index(list('abcde'), name='letter')  # pd.Index takes a single `name`, not `names`
columns = pd.MultiIndex.from_tuples([('px', c) for c in ['red', 'green', 'blue']], 
                                    names=['datum', 'color'])
df = pd.DataFrame(np.random.randn(len(rows), len(columns)), index=rows, columns=columns)

gives

datum        px                    
item        red     green      blue
a     -0.616822 -0.922983  0.148247
b     -0.383122 -0.451940  1.138330
c     -0.744860  2.299611  0.895295
d     -0.159886 -0.832159 -0.205430
e     -0.458384 -1.410207 -0.965780

So now I want to do this:

absdf = df.abs()
absdf.rename(columns={c: ('abspx', c[1]) for c in df.columns}, inplace=True)

But that doesn't do what I expect: it just gives back the unmodified frame. To accomplish what I want here, I would do:

newabsdf = df.abs()
newabsdf.columns = pd.MultiIndex.from_tuples([('abspx', c[1]) for c in df.columns], 
                                             names=df.columns.names)

which gives the desired result:

datum     abspx                    
item        red     green      blue
a      0.616822  0.922983  0.148247
b      0.383122  0.451940  1.138330
c      0.744860  2.299611  0.895295
d      0.159886  0.832159  0.205430
e      0.458384  1.410207  0.965780

Basically, in the multi-index context, I was expecting rename to "be happy" if the passed function/mapper/dictionary returned tuples with the correct number of elements. Am I doing something wrong above? Or would this be a new feature request? Or does this seem ambiguous in some way?
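
The tuple-to-tuple behaviour asked for here can be sketched with a small wrapper. This is only an illustration: `rename_tuples` is a hypothetical helper name, not a pandas API, and the frame below is a stand-in for the example above. It maps whole column tuples through a dict and rebuilds the MultiIndex, keeping the level names.

```python
import numpy as np
import pandas as pd


def rename_tuples(frame, mapping):
    """Hypothetical helper (not part of pandas): rename whole column tuples.

    Tuples that are not keys of `mapping` are kept unchanged.
    """
    new_tuples = [mapping.get(col, col) for col in frame.columns.to_flat_index()]
    out = frame.copy()
    out.columns = pd.MultiIndex.from_tuples(new_tuples, names=frame.columns.names)
    return out


columns = pd.MultiIndex.from_tuples(
    [("px", c) for c in ["red", "green", "blue"]], names=["datum", "color"]
)
df = pd.DataFrame(np.random.randn(5, 3), index=list("abcde"), columns=columns)

# Iterating over a MultiIndex yields tuples, so the dict keys are full labels.
absdf = rename_tuples(df.abs(), {c: ("abspx", c[1]) for c in df.columns})
```

The helper returns a new frame instead of modifying in place, so the original `df` keeps its `px` labels.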

jreback · 2014-11-18T17:24:12Z

A multi-index rename at the moment does not work at all. The issue is: how do you rename only part of a level?

e.g.
red-> orange, how should I do this?
or
abspx -> foo

df.rename(columns={'red' : 'orange' }, level=1)

df.rename(columns={'abspx' : 'foo'},level=0)

There is currently no way to do this (well, it doesn't work), but it does make sense:

df.rename(columns={('abspx','red') : ('foo','orange')})
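
For readers on a recent pandas release: `DataFrame.rename` there accepts a `level` argument, so the per-level form proposed in this comment can be tried directly. The sketch below assumes such a version is installed; the frame is a made-up stand-in for the example in this thread.

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_tuples(
    [("abspx", c) for c in ["red", "green", "blue"]], names=["datum", "color"]
)
df = pd.DataFrame(np.ones((2, 3)), columns=columns)

# Rename labels only within the given level (by name or by position);
# labels in the other level are left untouched.
by_color = df.rename(columns={"red": "orange"}, level="color")
by_datum = df.rename(columns={"abspx": "foo"}, level=0)
```

Both calls return new frames. The full-tuple form, with tuples as both keys and values, is a separate matter and is not covered by `level`.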

8one6 · 2014-11-18T17:28:30Z

Ah, ok. So that last code block in your comment, is that a reasonable thing to hope will work at some point? I.e. is there a reason that would be a bad API for doing fully general multilevel renaming? (I think that's what I had tried to achieve with my dict comprehension in my absdf.rename... line above)

And the other two lines...

df.rename(columns={'red' : 'orange' }, level=1)
df.rename(columns={'abspx' : 'foo'}, level=0)

is that the current working proposal? Or already implemented? Or up for debate?

jreback · 2014-11-18T20:28:28Z

I think the prior don't work (but probably don't need much). The last is a proposed API.

I think there is a pull request somewhere which does most of this, but it wasn't finished IIRC.

jorisvandenbossche · 2017-04-15T13:38:36Z

This was closed automatically by github, but that was not the intention (#15931 is not related to this)

jreback · 2017-04-15T14:04:24Z

hahah had a reference with the word fix in it!

mwiebusch78 · 2019-02-20T11:01:57Z

Is there any news on this? This bug is a constant annoyance when trying to do complex aggregations. Since the 'agg' method doesn't have a way to assign new names to the aggregate columns the recommended method (AFAIK) is to first aggregate and then rename. But something like

df.groupby('A').agg({'B': ['mean', 'median'], 'C': ['min', 'max']})

returns a multi-index which I then cannot rename. I actually had to write my own wrapper function for aggregations as a workaround. (Happy to share the code if there's interest.)
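
Two ways around this aggregate-then-rename pattern, sketched on made-up data. The first assumes pandas 0.25 or newer, where `agg` accepts named aggregation (keyword = (column, function)); the second flattens the MultiIndex columns by joining each tuple with an underscore, which is just one possible naming convention.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": ["x", "x", "y"], "B": [1.0, 3.0, 5.0], "C": [2, 4, 6]}
)

# Named aggregation: each keyword becomes the output column name.
named = df.groupby("A").agg(
    b_mean=("B", "mean"),
    b_median=("B", "median"),
    c_min=("C", "min"),
    c_max=("C", "max"),
)

# Alternative: aggregate into MultiIndex columns, then flatten the tuples.
multi = df.groupby("A").agg({"B": ["mean", "median"], "C": ["min", "max"]})
flat = multi.copy()
flat.columns = ["_".join(col) for col in multi.columns.to_flat_index()]
```

Once the columns are a flat index of strings, an ordinary `rename(columns={...})` works on them as usual.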

TomAugspurger · 2019-02-20T11:38:36Z

Still open if you're interested in working on it.


normanius · 2020-11-26T00:13:02Z

Just sharing a workaround: renaming of tuples works for flattened indices.

df = pd.DataFrame([[1,2,3],[3,4,5],[5,6,7], [7,8,9]])
df.columns = pd.MultiIndex.from_tuples([('i','a'),('i','b'),('ii','a')])

# Alternative 1
df.columns = df.columns.to_flat_index()
df = df.rename(columns={('i','b'):('i','c')})
df.columns = pd.MultiIndex.from_tuples(df.columns)

# Alternative 2
i = df.columns.get_loc(('i','b'))
cols = df.columns.to_flat_index().tolist()  # an Index is immutable, so edit a list copy
cols[i] = ('i','c')
df.columns = pd.MultiIndex.from_tuples(cols)

cklb · 2021-08-23T17:50:45Z

Just stumbled over this problem. The delicate part is that the function will always return successfully even if errors="raise" is passed.

Thanks to normanius for the workaround. If a fix is not that easy, maybe the docstring could be extended with a warning?

waitingkuo added a commit to waitingkuo/pandas that referenced this issue Aug 5, 2013

fix issue pandas-dev#4160, rename the MultiIndex

3c69575

waitingkuo mentioned this issue Aug 5, 2013

rename multi-index #4461

Closed

ghost assigned jtratner Sep 9, 2013

joeb1415 mentioned this issue Dec 6, 2013

rename multiindex bug in DataFrame #4023

Closed

jreback mentioned this issue Dec 6, 2013

BUG: Series.rename() ignores level argument on MultiIndex #5653

Closed

jreback mentioned this issue Apr 9, 2014

Change names of MultiIndex Index of a Dataframe #5273

Closed

jreback modified the milestones: 0.15.0, 0.14.0 Apr 22, 2014

jreback modified the milestones: 0.15.0, 0.15.1 Jul 6, 2014

jreback modified the milestones: 0.15.1, 0.15.0 Sep 8, 2014

jreback unassigned jtratner Sep 8, 2014

jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015

jreback modified the milestones: Next Minor Release, Next Major Release Mar 29, 2017

jreback mentioned this issue Apr 7, 2017

DEPR: deprecate relableling dicts in groupby.agg #15931

Merged

jreback modified the milestones: 0.20.0, Next Minor Release Apr 9, 2017

jreback closed this as completed in #15931 Apr 13, 2017

jorisvandenbossche reopened this Apr 15, 2017

jorisvandenbossche mentioned this issue Apr 15, 2017

Renaming MultiIndex values does not work at all (doesn't update values, or raise an exception) #16008

Closed

jreback modified the milestones: 0.20.0, Next Minor Release Apr 15, 2017

jreback modified the milestones: Interesting Issues, Next Major Release Nov 26, 2017

jreback mentioned this issue Mar 5, 2018

API: Allow dictionary argument in rename_axis to change some names of MultiIndex #19978

Closed

WillAyd mentioned this issue May 28, 2019

rename doesn't work with mutliindex #26498

Closed

jbrockmendel removed Effort Medium labels Oct 21, 2019

mroeschke removed Prio-high Internals Related to non-user accessible pandas implementation labels Apr 3, 2020

jreback mentioned this issue Nov 25, 2020

ENH: Rename multi-level columns or indices using their tupelized names #38069

Closed

mroeschke removed the API Design label Apr 11, 2021

jbrockmendel added the rename .rename, .rename_axis label Oct 29, 2021

mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
