Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH/BUG: Rename of MultiIndex DataFrames does not work #4160

Open
aschilling opened this issue Jul 8, 2013 · 25 comments · Fixed by #15931
Open

ENH/BUG: Rename of MultiIndex DataFrames does not work #4160

aschilling opened this issue Jul 8, 2013 · 25 comments · Fixed by #15931
Labels
Bug MultiIndex rename .rename, .rename_axis

Comments

@aschilling
Copy link

aschilling commented Jul 8, 2013

xref #14139 for empty MI

Hi everybody,

in the current version renaming of MultiIndex DataFrames does not work. Lets take the following example:

import datetime as DT
import pandas as pd
df = pd.DataFrame({
'Branch' : 'A A A A A B'.split(),
'Buyer': 'Carl Mark Carl Joe Mark Carl'.split(),
'Quantity': [1,3,5,8,9,3],
'Date' : [
    DT.datetime(2013,9,1,13,0),
    DT.datetime(2013,9,1,13,5),
    DT.datetime(2013,10,1,20,0),
    DT.datetime(2013,10,3,10,0),
    DT.datetime(2013,12,2,12,0),                                      
    DT.datetime(2013,12,2,14,0),
    ]})

and the following query:

test_df = df[df['Buyer'].isin(['Carl', 'Mark'])].set_index('Buyer', append=True)[['Date']].unstack(['Buyer'])

Now, the following renaming does not work

test_df.rename(columns={('Date', 'Carl'): 'Carl'}, inplace=True)

Thanks in advance

Andy

@jreback
Copy link
Contributor

jreback commented Jul 8, 2013

what are you trying to accomplish, that doesn't make sense, renaming a 2-level to a single-level

You can select out the level however; is this what you are after?

In [8]: test_df['Date']
Out[8]: 
Buyer                Carl                Mark
0     2013-09-01 13:00:00                 NaT
1                     NaT 2013-09-01 13:05:00
2     2013-10-01 20:00:00                 NaT
4                     NaT 2013-12-02 12:00:00
5     2013-12-02 14:00:00                 NaT

@aschilling
Copy link
Author

Hi everybody,

Sorry, the example ist not the best. Actually, I did some trend generation today and after updating to pandas current branch a lot of my code didn't work anymore because that rename function used to work as described above.
There are basically two scenarios where I used that hierarchical renaming:

  1. When generating stock trends (a short variable name is much easier to use than those tuples)
  2. When doing some complex operations as in the SO article, where the original variable (Date) does not matter any more but only that there is a matching date between Carl and Mark

@hayd
Copy link
Contributor

hayd commented Jul 8, 2013

I favour xs here, bit more explicit:

In [11]: test_df.xs('Date', axis=1)
Out[11]:
Buyer                Carl                Mark
0     2013-09-01 13:00:00                 NaT
1                     NaT 2013-09-01 13:05:00
2     2013-10-01 20:00:00                 NaT
4                     NaT 2013-12-02 12:00:00
5     2013-12-02 14:00:00                 NaT

# or maybe
test_df.columns = test_df.columns.droplevel(0)
# or 
test_df.columns =  test_df.columns.get_level_values('Buyer')

The fact that replace was working before smells like a bug, as @jreback says, it doesn't really make any sense to rename like that...

@cpcloud
Copy link
Member

cpcloud commented Jul 8, 2013

renaming doesn't work for multiindexes period, whether it makes sense or not:

In [7]: df
Out[7]:
  Branch Buyer                Date  Quantity
0      A  Carl 2013-09-01 13:00:00         1
1      A  Mark 2013-09-01 13:05:00         3
2      A  Carl 2013-10-01 20:00:00         5
3      A   Joe 2013-10-03 10:00:00         8
4      A  Mark 2013-12-02 12:00:00         9
5      B  Carl 2013-12-02 14:00:00         3

In [8]: test_df = df[df['Buyer'].isin(['Carl', 'Mark'])].set_index('Buyer', append=True)[['Date']].unstack(['Buyer'])

In [9]: test_df
Out[9]:
                     Date
Buyer                Carl                Mark
0     2013-09-01 13:00:00                 NaT
1                     NaT 2013-09-01 13:05:00
2     2013-10-01 20:00:00                 NaT
4                     NaT 2013-12-02 12:00:00
5     2013-12-02 14:00:00                 NaT

In [10]: test_df.rename(columns={('Date', 'Carl'): ('Care')})
Out[10]:
                     Date
Buyer                Carl                Mark
0     2013-09-01 13:00:00                 NaT
1                     NaT 2013-09-01 13:05:00
2     2013-10-01 20:00:00                 NaT
4                     NaT 2013-12-02 12:00:00
5     2013-12-02 14:00:00                 NaT

In [11]: test_df.rename(columns={('Date', 'Carl'): ('Care', "sdf")})
Out[11]:
                     Date
Buyer                Carl                Mark
0     2013-09-01 13:00:00                 NaT
1                     NaT 2013-09-01 13:05:00
2     2013-10-01 20:00:00                 NaT
4                     NaT 2013-12-02 12:00:00
5     2013-12-02 14:00:00                 NaT

In [12]: test_df.rename(columns={('Date', 'Carl'): ('Care', "sdf")})

@hayd
Copy link
Contributor

hayd commented Jul 8, 2013

Now that is a bug/feature request :)

Maybe you ought to be able to replace on each level for a MultiIndex, say using

test_df.rename(columns={'Buyer': {'Carl' : 'sdf'}})

not sure...

@jreback
Copy link
Contributor

jreback commented Jul 8, 2013

need to add level arg to rename maybe?

@hayd
Copy link
Contributor

hayd commented Jul 8, 2013

not sure level argument works/makes sense since rename allows you to change both index and columns at the same time:s

@hayd
Copy link
Contributor

hayd commented Jul 8, 2013

Although perhaps my suggestion doesn't either (if want to replace same things as the level name/number)...

@jreback
Copy link
Contributor

jreback commented Dec 11, 2013

@jreback jreback modified the milestones: 0.15.0, 0.14.0 Apr 22, 2014
@jreback jreback modified the milestones: 0.15.0, 0.15.1 Jul 6, 2014
@jreback jreback modified the milestones: 0.15.1, 0.15.0 Sep 8, 2014
@8one6
Copy link

8one6 commented Nov 18, 2014

I think this is still an open issue. It would be great to be able to treat the column labels as tuples and just use rename in the "natural" (at least natural to me) way. For example:

df.rename(columns={c: (str(c[0]) + 'foo', str(c[1]) + 'bar') for c in df.columns})

@jreback
Copy link
Contributor

jreback commented Nov 18, 2014

@8one6 this is an open issue currently.

this still waiting for an API to deal with the multi-level API.

I am not sure I like the stringifying idea. But haven't thought too much about this.

Maybe an actual example would help

@8one6
Copy link

8one6 commented Nov 18, 2014

import numpy as np
import pandas as pd

rows = pd.Index(list('abcde'), names=['letter'])
columns = pd.MultiIndex.from_tuples([('px', c) for c in ['red', 'green', 'blue']], 
                                    names=['datum', 'color'])
df = pd.DataFrame(np.random.randn(len(rows), len(columns)), index=rows, columns=columns)

gives

datum        px                    
item        red     green      blue
a     -0.616822 -0.922983  0.148247
b     -0.383122 -0.451940  1.138330
c     -0.744860  2.299611  0.895295
d     -0.159886 -0.832159 -0.205430
e     -0.458384 -1.410207 -0.965780

So now I want to do this:

absdf = df.abs()
absdf.rename(columns={c: ('abspx', c[1]) for c in df.columns}, inplace=True)

but that doesn't do what I expect, it just gives back the unmodified frame. To accomplish what I want here, I would do:

newabsdf = df.abs()
newabsdf.columns = pd.MultiIndex.from_tuples([('abspx', c[1]) for c in df.columns], 
                                             names=df.columns.names)

which gives the desired result:

datum     abspx                    
item        red     green      blue
a      0.616822  0.922983  0.148247
b      0.383122  0.451940  1.138330
c      0.744860  2.299611  0.895295
d      0.159886  0.832159  0.205430
e      0.458384  1.410207  0.965780

Basically, in the multi-index context, I was expecting rename to "be happy" if the passed function/mapper/dictionary returned tuples with the correct number of elements. Am I doing something wrong above? Or would this be a new feature request? Or does this seem ambiguous in some way.

@jreback
Copy link
Contributor

jreback commented Nov 18, 2014

a multi-index renam at the moment does not work at all. The issue is how do you rename only part of a level

e.g.
red-> orange, how should I do this?
or
abspx -> foo

df.rename(columns={'red' : 'orange' }, level=1)
df.rename(columns={'abspx' : 'foo'},level=0)

but no way to do this (well it doesn't work), but does make sense

df.rename(columns={('abspx','red) : ('foo','orange')})

@8one6
Copy link

8one6 commented Nov 18, 2014

Ah, ok. So that last code block in your comment, is that a reasonable thing to hope will work at some point? I.e. is there a reason that would be a bad API for doing fully general multilevel renaming? (I think that's what I had tried to achieve with my dict comprehension in my absdf.rename... line above)

And the other two lines...

df.rename(columns={'red' : 'orange' }, level=1)
df.rename(columns={'abspx' : 'foo'}, level=0)

is that the current working proposal? Or already implemented? Or up for debate?

@jreback
Copy link
Contributor

jreback commented Nov 18, 2014

I think the prior dont work (but prob don't need much). The last is a proposed API.

I think their is a pull-requests somewhere which does most of this but wasn't finished IIRC.

@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015
@jreback jreback modified the milestones: Next Minor Release, Next Major Release Mar 29, 2017
@jreback jreback modified the milestones: 0.20.0, Next Minor Release Apr 9, 2017
@jorisvandenbossche
Copy link
Member

This was closed automatically by github, but that was not the intention (#15931 is not related to this)

@jreback
Copy link
Contributor

jreback commented Apr 15, 2017

hahah had a reference with the word fix in it!

@jreback jreback modified the milestones: 0.20.0, Next Minor Release Apr 15, 2017
@jreback jreback modified the milestones: Interesting Issues, Next Major Release Nov 26, 2017
@mwiebusch78
Copy link

mwiebusch78 commented Feb 20, 2019

Is there any news on this? This bug is a constant annoyance when trying to do complex aggregations. Since the 'agg' method doesn't have a way to assign new names to the aggregate columns the recommended method (AFAIK) is to first aggregate and then rename. But something like

df.groupby('A').agg({'B': ['mean', 'median'], 'C': ['min', 'max']}

returns a multi-index which I then cannot rename. I actually had to write my own wrapper function for aggregations as a workaround. (Happy to share the code if there's interest.)

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Feb 20, 2019 via email

@normanius
Copy link

normanius commented Nov 26, 2020

Just sharing a workaround: renaming of tuples works for flattened indices.

df = pd.DataFrame([[1,2,3],[3,4,5],[5,6,7], [7,8,9]])
df.columns = pd.MultiIndex.from_tuples([('i','a'),('i','b'),('ii','a')])

# Alternative 1
df.columns = df.columns.to_flat_index()
df = df.rename(columns={('i','b'):('i','c')})
df.columns = pd.MultiIndex.from_tuples(df.columns)

# Alternative 2
i = df.columns.get_loc(('i','b'))
cols = df.columns.to_flat_index()
cols[i] = ('i','c')
df.columns = pd.MultiIndex.from_tuples(cols)

@cklb
Copy link

cklb commented Aug 23, 2021

Just stumbled over this problem. The delicate part is that the function will always return successfully even if errors="raise" is passed.

Thanks to normanius for the workaround, if a fix is not that easy, maybe the docstring could be extended with a warning?

@jbrockmendel jbrockmendel added the rename .rename, .rename_axis label Oct 29, 2021
@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug MultiIndex rename .rename, .rename_axis
Projects
None yet
Development

Successfully merging a pull request may close this issue.