API: allow DataFrame.rename to take a list-like of columns #14829

Open
chris-b1 opened this issue Dec 8, 2016 · 16 comments
Labels
Enhancement rename .rename, .rename_axis

Comments

@chris-b1
Contributor

chris-b1 commented Dec 8, 2016

This is closely related to #12392, but I think it is a separate issue. The proposal is to allow passing a list-like to DataFrame.rename, to make method chaining easier. I think it would also be consistent with #11980 (Series rename).

import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# proposed: make this
df = df.rename(columns=['j', 'k'])

# equivalent to
df.columns = ['j', 'k']

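Until something like this exists, one chain-friendly way to get the same result from the current `rename` is to build the old-to-new mapping by position. This is only a minimal sketch: `new_names` is an illustrative variable, and it assumes the column labels are unique and that `new_names` has one entry per column.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
new_names = ['j', 'k']  # illustrative; one entry per existing column

# Pair each existing column label with its replacement, in column order.
renamed = df.rename(columns=dict(zip(df.columns, new_names)))

print(list(renamed.columns))
print(list(df.columns))  # the original frame is left untouched
```

With duplicate column labels the dict would collapse entries, so this workaround is not a full substitute for assigning to `df.columns`.
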
@jreback
Contributor

jreback commented Jan 1, 2017

Note that Series.rename affects the .name and NOT the index. So although I like that this works like .set_index, it might be a bit of an odd duck.

@chris-b1
Contributor Author

chris-b1 commented Jan 1, 2017

Yeah, it's a bit odd - I actually could make [5] below work - but it would be confusing at best next to [4]

In [1]: s = pd.Series(['a', 'b'])

In [2]: s
Out[2]: 
0    a
1    b
dtype: object

In [3]: s.rename('my_series')
Out[3]: 
0    a
1    b
Name: my_series, dtype: object

In [4]: s.rename((1,2))
Out[4]: 
0    a
1    b
Name: (1, 2), dtype: object

In [5]: s.rename([1,2])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
TypeError: Series.name must be a hashable type
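
For contrast, here is a small sketch of the two jobs `Series.rename` already does, depending on the argument type: a hashable scalar sets the name, while a dict-like or callable is applied to the index labels. The variable names are only illustrative.

```python
import pandas as pd

s = pd.Series(['a', 'b'])

named = s.rename('my_series')            # scalar: sets Series.name
relabeled = s.rename({0: 'x', 1: 'y'})   # dict-like: maps index labels
shifted = s.rename(lambda i: i + 10)     # callable: applied to each index label

print(named.name, list(named.index))
print(relabeled.name, list(relabeled.index))
print(list(shifted.index))
```

A list falls into neither group: it is not hashable, so it cannot be a name, and it is not a mapping, which is why [5] fails.
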

@jorisvandenbossche
Member

There is also the idea to call this relabel (I thought there was an issue about this, but it is mentioned here: #12392 (comment), #11980 (comment)).
If we do this, this makes it possible to more easily expand the possibilities here without conflicting with the series name renaming case.

@chris-b1
Contributor Author

chris-b1 commented Jan 3, 2017

Thanks @jorisvandenbossche - although I'm a little wary of adding another DataFrame method, I am in favor of calling this relabel vs adding functionality to rename

@chris-b1 chris-b1 mentioned this issue Jan 11, 2017
4 tasks
@jorisvandenbossche
Member

although I'm a little wary of adding another DataFrame method

fully agree, but the double use case of rename is also not nice ..

@TomAugspurger
Contributor

TomAugspurger commented Jul 17, 2017

From @toobaz in #16990 (comment):

My proposal is the following:

  • add a new method .relabel, which changes the content of the axis, as in
df.relabel(['l1', 'l2', 'l3']) # changes df.index labels - assuming df.index is flat
df.relabel([['l1a', 'l1b', 'l1c'], [...], [...]]) # changes df.index labels - assuming df.index has 3 levels
df.relabel(a_dict) # again, changes the index labels, analogously to what df.rename(a_dict) currently does
  • deprecate the use of .rename and .rename_index for doing the same operation, that is when the index or mapper argument (respectively) is a callable or dict-like; keep them (or, even better, keep one and deprecate the other) for changing only index names

I would add that df.relabel would also take a function mapping old index labels to new.
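
To make the proposal concrete, here is a rough sketch of how such a method could behave, written as a free function on top of existing pandas calls. `relabel` here is hypothetical and is not a pandas API; the sketch only covers flat axes and does not attempt the nested-list form for multi-level indexes.

```python
import pandas as pd
from pandas.api.types import is_dict_like


def relabel(obj, labels, axis=0):
    """Hypothetical helper (not part of pandas): replace or map axis labels."""
    if callable(labels) or is_dict_like(labels):
        # Mapping or function: translate the existing labels one by one.
        return obj.rename(labels, axis=axis)
    # Any other list-like is treated as a full replacement for a flat axis.
    return obj.set_axis(labels, axis=axis)


df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

by_list = relabel(df, ['r1', 'r2'])           # new index labels
by_dict = relabel(df, {'a': 'j'}, axis=1)     # rename one column
by_func = relabel(df, str.upper, axis=1)      # function applied to each column label
```

Note that none of the branches touches `.name`, which is the separation the proposal is after.
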

@TomAugspurger
Contributor

TomAugspurger commented Jul 17, 2017

Could we do an up / down vote on:

a. Adding .relabel method for changing the index / column labels
b. Deprecating the relabeling behavior of .rename

I'm +1 on a and -0 on b.

@jreback
Contributor

jreback commented Jul 19, 2017

.set_axis does the job with the new signature. (#16994)

So I would be ok with

  • use this new impl .set_axis as .relabel and deprecate .set_axis
  • remove .rename/.rename_axis relabeling behavior (b)
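
As an illustration of the `.set_axis` route, here is a short sketch of replacing all column labels inside a method chain. It assumes a recent pandas in which `set_axis` returns a new object; the `total` column is only there to show the chain continuing.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

result = (
    df.set_axis(['j', 'k'], axis='columns')          # replace all column labels
      .assign(total=lambda d: d['j'] + d['k'])       # later steps see the new labels
)

print(list(result.columns))
```
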

@jreback jreback added the Indexing Related to indexing on series/frames, not to indexes themselves label Jul 19, 2017
@jreback
Contributor

jreback commented Jul 19, 2017

another possibility is .relabel_index, slightly longer but in the same vein as .set_index so maybe more clear

@toobaz
Member

toobaz commented Jul 19, 2017

.set_axis does the job with the new signature

OK. We should add the support for mappings, callables, and level=, but then yes, I'm +1 on b.

I'm quite indifferent between .relabel and keeping .set_index, while I'm slightly against .relabel_index (it is longer, and I don't think there is risk of ambiguity).

@toobaz
Member

toobaz commented Jul 26, 2017

OK. We should add the support for mappings, callables, and level=

[to set_axis, and adding a FutureWarning for the .rename/.rename_axis relabeling behaviour.]

Does anybody disagree on this?

(regardless of whether we then want to rename set_axis to relabel)

@jreback
Contributor

jreback commented Jul 26, 2017

@toobaz

[to set_axis, and adding a FutureWarning for the .rename/.rename_axis relabeling behaviour.]

good to deprecate on that. pls add a sub-section in the docs (both whatsnew and maybe in main docs) on what to do (e.g. use .set_axis)

@jorisvandenbossche
Member

I am -1 on moving towards set_axis as our recommended 'renaming index labels' method (instead of rename).
I find that personally a very confusing name. I know 'axis' is internally used as a general name for the actual index or columns object, but I don't think we should do that in user API (apart from using axis= to indicate the direction on which the method should be applied). Also, set_axis in combination with a dict for renaming certain labels is also not really 'setting the axis'.

But maybe that is more a naming question, and the actual proposal is good apart from the name (I still have to go a bit more through all the notifications and discussion after my holidays :-))

@TomAugspurger
Contributor

Some options:

I think we'd like to deprecate Series.rename(scalar) changing Series.name. This was a mistake; apologies. We would replace it with Series.set_name(scalar)

One proposal is to enhance DataFrame.rename to take a correctly sized array and just set it as the labels.

Another option is to add a DataFrame.set_columns that acts like set_index(array). We wouldn't want the set_index(labels) behavior, as I don't think it's that useful. Often the dtype of your column labels won't match the dtypes of your data.
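
For reference, a small sketch of the two `set_index` behaviors being distinguished here: passing a label moves that column into the index, while passing an array of the right length uses its values directly. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

by_label = df.set_index('a')                    # label: column 'a' becomes the index
by_array = df.set_index(pd.Index(['x', 'y']))   # array: values are used as the index

print(list(by_label.columns), list(by_label.index))
print(list(by_array.columns), list(by_array.index))
```

A `set_columns` along these lines would presumably mirror only the second form.
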

@shoyer
Member

shoyer commented Jul 13, 2018

I think we'd like to deprecate Series.rename(scalar) changing Series.name. This was a mistake; apologies. We would replace it with Series.set_name(scalar)

We could definitely do this, but rename will still be an obvious place to look for how to set a name. This is one of the reasons why we added Series.rename(scalar) in the first place (#9494).

The other reason for adding Series.rename(scalar) is that I think about a "name" in pandas as referring most specifically to column names (DataFrame.columns / Series.name), rather than row names (DataFrame.index / Series.index), which are usually not even strings.

Unfortunately, we have already conflated the concept of "name" in rename with row names, and in fact by default, rename refers specifically to rows! We could try to change this (e.g., by switching to set_axis), but we are already quite constrained by other naming choices (e.g., set_index) in the pandas API, so we don't have good alternatives for row relabeling either.

Given the poor alternatives, I think my preferred choice would be to stick with the existing API, where rename() can refer to either index or columns/name.
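
The "rows by default" point can be seen in a short sketch. The frame below deliberately reuses `'a'` and `'b'` as both row and column labels so the target of each call is visible; that setup is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]}, index=['a', 'b'])

rows = df.rename({'a': 'x'})            # positional mapper targets the index (rows)
cols = df.rename(columns={'a': 'x'})    # columns have to be requested explicitly

print(list(rows.index), list(rows.columns))
print(list(cols.index), list(cols.columns))
```
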

@TomAugspurger
Contributor

@shoyer thanks. Do you have any thoughts on adding a .relabel method that does everything .rename does, except setting Series.name? We would update the docs to all use .relabel, but not deprecate the current behavior.

@jreback jreback removed this from the 0.24.0 milestone Nov 6, 2018
@jreback jreback added this to the Contributions Welcome milestone Nov 6, 2018
@ghost ghost mentioned this issue Jul 22, 2019
4 tasks
@mroeschke mroeschke added Enhancement and removed API Design Indexing Related to indexing on series/frames, not to indexes themselves labels May 2, 2021
@jbrockmendel jbrockmendel added the rename .rename, .rename_axis label Oct 29, 2021
@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
8 participants