Add method to flatten all multi-index levels #19950

cbrnr · 2018-03-01T10:36:30Z

Sometimes it is useful to flatten all levels of a multi-index. For example, when pivoting data into a wide format, the new columns are generally multi-indexed. However, when exporting to CSV, sometimes it might be desirable to have only one header row.

AFAIK, there is no dedicated method to flatten an existing multi-index. Assuming I want to combine the multi-index levels with an underscore, I would do this to get one header row:

df.columns = ["_".join(v) for v in df.columns.values]

This is not really obvious, and maybe a dedicated MultiIndex method should be added, e.g. df.columns.flatten(sep="_", inplace=False). What do you think?

The text was updated successfully, but these errors were encountered:

jreback · 2018-03-01T11:04:20Z

having a .to_index(sep=None) would be useful, the default would be equivalent of .tolist(), IOW return tuples, while a sep could stringify and join. Note that this is a bit tricky because for example what do you do with a Timestamp?

cbrnr · 2018-03-01T11:13:58Z

I like your idea of a to_index method. Good point, specifying a sep character implies that individual entries are converted to strings. How do Timestamp objects behave when I try to str them? If this is not possible, the method could raise an error in this case (and only work for string or int index objects).

cbrnr · 2018-03-01T11:17:01Z

Actually, is this even possible in the use case I described? I mean, I was referring to a multi-index created by pivot_table - can a level consist of timestamps?

jreback · 2018-03-01T11:20:38Z

In [29]: index=pd.MultiIndex.from_product([list('abc'), pd.date_range('20180101', periods=3)])

In [30]: index
Out[30]: 
MultiIndex(levels=[['a', 'b', 'c'], [2018-01-01 00:00:00, 2018-01-02 00:00:00, 2018-01-03 00:00:00]],
           labels=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]])


In [32]: pd.Index(['_'.join([str(_) for _ in v]) for v in index.values])
Out[32]: 
Index(['a_2018-01-01 00:00:00', 'a_2018-01-02 00:00:00',
       'a_2018-01-03 00:00:00', 'b_2018-01-01 00:00:00',
       'b_2018-01-02 00:00:00', 'b_2018-01-03 00:00:00',
       'c_2018-01-01 00:00:00', 'c_2018-01-02 00:00:00',
       'c_2018-01-03 00:00:00'],
      dtype='object')

jreback · 2018-03-01T11:22:19Z

slightly OT: note that .wide_to_long basically does the inverse of this, meaning it splits columns names on regex.

cbrnr · 2018-03-01T11:25:05Z

Looks good! So I guess no problem then? I didn't know about .wide_to_long, maybe adding a .long_to_wide would make sense?

Just to motivate this idea: I love how unite and spread from the tidyr R package do exactly what I describe here.

joelostblom · 2018-07-10T01:27:26Z

FWIW There is a very popular question related to this on SO, so by that metric it seems like quite a few people could benefit from a solution similar to what is suggested here.

oguzhanogreden · 2019-12-30T16:16:39Z

The answer to the the popular question hints that it's okay to overwrite columns with the result of this method.

That quickly got me here, which doesn't work as I expected with version 0.25.1:

df = pd.DataFrame({"a": [1,1,2,2], "b": [1, 2, 3, 4], "c": [1, 2, 3, 4]})
df_agg = df.groupby("a").agg({"b": ["min", "max"], "c":"sum"})
df_agg[("b", "min")] # works
df_agg.loc[:, ("b", "min")] # works

df_agg.columns = df_agg.columns.to_flat_index()
df_agg[("b", "min")] # works
df_agg.loc[:, ("b", "min")] # won't work.

Do you consider this expected behavior? I'm not very familiar with selector implementations, but my user intuition tells me this should work.

jreback added API Design MultiIndex labels Mar 1, 2018

TomAugspurger mentioned this issue Sep 27, 2018

Deprecation of relabeling dicts in groupby.agg brings many issues #18366

Closed

WillAyd mentioned this issue Sep 27, 2018

Add to_flat_index method to MultiIndex #22866

Merged

4 tasks

TomAugspurger closed this as completed in #22866 Nov 13, 2018

h-vetinari mentioned this issue Feb 4, 2019

API: harmonize Index.to_flat_index to MultiIndex.to_flat_index #25094

Closed

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add method to flatten all multi-index levels #19950

Add method to flatten all multi-index levels #19950

cbrnr commented Mar 1, 2018

jreback commented Mar 1, 2018

cbrnr commented Mar 1, 2018

cbrnr commented Mar 1, 2018

jreback commented Mar 1, 2018

jreback commented Mar 1, 2018

cbrnr commented Mar 1, 2018

joelostblom commented Jul 10, 2018

oguzhanogreden commented Dec 30, 2019

Add method to flatten all multi-index levels #19950

Add method to flatten all multi-index levels #19950

Comments

cbrnr commented Mar 1, 2018

jreback commented Mar 1, 2018

cbrnr commented Mar 1, 2018

cbrnr commented Mar 1, 2018

jreback commented Mar 1, 2018

jreback commented Mar 1, 2018

cbrnr commented Mar 1, 2018

joelostblom commented Jul 10, 2018

oguzhanogreden commented Dec 30, 2019