Add to_flat_index method to MultiIndex #22866

WillAyd · 2018-09-27T22:39:57Z

closes Add method to flatten all multi-index levels #19950
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

Very simple implementation at the moment. The thought here is to introduce this method and perhaps subsequently extend to allow for string concatenation of the elements. Longer term there could also be a keyword added to .agg of GroupBy which will dispatch to this instead of simply returning a MultiIndex column, which could alleviate some of the pain users are experience when trying to rename columns after an aggregation.

@TomAugspurger and @jorisvandenbossche from the dev chat today

pep8speaks · 2018-09-27T22:40:00Z

Hello @WillAyd! Thanks for submitting the PR.

There are no PEP8 issues in the file pandas/core/indexes/multi.py !
There are no PEP8 issues in the file pandas/tests/indexes/multi/test_conversion.py !

codecov · 2018-09-28T20:12:35Z

Codecov Report

Merging #22866 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #22866      +/-   ##
==========================================
+ Coverage   92.24%   92.24%   +<.01%     
==========================================
  Files         161      161              
  Lines       51317    51318       +1     
==========================================
+ Hits        47338    47339       +1     
  Misses       3979     3979

Flag	Coverage Δ
#multiple	`90.63% <100%> (ø)`	⬆️
#single	`42.31% <50%> (-0.01%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/indexes/multi.py	`95.47% <100%> (ø)`	⬆️
pandas/core/indexes/base.py	`96.46% <100%> (ø)`	⬆️
pandas/core/series.py	`93.68% <0%> (-0.03%)`	⬇️
pandas/core/frame.py	`97.02% <0%> (-0.01%)`	⬇️
pandas/core/generic.py	`96.82% <0%> (ø)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 20bdb3e...bea4e85. Read the comment docs.

jreback · 2018-10-01T12:00:35Z

can you add to api.rst. I would add the sep=None arg here, maybe can defer the actual handling of the string concat to a later PR

WillAyd · 2018-10-01T19:19:19Z

Yep my goal is to defer the actual sep implementation, if only because I'm not sure how we'd want to handle say timestamp and categorical objects so probably needs further discussion.

For now I've stubbed it out and raise a NotImplementedError but if you want to go about in a different way lmk

jorisvandenbossche · 2018-10-03T08:51:37Z

I'm not sure how we'd want to handle say timestamp and categorical objects so probably needs further discussion.

I think if you specify sep, everything gets stringified? (just using the default conversion to string, without any ability to use a custom format, the user can always do that before they use this method)

About the name, do we want to call this to_index ? It might feel a bit strange, since you already have an index to start with (it's only a MultiIndex, but which is still an index).
So you would get something like df.index.to_index() or df.columns.to_index(), which seems a bit strange.

Alternative would be .flatten(..), but I don't know to what extent it is a problem that this clashes with the numpy method.
Or to_single, collapse_levels, ... (but I don't really like those :-))

TomAugspurger · 2018-10-03T11:17:18Z

Yesterday, I almost left the same comment as Joris about the name :) Something about MultiIndex.to_index bothered me, because it implies that MultiIndex isn't an Index. In the end, I deleted the comment because 1.) there's a nice symmetry with the other to_* methods 2.) I couldn't come up with a better name.

…

On Wed, Oct 3, 2018 at 3:52 AM Joris Van den Bossche < ***@***.***> wrote: I'm not sure how we'd want to handle say timestamp and categorical objects so probably needs further discussion. I think if you specify sep, everything gets stringified? (just using the default conversion to string, without any ability to use a custom format, the user can always do that before they use this method) ------------------------------ About the name, do we want to call this to_index ? It might feel a bit strange, since you already have an index to start with (it's only a MultiIndex, but which is still an index). So you would get something like df.index.to_index() or df.columns.to_index(), which seems a bit strange. Alternative would be .flatten(..), but I don't know to what extent it is a problem that this clashes with the numpy method. Or to_single, collapse_levels, ... (but I don't really like those :-)) — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#22866 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ABQHIliuvpWPdt0UAEJ0_B5-hfB37kaqks5uhHqvgaJpZM4W9ksx> .

jreback · 2018-10-03T13:24:25Z

maybe as_index(), though I agree I like the symmetry. btw is Index.to_index() tested? (iow identity)

WillAyd · 2018-10-03T14:36:36Z

Agreed with all comments so far - the name was initially off-putting to me but ultimately went with it given the consistency of it with the API, and I didn't want to obfuscate what flatten meant from a NumPy perspective.

I think if you specify sep, everything gets stringified?

Probably. I'm probably being ultra-conservative here but something about the string conversion and "round-tripability" of this is why I was planning on holding off until a separate PR to even go down that path. If it's a blocker here can certainly add the simple implementation.

Index.to_index() tested? (iow identity)

No but I also don't believe this is implemented atm. Are you thinking this implementation in this PR is better served in the base.py module?

WillAyd · 2018-10-23T04:51:23Z

Any other thoughts on this?

jreback · 2018-10-23T09:44:28Z

pandas/core/indexes/multi.py

@@ -1198,6 +1199,28 @@ def to_frame(self, index=True, name=None):
            result.index = self
        return result



how about to_flat ?

also i would add the identity method to Index itself

WillAyd · 2018-10-26T23:36:04Z

Added the identity method. I did not change the name from to_index to to_flat because I personally prefer the API consistency with to_series and to_frame though I'm not going to get terribly hung up on that if we feel to_flat is better.

I also kept the documentation geared only towards MultiIndex atm. Not sure if we want to document the identity and if it serves a purpose outside of not breaking chained ops (at least in its current state) but happy to update with whatever we feel necessary

shoyer · 2018-10-27T19:27:50Z

What about to_flat_index()? I think to_index() is pretty confusing for transforming a MultiIndex to a base pandas.Index (my expectation would be that this is the identity function), and it would be better to pick something slightly verbose rather than slightly confusing.

jorisvandenbossche · 2018-10-28T09:05:50Z

As said above, I am also -1 on to_index, this is just a confusing name. I agree with @shoyer that it's then better to have a verbose but at least readable / clear name.

Other names I was thinking about: .flatten_levels() or flatten_index_levels (this last one is maybe too verbose, although explicit). But to_flat_index() is fine for me.

Now you added it to the Index/MultiIndex. But if you want to do be able to directly do this is in a method chain (eg after a groupby operation), it would be needed on the DataFrame as well. Is this something we would also want to consider (but perfectly fine with start to add it at least to the index itself)

toobaz · 2018-11-07T15:07:16Z

Sorry for coming late. I'm also not convinced by to_index, and weakly in favor of to_flat (although flatten would not disturb me).

I'm also not convinced by sep. I think that:

the necessity that emerged lately (e.g. in the mailing list, when discussing MultiIndexes produced by GroupBy.apply) was for instance to transform a level of operation names ('mean', 'min', 'max') and one of variables ('a', 'b', 'c') in an index of strings such as 'mean of a', 'mean of b' and so on. Now, sep=' of ' does (once implemented) allow this, but just because it happens that there are two levels, and in the correct order, not because it makes much sense to think to the labels for each level as a flat ordered list
I think giving less freedom today in terms of aggregation methods just calls for adding more arguments later

My idea would be to pass a callable (e.g. fmt) accepting an iterable and returning anything (not necessarily a string, by the way). The sep solution would just become, on a MultiIndex with 2 levels, fmt='{} of {}'.format.

In any case, if we want to discuss this and not block this PR, I would just remove the sep argument for the moment.

EDIT: one objection I can see to fmt is that it would be doing precisely what can be done as mi.to_flat().apply(fmt). But this is an objection to sep too I think. And probably a weak objection in any case.

jorisvandenbossche · 2018-11-08T13:36:51Z

My idea would be to pass a callable (e.g. fmt) accepting an iterable and returning anything

The idea is that this would be a keyword of this (to be renamed) to_index method, or a method on its own?

In any case, I like this idea, it gives indeed more flexibility than a sep.

We had some discussion about this before (I think in light of deprecating add_prefix/add_suffix) to have an easier way to format values in an Index/Series (related issues: #18347, #17211)

jorisvandenbossche · 2018-11-08T13:39:26Z

This is kind of

In [142]: idx = pd.MultiIndex.from_product([['mean', 'sum'], ['a', 'b']])

In [143]: idx.map(lambda x: "{0} of {1}".format(*x))
Out[143]: Index(['mean of a', 'mean of b', 'sum of a', 'sum of b'], dtype='object')

toobaz · 2018-11-08T14:26:22Z

The idea is that this would be a keyword of this (to be renamed) to_index method, or a method on its own?

A keyword of this.

This is kind of

Right, I had missed this. Basically idx.map(lambda x: "{0} of {1}".format(*x)) would just become idx.to_flat("{0} of {1}".format). So maybe this would be a bit redundant. Still, sep seems too restrictive to me, and this added sugar candy would have a very low implementation cost.

pandas/core/indexes/base.py

jreback · 2018-11-11T16:35:13Z

after reading #23141 maybe we should call this .squeeze()?

TomAugspurger · 2018-11-11T17:24:02Z

This is a bit different from the squeeze semantics.

squeeze will reduce dimensionality if possible (only one item on that axis).

.flatten / .to_index / whatever will always reduce the dimensionality.

jreback · 2018-11-11T17:37:03Z

from #22866 (comment)

ahh right, ok then.

toobaz · 2018-11-11T20:14:56Z

This is a bit different from the squeeze semantics.

squeeze will reduce dimensionality if possible (only one item on that axis).

.flatten / .to_index / whatever will always reduce the dimensionality.

Just for pickiness: the two differ in fact also for the "only one item on that axis" case, as the former will return the labels as items, while the latter will return length one tuples as items.

WillAyd · 2018-11-13T05:48:16Z

Renamed to to_flat_index and removed sep parameter based off feedback above

toobaz · 2018-11-13T07:58:17Z

Seems good to me

TomAugspurger

Shall we merge?

TomAugspurger · 2018-11-13T13:28:10Z

pandas/core/indexes/base.py

+               ('bar', 'baz'), ('bar', 'qux')],
+              dtype='object')
+        """
+        if not isinstance(self, ABCMultiIndex):


Idle style question (no need to change here): what are people's preferences on

Defining a method for both Index and MultiIndex here, but using an isnstance to figure out which one we're working with

Defining a default implementation here that's return self, and overriding in MultiIndex, so no if isintance?

Preference for 2, and in general, for leaving the base Index class unaware (every time it is possible/reasonable) of the existence and specificities of Index subclasses.

WillAyd · 2018-11-13T15:35:43Z

The inheritance model does seem to make more sense here so I've updated the latest commit to reflect that

TomAugspurger · 2018-11-13T16:57:13Z

Thanks @WillAyd!

toobaz · 2018-11-13T16:57:43Z

pandas/core/indexes/base.py

+
+        .. versionadded:: 0.24.0
+
+        This is implemented for compatability with subclass implementations


compatability -> compatibility

By the way: reusing the same docstring would be a + for me. I would rather describe the actual operation (so on MuiltiIndex) and then say that it is idempotent on flat Indexes.

* upstream/master: (25 commits) DOC: Delete trailing blank lines in docstrings. (pandas-dev#23651) DOC: Change release and whatsnew (pandas-dev#21599) DOC: Fix format of the See Also descriptions (pandas-dev#23654) DOC: update pandas.core.groupby.DataFrameGroupBy.resample docstring. (pandas-dev#20374) ENH: Allow export of mixed columns to Stata strl (pandas-dev#23692) CLN: Remove unnecessary code (pandas-dev#23696) Pin flake8-rst version (pandas-dev#23699) Implement _most_ of the EA interface for DTA/TDA (pandas-dev#23643) CI: raise clone depth limit on CI BUG: Fix Series/DataFrame.rank(pct=True) with more than 2**24 rows (pandas-dev#23688) REF: Move Excel names parameter handling to CSV (pandas-dev#23690) DOC: Accessing files from a S3 bucket. (pandas-dev#23639) Fix errorbar visualization (pandas-dev#23674) DOC: Surface / doc mangle_dupe_cols in read_excel (pandas-dev#23678) DOC: Update is_sparse docstring (pandas-dev#19983) BUG: Fix read_excel w/parse_cols & empty dataset (pandas-dev#23661) Add to_flat_index method to MultiIndex (pandas-dev#22866) CLN: Move to_excel to generic.py (pandas-dev#23656) TST: IntervalTree.get_loc_interval should return platform int (pandas-dev#23660) CI: Allow to compile docs with ipython 7.11 pandas-dev#22990 (pandas-dev#23655) ...

…fixed * upstream/master: DOC: Delete trailing blank lines in docstrings. (pandas-dev#23651) DOC: Change release and whatsnew (pandas-dev#21599) DOC: Fix format of the See Also descriptions (pandas-dev#23654) DOC: update pandas.core.groupby.DataFrameGroupBy.resample docstring. (pandas-dev#20374) ENH: Allow export of mixed columns to Stata strl (pandas-dev#23692) CLN: Remove unnecessary code (pandas-dev#23696) Pin flake8-rst version (pandas-dev#23699) Implement _most_ of the EA interface for DTA/TDA (pandas-dev#23643) CI: raise clone depth limit on CI BUG: Fix Series/DataFrame.rank(pct=True) with more than 2**24 rows (pandas-dev#23688) REF: Move Excel names parameter handling to CSV (pandas-dev#23690) DOC: Accessing files from a S3 bucket. (pandas-dev#23639) Fix errorbar visualization (pandas-dev#23674) DOC: Surface / doc mangle_dupe_cols in read_excel (pandas-dev#23678) DOC: Update is_sparse docstring (pandas-dev#19983) BUG: Fix read_excel w/parse_cols & empty dataset (pandas-dev#23661) Add to_flat_index method to MultiIndex (pandas-dev#22866) CLN: Move to_excel to generic.py (pandas-dev#23656) TST: IntervalTree.get_loc_interval should return platform int (pandas-dev#23660)

WillAyd added 2 commits September 27, 2018 15:21

Added initial test

ccaea63

Method implementation

c9b2629

WillAyd added API Design MultiIndex labels Sep 27, 2018

Merge remote-tracking branch 'upstream/master' into mi-to-index

36a74ce

WillAyd added 3 commits October 1, 2018 11:59

Merge remote-tracking branch 'upstream/master' into HEAD

4774368

Stubbed out sep parameter

b7150bb

Documentation updates

d96f293

Merge remote-tracking branch 'upstream/master' into mi-to-index

7e651f7

jreback reviewed Oct 23, 2018

View reviewed changes

WillAyd added 3 commits October 26, 2018 16:19

Merge remote-tracking branch 'upstream/master' into mi-to-index

b9c993c

Moved to_index to base

9e5a305

Removed unused import

bc60292

WillAyd added 2 commits October 27, 2018 11:57

Returned identity instead of copy for non MI

380595f

Fixed up docstring

be06e7d

toobaz reviewed Nov 8, 2018

View reviewed changes

pandas/core/indexes/base.py Outdated Show resolved Hide resolved

jreback mentioned this pull request Nov 11, 2018

ENH: MultiIndex.from_frame #23141

Merged

4 tasks

WillAyd added 3 commits November 12, 2018 21:36

Merge remote-tracking branch 'upstream/master' into mi-to-index

4376bf6

Changed method name; removed sep parameter

1ac6200

Fixed broken test

f7a2342

TomAugspurger approved these changes Nov 13, 2018

View reviewed changes

Moved to inheritance model

bea4e85

WillAyd changed the title ~~Add to_index method to MultiIndex~~ Add to_flat_index method to MultiIndex Nov 13, 2018

TomAugspurger merged commit 454ecfc into pandas-dev:master Nov 13, 2018

toobaz reviewed Nov 13, 2018

View reviewed changes

WillAyd deleted the mi-to-index branch November 13, 2018 16:58

toobaz mentioned this pull request Nov 13, 2018

Rethinking to_flat_index on flat Index #23670

Open

JustinZhengBC pushed a commit to JustinZhengBC/pandas that referenced this pull request Nov 14, 2018

Add to_flat_index method to MultiIndex (pandas-dev#22866)

c1640c6

tm9k1 pushed a commit to tm9k1/pandas that referenced this pull request Nov 19, 2018

Add to_flat_index method to MultiIndex (pandas-dev#22866)

7643073

toobaz mentioned this pull request Feb 2, 2019

API: harmonize Index.to_flat_index to MultiIndex.to_flat_index #25094

Closed

4 tasks

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

Add to_flat_index method to MultiIndex (pandas-dev#22866)

ad56f0f

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

Add to_flat_index method to MultiIndex (pandas-dev#22866)

d1c5b7f

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add to_flat_index method to MultiIndex #22866

Add to_flat_index method to MultiIndex #22866

WillAyd commented Sep 27, 2018 •

edited

Loading

pep8speaks commented Sep 27, 2018

codecov bot commented Sep 28, 2018 •

edited

Loading

jreback commented Oct 1, 2018

WillAyd commented Oct 1, 2018

jorisvandenbossche commented Oct 3, 2018

TomAugspurger commented Oct 3, 2018 via email

jreback commented Oct 3, 2018 •

edited

Loading

WillAyd commented Oct 3, 2018

WillAyd commented Oct 23, 2018

jreback Oct 23, 2018

jreback Oct 23, 2018

WillAyd commented Oct 26, 2018

shoyer commented Oct 27, 2018

jorisvandenbossche commented Oct 28, 2018

toobaz commented Nov 7, 2018 •

edited

Loading

jorisvandenbossche commented Nov 8, 2018

jorisvandenbossche commented Nov 8, 2018

toobaz commented Nov 8, 2018

jreback commented Nov 11, 2018

TomAugspurger commented Nov 11, 2018

jreback commented Nov 11, 2018

toobaz commented Nov 11, 2018

WillAyd commented Nov 13, 2018

toobaz commented Nov 13, 2018

TomAugspurger left a comment

TomAugspurger Nov 13, 2018

toobaz Nov 13, 2018

WillAyd commented Nov 13, 2018

TomAugspurger commented Nov 13, 2018

toobaz Nov 13, 2018

		@@ -1198,6 +1199,28 @@ def to_frame(self, index=True, name=None):
		result.index = self
		return result


		.. versionadded:: 0.24.0

		This is implemented for compatability with subclass implementations

Add to_flat_index method to MultiIndex #22866

Add to_flat_index method to MultiIndex #22866

Conversation

WillAyd commented Sep 27, 2018 • edited Loading

pep8speaks commented Sep 27, 2018

codecov bot commented Sep 28, 2018 • edited Loading

Codecov Report

jreback commented Oct 1, 2018

WillAyd commented Oct 1, 2018

jorisvandenbossche commented Oct 3, 2018

TomAugspurger commented Oct 3, 2018 via email

jreback commented Oct 3, 2018 • edited Loading

WillAyd commented Oct 3, 2018

WillAyd commented Oct 23, 2018

jreback Oct 23, 2018

Choose a reason for hiding this comment

jreback Oct 23, 2018

Choose a reason for hiding this comment

WillAyd commented Oct 26, 2018

shoyer commented Oct 27, 2018

jorisvandenbossche commented Oct 28, 2018

toobaz commented Nov 7, 2018 • edited Loading

jorisvandenbossche commented Nov 8, 2018

jorisvandenbossche commented Nov 8, 2018

toobaz commented Nov 8, 2018

jreback commented Nov 11, 2018

TomAugspurger commented Nov 11, 2018

jreback commented Nov 11, 2018

toobaz commented Nov 11, 2018

WillAyd commented Nov 13, 2018

toobaz commented Nov 13, 2018

TomAugspurger left a comment

Choose a reason for hiding this comment

TomAugspurger Nov 13, 2018

Choose a reason for hiding this comment

toobaz Nov 13, 2018

Choose a reason for hiding this comment

WillAyd commented Nov 13, 2018

TomAugspurger commented Nov 13, 2018

toobaz Nov 13, 2018

Choose a reason for hiding this comment

WillAyd commented Sep 27, 2018 •

edited

Loading

codecov bot commented Sep 28, 2018 •

edited

Loading

jreback commented Oct 3, 2018 •

edited

Loading

toobaz commented Nov 7, 2018 •

edited

Loading