Add to_flat_index method to MultiIndex #22866
Hello @WillAyd! Thanks for submitting the PR.
|
Codecov Report
@@ Coverage Diff @@
## master #22866 +/- ##
==========================================
+ Coverage 92.24% 92.24% +<.01%
==========================================
Files 161 161
Lines 51317 51318 +1
==========================================
+ Hits 47338 47339 +1
Misses 3979 3979
Continue to review full report at Codecov.
|
can you add to api.rst. I would add the |
Yep my goal is to defer the actual sep implementation, if only because I'm not sure how we'd want to handle say timestamp and categorical objects so probably needs further discussion. For now I've stubbed it out and raise a |
I think if you specify About the name, do we want to call this Alternative would be |
Yesterday, I almost left the same comment as Joris about the name :)
Something about MultiIndex.to_index bothered
me, because it implies that MultiIndex isn't an Index.
In the end, I deleted the comment because
1.) there's a nice symmetry with the other to_* methods
2.) I couldn't come up with a better name.
…On Wed, Oct 3, 2018 at 3:52 AM Joris Van den Bossche < ***@***.***> wrote:
I'm not sure how we'd want to handle say timestamp and categorical objects
so probably needs further discussion.
I think if you specify sep, everything gets stringified? (just using the
default conversion to string, without any ability to use a custom format,
the user can always do that before they use this method)
------------------------------
About the name, do we want to call this to_index ? It might feel a bit
strange, since you already have an index to start with (it's only a
MultiIndex, but which is still an index).
So you would get something like df.index.to_index() or
df.columns.to_index(), which seems a bit strange.
Alternative would be .flatten(..), but I don't know to what extent it is
a problem that this clashes with the numpy method.
Or to_single, collapse_levels, ... (but I don't really like those :-))
|
maybe |
Agreed with all comments so far - the name was initially off-putting to me but ultimately went with it given the consistency of it with the API, and I didn't want to obfuscate what
Probably. I'm probably being ultra-conservative here but something about the string conversion and "round-tripability" of this is why I was planning on holding off until a separate PR to even go down that path. If it's a blocker here can certainly add the simple implementation.
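To make the "round-tripability" concern concrete, here is a minimal sketch of what a separator-based flattening could look like. The helper name `join_levels` and its `sep` argument are hypothetical (this PR only stubs `sep` out); the sketch assumes every label is converted with the default `str`.

```python
import pandas as pd


def join_levels(index, sep):
    """Hypothetical helper: stringify each label and join the levels with sep."""
    return pd.Index([sep.join(map(str, tup)) for tup in index.to_flat_index()])


# Two distinct tuples whose labels happen to contain the separator.
mi = pd.MultiIndex.from_tuples([("a_b", "c"), ("a", "b_c")])
joined = join_levels(mi, "_")

print(list(joined))      # both entries collapse to the same string
print(joined.is_unique)  # the original tuples can no longer be recovered
```

Once the labels are joined, there is no reliable way to split them back into the original levels, which is one reason to defer the string conversion to a separate discussion.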
No but I also don't believe this is implemented atm. Are you thinking this implementation in this PR is better served in the |
Any other thoughts on this? |
pandas/core/indexes/multi.py
Outdated
@@ -1198,6 +1199,28 @@ def to_frame(self, index=True, name=None):
result.index = self
return result
how about to_flat ?
also i would add the identity method to Index itself
Added the identity method. I did not change the name from I also kept the documentation geared only towards MultiIndex atm. Not sure if we want to document the identity and if it serves a purpose outside of not breaking chained ops (at least in its current state) but happy to update with whatever we feel necessary |
What about |
As said above, I am also -1 on Other names I was thinking about: Now you added it to the Index/MultiIndex. But if you want to be able to directly do this in a method chain (eg after a groupby operation), it would be needed on the DataFrame as well. Is this something we would also want to consider (but perfectly fine to start by adding it at least to the index itself) |
Sorry for coming late. I'm also not convinced by I'm also not convinced by
My idea would be to pass a callable (e.g. In any case, if we want to discuss this and not block this PR, I would just remove the EDIT: one objection I can see to |
The idea is that this would be a keyword of this (to be renamed) In any case, I like this idea, it gives indeed more flexibility than a We had some discussion about this before (I think in light of deprecating |
This is kind of
|
A keyword of this.
Right, I had missed this. Basically |
after reading #23141 maybe we should call this |
This is a bit different from the squeeze semantics. squeeze will reduce dimensionality if possible (only one item on that axis).
|
from #22866 (comment) ahh right, ok then. |
Just for pickiness: the two differ in fact also for the "only one item on that axis" case, as the former will return the labels as items, while the latter will return length one tuples as items. |
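The single-level case can be illustrated with a short sketch. `get_level_values(0)` is used here only as a stand-in for squeeze-like behavior (an assumption for illustration, not a method proposed in this thread); `to_flat_index` is the method as it was eventually named in this PR.

```python
import pandas as pd

# A MultiIndex with only one level.
mi = pd.MultiIndex.from_arrays([["a", "b"]], names=["letter"])

# Squeeze-like result: the plain labels of the only level.
labels = mi.get_level_values(0)

# Flattened result: each item stays a tuple, here of length one.
flat = mi.to_flat_index()

print(list(labels))
print(list(flat))
```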
Renamed to |
Seems good to me |
Shall we merge?
pandas/core/indexes/base.py
Outdated
('bar', 'baz'), ('bar', 'qux')],
dtype='object')
"""
if not isinstance(self, ABCMultiIndex):
Idle style question (no need to change here): what are people's preferences on
- Defining a method for both Index and MultiIndex here, but using an isinstance check to figure out which one we're working with
- Defining a default implementation here that's return self, and overriding in MultiIndex, so no if isinstance?
Preference for 2, and in general, for leaving the base Index class unaware (every time it is possible/reasonable) of the existence and specificities of Index subclasses.
The inheritance model does seem to make more sense here so I've updated the latest commit to reflect that |
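The override-based option can be sketched with toy classes. `ToyIndex` and `ToyMultiIndex` are hypothetical stand-ins written for illustration, not the pandas implementation; the point is only that the base class needs no `isinstance` check.

```python
class ToyIndex:
    """Toy stand-in for a flat index (hypothetical, not pandas code)."""

    def __init__(self, values):
        self.values = list(values)

    def to_flat_index(self):
        # Default implementation: a flat index is already flat.
        return self


class ToyMultiIndex(ToyIndex):
    """Toy stand-in for a multi-level index built from parallel arrays."""

    def __init__(self, arrays):
        # Store one tuple per row, combining the labels of every level.
        super().__init__(zip(*arrays))

    def to_flat_index(self):
        # Override: return a flat index whose items are the tuples.
        return ToyIndex(self.values)


flat = ToyIndex([1, 2, 3])
multi = ToyMultiIndex([["foo", "bar"], ["baz", "qux"]])
flattened = multi.to_flat_index()

print(flat.to_flat_index() is flat)
print(type(flattened).__name__, flattened.values)
```

With this layout the base class never has to know which subclasses exist, which is the property argued for in the preference above.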
Thanks @WillAyd! |
.. versionadded:: 0.24.0

This is implemented for compatability with subclass implementations
compatability -> compatibility
By the way: reusing the same docstring would be a + for me. I would rather describe the actual operation (so on MultiIndex) and then say that it is idempotent on flat Indexes.
* upstream/master: (25 commits) DOC: Delete trailing blank lines in docstrings. (pandas-dev#23651) DOC: Change release and whatsnew (pandas-dev#21599) DOC: Fix format of the See Also descriptions (pandas-dev#23654) DOC: update pandas.core.groupby.DataFrameGroupBy.resample docstring. (pandas-dev#20374) ENH: Allow export of mixed columns to Stata strl (pandas-dev#23692) CLN: Remove unnecessary code (pandas-dev#23696) Pin flake8-rst version (pandas-dev#23699) Implement _most_ of the EA interface for DTA/TDA (pandas-dev#23643) CI: raise clone depth limit on CI BUG: Fix Series/DataFrame.rank(pct=True) with more than 2**24 rows (pandas-dev#23688) REF: Move Excel names parameter handling to CSV (pandas-dev#23690) DOC: Accessing files from a S3 bucket. (pandas-dev#23639) Fix errorbar visualization (pandas-dev#23674) DOC: Surface / doc mangle_dupe_cols in read_excel (pandas-dev#23678) DOC: Update is_sparse docstring (pandas-dev#19983) BUG: Fix read_excel w/parse_cols & empty dataset (pandas-dev#23661) Add to_flat_index method to MultiIndex (pandas-dev#22866) CLN: Move to_excel to generic.py (pandas-dev#23656) TST: IntervalTree.get_loc_interval should return platform int (pandas-dev#23660) CI: Allow to compile docs with ipython 7.11 pandas-dev#22990 (pandas-dev#23655) ...
git diff upstream/master -u -- "*.py" | flake8 --diff
Very simple implementation at the moment. The thought here is to introduce this method and perhaps subsequently extend it to allow for string concatenation of the elements. Longer term there could also be a keyword added to .agg of GroupBy which will dispatch to this instead of simply returning a MultiIndex column, which could alleviate some of the pain users are experiencing when trying to rename columns after an aggregation.
@TomAugspurger and @jorisvandenbossche from the dev chat today
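As a rough illustration of the column-renaming use case, the sketch below flattens the MultiIndex columns produced by an aggregation and joins each tuple by hand. The frame contents and the `"_"` separator are assumptions chosen for the example; only `to_flat_index` itself comes from this PR.

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "b"], "val": [1, 2, 3]})

# Aggregating with a list of functions yields MultiIndex columns.
aggregated = df.groupby("key").agg(["min", "max"])

# Flatten to an Index of tuples, then build single-level names manually.
flat = aggregated.columns.to_flat_index()
renamed = aggregated.copy()
renamed.columns = ["_".join(col) for col in flat]

print(list(flat))
print(list(renamed.columns))
```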