BUG: inconsistent naming when combining indices of various types #35847

iamlemec · 2020-08-22T05:52:00Z

This builds on some issues seen in #13475 and the subsequent PR. Essentially, the resulting index name when combining existing named indices with union_indexes is not consistent across index types. This primarily affects concat and the DataFrame constructor, both of which call union_indexes.

When combining named indices, there are three main name resolution rules I can think of:

ignore: assign no name to output
unanimous: assign name if all names agree
consensus: assign name only if one unique non-null name
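To make the three rules concrete, here is a minimal pure-Python sketch that applies each one to a plain list of names. The function names (resolve_ignore, resolve_unanimous, resolve_consensus) are made up for illustration and are not part of pandas.

```python
from typing import Hashable, Optional, Sequence


def resolve_ignore(names: Sequence[Optional[Hashable]]) -> Optional[Hashable]:
    """ignore: never assign a name to the output."""
    return None


def resolve_unanimous(names: Sequence[Optional[Hashable]]) -> Optional[Hashable]:
    """unanimous: assign a name only if every input carries the same name."""
    if not names:
        return None
    first = names[0]
    return first if all(n == first for n in names) else None


def resolve_consensus(names: Sequence[Optional[Hashable]]) -> Optional[Hashable]:
    """consensus: assign a name only if exactly one distinct non-null name exists."""
    distinct = {n for n in names if n is not None}
    return distinct.pop() if len(distinct) == 1 else None


# The rules only disagree when some inputs are unnamed:
mixed = ["idx", None]
print(resolve_ignore(mixed), resolve_unanimous(mixed), resolve_consensus(mixed))
```

With all names equal, unanimous and consensus agree; with two different non-null names, both give None. The case that separates them is a named index combined with an unnamed one.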

With some testing, below is my best understanding of what resolution rule the various index types use. Note that the behavior may differ depending on whether the indices are numerically equal or not, as with RangeIndex. For the not numerically equal case:

Index: consensus
RangeIndex: ignore
Int64Index: unanimous
Float64Index: unanimous
DatetimeIndex: consensus
TimedeltaIndex: consensus
PeriodIndex: unanimous
CategoricalIndex: unanimous
MultiIndex: unanimous (over all levels)

I'm not really taking a stand on the correct name resolution rule, but I think they should at least be consistent across index types! And of course MultiIndex is a bit more complicated. It seems possible that this could be implemented in common for non-multi-indices in the higher-level union function? I'm not really sure, but I'm happy to put some work into it.

The test is slightly different for each index type, but for the RangeIndex case, here's an MWE:

import pandas as pd

# Two overlapping RangeIndex objects that share the same name
idx1 = pd.RangeIndex(0, 5, name='idx')
idx2 = pd.RangeIndex(2, 7, name='idx')
pd.core.indexes.api.union_indexes([idx1, idx2])

and the result will have no name (checked on master).


jreback · 2020-08-23T00:30:46Z

all index combinations should have the consensus name or None (if different)

am pretty sure we test this

but it's possible it's not fully tested - happy to have a PR to do this and isolate any edge cases

iamlemec · 2020-08-23T07:14:26Z

Ok, sounds good. Just looking at the RangeIndex case, there doesn't appear to be testing of union or naming, so I can go through and add those in as needed.

As for the implementation, would it be okay to add common naming logic in Index.union rather than reimplement it in _union for each subclass?

The other thing is that these same issues arise with other operations such as intersection. These tend to use get_op_result_name for name resolution. However, that appears to use the unanimous rule. It sort of tries to be consensus, but since it checks hasattr(obj, "name") instead of obj.name is not None, it doesn't quite get there.
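The difference between the two checks can be sketched without pandas. The snippet below is a simplified illustration of the pattern described above, not the actual get_op_result_name source; the Named class and both function names are hypothetical.

```python
class Named:
    """Stand-in for an index-like object: always has .name, possibly None."""

    def __init__(self, name=None):
        self.name = name


def match_name_hasattr(a, b):
    # The attribute exists even when the name is None, so an unnamed
    # operand still takes part in the equality test (unanimous behaviour).
    a_has = hasattr(a, "name")
    b_has = hasattr(b, "name")
    if a_has and b_has:
        return a.name if a.name == b.name else None
    if a_has:
        return a.name
    if b_has:
        return b.name
    return None


def match_name_not_none(a, b):
    # Treat a None name the same as a missing one (consensus behaviour).
    a_name = getattr(a, "name", None)
    b_name = getattr(b, "name", None)
    if a_name is not None and b_name is not None:
        return a_name if a_name == b_name else None
    return a_name if a_name is not None else b_name


named, unnamed = Named("idx"), Named(None)
print(match_name_hasattr(named, unnamed))   # name is dropped
print(match_name_not_none(named, unnamed))  # name is kept
```

The hasattr variant only falls back to the single available name when one operand has no name attribute at all (a plain list, say), which is never the case for two index objects.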

iamlemec · 2020-08-26T01:19:45Z

Ok, seems like these changes are feasible. Working through getting testing 100% now. Just to double check, we're saying that all of union, intersection, difference, and symmetric_difference should use consensus naming? It seems that indexes/test_base.py::TestIndex::test_intersection_name_preservation currently expects naming to be unanimous for intersection, so I just want to make sure.

iamlemec added the Bug and Needs Triage labels Aug 22, 2020

dsaxton added the API - Consistency and Needs Discussion labels and removed the Bug and Needs Triage labels Aug 22, 2020

dsaxton mentioned this issue Aug 24, 2020

groupby AssertionError with datetime column name #35876

Closed

3 tasks

iamlemec mentioned this issue Sep 17, 2020

fix inconsistent index naming with union/intersect #35847 #36413

Merged

5 tasks

jreback added this to the 1.2 milestone Sep 19, 2020

jreback closed this as completed in #36413 Oct 7, 2020

DanielFEvans mentioned this issue Oct 28, 2020

BUG: pd.concat(..., axis="columns") inconsistently keeps/drops index name #37464

Closed

3 tasks
