0.23.1 concat drops the name of the merge axis when not aligned #21629

pdemarti · 2018-06-25T21:54:28Z

Code Sample, a copy-pastable example if possible

When the columns are aligned, there is no problem: the columns in the result have the correct name ('ID' here).

import pandas as pd

pd.concat([
    pd.DataFrame([[0, 1]], index=['r0'], columns=pd.Index(['a', 'b'], name='ID')),
    pd.DataFrame([[2, 3]], index=['r1'], columns=pd.Index(['a', 'b'], name='ID')),
], sort=True)

# out:
# ID  a  b
# r0  0  1
# r1  2  3

However, when the columns are not aligned, then the name seems to disappear:

pd.concat([
    pd.DataFrame([[0, 1]], index=['r0'], columns=pd.Index(['a', 'b'], name='ID')),
    pd.DataFrame([[2, 3]], index=['r1'], columns=pd.Index(['a', 'c'], name='ID')),
], sort=True)

# out:
#     a    b    c      <-- notice how the columns have lost their name ('ID').
# r0  0  1.0  NaN
# r1  2  NaN  3.0

Problem description

When concatenating DataFrames, I expect the non-concatenating axis (the columns axis, in the examples above) to keep its name(s).

An interesting question arises if the instances of the non-concatenating axis are not only misaligned, but also have different names. In that case, we could use the majority value or drop the name altogether (None). In our code, we use names = collections.Counter([df.axes[nc_axis].names for df in objs]).most_common(1)[0][0].
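The majority-name rule described above can be sketched as follows. This is only an illustration, not pandas behavior: the helper name `majority_axis_names` and the third frame (with the column name 'other') are assumptions introduced for the example, and the names are converted to a tuple so that they are hashable regardless of the pandas version.

```python
import collections

import pandas as pd


def majority_axis_names(objs, nc_axis=1):
    """Hypothetical helper: most common names of the non-concatenating axis."""
    # tuple(...) makes each set of axis names hashable for the Counter.
    counts = collections.Counter(tuple(obj.axes[nc_axis].names) for obj in objs)
    # On a tie, Counter.most_common returns the names that were seen first.
    return counts.most_common(1)[0][0]


frames = [
    pd.DataFrame([[0, 1]], index=['r0'], columns=pd.Index(['a', 'b'], name='ID')),
    pd.DataFrame([[2, 3]], index=['r1'], columns=pd.Index(['a', 'c'], name='ID')),
    # Assumed extra frame whose columns carry a different name.
    pd.DataFrame([[4, 5]], index=['r2'], columns=pd.Index(['a', 'd'], name='other')),
]

names = majority_axis_names(frames)
result = pd.concat(frames, sort=True)
# Re-apply the majority name(s) to the non-concatenating axis of the result.
result.columns.names = list(names)
```

Here two of the three frames name their columns 'ID', so that name is applied to the result whatever concat itself chose to do.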

Expected Output

pd.concat([
    pd.DataFrame([[0, 1]], index=['r0'], columns=pd.Index(['a', 'b'], name='ID')),
    pd.DataFrame([[2, 3]], index=['r1'], columns=pd.Index(['a', 'c'], name='ID')),
], sort=True)

# out:
# ID  a    b    c      <-- name 'ID' should be retained, since there is no ambiguity
# r0  0  1.0  NaN
# r1  2  NaN  3.0

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-77-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.1
pytest: 3.3.2
pip: 9.0.1
setuptools: 38.4.0
Cython: 0.27.3
numpy: 1.14.5
scipy: 1.0.0
pyarrow: 0.8.0
xarray: None
IPython: 6.1.0
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.3
numexpr: 2.6.4
feather: None
matplotlib: 2.1.0
openpyxl: 2.4.8
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.0
bs4: 4.6.0
html5lib: 0.999999999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: 2.7.3.2 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: 0.1.3
pandas_gbq: None
pandas_datareader: None

WillAyd · 2018-06-26T00:21:46Z

I think there's a typo in your first example. That said, what is the use case for this? It seems kind of counter-intuitive to me to have two different Index objects with the same name. You could just as easily assign that name after the concat.

jorisvandenbossche · 2018-06-26T07:51:34Z

Fixed the typo.

> what is the use case for this? Seems kind of counter-intuitive to me to have two different Index objects with the same name.

The use case can be to keep the name when the name is identical. There can be many reasons why the DataFrames you want to concat ended up misaligned.

However, I don't know to what extent we have prior art in pandas with regard to keeping the name or not if the indexes are not identical.

At least union seems to keep it:

In [24]: pd.Index(['a', 'b'], name='ID').union(pd.Index(['a', 'c'], name='ID'))
Out[24]: Index(['a', 'b', 'c'], dtype='object', name='ID')

which seems to me an indication that concat could also keep the name?

pdemarti · 2018-06-26T15:07:57Z

Thanks for fixing the typo.

There are many use cases. For us, the most prevalent one is when we deal with large multivariate time-series. We split them by time (the Index) for easier storage and update (typically the last few slices are most frequently updated). The columns are in the thousands, and typically their intersection is at least 99% of their union. When we concatenate these frames, we would like the name of the axis to remain.

That said, I was under the false impression that the behavior had changed in 0.23, but that is not the case (I checked many versions from 0.15.0 to 0.22.0). I thought so because, in previous versions, our code was different: it built the index union itself, reindexed all frames, and only then called concat (this was faster). As @jorisvandenbossche pointed out, index.union() keeps the index name.
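For readers who want to reproduce that older approach, here is a minimal sketch of "union, reindex, then concat". It is an assumption about what such code might look like, not the code actually used; the variable names are illustrative. It relies on Index.union keeping the name when both sides share it, as shown earlier in the thread.

```python
import functools

import pandas as pd

frames = [
    pd.DataFrame([[0, 1]], index=['r0'], columns=pd.Index(['a', 'b'], name='ID')),
    pd.DataFrame([[2, 3]], index=['r1'], columns=pd.Index(['a', 'c'], name='ID')),
]

# Build the union of all column indexes; the shared name 'ID' is kept.
columns = functools.reduce(
    lambda left, right: left.union(right),
    (df.columns for df in frames),
)

# Align every frame on the full set of columns; missing cells become NaN.
aligned = [df.reindex(columns=columns) for df in frames]

# The columns are now identical across frames, so concat keeps their name.
result = pd.concat(aligned)
```

Because every frame is reindexed to the same named Index before the concat, the result falls into the "aligned" case from the original report.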

We had to change that part in response to the way concat in 0.23 now handles a misaligned non-concatenating index. I still believe concat should retain the name of the index, either by taking the first one (as index0.union(index1).union(index2)... does) or by taking the majority name (or the single name if they are all the same, and None otherwise).

FANGOD · 2018-08-25T08:08:36Z

As a workaround, you can copy the index name of the first df onto the result after the concat. This is feasible whether the dfs differ in their index values or in their index names. Of course, it doesn't solve the problem fundamentally.
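A minimal sketch of that workaround, applied to the columns axis from the original report (the variable names are illustrative):

```python
import pandas as pd

frames = [
    pd.DataFrame([[0, 1]], index=['r0'], columns=pd.Index(['a', 'b'], name='ID')),
    pd.DataFrame([[2, 3]], index=['r1'], columns=pd.Index(['a', 'c'], name='ID')),
]

result = pd.concat(frames, sort=True)

# Copy the axis name(s) of the first frame onto the result.
# Using .names (plural) also covers MultiIndex columns.
result.columns.names = frames[0].columns.names
```

This always takes the name from the first frame, so it silently ignores any different names carried by the later frames.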

dsm054 · 2018-11-13T21:20:06Z

Is this the same as #13475? I was working on a PR for that one and it seems to handle this case as well.

0anton · 2019-06-15T10:21:13Z

Same here on pandas 0.24.2.

phofl · 2020-09-11T14:55:33Z

Closing as a duplicate of #13475; this was fixed by that PR.

WillAyd added the Needs Info label Jun 26, 2018

jorisvandenbossche removed the Needs Info label Jun 26, 2018

jorisvandenbossche added the Reshaping label Jun 26, 2018

WillAyd mentioned this issue Aug 24, 2018

Loss of index name when using concat function #22495

Closed

WillAyd mentioned this issue Jun 27, 2019

pandas.concat removes the name of the index column for string index columns #27053

Closed

WillAyd mentioned this issue Jul 4, 2019

Dataframe loses level name after concat, only if columns types are str #27230

Closed

WillAyd added this to the Contributions Welcome milestone Jul 4, 2019

phofl closed this as completed Sep 11, 2020

phofl added the Duplicate Report label and removed the Reshaping label Sep 11, 2020

phofl modified the milestones: Contributions Welcome, No action Sep 11, 2020

phofl added the Reshaping label Sep 11, 2020
