BUG: index.name not preserved in concat in case of unequal object index #13475

dllahr · 2016-06-17T18:17:26Z

xref #13742 for additional cases.

In [23]: df1 = pd.DataFrame({'a':[1,2]}, index=pd.Index(['a', 'b'], name='idx'))

In [24]: df2 = pd.DataFrame({'b':[2,3]}, index=pd.Index(['b', 'c'], name='idx'))

In [26]: pd.concat([df1, df2], axis=1)
Out[26]:
     a    b
a  1.0  NaN
b  2.0  2.0
c  NaN  3.0

In [27]: print pd.concat([df1, df2], axis=1).index.name
None

So the issue seems to be with a string index that is not equal: when the indexes of the two frames are equal (no NaNs are introduced), the name is kept, and it is also kept when using numerical indexes; see #13475 (comment).

When I use the concat function with input dataframes that have index.name assigned, sometimes the resulting dataframe has the index.name assigned, sometimes it does not.

I ran the code below from the python interpreter, using a conda environment with pandas-0.18.1

I don't see any odd or extra characters around the "pert_well" column in either of the files.

Code Sample, a copy-pastable example if possible

import pandas
from StringIO import StringIO  # Python 2; on Python 3 use: from io import StringIO

a_data = """x_amount_mg x_annotation    x_mmoles_per_liter  mfc_plate_name  x_avg_mol_weight    x_volume_ul pert_mfc_desc   pert_iname  x_purity    pert_id_vendor  pert_well   pert_vehicle    pert_mfc_id x_smiles    x_mg_per_ml pert_dose_unit  pert_dose   pert_id pert_plate  pert_type
0.04784 ACCEPT  10.0    B-REPO-01-B64-101   405.4084    11  Taltirelin  Taltirelin  86.52   HY-B0596    C18 DMSO    BRD-K93869735-001-01-1  CN1C(=O)C[C@H](NC1=O)C(=O)N[C@@H](Cc1cnc[nH]1)C(=O)N1CCC[C@H]1C(N)=O    4.054084    um  20.0    BRD-K93869735   PMEL008 trt_cp"""

b_data = """pert_well   pert_2_type pert_2_id   pert_2_mfc_id   pert_2_mfc_desc pert_2_id_vendor    pert_2_iname    pert_2_dose pert_2_dose_unit    pert_2_vehicle  pert_3_type pert_3_id   pert_3_mfc_id  pert_3_mfc_desc pert_3_id_vendor    pert_3_iname    pert_3_dose pert_3_dose_unit    pert_3_vehicle
A01 ctl_vehicle DMSO    DMSO    DMSO    -666    DMSO    -666    -666    -666    ctl_untrt   CMAP-000    -666    UnTrt   -666    -666    -666    -666    -666"""

d_data = """x_amount_mg x_annotation    x_mmoles_per_liter  mfc_plate_name  x_avg_mol_weight    x_volume_ul pert_mfc_desc   pert_iname  x_purity    pert_id_vendor  pert_well   pert_vehicle    pert_mfc_id x_smiles    x_mg_per_ml pert_dose_unit  pert_dose   pert_id pert_plate  pert_type
0.0 -666    -666    B-REPO-01-B64-107   -666    0   -666    -666    -666    -666    A01 -666    -666    -666    -666    -666    -666    CMAP-000    PMEL001 ctl_untrt"""

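# The tabs in the data above were lost when pasting, so split on any whitespace
a = pandas.read_csv(StringIO(a_data), sep=r"\s+", index_col="pert_well")
b = pandas.read_csv(StringIO(b_data), sep=r"\s+", index_col="pert_well")
c = pandas.concat([a,b], axis=1)  # a and b have different index labels
c.index

d = pandas.read_csv(StringIO(d_data), sep=r"\s+", index_col="pert_well")
e = pandas.concat([d,b], axis=1)  # d and b have the same index labels
e.index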

results:

Index([u'A01', u'A02', u'A03', u'A04', u'A05', u'A06', u'A07', u'A08', u'A09',
       u'A10',
       ...
       u'P15', u'P16', u'P17', u'P18', u'P19', u'P20', u'P21', u'P22', u'P23',
       u'P24'],
      dtype='object', length=384)

Index([u'A01', u'A02', u'A03', u'A04', u'A05', u'A06', u'A07', u'A08', u'A09',
       u'A10',
       ...
       u'P15', u'P16', u'P17', u'P18', u'P19', u'P20', u'P21', u'P22', u'P23',
       u'P24'],
      dtype='object', name=u'pert_well', length=384)

Expected Output

c.index.name should be "pert_well"

output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Linux
OS-release: 2.6.32-573.7.1.el6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C

pandas: 0.18.1
nose: None
pip: 8.1.2
setuptools: 23.0.0
Cython: None
numpy: 1.11.0
scipy: None
statsmodels: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: None

PMEL_input_files_for_pandas_issue.zip


jreback · 2016-06-17T18:18:59Z

pls make this copy pastable rather than using actual files

dllahr · 2016-06-17T18:29:10Z

Do you want me to paste in tab-delimited text?

dllahr · 2016-06-17T18:50:39Z

Done.

jreback · 2016-06-17T19:32:00Z

@dllahr pls edit so I can literally copy and paste it,
e.g. something like

import pandas as pd
from StringIO import StringIO
data = """
......
"""
df = pd.read_csv(StringIO(data))

then one can simply copy and paste. This is most useful for reproductions, but also to facilitate checking if this is even an existing bug.

vkcelik · 2016-06-21T17:19:06Z

I think I have the same problem. Should be reproducible with this code:

import pandas as pd
try:
  from io import StringIO
except ImportError:
  from StringIO import StringIO

data1 = "Cores\tServer\n20,000\tS000\n-20,000\tS003\n16,000\tS140\n2,000\tS148\n2,000\tS149\n"

data2 = "Cores\tServer\n20,000\tS103\n16,000\tS140\n2,000\tS148\n2,000\tS149\n4,000\tS150\n"

df1 = pd.read_csv(StringIO(data1), sep='\t', index_col=['Server'], decimal=',')
df2 = pd.read_csv(StringIO(data2), sep='\t', index_col=['Server'], decimal=',')

df1.rename(columns=lambda x: x + '_1', inplace=True)
df2.rename(columns=lambda x: x + '_2', inplace=True)

joined = pd.concat([df1, df2], axis=1)

print(df1)
print(df2)
print(joined)
print(joined.index.name)
print(joined.index.names)

I expected the output of print(joined.index.names) to be ['Server'], but it is [None].

Can anybody reproduce this? Is this expected behavior, and if so why?

Output of pd.show_versions():

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 37 Stepping 2, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.17.1
nose: 1.3.7
pip: None
setuptools: 19.6.2
Cython: 0.23.4
numpy: 1.10.4
scipy: 0.17.0
statsmodels: 0.6.1
IPython: 4.0.3
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.4.6
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.4
lxml: 3.5.0
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.11
pymysql: None
psycopg2: None
Jinja2: 2.8

dllahr · 2016-07-12T17:21:59Z

Can you remove the can't Repro label, because the new comment has the formatting you require?

jreback · 2016-07-12T17:26:44Z

@dllahr how so? need an example that one can simply literally copy-paste. Don't use files.
Instead

data = """
filedata you need
"""
df = pd.read_csv(StringIO(data))
.....

jreback · 2016-07-12T17:27:59Z

@dllahr pls put this actual example and replace the top part of the issue in that case. It's too confusing.

dllahr · 2016-07-12T17:32:08Z

Well there's the comment by vkcelik before I did anything.

dllahr · 2016-07-12T17:34:32Z

How about now?

jreback · 2016-07-13T00:40:08Z

The tabs don't reproduce when copy-pasting, and the example should be much simpler. The idea is to make this into a test. If it's too much effort it won't happen.
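One way to keep tab-delimited data intact when pasting is to write each tab as an explicit "\t" escape. The sketch below assumes Python 3 and uses a shortened, hypothetical subset of the reported data (only the index column and one other column per frame), so it illustrates the format being asked for rather than the original files.

```python
import pandas as pd
from io import StringIO  # Python 3; on Python 2 use: from StringIO import StringIO

# Hypothetical, shortened data: tabs are written as "\t" so they survive pasting.
a_data = "pert_well\tpert_type\nC18\ttrt_cp\n"
b_data = "pert_well\tpert_2_type\nA01\tctl_vehicle\n"
d_data = "pert_well\tpert_type\nA01\tctl_untrt\n"

a = pd.read_csv(StringIO(a_data), sep="\t", index_col="pert_well")
b = pd.read_csv(StringIO(b_data), sep="\t", index_col="pert_well")
d = pd.read_csv(StringIO(d_data), sep="\t", index_col="pert_well")

c = pd.concat([a, b], axis=1)  # index labels differ (C18 vs A01)
e = pd.concat([d, b], axis=1)  # index labels are equal (A01 and A01)

# On affected versions the first name is reported as None; the second is kept.
print(c.index.name)
print(e.index.name)
```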

dllahr · 2016-07-13T03:58:18Z

Awesome. If the problem is related to parsing the tab-delimited input, then how are you going to reproduce it? I've given you more than enough information to reproduce this bug. If you choose to ignore it, that's your business. I'll keep that in mind next time someone asks me about Pandas.

jreback · 2016-07-13T04:16:43Z

@dllahr that's quite a poor attitude

dllahr · 2016-07-13T04:34:30Z

I agree you are displaying a poor attitude.

shoyer · 2016-07-13T05:43:19Z

@dllahr please see http://stackoverflow.com/help/mcve

Tools like pandas are successful because of community contributions. We get a lot of bug reports, so every bit of help making an issue easier to reproduce helps.

jorisvandenbossche · 2016-07-13T09:56:59Z

@dllahr As an example, I tried to reproduce this with a simple example:

In [23]: df1 = pd.DataFrame({'a':[1,2]}, index=pd.Index(['a', 'b'], name='idx'))

In [24]: df2 = pd.DataFrame({'b':[2,3]}, index=pd.Index(['b', 'c'], name='idx'))

In [26]: pd.concat([df1, df2], axis=1)
Out[26]:
     a    b
a  1.0  NaN
b  2.0  2.0
c  NaN  3.0

In [27]: print pd.concat([df1, df2], axis=1).index.name
None

So the issue seems to be with a string index that is not equal, as when the index of the two frames is equal (no NaNs are introduced), the name is kept:

In [29]: df2.index = pd.Index(['a', 'b'], name='idx')

In [30]: pd.concat([df1, df2], axis=1)
Out[30]:
     a  b
idx
a    1  2
b    2  3

In [31]: print pd.concat([df1, df2], axis=1).index.name
idx

and when using a numerical index, the name is also kept:

In [32]: df1.index = pd.Index([0, 1], name='idx')

In [33]: df2.index = pd.Index([1, 2], name='idx')

In [34]: pd.concat([df1, df2], axis=1)
Out[34]:
       a    b
idx
0    1.0  NaN
1    2.0  2.0
2    NaN  3.0

In [35]: print pd.concat([df1, df2], axis=1).index.name
idx
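The three cases above can be collected into one script that runs on Python 3. This is a sketch, not part of the original comment; `concat_index_name` is a hypothetical helper introduced only for this illustration. It makes no claim about what a given pandas version prints for the unequal object case, since that is the behavior under discussion.

```python
import pandas as pd


def concat_index_name(left_labels, right_labels):
    """Build two frames whose indexes are both named 'idx' and
    return the index name of their column-wise concatenation."""
    df1 = pd.DataFrame({"a": [1, 2]}, index=pd.Index(left_labels, name="idx"))
    df2 = pd.DataFrame({"b": [2, 3]}, index=pd.Index(right_labels, name="idx"))
    return pd.concat([df1, df2], axis=1).index.name


unequal_object = concat_index_name(["a", "b"], ["b", "c"])   # the reported case
equal_object = concat_index_name(["a", "b"], ["a", "b"])     # name is kept
unequal_numeric = concat_index_name([0, 1], [1, 2])          # name is kept

print("unequal object index: ", unequal_object)
print("equal object index:   ", equal_object)
print("unequal numeric index:", unequal_numeric)
```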

dllahr · 2016-07-13T12:08:15Z

@jorisvandenbossche thank you for finding a much better example.

@shoyer Thank you for that, but I provided the minimal example that I could deduce given my time constraints and abilities. If I could have provided a simpler example, I would have. As some feedback for you (collectively), given this "experience" I'm pretty sure next time I just won't bother reporting any bug or issue I find.

TomAugspurger · 2016-07-13T12:44:22Z

Apologies if any of that came off as brusque @dllahr, trying to manage the crazy numbers of issues efficiently. Need to leverage your first-hand experience with the bug as much as possible to diagnose the issue. Thanks for the report!

0anton · 2019-06-15T10:18:56Z

experiencing the same bug on pandas 0.24.2
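For anyone on an affected version, one possible workaround (an assumption on my part, not something proposed in this thread) is to restore the name after concatenating, when both inputs agree on it:

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2]}, index=pd.Index(["a", "b"], name="idx"))
df2 = pd.DataFrame({"b": [2, 3]}, index=pd.Index(["b", "c"], name="idx"))

joined = pd.concat([df1, df2], axis=1)

# If concat dropped the name and both inputs share one, put it back.
if joined.index.name is None and df1.index.name == df2.index.name:
    joined.index.name = df1.index.name

print(joined.index.name)
```

The check is a no-op on versions where the name is already preserved, so the snippet is safe to leave in place after upgrading.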

iamlemec · 2020-04-21T19:05:21Z

I believe I have a 2 line fix to union_indexes that takes care of this. Should I submit a PR or just paste the diff here?
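To illustrate the general idea of naming the result of an index union, here is a minimal sketch. It is not the diff mentioned above and not the pandas internals; `union_with_consensus_name` is a hypothetical helper that keeps a name only when every input index agrees on it.

```python
import pandas as pd


def union_with_consensus_name(indexes):
    """Hypothetical helper: union several indexes and keep the name
    only if all inputs share the same one; otherwise use None."""
    result = indexes[0]
    for other in indexes[1:]:
        result = result.union(other)
    names = {idx.name for idx in indexes}
    name = names.pop() if len(names) == 1 else None
    return result.rename(name)


same = union_with_consensus_name([
    pd.Index(["a", "b"], name="idx"),
    pd.Index(["b", "c"], name="idx"),
])
mixed = union_with_consensus_name([
    pd.Index(["a", "b"], name="idx"),
    pd.Index(["b", "c"], name="other"),
])

print(same.name)
print(mixed.name)
```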

jreback added the Can't Repro label Jun 17, 2016

jorisvandenbossche added Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode and removed Can't Repro labels Jul 13, 2016

jorisvandenbossche changed the title ~~index.name not assigned sometimes when using concat~~ BUG: index.name not preserved in concat in case of unequal object index Jul 13, 2016

jreback added this to the Next Major Release milestone Jul 13, 2016

jreback mentioned this issue Jul 21, 2016

Can it impossible to fill names parameter of concat() when keys is specified with Index/MultiIndex object? #13742

Closed

jreback added Difficulty Intermediate labels Jul 21, 2016

dsm054 mentioned this issue Nov 13, 2018

0.23.1 concat drops the name of the merge axis when not aligned #21629

Closed

jbrockmendel removed the Difficulty Intermediate label Oct 21, 2019

jbrockmendel removed the Effort Low label Oct 21, 2019

iamlemec mentioned this issue Jul 18, 2020

BUG: assign consensus name to index union in array case GH13475 #35338

Merged


jreback modified the milestones: Contributions Welcome, 1.2 Aug 6, 2020

jreback added Constructors Series/DataFrame/Index/pd.array Constructors Index Related to the Index class or subclasses labels Aug 7, 2020

jreback closed this as completed in #35338 Aug 7, 2020

iamlemec mentioned this issue Aug 22, 2020

BUG: inconsistent naming when combining indices of various types #35847

Closed
