Reading from Excel mangles columns #10523

atheyjohnc · 2015-07-07T13:52:27Z

As far as I can tell, both methods of reading an Excel sheet (through ExcelFile and read_excel) do not have the option to avoid mangling duplicate columns, which exists for read_csv (#3468).

I have an Excel file that looks like this:

  foo  foo  bar  bar  baz  baz
    A    B    A    B    A    B
  123  456  789   12  345  678
  901  234  567  890  123  456
  789   12  345  678  901  234

I initially encountered this problem while trying to find a workaround for not being able to specify a multirow-header (#4679). Reading in this Excel file with header = 0 (default) mangles the column names:

>>> df = pd.read_excel("dupe_cols.xlsx", header = 0)
>>> print df
   foo foo.1  bar bar.1  baz baz.1
0    A     B    A     B    A     B
1  123   456  789    12  345   678
2  901   234  567   890  123   456
3  789    12  345   678  901   234

Unlike read_csv, read_excel does not have the option to avoid mangling duplicate columns (using ExcelFile.parse works the same as far as I can see). Specifying header = None in read_excel and then assigning the column names to the first row will effectively allow you to avoid mangling the column names. You could also read in the Excel sheet with header = None, save a .csv with header = False and index = False, then read that csv and specify mangle_dupe_cols = False to get the dataframe you want. (Incidentally, this also allows you to specify multiple rows as the header, which is the behavior I was originally trying to emulate.)

The text was updated successfully, but these errors were encountered:

jreback · 2015-07-07T15:55:47Z

can you post a small example that can be used as a tests in the top section (e.g. create a frame, write to excel), show the arguments used to read back

gfyoung · 2018-11-13T22:36:42Z

Because we pass in **kwds from read_excel to TextFileReader, we actually do accept mangle_dupe_cols now. 🎉 We just need to surface it explicitly.

Otherwise, this issue is superseded by #13262.

xref pandas-devgh-10523.

xref gh-10523.

xref pandas-devgh-10523.

geminizb · 2019-04-15T09:53:44Z

I'm afraid ExcelFile.parse is not changed yet

dhimmel · 2020-04-06T22:59:08Z

With pandas 1.0.3 I got Setting mangle_dupe_cols=False is not supported yet.

~/anaconda3/lib/python3.6/site-packages/pandas/io/excel/_base.py in read_excel(io, sheet_name, header, names, index_col, usecols, squeeze, dtype, engine, converters, true_values, false_values, skiprows, nrows, na_values, keep_default_na, verbose, parse_dates, date_parser, thousands, comment, skipfooter, convert_float, mangle_dupe_cols, **kwds)
    332         convert_float=convert_float,
    333         mangle_dupe_cols=mangle_dupe_cols,
--> 334         **kwds,
    335     )
    336 

~/anaconda3/lib/python3.6/site-packages/pandas/io/excel/_base.py in parse(self, sheet_name, header, names, index_col, usecols, squeeze, converters, true_values, false_values, skiprows, nrows, na_values, parse_dates, date_parser, thousands, comment, skipfooter, convert_float, mangle_dupe_cols, **kwds)
    886             convert_float=convert_float,
    887             mangle_dupe_cols=mangle_dupe_cols,
--> 888             **kwds,
    889         )
    890 

~/anaconda3/lib/python3.6/site-packages/pandas/io/excel/_base.py in parse(self, sheet_name, header, names, index_col, usecols, squeeze, dtype, true_values, false_values, skiprows, nrows, na_values, verbose, parse_dates, date_parser, thousands, comment, skipfooter, convert_float, mangle_dupe_cols, **kwds)
    510                     usecols=usecols,
    511                     mangle_dupe_cols=mangle_dupe_cols,
--> 512                     **kwds,
    513                 )
    514 

~/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in TextParser(*args, **kwds)
   2199     """
   2200     kwds["engine"] = "python"
-> 2201     return TextFileReader(*args, **kwds)
   2202 
   2203 

~/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in __init__(self, f, engine, **kwds)
    865         self._currow = 0
    866 
--> 867         options = self._get_options_with_defaults(engine)
    868 
    869         self.chunksize = options.pop("chunksize", None)

~/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in _get_options_with_defaults(self, engine)
    893             # see gh-12935
    894             if argname == "mangle_dupe_cols" and not value:
--> 895                 raise ValueError("Setting mangle_dupe_cols=False is not supported yet")
    896             else:
    897                 options[argname] = value

ValueError: Setting mangle_dupe_cols=False is not supported yet

talatccan · 2020-10-12T09:11:05Z

I got ValueError: Setting mangle_dupe_cols=False is not supported yet for pandas version 1.0.4

Code:

df = pd.read_excel(DATA_PATH, mangle_dupe_cols=False)

Error Details:

 File "site-packages\pandas\io\excel\_base.py", line 334, in read_excel
    **kwds,
  File "site-packages\pandas\io\excel\_base.py", line 888, in parse
    **kwds,
  File "site-packages\pandas\io\excel\_base.py", line 512, in parse
    **kwds,
  File "site-packages\pandas\io\parsers.py", line 2201, in TextParser
    return TextFileReader(*args, **kwds)
  File "site-packages\pandas\io\parsers.py", line 867, in __init__
    options = self._get_options_with_defaults(engine)
  File "site-packages\pandas\io\parsers.py", line 895, in _get_options_with_defaults
    raise ValueError("Setting mangle_dupe_cols=False is not supported yet")
ValueError: Setting mangle_dupe_cols=False is not supported yet

ronald8192 · 2021-01-06T09:18:38Z

Using pandas-1.2.0, got an error: ValueError: Setting mangle_dupe_cols=False is not supported yet

>>> pd.read_excel('test.xlsx',mangle_dupe_cols=False)
<stdin>:1: FutureWarning: Your version of xlrd is 1.2.0. In xlrd >= 2.0, only the xls format is supported. As a result, the openpyxl engine will be used if it is installed and the engine argument is not specified. Install openpyxl instead.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/site-packages/pandas/util/_decorators.py", line 299, in wrapper
    return func(*args, **kwargs)
  File "/home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/site-packages/pandas/io/excel/_base.py", line 344, in read_excel
    data = io.parse(
  File "/home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/site-packages/pandas/io/excel/_base.py", line 1153, in parse
    return self._reader.parse(
  File "/home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/site-packages/pandas/io/excel/_base.py", line 532, in parse
    parser = TextParser(
  File "/home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/site-packages/pandas/io/parsers.py", line 2224, in TextParser
    return TextFileReader(*args, **kwds)
  File "/home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/site-packages/pandas/io/parsers.py", line 801, in __init__
    options = self._get_options_with_defaults(engine)
  File "/home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/site-packages/pandas/io/parsers.py", line 829, in _get_options_with_defaults
    raise ValueError("Setting mangle_dupe_cols=False is not supported yet")
ValueError: Setting mangle_dupe_cols=False is not supported yet

jreback added Enhancement IO Excel read_excel, to_excel Difficulty Intermediate labels Jul 7, 2015

jreback added this to the Next Major Release milestone Jul 7, 2015

gfyoung modified the milestones: Contributions Welcome, No action Nov 13, 2018

gfyoung closed this as completed Nov 13, 2018

gfyoung added a commit to forking-repos/pandas that referenced this issue Nov 13, 2018

DOC: Surface / doc mangle_dupe_cols in read_excel

5954940

xref pandas-devgh-10523.

gfyoung mentioned this issue Nov 13, 2018

DOC: Surface / doc mangle_dupe_cols in read_excel #23678

Merged

jreback pushed a commit that referenced this issue Nov 14, 2018

DOC: Surface / doc mangle_dupe_cols in read_excel (#23678)

2d2b214

xref gh-10523.

JustinZhengBC pushed a commit to JustinZhengBC/pandas that referenced this issue Nov 14, 2018

DOC: Surface / doc mangle_dupe_cols in read_excel (pandas-dev#23678)

7dab45f

xref pandas-devgh-10523.

tm9k1 pushed a commit to tm9k1/pandas that referenced this issue Nov 19, 2018

DOC: Surface / doc mangle_dupe_cols in read_excel (pandas-dev#23678)

45eecbc

xref pandas-devgh-10523.

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this issue Feb 28, 2019

DOC: Surface / doc mangle_dupe_cols in read_excel (pandas-dev#23678)

775ee04

xref pandas-devgh-10523.

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this issue Feb 28, 2019

DOC: Surface / doc mangle_dupe_cols in read_excel (pandas-dev#23678)

e64939e

xref pandas-devgh-10523.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reading from Excel mangles columns #10523

Reading from Excel mangles columns #10523

atheyjohnc commented Jul 7, 2015

jreback commented Jul 7, 2015

gfyoung commented Nov 13, 2018 •

edited

Loading

geminizb commented Apr 15, 2019

dhimmel commented Apr 6, 2020

talatccan commented Oct 12, 2020

ronald8192 commented Jan 6, 2021

Reading from Excel mangles columns #10523

Reading from Excel mangles columns #10523

Comments

atheyjohnc commented Jul 7, 2015

jreback commented Jul 7, 2015

gfyoung commented Nov 13, 2018 • edited Loading

geminizb commented Apr 15, 2019

dhimmel commented Apr 6, 2020

talatccan commented Oct 12, 2020

ronald8192 commented Jan 6, 2021

gfyoung commented Nov 13, 2018 •

edited

Loading