-
-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reading from Excel mangles columns #10523
Comments
can you post a small example that can be used as a tests in the top section (e.g. create a frame, write to excel), show the arguments used to read back |
Because we pass in Otherwise, this issue is superseded by #13262. |
I'm afraid ExcelFile.parse is not changed yet |
With pandas 1.0.3 I got ~/anaconda3/lib/python3.6/site-packages/pandas/io/excel/_base.py in read_excel(io, sheet_name, header, names, index_col, usecols, squeeze, dtype, engine, converters, true_values, false_values, skiprows, nrows, na_values, keep_default_na, verbose, parse_dates, date_parser, thousands, comment, skipfooter, convert_float, mangle_dupe_cols, **kwds)
332 convert_float=convert_float,
333 mangle_dupe_cols=mangle_dupe_cols,
--> 334 **kwds,
335 )
336
~/anaconda3/lib/python3.6/site-packages/pandas/io/excel/_base.py in parse(self, sheet_name, header, names, index_col, usecols, squeeze, converters, true_values, false_values, skiprows, nrows, na_values, parse_dates, date_parser, thousands, comment, skipfooter, convert_float, mangle_dupe_cols, **kwds)
886 convert_float=convert_float,
887 mangle_dupe_cols=mangle_dupe_cols,
--> 888 **kwds,
889 )
890
~/anaconda3/lib/python3.6/site-packages/pandas/io/excel/_base.py in parse(self, sheet_name, header, names, index_col, usecols, squeeze, dtype, true_values, false_values, skiprows, nrows, na_values, verbose, parse_dates, date_parser, thousands, comment, skipfooter, convert_float, mangle_dupe_cols, **kwds)
510 usecols=usecols,
511 mangle_dupe_cols=mangle_dupe_cols,
--> 512 **kwds,
513 )
514
~/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in TextParser(*args, **kwds)
2199 """
2200 kwds["engine"] = "python"
-> 2201 return TextFileReader(*args, **kwds)
2202
2203
~/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in __init__(self, f, engine, **kwds)
865 self._currow = 0
866
--> 867 options = self._get_options_with_defaults(engine)
868
869 self.chunksize = options.pop("chunksize", None)
~/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in _get_options_with_defaults(self, engine)
893 # see gh-12935
894 if argname == "mangle_dupe_cols" and not value:
--> 895 raise ValueError("Setting mangle_dupe_cols=False is not supported yet")
896 else:
897 options[argname] = value
ValueError: Setting mangle_dupe_cols=False is not supported yet |
I got Code: df = pd.read_excel(DATA_PATH, mangle_dupe_cols=False) Error Details: File "site-packages\pandas\io\excel\_base.py", line 334, in read_excel
**kwds,
File "site-packages\pandas\io\excel\_base.py", line 888, in parse
**kwds,
File "site-packages\pandas\io\excel\_base.py", line 512, in parse
**kwds,
File "site-packages\pandas\io\parsers.py", line 2201, in TextParser
return TextFileReader(*args, **kwds)
File "site-packages\pandas\io\parsers.py", line 867, in __init__
options = self._get_options_with_defaults(engine)
File "site-packages\pandas\io\parsers.py", line 895, in _get_options_with_defaults
raise ValueError("Setting mangle_dupe_cols=False is not supported yet")
ValueError: Setting mangle_dupe_cols=False is not supported yet |
Using pandas-1.2.0, got an error: >>> pd.read_excel('test.xlsx',mangle_dupe_cols=False)
<stdin>:1: FutureWarning: Your version of xlrd is 1.2.0. In xlrd >= 2.0, only the xls format is supported. As a result, the openpyxl engine will be used if it is installed and the engine argument is not specified. Install openpyxl instead.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/site-packages/pandas/util/_decorators.py", line 299, in wrapper
return func(*args, **kwargs)
File "/home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/site-packages/pandas/io/excel/_base.py", line 344, in read_excel
data = io.parse(
File "/home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/site-packages/pandas/io/excel/_base.py", line 1153, in parse
return self._reader.parse(
File "/home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/site-packages/pandas/io/excel/_base.py", line 532, in parse
parser = TextParser(
File "/home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/site-packages/pandas/io/parsers.py", line 2224, in TextParser
return TextFileReader(*args, **kwds)
File "/home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/site-packages/pandas/io/parsers.py", line 801, in __init__
options = self._get_options_with_defaults(engine)
File "/home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/site-packages/pandas/io/parsers.py", line 829, in _get_options_with_defaults
raise ValueError("Setting mangle_dupe_cols=False is not supported yet")
ValueError: Setting mangle_dupe_cols=False is not supported yet |
As far as I can tell, both methods of reading an Excel sheet (through ExcelFile and read_excel) do not have the option to avoid mangling duplicate columns, which exists for read_csv (#3468).
I have an Excel file that looks like this:
I initially encountered this problem while trying to find a workaround for not being able to specify a multirow-header (#4679). Reading in this Excel file with header = 0 (default) mangles the column names:
Unlike
read_csv
,read_excel
does not have the option to avoid mangling duplicate columns (using ExcelFile.parse works the same as far as I can see). Specifying header = None inread_excel
and then assigning the column names to the first row will effectively allow you to avoid mangling the column names. You could also read in the Excel sheet with header = None, save a .csv with header = False and index = False, then read that csv and specify mangle_dupe_cols = False to get the dataframe you want. (Incidentally, this also allows you to specify multiple rows as the header, which is the behavior I was originally trying to emulate.)The text was updated successfully, but these errors were encountered: