ExcelFile class has no attribute 'encoding'. Is it correct? #25523

Dimaskuaskus · 2019-03-03T17:00:27Z

Code Sample

import os
import pandas as pd

directory = '/Data'
# more than 300 files
for d, dirs, files in os.walk(directory):
    # i can not use read_excel function because i do not know how many sheets in each xls
    # i use instance of ExcelFile
    # in fact it returns xlrd.Book instance (because engine: Acceptable values are None or ``xlrd``)
    try
        current_file = pd.ExcelFile(os.path.join(d,f))
        #do not work. My files use cp1251 charset. See more in output
    except UnicodeDecodeError:
        pass

    sheet_names_ = current_file.sheet_names()
    for sn in current_file.sheet_names():
        # this Data Frame i shoud parse and import to my Pstgresql DB
        current_file.parse(sheet_name=sn)

Problem description

I have a lot of XLS files encoded with cp1251. Each file contains 1 or more sheets. I need to import them into my Postgesql DB. But Pandas can not do it simple. I can load one sheet (list of, but i don`t have it) at a time in any encoding using pandas.read_excel(file,encoding=my_encoding). Or I can load all sheets from all files in UTF encoding using instance of pandas.ExcelFile(file). It looks like some bad recursion.
I can use xlrd to read my files with specific charset to get xlrd.Book instance. Then pass it to pandas.read_excel(). But I do not like this decision.

If pandas.ExcelFile() would have an attribute of the encoding, then it could be passed to any "engine". I saw a new version of excel package. The same problem.

(I dont have MS Excel and can not create "bad" file for test)
Sorry for my english(((

Expected Output

Output pandas/xlrd Error

*** No CODEPAGE record, no encoding_override: will use 'ascii'
*** No CODEPAGE record, no encoding_override: will use 'ascii'
Traceback (most recent call last):
File "TableLoader.py", line 13, in
c = ExelLoader()
File "TableLoader.py", line 10, in ExelLoader
my_xls_x = pandas.read_excel('Data/qq.xls')
File "/home/dz/.local/lib/python3.6/site-packages/pandas/util/_decorators.py", line 188, in wrapper
return func(*args, **kwargs)
File "/home/dz/.local/lib/python3.6/site-packages/pandas/util/_decorators.py", line 188, in wrapper
return func(*args, **kwargs)
File "/home/dz/.local/lib/python3.6/site-packages/pandas/io/excel.py", line 350, in read_excel
io = ExcelFile(io, engine=engine)
File "/home/dz/.local/lib/python3.6/site-packages/pandas/io/excel.py", line 653, in init
self._reader = self._enginesengine
File "/home/dz/.local/lib/python3.6/site-packages/pandas/io/excel.py", line 424, in init
self.book = xlrd.open_workbook(filepath_or_buffer)
File "/home/dz/.local/lib/python3.6/site-packages/xlrd/init.py", line 157, in open_workbook
ragged_rows=ragged_rows,
File "/home/dz/.local/lib/python3.6/site-packages/xlrd/book.py", line 120, in open_workbook_xls
bk.get_sheets()
File "/home/dz/.local/lib/python3.6/site-packages/xlrd/book.py", line 723, in get_sheets
self.get_sheet(sheetno)
File "/home/dz/.local/lib/python3.6/site-packages/xlrd/book.py", line 714, in get_sheet
sh.read(self)
File "/home/dz/.local/lib/python3.6/site-packages/xlrd/sheet.py", line 820, in read
strg = unpack_string(data, 6, bk.encoding or bk.derive_encoding(), lenlen=2)
File "/home/dz/.local/lib/python3.6/site-packages/xlrd/biffh.py", line 250, in unpack_string
return unicode(data[pos:pos+nchars], encoding)
File "/home/dz/.local/lib/python3.6/site-packages/xlrd/timemachine.py", line 31, in
unicode = lambda b, enc: b.decode(enc)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xcd in position 0: ordinal not in range(128)

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.6.7.final.0
python-bits: 32
OS: Linux
OS-release: 4.15.0-45-generic
machine: i686
processor: i686
byteorder: little
LC_ALL: None
LANG: ru_RU.UTF-8
LOCALE: ru_RU.UTF-8

pandas: 0.24.1
pytest: None
pip: 9.0.1
setuptools: 39.0.1
Cython: None
numpy: 1.16.2
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.8.0
pytz: 2018.9
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: 1.2.0
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: 0.999999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

The text was updated successfully, but these errors were encountered:

WillAyd · 2019-03-04T02:12:35Z

This is very closely related to #23444 @gfyoung

gfyoung · 2019-03-04T03:28:57Z

@WillAyd : I would be inclined to call this a duplicate in fact, since the underlying Enhancement request is the same, even though the premise is different.

WillAyd · 2021-01-19T18:00:55Z

Closing as a duplicate

WillAyd added the IO Excel read_excel, to_excel label Mar 4, 2019

WillAyd added this to the Contributions Welcome milestone Mar 4, 2019

gfyoung added the Enhancement label Mar 4, 2019

Hugovdberg mentioned this issue Mar 14, 2019

Excel Document Passing Kwargs to Engine #25723

Closed

twoertwein mentioned this issue Aug 16, 2020

BUG: read_excel not accepting encoding on 1.1.0 #35753

Closed

3 tasks

WillAyd added the Duplicate Report Duplicate issue or pull request label Jan 19, 2021

WillAyd closed this as completed Jan 19, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ExcelFile class has no attribute 'encoding'. Is it correct? #25523

ExcelFile class has no attribute 'encoding'. Is it correct? #25523

Dimaskuaskus commented Mar 3, 2019

INSTALLED VERSIONS

WillAyd commented Mar 4, 2019

gfyoung commented Mar 4, 2019 •

edited

Loading

WillAyd commented Jan 19, 2021

ExcelFile class has no attribute 'encoding'. Is it correct? #25523

ExcelFile class has no attribute 'encoding'. Is it correct? #25523

Comments

Dimaskuaskus commented Mar 3, 2019

Code Sample

Problem description

Expected Output

Output pandas/xlrd Error

Output of pd.show_versions()

INSTALLED VERSIONS

WillAyd commented Mar 4, 2019

gfyoung commented Mar 4, 2019 • edited Loading

WillAyd commented Jan 19, 2021

Output of `pd.show_versions()`

gfyoung commented Mar 4, 2019 •

edited

Loading