You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
importosimportpandasaspddirectory='/Data'# more than 300 filesford, dirs, filesinos.walk(directory):
# i can not use read_excel function because i do not know how many sheets in each xls# i use instance of ExcelFile# in fact it returns xlrd.Book instance (because engine: Acceptable values are None or ``xlrd``)trycurrent_file=pd.ExcelFile(os.path.join(d,f))
#do not work. My files use cp1251 charset. See more in outputexceptUnicodeDecodeError:
passsheet_names_=current_file.sheet_names()
forsnincurrent_file.sheet_names():
# this Data Frame i shoud parse and import to my Pstgresql DBcurrent_file.parse(sheet_name=sn)
Problem description
I have a lot of XLS files encoded with cp1251. Each file contains 1 or more sheets. I need to import them into my Postgesql DB. But Pandas can not do it simple. I can load one sheet (list of, but i don`t have it) at a time in any encoding using pandas.read_excel(file,encoding=my_encoding). Or I can load all sheets from all files in UTF encoding using instance of pandas.ExcelFile(file). It looks like some bad recursion.
I can use xlrd to read my files with specific charset to get xlrd.Book instance. Then pass it to pandas.read_excel(). But I do not like this decision.
If pandas.ExcelFile() would have an attribute of the encoding, then it could be passed to any "engine". I saw a new version of excel package. The same problem.
(I dont have MS Excel and can not create "bad" file for test)
Sorry for my english(((
Expected Output
Output pandas/xlrd Error
*** No CODEPAGE record, no encoding_override: will use 'ascii'
*** No CODEPAGE record, no encoding_override: will use 'ascii'
Traceback (most recent call last):
File "TableLoader.py", line 13, in
c = ExelLoader()
File "TableLoader.py", line 10, in ExelLoader
my_xls_x = pandas.read_excel('Data/qq.xls')
File "/home/dz/.local/lib/python3.6/site-packages/pandas/util/_decorators.py", line 188, in wrapper
return func(*args, **kwargs)
File "/home/dz/.local/lib/python3.6/site-packages/pandas/util/_decorators.py", line 188, in wrapper
return func(*args, **kwargs)
File "/home/dz/.local/lib/python3.6/site-packages/pandas/io/excel.py", line 350, in read_excel
io = ExcelFile(io, engine=engine)
File "/home/dz/.local/lib/python3.6/site-packages/pandas/io/excel.py", line 653, in init
self._reader = self._enginesengine
File "/home/dz/.local/lib/python3.6/site-packages/pandas/io/excel.py", line 424, in init
self.book = xlrd.open_workbook(filepath_or_buffer)
File "/home/dz/.local/lib/python3.6/site-packages/xlrd/init.py", line 157, in open_workbook
ragged_rows=ragged_rows,
File "/home/dz/.local/lib/python3.6/site-packages/xlrd/book.py", line 120, in open_workbook_xls
bk.get_sheets()
File "/home/dz/.local/lib/python3.6/site-packages/xlrd/book.py", line 723, in get_sheets
self.get_sheet(sheetno)
File "/home/dz/.local/lib/python3.6/site-packages/xlrd/book.py", line 714, in get_sheet
sh.read(self)
File "/home/dz/.local/lib/python3.6/site-packages/xlrd/sheet.py", line 820, in read
strg = unpack_string(data, 6, bk.encoding or bk.derive_encoding(), lenlen=2)
File "/home/dz/.local/lib/python3.6/site-packages/xlrd/biffh.py", line 250, in unpack_string
return unicode(data[pos:pos+nchars], encoding)
File "/home/dz/.local/lib/python3.6/site-packages/xlrd/timemachine.py", line 31, in
unicode = lambda b, enc: b.decode(enc)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xcd in position 0: ordinal not in range(128)
@WillAyd : I would be inclined to call this a duplicate in fact, since the underlying Enhancement request is the same, even though the premise is different.
Code Sample
Problem description
I have a lot of XLS files encoded with cp1251. Each file contains 1 or more sheets. I need to import them into my Postgesql DB. But Pandas can not do it simple. I can load one sheet (list of, but i don`t have it) at a time in any encoding using pandas.read_excel(file,encoding=my_encoding). Or I can load all sheets from all files in UTF encoding using instance of pandas.ExcelFile(file). It looks like some bad recursion.
I can use xlrd to read my files with specific charset to get xlrd.Book instance. Then pass it to pandas.read_excel(). But I do not like this decision.
If pandas.ExcelFile() would have an attribute of the encoding, then it could be passed to any "engine". I saw a new version of excel package. The same problem.
(I dont have MS Excel and can not create "bad" file for test)
Sorry for my english(((
Expected Output
Output pandas/xlrd Error
*** No CODEPAGE record, no encoding_override: will use 'ascii'
*** No CODEPAGE record, no encoding_override: will use 'ascii'
Traceback (most recent call last):
File "TableLoader.py", line 13, in
c = ExelLoader()
File "TableLoader.py", line 10, in ExelLoader
my_xls_x = pandas.read_excel('Data/qq.xls')
File "/home/dz/.local/lib/python3.6/site-packages/pandas/util/_decorators.py", line 188, in wrapper
return func(*args, **kwargs)
File "/home/dz/.local/lib/python3.6/site-packages/pandas/util/_decorators.py", line 188, in wrapper
return func(*args, **kwargs)
File "/home/dz/.local/lib/python3.6/site-packages/pandas/io/excel.py", line 350, in read_excel
io = ExcelFile(io, engine=engine)
File "/home/dz/.local/lib/python3.6/site-packages/pandas/io/excel.py", line 653, in init
self._reader = self._enginesengine
File "/home/dz/.local/lib/python3.6/site-packages/pandas/io/excel.py", line 424, in init
self.book = xlrd.open_workbook(filepath_or_buffer)
File "/home/dz/.local/lib/python3.6/site-packages/xlrd/init.py", line 157, in open_workbook
ragged_rows=ragged_rows,
File "/home/dz/.local/lib/python3.6/site-packages/xlrd/book.py", line 120, in open_workbook_xls
bk.get_sheets()
File "/home/dz/.local/lib/python3.6/site-packages/xlrd/book.py", line 723, in get_sheets
self.get_sheet(sheetno)
File "/home/dz/.local/lib/python3.6/site-packages/xlrd/book.py", line 714, in get_sheet
sh.read(self)
File "/home/dz/.local/lib/python3.6/site-packages/xlrd/sheet.py", line 820, in read
strg = unpack_string(data, 6, bk.encoding or bk.derive_encoding(), lenlen=2)
File "/home/dz/.local/lib/python3.6/site-packages/xlrd/biffh.py", line 250, in unpack_string
return unicode(data[pos:pos+nchars], encoding)
File "/home/dz/.local/lib/python3.6/site-packages/xlrd/timemachine.py", line 31, in
unicode = lambda b, enc: b.decode(enc)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xcd in position 0: ordinal not in range(128)
Output of
pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.7.final.0
python-bits: 32
OS: Linux
OS-release: 4.15.0-45-generic
machine: i686
processor: i686
byteorder: little
LC_ALL: None
LANG: ru_RU.UTF-8
LOCALE: ru_RU.UTF-8
pandas: 0.24.1
pytest: None
pip: 9.0.1
setuptools: 39.0.1
Cython: None
numpy: 1.16.2
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.8.0
pytz: 2018.9
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: 1.2.0
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: 0.999999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None
The text was updated successfully, but these errors were encountered: