ENH: Auto Detect engine in read_excel #35416
Comments
Attached a sample file to test. |
Thanks @fzumstein for the report. I'm not sure of the history here, but the release note when the xlsb functionality was added is
so I assume that this was a conscious design choice and therefore labelling as an enhancement. |
Hi @simonjayhawkins, thanks for your reply and I am not familiar with the history either, it just hit me as an inconsistency. |
I don't think the readers infer the appropriate engine from the file extension. Historically we only ever used xlrd, but over the past year or more we have added several other engines without adding auto-detection. I think it would take a PR to auto-detect the engine, if there's a reasonable way to do it. |
@WillAyd I see - what about just refactoring _get_default_writer(ext) into something like |
I think it's separate from that issue number, but the code you've linked to is a great reference. We do it on the writer side, so if you would like to implement something on the reader side and push a PR, that would definitely be appreciated! |
Related to #34252, which mentions that the settings meant to auto-detect the engine do not work. That makes sense, as they are not used at all. |
take |
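For readers following the discussion above, here is a minimal sketch of what reader-side inference from the file extension could look like. It is not the pandas implementation: the helper name `infer_reader_engine` and the table `_READER_ENGINE_BY_EXT` are hypothetical, and the mapping of xls/xlsx/xlsm to xlrd is an assumption based on the comment that xlrd was historically the only reader engine.

```python
from pathlib import Path

# Hypothetical extension-to-engine table, for illustration only.
# Assumption: xlrd handles xls/xlsx/xlsm (as described in the thread),
# and pyxlsb is the engine requested for xlsb files.
_READER_ENGINE_BY_EXT = {
    ".xls": "xlrd",
    ".xlsx": "xlrd",
    ".xlsm": "xlrd",
    ".xlsb": "pyxlsb",
}


def infer_reader_engine(path):
    """Return an engine name based on the file extension of ``path``.

    Raises ValueError when the extension is not in the table, so the
    caller can fall back to passing ``engine=`` explicitly.
    """
    ext = Path(str(path)).suffix.lower()  # case-insensitive match
    try:
        return _READER_ENGINE_BY_EXT[ext]
    except KeyError:
        raise ValueError(
            f"No default reader engine for extension {ext!r}"
        ) from None
```

Until something like this exists inside the reader, a caller could use it as a workaround, e.g. `pd.read_excel(path, engine=infer_reader_engine(path))`, since `read_excel` already accepts an `engine` argument.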
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.
Note: Please read this guide detailing how to provide the necessary information for us to reproduce your bug.
Code Sample, a copy-pastable example
This works:
This doesn't:
Problem description
All supported Excel formats (xls, xlsx, xlsm) have a default engine based on the extension, so you can simply do:
For xlsb, when you do df = pd.read_excel('Book2.xlsb'), you get this error:
Expected Output
No error, i.e. it should figure out that for xlsb extensions, pyxlsb is the default engine.
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
pandas : 1.0.5
numpy : 1.18.1
pytz : 2020.1
dateutil : 2.8.1
pip : 20.0.2
setuptools : 49.2.0.post20200714
Cython : None
pytest : 5.4.3
hypothesis : None
sphinx : 2.4.0
blosc : None
feather : None
xlsxwriter : 1.2.7
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.16.1
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.1.3
numexpr : None
odfpy : None
openpyxl : 3.0.4
pandas_gbq : None
pyarrow : 0.17.1
pytables : None
pytest : 5.4.3
pyxlsb : 1.0.6
s3fs : None
scipy : 1.4.1
sqlalchemy : None
tables : None
tabulate : 0.8.7
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.7
numba : None