Excel Document Passing Kwargs to Engine #25723

Closed
Hugovdberg opened this issue Mar 14, 2019 · 11 comments · Fixed by #26465

@Hugovdberg

Code Sample, a copy-pastable example if possible

import pandas

xls = pandas.ExcelFile(path, on_demand=True)
sheets = xls.sheet_names

Problem description

This code used to work, since extra keyword arguments were passed to xlrd.open_workbook. As pd.ExcelFile is now decoupled from xlrd, this behaviour was removed. It now raises an error:

TypeError: __init__() got an unexpected keyword argument 'on_demand'

In a way this is related to #25523, but this is a more general issue, and fixing it might or might not solve the problem in the other issue.

Expected Output

No error, and the keyword arguments respected. I understand that when other engines are implemented the on_demand argument might not make sense, but they might take similar arguments not explicitly made available in the interface of pandas.ExcelFile.

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.7.2.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 42 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.24.1
pytest: 4.2.0
pip: 19.0.1
setuptools: 40.7.3
Cython: 0.29.5
numpy: 1.15.4
scipy: 1.2.0
pyarrow: None
xarray: None
IPython: 7.2.0
sphinx: 1.8.4
patsy: 0.5.1
dateutil: 2.7.5
pytz: 2018.9
blosc: None
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.9
feather: None
matplotlib: 3.0.2
openpyxl: 2.5.14
xlrd: 1.2.0
xlwt: 1.3.0
xlsxwriter: 1.1.2
lxml.etree: 4.3.0
bs4: 4.7.1
html5lib: 1.0.1
sqlalchemy: 1.2.17
pymysql: None
psycopg2: 2.7.6.1 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

@TomAugspurger
Contributor

cc @WillAyd do you recall when this changed?

@Hugovdberg does passing that to ExcelFile.parse work instead?

@Hugovdberg
Author

@TomAugspurger on_demand is used to open a connection to the file without loading all of the data in it; by default, xlrd loads everything when the file is opened. The example I posted was from this question on Stack Overflow: https://stackoverflow.com/q/12250024/4398595.

This appears to have changed in version 0.24, when ExcelFile was decoupled from the xlrd implementation.

@TomAugspurger
Contributor

Thanks for the clarification. So using .parse won't work because pandas has already opened the file?

In that case, is opening the file yourself with on_demand=True and passing that to pandas.ExcelFile an option?

It looks somewhat difficult to pass kwargs through to the opening of the file.

@Hugovdberg
Author

I would think that the current implementation could pass the kwargs straight to the reader implementation, which could then handle the arguments as necessary. In this case I think it would suffice to add **kwds on lines 380, 422, 424, 642, and 653. Future readers should then also accept a **kwds argument, and handle those as necessary (possibly throwing an error that the engine doesn't accept additional arguments).
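
The forwarding proposed above could look roughly like the following. This is a minimal sketch, not pandas' actual code: `FakeExcelFile`, `FakeXlrdReader`, and `StrictReader` are hypothetical stand-ins, and no real file is opened. It only illustrates a wrapper passing `**kwds` through to the selected engine, with one engine that rejects options it does not understand.

```python
class FakeXlrdReader:
    """Hypothetical stand-in for an engine-specific reader."""

    def __init__(self, path, **kwds):
        # A real reader would hand these options to its library,
        # e.g. xlrd.open_workbook(path, **kwds).
        self.path = path
        self.open_options = dict(kwds)


class StrictReader:
    """Hypothetical engine that accepts no extra options."""

    def __init__(self, path, **kwds):
        if kwds:
            unexpected = ", ".join(sorted(kwds))
            raise TypeError(
                f"engine 'strict' got unexpected arguments: {unexpected}"
            )
        self.path = path
        self.open_options = {}


class FakeExcelFile:
    """Hypothetical wrapper that forwards **kwds to the chosen engine."""

    _engines = {"xlrd": FakeXlrdReader, "strict": StrictReader}

    def __init__(self, path, engine="xlrd", **kwds):
        if engine not in self._engines:
            raise ValueError(f"Unknown engine: {engine}")
        self.engine = engine
        # The wrapper does not interpret the options; the reader decides.
        self._reader = self._engines[engine](path, **kwds)


xls = FakeExcelFile("book.xls", on_demand=True)
print(xls._reader.open_options)  # {'on_demand': True}
```

With this shape, each engine is responsible for validating its own options, so an unsupported keyword fails loudly instead of being dropped.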

@WillAyd
Member

WillAyd commented Mar 14, 2019

Right, so this was intentionally changed in #24423. Previously the keywords were silently ignored, so passing on_demand actually had no effect. The error is now intentional and makes this explicit.

As mentioned can you try something like:

import pandas as pd
import xlrd

book = xlrd.open_workbook('your_book.xlsx', on_demand=True)
with pd.ExcelFile(book) as the_file:
    ...

That should actually do something with the keyword.

@TomAugspurger
Contributor

Thanks for verifying @WillAyd. I agree that if you need control over how the file is opened, then opening it yourself and supplying the opened object to ExcelFile is the best option.

@Hugovdberg would you be interested in submitting a PR explaining that at http://pandas-docs.github.io/pandas-docs-travis/user_guide/io.html?highlight=excelfile?

TomAugspurger added the Docs and IO Excel (read_excel, to_excel) labels on Mar 14, 2019
WillAyd changed the title from "Restore ExcelFile **kwds" to "Excel Document Passing Kwargs to Engine" on Mar 19, 2019

@codehard123

Can I work on this?

@codehard123

In the master branch there is no excel.py, so where should I make the changes?

@codehard123

Everything's normal! Before pandas 0.24 you could pass any number of keyword arguments to ExcelFile() without an error, because its signature was (io, **kwds), which accepts arbitrary keywords. With the new release the signature changed to (io, engine=None), which accepts only those two arguments, and that explains the error.
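
The difference between the two signatures can be reproduced without pandas. In this sketch, `OldStyleFile` and `NewStyleFile` are hypothetical classes that only mimic the signatures described above, `(io, **kwds)` and `(io, engine=None)`; they are not the pandas implementation.

```python
class OldStyleFile:
    """Mimics the pre-0.24 signature described in this thread."""

    def __init__(self, io, **kwds):
        # Extra keywords are accepted here but never used,
        # so a typo or unsupported option goes unnoticed.
        self.io = io


class NewStyleFile:
    """Mimics the 0.24 signature described in this thread."""

    def __init__(self, io, engine=None):
        self.io = io
        self.engine = engine


# No error, but on_demand has no effect.
old = OldStyleFile("book.xls", on_demand=True)

message = None
try:
    NewStyleFile("book.xls", on_demand=True)
except TypeError as exc:
    # Python itself rejects the unknown keyword.
    message = str(exc)

print(message)
```

The explicit signature turns a silently ignored option into an immediate TypeError, which is the behaviour reported at the top of this issue.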

@WillAyd
Member

WillAyd commented Apr 9, 2019

@codehard123 the purpose of this issue is to add documentation around how to pass keyword arguments to a particular engine. If interested you could use the information above and add to the section linked by @TomAugspurger
