xlrd no longer supports xlsx, unhelpful error #2823
Comments
This is a recent change (December 2020) and perhaps implies that some other library should be used for loading .xlsx files in future versions of datatable. xlrd's documentation provides a link to the following for suggestions: http://www.python-excel.org/
I feel openpyxl should be used instead, as it is quite well maintained.
That's not as clear-cut as it seems. There has been a similar switch in pandas from xlrd to openpyxl, and some users reported a 10x slowdown with the openpyxl engine compared to xlrd. I have a very limited set of Excel files for testing, the largest only 2 MB, but even that goes from 4.7s to 5.9s in pandas when switching to openpyxl. I presume there could be cases of bigger files (perhaps irregularly shaped) where openpyxl gets much slower; I'd love to see those examples.
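For anyone who wants to contribute timings, a comparison along these lines could be sketched as follows. This is only a rough harness, not the benchmark used for the numbers above: `time_call` and `compare_excel_engines` are hypothetical helper names, and it assumes pandas is installed along with whichever engines you want to compare.

```python
import time


def time_call(func, *args, repeats=3, **kwargs):
    """Return the best wall-clock time in seconds over `repeats` calls."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return best


def compare_excel_engines(path, engines=("xlrd", "openpyxl")):
    """Time pandas.read_excel on `path` once per engine (hypothetical helper).

    An engine that cannot read the file (for example xlrd >= 2.0 on .xlsx,
    or an engine that is not installed) is recorded with the exception it
    raised instead of a timing.
    """
    import pandas as pd  # imported lazily so time_call works without pandas

    results = {}
    for engine in engines:
        try:
            results[engine] = time_call(pd.read_excel, path, engine=engine)
        except Exception as exc:
            results[engine] = exc
    return results
```

Calling `compare_excel_engines("some_file.xlsx")` returns a dict keyed by engine name. Taking the best of several runs reduces noise from disk caching, though it says nothing about memory use.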
For future reference, this seems to be a decent outline of XLSX's format: https://github.com/TeamworkGuy2/xlsx-spec-models/blob/master/open-xml.d.ts
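As a small illustration of the container side of the format: an .xlsx file is a ZIP archive of XML parts, so the standard library is enough to peek inside one. The sketch below assumes worksheets live under `xl/worksheets/`, which is the conventional location but is formally determined by the workbook's relationship parts; `list_worksheet_parts` is a hypothetical helper name.

```python
import zipfile


def list_worksheet_parts(xlsx_file):
    """Return the names of worksheet XML parts inside an .xlsx container.

    `xlsx_file` may be a path or a binary file-like object. Only the
    conventional `xl/worksheets/` location is checked (an assumption).
    """
    with zipfile.ZipFile(xlsx_file) as archive:
        return sorted(
            name
            for name in archive.namelist()
            if name.startswith("xl/worksheets/") and name.endswith(".xml")
        )
```

Parsing the cell data itself additionally requires resolving shared strings and styles, which is where a dedicated library earns its keep.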
Without pinning an old version of xlrd, fread on an xlsx file will report "AttributeError: module 'xlrd' has no attribute 'xlsx'"
What was the expected behavior?
An error specifying that xlrd must be version 1.2.0 to read xlsx files (like the error if xlrd is not installed).
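A check along these lines could produce such an error. This is a minimal sketch, not datatable's actual code: the function names and the message wording are made up for illustration, and it assumes Python 3.8+ for `importlib.metadata`.

```python
from importlib import metadata


def xlrd_supports_xlsx(version):
    """Return True if this xlrd version string predates 2.0, which dropped .xlsx."""
    major = int(version.split(".")[0])
    return major < 2


def check_xlrd_for_xlsx():
    """Raise a descriptive ImportError if the installed xlrd cannot read .xlsx."""
    try:
        version = metadata.version("xlrd")
    except metadata.PackageNotFoundError:
        # Hypothetical message, mirroring the "not installed" case
        raise ImportError("xlrd is required to read Excel files") from None
    if not xlrd_supports_xlsx(version):
        # Hypothetical message wording
        raise ImportError(
            f"xlrd {version} no longer supports .xlsx files; "
            "install xlrd==1.2.0 to read them"
        )
    return version
```

Running the check before touching `xlrd.xlsx` would replace the AttributeError with a message that names the fix.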
Environment:
Python 3.8.3, xlrd 2.0.0, datatable 0.11.1
Suggested workaround: install xlrd==1.2.0