xlrd no longer supports xlsx, unhelpful error #2823

Closed
kajuberdut opened this issue Dec 31, 2020 · 4 comments · Fixed by #2851

@kajuberdut

kajuberdut commented Dec 31, 2020

Unless an older version of xlrd is pinned, calling fread on an .xlsx file fails with "AttributeError: module 'xlrd' has no attribute 'xlsx'".

  • How to reproduce the bug?
```python
import datatable as dt

path = "some xlsx path"
frame = dt.fread(path)  # with xlrd 2.0.0 installed, this raises the AttributeError above
```

  • What was the expected behavior?
    An error stating that xlrd 1.2.0 is required to read .xlsx files (similar to the error raised when xlrd is not installed).

  • Environment:
    Python 3.8.3, xlrd 2.0.0, datatable 0.11.1

Suggested workaround: pin the older release with pip install xlrd==1.2.0.
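
The expected behavior described above could be implemented as an explicit version check before handing an .xlsx file to xlrd. This is a minimal sketch, assuming Python 3.8+ (for importlib.metadata); require_xlsx_capable_xlrd is a hypothetical helper, not part of datatable's API, and the actual fix in #2851 may take a different approach.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def require_xlsx_capable_xlrd(installed: Optional[str] = None) -> str:
    """Return the xlrd version if it can read .xlsx files, else raise ImportError.

    `installed` overrides the detected version string (hypothetical hook for testing).
    """
    if installed is None:
        try:
            installed = version("xlrd")  # version of the installed distribution
        except PackageNotFoundError:
            raise ImportError(
                "Reading .xlsx files requires xlrd; install it with "
                "`pip install xlrd==1.2.0`"
            ) from None
    major = int(installed.split(".")[0])
    if major >= 2:
        # xlrd 2.0 removed support for every format except .xls
        raise ImportError(
            f"xlrd {installed} can no longer read .xlsx files; "
            "install xlrd==1.2.0 or use another .xlsx reader"
        )
    return installed
```

Calling a check like this at the start of the .xlsx code path would replace the confusing AttributeError with an actionable message.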

@kajuberdut
Author

This is a recent change (xlrd 2.0.0, released in December 2020, removed support for every format except .xls), which suggests that future versions of datatable may need a different library for loading .xlsx files.
https://xlrd.readthedocs.io/en/latest/changes.html

xlrd's documentation points to http://www.python-excel.org/ for alternatives.

@samukweku
Contributor

I think openpyxl should be used instead, as it is well maintained.
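
A minimal sketch of what an openpyxl-based loader could look like, assuming the first worksheet holds a single table whose first row contains the column names. read_xlsx_columns is a hypothetical helper; load_workbook(..., read_only=True, data_only=True) and iter_rows(values_only=True) are openpyxl's streaming-read options, where data_only returns the cached result of each formula rather than the formula text.

```python
from typing import Any, Dict, List

from openpyxl import load_workbook


def read_xlsx_columns(source) -> Dict[str, List[Any]]:
    """Read the first worksheet of `source` (path or file object) into {name: values}.

    Assumes row 1 holds the column names; duplicate names are not handled.
    """
    wb = load_workbook(source, read_only=True, data_only=True)
    try:
        rows = wb.worksheets[0].iter_rows(values_only=True)
        header = next(rows, None)
        if header is None:  # empty sheet
            return {}
        # Give unnamed header cells a positional placeholder name.
        names = [str(h) if h is not None else f"C{i}" for i, h in enumerate(header)]
        columns: Dict[str, List[Any]] = {name: [] for name in names}
        for row in rows:
            # Pad short rows so every column receives exactly one value per row.
            padded = tuple(row) + (None,) * (len(names) - len(row))
            for name, value in zip(names, padded):
                columns[name].append(value)
        return columns
    finally:
        wb.close()  # read-only workbooks keep the file open until closed
```

The resulting dict could then be passed directly to dt.Frame(...), which accepts a mapping of column names to lists.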

@st-pasha
Contributor

st-pasha commented Jan 6, 2021

That's not as clear-cut an issue as it seems. There has been a similar switch from xlrd to openpyxl in pandas, and a user reported a 10x slowdown with the openpyxl engine compared to xlrd.

I have a very limited set of Excel files for testing, the largest only 2 MB, but even that one goes from 4.7 s to 5.9 s in pandas when switching to openpyxl. I presume there could be cases with bigger (perhaps irregularly shaped) files where openpyxl gets much slower; I'd love to see those examples.
But frankly, even xlrd's speed leaves much to be desired: 5 s for 2 MB, really?
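
To make comparisons like the 4.7 s vs 5.9 s above reproducible across machines and files, a small timing harness helps. This is a minimal standard-library sketch; benchmark and slowdown are hypothetical names. It keeps the best of several runs, following the timeit documentation's advice that the minimum is the most meaningful figure, and reports the ratio between two readers (5.9 / 4.7 is about 1.26, i.e. openpyxl roughly 26% slower on that 2 MB file).

```python
import time
from typing import Callable, Dict


def benchmark(readers: Dict[str, Callable[[str], object]], path: str,
              repeat: int = 3) -> Dict[str, float]:
    """Return the best wall-clock time in seconds for each reader on `path`."""
    results: Dict[str, float] = {}
    for name, read in readers.items():
        timings = []
        for _ in range(repeat):
            start = time.perf_counter()
            read(path)  # the parsed result is discarded; only elapsed time matters
            timings.append(time.perf_counter() - start)
        # The minimum is the least noisy estimate of the reader's cost.
        results[name] = min(timings)
    return results


def slowdown(results: Dict[str, float], baseline: str, candidate: str) -> float:
    """How many times slower `candidate` is than `baseline` (> 1 means slower)."""
    return results[candidate] / results[baseline]
```

For pandas, the readers could be lambda p: pd.read_excel(p, engine="openpyxl") and the same call with engine="xlrd"; the latter only opens .xlsx files with xlrd 1.2.0 installed.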

@st-pasha
Contributor

st-pasha commented Jan 7, 2021

For future reference, this seems to be a decent outline of the XLSX format: https://github.com/TeamworkGuy2/xlsx-spec-models/blob/master/open-xml.d.ts
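
The linked outline describes the Office Open XML (ECMA-376) layout: an .xlsx file is a ZIP archive of XML parts, each worksheet lists <c> cells inside <row> elements, and text is usually stored once in xl/sharedStrings.xml and referenced by index. As a rough illustration of how small a bare-bones reader can be, here is a standard-library-only sketch. It assumes the first sheet lives at xl/worksheets/sheet1.xml (a real reader should resolve it through xl/workbook.xml and xl/_rels/workbook.xml.rels), ignores styles (so dates come back as serial numbers), and skips empty rows; read_first_sheet is a hypothetical name.

```python
import re
import xml.etree.ElementTree as ET
import zipfile
from typing import List

NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}
T = "{" + NS["m"] + "}t"  # fully qualified <t> (text) tag


def col_index(ref: str) -> int:
    """Convert a cell reference such as 'C7' to a zero-based column index (C -> 2)."""
    idx = 0
    for ch in re.match(r"[A-Z]+", ref).group(0):
        idx = idx * 26 + (ord(ch) - ord("A") + 1)
    return idx - 1


def read_shared_strings(zf: zipfile.ZipFile) -> List[str]:
    if "xl/sharedStrings.xml" not in zf.namelist():
        return []
    root = ET.fromstring(zf.read("xl/sharedStrings.xml"))
    # Each <si> is plain text (<t>) or rich text (<r><t>); phonetic runs are skipped.
    return ["".join(t.text or "" for t in si.findall("m:t", NS) + si.findall("m:r/m:t", NS))
            for si in root.findall("m:si", NS)]


def read_first_sheet(source, sheet: str = "xl/worksheets/sheet1.xml") -> List[list]:
    """Return the non-empty rows of `sheet` as lists of Python values."""
    with zipfile.ZipFile(source) as zf:
        strings = read_shared_strings(zf)
        root = ET.fromstring(zf.read(sheet))
    rows = []
    for row in root.iterfind("m:sheetData/m:row", NS):
        values: list = []
        for c in row.iterfind("m:c", NS):
            kind = c.get("t", "n")  # cell type; "n" (number) is the default
            v = c.find("m:v", NS)
            if kind == "inlineStr":
                val = "".join(t.text or "" for t in c.iter(T))
            elif v is None:
                val = None
            elif kind == "s":
                val = strings[int(v.text)]  # index into the shared-string table
            elif kind == "b":
                val = v.text == "1"
            elif kind == "n":
                num = float(v.text)
                val = int(num) if num.is_integer() else num
            else:  # "str" (formula string) or "e" (error): keep the raw text
                val = v.text
            ref = c.get("r")
            i = col_index(ref) if ref else len(values)
            values.extend([None] * (i - len(values)))  # fill skipped (empty) columns
            values.append(val)
        rows.append(values)
    return rows
```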

3 participants