DOC: update wording about when xlrd engine can be used #38456
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -105,25 +105,24 @@ | |
Supported engines: "xlrd", "openpyxl", "odf", "pyxlsb". | ||
Engine compatibility : | ||
|
||
- "xlrd" supports most old/new Excel file formats. | ||
- "xlrd" supports old-style Excel files (.xls). | ||
- "openpyxl" supports newer Excel file formats. | ||
- "odf" supports OpenDocument file formats (.odf, .ods, .odt). | ||
- "pyxlsb" supports Binary Excel files. | ||
|
||
.. versionchanged:: 1.2.0 | ||
The engine `xlrd <https://xlrd.readthedocs.io/en/latest/>`_ | ||
is no longer maintained, and is not supported with | ||
python >= 3.9. When ``engine=None``, the following logic will be | ||
used to determine the engine. | ||
now only supports old-style ``.xls`` files. | ||
When ``engine=None``, the following logic will be | ||
used to determine the engine: | ||
|
||
- If ``path_or_buffer`` is an OpenDocument format (.odf, .ods, .odt), | ||
then `odf <https://pypi.org/project/odfpy/>`_ will be used. | ||
- Otherwise if ``path_or_buffer`` is a bytes stream, the file has the | ||
- Otherwise if the file has the | ||
extension ``.xls``, or is an ``xlrd`` Book instance, then ``xlrd`` will | ||
be used. | ||
- Otherwise if `openpyxl <https://pypi.org/project/openpyxl/>`_ is installed, | ||
then ``openpyxl`` will be used. | ||
- Otherwise ``xlrd`` will be used and a ``FutureWarning`` will be raised. | ||
I have not made the code changes related to this line or the change on line 121 (these changes are echoed in the section around line ~900), but that's what I think the approach should be.

Well, so that will be the approach in the future. But in pandas we decided to first raise a warning about it (in case the user only upgraded pandas, and not xlrd).

While I respect the choice of approach, I can't really condone it. As I said above: I have concerns around security with xlrd<2, and so anything that can be done to stop people using it would be a very good thing. If a user is upgrading such that they have pandas 1.2, that would be a very good time for them to also upgrade to xlrd 2 and whatever the latest openpyxl release is...

Just so I understand the intentions here, are you planning to make those code changes (taking into account @jorisvandenbossche's request)?

It's an open question, see #38424 (comment)
||
|
||
Specifying ``engine="xlrd"`` will continue to be allowed for the | ||
indefinite future. | ||
|
@@ -920,7 +919,7 @@ class ExcelFile: | |
""" | ||
Class for parsing tabular excel sheets into DataFrame objects. | ||
|
||
Uses xlrd engine by default. See read_excel for more documentation | ||
See read_excel for more documentation | ||
|
||
Parameters | ||
---------- | ||
|
@@ -933,26 +932,25 @@ class ExcelFile: | |
Supported engines: ``xlrd``, ``openpyxl``, ``odf``, ``pyxlsb`` | ||
Engine compatibility : | ||
|
||
- ``xlrd`` supports most old/new Excel file formats. | ||
- ``xlrd`` supports old-style Excel files (.xls). | ||
- ``openpyxl`` supports newer Excel file formats. | ||
- ``odf`` supports OpenDocument file formats (.odf, .ods, .odt). | ||
- ``pyxlsb`` supports Binary Excel files. | ||
|
||
.. versionchanged:: 1.2.0 | ||
|
||
The engine `xlrd <https://xlrd.readthedocs.io/en/latest/>`_ | ||
is no longer maintained, and is not supported with | ||
python >= 3.9. When ``engine=None``, the following logic will be | ||
used to determine the engine. | ||
The engine `xlrd <https://xlrd.readthedocs.io/en/latest/>`_ | ||
now only supports old-style ``.xls`` files. | ||
When ``engine=None``, the following logic will be | ||
used to determine the engine: | ||
|
||
- If ``path_or_buffer`` is an OpenDocument format (.odf, .ods, .odt), | ||
then `odf <https://pypi.org/project/odfpy/>`_ will be used. | ||
- Otherwise if ``path_or_buffer`` is a bytes stream, the file has the | ||
- Otherwise if the file has the | ||
extension ``.xls``, or is an ``xlrd`` Book instance, then ``xlrd`` | ||
will be used. | ||
- Otherwise if `openpyxl <https://pypi.org/project/openpyxl/>`_ is installed, | ||
then ``openpyxl`` will be used. | ||
- Otherwise ``xlrd`` will be used and a ``FutureWarning`` will be raised. | ||
|
||
Specifying ``engine="xlrd"`` will continue to be allowed for the | ||
indefinite future. | ||
|
That's only the case with the latest xlrd, though, I think?
(while many users will still have xlrd < 2.0)
Okay, but if I could, I would totally enforce the use of only xlrd >= 2.0 now that it is out.
To be explicit: that comes from concerns I have, particularly around security, about code that hasn't been maintained for years being used to interact with formats (zip and xml, both of which are used in xlsx) that have known security issues.
@cjw296 yes, but we can't force anyone to update (and in all likelihood they will simply update pandas and continue using the current version of xlrd).
@jreback - I mean, in terms of implementation, you can: raise an exception if `xlrd.__version__ < '2'`. Given that people will only hit this when they're already upgrading at least one package, i.e. pandas, this feels reasonable to me, but pandas is not my project, so while I'd be disappointed at the security risk you're prepared to expose your users to, I'd have to accept it.
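A minimal sketch of the version check suggested in this comment, under stated assumptions: the helper names `parse_major` and `check_xlrd_version` are hypothetical, the choice of `ImportError` is illustrative, and the version string is assumed to begin with a plain integer component. It compares the major version numerically, because a plain string comparison such as `'10.0' < '2'` is true lexicographically, and it shows both behaviours under discussion (raise immediately versus warn first).

```python
import warnings


def parse_major(version: str) -> int:
    """Return the leading integer component of a version string.

    Assumes a plain numeric major component, e.g. "1.2.0" or "2.0.1".
    """
    return int(version.split(".")[0])


def check_xlrd_version(version: str, strict: bool = False) -> bool:
    """Return True if ``version`` is at least 2.

    For older versions, raise ImportError when ``strict`` is True,
    otherwise emit a FutureWarning and return False.
    """
    if parse_major(version) >= 2:
        return True
    msg = f"xlrd {version} is older than 2.0; consider upgrading."
    if strict:
        # Hypothetical "enforce" behaviour: refuse to continue.
        raise ImportError(msg)
    # Hypothetical "warn first" behaviour: keep working, but flag it.
    warnings.warn(msg, FutureWarning, stacklevel=2)
    return False
```

In real code the version string would come from `xlrd.__version__`; it is passed in as an argument here so the sketch runs without xlrd installed.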
Believe me, users are way more exposed to security risks than this. It's hard to immediately just do something which causes current code to break. Yes, I get your point, but the reverse is true too: people update pandas and not other packages and expect things to work. We generally like to warn first if at all possible.
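The resolution order described in the docstring for ``engine=None``, combined with the warn-first policy described here, can be sketched roughly as follows. This is not pandas' implementation: `resolve_engine` and its parameters are hypothetical, and the sketch inspects only a file extension, whereas the docstring refers to the format of ``path_or_buffer``.

```python
import warnings
from typing import Optional

# Extensions the docstring lists for the OpenDocument engine.
ODF_EXTENSIONS = {".odf", ".ods", ".odt"}


def resolve_engine(
    ext: Optional[str],
    openpyxl_installed: bool,
    is_xlrd_book: bool = False,
) -> str:
    """Pick an engine name following the documented fallback order."""
    if ext in ODF_EXTENSIONS:
        return "odf"
    if ext == ".xls" or is_xlrd_book:
        return "xlrd"
    if openpyxl_installed:
        return "openpyxl"
    # Last resort: fall back to xlrd, but warn rather than fail.
    warnings.warn(
        "Falling back to xlrd for a non-.xls file; install openpyxl instead.",
        FutureWarning,
        stacklevel=2,
    )
    return "xlrd"
```

For example, `resolve_engine(".xlsx", openpyxl_installed=True)` selects ``openpyxl``, while the same call with `openpyxl_installed=False` falls back to ``xlrd`` and emits a ``FutureWarning``.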