Backport PR #38456: DOC: update wording about when xlrd engine can be used (#38660)

Co-authored-by: Chris Withers <chris@simplistix.co.uk>
meeseeksmachine and cjw296 authored Dec 23, 2020
1 parent 2a4c3c6 commit 9c1efbb
Showing 3 changed files with 56 additions and 28 deletions.
31 changes: 28 additions & 3 deletions doc/source/user_guide/io.rst
@@ -2820,15 +2820,40 @@ parse HTML tables in the top-level pandas io function ``read_html``.
Excel files
-----------

The :func:`~pandas.read_excel` method can read Excel 2003 (``.xls``)
files using the ``xlrd`` Python module. Excel 2007+ (``.xlsx``) files
can be read using either ``xlrd`` or ``openpyxl``. Binary Excel (``.xlsb``)
The :func:`~pandas.read_excel` method can read Excel 2007+ (``.xlsx``) files
using the ``openpyxl`` Python module. Excel 2003 (``.xls``) files
can be read using ``xlrd``. Binary Excel (``.xlsb``)
files can be read using ``pyxlsb``.
The :meth:`~DataFrame.to_excel` instance method is used for
saving a ``DataFrame`` to Excel. Generally the semantics are
similar to working with :ref:`csv<io.read_csv_table>` data.
See the :ref:`cookbook<cookbook.excel>` for some advanced strategies.
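
As a rough illustration of the format-to-engine pairing described above, the sketch below maps a file suffix to the engine name that would be passed as ``engine=`` to :func:`~pandas.read_excel`. The helper ``engine_for`` and the ``ENGINE_BY_SUFFIX`` table are hypothetical names introduced for this example and are not part of pandas.

```python
from pathlib import Path

# Pairing of file suffix and engine name, as listed in the paragraph above.
# ENGINE_BY_SUFFIX and engine_for are hypothetical names, not pandas API.
ENGINE_BY_SUFFIX = {
    ".xlsx": "openpyxl",  # Excel 2007+
    ".xls": "xlrd",       # Excel 2003
    ".xlsb": "pyxlsb",    # Binary Excel
}


def engine_for(path):
    """Return the engine name for ``path`` based on its file suffix."""
    suffix = Path(path).suffix.lower()
    try:
        return ENGINE_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"no engine known for suffix {suffix!r}") from None


print(engine_for("report.xlsx"))
print(engine_for("legacy.XLS"))
```

The result could then be passed explicitly, for example ``pd.read_excel(path, engine=engine_for(path))``, assuming the corresponding package is installed.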

.. warning::

The `xlwt <https://xlwt.readthedocs.io/en/latest/>`__ package for writing old-style ``.xls``
excel files is no longer maintained.
The `xlrd <https://xlrd.readthedocs.io/en/latest/>`__ package is now only for reading
old-style ``.xls`` files.

Previously, the default argument ``engine=None`` to :func:`~pandas.read_excel`
would result in using the ``xlrd`` engine in many cases, including new
Excel 2007+ (``.xlsx``) files.
If `openpyxl <https://openpyxl.readthedocs.io/en/stable/>`__ is installed,
many of these cases will now default to using the ``openpyxl`` engine.
See the :func:`read_excel` documentation for more details.

Thus, it is strongly encouraged to install ``openpyxl`` to read Excel 2007+
(``.xlsx``) files.
**Please do not report issues when using ``xlrd`` to read ``.xlsx`` files.**
This is no longer supported; switch to using ``openpyxl`` instead.

Attempting to use the ``xlwt`` engine will raise a ``FutureWarning``
unless the option :attr:`io.excel.xls.writer` is set to ``"xlwt"``.
While this option is now deprecated and will also raise a ``FutureWarning``,
it can be globally set and the warning suppressed. Users are recommended to
write ``.xlsx`` files using the ``openpyxl`` engine instead.
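
The default-engine behaviour described in the warning above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the pandas implementation: the function name ``default_engine``, the ``has_openpyxl`` parameter, the warning message, and the order of the checks are all hypothetical, and only the ``.xls`` and non-``.xls`` cases are modelled.

```python
import importlib.util
import warnings


def default_engine(suffix, has_openpyxl=None):
    """Pick an engine name for ``engine=None`` (hypothetical, simplified)."""
    if has_openpyxl is None:
        # Detect whether openpyxl is importable without importing it.
        has_openpyxl = importlib.util.find_spec("openpyxl") is not None

    # Old-style .xls files are read with xlrd.
    if suffix.lower() == ".xls":
        return "xlrd"

    # With openpyxl installed, it becomes the default for other cases.
    if has_openpyxl:
        return "openpyxl"

    # Otherwise fall back to xlrd and warn (message text is illustrative).
    warnings.warn(
        "openpyxl is not installed; falling back to xlrd",
        FutureWarning,
        stacklevel=2,
    )
    return "xlrd"


print(default_engine(".xlsx", has_openpyxl=True))
```

Passing ``has_openpyxl`` explicitly is only a convenience for trying both branches; leaving it as ``None`` checks the current environment instead.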

.. _io.excel_reader:

Reading Excel files
29 changes: 15 additions & 14 deletions doc/source/whatsnew/v1.2.0.rst
@@ -10,21 +10,22 @@ including other versions of pandas.

.. warning::

The packages `xlrd <https://xlrd.readthedocs.io/en/latest/>`_ for reading excel
files and `xlwt <https://xlwt.readthedocs.io/en/latest/>`_ for
writing excel files are no longer maintained. These are the only engines in pandas
that support the xls format.

Previously, the default argument ``engine=None`` to ``pd.read_excel``
would result in using the ``xlrd`` engine in many cases. If
`openpyxl <https://openpyxl.readthedocs.io/en/stable/>`_ is installed,
The `xlwt <https://xlwt.readthedocs.io/en/latest/>`_ package for writing old-style ``.xls``
excel files is no longer maintained.
The `xlrd <https://xlrd.readthedocs.io/en/latest/>`_ package is now only for reading
old-style ``.xls`` files.

Previously, the default argument ``engine=None`` to :func:`~pandas.read_excel`
would result in using the ``xlrd`` engine in many cases, including new
Excel 2007+ (``.xlsx``) files.
If `openpyxl <https://openpyxl.readthedocs.io/en/stable/>`_ is installed,
many of these cases will now default to using the ``openpyxl`` engine.
See the :func:`read_excel` documentation for more details. Attempting to read
``.xls`` files or specifying ``engine="xlrd"`` to ``pd.read_excel`` will not
raise a warning. However users should be aware that ``xlrd`` is already
broken with certain package configurations, for example with Python 3.9
when `defusedxml <https://github.com/tiran/defusedxml/>`_ is installed, and
is anticipated to be unusable in the future.
See the :func:`read_excel` documentation for more details.

Thus, it is strongly encouraged to install ``openpyxl`` to read Excel 2007+
(``.xlsx``) files.
**Please do not report issues when using ``xlrd`` to read ``.xlsx`` files.**
This is no longer supported; switch to using ``openpyxl`` instead.

Attempting to use the ``xlwt`` engine will raise a ``FutureWarning``
unless the option :attr:`io.excel.xls.writer` is set to ``"xlwt"``.
24 changes: 13 additions & 11 deletions pandas/io/excel/_base.py
@@ -105,16 +105,16 @@
Supported engines: "xlrd", "openpyxl", "odf", "pyxlsb".
Engine compatibility :
- "xlrd" supports most old/new Excel file formats.
- "xlrd" supports old-style Excel files (.xls).
- "openpyxl" supports newer Excel file formats.
- "odf" supports OpenDocument file formats (.odf, .ods, .odt).
- "pyxlsb" supports Binary Excel files.
.. versionchanged:: 1.2.0
The engine `xlrd <https://xlrd.readthedocs.io/en/latest/>`_
is no longer maintained, and is not supported with
python >= 3.9. When ``engine=None``, the following logic will be
used to determine the engine.
now only supports old-style ``.xls`` files.
When ``engine=None``, the following logic will be
used to determine the engine:
- If ``path_or_buffer`` is an OpenDocument format (.odf, .ods, .odt),
then `odf <https://pypi.org/project/odfpy/>`_ will be used.
@@ -920,7 +920,7 @@ class ExcelFile:
"""
Class for parsing tabular excel sheets into DataFrame objects.
Uses xlrd engine by default. See read_excel for more documentation
See read_excel for more documentation
Parameters
----------
@@ -933,17 +933,17 @@ class ExcelFile:
Supported engines: ``xlrd``, ``openpyxl``, ``odf``, ``pyxlsb``
Engine compatibility :
- ``xlrd`` supports most old/new Excel file formats.
- ``xlrd`` supports old-style Excel files (.xls).
- ``openpyxl`` supports newer Excel file formats.
- ``odf`` supports OpenDocument file formats (.odf, .ods, .odt).
- ``pyxlsb`` supports Binary Excel files.
.. versionchanged:: 1.2.0
The engine `xlrd <https://xlrd.readthedocs.io/en/latest/>`_
is no longer maintained, and is not supported with
python >= 3.9. When ``engine=None``, the following logic will be
used to determine the engine.
now only supports old-style ``.xls`` files.
When ``engine=None``, the following logic will be
used to determine the engine:
- If ``path_or_buffer`` is an OpenDocument format (.odf, .ods, .odt),
then `odf <https://pypi.org/project/odfpy/>`_ will be used.
@@ -954,8 +954,10 @@ class ExcelFile:
then ``openpyxl`` will be used.
- Otherwise ``xlrd`` will be used and a ``FutureWarning`` will be raised.
Specifying ``engine="xlrd"`` will continue to be allowed for the
indefinite future.
.. warning::
Please do not report issues when using ``xlrd`` to read ``.xlsx`` files.
This is not supported; switch to using ``openpyxl`` instead.
"""

from pandas.io.excel._odfreader import ODFReader