ENH Read multiple excel sheets in single API call #9450

Merged
merged 1 commit into from
Feb 23, 2015
118 changes: 86 additions & 32 deletions doc/source/io.rst
@@ -1949,56 +1949,106 @@ module and use the same parsing code as the above to convert tabular data into
a DataFrame. See the :ref:`cookbook<cookbook.excel>` for some
advanced strategies

Besides ``read_excel`` you can also read Excel files using the ``ExcelFile``
class. The following two commands are equivalent:
Reading Excel Files
~~~~~~~~~~~~~~~~~~~

.. versionadded:: 0.16

``read_excel`` can read more than one sheet, by setting ``sheetname`` to either
a list of sheet names, a list of sheet positions, or ``None`` to read all sheets.

.. versionadded:: 0.13

Sheets can be specified by sheet index or sheet name, using an integer or string,
respectively.

.. versionadded:: 0.12

``ExcelFile`` has been moved to the top level namespace.

There are two approaches to reading an Excel file: the ``read_excel`` function
and the ``ExcelFile`` class. ``read_excel`` is for reading one file
with file-specific arguments (i.e. identical data formats across sheets).
``ExcelFile`` is for reading one file with sheet-specific arguments (i.e. various data
formats across sheets). Choosing the approach is largely a question of
code readability and execution speed.

Equivalent class and function approaches to read a single sheet:

.. code-block:: python

# using the ExcelFile class
xls = pd.ExcelFile('path_to_file.xls')
xls.parse('Sheet1', index_col=None, na_values=['NA'])
data = xls.parse('Sheet1', index_col=None, na_values=['NA'])

# using the read_excel function
read_excel('path_to_file.xls', 'Sheet1', index_col=None, na_values=['NA'])
data = read_excel('path_to_file.xls', 'Sheet1', index_col=None, na_values=['NA'])

The class based approach can be used to read multiple sheets or to introspect
the sheet names using the ``sheet_names`` attribute.
Equivalent class and function approaches to read multiple sheets:

.. note::
.. code-block:: python

The prior method of accessing ``ExcelFile`` has been moved from
``pandas.io.parsers`` to the top level namespace starting from pandas
0.12.0.
data = {}
# For when Sheet1's format differs from Sheet2
xls = pd.ExcelFile('path_to_file.xls')
data['Sheet1'] = xls.parse('Sheet1', index_col=None, na_values=['NA'])
data['Sheet2'] = xls.parse('Sheet2', index_col=1)

# For when Sheet1's format is identical to Sheet2
data = read_excel('path_to_file.xls', ['Sheet1','Sheet2'], index_col=None, na_values=['NA'])

Specifying Sheets
+++++++++++++++++
.. _io.specifying_sheets:

.. versionadded:: 0.13
.. note :: The second argument is ``sheetname``, not to be confused with ``ExcelFile.sheet_names``

There are now two ways to read in sheets from an Excel file. You can provide
either the index of a sheet or its name by passing different values for
``sheetname``.
.. note :: An ExcelFile's attribute ``sheet_names`` provides access to a list of sheets.

- The argument ``sheetname`` allows specifying the sheet or sheets to read.
- The default value for ``sheetname`` is 0, indicating to read the first sheet.
- Pass a string to refer to the name of a particular sheet in the workbook.
- Pass an integer to refer to the index of a sheet. Indices follow Python
convention, beginning at 0.
- The default value is ``sheetname=0``. This reads the first sheet.

Using the sheet name:
- Pass a list of either strings or integers to return a dictionary of specified sheets.
- Pass ``None`` to return a dictionary of all available sheets.

.. code-block:: python

# Returns a DataFrame
read_excel('path_to_file.xls', 'Sheet1', index_col=None, na_values=['NA'])

Using the sheet index:

.. code-block:: python

read_excel('path_to_file.xls', 0, index_col=None, na_values=['NA'])
# Returns a DataFrame
read_excel('path_to_file.xls', 0, index_col=None, na_values=['NA'])

Using all default values:

.. code-block:: python

# Returns a DataFrame
read_excel('path_to_file.xls')

Using None to get all sheets:

.. code-block:: python

# Returns a dictionary of DataFrames
read_excel('path_to_file.xls',sheetname=None)

Using a list to get multiple sheets:

.. code-block:: python

# Returns the 1st and 4th sheet, as a dictionary of DataFrames.
read_excel('path_to_file.xls',sheetname=['Sheet1',3])
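
The rules above for how ``sheetname`` shapes the return value can be sketched in plain Python. This is only an illustration of the described behaviour, not the pandas implementation: ``resolve_sheets`` and the ``workbook`` list of (name, contents) pairs are hypothetical stand-ins for a real Excel file, and the assumption that a list returns a dictionary keyed by the values passed in is taken from the examples above.

```python
def resolve_sheets(sheetname, workbook):
    """Mimic how the ``sheetname`` value shapes the result.

    ``workbook`` is a hypothetical stand-in: an ordered list of
    (sheet name, sheet contents) pairs rather than a real Excel file.
    """
    names = [name for name, _ in workbook]

    def one(key):
        # Integers are zero-based positions; strings are sheet names.
        if isinstance(key, int):
            return workbook[key][1]
        return workbook[names.index(key)][1]

    if sheetname is None:
        # All sheets, keyed by sheet name.
        return {name: contents for name, contents in workbook}
    if isinstance(sheetname, list):
        # Requested sheets only, keyed by the values that were passed in.
        return {key: one(key) for key in sheetname}
    # A single string or integer yields a single object, not a dictionary.
    return one(sheetname)


workbook = [("Sheet1", "data 1"), ("Sheet2", "data 2"),
            ("Sheet3", "data 3"), ("Sheet4", "data 4")]

first = resolve_sheets(0, workbook)                # default: first sheet
picked = resolve_sheets(["Sheet1", 3], workbook)   # 1st and 4th sheet
everything = resolve_sheets(None, workbook)        # all sheets
```

Mixing a name and a position in one list, as in ``['Sheet1', 3]``, means the resulting dictionary has one string key and one integer key, so callers should look sheets up with the same values they requested.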

Parsing Specific Columns
++++++++++++++++++++++++

It is often the case that users will insert columns to do temporary computations
in Excel and you may not want to read in those columns. ``read_excel`` takes
a ``parse_cols`` keyword to allow you to specify a subset of columns to parse.
@@ -2017,26 +2067,30 @@ indices to be parsed.

read_excel('path_to_file.xls', 'Sheet1', parse_cols=[0, 2, 3])
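
As a rough picture of what a list of column indices selects, the following plain-Python sketch applies the same ``[0, 2, 3]`` selection to a small table. The ``rows`` data is made up for illustration and the code only mimics the effect on the result; it does not show how pandas parses the file.

```python
# Hypothetical sheet contents: column 1 holds scratch values we do not want.
rows = [
    ["id", "scratch", "price", "qty"],
    [1, "tmp", 9.5, 3],
    [2, "tmp", 4.0, 10],
]

# Zero-based column indices to keep, as in parse_cols=[0, 2, 3].
parse_cols = [0, 2, 3]

# Keep only the requested columns, in the order given.
selected = [[row[i] for i in parse_cols] for row in rows]
```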

.. note::
Cell Converters
+++++++++++++++

It is possible to transform the contents of Excel cells via the ``converters``
option. For instance, to convert a column to boolean:
It is possible to transform the contents of Excel cells via the ``converters``
option. For instance, to convert a column to boolean:

.. code-block:: python
.. code-block:: python

read_excel('path_to_file.xls', 'Sheet1', converters={'MyBools': bool})
read_excel('path_to_file.xls', 'Sheet1', converters={'MyBools': bool})

This option handles missing values and treats exceptions in the converters
as missing data. Transformations are applied cell by cell rather than to the
column as a whole, so the array dtype is not guaranteed. For instance, a
column of integers with missing values cannot be transformed to an array
with integer dtype, because NaN is strictly a float. You can manually mask
missing data to recover integer dtype:
This option handles missing values and treats exceptions in the converters
as missing data. Transformations are applied cell by cell rather than to the
column as a whole, so the array dtype is not guaranteed. For instance, a
column of integers with missing values cannot be transformed to an array
with integer dtype, because NaN is strictly a float. You can manually mask
missing data to recover integer dtype:

.. code-block:: python
.. code-block:: python

cfun = lambda x: int(x) if x else -1
read_excel('path_to_file.xls', 'Sheet1', converters={'MyInts': cfun})
cfun = lambda x: int(x) if x else -1
read_excel('path_to_file.xls', 'Sheet1', converters={'MyInts': cfun})
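
Because converters run cell by cell, their behaviour can be checked on plain values before passing them to ``read_excel``. The sketch below assumes blank cells arrive as empty strings and numeric cells as floats; ``to_int_or_default`` is a hypothetical alternative to ``cfun``, not part of pandas. It shows one caveat of the truthiness test in ``cfun``: a genuine zero is also mapped to the sentinel.

```python
import math


def to_int_or_default(cell, default=-1):
    """Convert one cell to int, mapping blank or NaN cells to ``default``."""
    if cell is None or cell == "":
        return default
    if isinstance(cell, float) and math.isnan(cell):
        return default
    return int(cell)


# The converter from the example above: relies on truthiness of the cell.
cfun = lambda x: int(x) if x else -1

# Hypothetical cell values: two numbers, a real zero, and a blank cell.
cells = [1.0, 0.0, "", 7.0]

with_cfun = [cfun(c) for c in cells]                   # zero becomes -1
with_explicit = [to_int_or_default(c) for c in cells]  # zero is preserved
```

Either function can be supplied through ``converters``; the explicit version is preferable when zero is a valid value in the column.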

Writing Excel Files
~~~~~~~~~~~~~~~~~~~

To write a DataFrame object to a sheet of an Excel file, you can use the
``to_excel`` instance method. The arguments are largely the same as ``to_csv``
8 changes: 8 additions & 0 deletions doc/source/whatsnew/v0.16.0.txt
@@ -190,6 +190,14 @@ Enhancements
- Added ``StringMethods.find()`` and ``rfind()`` which behave the same as the standard ``str`` methods (:issue:`9386`)

- Added ``StringMethods.isnumeric`` and ``isdecimal`` which behave the same as the standard ``str`` methods (:issue:`9439`)
- The ``read_excel()`` function's :ref:`sheetname <io.specifying_sheets>` argument now accepts a list and ``None``, to get multiple or all sheets respectively. If more than one sheet is specified, a dictionary is returned. (:issue:`9450`)

.. code-block:: python

# Returns the 1st and 4th sheet, as a dictionary of DataFrames.
pd.read_excel('path_to_file.xls',sheetname=['Sheet1',3])

- A ``verbose`` argument has been added to ``io.read_excel()``; it defaults to False. Set it to True to print sheet names as they are parsed. (:issue:`9450`)
- Added ``StringMethods.ljust()`` and ``rjust()`` which behave the same as the standard ``str`` methods (:issue:`9352`)
- ``StringMethods.pad()`` and ``center()`` now accept ``fillchar`` option to specify filling character (:issue:`9352`)
- Added ``StringMethods.zfill()`` which behaves the same as the standard ``str`` method (:issue:`9387`)