API: df.rolling(..).corr()/cov() when pairwise=True to return MI DataFrame

xref pandas-dev#15601
jreback committed Mar 13, 2017
1 parent 32df1e6 commit db9f2c0
Showing 4 changed files with 291 additions and 211 deletions.
20 changes: 15 additions & 5 deletions doc/source/computation.rst
@@ -505,13 +505,18 @@ two ``Series`` or any combination of ``DataFrame/Series`` or
- ``DataFrame/DataFrame``: by default compute the statistic for matching column
names, returning a DataFrame. If the keyword argument ``pairwise=True`` is
passed then computes the statistic for each pair of columns, returning a
``Panel`` whose ``items`` are the dates in question (see :ref:`the next section
``MultiIndexed DataFrame`` whose ``index`` contains the dates in question (see :ref:`the next section
<stats.moments.corr_pairwise>`).

For example:

.. ipython:: python
df = pd.DataFrame(np.random.randn(1000, 4),
index=pd.date_range('1/1/2000', periods=1000),
columns=['A', 'B', 'C', 'D'])
df = df.cumsum()
df2 = df[:20]
df2.rolling(window=5).corr(df2['B'])
@@ -520,11 +525,16 @@ For example:
Computing rolling pairwise covariances and correlations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. warning::

Prior to version 0.20.0, passing ``pairwise=True`` returned a ``Panel``.
It now returns a 2-level MultiIndexed DataFrame; see the whatsnew :ref:`here <whatsnew_0200.api_breaking.rolling_pairwise>`.

In financial data analysis and other fields it's common to compute covariance
and correlation matrices for a collection of time series. Often one is also
interested in moving-window covariance and correlation matrices. This can be
done by passing the ``pairwise`` keyword argument, which in the case of
``DataFrame`` inputs will yield a ``Panel`` whose ``items`` are the dates in
``DataFrame`` inputs will yield a ``MultiIndexed DataFrame`` whose ``index`` contains the dates in
question. In the case of a single DataFrame argument the ``pairwise`` argument
can even be omitted:

@@ -539,12 +549,12 @@ can even be omitted:
.. ipython:: python
covs = df[['B','C','D']].rolling(window=50).cov(df[['A','B','C']], pairwise=True)
covs[df.index[-50]]
covs.iloc[-50].unstack()
.. ipython:: python
correls = df.rolling(window=50).corr()
correls[df.index[-50]]
correls.iloc[-50].unstack()
You can efficiently retrieve the time series of correlations between two
columns using ``.loc`` indexing:
@@ -557,7 +567,7 @@ columns using ``.loc`` indexing:
.. ipython:: python
@savefig rolling_corr_pairwise_ex.png
correls.loc[:, 'A', 'C'].plot()
correls[('A', 'C')].plot()
.. _stats.aggregate:

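The retrieval of a single pair's correlation series described in the hunk above can be sketched against a recent pandas release. This is an illustration under an assumption, not part of the commit: it assumes the layout of current pandas, where the pairwise result carries a `(date, column)` MultiIndex on the rows, whereas the documentation in this commit describes dates on the index and column pairs on the columns. The data and variable names are made up for the example.

```python
import numpy as np
import pandas as pd

# Synthetic random-walk data, similar in spirit to the docs example.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.standard_normal((200, 4)),
    index=pd.date_range("2000-01-01", periods=200),
    columns=["A", "B", "C", "D"],
).cumsum()

# Pairwise rolling correlations for every pair of columns.
correls = df.rolling(window=50).corr()

# Assumed layout (recent pandas): rows are (date, column), columns are A..D.
# Take the 'A' row for every date, then read off its correlation with 'C'.
a_vs_c = correls.xs("A", level=1)["C"]

# Cross-check against the direct Series/Series rolling correlation.
direct = df["A"].rolling(window=50).corr(df["C"])
```

If the installed pandas uses the layout documented in this commit instead, the column-pair selection `correls[('A', 'C')]` shown in the diff is the equivalent access.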
44 changes: 44 additions & 0 deletions doc/source/whatsnew/v0.20.0.txt
@@ -13,6 +13,8 @@ Highlights include:
- The ``.ix`` indexer has been deprecated, see :ref:`here <whatsnew_0200.api_breaking.deprecate_ix>`
- Switched the test framework to `pytest`_ (:issue:`13097`)
- A new orient for JSON serialization, ``orient='table'``, that uses the Table Schema spec, see :ref:`here <whatsnew_0200.enhancements.table_schema>`
- Window Binary Corr/Cov operations return a MultiIndex DataFrame rather than a Panel, see :ref:`here <whatsnew_0200.api_breaking.rolling_pairwise>`


.. _pytest: http://doc.pytest.org/en/latest/

@@ -644,6 +646,48 @@ New Behavior:

df.groupby('A').agg([np.mean, np.std, np.min, np.max])

.. _whatsnew_0200.api_breaking.rolling_pairwise:

Window Binary Corr/Cov operations return a MultiIndex DataFrame
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A binary window operation, like ``.corr()`` or ``.cov()``, when operating on a ``.rolling(..)``, ``.expanding(..)``, or ``.ewm(..)`` object,
will now return a 2-level ``MultiIndexed DataFrame`` rather than a ``Panel``. These are equivalent in function,
but MultiIndexed DataFrames enjoy more support in pandas.
See the section on :ref:`Windowed Binary Operations <stats.moments.binary>` for more information. (:issue:`15677`)

.. ipython:: python

np.random.seed(1234)
df = pd.DataFrame(np.random.rand(100, 2))
df

Old Behavior:

.. code-block:: ipython

In [28]: df.rolling(12).corr()
Out[28]:
<class 'pandas.core.panel.Panel'>
Dimensions: 100 (items) x 2 (major_axis) x 2 (minor_axis)
Items axis: 0 to 99
Major_axis axis: 0 to 1
Minor_axis axis: 0 to 1

New Behavior:

.. ipython:: python

res = df.rolling(12).corr()
res

Retrieving a correlation matrix for a specified index:

.. ipython:: python

res.iloc[-1].unstack()


.. _whatsnew_0200.api_breaking.hdfstore_where:

HDFStore where string comparison
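The whatsnew entry on rolling pairwise results above retrieves one correlation matrix from the new return value. A minimal sketch of the same idea follows, assuming a recent pandas release in which the rows of the result are a 2-level `(index label, column name)` MultiIndex, so the lookup is a plain `.loc` rather than the `.iloc[-1].unstack()` used in this commit. The column names `x` and `y` are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1234)
df = pd.DataFrame(rng.random((100, 2)), columns=["x", "y"])

# Pairwise rolling correlation; no Panel is involved.
res = df.rolling(12).corr()

# Assumed layout (recent pandas): rows are (index label, column name).
# Selecting the outer label yields the 2x2 matrix for that row.
last = res.loc[df.index[-1]]

# The same matrix computed directly from the final 12 observations.
direct = df.iloc[-12:].corr()
```

The cross-check with `direct` is only there to show that each slice of the result is an ordinary correlation matrix over the corresponding window.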
12 changes: 10 additions & 2 deletions pandas/core/window.py
@@ -1651,7 +1651,8 @@ def _cov(x, y):


def _flex_binary_moment(arg1, arg2, f, pairwise=False):
from pandas import Series, DataFrame, Panel
from pandas import Series, DataFrame

if not (isinstance(arg1, (np.ndarray, Series, DataFrame)) and
isinstance(arg2, (np.ndarray, Series, DataFrame))):
raise TypeError("arguments to moment function must be of type "
@@ -1702,12 +1703,19 @@ def dataframe_from_int_dict(data, frame_template):
else:
results[i][j] = f(*_prep_binary(arg1.iloc[:, i],
arg2.iloc[:, j]))

from pandas import Panel
p = Panel.from_dict(results).swapaxes('items', 'major')
if len(p.major_axis) > 0:
p.major_axis = arg1.columns[p.major_axis]
if len(p.minor_axis) > 0:
p.minor_axis = arg2.columns[p.minor_axis]
return p

result = (p.to_frame(filter_observations=False)
.T
)
return result

else:
raise ValueError("'pairwise' is not True/False")
else:
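The pairwise branch above still builds a `Panel` and then flattens it with `to_frame(...).T`. As a rough sketch of the resulting shape only, the same layout (dates on the index, column pairs on a 2-level column MultiIndex) can be assembled without a `Panel`. The helper `pairwise_rolling_corr` is hypothetical and is not the pandas implementation; it assumes only `Series.rolling(...).corr(other)` and the `DataFrame` constructor.

```python
import numpy as np
import pandas as pd


def pairwise_rolling_corr(left, right, window):
    """Hypothetical helper: rolling correlation for every column pair."""
    pieces = {}
    for c1 in left.columns:
        for c2 in right.columns:
            # One rolling Series per (left column, right column) pair.
            pieces[(c1, c2)] = left[c1].rolling(window).corr(right[c2])
    # Tuple keys become a 2-level column MultiIndex; rows stay the dates.
    return pd.DataFrame(pieces)


rng = np.random.default_rng(1)
df = pd.DataFrame(rng.standard_normal((60, 3)), columns=["A", "B", "C"])
wide = pairwise_rolling_corr(df, df, window=20)

# Each row holds one full matrix; unstack turns the pair labels into a grid.
last_matrix = wide.iloc[-1].unstack()
```

With this layout, `wide.iloc[-1].unstack()` mirrors the `.iloc[...].unstack()` calls in the documentation hunks of this commit.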