Merge pull request #4950 from snth/pairwise
Pairwise versions for rolling_cov, ewmcov and expanding_cov
jreback committed Mar 28, 2014
2 parents 1ff776a + 1fcb94e commit aa166bf
Showing 4 changed files with 295 additions and 177 deletions.
68 changes: 56 additions & 12 deletions doc/source/computation.rst
@@ -59,6 +59,19 @@ The ``Series`` object has a method ``cov`` to compute covariance between series
Analogously, ``DataFrame`` has a method ``cov`` to compute pairwise covariances
among the series in the DataFrame, also excluding NA/null values.

.. _computation.covariance.caveats:

.. note::

Assuming the missing data are missing at random, this results in an unbiased
estimate of the covariance matrix. However, for many applications this
estimate may not be acceptable because the estimated covariance matrix is not
guaranteed to be positive semi-definite. This can lead to estimated
correlations with absolute values greater than one, and/or a non-invertible
covariance matrix. See `Estimation of covariance
matrices <http://en.wikipedia.org/w/index.php?title=Estimation_of_covariance_matrices>`_
for more details.
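
The caveat in the note above can be illustrated with a small sketch. The data
below are hypothetical and chosen so that each pair of columns is observed on a
different set of rows; the example assumes a pandas version in which
``DataFrame.cov`` excludes missing values pairwise, as described in this
section.

```python
import numpy as np
import pandas as pd

nan = np.nan

# Hypothetical data: every pair of columns overlaps on a different set of rows.
frame = pd.DataFrame({
    "a": [1, 2, 3, nan, nan, nan, 1, 2, 3],
    "b": [1, 2, 3, 1, 2, 3, nan, nan, nan],
    "c": [nan, nan, nan, 1, 2, 3, 3, 2, 1],
})

# Each entry is computed from the rows where both columns are present.
cov = frame.cov()

# A positive semi-definite matrix has no negative eigenvalues.
eigenvalues = np.linalg.eigvalsh(cov.to_numpy())
print("smallest eigenvalue:", eigenvalues.min())

# Correlations implied by the covariance matrix: divide each covariance
# by the product of the two per-column standard deviations.
std = np.sqrt(np.diag(cov))
implied_corr = cov / np.outer(std, std)
print(implied_corr)
```

With these inputs the smallest eigenvalue is negative, so the estimate is not
positive semi-definite, and some of the implied correlations exceed one in
absolute value.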

.. ipython:: python
frame = DataFrame(randn(1000, 5), columns=['a', 'b', 'c', 'd', 'e'])
@@ -99,6 +112,12 @@ correlation methods are provided:
All of these are currently computed using pairwise complete observations.

.. note::

Please see the :ref:`caveats <computation.covariance.caveats>` associated
with this method of calculating correlation matrices in the
:ref:`covariance section <computation.covariance>`.

.. ipython:: python
frame = DataFrame(randn(1000, 5), columns=['a', 'b', 'c', 'd', 'e'])
@@ -325,11 +344,14 @@ Binary rolling moments
two ``Series`` or any combination of ``DataFrame/Series`` or
``DataFrame/DataFrame``. Here is the behavior in each case:

- two ``Series``: compute the statistic for the pairing
- two ``Series``: compute the statistic for the pairing.
- ``DataFrame/Series``: compute the statistics for each column of the DataFrame
with the passed Series, thus returning a DataFrame
- ``DataFrame/DataFrame``: compute statistic for matching column names,
returning a DataFrame
with the passed Series, thus returning a DataFrame.
- ``DataFrame/DataFrame``: by default, compute the statistic for matching column
names, returning a DataFrame. If the keyword argument ``pairwise=True`` is
passed, the statistic is instead computed for each pair of columns, returning a
``Panel`` whose ``items`` are the dates in question (see :ref:`the next section
<stats.moments.corr_pairwise>`).
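
The cases listed above can be sketched as follows. This is only an illustration
under stated assumptions: it uses the method-based ``.rolling(...)`` interface
of later pandas releases rather than the module-level ``rolling_cov`` function
documented here, and in those releases the ``pairwise=True`` result is a
``DataFrame`` with a two-level row index instead of a ``Panel``. The frame
``df`` and the window length are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical input: 20 daily observations of four random series.
rng = np.random.default_rng(0)
index = pd.date_range("2000-01-01", periods=20, freq="D")
df = pd.DataFrame(rng.standard_normal((20, 4)), index=index, columns=list("ABCD"))

window = 5

# Series/Series: one statistic per window, returned as a Series.
ss = df["A"].rolling(window).cov(df["B"])

# DataFrame/Series: every column is paired with the Series.
ds = df.rolling(window).cov(df["B"])

# DataFrame/DataFrame (default): only columns present in both inputs
# (here B and C) are paired with each other.
dd = df[["A", "B", "C"]].rolling(window).cov(df[["B", "C", "D"]])

# DataFrame/DataFrame with pairwise=True: every column of the first frame
# is paired with every column of the second, one matrix per date.
dp = df[["A", "B", "C"]].rolling(window).cov(df[["B", "C", "D"]], pairwise=True)

# The covariance matrix for the last window.
last = dp.loc[index[-1]]
print(last)
```

Selecting a single date from the pairwise result plays the role that selecting
an item from the ``Panel`` plays in the text above.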

For example:

@@ -340,20 +362,42 @@ For example:
.. _stats.moments.corr_pairwise:

Computing rolling pairwise correlations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Computing rolling pairwise covariances and correlations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In financial data analysis and other fields it's common to compute correlation
matrices for a collection of time series. More difficult is to compute a
moving-window correlation matrix. This can be done using the
``rolling_corr_pairwise`` function, which yields a ``Panel`` whose ``items``
are the dates in question:
In financial data analysis and other fields it's common to compute covariance
and correlation matrices for a collection of time series. Often one is also
interested in moving-window covariance and correlation matrices. This can be
done by passing the ``pairwise`` keyword argument, which in the case of
``DataFrame`` inputs will yield a ``Panel`` whose ``items`` are the dates in
question. In the case of a single DataFrame argument the ``pairwise`` argument
can even be omitted:

.. note::

Missing values are ignored and each entry is computed using the pairwise
complete observations. Please see the :ref:`covariance section
<computation.covariance>` for :ref:`caveats
<computation.covariance.caveats>` associated with this method of
calculating covariance and correlation matrices.

.. ipython:: python
correls = rolling_corr_pairwise(df, 50)
covs = rolling_cov(df[['B','C','D']], df[['A','B','C']], 50, pairwise=True)
covs[df.index[-50]]
.. ipython:: python
correls = rolling_corr(df, 50)
correls[df.index[-50]]
.. note::

Prior to version 0.14 this was available through ``rolling_corr_pairwise``,
which is now deprecated and is simply syntactic sugar for calling
``rolling_corr(..., pairwise=True)``. It is likely to be removed in a future
release.
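
For readers on later pandas releases, the single-``DataFrame`` case can be
sketched as below. This is an assumption-laden illustration, not the API
documented in this commit: it relies on the method-based ``.rolling(...).corr()``
interface, whose pairwise result is a ``DataFrame`` with a two-level
``(date, column)`` row index. The frame ``df`` and the window length are
hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical input: 20 daily observations of four random series.
rng = np.random.default_rng(1)
index = pd.date_range("2000-01-01", periods=20, freq="D")
df = pd.DataFrame(rng.standard_normal((20, 4)), index=index, columns=list("ABCD"))

window = 5

# With a single DataFrame the pairwise argument can be omitted.
correls = df.rolling(window).corr()
explicit = df.rolling(window).corr(pairwise=True)

# Correlation matrix for the last window.
matrix = correls.loc[index[-1]]
print(matrix)

# Time series of the correlation between columns B and C:
# take the rows labelled "C" on the second index level, then column "B".
b_vs_c = correls.xs("C", level=1)["B"]
print(b_vs_c.tail())
```

The last lines retrieve the series of correlations between two columns by
cross-section instead of ``ix`` indexing.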

You can efficiently retrieve the time series of correlations between two
columns using ``ix`` indexing:

13 changes: 13 additions & 0 deletions doc/source/v0.14.0.txt
@@ -183,6 +183,19 @@ These are out-of-bounds selections

Because of the default `align` value changes, coordinates of bar plots are now located on integer values (0.0, 1.0, 2.0 ...). This is intended to place bar plots on the same coordinates as line plots. However, bar plots may differ unexpectedly when you manually adjust the bar location or drawing area, for example with `set_xlim` or `set_ylim`. In these cases, please modify your script to match the new coordinates.

- ``pairwise`` keyword was added to the statistical moment functions
``rolling_cov``, ``rolling_corr``, ``ewmcov``, ``ewmcorr``,
``expanding_cov``, ``expanding_corr`` to allow the calculation of moving
window covariance and correlation matrices (:issue:`4950`). See
:ref:`Computing rolling pairwise covariances and correlations
<stats.moments.corr_pairwise>` in the docs.

.. ipython:: python

df = DataFrame(np.random.randn(10,4),columns=list('ABCD'))
covs = rolling_cov(df[['A','B','C']], df[['B','C','D']], 5, pairwise=True)
covs[df.index[-1]]


MultiIndexing Using Slicers
~~~~~~~~~~~~~~~~~~~~~~~~~~~