Commit a1127f6
Merge branch 'master' into excel_style
jnothman committed Apr 8, 2017
2 parents 306eebe + d60f490
Showing 63 changed files with 5,486 additions and 3,550 deletions.
2 changes: 1 addition & 1 deletion .travis.yml
@@ -7,7 +7,7 @@ python: 3.5
# set NOCACHE-true
# To delete caches go to https://travis-ci.org/OWNER/REPOSITORY/caches or run
# travis cache --delete inside the project directory from the travis command line client
-# The cash directories will be deleted if anything in ci/ changes in a commit
+# The cache directories will be deleted if anything in ci/ changes in a commit
cache:
ccache: true
directories:
5 changes: 4 additions & 1 deletion asv_bench/benchmarks/timeseries.py
@@ -292,7 +292,10 @@ def setup(self):
self.rng3 = date_range(start='1/1/2000', periods=1500000, freq='S')
self.ts3 = Series(1, index=self.rng3)

-def time_sort_index(self):
+def time_sort_index_monotonic(self):
    self.ts2.sort_index()

+def time_sort_index_non_monotonic(self):
+    self.ts.sort_index()

def time_timeseries_slice_minutely(self):
1 change: 1 addition & 0 deletions ci/requirements-3.5_DOC.run
@@ -18,4 +18,5 @@ sqlalchemy
numexpr
bottleneck
statsmodels
+xarray
pyqt=4.11.4
65 changes: 35 additions & 30 deletions doc/source/advanced.rst
@@ -46,7 +46,7 @@ data with an arbitrary number of dimensions in lower dimensional data
structures like Series (1d) and DataFrame (2d).

In this section, we will show what exactly we mean by "hierarchical" indexing
-and how it integrates with the all of the pandas indexing functionality
+and how it integrates with all of the pandas indexing functionality
described above and in prior sections. Later, when discussing :ref:`group by
<groupby>` and :ref:`pivoting and reshaping data <reshaping>`, we'll show
non-trivial applications to illustrate how it aids in structuring data for
@@ -136,7 +136,7 @@ can find yourself working with hierarchically-indexed data without creating a
may wish to generate your own ``MultiIndex`` when preparing the data set.

Note that how the index is displayed can be controlled using the
-``multi_sparse`` option in ``pandas.set_printoptions``:
+``multi_sparse`` option in ``pandas.set_options()``:

.. ipython:: python
@@ -175,35 +175,40 @@ completely analogous way to selecting a column in a regular DataFrame:
See :ref:`Cross-section with hierarchical index <advanced.xs>` for how to select
on a deeper level.

-.. note::
+.. _advanced.shown_levels:
+
+Defined Levels
+~~~~~~~~~~~~~~

-   The repr of a ``MultiIndex`` shows ALL the defined levels of an index, even
-   if they are not actually used. When slicing an index, you may notice this.
-   For example:
+The repr of a ``MultiIndex`` shows ALL the defined levels of an index, even
+if they are not actually used. When slicing an index, you may notice this.
+For example:

-   .. ipython:: python
+.. ipython:: python

-      # original multi-index
-      df.columns
+   # original multi-index
+   df.columns

-      # sliced
-      df[['foo','qux']].columns
+   # sliced
+   df[['foo','qux']].columns

-   This is done to avoid a recomputation of the levels in order to make slicing
-   highly performant. If you want to see the actual used levels:
+This is done to avoid a recomputation of the levels in order to make slicing
+highly performant. If you want to see only the used levels:

-   .. ipython:: python
+.. ipython:: python

-      df[['foo','qux']].columns.values
+   df[['foo','qux']].columns.values

-      # for a specific level
-      df[['foo','qux']].columns.get_level_values(0)
+   # for a specific level
+   df[['foo','qux']].columns.get_level_values(0)

-   To reconstruct the multiindex with only the used levels
+To reconstruct the ``MultiIndex`` with only the used levels:

+.. versionadded:: 0.20.0

-   .. ipython:: python
+.. ipython:: python

-      pd.MultiIndex.from_tuples(df[['foo','qux']].columns.values)
+   df[['foo','qux']].columns.remove_unused_levels()
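Outside the ipython session, the effect of the new ``remove_unused_levels`` method can be sketched on a stand-in frame (the ``df`` below is hypothetical, not the one built earlier in the advanced.rst examples):

```python
import pandas as pd

# hypothetical stand-in: a frame with two-level columns, like the doc's df
cols = pd.MultiIndex.from_product([['foo', 'qux'], ['one', 'two']])
df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], columns=cols)

sliced = df[['foo']]
# slicing keeps ALL defined levels: 'qux' is still defined on the index...
print(list(sliced.columns.levels[0]))    # ['foo', 'qux']

# ...until the unused levels are explicitly dropped
pruned = sliced.columns.remove_unused_levels()
print(list(pruned.levels[0]))            # ['foo']
```

This is why the repr of a sliced frame can show labels that no longer appear in any column.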
Data alignment and using ``reindex``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -288,7 +293,7 @@ As usual, **both sides** of the slicers are included as this is label indexing.

.. code-block:: python
-df.loc[(slice('A1','A3'),.....),:]
+df.loc[(slice('A1','A3'),.....), :]
rather than this:

@@ -317,51 +322,51 @@ Basic multi-index slicing using slices, lists, and labels.

.. ipython:: python
-dfmi.loc[(slice('A1','A3'),slice(None), ['C1','C3']),:]
+dfmi.loc[(slice('A1','A3'), slice(None), ['C1', 'C3']), :]
You can use a ``pd.IndexSlice`` to have a more natural syntax using ``:`` rather than using ``slice(None)``

.. ipython:: python
idx = pd.IndexSlice
-dfmi.loc[idx[:,:,['C1','C3']],idx[:,'foo']]
+dfmi.loc[idx[:, :, ['C1', 'C3']], idx[:, 'foo']]
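A runnable sketch of the ``IndexSlice`` idiom, using a hypothetical stand-in for ``dfmi`` (the real ``dfmi`` is constructed earlier in advanced.rst):

```python
import numpy as np
import pandas as pd

# hypothetical stand-in for dfmi: a three-level row MultiIndex
mi = pd.MultiIndex.from_product([['A1', 'A2'], ['B1', 'B2'],
                                 ['C1', 'C2', 'C3']])
dfmi = pd.DataFrame(np.arange(24).reshape(12, 2),
                    index=mi, columns=['x', 'y'])

idx = pd.IndexSlice
# ':' inside idx[...] plays the role of slice(None) on that level
subset = dfmi.loc[idx[:, :, ['C1', 'C3']], :]
print(len(subset))    # 8 rows: 2 outer * 2 middle * 2 selected inner labels
```

The same selection written with ``slice(None)`` tuples is equivalent but noticeably harder to read.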
It is possible to perform quite complicated selections using this method on multiple
axes at the same time.

.. ipython:: python
-dfmi.loc['A1',(slice(None),'foo')]
-dfmi.loc[idx[:,:,['C1','C3']],idx[:,'foo']]
+dfmi.loc['A1', (slice(None), 'foo')]
+dfmi.loc[idx[:, :, ['C1', 'C3']], idx[:, 'foo']]
Using a boolean indexer you can provide selection related to the *values*.

.. ipython:: python
-mask = dfmi[('a','foo')]>200
-dfmi.loc[idx[mask,:,['C1','C3']],idx[:,'foo']]
+mask = dfmi[('a', 'foo')] > 200
+dfmi.loc[idx[mask, :, ['C1', 'C3']], idx[:, 'foo']]
You can also specify the ``axis`` argument to ``.loc`` to interpret the passed
slicers on a single axis.

.. ipython:: python
-dfmi.loc(axis=0)[:,:,['C1','C3']]
+dfmi.loc(axis=0)[:, :, ['C1', 'C3']]
Furthermore you can *set* the values using these methods

.. ipython:: python
df2 = dfmi.copy()
-df2.loc(axis=0)[:,:,['C1','C3']] = -10
+df2.loc(axis=0)[:, :, ['C1', 'C3']] = -10
df2
You can use a right-hand-side of an alignable object as well.

.. ipython:: python
df2 = dfmi.copy()
-df2.loc[idx[:,:,['C1','C3']],:] = df2*1000
+df2.loc[idx[:, :, ['C1', 'C3']], :] = df2 * 1000
df2
.. _advanced.xs:
1 change: 1 addition & 0 deletions doc/source/api.rst
@@ -1432,6 +1432,7 @@ MultiIndex Components
MultiIndex.droplevel
MultiIndex.swaplevel
MultiIndex.reorder_levels
+MultiIndex.remove_unused_levels

.. _api.datetimeindex:

22 changes: 16 additions & 6 deletions doc/source/computation.rst
@@ -505,13 +505,18 @@ two ``Series`` or any combination of ``DataFrame/Series`` or
- ``DataFrame/DataFrame``: by default compute the statistic for matching column
names, returning a DataFrame. If the keyword argument ``pairwise=True`` is
passed then computes the statistic for each pair of columns, returning a
-``Panel`` whose ``items`` are the dates in question (see :ref:`the next section
+``MultiIndexed DataFrame`` whose ``index`` are the dates in question (see :ref:`the next section
<stats.moments.corr_pairwise>`).

For example:

.. ipython:: python
+df = pd.DataFrame(np.random.randn(1000, 4),
+                  index=pd.date_range('1/1/2000', periods=1000),
+                  columns=['A', 'B', 'C', 'D'])
+df = df.cumsum()
df2 = df[:20]
df2.rolling(window=5).corr(df2['B'])
@@ -520,11 +525,16 @@ For example:
Computing rolling pairwise covariances and correlations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+.. warning::
+
+   Prior to version 0.20.0, if ``pairwise=True`` was passed, a ``Panel`` would be returned.
+   This will now return a 2-level MultiIndexed DataFrame; see the whatsnew :ref:`here <whatsnew_0200.api_breaking.rolling_pairwise>`.

In financial data analysis and other fields it's common to compute covariance
and correlation matrices for a collection of time series. Often one is also
interested in moving-window covariance and correlation matrices. This can be
done by passing the ``pairwise`` keyword argument, which in the case of
-``DataFrame`` inputs will yield a ``Panel`` whose ``items`` are the dates in
+``DataFrame`` inputs will yield a MultiIndexed ``DataFrame`` whose ``index`` are the dates in
question. In the case of a single DataFrame argument the ``pairwise`` argument
can even be omitted:

@@ -539,15 +549,15 @@ can even be omitted:
.. ipython:: python
covs = df[['B','C','D']].rolling(window=50).cov(df[['A','B','C']], pairwise=True)
-covs[df.index[-50]]
+covs.loc['2002-09-22':]
.. ipython:: python
correls = df.rolling(window=50).corr()
-correls[df.index[-50]]
+correls.loc['2002-09-22':]
You can efficiently retrieve the time series of correlations between two
-columns using ``.loc`` indexing:
+columns by reshaping and indexing:

.. ipython:: python
:suppress:
@@ -557,7 +567,7 @@ columns using ``.loc`` indexing:
.. ipython:: python
@savefig rolling_corr_pairwise_ex.png
-correls.loc[:, 'A', 'C'].plot()
+correls.unstack(1)[('A', 'C')].plot()
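The unstack-based retrieval can be sketched end to end, with a minimal stand-in ``df`` (smaller than the 1000-row frame the doc builds, and seeded for reproducibility):

```python
import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame(np.random.randn(200, 3).cumsum(axis=0),
                  index=pd.date_range('2000-01-01', periods=200),
                  columns=['A', 'B', 'C'])

# pairwise rolling correlations: rows are a (date, column) MultiIndex
correls = df.rolling(window=50).corr()

# unstacking the inner level yields one column per (col1, col2) pair
ac = correls.unstack(1)[('A', 'C')]
print(len(ac))    # 200: one correlation value per date
```

The first ``window - 1`` entries of ``ac`` are NaN, since a full 50-observation window is needed before a correlation is defined.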
.. _stats.aggregate:

55 changes: 55 additions & 0 deletions doc/source/dsintro.rst
@@ -763,6 +763,11 @@ completion mechanism so they can be tab-completed:
Panel
-----

+.. warning::
+
+   In 0.20.0, ``Panel`` is deprecated and will be removed in
+   a future version. See the section :ref:`Deprecate Panel <dsintro.deprecate_panel>`.

Panel is a somewhat less-used, but still important container for 3-dimensional
data. The term `panel data <http://en.wikipedia.org/wiki/Panel_data>`__ is
derived from econometrics and is partially responsible for the name pandas:
@@ -783,6 +788,7 @@ From 3D ndarray with optional axis labels
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. ipython:: python
+   :okwarning:
wp = pd.Panel(np.random.randn(2, 5, 4), items=['Item1', 'Item2'],
major_axis=pd.date_range('1/1/2000', periods=5),
@@ -794,6 +800,7 @@ From dict of DataFrame objects
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. ipython:: python
+   :okwarning:
data = {'Item1' : pd.DataFrame(np.random.randn(4, 3)),
'Item2' : pd.DataFrame(np.random.randn(4, 2))}
@@ -816,6 +823,7 @@ dictionary of DataFrames as above, and the following named parameters:
For example, compare to the construction above:

.. ipython:: python
+   :okwarning:
pd.Panel.from_dict(data, orient='minor')
@@ -824,6 +832,7 @@ DataFrame objects with mixed-type columns, all of the data will get upcasted to
``dtype=object`` unless you pass ``orient='minor'``:

.. ipython:: python
+   :okwarning:
df = pd.DataFrame({'a': ['foo', 'bar', 'baz'],
'b': np.random.randn(3)})
@@ -851,6 +860,7 @@ This method was introduced in v0.7 to replace ``LongPanel.to_long``, and convert
a DataFrame with a two-level index to a Panel.

.. ipython:: python
+   :okwarning:
midx = pd.MultiIndex(levels=[['one', 'two'], ['x','y']], labels=[[1,1,0,0],[1,0,1,0]])
df = pd.DataFrame({'A' : [1, 2, 3, 4], 'B': [5, 6, 7, 8]}, index=midx)
@@ -880,6 +890,7 @@ A Panel can be rearranged using its ``transpose`` method (which does not make a
copy by default unless the data are heterogeneous):

.. ipython:: python
+   :okwarning:
wp.transpose(2, 0, 1)
@@ -909,6 +920,7 @@ Squeezing
Another way to change the dimensionality of an object is to ``squeeze`` a 1-len object, similar to ``wp['Item1']``

.. ipython:: python
+   :okwarning:
wp.reindex(items=['Item1']).squeeze()
wp.reindex(items=['Item1'], minor=['B']).squeeze()
@@ -923,12 +935,55 @@ for more on this. To convert a Panel to a DataFrame, use the ``to_frame``
method:

.. ipython:: python
+   :okwarning:
panel = pd.Panel(np.random.randn(3, 5, 4), items=['one', 'two', 'three'],
major_axis=pd.date_range('1/1/2000', periods=5),
minor_axis=['a', 'b', 'c', 'd'])
panel.to_frame()
+.. _dsintro.deprecate_panel:
+
+Deprecate Panel
+---------------
+
+Over the last few years, pandas has increased in both breadth and depth, with new features,
+datatype support, and manipulation routines. As a result, supporting efficient indexing and functional
+routines for ``Series``, ``DataFrame`` and ``Panel`` has contributed to an increasingly fragmented and
+difficult-to-understand codebase.
+
+The 3-D structure of a ``Panel`` is much less common for many types of data analysis
+than the 1-D of the ``Series`` or the 2-D of the ``DataFrame``. Going forward it makes sense for
+pandas to focus on these areas exclusively.
+
+Oftentimes, one can simply use a MultiIndex ``DataFrame`` for easily working with higher dimensional data.
+
+In addition, the ``xarray`` package was built from the ground up, specifically in order to
+support the multi-dimensional analysis that is one of ``Panel``'s main use cases.
+`Here is a link to the xarray panel-transition documentation <http://xarray.pydata.org/en/stable/pandas.html#panel-transition>`__.
+
+.. ipython:: python
+   :okwarning:
+
+   p = tm.makePanel()
+   p
+
+Convert to a MultiIndex DataFrame:
+
+.. ipython:: python
+   :okwarning:
+
+   p.to_frame()
+
+Alternatively, one can convert to an xarray ``DataArray``:
+
+.. ipython:: python
+
+   p.to_xarray()
+
+See the full documentation for the `xarray package <http://xarray.pydata.org/en/stable/>`__.
.. _dsintro.panelnd:
.. _dsintro.panel4d:

4 changes: 2 additions & 2 deletions doc/source/indexing.rst
@@ -69,7 +69,7 @@ Different Choices for Indexing
.. versionadded:: 0.11.0

Object selection has had a number of user-requested additions in order to
-support more explicit location based indexing. pandas now supports three types
+support more explicit location based indexing. Pandas now supports three types
of multi-axis indexing.

- ``.loc`` is primarily label based, but may also be used with a boolean array. ``.loc`` will raise ``KeyError`` when the items are not found. Allowed inputs are:
@@ -401,7 +401,7 @@ Selection By Position
This is sometimes called ``chained assignment`` and should be avoided.
See :ref:`Returning a View versus Copy <indexing.view_versus_copy>`

-pandas provides a suite of methods in order to get **purely integer based indexing**. The semantics follow closely python and numpy slicing. These are ``0-based`` indexing. When slicing, the start bounds is *included*, while the upper bound is *excluded*. Trying to use a non-integer, even a **valid** label will raise a ``IndexError``.
+Pandas provides a suite of methods in order to get **purely integer based indexing**. The semantics follow closely python and numpy slicing. These are ``0-based`` indexing. When slicing, the start bounds is *included*, while the upper bound is *excluded*. Trying to use a non-integer, even a **valid** label will raise an ``IndexError``.

The ``.iloc`` attribute is the primary access method. The following are valid inputs:

