Commit a1127f6
Merge branch 'master' into excel_style
jnothman committed Apr 8, 2017
2 parents 306eebe + d60f490
Showing 63 changed files with 5,486 additions and 3,550 deletions.
2 changes: 1 addition & 1 deletion .travis.yml
@@ -7,7 +7,7 @@ python: 3.5
# set NOCACHE-true
# To delete caches go to https://travis-ci.org/OWNER/REPOSITORY/caches or run
# travis cache --delete inside the project directory from the travis command line client
-# The cash directories will be deleted if anything in ci/ changes in a commit
+# The cache directories will be deleted if anything in ci/ changes in a commit
cache:
ccache: true
directories:
5 changes: 4 additions & 1 deletion asv_bench/benchmarks/timeseries.py
@@ -292,7 +292,10 @@ def setup(self):
self.rng3 = date_range(start='1/1/2000', periods=1500000, freq='S')
self.ts3 = Series(1, index=self.rng3)

-def time_sort_index(self):
+def time_sort_index_monotonic(self):
    self.ts2.sort_index()

+def time_sort_index_non_monotonic(self):
+    self.ts.sort_index()

def time_timeseries_slice_minutely(self):
1 change: 1 addition & 0 deletions ci/requirements-3.5_DOC.run
@@ -18,4 +18,5 @@ sqlalchemy
numexpr
bottleneck
statsmodels
+xarray
pyqt=4.11.4
65 changes: 35 additions & 30 deletions doc/source/advanced.rst
@@ -46,7 +46,7 @@ data with an arbitrary number of dimensions in lower dimensional data
structures like Series (1d) and DataFrame (2d).

In this section, we will show what exactly we mean by "hierarchical" indexing
-and how it integrates with the all of the pandas indexing functionality
+and how it integrates with all of the pandas indexing functionality
described above and in prior sections. Later, when discussing :ref:`group by
<groupby>` and :ref:`pivoting and reshaping data <reshaping>`, we'll show
non-trivial applications to illustrate how it aids in structuring data for
@@ -136,7 +136,7 @@ can find yourself working with hierarchically-indexed data without creating a
may wish to generate your own ``MultiIndex`` when preparing the data set.

Note that how the index is displayed can be controlled using the
-``multi_sparse`` option in ``pandas.set_printoptions``:
+``multi_sparse`` option in ``pandas.set_options()``:

.. ipython:: python
@@ -175,35 +175,40 @@ completely analogous way to selecting a column in a regular DataFrame:
See :ref:`Cross-section with hierarchical index <advanced.xs>` for how to select
on a deeper level.

-.. note::
+.. _advanced.shown_levels:
+
+Defined Levels
+~~~~~~~~~~~~~~

-   The repr of a ``MultiIndex`` shows ALL the defined levels of an index, even
-   if they are not actually used. When slicing an index, you may notice this.
-   For example:
+The repr of a ``MultiIndex`` shows ALL the defined levels of an index, even
+if they are not actually used. When slicing an index, you may notice this.
+For example:

-   .. ipython:: python
+.. ipython:: python

-      # original multi-index
-      df.columns
+   # original multi-index
+   df.columns

-      # sliced
-      df[['foo','qux']].columns
+   # sliced
+   df[['foo','qux']].columns

-   This is done to avoid a recomputation of the levels in order to make slicing
-   highly performant. If you want to see the actual used levels:
+This is done to avoid a recomputation of the levels in order to make slicing
+highly performant. If you want to see only the used levels:

-   .. ipython:: python
+.. ipython:: python

-      df[['foo','qux']].columns.values
+   df[['foo','qux']].columns.values

-      # for a specific level
-      df[['foo','qux']].columns.get_level_values(0)
+   # for a specific level
+   df[['foo','qux']].columns.get_level_values(0)

-   To reconstruct the multiindex with only the used levels
+To reconstruct the ``MultiIndex`` with only the used levels:

+.. versionadded:: 0.20.0

-   .. ipython:: python
+.. ipython:: python

-      pd.MultiIndex.from_tuples(df[['foo','qux']].columns.values)
+   df[['foo','qux']].columns.remove_unused_levels()
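Outside the ipython session, the effect of the new ``remove_unused_levels`` method can be sketched on a stand-in frame (the ``df`` below is hypothetical, not the one built earlier in the advanced.rst examples):

```python
import pandas as pd

# hypothetical stand-in: a frame with two-level columns, like the doc's df
cols = pd.MultiIndex.from_product([['foo', 'qux'], ['one', 'two']])
df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], columns=cols)

sliced = df[['foo']]
# slicing keeps ALL defined levels: 'qux' is still defined on the index...
print(list(sliced.columns.levels[0]))    # ['foo', 'qux']

# ...until the unused levels are explicitly dropped
pruned = sliced.columns.remove_unused_levels()
print(list(pruned.levels[0]))            # ['foo']
```

This is why the repr of a sliced frame can show labels that no longer appear in any column.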
Data alignment and using ``reindex``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -288,7 +293,7 @@ As usual, **both sides** of the slicers are included as this is label indexing.

.. code-block:: python
-df.loc[(slice('A1','A3'),.....),:]
+df.loc[(slice('A1','A3'),.....), :]
rather than this:

@@ -317,51 +322,51 @@ Basic multi-index slicing using slices, lists, and labels.

.. ipython:: python
-dfmi.loc[(slice('A1','A3'),slice(None), ['C1','C3']),:]
+dfmi.loc[(slice('A1','A3'), slice(None), ['C1', 'C3']), :]
You can use a ``pd.IndexSlice`` to have a more natural syntax using ``:`` rather than using ``slice(None)``

.. ipython:: python
idx = pd.IndexSlice
-dfmi.loc[idx[:,:,['C1','C3']],idx[:,'foo']]
+dfmi.loc[idx[:, :, ['C1', 'C3']], idx[:, 'foo']]
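A runnable sketch of the ``IndexSlice`` idiom, using a hypothetical stand-in for ``dfmi`` (the real ``dfmi`` is constructed earlier in advanced.rst):

```python
import numpy as np
import pandas as pd

# hypothetical stand-in for dfmi: a three-level row MultiIndex
mi = pd.MultiIndex.from_product([['A1', 'A2'], ['B1', 'B2'],
                                 ['C1', 'C2', 'C3']])
dfmi = pd.DataFrame(np.arange(24).reshape(12, 2),
                    index=mi, columns=['x', 'y'])

idx = pd.IndexSlice
# ':' inside idx[...] plays the role of slice(None) on that level
subset = dfmi.loc[idx[:, :, ['C1', 'C3']], :]
print(len(subset))    # 8 rows: 2 outer * 2 middle * 2 selected inner labels
```

The same selection written with ``slice(None)`` tuples is equivalent but noticeably harder to read.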
It is possible to perform quite complicated selections using this method on multiple
axes at the same time.

.. ipython:: python
-dfmi.loc['A1',(slice(None),'foo')]
-dfmi.loc[idx[:,:,['C1','C3']],idx[:,'foo']]
+dfmi.loc['A1', (slice(None), 'foo')]
+dfmi.loc[idx[:, :, ['C1', 'C3']], idx[:, 'foo']]
Using a boolean indexer you can provide selection related to the *values*.

.. ipython:: python
-mask = dfmi[('a','foo')]>200
-dfmi.loc[idx[mask,:,['C1','C3']],idx[:,'foo']]
+mask = dfmi[('a', 'foo')] > 200
+dfmi.loc[idx[mask, :, ['C1', 'C3']], idx[:, 'foo']]
You can also specify the ``axis`` argument to ``.loc`` to interpret the passed
slicers on a single axis.

.. ipython:: python
-dfmi.loc(axis=0)[:,:,['C1','C3']]
+dfmi.loc(axis=0)[:, :, ['C1', 'C3']]
Furthermore you can *set* the values using these methods

.. ipython:: python
df2 = dfmi.copy()
-df2.loc(axis=0)[:,:,['C1','C3']] = -10
+df2.loc(axis=0)[:, :, ['C1', 'C3']] = -10
df2
You can use a right-hand-side of an alignable object as well.

.. ipython:: python
df2 = dfmi.copy()
-df2.loc[idx[:,:,['C1','C3']],:] = df2*1000
+df2.loc[idx[:, :, ['C1', 'C3']], :] = df2 * 1000
df2
.. _advanced.xs:
1 change: 1 addition & 0 deletions doc/source/api.rst
@@ -1432,6 +1432,7 @@ MultiIndex Components
MultiIndex.droplevel
MultiIndex.swaplevel
MultiIndex.reorder_levels
+MultiIndex.remove_unused_levels

.. _api.datetimeindex:

22 changes: 16 additions & 6 deletions doc/source/computation.rst
@@ -505,13 +505,18 @@ two ``Series`` or any combination of ``DataFrame/Series`` or
- ``DataFrame/DataFrame``: by default compute the statistic for matching column
names, returning a DataFrame. If the keyword argument ``pairwise=True`` is
passed then computes the statistic for each pair of columns, returning a
-``Panel`` whose ``items`` are the dates in question (see :ref:`the next section
+``MultiIndexed DataFrame`` whose ``index`` are the dates in question (see :ref:`the next section
<stats.moments.corr_pairwise>`).

For example:

.. ipython:: python
+df = pd.DataFrame(np.random.randn(1000, 4),
+                  index=pd.date_range('1/1/2000', periods=1000),
+                  columns=['A', 'B', 'C', 'D'])
+df = df.cumsum()
df2 = df[:20]
df2.rolling(window=5).corr(df2['B'])
@@ -520,11 +525,16 @@ For example:
Computing rolling pairwise covariances and correlations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+.. warning::
+
+   Prior to version 0.20.0, if ``pairwise=True`` was passed, a ``Panel`` would be returned.
+   This will now return a 2-level MultiIndexed DataFrame; see the whatsnew :ref:`here <whatsnew_0200.api_breaking.rolling_pairwise>`.

In financial data analysis and other fields it's common to compute covariance
and correlation matrices for a collection of time series. Often one is also
interested in moving-window covariance and correlation matrices. This can be
done by passing the ``pairwise`` keyword argument, which in the case of
-``DataFrame`` inputs will yield a ``Panel`` whose ``items`` are the dates in
+``DataFrame`` inputs will yield a MultiIndexed ``DataFrame`` whose ``index`` are the dates in
question. In the case of a single DataFrame argument the ``pairwise`` argument
can even be omitted:

@@ -539,15 +549,15 @@ can even be omitted:
.. ipython:: python
covs = df[['B','C','D']].rolling(window=50).cov(df[['A','B','C']], pairwise=True)
-covs[df.index[-50]]
+covs.loc['2002-09-22':]
.. ipython:: python
correls = df.rolling(window=50).corr()
-correls[df.index[-50]]
+correls.loc['2002-09-22':]
You can efficiently retrieve the time series of correlations between two
-columns using ``.loc`` indexing:
+columns by reshaping and indexing:

.. ipython:: python
:suppress:
@@ -557,7 +567,7 @@ columns using ``.loc`` indexing:
.. ipython:: python
@savefig rolling_corr_pairwise_ex.png
-correls.loc[:, 'A', 'C'].plot()
+correls.unstack(1)[('A', 'C')].plot()
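The unstack-based retrieval can be sketched end to end, with a minimal stand-in ``df`` (smaller than the 1000-row frame the doc builds, and seeded for reproducibility):

```python
import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame(np.random.randn(200, 3).cumsum(axis=0),
                  index=pd.date_range('2000-01-01', periods=200),
                  columns=['A', 'B', 'C'])

# pairwise rolling correlations: rows are a (date, column) MultiIndex
correls = df.rolling(window=50).corr()

# unstacking the inner level yields one column per (col1, col2) pair
ac = correls.unstack(1)[('A', 'C')]
print(len(ac))    # 200: one correlation value per date
```

The first ``window - 1`` entries of ``ac`` are NaN, since a full 50-observation window is needed before a correlation is defined.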
.. _stats.aggregate:

55 changes: 55 additions & 0 deletions doc/source/dsintro.rst
@@ -763,6 +763,11 @@ completion mechanism so they can be tab-completed:
Panel
-----

+.. warning::
+
+   In 0.20.0, ``Panel`` is deprecated and will be removed in
+   a future version. See the section :ref:`Deprecate Panel <dsintro.deprecate_panel>`.

Panel is a somewhat less-used, but still important container for 3-dimensional
data. The term `panel data <http://en.wikipedia.org/wiki/Panel_data>`__ is
derived from econometrics and is partially responsible for the name pandas:
@@ -783,6 +788,7 @@ From 3D ndarray with optional axis labels
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. ipython:: python
+   :okwarning:
wp = pd.Panel(np.random.randn(2, 5, 4), items=['Item1', 'Item2'],
major_axis=pd.date_range('1/1/2000', periods=5),
@@ -794,6 +800,7 @@ From dict of DataFrame objects
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. ipython:: python
+   :okwarning:
data = {'Item1' : pd.DataFrame(np.random.randn(4, 3)),
'Item2' : pd.DataFrame(np.random.randn(4, 2))}
@@ -816,6 +823,7 @@ dictionary of DataFrames as above, and the following named parameters:
For example, compare to the construction above:

.. ipython:: python
+   :okwarning:
pd.Panel.from_dict(data, orient='minor')
@@ -824,6 +832,7 @@ DataFrame objects with mixed-type columns, all of the data will get upcasted to
``dtype=object`` unless you pass ``orient='minor'``:

.. ipython:: python
+   :okwarning:
df = pd.DataFrame({'a': ['foo', 'bar', 'baz'],
'b': np.random.randn(3)})
@@ -851,6 +860,7 @@ This method was introduced in v0.7 to replace ``LongPanel.to_long``, and convert
a DataFrame with a two-level index to a Panel.

.. ipython:: python
+   :okwarning:
midx = pd.MultiIndex(levels=[['one', 'two'], ['x','y']], labels=[[1,1,0,0],[1,0,1,0]])
df = pd.DataFrame({'A' : [1, 2, 3, 4], 'B': [5, 6, 7, 8]}, index=midx)
@@ -880,6 +890,7 @@ A Panel can be rearranged using its ``transpose`` method (which does not make a
copy by default unless the data are heterogeneous):

.. ipython:: python
+   :okwarning:
wp.transpose(2, 0, 1)
@@ -909,6 +920,7 @@ Squeezing
Another way to change the dimensionality of an object is to ``squeeze`` a 1-len object, similar to ``wp['Item1']``

.. ipython:: python
+   :okwarning:
wp.reindex(items=['Item1']).squeeze()
wp.reindex(items=['Item1'], minor=['B']).squeeze()
@@ -923,12 +935,55 @@ for more on this. To convert a Panel to a DataFrame, use the ``to_frame``
method:

.. ipython:: python
+   :okwarning:
panel = pd.Panel(np.random.randn(3, 5, 4), items=['one', 'two', 'three'],
major_axis=pd.date_range('1/1/2000', periods=5),
minor_axis=['a', 'b', 'c', 'd'])
panel.to_frame()
+.. _dsintro.deprecate_panel:
+
+Deprecate Panel
+---------------
+
+Over the last few years, pandas has increased in both breadth and depth, with new features,
+datatype support, and manipulation routines. As a result, supporting efficient indexing and functional
+routines for ``Series``, ``DataFrame`` and ``Panel`` has contributed to an increasingly fragmented and
+difficult-to-understand codebase.
+
+The 3-D structure of a ``Panel`` is much less common for many types of data analysis
+than the 1-D of the ``Series`` or the 2-D of the ``DataFrame``. Going forward it makes sense for
+pandas to focus on these areas exclusively.
+
+Oftentimes, one can simply use a MultiIndex ``DataFrame`` for easily working with higher dimensional data.
+
+In addition, the ``xarray`` package was built from the ground up, specifically in order to
+support the multi-dimensional analysis that is one of ``Panel``'s main use cases.
+`Here is a link to the xarray panel-transition documentation <http://xarray.pydata.org/en/stable/pandas.html#panel-transition>`__.
+
+.. ipython:: python
+   :okwarning:
+
+   p = tm.makePanel()
+   p
+
+Convert to a MultiIndex DataFrame:
+
+.. ipython:: python
+   :okwarning:
+
+   p.to_frame()
+
+Alternatively, one can convert to an xarray ``DataArray``:
+
+.. ipython:: python
+
+   p.to_xarray()
+
+See the full documentation for the `xarray package <http://xarray.pydata.org/en/stable/>`__.
.. _dsintro.panelnd:
.. _dsintro.panel4d:

4 changes: 2 additions & 2 deletions doc/source/indexing.rst
@@ -69,7 +69,7 @@ Different Choices for Indexing
.. versionadded:: 0.11.0

Object selection has had a number of user-requested additions in order to
-support more explicit location based indexing. pandas now supports three types
+support more explicit location based indexing. Pandas now supports three types
of multi-axis indexing.

- ``.loc`` is primarily label based, but may also be used with a boolean array. ``.loc`` will raise ``KeyError`` when the items are not found. Allowed inputs are:
@@ -401,7 +401,7 @@ Selection By Position
This is sometimes called ``chained assignment`` and should be avoided.
See :ref:`Returning a View versus Copy <indexing.view_versus_copy>`

-pandas provides a suite of methods in order to get **purely integer based indexing**. The semantics follow closely python and numpy slicing. These are ``0-based`` indexing. When slicing, the start bounds is *included*, while the upper bound is *excluded*. Trying to use a non-integer, even a **valid** label will raise a ``IndexError``.
+Pandas provides a suite of methods in order to get **purely integer based indexing**. The semantics follow closely python and numpy slicing. These are ``0-based`` indexing. When slicing, the start bounds is *included*, while the upper bound is *excluded*. Trying to use a non-integer, even a **valid** label will raise an ``IndexError``.

The ``.iloc`` attribute is the primary access method. The following are valid inputs:

