DEPR: deprecate .ix in favor of .loc/.iloc

closes #14218 closes #15116
pandas-dev · Jan 12, 2017 · 3cf48d3 · 3cf48d3
1 parent 0fe491d
commit 3cf48d3
Show file tree

Hide file tree

Showing 79 changed files with 1,597 additions and 1,366 deletions.
diff --git a/doc/source/advanced.rst b/doc/source/advanced.rst
@@ -230,7 +230,7 @@ of tuples:
 Advanced indexing with hierarchical index
 -----------------------------------------
 
-Syntactically integrating ``MultiIndex`` in advanced indexing with ``.loc/.ix`` is a
+Syntactically integrating ``MultiIndex`` in advanced indexing with ``.loc`` is a
 bit challenging, but we've made every effort to do so. for example the
 following works as you would expect:
 
@@ -258,7 +258,7 @@ Passing a list of labels or tuples works similar to reindexing:
 
 .. ipython:: python
 
-   df.ix[[('bar', 'two'), ('qux', 'one')]]
+   df.loc[[('bar', 'two'), ('qux', 'one')]]
 
 .. _advanced.mi_slicers:
 
@@ -604,7 +604,7 @@ intended to work on boolean indices and may return unexpected results.
 
    ser = pd.Series(np.random.randn(10))
    ser.take([False, False, True, True])
-   ser.ix[[0, 1]]
+   ser.iloc[[0, 1]]
 
 Finally, as a small note on performance, because the ``take`` method handles
 a narrower range of inputs, it can offer performance that is a good deal
@@ -620,7 +620,7 @@ faster than fancy indexing.
    timeit arr.take(indexer, axis=0)
 
    ser = pd.Series(arr[:, 0])
-   timeit ser.ix[indexer]
+   timeit ser.iloc[indexer]
    timeit ser.take(indexer)
 
 .. _indexing.index_types:
@@ -661,7 +661,7 @@ Setting the index, will create create a ``CategoricalIndex``
    df2 = df.set_index('B')
    df2.index
 
-Indexing with ``__getitem__/.iloc/.loc/.ix`` works similarly to an ``Index`` with duplicates.
+Indexing with ``__getitem__/.iloc/.loc`` works similarly to an ``Index`` with duplicates.
 The indexers MUST be in the category or the operation will raise.
 
 .. ipython:: python
@@ -759,14 +759,12 @@ same.
    sf = pd.Series(range(5), index=indexf)
    sf
 
-Scalar selection for ``[],.ix,.loc`` will always be label based. An integer will match an equal float index (e.g. ``3`` is equivalent to ``3.0``)
+Scalar selection for ``[],.loc`` will always be label based. An integer will match an equal float index (e.g. ``3`` is equivalent to ``3.0``)
 
 .. ipython:: python
 
    sf[3]
    sf[3.0]
-   sf.ix[3]
-   sf.ix[3.0]
    sf.loc[3]
    sf.loc[3.0]
 
@@ -783,7 +781,6 @@ Slicing is ALWAYS on the values of the index, for ``[],ix,loc`` and ALWAYS posit
 .. ipython:: python
 
    sf[2:4]
-   sf.ix[2:4]
    sf.loc[2:4]
    sf.iloc[2:4]
 
@@ -813,14 +810,6 @@ In non-float indexes, slicing using floats will raise a ``TypeError``
       In [3]: pd.Series(range(5)).iloc[3.0]
       TypeError: cannot do positional indexing on <class 'pandas.indexes.range.RangeIndex'> with these indexers [3.0] of <type 'float'>
 
-   Further the treatment of ``.ix`` with a float indexer on a non-float index, will be label based, and thus coerce the index.
-
-   .. ipython:: python
-
-      s2 = pd.Series([1, 2, 3], index=list('abc'))
-      s2
-      s2.ix[1.0] = 10
-      s2
 
 Here is a typical use-case for using this type of indexing. Imagine that you have a somewhat
 irregular timedelta-like indexing scheme, but the data is recorded as floats. This could for

diff --git a/doc/source/gotchas.rst b/doc/source/gotchas.rst
@@ -214,27 +214,6 @@ and traded integer ``NA`` capability for a much simpler approach of using a
 special value in float and object arrays to denote ``NA``, and promoting
 integer arrays to floating when NAs must be introduced.
 
-Integer indexing
-----------------
-
-Label-based indexing with integer axis labels is a thorny topic. It has been
-discussed heavily on mailing lists and among various members of the scientific
-Python community. In pandas, our general viewpoint is that labels matter more
-than integer locations. Therefore, with an integer axis index *only*
-label-based indexing is possible with the standard tools like ``.ix``. The
-following code will generate exceptions:
-
-.. code-block:: python
-
-   s = pd.Series(range(5))
-   s[-1]
-   df = pd.DataFrame(np.random.randn(5, 4))
-   df
-   df.ix[-2:]
-
-This deliberate decision was made to prevent ambiguities and subtle bugs (many
-users reported finding bugs when the API change was made to stop "falling back"
-on position-based indexing).
 
 Label-based slicing conventions
 -------------------------------
@@ -305,15 +284,15 @@ index can be somewhat complicated. For example, the following does not work:
 
 ::
 
-    s.ix['c':'e'+1]
+    s.loc['c':'e'+1]
 
 A very common use case is to limit a time series to start and end at two
 specific dates. To enable this, we made the design design to make label-based
 slicing include both endpoints:
 
 .. ipython:: python
 
-    s.ix['c':'e']
+    s.loc['c':'e']
 
 This is most definitely a "practicality beats purity" sort of thing, but it is
 something to watch out for if you expect label-based slicing to behave exactly
@@ -322,58 +301,6 @@ in the way that standard Python integer slicing works.
 Miscellaneous indexing gotchas
 ------------------------------
 
-Reindex versus ix gotchas
-~~~~~~~~~~~~~~~~~~~~~~~~~
-
-Many users will find themselves using the ``ix`` indexing capabilities as a
-concise means of selecting data from a pandas object:
-
-.. ipython:: python
-
-   df = pd.DataFrame(np.random.randn(6, 4), columns=['one', 'two', 'three', 'four'],
-                     index=list('abcdef'))
-   df
-   df.ix[['b', 'c', 'e']]
-
-This is, of course, completely equivalent *in this case* to using the
-``reindex`` method:
-
-.. ipython:: python
-
-   df.reindex(['b', 'c', 'e'])
-
-Some might conclude that ``ix`` and ``reindex`` are 100% equivalent based on
-this. This is indeed true **except in the case of integer indexing**. For
-example, the above operation could alternately have been expressed as:
-
-.. ipython:: python
-
-   df.ix[[1, 2, 4]]
-
-If you pass ``[1, 2, 4]`` to ``reindex`` you will get another thing entirely:
-
-.. ipython:: python
-
-   df.reindex([1, 2, 4])
-
-So it's important to remember that ``reindex`` is **strict label indexing
-only**. This can lead to some potentially surprising results in pathological
-cases where an index contains, say, both integers and strings:
-
-.. ipython:: python
-
-   s = pd.Series([1, 2, 3], index=['a', 0, 1])
-   s
-   s.ix[[0, 1]]
-   s.reindex([0, 1])
-
-Because the index in this case does not contain solely integers, ``ix`` falls
-back on integer indexing. By contrast, ``reindex`` only looks for the values
-passed in the index, thus finding the integers ``0`` and ``1``. While it would
-be possible to insert some logic to check whether a passed sequence is all
-contained in the index, that logic would exact a very high cost in large data
-sets.
-
 Reindex potentially changes underlying Series dtype
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 

diff --git a/doc/source/indexing.rst b/doc/source/indexing.rst
@@ -61,6 +61,8 @@ See the :ref:`MultiIndex / Advanced Indexing <advanced>` for ``MultiIndex`` and
 
 See the :ref:`cookbook<cookbook.selection>` for some advanced strategies
 
+.. _indexing.choice:
+
 Different Choices for Indexing
 ------------------------------
 
@@ -104,24 +106,13 @@ of multi-axis indexing.
 
   See more at :ref:`Selection by Position <indexing.integer>`
 
-- ``.ix`` supports mixed integer and label based access. It is primarily label
-  based, but will fall back to integer positional access unless the corresponding
-  axis is of integer type. ``.ix`` is the most general and will
-  support any of the inputs in ``.loc`` and ``.iloc``. ``.ix`` also supports floating point
-  label schemes. ``.ix`` is exceptionally useful when dealing with mixed positional
-  and label based hierarchical indexes.
-
-  However, when an axis is integer based, ONLY
-  label based access and not positional access is supported.
-  Thus, in such cases, it's usually better to be explicit and use ``.iloc`` or ``.loc``.
-
   See more at :ref:`Advanced Indexing <advanced>` and :ref:`Advanced
   Hierarchical <advanced.advanced_hierarchical>`.
 
-- ``.loc``, ``.iloc``, ``.ix`` and also ``[]`` indexing can accept a ``callable`` as indexer. See more at :ref:`Selection By Callable <indexing.callable>`.
+- ``.loc``, ``.iloc``, and also ``[]`` indexing can accept a ``callable`` as indexer. See more at :ref:`Selection By Callable <indexing.callable>`.
 
 Getting values from an object with multi-axes selection uses the following
-notation (using ``.loc`` as an example, but applies to ``.iloc`` and ``.ix`` as
+notation (using ``.loc`` as an example, but applies to ``.iloc`` as
 well). Any of the axes accessors may be the null slice ``:``. Axes left out of
 the specification are assumed to be ``:``. (e.g. ``p.loc['a']`` is equiv to
 ``p.loc['a', :, :]``)
@@ -135,6 +126,48 @@ the specification are assumed to be ``:``. (e.g. ``p.loc['a']`` is equiv to
     DataFrame; ``df.loc[row_indexer,column_indexer]``
     Panel; ``p.loc[item_indexer,major_indexer,minor_indexer]``
 
+.. _indexing.deprecate_ix:
+
+IX Indexer is Deprecated
+------------------------
+
+.. warning::
+
+  Startin in 0.20.0, the ``.ix`` indexer is deprecated, in favor of the more strict ``.iloc`` and ``.loc`` indexers. ``.ix`` offers a lot of magic on the inference of what the user wants to do. To wit, ``.ix`` can decide to index *positionally* OR via *labels*. This has caused quite a bit of user confusion over the years.
+
+
+The recommended methods of indexing are:
+
+.. ipython:: python
+
+  dfd = pd.DataFrame({'A': [1, 2, 3],
+                      'B': [4, 5, 6]},
+                     index=list('abc'))
+
+  dfd
+
+Previous Behavior, where you wish to get the 0th and the 2nd elements from the index in the 'A' column.
+
+.. code-block:: ipython
+
+  In [3]: dfd.ix[[0, 2], 'A']
+  Out[3]:
+  a    1
+  c    3
+  Name: A, dtype: int64
+
+Using ``.loc``. Here we will select the appropriate indexes from the index, then use *label* indexing.
+
+.. ipython:: python
+
+  dfd.loc[df.index[[0, 2]], 'A']
+
+Using ``.iloc``. Here we will get the location of the 'A' column, then use *positional* indexing to select things.
+
+.. ipython:: python
+
+  dfd.iloc[[0, 2], df.columns.get_loc('A')]
+
 .. _indexing.basics:
 
 Basics
@@ -193,7 +226,7 @@ columns.
 
 .. warning::
 
-   pandas aligns all AXES when setting ``Series`` and ``DataFrame`` from ``.loc``, ``.iloc`` and ``.ix``.
+   pandas aligns all AXES when setting ``Series`` and ``DataFrame`` from ``.loc``, and ``.iloc``.
 
    This will **not** modify ``df`` because the column alignment is before value assignment.
 
@@ -526,7 +559,7 @@ Selection By Callable
 
 .. versionadded:: 0.18.1
 
-``.loc``, ``.iloc``, ``.ix`` and also ``[]`` indexing can accept a ``callable`` as indexer.
+``.loc``, ``.iloc``, and also ``[]`` indexing can accept a ``callable`` as indexer.
 The ``callable`` must be a function with one argument (the calling Series, DataFrame or Panel) and that returns valid output for indexing.
 
 .. ipython:: python
@@ -641,7 +674,7 @@ Setting With Enlargement
 
 .. versionadded:: 0.13
 
-The ``.loc/.ix/[]`` operations can perform enlargement when setting a non-existant key for that axis.
+The ``.loc/[]`` operations can perform enlargement when setting a non-existant key for that axis.
 
 In the ``Series`` case this is effectively an appending operation
 
@@ -906,7 +939,7 @@ without creating a copy:
 
 Furthermore, ``where`` aligns the input boolean condition (ndarray or DataFrame),
 such that partial selection with setting is possible. This is analogous to
-partial setting via ``.ix`` (but on the contents rather than the axis labels)
+partial setting via ``.loc`` (but on the contents rather than the axis labels)
 
 .. ipython:: python
 
@@ -1716,7 +1749,7 @@ A chained assignment can also crop up in setting in a mixed dtype frame.
 
 .. note::
 
-   These setting rules apply to all of ``.loc/.iloc/.ix``
+   These setting rules apply to all of ``.loc/.iloc``
 
 This is the correct access method
 

diff --git a/doc/source/whatsnew/v0.20.0.txt b/doc/source/whatsnew/v0.20.0.txt
@@ -10,6 +10,7 @@ users upgrade to this version.
 Highlights include:
 
 - Building pandas for development now requires ``cython >= 0.23`` (:issue:`14831`)
+- The ``.ix`` indexer has been deprecated, see :ref:`here <whatsnew.api_breaking.deprecate_ix>`
 
 Check the :ref:`API Changes <whatsnew_0200.api_breaking>` and :ref:`deprecations <whatsnew_0200.deprecations>` before updating.
 
@@ -122,6 +123,53 @@ Other enhancements
 Backwards incompatible API changes
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
+
+.. _whatsnew.api_breaking.deprecate_ix
+
+Deprecate .ix
+^^^^^^^^^^^^^
+
+The ``.ix`` indexer is deprecated, in favor of the more strict ``.iloc`` and ``.loc`` indexers. ``.ix`` offers a lot of magic on the inference of what the user wants to do. To wit, ``.ix`` can decide to index *positionally* OR via *labels*. This has caused quite a bit of user confusion over the years. The full indexing documentation are :ref:`here <indexing>`. (:issue:`14218`)
+
+
+The recommended methods of indexing are:
+
+- ``.loc`` if you want to *label* index
+- ``.iloc`` if you want to *positionally* index.
+
+Using ``.ix`` will now show a deprecation warning with a mini-example of how to convert code.
+
+.. ipython:: python
+
+  df = pd.DataFrame({'A': [1, 2, 3],
+                     'B': [4, 5, 6]},
+                    index=list('abc'))
+
+  df
+
+Previous Behavior, where you wish to get the 0th and the 2nd elements from the index in the 'A' column.
+
+.. code-block:: ipython
+
+  In [3]: df.ix[[0, 2], 'A']
+  Out[3]:
+  a    1
+  c    3
+  Name: A, dtype: int64
+
+Using ``.loc``. Here we will select the appropriate indexes from the index, then use *label* indexing.
+
+.. ipython:: python
+
+  df.loc[df.index[[0, 2]], 'A']
+
+Using ``.iloc``. Here we will get the location of the 'A' column, then use *positional* indexing to select things.
+
+.. ipython:: python
+
+  df.iloc[[0, 2], df.columns.get_loc('A')]
+
+
 .. _whatsnew.api_breaking.index_map
 
 Map on Index types now return other Index types