DEPR: deprecate .ix in favor of .loc/.iloc

closes #14218
pandas-dev · Jan 12, 2017 · c462504
1 parent 0fe491d

Showing 78 changed files with 1,590 additions and 1,290 deletions.
diff --git a/doc/source/advanced.rst b/doc/source/advanced.rst
@@ -230,7 +230,7 @@ of tuples:
 Advanced indexing with hierarchical index
 -----------------------------------------
 
-Syntactically integrating ``MultiIndex`` in advanced indexing with ``.loc/.ix`` is a
+Syntactically integrating ``MultiIndex`` in advanced indexing with ``.loc`` is a
 bit challenging, but we've made every effort to do so. for example the
 following works as you would expect:
 
@@ -258,7 +258,7 @@ Passing a list of labels or tuples works similar to reindexing:
 
 .. ipython:: python
 
-   df.ix[[('bar', 'two'), ('qux', 'one')]]
+   df.loc[[('bar', 'two'), ('qux', 'one')]]
 
 .. _advanced.mi_slicers:
 
@@ -604,7 +604,7 @@ intended to work on boolean indices and may return unexpected results.
 
    ser = pd.Series(np.random.randn(10))
    ser.take([False, False, True, True])
-   ser.ix[[0, 1]]
+   ser.iloc[[0, 1]]
 
 Finally, as a small note on performance, because the ``take`` method handles
 a narrower range of inputs, it can offer performance that is a good deal
@@ -620,7 +620,7 @@ faster than fancy indexing.
    timeit arr.take(indexer, axis=0)
 
    ser = pd.Series(arr[:, 0])
-   timeit ser.ix[indexer]
+   timeit ser.iloc[indexer]
    timeit ser.take(indexer)
 
 .. _indexing.index_types:
@@ -661,7 +661,7 @@ Setting the index, will create create a ``CategoricalIndex``
    df2 = df.set_index('B')
    df2.index
 
-Indexing with ``__getitem__/.iloc/.loc/.ix`` works similarly to an ``Index`` with duplicates.
+Indexing with ``__getitem__/.iloc/.loc`` works similarly to an ``Index`` with duplicates.
 The indexers MUST be in the category or the operation will raise.
 
 .. ipython:: python
@@ -759,14 +759,12 @@ same.
    sf = pd.Series(range(5), index=indexf)
    sf
 
-Scalar selection for ``[],.ix,.loc`` will always be label based. An integer will match an equal float index (e.g. ``3`` is equivalent to ``3.0``)
+Scalar selection for ``[],.loc`` will always be label based. An integer will match an equal float index (e.g. ``3`` is equivalent to ``3.0``).
 
 .. ipython:: python
 
    sf[3]
    sf[3.0]
-   sf.ix[3]
-   sf.ix[3.0]
    sf.loc[3]
    sf.loc[3.0]
 
@@ -783,7 +781,6 @@ Slicing is ALWAYS on the values of the index, for ``[],ix,loc`` and ALWAYS posit
 .. ipython:: python
 
    sf[2:4]
-   sf.ix[2:4]
    sf.loc[2:4]
    sf.iloc[2:4]
 
@@ -813,14 +810,6 @@ In non-float indexes, slicing using floats will raise a ``TypeError``
       In [3]: pd.Series(range(5)).iloc[3.0]
       TypeError: cannot do positional indexing on <class 'pandas.indexes.range.RangeIndex'> with these indexers [3.0] of <type 'float'>
 
-   Further the treatment of ``.ix`` with a float indexer on a non-float index, will be label based, and thus coerce the index.
-
-   .. ipython:: python
-
-      s2 = pd.Series([1, 2, 3], index=list('abc'))
-      s2
-      s2.ix[1.0] = 10
-      s2
 
 Here is a typical use-case for using this type of indexing. Imagine that you have a somewhat
 irregular timedelta-like indexing scheme, but the data is recorded as floats. This could for
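
As a reading aid for the float-index hunks above, here is a minimal sketch of how the two remaining indexers behave once ``.ix`` is dropped from the examples. The index values are an assumption made for illustration, since the diff does not show how ``indexf`` is built.

```python
import pandas as pd

# Assumed float index for illustration; the real ``indexf`` is not shown in the diff.
sf = pd.Series(range(5), index=[1.5, 2.0, 3.0, 4.5, 5.0])

# .loc slices on index *values* and includes both endpoints when they are present.
by_label = sf.loc[2:4]        # rows labelled 2.0 and 3.0

# .iloc slices on *positions* with the usual half-open Python semantics.
by_position = sf.iloc[2:4]    # rows at positions 2 and 3 (labels 3.0 and 4.5)

# Under .loc an integer scalar matches the equal float label.
same_scalar = sf.loc[3] == sf.loc[3.0]
```

The same slice ``2:4`` therefore selects different rows depending on the indexer, which is the ambiguity the removed ``.ix`` lines used to hide.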

diff --git a/doc/source/indexing.rst b/doc/source/indexing.rst
@@ -61,6 +61,8 @@ See the :ref:`MultiIndex / Advanced Indexing <advanced>` for ``MultiIndex`` and
 
 See the :ref:`cookbook<cookbook.selection>` for some advanced strategies
 
+.. _indexing.choice:
+
 Different Choices for Indexing
 ------------------------------
 
@@ -104,24 +106,13 @@ of multi-axis indexing.
 
   See more at :ref:`Selection by Position <indexing.integer>`
 
-- ``.ix`` supports mixed integer and label based access. It is primarily label
-  based, but will fall back to integer positional access unless the corresponding
-  axis is of integer type. ``.ix`` is the most general and will
-  support any of the inputs in ``.loc`` and ``.iloc``. ``.ix`` also supports floating point
-  label schemes. ``.ix`` is exceptionally useful when dealing with mixed positional
-  and label based hierarchical indexes.
-
-  However, when an axis is integer based, ONLY
-  label based access and not positional access is supported.
-  Thus, in such cases, it's usually better to be explicit and use ``.iloc`` or ``.loc``.
-
   See more at :ref:`Advanced Indexing <advanced>` and :ref:`Advanced
   Hierarchical <advanced.advanced_hierarchical>`.
 
-- ``.loc``, ``.iloc``, ``.ix`` and also ``[]`` indexing can accept a ``callable`` as indexer. See more at :ref:`Selection By Callable <indexing.callable>`.
+- ``.loc``, ``.iloc``, and also ``[]`` indexing can accept a ``callable`` as indexer. See more at :ref:`Selection By Callable <indexing.callable>`.
 
 Getting values from an object with multi-axes selection uses the following
-notation (using ``.loc`` as an example, but applies to ``.iloc`` and ``.ix`` as
+notation (using ``.loc`` as an example, but applies to ``.iloc`` as
 well). Any of the axes accessors may be the null slice ``:``. Axes left out of
 the specification are assumed to be ``:``. (e.g. ``p.loc['a']`` is equiv to
 ``p.loc['a', :, :]``)
@@ -135,6 +126,48 @@ the specification are assumed to be ``:``. (e.g. ``p.loc['a']`` is equiv to
     DataFrame; ``df.loc[row_indexer,column_indexer]``
     Panel; ``p.loc[item_indexer,major_indexer,minor_indexer]``
 
+.. _indexing.deprecate_ix:
+
+IX Indexer is Deprecated
+------------------------
+
+.. warning::
+
+  Starting in 0.20.0, the ``.ix`` indexer is deprecated in favor of the stricter ``.iloc`` and ``.loc`` indexers. ``.ix`` does a lot of inference about what the user wants to do: it can decide to index *positionally* OR via *labels*. This has caused quite a bit of user confusion over the years.
+
+
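+The recommended methods of indexing are ``.loc`` for *label* indexing and ``.iloc`` for *positional* indexing. The examples below use this frame: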
+
+.. ipython:: python
+
+  dfd = pd.DataFrame({'A': [1, 2, 3],
+                      'B': [4, 5, 6]},
+                     index=list('abc'))
+
+  dfd
+
+Previous Behavior, where you wish to get the 0th and the 2nd elements from the index in the 'A' column.
+
+.. code-block:: ipython
+
+  In [3]: dfd.ix[[0, 2], 'A']
+  Out[3]:
+  a    1
+  c    3
+  Name: A, dtype: int64
+
+Using ``.loc``. Here we will select the appropriate indexes from the index, then use *label* indexing.
+
+.. ipython:: python
+
+  dfd.loc[dfd.index[[0, 2]], 'A']
+
+Using ``.iloc``. Here we will get the location of the 'A' column, then use *positional* indexing to select things.
+
+.. ipython:: python
+
+  dfd.iloc[[0, 2], dfd.columns.get_loc('A')]
+
 .. _indexing.basics:
 
 Basics
@@ -193,7 +226,7 @@ columns.
 
 .. warning::
 
-   pandas aligns all AXES when setting ``Series`` and ``DataFrame`` from ``.loc``, ``.iloc`` and ``.ix``.
+   pandas aligns all AXES when setting ``Series`` and ``DataFrame`` from ``.loc`` and ``.iloc``.
 
    This will **not** modify ``df`` because the column alignment is before value assignment.
 
@@ -526,7 +559,7 @@ Selection By Callable
 
 .. versionadded:: 0.18.1
 
-``.loc``, ``.iloc``, ``.ix`` and also ``[]`` indexing can accept a ``callable`` as indexer.
+``.loc``, ``.iloc``, and also ``[]`` indexing can accept a ``callable`` as indexer.
 The ``callable`` must be a function with one argument (the calling Series, DataFrame or Panel) and that returns valid output for indexing.
 
 .. ipython:: python
@@ -641,7 +674,7 @@ Setting With Enlargement
 
 .. versionadded:: 0.13
 
-The ``.loc/.ix/[]`` operations can perform enlargement when setting a non-existant key for that axis.
+The ``.loc/[]`` operations can perform enlargement when setting a non-existent key for that axis.
 
 In the ``Series`` case this is effectively an appending operation
 
@@ -906,7 +939,7 @@ without creating a copy:
 
 Furthermore, ``where`` aligns the input boolean condition (ndarray or DataFrame),
 such that partial selection with setting is possible. This is analogous to
-partial setting via ``.ix`` (but on the contents rather than the axis labels)
+partial setting via ``.loc`` (but on the contents rather than the axis labels)
 
 .. ipython:: python
 
@@ -1716,7 +1749,7 @@ A chained assignment can also crop up in setting in a mixed dtype frame.
 
 .. note::
 
-   These setting rules apply to all of ``.loc/.iloc/.ix``
+   These setting rules apply to all of ``.loc/.iloc``
 
 This is the correct access method
 

diff --git a/doc/source/whatsnew/v0.20.0.txt b/doc/source/whatsnew/v0.20.0.txt
@@ -10,6 +10,7 @@ users upgrade to this version.
 Highlights include:
 
 - Building pandas for development now requires ``cython >= 0.23`` (:issue:`14831`)
+- The ``.ix`` indexer has been deprecated, see :ref:`here <whatsnew.api_breaking.deprecate_ix>`
 
 Check the :ref:`API Changes <whatsnew_0200.api_breaking>` and :ref:`deprecations <whatsnew_0200.deprecations>` before updating.
 
@@ -122,6 +123,53 @@ Other enhancements
 Backwards incompatible API changes
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
+
+.. _whatsnew.api_breaking.deprecate_ix:
+
+Deprecate .ix
+^^^^^^^^^^^^^
+
+The ``.ix`` indexer is deprecated in favor of the stricter ``.iloc`` and ``.loc`` indexers. ``.ix`` does a lot of inference about what the user wants to do: it can decide to index *positionally* OR via *labels*. This has caused quite a bit of user confusion over the years. The full indexing documentation is :ref:`here <indexing>`. (:issue:`14218`)
+
+
+The recommended methods of indexing are:
+
+- ``.loc`` if you want to *label* index
+- ``.iloc`` if you want to *positionally* index.
+
+Using ``.ix`` will now show a deprecation warning with a mini-example of how to convert code.
+
+.. ipython:: python
+
+  df = pd.DataFrame({'A': [1, 2, 3],
+                     'B': [4, 5, 6]},
+                    index=list('abc'))
+
+  df
+
+Previous Behavior, where you wish to get the 0th and the 2nd elements from the index in the 'A' column.
+
+.. code-block:: ipython
+
+  In [3]: df.ix[[0, 2], 'A']
+  Out[3]:
+  a    1
+  c    3
+  Name: A, dtype: int64
+
+Using ``.loc``. Here we will select the appropriate indexes from the index, then use *label* indexing.
+
+.. ipython:: python
+
+  df.loc[df.index[[0, 2]], 'A']
+
+Using ``.iloc``. Here we will get the location of the 'A' column, then use *positional* indexing to select things.
+
+.. ipython:: python
+
+  df.iloc[[0, 2], df.columns.get_loc('A')]
+
+
 .. _whatsnew.api_breaking.index_map
 
 Map on Index types now return other Index types
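
The whatsnew entry above says that ``.ix`` may index positionally or by label. A small sketch of why that matters, and of the two conversions the entry recommends, might look like this. The integer-indexed series ``s`` is a hypothetical example and is not part of the commit.

```python
import pandas as pd

# Hypothetical series whose integer labels do not match the row positions.
s = pd.Series(['x', 'y', 'z'], index=[2, 1, 0])

by_label = s.loc[0]       # the row *labelled* 0, i.e. the last row
by_position = s.iloc[0]   # the row at *position* 0, i.e. the first row

# The frame from the whatsnew entry, converted both ways.
df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [4, 5, 6]},
                  index=list('abc'))

# Label route: turn positions into labels first, then index with .loc.
via_loc = df.loc[df.index[[0, 2]], 'A']

# Positional route: turn the column label into a position, then use .iloc.
via_iloc = df.iloc[[0, 2], df.columns.get_loc('A')]
```

With an integer index the two indexers disagree, so code that relied on ``.ix`` has to state which meaning it wants. For the string-indexed frame, both routes select the same rows of column ``'A'``.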

diff --git a/pandas/core/frame.py b/pandas/core/frame.py
@@ -1961,7 +1961,7 @@ def _ixs(self, i, axis=0):
             if isinstance(i, slice):
                 # need to return view
                 lab_slice = slice(label[0], label[-1])
-                return self.ix[:, lab_slice]
+                return self.loc[:, lab_slice]
             else:
                 if isinstance(label, Index):
                     return self.take(i, axis=1, convert=True)
@@ -2056,7 +2056,7 @@ def _getitem_array(self, key):
             indexer = key.nonzero()[0]
             return self.take(indexer, axis=0, convert=False)
         else:
-            indexer = self.ix._convert_to_indexer(key, axis=1)
+            indexer = self.loc._convert_to_indexer(key, axis=1)
             return self.take(indexer, axis=1, convert=True)
 
     def _getitem_multilevel(self, key):
@@ -2389,7 +2389,7 @@ def __setitem__(self, key, value):
 
     def _setitem_slice(self, key, value):
         self._check_setitem_copy()
-        self.ix._setitem_with_indexer(key, value)
+        self.loc._setitem_with_indexer(key, value)
 
     def _setitem_array(self, key, value):
         # also raises Exception if object array with NA values
@@ -2400,17 +2400,17 @@ def _setitem_array(self, key, value):
             key = check_bool_indexer(self.index, key)
             indexer = key.nonzero()[0]
             self._check_setitem_copy()
-            self.ix._setitem_with_indexer(indexer, value)
+            self.loc._setitem_with_indexer(indexer, value)
         else:
             if isinstance(value, DataFrame):
                 if len(value.columns) != len(key):
                     raise ValueError('Columns must be same length as key')
                 for k1, k2 in zip(key, value.columns):
                     self[k1] = value[k2]
             else:
-                indexer = self.ix._convert_to_indexer(key, axis=1)
+                indexer = self.loc._convert_to_indexer(key, axis=1)
                 self._check_setitem_copy()
-                self.ix._setitem_with_indexer((slice(None), indexer), value)
+                self.loc._setitem_with_indexer((slice(None), indexer), value)
 
     def _setitem_frame(self, key, value):
         # support boolean setting with DataFrame input, e.g.
@@ -4403,7 +4403,7 @@ def append(self, other, ignore_index=False, verify_integrity=False):
         elif isinstance(other, list) and not isinstance(other[0], DataFrame):
             other = DataFrame(other)
             if (self.columns.get_indexer(other.columns) >= 0).all():
-                other = other.ix[:, self.columns]
+                other = other.loc[:, self.columns]
 
         from pandas.tools.merge import concat
         if isinstance(other, (list, tuple)):
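
For readers unfamiliar with the pattern in the ``append`` hunk above, here is a rough standalone sketch of the column check followed by the label-based reorder. ``target_columns`` is a hypothetical stand-in for ``self.columns``, and the data is made up.

```python
import pandas as pd

# Hypothetical stand-in for ``self.columns`` in the hunk above.
target_columns = pd.Index(['A', 'B'])

# Rows supplied as a list of dicts, with keys in a different order.
other = pd.DataFrame([{'B': 4, 'A': 1}, {'B': 5, 'A': 2}])

# get_indexer returns -1 for any label that is missing from target_columns,
# so ">= 0 everywhere" means every column of ``other`` is known.
if (target_columns.get_indexer(other.columns) >= 0).all():
    # Pure label selection, so .loc is a drop-in replacement for .ix here.
    other = other.loc[:, target_columns]
```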

diff --git a/pandas/core/generic.py b/pandas/core/generic.py
@@ -1809,18 +1809,12 @@ def xs(self, key, axis=0, level=None, drop_level=True):
             loc, new_ax = labels.get_loc_level(key, level=level,
                                                drop_level=drop_level)
 
-            # convert to a label indexer if needed
-            if isinstance(loc, slice):
-                lev_num = labels._get_level_number(level)
-                if labels.levels[lev_num].inferred_type == 'integer':
-                    loc = labels[loc]
-
             # create the tuple of the indexer
             indexer = [slice(None)] * self.ndim
             indexer[axis] = loc
             indexer = tuple(indexer)
 
-            result = self.ix[indexer]
+            result = self.iloc[indexer]
             setattr(result, result._get_axis_name(axis), new_ax)
             return result
 
@@ -1983,7 +1977,7 @@ def drop(self, labels, axis=0, level=None, inplace=False, errors='raise'):
             slicer = [slice(None)] * self.ndim
             slicer[self._get_axis_number(axis_name)] = indexer
 
-            result = self.ix[tuple(slicer)]
+            result = self.loc[tuple(slicer)]
 
         if inplace:
             self._update_inplace(result)
@@ -4332,8 +4326,9 @@ def first(self, offset):
         if not offset.isAnchored() and hasattr(offset, '_inc'):
             if end_date in self.index:
                 end = self.index.searchsorted(end_date, side='left')
+                return self.iloc[:end]
 
-        return self.ix[:end]
+        return self.loc[:end]
 
     def last(self, offset):
         """
@@ -4364,7 +4359,7 @@ def last(self, offset):
 
         start_date = start = self.index[-1] - offset
         start = self.index.searchsorted(start_date, side='right')
-        return self.ix[start:]
+        return self.iloc[start:]
 
     def rank(self, axis=0, method='average', numeric_only=None,
              na_option='keep', ascending=True, pct=False):
@@ -5078,7 +5073,7 @@ def truncate(self, before=None, after=None, axis=None, copy=True):
 
         slicer = [slice(None, None)] * self._AXIS_LEN
         slicer[axis] = slice(before, after)
-        result = self.ix[tuple(slicer)]
+        result = self.loc[tuple(slicer)]
 
         if isinstance(ax, MultiIndex):
             setattr(result, self._get_axis_name(axis),
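
The ``last`` hunk above switches to ``.iloc`` because ``searchsorted`` returns an integer position rather than a label. A minimal sketch of that pattern outside pandas internals, assuming a daily index and a ``Timedelta`` in place of the method's offset object:

```python
import pandas as pd

idx = pd.date_range('2017-01-01', periods=5, freq='D')
ser = pd.Series(range(5), index=idx)

# Assumed offset for illustration; the method itself converts its argument
# to an offset object first.
offset = pd.Timedelta(days=2)

start_date = ser.index[-1] - offset
# Position of the first entry strictly after start_date.
start = ser.index.searchsorted(start_date, side='right')

# ``start`` is a position, so it must go through .iloc, not .loc.
tail = ser.iloc[start:]
```

Here ``start_date`` falls on the third index entry, so ``side='right'`` places the cut just after it and ``tail`` keeps the final two rows.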

diff --git a/pandas/core/groupby.py b/pandas/core/groupby.py
@@ -4103,7 +4103,7 @@ def _chop(self, sdata, slice_obj):
         if self.axis == 0:
             return sdata.iloc[slice_obj]
         else:
-            return sdata._slice(slice_obj, axis=1)  # ix[:, slice_obj]
+            return sdata._slice(slice_obj, axis=1)  # .loc[:, slice_obj]
 
 
 class NDFrameSplitter(DataSplitter):