DEPR: deprecate .ix in favor of .loc/.iloc
closes #14218
closes #15116
jreback committed Jan 12, 2017
1 parent 0fe491d commit 3cf48d3
Showing 79 changed files with 1,597 additions and 1,366 deletions.
23 changes: 6 additions & 17 deletions doc/source/advanced.rst
@@ -230,7 +230,7 @@ of tuples:
Advanced indexing with hierarchical index
-----------------------------------------

Syntactically integrating ``MultiIndex`` in advanced indexing with ``.loc/.ix`` is a
Syntactically integrating ``MultiIndex`` in advanced indexing with ``.loc`` is a
bit challenging, but we've made every effort to do so. For example, the
following works as you would expect:

@@ -258,7 +258,7 @@ Passing a list of labels or tuples works similar to reindexing:

.. ipython:: python
df.ix[[('bar', 'two'), ('qux', 'one')]]
df.loc[[('bar', 'two'), ('qux', 'one')]]
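A minimal, self-contained sketch of the list-of-tuples selection above. The frame built here is an assumption made for illustration (the ``df`` used in the documentation is defined outside this hunk):

```python
import numpy as np
import pandas as pd

# Hypothetical two-level index; the names and values are illustrative only.
index = pd.MultiIndex.from_product(
    [['bar', 'baz', 'qux'], ['one', 'two']],
    names=['first', 'second'],
)
df = pd.DataFrame(np.arange(12).reshape(6, 2), index=index, columns=['A', 'B'])

# Each tuple addresses one full (first, second) label pair.
subset = df.loc[[('bar', 'two'), ('qux', 'one')]]
print(subset)
```

Rows come back in the order the tuples were passed, which is why this reads like a reindex.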
.. _advanced.mi_slicers:

@@ -604,7 +604,7 @@ intended to work on boolean indices and may return unexpected results.
ser = pd.Series(np.random.randn(10))
ser.take([False, False, True, True])
ser.ix[[0, 1]]
ser.iloc[[0, 1]]
Finally, as a small note on performance, because the ``take`` method handles
a narrower range of inputs, it can offer performance that is a good deal
@@ -620,7 +620,7 @@ faster than fancy indexing.
timeit arr.take(indexer, axis=0)

ser = pd.Series(arr[:, 0])
timeit ser.ix[indexer]
timeit ser.iloc[indexer]
timeit ser.take(indexer)
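As a rough check that the two spellings being timed select the same data, the sketch below compares ``take`` and ``.iloc`` on a small Series. The data and positions are made up for illustration, and no timing claim is implied:

```python
import numpy as np
import pandas as pd

ser = pd.Series(np.arange(10) * 1.5)
positions = [0, 3, 7]

# Both calls are purely positional; take simply accepts a narrower set of inputs.
taken = ser.take(positions)
picked = ser.iloc[positions]
print(taken.equals(picked))
```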

.. _indexing.index_types:
@@ -661,7 +661,7 @@ Setting the index will create a ``CategoricalIndex``
df2 = df.set_index('B')
df2.index
Indexing with ``__getitem__/.iloc/.loc/.ix`` works similarly to an ``Index`` with duplicates.
Indexing with ``__getitem__/.iloc/.loc`` works similarly to an ``Index`` with duplicates.
The indexers MUST be in the category or the operation will raise.

.. ipython:: python
@@ -759,14 +759,12 @@ same.
sf = pd.Series(range(5), index=indexf)
sf
Scalar selection for ``[],.ix,.loc`` will always be label based. An integer will match an equal float index (e.g. ``3`` is equivalent to ``3.0``)
Scalar selection for ``[],.loc`` will always be label based. An integer will match an equal float index (e.g. ``3`` is equivalent to ``3.0``)
.. ipython:: python
sf[3]
sf[3.0]
sf.ix[3]
sf.ix[3.0]
sf.loc[3]
sf.loc[3.0]
@@ -783,7 +781,6 @@ Slicing is ALWAYS on the values of the index, for ``[],ix,loc`` and ALWAYS posit
.. ipython:: python
sf[2:4]
sf.ix[2:4]
sf.loc[2:4]
sf.iloc[2:4]
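The label-versus-position distinction on a float index can be sketched as follows. The index values are an assumption chosen for illustration, since ``indexf`` is defined outside this hunk:

```python
import pandas as pd

# Hypothetical float index; any monotonic float labels would do.
sf = pd.Series(range(5), index=[1.5, 2.0, 3.0, 4.5, 5.0])

by_int = sf.loc[3]         # label lookup: integer 3 matches label 3.0
by_float = sf.loc[3.0]     # same label, same value
by_pos = sf.iloc[3]        # positional: fourth element (label 4.5)

label_slice = sf.loc[2:4]  # labels between 2 and 4, i.e. 2.0 and 3.0
pos_slice = sf.iloc[2:4]   # positions 2 and 3, i.e. labels 3.0 and 4.5
```

The two slices share the same bounds but select different rows, which is the ambiguity ``.ix`` used to resolve by inference.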
@@ -813,14 +810,6 @@ In non-float indexes, slicing using floats will raise a ``TypeError``
In [3]: pd.Series(range(5)).iloc[3.0]
TypeError: cannot do positional indexing on <class 'pandas.indexes.range.RangeIndex'> with these indexers [3.0] of <type 'float'>
Further the treatment of ``.ix`` with a float indexer on a non-float index, will be label based, and thus coerce the index.
.. ipython:: python
s2 = pd.Series([1, 2, 3], index=list('abc'))
s2
s2.ix[1.0] = 10
s2
Here is a typical use-case for using this type of indexing. Imagine that you have a somewhat
irregular timedelta-like indexing scheme, but the data is recorded as floats. This could for
77 changes: 2 additions & 75 deletions doc/source/gotchas.rst
@@ -214,27 +214,6 @@ and traded integer ``NA`` capability for a much simpler approach of using a
special value in float and object arrays to denote ``NA``, and promoting
integer arrays to floating when NAs must be introduced.

Integer indexing
----------------

Label-based indexing with integer axis labels is a thorny topic. It has been
discussed heavily on mailing lists and among various members of the scientific
Python community. In pandas, our general viewpoint is that labels matter more
than integer locations. Therefore, with an integer axis index *only*
label-based indexing is possible with the standard tools like ``.ix``. The
following code will generate exceptions:

.. code-block:: python
s = pd.Series(range(5))
s[-1]
df = pd.DataFrame(np.random.randn(5, 4))
df
df.ix[-2:]
This deliberate decision was made to prevent ambiguities and subtle bugs (many
users reported finding bugs when the API change was made to stop "falling back"
on position-based indexing).

Label-based slicing conventions
-------------------------------
@@ -305,15 +284,15 @@ index can be somewhat complicated. For example, the following does not work:

::

s.ix['c':'e'+1]
s.loc['c':'e'+1]

A very common use case is to limit a time series to start and end at two
specific dates. To enable this, we made the design decision to make label-based
slicing include both endpoints:

.. ipython:: python
s.ix['c':'e']
s.loc['c':'e']
This is most definitely a "practicality beats purity" sort of thing, but it is
something to watch out for if you expect label-based slicing to behave exactly
@@ -322,58 +301,6 @@ in the way that standard Python integer slicing works.
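To make the endpoint convention concrete, here is a small sketch contrasting a label slice with a positional slice. The Series is an assumption for illustration (the ``s`` in the documentation is defined outside this hunk):

```python
import pandas as pd

s = pd.Series(range(6), index=list('abcdef'))

by_label = s.loc['c':'e']   # label slice: both endpoints included -> c, d, e
by_position = s.iloc[2:4]   # positional slice: stop excluded -> c, d
print(by_label)
print(by_position)
```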
Miscellaneous indexing gotchas
------------------------------

Reindex versus ix gotchas
~~~~~~~~~~~~~~~~~~~~~~~~~

Many users will find themselves using the ``ix`` indexing capabilities as a
concise means of selecting data from a pandas object:

.. ipython:: python
df = pd.DataFrame(np.random.randn(6, 4), columns=['one', 'two', 'three', 'four'],
index=list('abcdef'))
df
df.ix[['b', 'c', 'e']]
This is, of course, completely equivalent *in this case* to using the
``reindex`` method:

.. ipython:: python
df.reindex(['b', 'c', 'e'])
Some might conclude that ``ix`` and ``reindex`` are 100% equivalent based on
this. This is indeed true **except in the case of integer indexing**. For
example, the above operation could alternately have been expressed as:

.. ipython:: python
df.ix[[1, 2, 4]]
If you pass ``[1, 2, 4]`` to ``reindex`` you will get another thing entirely:

.. ipython:: python
df.reindex([1, 2, 4])
So it's important to remember that ``reindex`` is **strict label indexing
only**. This can lead to some potentially surprising results in pathological
cases where an index contains, say, both integers and strings:

.. ipython:: python
s = pd.Series([1, 2, 3], index=['a', 0, 1])
s
s.ix[[0, 1]]
s.reindex([0, 1])
Because the index in this case does not contain solely integers, ``ix`` falls
back on integer indexing. By contrast, ``reindex`` only looks for the values
passed in the index, thus finding the integers ``0`` and ``1``. While it would
be possible to insert some logic to check whether a passed sequence is all
contained in the index, that logic would exact a very high cost in large data
sets.

Reindex potentially changes underlying Series dtype
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

69 changes: 51 additions & 18 deletions doc/source/indexing.rst
@@ -61,6 +61,8 @@ See the :ref:`MultiIndex / Advanced Indexing <advanced>` for ``MultiIndex`` and

See the :ref:`cookbook<cookbook.selection>` for some advanced strategies

.. _indexing.choice:

Different Choices for Indexing
------------------------------

@@ -104,24 +106,13 @@ of multi-axis indexing.

See more at :ref:`Selection by Position <indexing.integer>`

- ``.ix`` supports mixed integer and label based access. It is primarily label
based, but will fall back to integer positional access unless the corresponding
axis is of integer type. ``.ix`` is the most general and will
support any of the inputs in ``.loc`` and ``.iloc``. ``.ix`` also supports floating point
label schemes. ``.ix`` is exceptionally useful when dealing with mixed positional
and label based hierarchical indexes.

However, when an axis is integer based, ONLY
label based access and not positional access is supported.
Thus, in such cases, it's usually better to be explicit and use ``.iloc`` or ``.loc``.

See more at :ref:`Advanced Indexing <advanced>` and :ref:`Advanced
Hierarchical <advanced.advanced_hierarchical>`.

- ``.loc``, ``.iloc``, ``.ix`` and also ``[]`` indexing can accept a ``callable`` as indexer. See more at :ref:`Selection By Callable <indexing.callable>`.
- ``.loc``, ``.iloc``, and also ``[]`` indexing can accept a ``callable`` as indexer. See more at :ref:`Selection By Callable <indexing.callable>`.

Getting values from an object with multi-axes selection uses the following
notation (using ``.loc`` as an example, but applies to ``.iloc`` and ``.ix`` as
notation (using ``.loc`` as an example, but applies to ``.iloc`` as
well). Any of the axes accessors may be the null slice ``:``. Axes left out of
the specification are assumed to be ``:``. (e.g. ``p.loc['a']`` is equiv to
``p.loc['a', :, :]``)
@@ -135,6 +126,48 @@ the specification are assumed to be ``:``. (e.g. ``p.loc['a']`` is equiv to
DataFrame; ``df.loc[row_indexer,column_indexer]``
Panel; ``p.loc[item_indexer,major_indexer,minor_indexer]``

.. _indexing.deprecate_ix:

IX Indexer is Deprecated
------------------------

.. warning::

Starting in 0.20.0, the ``.ix`` indexer is deprecated, in favor of the more strict ``.iloc`` and ``.loc`` indexers. ``.ix`` offers a lot of magic on the inference of what the user wants to do. To wit, ``.ix`` can decide to index *positionally* OR via *labels*. This has caused quite a bit of user confusion over the years.


The recommended methods of indexing are:

.. ipython:: python
dfd = pd.DataFrame({'A': [1, 2, 3],
'B': [4, 5, 6]},
index=list('abc'))
dfd
Previous Behavior, where you wish to get the 0th and the 2nd elements from the index in the 'A' column.

.. code-block:: ipython
In [3]: dfd.ix[[0, 2], 'A']
Out[3]:
a 1
c 3
Name: A, dtype: int64
Using ``.loc``. Here we will select the appropriate indexes from the index, then use *label* indexing.

.. ipython:: python
dfd.loc[dfd.index[[0, 2]], 'A']
Using ``.iloc``. Here we will get the location of the 'A' column, then use *positional* indexing to select things.

.. ipython:: python
dfd.iloc[[0, 2], dfd.columns.get_loc('A')]
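The two replacements can be put side by side in one runnable sketch, using the same ``dfd`` frame as above. The multi-column case at the end goes beyond what this section states; it relies on ``Index.get_indexer``, which returns the integer positions of several labels at once:

```python
import pandas as pd

dfd = pd.DataFrame({'A': [1, 2, 3],
                    'B': [4, 5, 6]},
                   index=list('abc'))

# Label route: translate positions 0 and 2 into labels, then index by label.
by_label = dfd.loc[dfd.index[[0, 2]], 'A']

# Positional route: translate the column label into a position, then index by position.
by_position = dfd.iloc[[0, 2], dfd.columns.get_loc('A')]

# Several columns at once: get_indexer maps a list of labels to positions.
col_positions = dfd.columns.get_indexer(['A', 'B'])
both = dfd.iloc[[0, 2], col_positions]
```

Either route states explicitly which axis is addressed by label and which by position, which is the inference ``.ix`` used to make silently.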
.. _indexing.basics:

Basics
@@ -193,7 +226,7 @@ columns.

.. warning::

pandas aligns all AXES when setting ``Series`` and ``DataFrame`` from ``.loc``, ``.iloc`` and ``.ix``.
pandas aligns all AXES when setting ``Series`` and ``DataFrame`` from ``.loc`` and ``.iloc``.

This will **not** modify ``df`` because the column alignment is before value assignment.

@@ -526,7 +559,7 @@ Selection By Callable

.. versionadded:: 0.18.1

``.loc``, ``.iloc``, ``.ix`` and also ``[]`` indexing can accept a ``callable`` as indexer.
``.loc``, ``.iloc``, and also ``[]`` indexing can accept a ``callable`` as indexer.
The ``callable`` must be a function with one argument (the calling Series, DataFrame or Panel) and that returns valid output for indexing.

.. ipython:: python
@@ -641,7 +674,7 @@ Setting With Enlargement

.. versionadded:: 0.13

The ``.loc/.ix/[]`` operations can perform enlargement when setting a non-existent key for that axis.
The ``.loc/[]`` operations can perform enlargement when setting a non-existent key for that axis.

In the ``Series`` case this is effectively an appending operation

@@ -906,7 +939,7 @@ without creating a copy:

Furthermore, ``where`` aligns the input boolean condition (ndarray or DataFrame),
such that partial selection with setting is possible. This is analogous to
partial setting via ``.ix`` (but on the contents rather than the axis labels)
partial setting via ``.loc`` (but on the contents rather than the axis labels)

.. ipython:: python
@@ -1716,7 +1749,7 @@ A chained assignment can also crop up in setting in a mixed dtype frame.

.. note::

These setting rules apply to all of ``.loc/.iloc/.ix``
These setting rules apply to all of ``.loc/.iloc``

This is the correct access method

48 changes: 48 additions & 0 deletions doc/source/whatsnew/v0.20.0.txt
@@ -10,6 +10,7 @@ users upgrade to this version.
Highlights include:

- Building pandas for development now requires ``cython >= 0.23`` (:issue:`14831`)
- The ``.ix`` indexer has been deprecated, see :ref:`here <whatsnew.api_breaking.deprecate_ix>`

Check the :ref:`API Changes <whatsnew_0200.api_breaking>` and :ref:`deprecations <whatsnew_0200.deprecations>` before updating.

@@ -122,6 +123,53 @@ Other enhancements
Backwards incompatible API changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


.. _whatsnew.api_breaking.deprecate_ix:

Deprecate .ix
^^^^^^^^^^^^^

The ``.ix`` indexer is deprecated, in favor of the more strict ``.iloc`` and ``.loc`` indexers. ``.ix`` offers a lot of magic on the inference of what the user wants to do. To wit, ``.ix`` can decide to index *positionally* OR via *labels*. This has caused quite a bit of user confusion over the years. The full indexing documentation is :ref:`here <indexing>`. (:issue:`14218`)


The recommended methods of indexing are:

- ``.loc`` if you want to *label* index
- ``.iloc`` if you want to *positionally* index.

Using ``.ix`` will now show a deprecation warning with a mini-example of how to convert code.

.. ipython:: python

df = pd.DataFrame({'A': [1, 2, 3],
'B': [4, 5, 6]},
index=list('abc'))

df

Previous Behavior, where you wish to get the 0th and the 2nd elements from the index in the 'A' column.

.. code-block:: ipython

In [3]: df.ix[[0, 2], 'A']
Out[3]:
a 1
c 3
Name: A, dtype: int64

Using ``.loc``. Here we will select the appropriate indexes from the index, then use *label* indexing.

.. ipython:: python

df.loc[df.index[[0, 2]], 'A']

Using ``.iloc``. Here we will get the location of the 'A' column, then use *positional* indexing to select things.

.. ipython:: python

df.iloc[[0, 2], df.columns.get_loc('A')]


.. _whatsnew.api_breaking.index_map

Map on Index types now return other Index types