-
-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API: warning to raise KeyError in the future if not all elements of a list are selected via .loc #17295
Conversation
still lots of warnings. |
Codecov Report
@@ Coverage Diff @@
## master #17295 +/- ##
==========================================
- Coverage 91.25% 91.24% -0.02%
==========================================
Files 163 163
Lines 49859 49866 +7
==========================================
Hits 45499 45499
- Misses 4360 4367 +7
Continue to review full report at Codecov.
|
4b523f4
to
e5f80e3
Compare
ok this is ready to go. |
doc/source/whatsnew/v0.21.0.txt
Outdated
|
||
s.loc[[1, 2]] | ||
|
||
Previously Behavior |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previous Behavior
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No?!
Cool! Notice however that the warning is not being raised for In [2]: pd.DataFrame([[i] for i in range(3)], index=pd.MultiIndex.from_product([[1], [5,6,7]])).loc[[(1,5), (3,7)]]
Out[2]:
0
1 5 0.0
3 7 NaN I assume/hope that we will want to change the behavior with partial indexing too (despite the current behaviour being different): In [3]: pd.DataFrame([[i] for i in range(3)], index=pd.MultiIndex.from_product([[1], [5,6,7]])).loc[[1,3]]
Out[3]:
0
1 5 0
6 1
7 2 So maybe this too should raise a warning (although the whatsnew notes would maybe require some addition). Can go in another PR however. |
yeah this doesn't cover MultiIndex cases at all, but agree it should/could. welcome for PR's on top. |
One more thought: users might have in their code (I suspect I do) things like a.loc[list_like].dropna() ... in which case the suggested replacement is not |
@toobaz in your comment #17295 (comment) yes I think the warning could refer to a new section in the docs (like we do for .ix). where a more extended example could be. (and include this effect as well). let me add. |
updated with nicer message and doc section. |
doc/source/indexing.rst
Outdated
|
||
.. warning:: | ||
|
||
Starting in 0.21.0, using ``.loc`` with missing keys in a list. This is deprecated, in favor of ``.reindex``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"using .loc
with missing keys in a list. This is deprecated" -> "using .loc
with a list-like containing missing keys"?
doc/source/indexing.rst
Outdated
s.reindex([1, 2, 3]) | ||
|
||
|
||
Alternatively if you want to selected only *valid* keys, the following is idiomatic; furthermore this will preserve the dtype of the selection. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
want to select
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe I would also replace "this will preserve the dtype of the selection" with "this is more efficient, and is guaranteed to preserve the dtype of the selection"
I think it is OK, just the two sentences commented above and the "Previously Behavior" sound a bit weird. |
ok pushed. @jorisvandenbossche @TomAugspurger if you'd have a look. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. One concern I just realized is we may be stranding users with duplicates in their index.
In [15]: s = pd.Series(1, index=['a', 'a', 'b', 'c'])
In [16]: s.loc[['c', 'd']]
Out[16]:
c 1.0
d NaN
dtype: float64
In [17]: s.reindex(['c', 'd'])
...
ValueError: cannot reindex from a duplicate axis
I haven't kept up with whether reindexing into a Index with dupes is supposed to work on not...
doc/source/indexing.rst
Outdated
|
||
.. warning:: | ||
|
||
Starting in 0.21.0, using ``.loc`` with a list-like containing missing keys. This is deprecated, in favor of ``.reindex``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can remove “this”
doc/source/indexing.rst
Outdated
s = Series([1, 2, 3]) | ||
s | ||
|
||
Selection with all keys found is unchanged. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would move this one to the end. Good to have in this section though.
doc/source/indexing.rst
Outdated
Current Behavior | ||
|
||
In [4]: s.loc[[1, 2, 3]] | ||
/Users/jreback/miniconda3/envs/pandas/bin/ipython:1: FutureWarning: passing list-likes to .loc with any non-matching elements will raise |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could remove most of your user path here
doc/source/whatsnew/v0.21.0.txt
Outdated
3 NaN | ||
dtype: float64 | ||
|
||
The idiomatic way to achieve selecting potentially not-found elmenents if via ``.reindex()`` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If -> is
Indeed, it can't work in general - assume you had done So maybe the |
(Edited my previous comment, which in that form didn't make any sense) |
changed; that is quite idiomatic |
fixed according to @TomAugspurger and @toobaz comments. |
(Will take a look at this tomorrow evening) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only did a new pass through the docs, will look at the rest tomorrow.
doc/source/indexing.rst
Outdated
@@ -333,8 +333,15 @@ Selection By Label | |||
|
|||
dfl.loc['20130102':'20130104'] | |||
|
|||
.. warning:: | |||
|
|||
Starting in 0.21.0, pandas will show a ``FutureWarning`` if indexing with a list-of-lables and not ALL labels are present. In the future |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lables -> labels
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still al 'lables' -> labels (the first occurrence of the word)
doc/source/indexing.rst
Outdated
.. warning:: | ||
|
||
Starting in 0.21.0, pandas will show a ``FutureWarning`` if indexing with a list-of-lables and not ALL labels are present. In the future | ||
this will raise a ``KeyError``. See :ref:`list-like Using loc with missing keys in a list is Deprecated <indexing.deprecate_loc_reindex_listlike>` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: Using -> using (a capital in the middle of the sentence is a bit strange)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the text within the ref does not read as grammatically correct. Maybe the 'list-like' needs to be removed? (list is mentioned later in the sentence)
doc/source/indexing.rst
Outdated
|
||
.. warning:: | ||
|
||
Starting in 0.21.0, using ``.loc`` with a list-like containing missing key, is deprecated, in favor of ``.reindex``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not only loc
, also []
doc/source/indexing.rst
Outdated
|
||
.. ipython:: python | ||
|
||
s = Series([1, 2, 3]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this one
doc/source/indexing.rst
Outdated
|
||
.. ipython:: python | ||
|
||
s.loc[s.index & ['c', 'd']] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Repeating my comment:
- I would use
intersection
instead of&
- this is not a correct example (the alternative you provide does not give the same result as the deprecated one. As I say above I am not sure if there is actually an alternative, but if there is not, we should just remove this example)
doc/source/whatsnew/v0.21.0.txt
Outdated
|
||
Previously, selecting at least 1 valid key with a list-like indexer would succeed and return ``NaN`` for non-found elements. | ||
This is exactly the function of ``.reindex()``. This will now show a ``FutureWarning`` message; in the future this will raise ``KeyError`` (:issue:`15747`). | ||
This warning will trigger on a ``DataFrame`` or a ``Series`` for using ``.loc[]`` on the Index, or ``[[]]`` on a ``Series``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would remove on the Index
, as you don't call loc
on the index.
Also, maybe we could just say that it is triggered for both .loc[]
and []
(without needing to make the distinction of dataframe and series. Could mention after that that []
on dataframe already raises KeyError now)
pandas/core/indexing.py
Outdated
# code, so we want to avoid warning & then | ||
# just raising | ||
_missing_key_warning = textwrap.dedent(""" | ||
Passing list-likes to .loc with any non-matching labels will raise |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not only .loc
, also []
(now you get this error message also when using s[]
, which will be confusing)
updated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code looks good!
Added another bunch of doc comments.
Also, there are still two warnings in the docs (I didn't check whether you just need to add :okwarning:
or rather need to update the example):
>>>-------------------------------------------------------------------------
Warning in /tmp/doc/source/advanced.rst at block ending on line 1015
Specify :okwarning: as an option in the ipython:: block to suppress this message
----------------------------------------------------------------------------
/home/travis/build/pandas-dev/pandas/pandas/core/series.py:694: FutureWarning:
Passing list-likes to .loc or [] with any missing label will raise
KeyError in the future, you can use .reindex() as an alternative.
See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate-loc-reindex-listlike
return self.loc[key]
<<<-------------------------------------------------------------------------
>>>-------------------------------------------------------------------------
Warning in /tmp/doc/source/indexing.rst at block ending on line 963
Specify :okwarning: as an option in the ipython:: block to suppress this message
----------------------------------------------------------------------------
/home/travis/build/pandas-dev/pandas/pandas/core/series.py:694: FutureWarning:
Passing list-likes to .loc or [] with any missing label will raise
KeyError in the future, you can use .reindex() as an alternative.
See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate-loc-reindex-listlike
return self.loc[key]
<<<-------------------------------------------------------------------------
doc/source/indexing.rst
Outdated
@@ -333,8 +333,15 @@ Selection By Label | |||
|
|||
dfl.loc['20130102':'20130104'] | |||
|
|||
.. warning:: | |||
|
|||
Starting in 0.21.0, pandas will show a ``FutureWarning`` if indexing with a list-of-lables and not ALL labels are present. In the future |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still al 'lables' -> labels (the first occurrence of the word)
doc/source/indexing.rst
Outdated
.. warning:: | ||
|
||
Starting in 0.21.0, pandas will show a ``FutureWarning`` if indexing with a list-of-lables and not ALL labels are present. In the future | ||
this will raise a ``KeyError``. See :ref:`list-like Using loc with missing keys in a list is Deprecated <indexing.deprecate_loc_reindex_listlike>` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the text within the ref does not read as grammatically correct. Maybe the 'list-like' needs to be removed? (list is mentioned later in the sentence)
doc/source/indexing.rst
Outdated
|
||
Current Behavior | ||
|
||
In [4]: s.loc[[1, 2, 3]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing .. code-block:: ipython
doc/source/indexing.rst
Outdated
s = pd.Series(np.arange(4), index=['a', 'a', 'b', 'c']) | ||
labels = ['c', 'd'] | ||
|
||
.. code-block:: python |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
python -> ipython
doc/source/indexing.rst
Outdated
|
||
However, this would *still* raise if your resulting index is duplicated. | ||
|
||
.. code-block:: python |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
python -> ipython
doc/source/whatsnew/v0.21.0.txt
Outdated
@@ -268,6 +268,63 @@ We have updated our minimum supported versions of dependencies (:issue:`15206`, | |||
| Bottleneck | 1.0.0 | | | |||
+--------------+-----------------+----------+ | |||
|
|||
.. _whatsnew_0210.api_breaking.loc: | |||
|
|||
Indexing with missing list-of-labels is Deprecated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe "Indexing with list with missing labels .. " ? (I find the 'missing list-of-labels' a bit confusing)
doc/source/whatsnew/v0.21.0.txt
Outdated
|
||
Current Behavior | ||
|
||
In [4]: s.loc[[1, 2, 3]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing .. code-block:: ipython
pandas/io/formats/excel.py
Outdated
|
||
# all missing, raise | ||
if not len(Index(cols) & df.columns): | ||
raise KeyError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you provide an error message?
pandas/io/formats/excel.py
Outdated
# 1 missing is ok | ||
# TODO(jreback) this should raise | ||
# on *any* missing columns | ||
self.df = df.reindex(columns=cols) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This indeed currently works, but I think it is certainly OK to change that. Or maybe also raise futurewarning as with loc?
push to fix all @jorisvandenbossche comments. I deprecated passing 1 or more missing |
a4fb72c
to
db52a4f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comment about wording in error message. For the rest good to merge!
pandas/io/formats/excel.py
Outdated
warnings.warn( | ||
"columns must be a subset of the " | ||
"dataframe columns; this will raise " | ||
"a KeyError in the future", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When you only see that error message, it is not directly clear that on the the names you passed in columns
is missing (you directly say the result). Can you clarify that?
Maybe like "Not all names specified in 'columns' are found(or present); this will raise a KeyError in the future"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
ok pushed |
… list are selected via .loc closes pandas-dev#15747
This is really nice, thanks @jreback ! |
… list are selected via .loc (pandas-dev#17295) closes pandas-dev#15747
… list are selected via .loc (pandas-dev#17295) closes pandas-dev#15747
… list are selected via .loc (pandas-dev#17295) closes pandas-dev#15747
closes #15747