Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecating Series.argmin and Series.argmax (#16830) #16955

Merged
merged 13 commits into from
Sep 27, 2017

Conversation

lphk92
Copy link
Contributor

@lphk92 lphk92 commented Jul 15, 2017

  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

This is one step toward closing issue #16830. After this, I will post another pull request containing the code to implement the expected behavior of argmax and argmin for both Series and DataFrame.

@gfyoung
Copy link
Member

gfyoung commented Jul 15, 2017

After this, I will post another pull request containing the code to implement the expected behavior of argmax and argmin for both Series and DataFrame.

Do note that that PR will not be merged for some time because we need at least one major release of time before we change behavior like that.

@gfyoung
Copy link
Member

gfyoung commented Jul 15, 2017

Also, you should add tests to make sure that these warnings get issued (we should have tests already for argmin and argmax in the code-base).

@lphk92
Copy link
Contributor Author

lphk92 commented Jul 15, 2017

Also, you should add tests to make sure that these warnings get issued (we should have tests already for argmin and argmax in the code-base).

Done

Do note that that PR will not be merged for some time because we need at least one major release of time before we change behavior like that.

Yea I figured that would be the case. If it's okay with you I'll still put up the PR so that the code is there and ready to be rebased when the time comes.

@gfyoung
Copy link
Member

gfyoung commented Jul 15, 2017

If it's okay with you I'll still put up the PR so that the code is there and ready to be rebased when the time comes.

That works, though it will be up to you primarily to keep it rebased and merge-able in the interim. We can perhaps make reference to the PR in our deprecations tracker.

result = np.argmin(Series(data))
assert result == np.argmin(data)

with pytest.warns(FutureWarning):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use tm.assert_produces_warning(FutureWarning). We don't use pytest warning context manager.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@gfyoung gfyoung added 2/3 Compat API Design Deprecate Functionality to remove in pandas and removed 2/3 Compat labels Jul 15, 2017
@codecov
Copy link

codecov bot commented Jul 15, 2017

Codecov Report

❗ No coverage uploaded for pull request base (master@2cd85ca). Click here to learn what that means.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master   #16955   +/-   ##
=========================================
  Coverage          ?   90.99%           
=========================================
  Files             ?      161           
  Lines             ?    49288           
  Branches          ?        0           
=========================================
  Hits              ?    44849           
  Misses            ?     4439           
  Partials          ?        0
Flag Coverage Δ
#multiple 88.76% <100%> (?)
#single 40.2% <100%> (?)
Impacted Files Coverage Δ
pandas/core/series.py 94.89% <100%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2cd85ca...2b7ebbc. Read the comment docs.

@codecov
Copy link

codecov bot commented Jul 15, 2017

Codecov Report

Merging #16955 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #16955      +/-   ##
==========================================
- Coverage   91.25%   91.24%   -0.02%     
==========================================
  Files         163      163              
  Lines       49808    49810       +2     
==========================================
- Hits        45454    45447       -7     
- Misses       4354     4363       +9
Flag Coverage Δ
#multiple 89.03% <100%> (ø) ⬆️
#single 40.32% <80%> (-0.06%) ⬇️
Impacted Files Coverage Δ
pandas/core/series.py 94.93% <100%> (ø) ⬆️
pandas/util/_decorators.py 78% <100%> (ø) ⬆️
pandas/io/formats/format.py 96.07% <100%> (ø) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.73% <0%> (-0.1%) ⬇️
pandas/core/generic.py 92.07% <0%> (ø) ⬆️
pandas/core/indexes/base.py 96.29% <0%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e0fe5cc...426d8eb. Read the comment docs.

@jreback
Copy link
Contributor

jreback commented Jul 15, 2017

looks like #16964 covers this.

@jreback jreback closed this Jul 15, 2017
@jreback
Copy link
Contributor

jreback commented Jul 16, 2017

@lphk92 hmm. So the question is to we just to a public change that may break code, or deprecate it for a cycle, and then the change which may break code. In the past what we have done is introduced a new method, e.g. was the case with .sort() -> .sort_values() so we side-stepped the issue.

But here we want to use argmin/max.

Any thoughts @TomAugspurger @jorisvandenbossche @gfyoung

@TomAugspurger
Copy link
Contributor

I think merge this for 0.21 and #16964 for 0.22

@TomAugspurger TomAugspurger reopened this Jul 16, 2017
@gfyoung
Copy link
Member

gfyoung commented Jul 16, 2017

@jreback : The PR description above outlines @lphk92 's plan of action. Deprecate for this version and modify in the subsequent one.

I agree with this and therefore am onboard with @TomAugspurger here.

@gfyoung
Copy link
Member

gfyoung commented Jul 16, 2017

I think merge this for 0.21 and #16964 for 0.22

Is there going to be a 0.22? I thought we were going straight to 1.0.

@gfyoung
Copy link
Member

gfyoung commented Jul 16, 2017

Not sure yet why Appveyor isn't catching the warning, especially the first time around. Also, this error seems persistent because it occurred twice now.

My first suspicion is that there are other places where argmin and argmax are called with Series that are not being caught (trying searching for the terms via GitHub search in this repository). I think there might be some based on what I saw.

@TomAugspurger
Copy link
Contributor

One thing I didn't really appreciate yesterday, is this will show a warning even for np.argmin(Series), the correct way to do things now. We'll need to sort that out... somehow.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. ping on green.

result = np.argmin(Series(data))
assert result == np.argmin(data)

with tm.assert_produces_warning(FutureWarning, check_stacklevel=False):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add the issue deprecation issue as a comment

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make this as a separate function (these deprecation tests)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since all functions in this test will cause the deprecation warning, if I break them into a separate test, what should be left in here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nothing much, maybe even rename the test a bit to reflect what you are testing

test_numpy_argmin_deprecated or somesuch

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add the issue deprecation issue as a comment

@lphk92 : you should also address this comment from @jreback

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gfyoung I thought that was what I was doing by adding the comments stating that the deprecation warning was also occurring in np.argmax. Could you clarify where you would like a comment added?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What he means is just reference the issue number beneath the function definition e.g. "see gh-16830"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh gotcha, thanks for the clarification. I will add that in the next commit.

@@ -116,6 +116,8 @@ Other API Changes
Deprecations
~~~~~~~~~~~~
- :func:`read_excel()` has deprecated ``sheetname`` in favor of ``sheet_name`` for consistency with ``.to_excel()`` (:issue:`10559`).
- :method:`Series.argmax` has been deprecated in favor of :method:`Series.idxmax` (:issue:`16830`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use :func: instead

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback Note that in 'theory', :method: is more correct here (as idxmax is a method, not a function), but in practice both work, so it is not that important

data = np.random.randint(0, 11, size=10)
result = np.argmax(Series(data))
assert result == np.argmax(data)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here


with tm.assert_produces_warning(FutureWarning):
# argmin is aliased to idxmin
Series(data).argmin()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a test on the result as well (and same for argmin)

@TomAugspurger
Copy link
Contributor

Just to make sure this doesn't get lost, there's an issue with this current approach, as np.argmin(arr), etc. will dispatch to arr.argmin, which warns immediately.

In [1]: import pandas as pd; import numpy as np
s = ^[[A
In [2]: s = pd.Series([1, 2])

In [3]: s
Out[3]:
0    1
1    2
dtype: int64

In [4]: np.argmin(s)
/Users/taugspurger/Envs/pandas-dev/lib/python3.6/site-packages/numpy/core/fromnumeric.py:57: FutureWarning: argmin is deprecated. Use idxmin instead
  return getattr(obj, method)(*args, **kwds)
Out[4]: 0

These functions do take *args and **kwargs, so we could slip in a __pandas__deprecated kwargs, to control whether or not to warn. Something like

diff --git a/pandas/core/series.py b/pandas/core/series.py
index 5294031be..cfda05486 100644
--- a/pandas/core/series.py
+++ b/pandas/core/series.py
@@ -1287,6 +1287,8 @@ class Series(base.IndexOpsMixin, strings.StringAccessorMixin,
         DataFrame.idxmax
         numpy.ndarray.argmax
         """
+        if '__pandas_deprecated__' in kwargs:
+            warnings.warn(FutureWarning, 'message')
         skipna = nv.validate_argmax_with_skipna(skipna, args, kwargs)
         i = nanops.nanargmax(_values_from_object(self), skipna=skipna)
         if i == -1:
@@ -1294,7 +1296,10 @@ class Series(base.IndexOpsMixin, strings.StringAccessorMixin,
         return self.index[i]
 
     # ndarray compat
-    argmin = deprecate('argmin', idxmin)
+    def argmin(self, axis=None, skipna=True, *args, **kwargs):
+        return self.idxmin(axis=None, skipna=skipna, __pandas__deprecated=True,
+                           *args, **kwargs)
+
     argmax = deprecate('argmax', idxmax)
 
     def round(self, decimals=0, *args, **kwargs):

@lphk92 currently this fails in validate_kwargs in pandas/pandas/util/_validators.py, but you may be able to get it to work. Mind trying it out?

@TomAugspurger
Copy link
Contributor

And then your test should assert that np.argmin(series) does not emit a warning.

@jreback
Copy link
Contributor

jreback commented Jul 16, 2017

@TomAugspurger

Just to make sure this doesn't get lost, there's an issue with this current approach, as np.argmin(arr), etc. will dispatch to arr.argmin, which warns immediately.

I don't see a problem with this. I would merge this PR as is.

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Jul 16, 2017 via email

@gfyoung
Copy link
Member

gfyoung commented Jul 16, 2017

We don't want to warn when the user is correctly using argmin. If you prefer we could only warn once, with an option to always disable the warning.

I agree with @TomAugspurger that we should avoid this. However, the warning occurs because np.argmin calls the argmin attribute of the object if that exists. Thus, the only way to avoid it is to remove the argmin method completely...

Perhaps we should update the message to say something about argmin being called from numpy is okay. That's the only thing that comes to mind ATM.

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Jul 16, 2017 via email

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. not sure why this is failing. pls rebase.

- ``pd.options.html.border`` has been deprecated in favor of ``pd.options.display.html.border`` (:issue:`15793`).

- :func:`SeriesGroupBy.nth` has deprecated ``True`` in favor of ``'all'`` for its kwarg ``dropna`` (:issue:`11038`).

Series.argmax and Series.argmin
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a ref to this subsection. make the title something like

Series.argmin/max are deprecated.

we've deprecated the current behavior of :func:`Series.argmax` and
:func:`Series.argmin`. Using either of these will emit a ``FutureWarning``.

If you were using ``Series.argmin`` and ``Series.argmax``, please switch to using
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would remove these last 2 paragraphs. keep it short and sweet.

@TomAugspurger
Copy link
Contributor

OK, something very strange was going on with the tests. I could only reproduce with python 2.7 when running the sparse tests first:

pytest pandas/tests/sparse/test_series.py::TestSparseSeriesAnalytics::test_deprecated_numpy_func_call  pandas/tests/series/test_analytics.py::TestSeriesAnalytics::test_numpy_argmin_deprecated --tb=short

Running the tests/series/test_analytics.py and then the sparse was fine. Hopefully my last commit makes it work either way.

There may be some warnings to cleanup to. Looking into those now.

@TomAugspurger
Copy link
Contributor

Still failed, not sure why. I'm ok with skipping those tests python 2 since

  1. it's working
  2. this will be removed in the next version

data = np.random.randint(0, 11, size=10)
result = np.argmin(Series(data))
assert result == np.argmin(data)
@pytest.mark.skipif(PY2, reason="Buggy assertion on warning")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

usually having to do this means that you are NOT catching the actual assertion somewhere else. IOW there is another place where it IS showing the warning, but its not being caught.

@jorisvandenbossche
Copy link
Member

If you look at the travis log, there are quite some places in other tests that raise this warning, and which probably should be fixed

@jorisvandenbossche
Copy link
Member

In series/test_operators.py, there are test_assert_argminmax_raises and test_argminmax_with_inf. But should those just catch the warning, or should they be changed to test idxmin/idxmax instead ?

@TomAugspurger
Copy link
Contributor

Ok, the tets are (finally) all green. We used argmax internally in https://github.com/pandas-dev/pandas/pull/16955/files#diff-db91df05b120c5eb39d90899b54b5321R348 when we wanted idxmax.

I found one more warning from a test where I failed to switch to idxmax. Merging later today if it's green.

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!
Just two minor comments

# until the implemention of Series.argmin is corrected.
result = np.argmin(s)

assert result == 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are you asserting here directly to 0, and below to the result of expected = s.idxmin(). Both can use the same expected value? (I see in the recent commits you changed this)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just changed them all to compare directly to scalars. I think that's preferable.

# See gh-16830
data = np.arange(10)

s = Series(data)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this was like this before, but it would actually be good to use eg index=np.arange(1, 11), just to have a distinction between positional and label based argmin

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you changed it correctly. It's the index of the Series that should be changed, not the actual data. Because now the tests should fail (the argmin should still be 0 (first location or first index label), while you changed it to assert to 1)

@@ -599,7 +599,7 @@ def to_string(self):
else: # max_cols == 0. Try to fit frame to terminal
text = self.adj.adjoin(1, *strcols).split('\n')
row_lens = Series(text).apply(len)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this just be

max_len = Series(text).str.len().max() ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests seem to pass locally with this.

@jreback
Copy link
Contributor

jreback commented Sep 26, 2017

minor comment, otherwise lgtm.

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Sep 26, 2017 via email

@TomAugspurger
Copy link
Contributor

All green, and no new warnings in the summary. Merging.

Thanks @lphk92!

@TomAugspurger TomAugspurger merged commit f9d88cd into pandas-dev:master Sep 27, 2017
alanbato pushed a commit to alanbato/pandas that referenced this pull request Nov 10, 2017
…s-dev#16955)

* Deprecating Series.argmin and Series.argmax (pandas-dev#16830)

Added statements about correcting behavior in future commit

Add reference to github ticket

Fixing placement of github comment

Made test code more explicit

Fixing unrelated tests that are also throwing warnings

Updating whatsnew to give more detail about deprecation

Fixing whatsnew and breaking out tests to catch warnings

Additional comments and more concise whatsnew

Updating deprecate decorator to support custom message

DOC: Update docstrings, depr message, and whatsnew

* Added debug prints

* Try splitting the filters

* Reword whatsnew

* Change sparse series test

* Skip on py2

* Change to idxmin

* Remove py2 skips

* Catch more warnings

* Final switch to idxmax

* Consistent tests, refactor to_string

* Fixed tests
No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017
…s-dev#16955)

* Deprecating Series.argmin and Series.argmax (pandas-dev#16830)

Added statements about correcting behavior in future commit

Add reference to github ticket

Fixing placement of github comment

Made test code more explicit

Fixing unrelated tests that are also throwing warnings

Updating whatsnew to give more detail about deprecation

Fixing whatsnew and breaking out tests to catch warnings

Additional comments and more concise whatsnew

Updating deprecate decorator to support custom message

DOC: Update docstrings, depr message, and whatsnew

* Added debug prints

* Try splitting the filters

* Reword whatsnew

* Change sparse series test

* Skip on py2

* Change to idxmin

* Remove py2 skips

* Catch more warnings

* Final switch to idxmax

* Consistent tests, refactor to_string

* Fixed tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design Deprecate Functionality to remove in pandas
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants