Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

argmax behaves differently from other higher-order functions #48502

Open
timholy opened this issue Feb 2, 2023 · 14 comments
Open

argmax behaves differently from other higher-order functions #48502

timholy opened this issue Feb 2, 2023 · 14 comments
Milestone

Comments

@timholy
Copy link
Member

timholy commented Feb 2, 2023

argmax has two meanings:

Both of these are useful and well-defined operations. But they should be considered with respect to Julia's overall consistency: it's worth asking how argmax(f, itr) compares to other such functions that accept a "mapping" function f as an input. For example, sum(f, itr) computes sum(f(x) for x in itr), maximum(f, itr) computes maximum(f(x) for x in itr), and so on. One simple consequence is that these functions have a very natural property: sum(identity, itr) == sum(itr).

So, is argmax(identity, a) == argmax(a)?

julia> argmax(identity, [10, 30, 20])
30

julia> argmax([10, 30, 20])
2

Nope. How about "is argmax(f, itr) equivalent to argmax(f(x) for x in itr)"?

julia> a = [1, 5, NaN, 3]
4-element Vector{Float64}:
   1.0
   5.0
 NaN
   3.0

julia> f(x) =  isnan(x) ? oftype(x, -Inf) : x   # use a sentinel value for NaN

julia> argmax(f, a)
5.0

Again nope: if it did, it would return the index i such that a[i] == maximum(f, a).

Personally, I think this is a footgun. If we ever decide to go to Julia 2.0, I think we should reconsider whether these two methods belong to the same function.

One potential solution is that we could have argmax(itr) become synonymous with argmax(identity, itr) and embrace the newer meaning of argmax. Then, noting that findmax(f, a) does act in a manner consistent with other mapping functions, we could just eliminate the original meaning of argmax, as argmax seems to sometimes be used as a substitute for findmax(A)[2]. But I'm less fond of having to remember whether the index is the first or second output; hence we could switch to having it return a NamedTuple (we might be able to do that even now).

@timholy timholy added this to the 2.0 milestone Feb 2, 2023
@oscardssmith
Copy link
Member

IMO, the correct 2.0 solution is getting rid of all the other ones like sum(f, x) which only exist because we don't currently have a good way of doing reductions generically.

@timholy
Copy link
Member Author

timholy commented Feb 2, 2023

AFAICT it's just a specialization bottleneck. We could fix that today, but the cost is longer compile times.

@timholy
Copy link
Member Author

timholy commented Feb 2, 2023

But that wouldn't change the fact that argmax means two different things. Are you saying that if we got rid of the others, then the disconnect isn't so bad? It's certainly worse because of the analogy to other higher-order functions, but I'm not sure it's great even if we do follow your proposal.

@jakobnissen
Copy link
Contributor

In my opinion, the methods like sum(f, x) and maximum(f, x) are unnecessary. They are using valuable methods just as syntactical sugar for map+sum and map+maximum, respectively. As I see it, the mapping operation is orthogonal, and need not be extra methods of all these higher order functions.

@timholy
Copy link
Member Author

timholy commented Feb 2, 2023

@jakobnissen, see #48502 (comment)

@BioTurboNick
Copy link
Contributor

BioTurboNick commented Feb 2, 2023

Worth linking the previous related discussions?
#27613
#39203

@Seelengrab
Copy link
Contributor

Seelengrab commented Feb 2, 2023

Also #41339 with some summarized discussion of the previous discussions

@timholy
Copy link
Member Author

timholy commented Feb 2, 2023

Given that the point made by @oscardssmith and @jakobnissen is natural but misses the main point, maybe it's worth thinking about this way: how would you express the 1-arg method in terms of the 2-arg method? The answer is

argmax(itr) = argmax(idx -> itr[idx], CartesianIndices(itr))

Note that the itr moves into the mapping function and only barely contributes to the collection being iterated over. To me this is a clear sign that these are actually two different functions.

@BioTurboNick
Copy link
Contributor

Given that the point made by @oscardssmith and @jakobnissen is natural but misses the main point, maybe it's worth thinking about this way: how would you express the 1-arg method in terms of the 2-arg method? The answer is

argmax(itr) = argmax(idx -> itr[idx], CartesianIndices(itr))

Note that the itr moves into the mapping function and only barely contributes to the collection being iterated over. To me this is a clear sign that these are actually two different functions.

I suggested something similar, though slightly less general, in one of those posts.

@aplavin
Copy link
Contributor

aplavin commented Feb 2, 2023

That's totally a footgun!
I also tried to convince others not to introduce argmax(f, A) but failed. See a long discussion at https://discourse.julialang.org/t/findmax-and-friends-confusing-behaviour-to-be-introduced-in-1-7/61904.
There's direct empirical evidence that it's a footgun: even popular packages sometimes implement argmax wrong, even after almost two years of argmax(f, A) - see the end of that discourse thread.

If starting from scratch: an intuitive, unambiguous, and consistent interface could be indmax([f], X)/keymax to return the index of the maximum, and maximum(A, [by=f]) to return the maximizing element. Leaving argmax to specialized optimization/minimization packages.

@JeffBezanson
Copy link
Member

This has been discussed many times. I would summarize as

  • argmax is defined as an operation on a function, not a collection/set, and argmax(f, s) matches the mathematical definition, so it should stay
  • sum(f, s) etc. are non-standard and are hacks, those were mistakes
  • Adding indmax seems like a good compromise; there is no particular downside except "too many functions"

@timholy
Copy link
Member Author

timholy commented Feb 7, 2023

Agreed that argmax(f, s) is the better definition, even though it was added later.

@MasonProtter
Copy link
Contributor

MasonProtter commented May 31, 2023

I'm really surprised to see people complain about sum(f, itr). I think that functions like that are fantastic because new users typically don't understand what the hell map, reduce, foldl and filter are and it takes a surprisingly long time to wrap ones head around them. Supplying sum(f, itr) is a really nice stepping stone IMO.

But even as an experienced user I find those functions really convenient and natural.

@jariji
Copy link
Contributor

jariji commented May 31, 2023

@MasonProtter If Julia had concise reduction syntax, it could be easier to use even than sum(f, itr). But I don't want to derail the thread - happy to continue elsewhere.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

9 participants