Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rename search and find functions? #5664

Closed
johnmyleswhite opened this issue Feb 4, 2014 · 13 comments
Closed

rename search and find functions? #5664

johnmyleswhite opened this issue Feb 4, 2014 · 13 comments
Labels
breaking This change will break code needs decision A decision on this change is needed
Milestone

Comments

@johnmyleswhite
Copy link
Member

To me, this seems like the wrong behavior for searchsortedfirst:

julia> searchsortedfirst([1, 2, 3], 0)
1
@ivarne
Copy link
Sponsor Member

ivarne commented Feb 4, 2014

It seems like the same answer I would give if you asked me instead of Julia

Base.searchsortedfirst(a, x, [by=<transform>,] [lt=<comparison>,] [rev=false])

   Returns the index of the first value in "a" greater than or equal
   to "x", according to the specified order. Returns "length(a)+1"
   if "x" is greater than all values in "a".

1 is the index of 1 which is the first object in the list that is greater than 0.

@johnmyleswhite
Copy link
Member Author

Interesting. I would have assumed searchsortedfirst would produce the equivalent of first(searchsorted(a, x)). Guess that's so easy it's not worth having.

@ivarne
Copy link
Sponsor Member

ivarne commented Feb 4, 2014

I did not assume anything, but was very glad to find that the documentation was consistent with the behaviour.

I still can't seem to get the connection between the name and the behaviour, so that might be something worth a discussion, as this is an exported function in Base. It is not really a plus for a language to have stdlib functions that is confusing with regard to what they do.

@kmsquire
Copy link
Member

kmsquire commented Feb 4, 2014

In fact, searchsorted calls both searchsortedfirst and searchsortedlast (in
an efficient manner).

Stefan had actually suggested not exporting these at one point, but got
some pushback. I think it's because searchsorted always creates a
Range1object, so if you don't need the actual range, these two
functions are more
efficient, and are being used in the wild.

(But if you can suggest better names...)

Kevin

On Mon, Feb 3, 2014 at 5:40 PM, Ivar Nesje notifications@github.com wrote:

I did not assume anything, but was very glad to find that the
documentation was consistent with the behaviour.

I still can't seem to get the connection between the name and the
behaviour, so that might be something worth a discussion, as this is an
exported function in Base. It is not really a plus for a language to have
stdlib functions that is confusing with regard to what they do.


Reply to this email directly or view it on GitHubhttps://github.com//issues/5664#issuecomment-34022543
.

@ivarne
Copy link
Sponsor Member

ivarne commented Feb 4, 2014

I would describe the function's behavior as "index of the first element in a greater than x". What about firstgreaterthan?

@kmsquire
Copy link
Member

kmsquire commented Feb 4, 2014

I would describe the function's behavior as "index of the first element in a greater than x". What about firstgreaterthan?

So, it's actually firstgreaterthanorequal (and lastlessthanorequal). Might work in German. ;-)

@aelg
Copy link

aelg commented Feb 4, 2014

I think the C++ standard library calls these lower_bound and upper_bound (upper_bound returns first greater than, but that is because of indexing style differences in the languages)

@ivarne
Copy link
Sponsor Member

ivarne commented Feb 4, 2014

Sad that passing comparison functions to functions is so slow. After thinking about it, it might be more flexible to have a firstcmp (or just first) function that behaves like this function if you pass >
firstcmp(a,x,<), or just first(A, a -> (a < 3) )

@johnmyleswhite johnmyleswhite reopened this Feb 4, 2014
@johnmyleswhite
Copy link
Member Author

Since people keep discussing this, I reopened it. Here's my two cents.

  • I'm pretty unclear what the boundary between search* and find* prefixed functions is.
  • Following up on @aelg's point, I'd think findlowerbound and findupperbound would be clearer names for this function.

@quinnj
Copy link
Member

quinnj commented Feb 4, 2014

They could also be bsearchfirst and bsearchlast.

@JeffBezanson
Copy link
Sponsor Member

That's a good point. Clearly the find functions grew out of the original find (indexes of nonzeros), which seems to be a pretty horrible waste of that very useful name.

The situation is worse than I thought and it looks like we need some serious renaming here.

The basic functions seem to be find, findall, findfirst, findnext, and findprev. I don't see a reason these shouldn't work on all indexable (and in some cases, iterable) objects. findmin and findmax can be generalized to something like findmost. Then there are couple special functions (e.g. findlowerbound) that require sorted collections. That covers all the generic find and search functions.

We have a slightly odd convention that find(a, b) looks for b inside a, except for predicates, in which case the predicate comes first (probably related to do syntax). We do seem to be consistent about this though.

I found this definition:

find(testf::Function, x) = find(testf(x))

which seems to assume that testf is vectorized over x. This is a good example of non-composability arising from implicit vectorization.

@JeffBezanson
Copy link
Sponsor Member

Also, I advocate designing around the most natural and general versions of these functions, with performance-related specializations tacked on later as needed. For example, the current findin should be written as findall(x->(x in b), a). A specialization for convenience/performance would then be findall_in(a, b). That way you can think in terms of the general, sensible functions, but easily grab specialized versions when they're available.

It then becomes possible to add findfirst_in, etc., which we probably wouldn't do, but at least the name would be obvious if we did.

@JeffBezanson
Copy link
Sponsor Member

Closing in favor of #10593.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking This change will break code needs decision A decision on this change is needed
Projects
None yet
Development

No branches or pull requests

7 participants