Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Find Julep: issue with sentinel values #47

Closed
GregPlowman opened this issue Nov 13, 2017 · 26 comments
Closed

Find Julep: issue with sentinel values #47

GregPlowman opened this issue Nov 13, 2017 · 26 comments
Labels

Comments

@GregPlowman
Copy link

Originally posted as comment on commit, suggested to open issue instead.
cc @nalimilan

Find Julep

Extract from section Issues Beyond the Scope of This Julep

  • Sentinel values in a world where array indices do not necessarily start with 1:
    • findfirst(x, v) returns 0 if no value matching v is found; however, if x allows 0 as an index, the meaning of 0 is ambiguous. One could return typemin(Int) or minimum(linearindices(x))-1, but what if x starts indexing at typemin(Int)?
    • No matter [what] sentinel value gets returned, the deprecation strategy here is delicate. There may be a lot of code that checks the return value and compares it to 0.

It would be nice if supporting non-1-based indices was in the scope of this Julep, and it might be better to address non-1-based arrays before their use becomes more widespread.

Note that using first(linearindices(A))-1 as a sentinel value would be non-breaking for standard 1-based arrays.
If A starts indexing at typemin(Int), then returned sentinel value would wrap-around to typemax(Int) (i.e. no error).

The calling code could then test for the sentinel value in the same way:

i = findfirst(!iszero, A)
if i != first(linearindices(A))-1
  ...
end

To simplify, could introduce new function sentinelindex, and use this consistently instead:

sentinelindex(A) = first(linearindices(A))-1
i = findfirst(!iszero, A)
if i != sentinelindex(A)
  ...
end
@nalimilan
Copy link
Member

@timholy @StefanKarpinski Do you think that's something we should consider for 0.7?

@ararslan ararslan added the Find label Nov 22, 2017
@ararslan
Copy link
Member

It would likely be a painful deprecation, but with efficient unions we could have findfirst return nothing when no match is found.

@Sacha0
Copy link
Member

Sacha0 commented Nov 22, 2017

Ref. JuliaLang/julia#24673 (comment). Best!

@nalimilan
Copy link
Member

@timholy Do you want this in 1.0? Now is the last chance to push for this change. ;-)

@StefanKarpinski
Copy link
Member

It does seem like it may be a better approach.

@timholy
Copy link
Member

timholy commented Jan 5, 2018

Yes, it would be much better not to use 0 as a sentinel value. Returning nothing has a lot of merit because with existing code you'll get errors rather than wrong values.

@nalimilan
Copy link
Member

@timholy Would you have time to make a pull request? I'm pretty swamped with my 1.0 items already (but I can help reviewing it).

BTW, something I realized is that for AbstractDict and other types whose indices/keys can be of type Nothing, returning nothing is ambiguous. I think that's fine since in most cases the key type does not contain Nothing, but we could support e.g. findfirst(Some, pred, dict), in which case the returned value would be Union{Some{T}, Nothing}.

@timholy
Copy link
Member

timholy commented Jan 8, 2018

Yes, now that I reached a fixed point on the broadcasting code, I'll try to get to it.

I haven't followed find-related issues. What's the meaning of the 3-argument variant of findfirst? Link?

@nalimilan
Copy link
Member

Yes, now that I reached a fixed point on the broadcasting code, I'll try to get to it.

Great!

I haven't followed find-related issues. What's the meaning of the 3-argument variant of findfirst? Link?

It doesn't exist yet, that was just a proposal for how we could handle dictionaries whose key type includes Nothing. But that's for post-1.0 I think.

@andyferris
Copy link
Member

So... I feel a little heretical - but isn't Nullable the exact thing you want for when you might expect not to get a valid return in some cases? Of course, there's similar approaches such as a pair of results or the Union{Some{T}, Nothing} approach. (Did Some end up in Base?)

It seems a little dodgy to not allow for certain keys for generic indexables in the generic case (there's also a sentinel Symbol used in Base). I think this is the kind of situation where you want to follow some solid software engineering practices, and use Some or Nullable.

@StefanKarpinski
Copy link
Member

Perhaps, but that complicates the API considerably and it's hard to imagine a case where nothing would be a valid or useful index value for an array-like object. Sure, it could for something like a Dict where keys can be anything, but I think that designing these APIs for the most generic possible structure will make us crazy.

@jrevels
Copy link
Member

jrevels commented Jan 9, 2018

In similar cases in my own code, I've taken to creating a singleton type for sentinel values (e.g. struct NotFound end). It's totally unambiguous, since the only intended use for the instance of the singleton type is as the sentinel value.

@StefanKarpinski
Copy link
Member

It's only unambiguous because you don't use NotFound objects as keys. We can't guarantee that for Dicts for example.

@vtjnash
Copy link
Member

vtjnash commented Jan 9, 2018

I wonder if there's some parallel to get here, where I think we will want to eventually have:

get(Dict, Key, Default) = Value || Default
get(Dict, Key) = Value || throw(Error)
get?(Dict, Key) = Some(Value) || nothing

@StefanKarpinski
Copy link
Member

I think there's a somewhat fundamental split between containers like arrays where the keys are known to be something simple like an Int where lookup is quite fast. Not only does one want a simpler sentinel value for such structures, but they're also the kinds of structures where you may just want to return the key since getting the associated value is quite efficient. For more complex structures, the key can be anything and lookup may be non-trivial. In such cases, not only might you want to distinguish nothing as a key from nothing as a sentinel, but you also may want to either return key, value pairs since lookup is expensive or return some kind of token object for which lookup is more efficient than for keys.

@jrevels
Copy link
Member

jrevels commented Jan 9, 2018

It's only unambiguous because you don't use NotFound objects as keys. We can't guarantee that for Dicts for example.

I mean that it's unambiguous from an API standpoint - the singleton/sentinel type only has a single purpose/use case that it is explicitly defined for.

The ambiguity problem arises when a user would be using a specific value for their own purpose, and then your API claims that value as a sentinel value. It seems to me that a commonly punned-upon singleton value like nothing would be prone to this issue. If your API just defines its own singleton type for the sentinel value, however, this problem mostly goes away.

Of course, you can't force the user not to abuse your API's singleton/sentinel value, but IMO it'd be far less likely to get abused than something like nothing.

@ararslan
Copy link
Member

ararslan commented Jan 9, 2018

Would it be incredibly annoying to deprecate the use of nothing as an index and require it to be Some(nothing)?

@StefanKarpinski
Copy link
Member

I mean that it's unambiguous from an API standpoint - the singleton/sentinel type only has a single purpose/use case that it is explicitly defined for.

Yes, but it's not uncommon for very generic code to put completely arbitrary objects into dictionaries that may have been arrived at by non-explicit means like reflection on the program state. You cannot guarantee that a NotFound object will never be used as a key in such situations unless you disallow using it as a key (in which case you have to treat it as a special case in such generic code).

@jrevels
Copy link
Member

jrevels commented Jan 9, 2018

it's not uncommon for very generic code to put completely arbitrary objects into dictionaries that may have been arrived at by non-explicit means like reflection on the program state.

This is a well-articulated point, and I agree that there could be sentinel-value-dependent bugs in generic code regardless of which sentinel value an API selects.

Generic code eventually boils down to type-dependent code, though, and at that point, it seems to me that a API-specific singleton sentinel value (e.g. NotFound()) is more likely than alternatives (e.g. nothing) to cause an error when misused, rather than silently take the wrong (EDIT: "unintended" is probably a better word) code path. Specifically, it seems that NotFound() would arrive at a MethodError earlier than nothing would, since there would likely be many more methods defined on Nothing than NotFound in general.

For the record, I'm not actually very opposed to nothing as a sentinel value - in practice, it'd probably be fine either way - I just thought the issue merited consideration.

@timholy
Copy link
Member

timholy commented Jan 9, 2018

Definitely merits consideration. But I think that no matter what you picked, you could get the "disallowed" value used as a key if we support enough container types. All it takes is this:

found_keys = OrderedSet()
i = findifrst(f, container)      # if i is NotFound() this will later lead to trouble
push!(found_keys, i)
...
for k in found_keys
    kindex = findfirst(iequal(k), found_keys)   # ambiguous

@timholy
Copy link
Member

timholy commented Jan 9, 2018

I wonder if there's some parallel to get here

In #25472 I initially tried adding an optional senintel argument to find* functions. Obviously the sentinel would have to be untyped. The thing that made me back off of that was the scary ambiguity situation. If we wanted to specialize findnext on, say CartesianIndex keys, having a sentinel greatly increases the number of stupid versions of the functions you have to add to resolve ambiguities.

@jrevels
Copy link
Member

jrevels commented Jan 9, 2018

I don't think that - and am not trying to argue that - the chosen sentinel value should/could be disallowed as an element/key of arbitrary containers. I realize now that that's been the central point of contention here, so I should have clarified that earlier, my bad. That wasn't even the kind of situation I was thinking about.

Rather, I was thinking of the problems that arise in situations like:

result = findfirst(f, container)
if some_application_specific_predicate(result)
    do_something(result, container)
end 

where some_application_specific_predicate and do_something could contain generically written business logic. If a sentinel-value-related bug happened to be encountered internally to these functions, it seems likely that NotFound() would result in a MethodError more quickly/often than nothing would, since nothing has more uses/responsibilities than NotFound() does (i.e. has more methods directly defined on it).

@nalimilan
Copy link
Member

If a sentinel-value-related bug happened to be encountered internally to these functions, it seems likely that NotFound() would result in a MethodError more quickly/often than nothing would, since nothing has more uses/responsibilities than NotFound() does (i.e. has more methods directly defined on it).

nothing is precisely used in situations where it must not propagate silently already, e.g. as the return value of match or tryparse when they fail. Very few methods are defined on ::Nothing for this reason. I think the only method which specifically accepts it is coalesce, which is specially designed for this case.

@jrevels
Copy link
Member

jrevels commented Jan 9, 2018

True. I feel that it's not too uncommon for external packages to adopt nothing for their own sentinel value purposes, in which case that's where the ambiguities would likely come from, rather than from Base itself. However, I could easily be wrong on that hunch, and I don't feel like scouring METADATA for proof.

At this point, it seems as though the idea of an API-specific sentinel value has been given (perhaps more than) its fair due of consideration, so I happily concede to the nothing camp :)

@nalimilan
Copy link
Member

Would it be incredibly annoying to deprecate the use of nothing as an index and require it to be Some(nothing)?

@ararslan I think so. People don't necessarily anticipate that their code could be passed nothing by a caller, possibly through several layers. It would be problematic to throw an error when nothing is encountered, as it could happen only at runtime, or when a user tries a very specific thing.

That said, the same problem would arise if we allow nothing as an index, as it could be misinterpreted as indicating the absence of a result, which cannot be detected before runtime. It seems that the only practical way of preventing this kind of problem would be to return Union{Some{K}, Nothing} for AbstractDict{K} where K >: Nothing. This could either happen automatically, or by throwing an error and requiring people to use a specific find* function/method.

@nalimilan
Copy link
Member

I've just realized indexin uses 0 as a sentinel. I guess we should change it to use nothing too? It could also use the same index type as other find* functions for consistency.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

9 participants