Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add get(coll, key) for Associative returning Nullable #18211

Closed
wants to merge 1 commit into from

Conversation

dbeach24
Copy link
Contributor

This returns a Nullable value corresponding to the key (or null).
Includes specialization for dictionaries.
Fixes #13055

@@ -6,6 +6,13 @@ const secret_table_token = :__c782dbf1cf4d6a2e5e3865d7e95634f2e09b5902__

haskey(d::Associative, k) = in(k,keys(d))

"""
getnull(d::Associative, key)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if exported, this should also go in the rst docs - see CONTRIBUTING.md for full explanation, but you put the signature there then run make docs and the docstring content should get populated over.

@vtjnash
Copy link
Member

vtjnash commented Aug 24, 2016

would the names tryget or trygetindex be better than getnull (e.g. like tryparse)?

@vtjnash
Copy link
Member

vtjnash commented Aug 24, 2016

or even just define this is 2-argument get?

@TotalVerb
Copy link
Contributor

TotalVerb commented Aug 24, 2016

This should get its own name. Otherwise, it will not generalize to multidimensional indexed arrays.

Technically this is trygetindex, though I can't see tryget used for anything else.

@dbeach24
Copy link
Contributor Author

dbeach24 commented Aug 24, 2016

@vtjnash @TotalVerb Based on your comments, I changed the name to tryget(collection, key).
@tkelman I have updated the docs as you suggested.

Please let me know if you find any other issues.

@@ -698,6 +706,11 @@ function get{K,V}(default::Callable, h::Dict{K,V}, key)
return (index < 0) ? default() : h.vals[index]::V
end

function tryget{K,V}(h::Dict{K,V}, key)
index = ht_keyindex(h, key)
return (index<0) ? Nullable{V}() : Nullable{V}(h.vals[index]::V)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this more efficient than the abstract definition above? the generic one should be correct here right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This avoids computing the hash twice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it avoids computing the hash and looking up the slot in the table twice.
@tkelman Please see the original issue #13055.

@StefanKarpinski StefanKarpinski changed the title Added implementation of getnull(coll, key) for Associative. Add tryget(coll, key) for Associative returning Nullable Aug 25, 2016
@@ -452,6 +456,10 @@ let d = ImmutableDict{String, String}(),
@test get(d4, "key1", :default) === v2
@test get(d4, "foo", :default) === :default
@test get(d, k1, :default) === :default
@test tryget(d1, "key1") === Nullable{String}(v1)
Copy link
Contributor Author

@dbeach24 dbeach24 Aug 25, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tkelman tryget(::ImmutableDict, key) is tested here.

@eschnett
Copy link
Contributor

The name tryget sounds as if this function was connected to a try statement. I'd rather have something like nullable in the name, as in get_nullable. This function is unlikely to be used too frequently, and making its action explicit in its name is a good thing.

@dbeach24
Copy link
Contributor Author

@eschnett tryget was chosen because it is consistent with the existing tryparse (which also returns a Nullable).

As for the function being unlikely to be used, I will point out that in Python, for example, dict.get(key) returns None if the key is not present, and this is not considered an esoteric use of dictionaries. Of course, Python doesn't have to worry about type stability, so it is simply the default case of dict.get(key, default=None).

Julia seems to lack a type-stable equivalent, which is what motivated this addition.

@eschnett
Copy link
Contributor

I understand and agree with the motivation. What I meant to say is that the difference in function name length between tryget and get_nullable is here not that important because most codes will not contain tens of calls to get_nullable in a single function, and get_nullable is easier to understand.

@quinnj
Copy link
Member

quinnj commented Aug 25, 2016

FWIW, I'm not a fan of tryget either, mainly because I don't liketryparse either. Eventually, I think we should do parse(Nullable{Int}, x) if you want a Nullable back instead of needing a separate method.

I wonder if it'd be worth considering an alternative method like:

julia> Base.get{T}(::Type{T}, dict, key, default) = haskey(dict, key) ? T(dict[key]) : T(default)

julia> t = Dict{String,Int}("a"=>1, "b"=>2, "c"=>3)
Dict{String,Int64} with 3 entries:
  "c" => 3
  "b" => 2
  "a" => 1

julia> get(Nullable{Float64}, t, "a", 0)
Nullable{Float64}(1.0)

julia> get(Nullable{Float64}, t, "d", 0)
Nullable{Float64}(0.0)

julia> get(Int8, t, "a", 0)
1

julia> get(Int8, t, "d", 0)
0

It's nice because: 1) it doesn't require any new methods and 2) it generalizes to non-Nullable situations where I might just want to ensure I always get the same type out of a dict (think about Dict{String,Any} or something)

@dbeach24
Copy link
Contributor Author

@eschnett @quinnj I'd like to narrow the scope of this PR. Can we please move debates about renaming the existing try... functions to a different issue?

I'm trying to resolve the narrow issue of avoiding repeat lookups in a dictionary when both testing for a key and retrieving its value. When @vtjnash pointed out the existing tryparse function, I tried to be consistent with that in choosing the name tryget.

If the community later concludes that all the try... functions are bad, and, for example, that get with four arguments is better, then someone can come along and implement those changes in a separate PR.

@StefanKarpinski
Copy link
Member

I think the name tryget is more consistent with what we already have. If we come up with a better scheme for expressing this kind of thing (which I think we should), then maybe we can change it, but for now, this name seems as good as any other.

@quinnj
Copy link
Member

quinnj commented Aug 25, 2016

I still think 4-arg get is worth considering, even at this point, since it avoids introducing a completely new method to Base in addition to being more functional than the currently proposed tryget.

@dbeach24
Copy link
Contributor Author

If we must go there, then I'll say that:

x = get(Nullable{Int64}, dict, "myKey", Nullable{Int64}())

makes my eyes bleed. More general or not, I would want to have a succinct and readable way of expressing this. Spelling out the Nullable-decorated value type of the dictionary (twice!) is something only a Java programmer could love. (duck!)

x = tryget(dict, "myKey")

(or getnull or even get_nullable) seems far more readable to me.

@StefanKarpinski
Copy link
Member

Maybe just get(Nullable, dict, "key")?

@dbeach24
Copy link
Contributor Author

@StefanKarpinski Is there any precedent in the language for doing something like that? Admittedly, it's no longer than get_nullable(dict, "key")... What would the definition look like?

@martinholters
Copy link
Member

Maybe just get(Nullable, dict, "key")?

Nice and concise. Combined with @quinnj's proposal, it would just be something like

get(::Type{Nullable}, dict, key) = get(Nullable{valtype(dict)}, dict, key)
function get{T}(::Type{Nullable{T}}, dict, key, default=Nullable{T}())
# ...
end

I guess?

It only seems the semantics would be ambiguous if someone decides to store Nullables in a dict.

@TotalVerb
Copy link
Contributor

It would be less ambiguous (and more general) if it were get(f, dict, i). The neutral element could perhaps be determined through multiple dispatch.

@dbeach24
Copy link
Contributor Author

dbeach24 commented Aug 26, 2016

@TotalVerb So we would presumably define:

f() = Nullable{Int}()   # neutral element
f(x) = Nullable{Int}(x) # present element

And get would call the first form if the element is not present...
So, in this instance, f could actually be Nullable{Int}, as in:

get(Nullable{Int}, dict, key)

With the function coming first, this almost seems to suggest a do construct, but I can't see how that's useful in this instance:

x = get(dict, "key") do
   # ??? needs to be defined for both 0- and 1- arg invocations
end

@TotalVerb
Copy link
Contributor

Unfortunately, I just realized that three-argument get is already defined and taken, and the semantics are not the same. So that syntax is off the table.

@dbeach24
Copy link
Contributor Author

@TotalVerb Doesn't the existing 3-arg get have the associative container as the first arg? If our has the associative as the second arg, can we define both forms without ambiguity?

get(x::Associative, key, default) # use this signature for existing 
get(f::Function, x::Associative, key) # use this signature for the new proposed form

Maybe this is hard to get right -- I don't know.

@TotalVerb
Copy link
Contributor

TotalVerb commented Aug 26, 2016

There is also a 3-arg get with function as first argument. It tries to get the value, then returns it unchanged; otherwise, it returns the value obtained by invoking the function.

See for instance

julia> get(Nullable{Int}, Dict(:a => 1), :a)
1

julia> get(Nullable{Int}, Dict(:a => 1), :b)
Nullable{Int64}()

@dbeach24
Copy link
Contributor Author

@TotalVerb Amazingly (I think) this function is almost exactly what we want, but we'd have to change the semantics. The current behavior is:

get(f, dict, key) = haskey(dict, key) ? dict[key] : f()

The behavior we want is:

get(f, dict, key) = haskey(dict, key) ? f(dict[key]) : f()

(Caveat: The real implementation would need to lookup the key only once, not twice as shown here.)

It's close, but I guess this would constitute a breaking change if implemented.

@TotalVerb
Copy link
Contributor

TotalVerb commented Aug 26, 2016

The trouble is that changing these semantics is a pretty big breaking change, especially since I believe this form is used often with do blocks:

get(dict, key) do
    # create default
end

Perhaps there is an efficient (no runtime overhead, ideally) way to fall back to identity in the 1-argument case—but I'm not well-versed enough in how inference works to judge whether it will be able to eliminate applicable or method_exists in this situation.

@dbeach24
Copy link
Contributor Author

Perhaps the cure is worse than the disease.

What guiding principles does the Julia community use to determine when it's better to overload an existing method, and when it's better to just have a separately named one?

Personally, I'm starting to lean toward the latter again.

@TotalVerb
Copy link
Contributor

TotalVerb commented Aug 26, 2016

I honestly am really not a fan of using the same name for what are very, very different operations. We don't use the same name for getindex and get, and I don't understand why this operation needs the same name. The whole point of multiple dispatch is to facilitate generic programming... and when three-argument get has three different meanings depending on the precise types of the arguments, this is not generic.

get, incidentally, is already a pun... the get(::Nullable) method, for instance, has nothing to do with get(::Associative, key, default).

@rfourquet
Copy link
Member

@TotalVerb I don't understand why this could not be the 2-arg get, "[because] it will not generalize to multidimensional indexed arrays", as get is not defined for arrays... Would you mind clarifying this point?

@TotalVerb
Copy link
Contributor

@rfourquet I am going to walk back my argument, because in fact get as it exists is fundamentally non-generic and cannot be made general to n-dimensional arrays, on account of its argument order. Functions like setindex! put the indices last so that they can work on arrays; get is not defined this way and so it could not possibly apply to arrays.

@dbeach24
Copy link
Contributor Author

In light of all of this, I'd like to propose that we use the 2-argument form of get to return a nullable value from any associative.

get{K,V}(a::Associative{K,V}, key) = haskey(a, key) ? Nullable{V}(a[key]::V) : Nullable{V}()

This would seem to have the following benefits:

  1. Does not introduce any new exports from Base.
  2. Does not conflict with existing definitions of get.
  3. Type-stable.
  4. Succinct.
  5. Can be specialized to avoid double-lookup for various dictionary implementations.
  6. Semantically consistent with Python's dict.get(key, default=None).

Up-Vote / Down-Vote. Reactions?

@quinnj
Copy link
Member

quinnj commented Aug 29, 2016

I can get behind that.

Though I would note that this would seem to be our first big foray into a real "commitment" to Nullables. Not saying it's a bad thing or irreversible or anything like that, I just know it took a while for Nullables to come about and this is a big (probably good) step to really integrating them with the rest of Base.

@dbeach24
Copy link
Contributor Author

@quinnj That's a very good point. I hope that committing to Nullable is a good thing, but maybe that's not a given.

I remember hearing at least one opinion at this year's JuliaCon that Nullable was a less-than-elegant solution ported over from more rigid, statically-typed languages. It was suggested that if Julia's compiler were enhanced to emit optimized dispatch code, particularly in the case where the possible types of a variable are fairly constrained, that returning Union{T, Void} would be the preferable solution. All of this surprised me because it seemed to run counter to much of what I'd heard at JuliaCon 2015, where there seemed to be some emphasis on Nullable, NullableArray, and the important improvements that would yield for DataFrames.

To what degree is type-constrained dispatch optimization being considered for Julia 1.0? Is it strong enough to advise against using Nullable?

@nalimilan
Copy link
Member

Always returning a Nullable is an interesting option. Unfortunately, we need to find a deprecation path that doesn't break existing code, so this means we need two release cycles do do it. I think we should go with tryget for now, and then, if we decide to always return a Nullable, deprecate the current get in 0.6, so that we can rename tryget to get in 0.7/1.0.

The same strategy could be applied to tryparse. This is also the same situation as for findfirst (#15755). We need a systematic plan to evaluate whether and how we should make that general shift. Could a JuliaLang owner create a nullable tag to allow grouping these issues together?

@dbeach24
Copy link
Contributor Author

@nalimilan I was not suggesting that the 3-arg form of get should return a Nullable -- only the 2-arg form, where no default is provided. If you have an acceptable default value, then the Nullable really isn't necessary. In my question above, I was asking if there were plans to optimize Union type dispatch that would soon make the Nullable type obsolete. If that is the case, then we should think twice before further embracing Nullable in Base. Sorry for any confusion.

@nalimilan
Copy link
Member

Even if only the two-argument form is changed, this needs a deprecation path to avoid breaking existing code.

@dbeach24
Copy link
Contributor Author

@nalimilan There is no existing 2-arg form. This is the proposed addition, not a proposed change.

@nalimilan
Copy link
Member

Sorry, I would have sweared it existed. So it's actually a good thing it wasn't added before, as it makes it much easier to use it instead of tryget. +1

@dbeach24 dbeach24 changed the title Add tryget(coll, key) for Associative returning Nullable Add get(coll, key) for Associative returning Nullable Aug 29, 2016
@dbeach24
Copy link
Contributor Author

Just updated the PR to implement:

get(h::Associative, key) -> Nullable

Could someone please review?

This returns a Nullable value corresponding to the key (or null).
Includes specialization for various dictionary types.
(Fix bug in priority queue test.)
Fixes JuliaLang#13055
@nalimilan nalimilan added the missing data Base.missing and related functionality label Sep 6, 2016
@nalimilan nalimilan mentioned this pull request Jul 5, 2017
Copy link
Member

@rfourquet rfourquet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs rebasing (e.g. PriorityQueue), but looks good to me. I had forgotten about this issue and was about to start a PR doing exactly this. I really don't see what fits better a two-arg get than this, and this seems to be exactly the kind of function Nullable is designed for (not in the statitistician meaning of course). I'm just uncertain about the deprecation problem If nullable gets renamed to Option or some other name. If this happens "soon", could be better to wait for the new name first (and use tryget in the meantime). But this functionality is needed, can someone make a decision?

@rfourquet rfourquet added the needs decision A decision on this change is needed label Aug 13, 2017
@nalimilan
Copy link
Member

nalimilan commented Aug 24, 2017

This is the poster child for the Union{Some{T}, Null} type which has been evoked as a replacement for Nullable (#22682). But clearly we need to make a decision regarding Nullable before fixing this issue.

@KristofferC
Copy link
Member

Nullable is no longer in Base

@KristofferC KristofferC closed this Jan 7, 2018
@nalimilan
Copy link
Member

nalimilan commented Jan 7, 2018

The issue itself is still relevant though. See PR #25131 for a similar change (merging tryparse and parse).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
missing data Base.missing and related functionality needs decision A decision on this change is needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.