GroupKey and DataFrameRow to NamedTuple #2305
Comments
The reason of the design is that So there are two action points to be decided:
|
Yes, that feels more obvious to me.
The main surprise for me was:
df = DataFrame(
    a = repeat([1, 2, 3, 4], outer=[2]),
    b = repeat([2, 1], outer=[4]),
    c = 1:8)
gd = groupby(df, :a)
@show haskey(gd, (a=1,))
@show in((a=1,), keys(gd))
producing the inconsistent
haskey(gd, (a = 1,)) = true
(a = 1,) in keys(gd) = false
And it's because
@show keys(gd) .== Ref((a=1,))
@show NamedTuple.(keys(gd)) .== Ref((a=1,))
keys(gd) .== Ref((a = 1,)) = Bool[0, 0, 0, 0]
NamedTuple.(keys(gd)) .== Ref((a = 1,)) = Bool[1, 0, 0, 0]
So the journey started with discovering In my opinion, the docs should recommend |
OK. I will split it into three PRs, as these are independent:
Indeed we can add Intuitively I think that There is also a decision if we should do the same with |
Thanks! On a related note, I think having equality between The question around |
Yes - these are the considerations, and they can get tricky if we want to be efficient at the same time. In short - some rules have to be defined about how much consistency we want. Feel free to comment on what you feel would be good if you have an opinion. In #2308 I cover the stuff that does not require such decisions. |
I don't think |
I understand what you say, but in |
This might be an elucidating example
julia> in(NaN, keys(Dict(NaN=>:a)))
true
julia> in(NaN, collect(keys(Dict(NaN=>:a))))
false
julia> in(missing, keys(Dict(missing=>:a)))
true
julia> in(missing, collect(keys(Dict(missing=>:a))))
missing
The behavior is not handled in |
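The difference in the example above comes down to which equality predicate each lookup uses. A minimal sketch using only Base (no DataFrames; the variable names `d` and `ks` are just for illustration): lookup in a `Dict` goes through `isequal`, while `in` on a plain `Vector` goes through `==`.

```julia
# Base-only sketch: Dict key lookup compares with isequal,
# whereas `in` on a collected Vector compares with ==.
d = Dict(NaN => :a)
ks = collect(keys(d))       # a plain Vector{Float64} holding NaN

nan_isequal = isequal(NaN, NaN)   # isequal treats NaN as equal to itself
nan_equal   = (NaN == NaN)        # == follows IEEE semantics for NaN

in_dict_keys = NaN in keys(d)     # lookup via the Dict
in_vector    = NaN in ks          # elementwise == comparison

# With missing, == propagates missing, so `in` on a Vector is three-valued.
m = Dict(missing => :a)
missing_in_vector = missing in collect(keys(m))
```

This is only meant to illustrate why `in` on `keys(...)` and `in` on a collected vector can disagree; it does not say which behavior `GroupKeys` should adopt.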
Yes, I know it sounds absurd but I'd favor consistency here. If |
Points 1. and 2. from my list are done. So we are left with:
i.e. we would make sure that |
No I wasn't opposed to adding |
So to conclude, is this what we want?
If yes, then @goretkin - please let me know if you want to go forward with this PR. If not, I can implement it without a problem. |
Makes sense. There's one difficulty with transitivity of equality between |
OK - so we should error on Still |
I'm not aware of any case of
|
So for me to be clear. Can you please summarize the rules for |
No methods for |
OK - clear. Just to be sure:
AFAICT we do not have defined it yet, but I understand what we want to do is to do |
Yes. At least that would be consistent with the fact that we allow tuples in |
I have to go back and read more carefully. For now:
It seems logical, but I don't think this should be a requirement
julia> x = [1, NaN]; copy(x) == x
false
I don't understand what role I think there should be a method |
The role is that we allow to index In general: yes - please think of it carefully 😄, there is no rush - and it is easy to make a wrong decision, that we would regret later. Thank you! |
Yes, my language was a bit sloppy. What I meant is that |
Yes, we all agree on that AFAICT. |
Ah, okay, I missed that. Thanks for explaining. So then I understand now that this is a question of making Take the definition
function haskey2(c, x)
    try
        c[x]
        return true
    catch
    end
    return false
end
Is it correct to say that currently |
No. They will be the same currently. To make things precise. Currently for I understand that we are talking about how to transfer these rules to |
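As a point of reference for the `haskey2` idea, here is a small sketch checked against a plain `Dict` rather than a `GroupedDataFrame`. It is a variation on the definition quoted in the thread, not the thread's exact code: it assumes that a failed lookup raises `KeyError` (true for `Dict`) and rethrows anything else, so unrelated errors are not silently reported as "key absent".

```julia
# Variation on the thread's haskey2: "has key" is defined as
# "indexing succeeds". Only KeyError is treated as "absent";
# this is an assumption that holds for Dict.
function haskey2(c, x)
    try
        c[x]
        return true
    catch err
        err isa KeyError || rethrow()
        return false
    end
end

d = Dict(:a => 1, :b => 2)

present = haskey2(d, :a)   # indexing succeeds
absent  = haskey2(d, :z)   # indexing throws KeyError
```

For a `Dict`, this indexing-based definition and `haskey` agree on both cases; whether the same should hold for every key type accepted by a `GroupedDataFrame` is the question under discussion.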
So we can even define something like (probably with better method signatures):
and to not touch definitions of |
I guess you meant to use
So as discussed above this would ensure |
Ah - these things are tricky. Thinking of it I meant
This is what we currently have with |
We could throw an error and see whether people complain (probably not).
What do you mean? Currently for tuples we can have this:
julia> haskey(gd, (1,))
true
julia> (1,) in keys(gd)
false |
I think it is OK to throw an error - this is probably going to catch some bugs (and it is easy enough to convert to
Exactly, and we would have both of them return
So for From a docstring of
So what Base actually requires is that |
"Requires" may be a bit too strong. What the docstring says is that Anyway, yeah, it's true that consistency with dicts is a worthy goal. Maybe that's more important than consistency between |
https://juliadata.github.io/DataFrames.jl/stable/lib/types/#DataFrames.DataFrameRow mentions using the copy function to convert a row to a NamedTuple, and it just calls NamedTuple(::DataFrameRow).
To convert a GroupKey into a NamedTuple, there is no copy method, but there is NamedTuple(key::GroupKey).
This strikes me as inconsistent, but perhaps there's a reason. My instinct is that the documentation should just suggest using the NamedTuple(::T) method in both cases.