Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add in for GroupKeys #2392

Merged
merged 4 commits into from
Oct 4, 2020
Merged

add in for GroupKeys #2392

merged 4 commits into from
Oct 4, 2020

Conversation

bkamins
Copy link
Member

@bkamins bkamins commented Aug 29, 2020

Fixes #2305

This is the last thing that was left from that issue. I was not sure if 1 in keys(gdf) should return true or false. On one hand haskey(gdf, 1) is true, but it seems unintuitive. Maybe we should remove haskey support for integers?

@bkamins bkamins added the non-breaking The proposed change is not breaking label Aug 29, 2020
@bkamins bkamins added this to the 1.0 milestone Aug 29, 2020
@bkamins
Copy link
Member Author

bkamins commented Aug 30, 2020

@oxinabox - are there conclusions from custom indexing BOF during JuliaCon2020 for such cases.

To be precise:

  1. we have a collection (GroupedDataFrame) that can be indexed by integers and also by values of other type (in this case Tuple, NamedTuple, Dict and GroupKey)
  2. we have a GroupKeys object that is returned by keys(::GroupedDataFrame)

The question is:

should in(::Integer, ::GroupKeys) produce always false (this is what we have now), or potentially return true (if the integer is in range), or throw an error (for safety)?

Also maybe there are other recommendations that it would be good to be aware of. Thank you!

@oxinabox
Copy link
Contributor

the fancy indexing BoF did not focus on conclusions, it focused on desires.
Here are the notes from that BoF
https://docs.google.com/document/d/1imBX3k0EEejauWVyXONZDRj8LTr0PeLOJNGEgo6ow1g/edit#heading=h.qrm4q6q56yxm

If we wanted to match the behavour of NamedTuples it seems to me it should return false.
but idk if the behavour of NamedTuples actuals makes sense in the first place, since it has haskey(nt, x) and x in keys(nt) disagreeing.

julia> (;a=1)
(a = 1,)

julia> keys((;a=1))
(:a,)

julia> 1 ∈ keys((;a=1))
false

julia> nt = (;a=10, b=20);

julia> nt[1]
10

julia> 1 in keys(nt)
false

julia> haskey(nt, 1)
true

@bkamins
Copy link
Member Author

bkamins commented Aug 31, 2020

Thank you for looking into this. Let us wait what Julia core team says.

@bkamins
Copy link
Member Author

bkamins commented Sep 1, 2020

Given the discussion on Slack I understand that in(::Integer, ::GroupKeys) would make sense to return true if index is valid (although this is not what currently Base does). Do you agree with this @oxinabox?

The interpretation is that keys should encompass a concept of "any key", but when iterating keys obviously only one kind of key is iterated over. Is this thinking correct?

@oxinabox
Copy link
Contributor

oxinabox commented Sep 2, 2020

I think that is correct

@bkamins
Copy link
Member Author

bkamins commented Sep 2, 2020

OK - changed!

@nalimilan
Copy link
Member

Maybe we should wait and see whether Base does the same change for NamedTuple before merging this?

@bkamins
Copy link
Member Author

bkamins commented Sep 3, 2020

I think that before 2.0 it is not going to change in Base as:

julia> x = (a=1,b=2)
(a = 1, b = 2)

julia> keys(x)
(:a, :b)

so as @oxinabox mentioned the type of object returned by keys would have to change and this is a relatively big breakage IMO.

As GroupKeys is our own type I think we are free to do what we think is right and it was agreed that x in keys(y) should be true if y[x] gives you a valid result.

@bkamins
Copy link
Member Author

bkamins commented Sep 9, 2020

@nalimilan - so what do you think of this?

@bkamins
Copy link
Member Author

bkamins commented Oct 3, 2020

@nalimilan - bump. I think it would be better to make a decision on this before 1.0 release.

Copy link
Member

@nalimilan nalimilan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I had totally forgotten this one. I don't have a strong opinion. It seems that whatever we do there will be a small inconsistency between isequal, in and haskey.

@bkamins bkamins merged commit c114798 into JuliaData:master Oct 4, 2020
@bkamins bkamins deleted the in_group_keys branch October 4, 2020 07:07
@bkamins
Copy link
Member Author

bkamins commented Oct 4, 2020

Thank you!

@bkamins
Copy link
Member Author

bkamins commented Mar 22, 2021

@nalimilan - we should then additionally decide how == and isequal should work when mixing GroupKey and other types. I think we should not compare as true to vectors or Tuple, but we can consider comparing as isequal (and consequently ==) to NamedTuple. If we decided to go this way then hash of GroupKey must match hash of NamedTuple (which will be a bit expensive, but acceptable). What do you think?

@bkamins
Copy link
Member Author

bkamins commented Mar 22, 2021

sorry - this should be in #2639 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
grouping non-breaking The proposed change is not breaking
Projects
None yet
Development

Successfully merging this pull request may close these issues.

GroupKey and DataFrameRow to NamedTuple
3 participants