Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add filter and filter! to GroupedDataFrame #2279

Merged
merged 8 commits into from
Jun 24, 2020

Conversation

bkamins
Copy link
Member

@bkamins bkamins commented Jun 9, 2020

@nalimilan - this is a follow up to your comment in #1732.

@bkamins bkamins added this to the 1.x milestone Jun 9, 2020
@bkamins bkamins added feature grouping non-breaking The proposed change is not breaking labels Jun 9, 2020
src/groupeddataframe/groupeddataframe.jl Outdated Show resolved Hide resolved
src/groupeddataframe/groupeddataframe.jl Outdated Show resolved Hide resolved
src/groupeddataframe/groupeddataframe.jl Outdated Show resolved Hide resolved
src/groupeddataframe/groupeddataframe.jl Outdated Show resolved Hide resolved
src/groupeddataframe/groupeddataframe.jl Outdated Show resolved Hide resolved
src/groupeddataframe/groupeddataframe.jl Outdated Show resolved Hide resolved
src/groupeddataframe/groupeddataframe.jl Outdated Show resolved Hide resolved
end

"""
filter!(function, gdf::GroupedDataFrame)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have any other in-place functions on GroupedDataFrame already, apart from select! and transform!? I'm asking because for consistency with the two latter, it would make sense for filter! to drop rows from parent(gdf). Though that could be a unexpected. We need a general policy on this, which could also apply to in-place oeprations on SubDataFrame.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good question (actually I initially considered to add filter only for these reasons). It applies both to filter and filter!. I have not thought about it because what I proposed was most natural.

So the considerations are:

  1. we do not have other functions that mutate GroupedDataFrame in-place
  2. we do not have functions that mutate SubDataFrame schema in-place (of course we allow mutating the data, but not schema)
  3. select! and transform! work only for some GroupedDataFrames (ones that have no groups dropped) and they guarantee to retain row count in the parent; also it is in general expected (and documented) that they mutate the parent as they define new columns in general
  4. filter! and filter can work on an already "subsetted" GroupDataFrame so then it would be confusing what should happen with groups that are already not present.
  5. it is natural to expect that filter(predicate, collecion) does the same as collection[predicate.(collection)], which does not mutate the parent of collection if it is present.

All in all I think it is better to keep what we have.

test/grouping.jl Outdated Show resolved Hide resolved
bkamins and others added 2 commits June 10, 2020 10:14
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
@bkamins
Copy link
Member Author

bkamins commented Jun 10, 2020

I have also pushed updates to docstrings of filter/filter! for AbstractDataFrame to make them consistent.

@bkamins
Copy link
Member Author

bkamins commented Jun 10, 2020

Your comment made me realize that in the past we assumed GroupedDataFrame is immutable, and we used this assumption in transformation code. I have pushed a fix.

@bkamins
Copy link
Member Author

bkamins commented Jun 10, 2020

This should be rebased after #2280 is finished and only then merged.

@bkamins
Copy link
Member Author

bkamins commented Jun 18, 2020

So the only downside of filter! is that it modifies a view. I have reviewed all the code and in #2280 I make sure that we do not make assumptions that it is not modified, but still maybe it is better not to add filter!?

@nalimilan
Copy link
Member

Maybe let's leave it out until we have a clear use case for it? If nobody requests it maybe it's not needed, and that keeps the API more consistent regarding what ! does.

@bkamins
Copy link
Member Author

bkamins commented Jun 19, 2020

OK - I have removed filter!. This PR is independent now of #2280 as a consequence.

@bkamins
Copy link
Member Author

bkamins commented Jun 22, 2020

similar - only coverage fails

@bkamins bkamins merged commit 6768e2e into JuliaData:master Jun 24, 2020
@bkamins bkamins deleted the filter_groupeddataframe branch June 24, 2020 13:34
@bkamins
Copy link
Member Author

bkamins commented Jun 24, 2020

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature grouping non-breaking The proposed change is not breaking
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants