Implement a mixin for reductions #9925
Marking this
Aside from a small docstring improvement, I have one thought in a comment, but it probably doesn't require any action.
Other thoughts: it certainly requires some training to be able to use the factory. There are subtle requirements, such as the `op` parameter needing an annotation and the docstring of `_base_operation` being required to exist. I wish we could make both optional, in compliance with Python standards.
Definitely something we need to address.
$ python -OO
Python 3.8.12 | packaged by conda-forge | (default, Jan 30 2022, 23:42:07)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import cudf
>>> df = cudf.DataFrame()
>>> df.groupby([]).max()  # AttributeError
Yeah, this is just an oversight on my part. Optional attributes should only be patched if they are defined.
Aside from the docstring example (#9925 (comment)), just a few parting thoughts here.
Actually, I meant to click approve. I don't think the questions/comments above count as blockers.
@gpucibot merge
This PR builds on the framework introduced in #9925 to implement scans. I plan to apply this mixin to ColumnBase as well, but that will require more work to clean up binary operations for column types, and it is large enough to merit a separate PR. Contributes to #10177.
Authors:
- Vyas Ramasubramani (https://github.com/vyasr)
Approvers:
- Michael Wang (https://github.com/isVoid)
- Ashwin Srinath (https://github.com/shwina)
URL: #10360
This PR builds on the framework introduced in #9925 to implement scans.
Authors:
- Vyas Ramasubramani (https://github.com/vyasr)
Approvers:
- Ashwin Srinath (https://github.com/shwina)
URL: #10358
This PR implements a factory for mixin classes based on a common pattern in cuDF: categories of similar functions that all call a common method, which performs some standard pre/post-processing before calling a lower-level API, either one of its members (e.g. `Frame`s calling `Column` methods) or the C++ libcudf library. When added to another class, these mixins let the class customize which methods are exposed via a class member set of method names. Documentation for these methods is generated by formatting the docstring of the common internal method, e.g. `_reduce` for reductions. As a first pass, this PR generates a single mixin for reductions and applies it to all the relevant classes. Future PRs will use this to generate classes for scans, binary operations, and unary operations, and perhaps for other similar categories as they are uncovered.

This approach assumes a great deal of API homogeneity between the different methods in a category.
`Frame` violates this assumption because similar operations often support slightly different parameters (for instance, some reductions support a `min_count` parameter), so for now `Frame` was not made `Reducible`. That decision could be revisited if 1) the degree of homogeneity of these function signatures increases over time, or 2) we can introduce greater customization into these mixins without adding too much complexity. A first attempt at (2) can be seen in this branch, but the additional complexity needed just to support `Frame` isn't really justifiable at this stage, so unless we can come up with a simpler solution I recommend leaving `Frame` as is for now.
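The pattern described above can be sketched in plain Python as a factory that returns a mixin whose `__init_subclass__` reads a class-level set of requested operation names and attaches one delegating method per name, formatting the base method's docstring for each. This is a hypothetical illustration, not cuDF's implementation: `make_delegating_mixin`, the `_SUPPORTED_<CATEGORY>S` naming convention and `ToySeries` are assumptions made for the example.

```python
def make_delegating_mixin(mixin_name, category, base_operation_name, supported_operations):
    """Return a mixin class that generates one method per requested operation.

    Hypothetical sketch, not cuDF's implementation. A subclass lists the
    operations it wants in ``_SUPPORTED_<CATEGORY>S``; each generated method
    forwards to ``self.<base_operation_name>(op, *args, **kwargs)``.
    """
    attr_name = f"_SUPPORTED_{category.upper()}S"
    supported_operations = frozenset(supported_operations)

    def _make_method(op):
        def method(self, *args, **kwargs):
            return getattr(self, base_operation_name)(op, *args, **kwargs)

        method.__name__ = op
        return method

    def __init_subclass__(cls, **kwargs):
        super(mixin, cls).__init_subclass__(**kwargs)
        requested = set(getattr(cls, attr_name, ()))
        unknown = requested - supported_operations
        if unknown:
            raise ValueError(
                f"{cls.__name__} requests unsupported {category}s: {sorted(unknown)}"
            )
        base = getattr(cls, base_operation_name, None)
        base_doc = base.__doc__ if base is not None else None
        for op in requested:
            if op in cls.__dict__:
                continue  # keep methods the subclass defines explicitly
            method = _make_method(op)
            method.__qualname__ = f"{cls.__qualname__}.{op}"
            if base_doc is not None:  # docstrings are None under `python -OO`
                method.__doc__ = base_doc.format(op=op)
            setattr(cls, op, method)

    mixin = type(mixin_name, (), {"__init_subclass__": classmethod(__init_subclass__)})
    return mixin


Reducible = make_delegating_mixin(
    "Reducible", "reduction", "_reduce", {"sum", "max", "min", "mean"}
)


class ToySeries(Reducible):
    # Only these reductions are exposed; "mean" is allowed but not requested.
    _SUPPORTED_REDUCTIONS = {"sum", "max", "min"}

    def __init__(self, data):
        self.data = list(data)

    def _reduce(self, op, skipna=True):
        """Return the {op} of the values."""
        values = [v for v in self.data if v is not None] if skipna else self.data
        return {"sum": sum, "max": max, "min": min}[op](values)


s = ToySeries([1, None, 3])
print(s.sum(), s.max(), s.min())  # prints 4 3 1
print(ToySeries.max.__doc__)  # prints Return the max of the values.
```

Because every generated method forwards `*args, **kwargs` unchanged, all operations in a category effectively share the signature of the base method. An operation that needs an extra parameter, such as the `min_count` that only some reductions accept, cannot be given its own signature or documentation without extra machinery, which is the homogeneity trade-off behind leaving `Frame` out for now.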