Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement a mixin for reductions #9925

Merged
merged 46 commits into from
Feb 25, 2022

Conversation

vyasr
Copy link
Contributor

@vyasr vyasr commented Dec 16, 2021

This PR implements a factory for mixins classes based on the common pattern in cuDF of categories of similar functions all calling a common method implementing some standard pre/post-processing before calling a lower-level API of either one of its members (e.g. Frames calling Column methods) or the C++ libcudf library. When added to another class, these mixins support customization of which methods are exposed via a class member set of method names. Documentation for these methods is generated by formatting the docstring for the common internal method, e.g. _reduce for reductions. As a first pass, this PR generates a single mixin for reductions and applies it to all the relevant classes. Future PRs will use this to generate classes for scans, binary operations, and unary operations, and perhaps other similar categories as they are uncovered.

This approach assumes a great deal of API homogeneity between the different methods in a category. Frame violates this assumption because similar operations often support slightly different parameters (for instance, some reductions support a min_count parameter), so for now Frame was not made Reducible. That decision could be revisited if 1) the degree of homogeneity of these function signatures increases over time, or 2) we can introduce greater customization into these mixins without adding too much complexity. A first attempt of (2) can be seen in this branch, but the degree of additional complexity just to support Frame isn't really justifiable at this stage, so unless we can come up with a simpler solution I recommend leaving Frame as is for now.

@vyasr vyasr added 2 - In Progress Currently a work in progress code quality Python Affects Python cuDF API. improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Dec 16, 2021
@vyasr vyasr added this to the CuDF Python Refactoring milestone Dec 16, 2021
@vyasr vyasr self-assigned this Dec 16, 2021
@vyasr vyasr changed the base branch from branch-22.02 to branch-22.04 January 18, 2022 21:46
@vyasr vyasr force-pushed the refactor/reductions branch from 2b8d40f to 048cf50 Compare January 20, 2022 01:05
@vyasr vyasr force-pushed the refactor/reductions branch from 048cf50 to 01d1574 Compare January 24, 2022 19:50
@vyasr vyasr added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currently a work in progress labels Jan 24, 2022
@shwina
Copy link
Contributor

shwina commented Feb 16, 2022

Marking this DO NOT MERGE, as @vyasr and I agreed we would like to first see how well the approach taken here generalizes beyond reductions, by trying it with something like binary ops or scans. That will be in a separate branch.

Copy link
Contributor

@isVoid isVoid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside from small docstring improvement, I have one thought in comment, but probably doesn't require any actions.

Other thoughts: It certainly require some training to be able to use the factory. There are subtle things like op parameter requires a annotation, and the docstring of the _base_operation is required to exist. I wish we can make both optional in compliance with python standards.

python/cudf/cudf/core/mixins/mixin_factory.py Outdated Show resolved Hide resolved
python/cudf/cudf/core/mixins/mixin_factory.py Outdated Show resolved Hide resolved
python/cudf/cudf/core/mixins/mixin_factory.py Outdated Show resolved Hide resolved
python/cudf/cudf/core/mixins/mixin_factory.py Outdated Show resolved Hide resolved
python/cudf/cudf/core/mixins/mixin_factory.py Outdated Show resolved Hide resolved
python/cudf/cudf/core/mixins/mixin_factory.py Outdated Show resolved Hide resolved
python/cudf/cudf/core/mixins/mixin_factory.py Outdated Show resolved Hide resolved
@isVoid isVoid dismissed their stale review February 16, 2022 20:34

Superseded by the latest review.

@shwina
Copy link
Contributor

shwina commented Feb 17, 2022

There are subtle things like op parameter requires a annotation, and the docstring of the _base_operation is required to exist. I wish we can make both optional in compliance with python standards.

Definitely something we need to address. python -OO for example fails for me with this branch because it explicitly discards docstrings:

$ python -OO
Python 3.8.12 | packaged by conda-forge | (default, Jan 30 2022, 23:42:07)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import cudf
>>> df = cudf.DataFrame()
>>> df.groupby([]).max()  # AttributeError

@vyasr
Copy link
Contributor Author

vyasr commented Feb 17, 2022

There are subtle things like op parameter requires a annotation, and the docstring of the _base_operation is required to exist. I wish we can make both optional in compliance with python standards.

Definitely something we need to address. python -OO for example fails for me with this branch because it explicitly discards docstrings:

$ python -OO
Python 3.8.12 | packaged by conda-forge | (default, Jan 30 2022, 23:42:07)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import cudf
>>> df = cudf.DataFrame()
>>> df.groupby([]).max()  # AttributeError

Yeah this is just an oversight on my part. Optional attributes should only be patched if they are defined.

@vyasr
Copy link
Contributor Author

vyasr commented Feb 22, 2022

Marking this DO NOT MERGE, as @vyasr and I agreed we would like to first see how well the approach taken here generalizes beyond reductions, by trying it with something like binary ops or scans. That will be in a separate branch.

@isVoid @shwina An example of this is in vyasr#3

Copy link
Contributor

@isVoid isVoid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside from docstring example: #9925 (comment), just a few parting thoughts here.

python/cudf/cudf/core/mixins/mixin_factory.py Outdated Show resolved Hide resolved
python/cudf/cudf/core/mixins/mixin_factory.py Outdated Show resolved Hide resolved
Copy link
Contributor

@isVoid isVoid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I meant to click approve. I think the above question/comments does not count as blockers.

@vyasr vyasr added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 5 - DO NOT MERGE Hold off on merging; see PR for details labels Feb 25, 2022
@vyasr
Copy link
Contributor Author

vyasr commented Feb 25, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 21325e8 into rapidsai:branch-22.04 Feb 25, 2022
rapids-bot bot pushed a commit that referenced this pull request Mar 2, 2022
This PR builds on the framework introduced in #9925 to implement scans. I plan to apply this mixin to ColumnBase as well, but that will require more work to clean up binary operations for column types and it is large enough to merit a separate PR.

Contributes to #10177.

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Michael Wang (https://github.com/isVoid)
  - Ashwin Srinath (https://github.com/shwina)

URL: #10360
rapids-bot bot pushed a commit that referenced this pull request Mar 7, 2022
This PR builds on the framework introduced in #9925 to implement scans.

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Ashwin Srinath (https://github.com/shwina)

URL: #10358
@vyasr vyasr deleted the refactor/reductions branch March 9, 2022 17:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team 5 - Ready to Merge Testing and reviews complete, ready to merge improvement Improvement / enhancement to an existing function non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants