Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add cardinality control for groupby benchs with flat types #15134

Merged

Conversation

PointKernel
Copy link
Member

@PointKernel PointKernel commented Feb 24, 2024

Description

Contributes to #15114

This PR adds cardinality control to group_max, group_nunique and group_rank benchmarks.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@PointKernel PointKernel added feature request New feature or request 2 - In Progress Currently a work in progress libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change labels Feb 24, 2024
@PointKernel PointKernel self-assigned this Feb 24, 2024
@GregoryKimball
Copy link
Contributor

Thanks for kicking this off! You might also consider adding nvbench axes for cardinality for the existing benchmarks. If we started with a default value of {0} then this would let us vary cardinality for engineering studies without changing the automated benchmark runtime.

@PointKernel PointKernel changed the title Add groupby benchmarks varying cardinalities Add cardinality control for groupby max bench Mar 4, 2024
@PointKernel PointKernel changed the title Add cardinality control for groupby max bench Add cardinality control for groupby benchs with flat types Mar 4, 2024
@PointKernel PointKernel marked this pull request as ready for review March 4, 2024 19:42
@PointKernel PointKernel requested a review from a team as a code owner March 4, 2024 19:42
@PointKernel PointKernel added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currently a work in progress labels Mar 4, 2024
Comment on lines +46 to +50
data_profile profile =
data_profile_builder()
.cardinality(cardinality)
.no_validity()
.distribution(cudf::type_to_id<int32_t>(), distribution_id::UNIFORM, 0, size);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code to create keys and values here seem to be very similar (the same) as for creating those of groupby max. Can we further extract them into a common function? Like create_keys_values()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The duplicate part is just data profile construction. Though we repeat those 5 lines of code each time, it's just one API call. It's not worth creating a helper function wrapping one single call IMO.

@PointKernel
Copy link
Member Author

/merge

@rapids-bot rapids-bot bot merged commit b08dd9b into rapidsai:branch-24.04 Mar 8, 2024
73 checks passed
@PointKernel PointKernel deleted the groupby-bench-cardinality branch March 8, 2024 23:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

4 participants