Make Unique family of APIs use reduce_then_scan #1765

Merged
merged 16 commits into main from dev/shared/unique_reduce_then_scan
Aug 30, 2024

Conversation

danhoeflinger
Contributor

@danhoeflinger danhoeflinger commented Aug 5, 2024

This PR changes the unique family of scan-like APIs to use reduce_then_scan when it is beneficial.
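
For readers unfamiliar with the pattern, the sketch below is a sequential model of the general reduce-then-scan idea: each tile first reduces to a single total, the totals are scanned to produce a carry-in per tile, and each tile then rescans its own elements starting from that carry-in. This is an illustration only and not the oneDPL kernel code; the function name `reduce_then_scan_exclusive` and the `tile` parameter are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential model of a reduce-then-scan exclusive prefix sum.
// `tile` is a hypothetical tile size standing in for the work of one group.
std::vector<int> reduce_then_scan_exclusive(const std::vector<int>& in, std::size_t tile)
{
    assert(tile > 0);
    const std::size_t n = in.size();
    const std::size_t num_tiles = (n + tile - 1) / tile;

    // Phase 1 (reduce): each tile computes only its total.
    std::vector<int> tile_sum(num_tiles, 0);
    for (std::size_t t = 0; t < num_tiles; ++t)
        for (std::size_t i = t * tile; i < std::min(n, (t + 1) * tile); ++i)
            tile_sum[t] += in[i];

    // Scan the per-tile totals to obtain each tile's carry-in.
    std::vector<int> carry(num_tiles, 0);
    for (std::size_t t = 1; t < num_tiles; ++t)
        carry[t] = carry[t - 1] + tile_sum[t - 1];

    // Phase 2 (scan): each tile rescans its elements starting from its carry-in.
    std::vector<int> out(n);
    for (std::size_t t = 0; t < num_tiles; ++t)
    {
        int running = carry[t];
        for (std::size_t i = t * tile; i < std::min(n, (t + 1) * tile); ++i)
        {
            out[i] = running;
            running += in[i];
        }
    }
    return out;
}
```

When the input is a 0/1 mask of elements to keep, the exclusive scan value at each position is the output index for that element, which is why copy_if, partition, and unique can all be built on the same scan.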

This PR allows us to remove the __pattern_scan_copy functions, which are no longer used.
Algorithm selection now happens at the level of __parallel_[copy_if/partition/unique]_copy, so "scan_copy" is no longer needed at the pattern level.

This PR moves all algorithm selection decisions into __parallel_unique_copy and makes the range API use this function as well. This unifies algorithm selection and brings the performance improvements to the ranges API.

Unique requires some constexpr special-casing in the kernel. Without it, the _GenMask for unique would need an extra branch per element to avoid underflow when index == 0. For the unique family of APIs, the kernels instead always copy the 0th element and start the scan at element 1. This handles the copy of the 0th element without any additional kernel launches. The n == 1 case is handled separately with a simple copy call.
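
As a rough sequential illustration of the special-casing described above (not the actual kernel, and with the hypothetical name `unique_copy_sketch`), the 0th element is copied unconditionally, the mask and running count start at index 1, and n == 1 falls back to a plain copy:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sequential model of unique_copy built on a mask and a running count.
std::vector<int> unique_copy_sketch(const std::vector<int>& in)
{
    const std::size_t n = in.size();
    if (n == 0)
        return {};
    if (n == 1)
        return in; // n == 1: a simple copy, no scan needed

    std::vector<int> out(n);
    out[0] = in[0]; // element 0 is always kept, so copy it unconditionally

    // Starting at index 1 means in[i - 1] never underflows, so the mask
    // needs no per-element `i == 0` branch.
    std::size_t write_idx = 1; // running count of kept elements (the scan value)
    for (std::size_t i = 1; i < n; ++i)
    {
        const bool keep = !(in[i] == in[i - 1]); // mask: differs from predecessor
        if (keep)
            out[write_idx++] = in[i];
    }
    assert(write_idx <= n);
    out.resize(write_idx);
    return out;
}
```

For example, the input `{1, 1, 2, 2, 2, 3, 1, 1}` yields `{1, 2, 3, 1}`: only consecutive duplicates are dropped, matching the semantics of `std::unique_copy`.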


This PR targets #1764 to allow for a clean diff, and is part of the following sequence of PRs, which are meant to be merged in order:

#1769 [MERGED] Relocate __lazy_ctor_storage to utils header
#1770 [MERGED] Use __result_and_scratch_storage within scan kernels
#1762 Add reduce_then_scan algorithm for transform scan family
#1763 Make Copy_if family of APIs use reduce_then_scan algorithm
#1764 Make Partition family of APIs use reduce_then_scan algorithm
#1765 Make Unique family of APIs use reduce_then_scan algorithm (This PR)

This work is a collaboration between @mmichel11, @adamfidel, and @danhoeflinger, and is based on an original prototype by Ted Painter.

@danhoeflinger danhoeflinger force-pushed the dev/shared/partition_reduce_then_scan branch from 013c496 to dadb933 Compare August 5, 2024 20:57
@danhoeflinger danhoeflinger force-pushed the dev/shared/unique_reduce_then_scan branch from 8a9abd7 to f481951 Compare August 5, 2024 21:00
@danhoeflinger danhoeflinger force-pushed the dev/shared/partition_reduce_then_scan branch from dadb933 to eafdcc0 Compare August 6, 2024 12:53
@danhoeflinger danhoeflinger force-pushed the dev/shared/unique_reduce_then_scan branch from f481951 to 6a7bae0 Compare August 6, 2024 12:53
@danhoeflinger danhoeflinger force-pushed the dev/shared/partition_reduce_then_scan branch from eafdcc0 to c7c043e Compare August 6, 2024 13:34
@danhoeflinger danhoeflinger force-pushed the dev/shared/unique_reduce_then_scan branch from 6a7bae0 to 019a6c7 Compare August 6, 2024 13:35
@danhoeflinger danhoeflinger force-pushed the dev/shared/partition_reduce_then_scan branch from c7c043e to d2f9639 Compare August 6, 2024 14:19
@danhoeflinger danhoeflinger force-pushed the dev/shared/unique_reduce_then_scan branch from 019a6c7 to 95ac867 Compare August 6, 2024 14:19
@danhoeflinger danhoeflinger force-pushed the dev/shared/partition_reduce_then_scan branch from d2f9639 to 6cbb5dd Compare August 6, 2024 17:19
@danhoeflinger danhoeflinger force-pushed the dev/shared/unique_reduce_then_scan branch 2 times, most recently from 0372b92 to 768ebb6 Compare August 6, 2024 17:25
@danhoeflinger danhoeflinger force-pushed the dev/shared/partition_reduce_then_scan branch from 6cbb5dd to 06da0e2 Compare August 6, 2024 17:26
@danhoeflinger danhoeflinger force-pushed the dev/shared/unique_reduce_then_scan branch from 768ebb6 to 9c63fbf Compare August 6, 2024 17:27
@danhoeflinger danhoeflinger force-pushed the dev/shared/partition_reduce_then_scan branch from 06da0e2 to edf26c5 Compare August 6, 2024 18:11
@danhoeflinger danhoeflinger force-pushed the dev/shared/unique_reduce_then_scan branch from 9c63fbf to c26e98f Compare August 6, 2024 18:11
@danhoeflinger danhoeflinger added this to the 2022.7.0 milestone Aug 6, 2024
@danhoeflinger danhoeflinger force-pushed the dev/shared/partition_reduce_then_scan branch from edf26c5 to 1ae6489 Compare August 6, 2024 19:03
@danhoeflinger danhoeflinger force-pushed the dev/shared/unique_reduce_then_scan branch from c26e98f to 8a4f9c3 Compare August 6, 2024 19:03
@danhoeflinger danhoeflinger force-pushed the dev/shared/partition_reduce_then_scan branch from 1ae6489 to aae11db Compare August 6, 2024 20:51
@danhoeflinger danhoeflinger force-pushed the dev/shared/unique_reduce_then_scan branch from 8a4f9c3 to 87aa137 Compare August 6, 2024 20:52
@danhoeflinger danhoeflinger force-pushed the dev/shared/partition_reduce_then_scan branch from aae11db to 207bfed Compare August 6, 2024 20:59
@danhoeflinger danhoeflinger force-pushed the dev/shared/unique_reduce_then_scan branch from 87aa137 to 929d518 Compare August 6, 2024 21:00
@danhoeflinger danhoeflinger force-pushed the dev/shared/partition_reduce_then_scan branch from 207bfed to bc972e1 Compare August 7, 2024 14:49
Contributor Author

@danhoeflinger danhoeflinger left a comment

LGTM after rebasing and minor comment changes.

@danhoeflinger danhoeflinger force-pushed the dev/shared/partition_reduce_then_scan branch from c8298cb to 040a1ef Compare August 29, 2024 20:20
@danhoeflinger danhoeflinger force-pushed the dev/shared/unique_reduce_then_scan branch from 1c77128 to bf16949 Compare August 29, 2024 20:20
julianmi
julianmi previously approved these changes Aug 30, 2024
Contributor

@julianmi julianmi left a comment

LGTM after a rebase.

@danhoeflinger danhoeflinger force-pushed the dev/shared/partition_reduce_then_scan branch from 040a1ef to bd6aba6 Compare August 30, 2024 19:19
Base automatically changed from dev/shared/partition_reduce_then_scan to main August 30, 2024 19:51
@danhoeflinger danhoeflinger dismissed julianmi’s stale review August 30, 2024 19:51

The base branch was changed.

danhoeflinger and others added 15 commits August 30, 2024 15:52
Signed-off-by: Dan Hoeflinger <dan.hoeflinger@intel.com>
Co-authored-by: Matthew Michel <106704043+mmichel11@users.noreply.github.com>
@danhoeflinger danhoeflinger force-pushed the dev/shared/unique_reduce_then_scan branch from bf16949 to b2956f9 Compare August 30, 2024 19:55
Signed-off-by: Dan Hoeflinger <dan.hoeflinger@intel.com>
Contributor

@mmichel11 mmichel11 left a comment

LGTM

@danhoeflinger danhoeflinger merged commit c7d845f into main Aug 30, 2024
19 checks passed
@danhoeflinger danhoeflinger deleted the dev/shared/unique_reduce_then_scan branch August 30, 2024 20:21