Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add reduce then scan algorithm for transform scan family #1762

Merged
merged 88 commits into from
Aug 30, 2024

Conversation

danhoeflinger
Copy link
Contributor

@danhoeflinger danhoeflinger commented Aug 5, 2024

This PR Adds a generic reduce_then_scan algorithm for scan-like algorithms, and uses this algorithm for transform scan family of scan-like algorithms where it is beneficial.

This PR is targeted after a pair of other PRs #1769 + #1770, which lay some plumbing for these changes. Those two PRs were separated from this one to provide more focus to this change.

It is important to understand that subsequent PRs will add other families of scan-like algorithms (copy_if, partition, unique, ...), which may justify some aspects of the implementation of the reduce_then_scan kernels that could seem overcomplicated for the transform scan family alone.


This PR is targeted to #1770, to allow for a clean diff, and is a part of the following sequence of PRs meant to be merged in order:

#1769 [MERGED] Relocate __lazy_ctor_storage to utils header
#1770 [MERGED] Use __result_and_scratch_storage within scan kernels
#1762 Add reduce_then_scan algorithm for transform scan family (This PR)
#1763 Make Copy_if family of APIs use reduce_then_scan algorithm
#1764 Make Partition family of APIs use reduce_then_scan algorithm
#1765 Make Unique family of APIs use reduce_then_scan algorithm

This work is a collaboration between @mmichel11 @adamfidel and @danhoeflinger, and based upon an original prototype by Ted Painter.

@danhoeflinger danhoeflinger force-pushed the dev/shared/transform_reduce_then_scan branch from 9eb3099 to fc43983 Compare August 6, 2024 17:16
@danhoeflinger danhoeflinger changed the base branch from main to dev/shared/scan_result_and_scratch_storage August 6, 2024 17:21
@danhoeflinger danhoeflinger added this to the 2022.7.0 milestone Aug 6, 2024
@danhoeflinger danhoeflinger force-pushed the dev/shared/scan_result_and_scratch_storage branch from c8a274a to 4e4c0bc Compare August 6, 2024 18:59
@danhoeflinger danhoeflinger force-pushed the dev/shared/transform_reduce_then_scan branch 2 times, most recently from 3d277c6 to 4df758a Compare August 6, 2024 20:58
@danhoeflinger danhoeflinger marked this pull request as ready for review August 6, 2024 22:02
@danhoeflinger danhoeflinger changed the title [Draft] Add reduce then scan algorithm for transform scan family Add reduce then scan algorithm for transform scan family Aug 6, 2024
@danhoeflinger danhoeflinger force-pushed the dev/shared/transform_reduce_then_scan branch 3 times, most recently from ea28133 to d01aeb1 Compare August 8, 2024 12:27
@danhoeflinger danhoeflinger force-pushed the dev/shared/scan_result_and_scratch_storage branch from 47a227c to c471ed5 Compare August 8, 2024 14:06
adamfidel and others added 5 commits August 21, 2024 07:58
Signed-off-by: Dan Hoeflinger <dan.hoeflinger@intel.com>
Signed-off-by: Dan Hoeflinger <dan.hoeflinger@intel.com>
Signed-off-by: Dan Hoeflinger <dan.hoeflinger@intel.com>
Signed-off-by: Dan Hoeflinger <dan.hoeflinger@intel.com>
Copy link
Contributor

@julianmi julianmi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've had another look and I think it is in a good state. Clang-format is red, but I prefer the current version since it is easier to read.

Signed-off-by: Dan Hoeflinger <dan.hoeflinger@intel.com>
Signed-off-by: Dan Hoeflinger <dan.hoeflinger@intel.com>
adamfidel and others added 3 commits August 22, 2024 17:05
Signed-off-by: Dan Hoeflinger <dan.hoeflinger@intel.com>
Signed-off-by: Dan Hoeflinger <dan.hoeflinger@intel.com>
{
template <typename _OutRng, typename ValueType>
void
operator()(const _OutRng& __out_rng, std::size_t __id, const ValueType& __v) const
Copy link
Contributor

@MikeDvorskiy MikeDvorskiy Aug 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The constant reference const _OutRng& __out_rng looks suspicious, because the range is output.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I removed the const here this and fixed a couple other range types in the kernel. I will go through the remaining PRs to check for similar issues. (I think output ranges for the other helpers probably have the same issue)

Signed-off-by: Dan Hoeflinger <dan.hoeflinger@intel.com>
julianmi
julianmi previously approved these changes Aug 29, 2024
Copy link
Contributor

@julianmi julianmi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Please wait for another approval since there are partial reviews of several people.

// and then write back the final values to memory
if (__sub_group_id == 0)
{
// step 1) load to Xe SLM the WG-local S prefix sums
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we remove the reference to Xe here since this algorithm is designed to work on non-Xe-based devices as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think so.

Copy link
Contributor Author

@danhoeflinger danhoeflinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few minor things I found.

// and then write back the final values to memory
if (__sub_group_id == 0)
{
// step 1) load to Xe SLM the WG-local S prefix sums
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think so.

Signed-off-by: Dan Hoeflinger <dan.hoeflinger@intel.com>
danhoeflinger and others added 2 commits August 29, 2024 16:13
Co-authored-by: Matthew Michel <106704043+mmichel11@users.noreply.github.com>
Signed-off-by: Dan Hoeflinger <dan.hoeflinger@intel.com>
Copy link
Contributor

@adamfidel adamfidel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is in a good state and is okay to be merged.

Copy link
Contributor

@mmichel11 mmichel11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. We have checked tests and performance with the latest version.

@danhoeflinger danhoeflinger merged commit f320dbd into main Aug 30, 2024
20 of 21 checks passed
@danhoeflinger danhoeflinger deleted the dev/shared/transform_reduce_then_scan branch August 30, 2024 15:42
@@ -91,35 +91,15 @@ oneapi::dpl::__internal::__difference_t<_Range2>
__pattern_transform_scan_base(__hetero_tag<_BackendTag>, _ExecutionPolicy&& __exec, _Range1&& __rng1, _Range2&& __rng2,
_UnaryOperation __unary_op, _InitType __init, _BinaryOperation __binary_op, _Inclusive)
{
if (__rng1.empty())
oneapi::dpl::__internal::__difference_t<_Range1> __n = __rng1.size();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment I forgot to send earlier:
__n is also the return value, and the function is declared to return __difference_t<_Range2>, not that of _Range1. In the previous code, we took the size of __rng1 and converted it to the return type. But I wonder if we should just take the size of __rng2 instead (if we assume ranges being of equal size), or maybe change the return type?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out. We created PR #1811 to address this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants