Single-pass scan kernel template #1320

adamfidel · 2023-12-18T16:23:36Z

This PR provides an implementation of the single-pass scan algorithm as a kernel template.
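For readers unfamiliar with the technique, here is a minimal host-side sketch of the idea behind a single-pass scan with tile status and lookback, which the commit titles in this PR refer to. It is not the code from single_pass_scan.h. The names `TileStatus`, `TileDescriptor`, and `single_pass_inclusive_scan` are hypothetical, the operation is assumed to be addition, and tiles run sequentially here, whereas on a device they run concurrently and communicate through atomics.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Status a tile publishes so that later tiles can read it during lookback (hypothetical names).
enum class TileStatus { NotReady, PartialReady, FullReady };

struct TileDescriptor {
    TileStatus status = TileStatus::NotReady;
    long long partial = 0; // reduction of this tile's own elements
    long long full = 0;    // reduction of everything up to and including this tile
};

// Inclusive sum scan processed tile by tile in a single pass over the input.
// Tiles are visited in order here, so each predecessor is already FullReady and
// the lookback stops after one step. With concurrent tiles, a tile may have to
// accumulate several PartialReady predecessors, or wait on a NotReady one.
std::vector<long long> single_pass_inclusive_scan(const std::vector<long long>& in, std::size_t tile_size)
{
    assert(tile_size > 0);
    const std::size_t n = in.size();
    const std::size_t num_tiles = (n + tile_size - 1) / tile_size;
    std::vector<TileDescriptor> tiles(num_tiles);
    std::vector<long long> out(n);

    for (std::size_t t = 0; t < num_tiles; ++t) {
        const std::size_t begin = t * tile_size;
        const std::size_t end = (begin + tile_size < n) ? begin + tile_size : n;

        // 1. Reduce the tile locally and publish the partial result.
        long long local = 0;
        for (std::size_t i = begin; i < end; ++i)
            local += in[i];
        tiles[t].partial = local;
        tiles[t].status = TileStatus::PartialReady;

        // 2. Lookback: walk backwards over predecessors until a full prefix is found.
        long long prefix = 0;
        for (std::size_t p = t; p-- > 0;) {
            if (tiles[p].status == TileStatus::FullReady) {
                prefix += tiles[p].full;
                break;
            }
            // A concurrent implementation would spin while the status is NotReady.
            assert(tiles[p].status == TileStatus::PartialReady);
            prefix += tiles[p].partial;
        }

        // 3. Publish the full prefix so that successors can stop their lookback here.
        tiles[t].full = prefix + local;
        tiles[t].status = TileStatus::FullReady;

        // 4. Scan the tile locally, seeded with the prefix of all earlier tiles.
        long long running = prefix;
        for (std::size_t i = begin; i < end; ++i) {
            running += in[i];
            out[i] = running;
        }
    }
    return out;
}
```

Each input element is read once for the reduction and once for the local scan within the same tile visit, and no separate pass over per-tile sums is needed, which is the property the "single-pass" name refers to.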

include/oneapi/dpl/experimental/kt/single_pass_scan.h

Co-authored-by: Dmitriy Sobolev <Dmitriy.Sobolev@intel.com>

test/kt/CMakeLists.txt

test/kt/single_pass_scan.cpp

Co-authored-by: Dmitriy Sobolev <Dmitriy.Sobolev@intel.com>

dmitriy-sobolev

I've left some minor comments.
Meanwhile, I'm looking at the kernel internals and will be back soon with further comments, if any.

test/kt/single_pass_scan.cpp

include/oneapi/dpl/experimental/kt/single_pass_scan.h

akukanov · 2024-04-23T18:33:09Z

test/support/test_config.h

+// Group reduction produces wrong results with multiplication of 64-bit for certain driver versions
+// TODO: When a driver fix is provided to resolve this issue, consider altering this macro or checking the driver version at runtime
+// of the underlying sycl::device to determine whether to include or exclude 64-bit type tests.
+#define _PSTL_GROUP_REDUCTION_MULT_INT64_BROKEN 1
+


Is this for the same problem as the macro right above (which is also documented as a known issue), or for a different one?

I thought it was a different problem, but looking at it more, I think it's the same issue that the macros _PSTL_ICPX_TEST_RED_BY_SEG_BROKEN_64BIT_TYPES and ONEDPL_WORKAROUND_FOR_IGPU_64BIT_REDUCTION are meant for.

I know the internal ticket number for this issue, but it's hard to correlate that this is the same issue that these other macros are for because we do not (rightfully so) associate our internal issue tracking with these comments.

After an offline discussion with @mmichel11, we found that _PSTL_ICPX_TEST_RED_BY_SEG_BROKEN_64BIT_TYPES is for a separate issue.

The macro ONEDPL_WORKAROUND_FOR_IGPU_64BIT_REDUCTION seems to be from the same root issue, but I feel that we should still use a separate macro for the following reasons:

The ONEDPL_WORKAROUND_FOR_IGPU_64BIT_REDUCTION macro is expected to be defined externally through CMake rather than in test_config.h with the other macros.

The macro's name suggests that, when defined, it applies a workaround for 64-bit group reductions with multiplies, whereas in this case we want to explicitly disable the test cases that combine 64-bit types with the std::multiplies binary operator.
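As an illustration of the second point, a macro of this kind could gate test cases at compile time roughly as sketched below. The macro name and value come from the diff above. The helper `should_run_scan_test` is hypothetical and is not the code in test/kt/single_pass_scan.cpp, and restricting the check to 64-bit integer types is an assumption based on the INT64 in the macro's name.

```cpp
#include <cstdint>
#include <functional>
#include <type_traits>

// Same name and value as the macro added to test/support/test_config.h in this PR.
#ifndef _PSTL_GROUP_REDUCTION_MULT_INT64_BROKEN
#    define _PSTL_GROUP_REDUCTION_MULT_INT64_BROKEN 1
#endif

// Hypothetical helper: decides at compile time whether a (type, operator) test case should run.
// Assumption: only 64-bit integer types combined with std::multiplies<T> are affected.
template <typename T, typename BinOp>
constexpr bool should_run_scan_test()
{
#if _PSTL_GROUP_REDUCTION_MULT_INT64_BROKEN
    constexpr bool is_64bit_integer = std::is_integral_v<T> && sizeof(T) == 8;
    constexpr bool is_multiplies = std::is_same_v<BinOp, std::multiplies<T>>;
    return !(is_64bit_integer && is_multiplies);
#else
    return true;
#endif
}
```

A test driver could then wrap each instantiation in `if constexpr (should_run_scan_test<T, BinOp>())`, so the affected combination is skipped rather than worked around. Note that this sketch does not match the transparent `std::multiplies<>` specialization.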

test/kt/single_pass_scan.cpp

dmitriy-sobolev

I've checked everything except the atomics, in particular the specifics of the memory ordering used. Everything I've checked looks good to me.

danhoeflinger

This LGTM. I checked it far more thoroughly earlier on, but I have been reviewing delta commits and following the comments, and they look good to me.

My outstanding issues have been:

init value, which we have decided to address in a later PR
The SLM memory footprint of joint algorithms, and how to communicate memory footprint requirements to end users so that they have proper guidance for selecting kernel params. I think this remains somewhat unresolved (as the SYCL feature is a black box), but we can open a GH issue and improve on this in the future if we are unsatisfied with the guidance in the docs for the current code. I don't think it is worth holding back this PR for this.

It would probably be good to get another approval from one of the others who have been working on this.

dmitriy-sobolev

LGTM. I've reviewed the atomic operations and found no race conditions.

One more suggestion to consider for a separate PR: use regular (non-atomic) variables for the values.
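One possible reading of this suggestion is the usual flag-plus-payload pattern: only the status flag is atomic, and release/acquire ordering on the flag makes a plain value visible to the reader. The sketch below shows that pattern on the host with `std::atomic`. `TileSlot`, `publish`, and `try_read` are hypothetical names, and the actual kernel (which updates a tile's status more than once and would use SYCL atomics such as `sycl::atomic_ref` in device code) may need a different arrangement.

```cpp
#include <atomic>
#include <cstdint>
#include <optional>

// Hypothetical tile slot: only the flag is atomic; the value is a regular variable.
struct TileSlot {
    std::atomic<bool> ready{false};
    std::uint64_t value = 0;
};

// Producer: write the plain value first, then set the flag with release ordering
// so that the write to `value` cannot be reordered after the flag store.
inline void publish(TileSlot& slot, std::uint64_t v)
{
    slot.value = v;
    slot.ready.store(true, std::memory_order_release);
}

// Consumer: load the flag with acquire ordering. If the flag is observed as set,
// the producer's earlier write to `value` is guaranteed to be visible.
inline std::optional<std::uint64_t> try_read(const TileSlot& slot)
{
    if (!slot.ready.load(std::memory_order_acquire))
        return std::nullopt;
    return slot.value;
}
```

The value must be written exactly once before the flag is set for this to be race-free; a slot that is updated again while readers may still be looking at it would need a different protocol.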

adamfidel · 2024-04-24T13:07:04Z

Thanks all for the reviews!

@danhoeflinger and @dmitriy-sobolev, I will create GH issues to address the next steps that you have mentioned in your approvals.

adamfidel added 16 commits August 18, 2023 14:52

Start of single-pass scan kernel template

58c2639

Fix hang in inclusive scan

e0a676d

Debug statements for scan kernel template

956f139

Update scan kernel template test

deb92cb

Merge remote-tracking branch 'antcarcomp01/dev/adamfidel/scan_kt' into dev/adamfidel/scan_kt

fd0af78

Only have a single work-item per group query for previous tile status

590b1c0

First attempt at parallel lookback

5f4069a

Working cooperative lookback

6d4aa3d

Fix correctness issue with non-power-of-2 sizes

7e32a6f

Code cleanup and bug fixes

1fcdba4

Remove accidental debug statement

db77c7d

Test with floats

0228724

Fix incorrect values for size_t scan

b911bc0

Add support for sizes > 2^30

8dacc2f

Uglify

69b5afe

Remove debug code

d88abb1

timmiesmith modified the milestone: 2022.5.0 Jan 8, 2024

adamfidel modified the milestones: 2022.4.0, 2022.5.0 Jan 8, 2024

adamfidel added 7 commits January 8, 2024 07:12

Merge branch 'main' into dev/adamfidel/scan_kt

53e17b1

Move single-pass scan test to same place as ESIMD radix sort test

ce2bb8a

Move kernel template to other kernel template directory

d67d579

Restructure scan KT tests to be similar to sort KT tests

6d7929a

Delete old test code

4b01e64

clang-format

6e97ed2

Re-arrange assert for 64-bit atomics

063ac92

mmichel11 reviewed Apr 17, 2024

include/oneapi/dpl/experimental/kt/single_pass_scan.h (outdated, resolved)

include/oneapi/dpl/experimental/kt/single_pass_scan.h (outdated, resolved)

include/oneapi/dpl/experimental/kt/single_pass_scan.h (outdated, resolved)

::std -> std:: and replacing assert with exception

f1e0709

adamfidel force-pushed the dev/adamfidel/scan_kt branch from 2ff21c9 to f1e0709 Compare April 17, 2024 21:45

dmitriy-sobolev reviewed Apr 19, 2024

include/oneapi/dpl/experimental/kt/single_pass_scan.h (outdated, resolved)

Fix number of elements vs number of bytes

265ab8f

Co-authored-by: Dmitriy Sobolev <Dmitriy.Sobolev@intel.com>

dmitriy-sobolev reviewed Apr 19, 2024

test/kt/CMakeLists.txt (resolved)

danhoeflinger reviewed Apr 19, 2024

test/kt/single_pass_scan.cpp (outdated, resolved)

adamfidel and others added 4 commits April 19, 2024 11:12

Improve test data generation, especially for multiplies

fe18dac

Co-authored-by: Dmitriy Sobolev <Dmitriy.Sobolev@intel.com>

Merge remote-tracking branch 'github/dev/adamfidel/scan_kt' into dev/adamfidel/scan_kt

e96c8e9

Fix CMake target without any constant params

0df3640

Adding a few includes for completeness

33c8982

dmitriy-sobolev reviewed Apr 23, 2024

test/kt/single_pass_scan.cpp (outdated, resolved)

test/kt/single_pass_scan.cpp (outdated, resolved)

include/oneapi/dpl/experimental/kt/single_pass_scan.h (outdated, resolved)

adamfidel added 3 commits April 23, 2024 10:14

Address PR comments

e431f01

Merge remote-tracking branch 'github/main' into dev/adamfidel/scan_kt

6bcc32d

Use new get_new_kernel_params function

5f2fea6

akukanov reviewed Apr 23, 2024

Correctly pass kernel name with optional_kernel_name

978e0fa

dmitriy-sobolev reviewed Apr 23, 2024

test/kt/single_pass_scan.cpp (outdated, resolved)

dmitriy-sobolev reviewed Apr 23, 2024

test/kt/single_pass_scan.cpp (outdated, resolved)

clang-format + Moving around get_new_kernel_params

804c8c7

dmitriy-sobolev reviewed Apr 23, 2024

clang-format

7cd2d63

danhoeflinger approved these changes Apr 23, 2024

dmitriy-sobolev approved these changes Apr 24, 2024

adamfidel merged commit e625bf0 into main Apr 24, 2024
20 checks passed

adamfidel deleted the dev/adamfidel/scan_kt branch April 24, 2024 13:06

This was referenced Apr 24, 2024

Scan kernel template support for initial value #1526

Open

Optimize atomic operations in scan kernel template #1528

Open

Improve guidance for choosing kernel template parameters of scan based on memory requirements #1529

Closed
