Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Single-pass scan kernel template #1320
Single-pass scan kernel template #1320
Changes from 30 commits
58c2639
e0a676d
956f139
deb92cb
fd0af78
590b1c0
5f4069a
6d4aa3d
7e32a6f
1fcdba4
db77c7d
0228724
b911bc0
8dacc2f
69b5afe
d88abb1
53e17b1
ce2bb8a
d67d579
6d7929a
4b01e64
6e97ed2
063ac92
ec5a47c
217a848
28ab915
d48e6d6
c2a6686
9cf1ec1
9d621a9
605cc42
c747f75
b5136ce
96c25ef
5a7626e
de2061b
d78db9c
d4f019f
64edea3
e272b18
cdbec2d
3e57dbe
e55c491
b944c2b
49dea3f
5262700
e7c32cb
5a6352f
85c730c
f135477
4e65b42
f8f8508
c0e22e3
404fcc2
0e7a87c
d6b60a6
a899426
a8caee1
41adacc
247bffb
80e6cd4
0d0efeb
b530eec
849db06
d076f11
eb6fb65
f1e0709
265ab8f
fe18dac
e96c8e9
0df3640
33c8982
e431f01
6bcc32d
5f2fea6
978e0fa
804c8c7
7cd2d63
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
intel::..
attribute can be replaced with a more portable abstraction:[[_ONEDPL_SYCL_REQD_SUB_GROUP_SIZE_IF_SUPPORTED(__req_sub_group_size)]]
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usage of
[[_ONEDPL_SYCL_REQD_SUB_GROUP_SIZE_IF_SUPPORTED(__req_sub_group_size)]]
could potentially be problematic here since other portions of code assume the subgroup size is always32
. If we want this solution to be device generic, I think we would also need to query the subgroup size at runtime in the kernel to be correct.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think one large
sycl::malloc_device
call and setting__status_vals_full
,__status_vals_partial
, and__status_flags
to offsets within the allocation may be quicker than three separate mallocs?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think it is better to use a single allocation where possible. I have updated the code to do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need an overload with an
init
to satisfy our current targets / goals?(If so, this brings up the question of what sort of data do we accept as init: scalar, device_ptr, single element "range" / iterator?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The plan is to include a version with a single scalar init value to match the functionality of the original scan.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, so does that need to go in this PR or do you plan to add that in a follow up PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good question. My goal was to have it implemented in this PR, but it is taking longer to implement + debug correctness than I expected. I am still planning to update this PR with the changes for an initial value, but if it is dragging too long then I might defer it to a follow up PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does igpu stand on? I am correct in assuming that this KT is targeted to intel GPUs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is for Intel GPUs as per the design doc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if it was discussed but we can consider adding it into gpu namespace instead of igpu.
If the implementation does not use any instructions specific to Intel GPUs (and is not expected to use), then it might perform well on other cards due to being highly-configurable.
What do you think about it?