Improve performance of reduction for small number of elements to reduce for types where tree-reduction is needed #1463
1. Renamed misspelled variable
2. If reduction_nelems is small, used SequentialReductionKernel for tree-reductions as it is done for atomic reduction
3. Tweak scaling down logic for moderately-sized number of elements to reduce.

We should also use max_wg if the iter_nelems is very small (one), since choosing max_wg for large iter_nelems may lead to under-utilization of GPU.
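The dispatch described in item 2 can be sketched in plain Python. This is only an illustration of the idea, not the dpctl implementation: the cutoff `SMALL_REDUCTION_THRESHOLD` and the helper names `sequential_reduce`, `tree_reduce`, and `reduce_rows` are hypothetical, and the actual threshold used by the kernels is not stated in this PR text.

```python
import operator

# Hypothetical cutoff; the real value used by dpctl is not given in this PR.
SMALL_REDUCTION_THRESHOLD = 64


def sequential_reduce(values, op, identity):
    """Fold the values one after another, as a single work-item would."""
    acc = identity
    for v in values:
        acc = op(acc, v)
    return acc


def tree_reduce(values, op, identity):
    """Combine neighbouring pairs level by level until one value remains."""
    level = list(values)
    if not level:
        return identity
    while len(level) > 1:
        if len(level) % 2:
            level.append(identity)  # pad odd-length levels with the identity
        level = [op(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def reduce_rows(matrix, op=operator.add, identity=0):
    """Reduce each row, picking the strategy from the row length."""
    out = []
    for row in matrix:
        if len(row) < SMALL_REDUCTION_THRESHOLD:
            # Few elements to reduce: setting up a tree is not worth it.
            out.append(sequential_reduce(row, op, identity))
        else:
            out.append(tree_reduce(row, op, identity))
    return out
```

Both paths return the same result for an associative operation; only the amount of coordination differs, which is why the short-reduction case may benefit from the sequential path.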
View rendered docs @ https://intelpython.github.io/dpctl/pulls/1463/index.html
Array API standard conformance tests for dpctl=0.15.1dev0=py310ha25a700_75 ran successfully.
* _tensor_impl continues holding constructors, where, clip
* _tensor_elementwise_impl holds elementwise functions
* _tensor_reductions_impl holds reduction functions
The PR resolves the performance issue gh-1461, L2 norm benchmark is back to expected values. Thank you!
Array API standard conformance tests for dpctl=0.15.1dev0=py310ha25a700_78 ran successfully.
Added stable API to retrieve implementation functions in each elementwise function class instance to allow `dpnp` to access that information using stable API.
…at types Added entries for float and double types to TypePairSupportDataForCompReductionAtomic as spotted by @ndgrigorian in the PR review. Also moved comments around.
This removes use of dpnp.matmul from the example, making this example self-contained.
@oleksandr-pavlyk
Array API standard conformance tests for dpctl=0.15.1dev0=py310ha25a700_79 ran successfully.
…ts (#1464)

* Adds SequentialSearchReduction functor to search reductions
* Search reductions use correct branch for float16

constexpr branch logic accounted for floating point types but not sycl::half, which meant NaNs were not propagating for float16 data
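To illustrate what NaN propagation means for a search reduction such as argmax, here is a minimal pure-Python sketch. It is not the `SequentialSearchReduction` functor itself: the function name `sequential_argmax` is hypothetical, and the convention assumed here (the index of the first NaN wins, otherwise the index of the first maximum) mirrors NumPy's `argmax` behaviour rather than anything stated in this PR.

```python
import math


def sequential_argmax(values):
    """Return the index of the first NaN if one exists, else of the first maximum."""
    if not values:
        raise ValueError("cannot search an empty sequence")
    best_idx = 0
    best_val = values[0]
    for idx in range(1, len(values)):
        if math.isnan(best_val):
            break  # a NaN has been found; it must propagate to the result
        v = values[idx]
        # A plain `v > best_val` test is always False for NaN, so NaN has to
        # be checked explicitly or it would be silently skipped.
        if math.isnan(v) or v > best_val:
            best_idx, best_val = idx, v
    return best_idx
```

The explicit NaN check is the part that a type-dispatched branch can miss: if the floating-point branch is not taken for a given type, the comparison-only path ignores NaN values.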
Array API standard conformance tests for dpctl=0.15.1dev0=py310ha25a700_81 ran successfully.
These includes in |
Array API standard conformance tests for dpctl=0.15.1dev0=py310ha25a700_82 ran successfully.
Thank you @ndgrigorian, good catch!
Thank you @oleksandr-pavlyk, LGTM
Array API standard conformance tests for dpctl=0.15.1dev0=py310ha25a700_84 ran successfully.
1. Renamed misspelled variable
2. If `reduction_nelems` is small, use `SequentialReductionKernel` for tree-reductions as it is done for atomic reduction
3. Tweak scaling down logic for moderately-sized number of elements to reduce.

Closes "Sensible performance degradation in dpt.tensor.sum" #1461

We should also use `max_wg` if the `iter_nelems` is very small (one), since choosing `max_wg` for large `iter_nelems` may lead to under-utilization of GPU.
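The work-group sizing rule described above might look roughly like the following sketch. Everything except the names `iter_nelems`, `reduction_nelems`, and `max_wg` is an assumption made for illustration: the function `choose_work_group_size`, the parameters `per_item` and `min_wg`, and the halving strategy are hypothetical and do not reproduce the actual scaling logic in dpctl.

```python
def choose_work_group_size(iter_nelems, reduction_nelems, max_wg,
                           per_item=8, min_wg=8):
    """Pick a work-group size for one reduction (illustrative heuristic only).

    per_item and min_wg are hypothetical tuning knobs, not dpctl parameters.
    """
    if iter_nelems == 1:
        # A single output element: nothing else competes for the device,
        # so use the largest work-group available.
        return max_wg
    wg = max_wg
    # Many independent reductions: shrink the group until each work-item
    # has at least `per_item` elements to reduce, so that large groups do
    # not sit mostly idle on short reductions.
    while wg > min_wg and wg * per_item > reduction_nelems:
        wg //= 2
    return wg
```

For example, under these assumed defaults, `choose_work_group_size(1000, 1000, 256)` scales the group down, while `choose_work_group_size(1, 100, 256)` keeps `max_wg`.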