Improve performance of reduction for small number of elements to reduce for types where tree-reduction is needed #1463

oleksandr-pavlyk · 2023-11-02T16:19:51Z

1. Renamed misspelled variable.
2. If reduction_nelems is small, use SequentialReductionKernel for tree-reductions, as is already done for atomic reductions.
3. Tweaked the scaling-down logic for a moderately-sized number of elements to reduce. Closes #1461 (Sensible performance degradation in dpt.tensor.sum).

We should also use max_wg if the iter_nelems is very small (one), since choosing max_wg for large iter_nelems may lead to under-utilization of the GPU.
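The selection logic described above can be sketched roughly as follows. This is a minimal Python illustration, not the dpctl C++ implementation: the function name `choose_reduction_plan`, the cutoff `SMALL_REDUCTION_NELEMS`, the bound `MIN_WG`, and the halving rule for the work-group size are all assumptions made for the example. Only `SequentialReductionKernel`, `reduction_nelems`, `iter_nelems` and `max_wg` come from the description.

```python
from dataclasses import dataclass

# Hypothetical cutoff below which one work-item reduces its whole row serially.
SMALL_REDUCTION_NELEMS = 64
# Hypothetical lower bound for the scaled-down work-group size.
MIN_WG = 32


@dataclass(frozen=True)
class ReductionPlan:
    kernel: str  # "sequential" or "tree"
    wg: int      # work-group size; 1 for the sequential kernel


def choose_reduction_plan(iter_nelems: int, reduction_nelems: int, max_wg: int) -> ReductionPlan:
    """Pick a kernel and work-group size for iter_nelems independent reductions."""
    if reduction_nelems < SMALL_REDUCTION_NELEMS:
        # Few elements per reduction: one work-item per output, no tree needed.
        return ReductionPlan("sequential", 1)
    if iter_nelems == 1:
        # A single reduction: use the largest work-group.
        return ReductionPlan("tree", max_wg)
    # Many independent reductions: halve the work-group size while the halved
    # size still covers reduction_nelems (illustrative rule only).
    wg = max_wg
    while wg > MIN_WG and wg // 2 >= reduction_nelems:
        wg //= 2
    return ReductionPlan("tree", wg)
```

With these made-up constants, `choose_reduction_plan(1000, 8, 1024)` selects the sequential kernel, while `choose_reduction_plan(1, 10**6, 1024)` keeps the full `max_wg`.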

Have you provided a meaningful PR description?
Have you added a test, reproducer or referred to an issue with a reproducer?
Have you tested your changes locally for CPU and GPU devices?
Have you made sure that new changes do not introduce compiler warnings?
Have you checked performance impact of proposed changes?
If this PR is a work in progress, are you opening the PR as a draft?

github-actions · 2023-11-02T16:49:03Z

View rendered docs @ https://intelpython.github.io/dpctl/pulls/1463/index.html

github-actions · 2023-11-02T18:29:30Z

Array API standard conformance tests for dpctl=0.15.1dev0=py310ha25a700_75 ran successfully.
Passed: 935
Failed: 65
Skipped: 119

antonwolfy · 2023-11-02T22:57:50Z

The PR resolves the performance issue gh-1461; the L2 norm benchmark is back to expected values. Thank you!

coveralls · 2023-11-03T01:11:50Z

coverage: 85.785% (+0.04%) from 85.748%
when pulling d4d4992 on optimize-small-size-tree-reduction
into 11ecba8 on master.

github-actions · 2023-11-03T01:35:34Z

Array API standard conformance tests for dpctl=0.15.1dev0=py310ha25a700_78 ran successfully.
Passed: 935
Failed: 65
Skipped: 119

ndgrigorian · 2023-11-03T16:57:36Z

@oleksandr-pavlyk
I realized while looking through reductions.hpp that floating-point types are not included in TypePairSupportDataForCompReductionAtomic, so even though we permit atomics for max and min on floating-point arrays, there aren't any functions to call in the table.

float and double should be added to fix this.
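To illustrate the gap, here is a toy Python model of a type-pair support table. It is only a sketch: the set-based layout, the helper name `has_atomic_impl`, and the integer entries are placeholders invented for the example and do not mirror the C++ templates in reductions.hpp.

```python
# Toy model: (input type, output type) pairs for which an atomic min/max
# reduction kernel is available. The integer entries are placeholders.
ATOMIC_COMP_SUPPORT_BEFORE = frozenset({
    ("int32", "int32"),
    ("uint32", "uint32"),
    ("int64", "int64"),
    ("uint64", "uint64"),
})

# Adding the floating-point pairs closes the gap described above.
ATOMIC_COMP_SUPPORT_AFTER = ATOMIC_COMP_SUPPORT_BEFORE | {
    ("float32", "float32"),
    ("float64", "float64"),
}


def has_atomic_impl(table, arg_type, out_type):
    """Return True if the table holds an entry for this type pair."""
    return (arg_type, out_type) in table
```

In this model a dispatcher that permits atomics for `float32` would find no entry in the first table and therefore nothing to call, which is the situation the comment describes.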

github-actions · 2023-11-03T17:33:23Z

Array API standard conformance tests for dpctl=0.15.1dev0=py310ha25a700_79 ran successfully.
Passed: 935
Failed: 65
Skipped: 119

…ts (#1464)
* Adds SequentialSearchReduction functor to search reductions
* Search reductions use correct branch for float16
  constexpr branch logic accounted for floating point types but not sycl::half, which meant NaNs were not propagating for float16 data
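The NaN-propagating comparison that the float16 branch was missing can be sketched in plain Python. This is an illustration only: the function name `sequential_argmax` is made up, Python floats are double precision rather than float16, and the convention that the index of the first NaN is returned is an assumption of the example.

```python
import math


def sequential_argmax(values):
    """Return the index of the maximum of a non-empty sequence.

    If any element is NaN, the index of the first NaN is returned, so the
    NaN propagates to the result instead of being skipped.
    """
    best_idx = 0
    best_val = values[0]
    for idx in range(1, len(values)):
        if math.isnan(best_val):
            break  # a NaN was already found; nothing can replace it
        val = values[idx]
        if math.isnan(val) or val > best_val:
            best_idx, best_val = idx, val
    return best_idx
```

Without the explicit `math.isnan` checks, the comparison `val > best_val` is always false for NaN, so a NaN element would be silently ignored, which matches the symptom described for float16 data.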

github-actions · 2023-11-03T20:39:03Z

Array API standard conformance tests for dpctl=0.15.1dev0=py310ha25a700_81 ran successfully.
Passed: 935
Failed: 65
Skipped: 119

ndgrigorian · 2023-11-03T21:29:15Z

dpctl/dpctl/tensor/libtensor/source/tensor_ctors.cpp

Line 40 in af28d98

#include "boolean_reductions.hpp"

dpctl/dpctl/tensor/libtensor/source/tensor_ctors.cpp

Line 51 in af28d98

#include "reductions/reduction_common.hpp"

These includes in tensor_ctors.cpp should be unnecessary with these changes.

github-actions · 2023-11-03T21:49:48Z

Array API standard conformance tests for dpctl=0.15.1dev0=py310ha25a700_82 ran successfully.
Passed: 935
Failed: 65
Skipped: 119

oleksandr-pavlyk · 2023-11-04T00:02:07Z

dpctl/dpctl/tensor/libtensor/source/tensor_ctors.cpp

Line 40 in af28d98

#include "boolean_reductions.hpp"

dpctl/dpctl/tensor/libtensor/source/tensor_ctors.cpp

Line 51 in af28d98

#include "reductions/reduction_common.hpp"

These includes in tensor_ctors.cpp should be unnecessary with these changes.

Thank you @ndgrigorian, good catch!

ndgrigorian

Thank you @oleksandr-pavlyk, LGTM

github-actions · 2023-11-04T01:34:58Z

Array API standard conformance tests for dpctl=0.15.1dev0=py310ha25a700_84 ran successfully.
Passed: 935
Failed: 65
Skipped: 119

oleksandr-pavlyk added 2 commits November 2, 2023 05:44

Apply SequentialReductionKernel to axis0 reduction

6a0b09c

oleksandr-pavlyk requested review from antonwolfy and ndgrigorian November 2, 2023 17:16

oleksandr-pavlyk added 2 commits November 2, 2023 14:05

Split _tensor_impl into three extensions

f74eae0

_tensor_impl continues holding constructors, where, clip
_tensor_elementwise_impl holds elementwise functions
_tensor_reductions_impl holds reduction functions.

Used new native extension modules

421b270

oleksandr-pavlyk marked this pull request as ready for review November 3, 2023 00:31

oleksandr-pavlyk added 4 commits November 3, 2023 02:12

Added docstrings and getter methods for ElementwiseFunc classes

41ec378

Added a stable API for retrieving the implementation functions from each elementwise function class instance, so that `dpnp` can access that information through a stable API.

Instantiate atomic reduction templates for min/max ops for double/float types

645044a

Added entries for float and double types to TypePairSupportDataForCompReductionAtomic, as spotted by @ndgrigorian in the PR review. Also moved comments around.

Modified sycl_timer example to use dpctl.tensor function

097ecf5

This removes use of dpnp.matmul from the example, making this example self-contained.

Fixed misspelled words

d4d4992

ndgrigorian and others added 2 commits November 3, 2023 13:19

Remove superfluous includes in tensor_ctors.cpp per PR review

eb21e50

ndgrigorian reviewed Nov 3, 2023

dpctl/tensor/_elementwise_common.py (outdated, resolved)

ndgrigorian reviewed Nov 3, 2023

dpctl/tensor/_elementwise_common.py (outdated, resolved)

ndgrigorian approved these changes Nov 4, 2023

oleksandr-pavlyk merged commit 9018745 into master Nov 4, 2023
26 checks passed

oleksandr-pavlyk deleted the optimize-small-size-tree-reduction branch November 4, 2023 03:11

This was referenced Nov 4, 2023

update elementwise call API IntelPython/dpnp#1617

Merged

in place divide and floor_divide IntelPython/dpnp#1587

Merged

oleksandr-pavlyk mentioned this pull request Jan 26, 2024

Populated changelog of new features since 0.15.0 release #1510

Merged

6 tasks
