Dedicated kernels for in-place dpt.divide and dpt.floor_divide #1431
View rendered docs @ https://intelpython.github.io/dpctl/pulls/1431/index.html
Array API standard conformance tests for dpctl=0.15.1dev0=py310ha25a700_8 ran successfully.
dpctl/tensor/libtensor/include/kernels/elementwise_functions/true_divide.hpp
Includes floor division and true division
Checks that the result type is either the same as the third template parameter, or none
Adds a comment to TrueDivideInplaceOutputType
3ed1d66 to 237b2d0
Array API standard conformance tests for dpctl=0.15.1dev0=py310ha25a700_33 ran successfully.
Thank you @ndgrigorian for implementing this!
This pull request implements kernels for in-place division and floor division, i.e., dpt.divide(x, y, out=x) and dpt.floor_divide(x, y, out=x). This avoids allocating an additional buffer for the output and copying back to the first operand.
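To illustrate the saving, here is a rough conceptual sketch in plain Python, using the standard-library array module as a stand-in for a device buffer. It is not dpctl's implementation, and the function names divide_via_temporary and divide_inplace are hypothetical. It only contrasts the two strategies: computing into a temporary and copying back, versus overwriting the first operand directly.

```python
from array import array


def divide_via_temporary(x, y):
    """Out-of-place strategy: compute quotients into a new buffer, then copy back."""
    # Extra allocation: a temporary buffer the same size as x.
    tmp = array(x.typecode, (a / b for a, b in zip(x, y)))
    # Extra copy: move the results back into the first operand.
    x[:] = tmp
    return x


def divide_inplace(x, y):
    """In-place strategy: overwrite each element of x with its quotient."""
    for i, b in enumerate(y):
        x[i] /= b  # no temporary buffer and no copy back
    return x


x1 = array("d", [1.0, 4.0, 9.0])
x2 = array("d", [1.0, 4.0, 9.0])
y = array("d", [2.0, 2.0, 3.0])

divide_via_temporary(x1, y)
divide_inplace(x2, y)
print(list(x1), list(x2))
```

Both strategies leave the same values in the first operand; the in-place version simply skips the temporary allocation and the copy.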