-
Notifications
You must be signed in to change notification settings - Fork 242
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Binary operations with Arrays of different memory layout #681
Comments
Yep, this is a trade-off that I made at the outset, for two reasons:
It's obviously embarrassing that adding different-stride arrays silently gives wrong results, and I've been meaning to at least add a (C-based, to avoid a big impact on said hot code path) check to raise an error. PRs are definitely welcome along those lines. That said, even if this issue were fixed, eagerly evaluating array calculations is still not a winning move, so I'm not super likely to devote much more attention than adding that check, cf. https://github.com/inducer/pytato/ for a possible remedy. |
Thanks for the clarification. It is clear to me that the array arithmetic is not optimal in regard with performance - but convenient to use. I consider moving my code to pytorch or JAX for this purpose. To add a check that throws an error if memory layout is different, I ended up patching PyOpenCL _pyopencl_array_get_broadcasted_binary_op_result_SAVED = None
def patch_pyopencl_array():
import pyopencl.array
global _pyopencl_array_get_broadcasted_binary_op_result_SAVED
if _pyopencl_array_get_broadcasted_binary_op_result_SAVED is None:
_pyopencl_array_get_broadcasted_binary_op_result_SAVED = pyopencl.array._get_broadcasted_binary_op_result
def _get_broadcasted_binary_op_result_PATCHED(obj1, obj2, cq, dtype_getter=pyopencl.array._get_common_dtype):
if obj1.strides != obj2.strides:
raise NotImplementedError('Memory layouts (strides) of arrays are not the same')
else:
return _pyopencl_array_get_broadcasted_binary_op_result_SAVED(
obj1, obj2, cq, dtype_getter
)
pyopencl.array._get_broadcasted_binary_op_result = _get_broadcasted_binary_op_result_PATCHED Essentially, it adds a check that the strides are equal. This check is performed for add, mul and the other binary arithmetic operators, but not for the inplace operators. The inplace operators have all a explicit tests for matching shape, a little bit of refactoring would be required to move this into a common method that incorporates the additional check for same strides. Will such a PR be acceptable? Compared to the code already in place I thinks the overhead is acceptable to avoid silent failures. I am wondering if it sensible to make the check more general, similar to strides_equal Gregor |
For pyopencl.array.Array I get wrong results for binary operations such as adding if the memory layout differs for both arguments. Say, "A" is a C-contiguous, and "AF" is F-contiguous, then adding A+AF gives unexpected results. See this notebook for an example (tested with PyOpenCL 2022.3.1)
Is this intended to work? Looking quickly at the source code, I could not see that the strides are taken into account. It seems that the underlying arrays are added in memory order.
Thanks
Gregor
The text was updated successfully, but these errors were encountered: