-
Notifications
You must be signed in to change notification settings - Fork 98
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix major bug in TeamBitonicSort2. #544
Conversation
Add Test_DEVICE_Common_Sorting.o to unit test executables in Makefile-based build. I think this is why the bug wasn't caught before.
Running cmake-based spot checks now (bowman/white). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent, thanks for the fast fix @brian-kelley !
Pthreads sparse unit test was timing out (1500 seconds) on bowman with double and complex as scalars.
- sorting tests: Fix "calling host fn from host/device fn" warning - gemm: Work around compiler bug in GCC 7.4 + CUDA
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Brian! :-D Should we also patch Trilinos with this fix?
@@ -99,11 +99,11 @@ struct impl_deep_copy_matrix_block<TeamHandle,ViewTypeScratch,ViewType,Layout,bl | |||
} else { | |||
Kokkos::parallel_for(Kokkos::TeamThreadRange(team,blockDim_j), [&] (const int j) { | |||
#ifndef KOKKOS_IMPL_BATCHED_GEMM_GCC_CXX14_WORKAROUND | |||
const int idx_j = offset_j+j; | |||
int idx_j = offset_j+j; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI for other readers: See #543 (comment) .
{ | ||
ValueType temp = values[elem1]; | ||
ValueType temp1 = values[elem1]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would have been a good place for a Kokkos::swap
, as a temporary replacement for std::swap
. It could even be that CUDA supports std::swap
on device now.
@mhoemmen Luckily, none of the original buggy files are in Trilinos. @ndellingwood discovered the problems when putting together the Kokkos 3.0 promotion branch, so as soon as Tpetra was calling spadd with the buggy add a bunch of Tpetra MatrixMatrix tests started failing. |
@ndellingwood Argh, I just found out that the sorting test still fails if multiple devices are enabled (e.g. Serial+Cuda) on serial, because TestExecSpace is defined to be Cuda in all the executables. I'll try to emulate the sparse unittests to do proper ETI for tests. The actual algorithms are OK, so you should be able to continue with testing Tpetra/MueLu. |
@brian-kelley are components within the tests assuming a default execution space (in which case it would be Cuda for Cuda+Serial etc)? In that case, the policy etc. should be templated on |
call RadixSort in a RangePolicy, since it no longer uses ThreadVectorRange loops internally
dd48eda
to
1ca656a
Compare
@ndellingwood It ended up being a problem in my CMake configuration, and it's working now for me locally. Hopefully the spot checks make it all the way through now. |
Spot checks: White: In the IBM builds, blas_openmp and blas_serial failed, but that's not related to these changes. |
Fix major bug/typo in BitonicSort2 where a key was compared with itself, so the swap was never happening.
Add Test_DEVICE_Common_Sorting.o to unit test executables in
Makefile-based build.
Improve test coverage in sorting tests that would have caught this, rather than relying on spadd tests to catch it indirectly. Fixed spadd tests to actually verify result indices are sorted.