
Fix major bug in TeamBitonicSort2. #544

Merged (6 commits) on Dec 18, 2019

Conversation

brian-kelley (Contributor):
Fix a major bug/typo in BitonicSort2 where a key was compared with itself, so the swap was never happening (see the sketch below for the shape of the bug).

Improve coverage in the sorting tests so this would have been caught directly, rather than relying on the spadd tests to catch it indirectly. Fixed the spadd tests to actually verify that result indices are sorted.

Add Test_DEVICE_Common_Sorting.o to the unit test executables in the Makefile-based build; I think this is why the bug wasn't caught before.
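For readers unfamiliar with the shape of the bug, here is a hypothetical sketch (not the actual TeamBitonicSort2 source; the function and variable names are illustrative) of a compare-and-swap stage where the typo compares a key against itself, so the exchange can never fire:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical illustration of the bug class described above. In one
// compare-and-swap stage of a bitonic network, the typo compares a key
// against itself, so the comparator can never return true and the two
// partner elements are never exchanged.
template <typename T, typename Comparator>
void compareAndSwap(std::vector<T>& keys, int elem1, int elem2, Comparator comp) {
  // Buggy form of the condition (always false):
  //   if (comp(keys[elem1], keys[elem1])) ...
  // Fixed form, comparing the two partner elements:
  if (comp(keys[elem2], keys[elem1]))
    std::swap(keys[elem1], keys[elem2]);
}
```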
brian-kelley (Contributor, Author):

Running cmake-based spot checks now (bowman/white).

ndellingwood (Contributor) left a comment:

Excellent, thanks for the fast fix @brian-kelley!

Pthreads sparse unit test was timing out (1500 seconds) on bowman with double and complex as scalars.
- sorting tests: Fix "calling host fn from host/device fn" warning (see the sketch below)
- gemm: Work around compiler bug in GCC 7.4 + CUDA
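As a minimal sketch of the warning class fixed in the sorting tests (illustrative only; this is not the actual test code), nvcc emits "calling a __host__ function from a __host__ __device__ function" when a function compiled for both host and device calls something that only has a host definition. The usual fix is to keep such helpers purely device-callable:

```cpp
#include <Kokkos_Core.hpp>

// Hypothetical helper: KOKKOS_INLINE_FUNCTION marks it host+device callable.
KOKKOS_INLINE_FUNCTION int clampIndex(int i, int n) {
  // Plain arithmetic and branches are callable on both host and device.
  // Calling something host-only here (e.g. std::rand, or std::swap on older
  // CUDA toolchains) is what triggers the warning.
  return i < 0 ? 0 : (i >= n ? n - 1 : i);
}
```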
mhoemmen (Contributor) left a comment:

Thanks Brian! :-D Should we also patch Trilinos with this fix?

@@ -99,11 +99,11 @@ struct impl_deep_copy_matrix_block<TeamHandle,ViewTypeScratch,ViewType,Layout,bl
   } else {
     Kokkos::parallel_for(Kokkos::TeamThreadRange(team,blockDim_j), [&] (const int j) {
 #ifndef KOKKOS_IMPL_BATCHED_GEMM_GCC_CXX14_WORKAROUND
-      const int idx_j = offset_j+j;
+      int idx_j = offset_j+j;
Contributor:
FYI for other readers: See #543 (comment) .

 {
-  ValueType temp = values[elem1];
+  ValueType temp1 = values[elem1];
Contributor:
This would have been a good place for a Kokkos::swap, as a temporary replacement for std::swap. It could even be that CUDA supports std::swap on device now.
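A minimal sketch of the kind of helper the comment suggests (the name is hypothetical; at the time Kokkos did not ship a swap utility). Marking it KOKKOS_INLINE_FUNCTION makes it callable from host and device alike, unlike std::swap, which nvcc may reject in device code:

```cpp
#include <Kokkos_Core.hpp>

// Hypothetical device-callable swap, a stand-in for the suggested Kokkos::swap.
template <typename T>
KOKKOS_INLINE_FUNCTION void kk_swap(T& a, T& b) {
  T tmp = a;
  a = b;
  b = tmp;
}

// It could then replace the manual temp/temp1 pattern in the hunk above, e.g.
//   kk_swap(values[elem1], values[elem2]);
```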

brian-kelley (Contributor, Author):

@mhoemmen Luckily, none of the originally buggy files are in Trilinos. @ndellingwood discovered the problems while putting together the Kokkos 3.0 promotion branch, so as soon as Tpetra was calling spadd with the buggy add, a bunch of Tpetra MatrixMatrix tests started failing.

brian-kelley (Contributor, Author):

@ndellingwood Argh, I just found out that the sorting test still fails on serial if multiple devices are enabled (e.g. Serial+Cuda), because TestExecSpace is defined to be Cuda in all the executables. I'll try to emulate the sparse unit tests to do proper ETI for the tests. The actual algorithms are OK, so you should be able to continue with testing Tpetra/MueLu.

ndellingwood (Contributor) commented Dec 17, 2019:

> the sorting test still fails on serial if multiple devices are enabled (e.g. Serial+Cuda), because TestExecSpace is defined to be Cuda in all the executables

@brian-kelley are components within the tests assuming a default execution space (which would be Cuda for Cuda+Serial, etc.)? In that case, the policy etc. should be templated on TestExecSpace and not assume the default. In general, TestExecSpace should be Cuda for the KokkosKernels_{tests}_cuda exe and Serial for the KokkosKernels_{tests}_serial exe; maybe there is a copy/paste issue in some testing .cpp?
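A minimal sketch of the pattern described here (illustrative; not the actual KokkosKernels test harness). Each per-backend test executable defines its own TestExecSpace, and the test templates its policies on that space rather than relying on Kokkos::DefaultExecutionSpace, which would be Cuda in a Cuda+Serial build even for the "serial" executable:

```cpp
#include <Kokkos_Core.hpp>

// e.g. in the serial test executable:  using TestExecSpace = Kokkos::Serial;
//      in the cuda test executable:    using TestExecSpace = Kokkos::Cuda;

// Hypothetical test helper templated on the per-executable TestExecSpace.
template <typename TestExecSpace, typename ViewType>
void fillTestView(const ViewType& v) {
  // The view must live in a memory space accessible from TestExecSpace.
  // Naming the space in the policy avoids silently running on the default
  // (device) backend.
  Kokkos::parallel_for(
      "fill test view", Kokkos::RangePolicy<TestExecSpace>(0, v.extent(0)),
      KOKKOS_LAMBDA(const int i) { v(i) = i; });
}
```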

Call RadixSort in a RangePolicy, since it no longer uses ThreadVectorRange loops internally (see the sketch below).
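A sketch of what this commit describes (illustrative; the real test calls the library's radix sort, with an insertion sort used here as a stand-in). Because the sort no longer uses ThreadVectorRange loops internally, each call is purely sequential, so the test can launch one independent sort per RangePolicy iteration instead of needing a TeamPolicy:

```cpp
#include <Kokkos_Core.hpp>

// Hypothetical test driver: sort numArrays independent subarrays, one per
// RangePolicy work-item, each sorted sequentially within that work-item.
template <typename ExecSpace, typename KeyView>
void sortManySubarrays(const KeyView& keys, int numArrays, int subarrayLen) {
  Kokkos::parallel_for(
      "sort subarrays", Kokkos::RangePolicy<ExecSpace>(0, numArrays),
      KOKKOS_LAMBDA(const int a) {
        const int begin = a * subarrayLen;
        // Sequential sort of keys[begin, begin + subarrayLen); the real test
        // would call the radix sort here instead of this insertion sort.
        for (int i = begin + 1; i < begin + subarrayLen; i++) {
          auto key = keys(i);
          int j = i - 1;
          while (j >= begin && keys(j) > key) {
            keys(j + 1) = keys(j);
            j--;
          }
          keys(j + 1) = key;
        }
      });
}
```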
brian-kelley (Contributor, Author):

@ndellingwood It ended up being a problem in my CMake configuration, and it's working for me locally now. Hopefully the spot checks make it all the way through this time.

brian-kelley (Contributor, Author):
Spot checks:
Bowman
#######################################################
PASSED TESTS
#######################################################
intel-16.4.258-Pthread-release build_time=711 run_time=967
intel-16.4.258-Pthread_Serial-release build_time=1020 run_time=1817
intel-16.4.258-Serial-release build_time=706 run_time=863
intel-17.2.174-OpenMP-release build_time=880 run_time=241
intel-17.2.174-OpenMP_Serial-release build_time=1196 run_time=1027
intel-17.2.174-Pthread-release build_time=795 run_time=871
intel-17.2.174-Pthread_Serial-release build_time=1093 run_time=1583
intel-17.2.174-Serial-release build_time=775 run_time=757
intel-18.2.199-OpenMP-release build_time=880 run_time=246
intel-18.2.199-OpenMP_Serial-release build_time=1221 run_time=1009
intel-18.2.199-Pthread-release build_time=669 run_time=831
intel-18.2.199-Pthread_Serial-release build_time=947 run_time=1597
intel-18.2.199-Serial-release build_time=649 run_time=728

White:
#######################################################
PASSED TESTS
#######################################################
cuda-9.2.88-Cuda_OpenMP-release build_time=509 run_time=445
cuda-9.2.88-Cuda_Serial-release build_time=526 run_time=578
gcc-6.4.0-OpenMP_Serial-release build_time=194 run_time=316
gcc-7.2.0-OpenMP-release build_time=123 run_time=85
gcc-7.2.0-OpenMP_Serial-release build_time=215 run_time=314
gcc-7.2.0-Serial-release build_time=113 run_time=213
ibm-16.1.0-Serial-release build_time=487 run_time=330
#######################################################
FAILED TESTS
#######################################################
ibm-16.1.0-OpenMP-release (test failed)
ibm-16.1.0-OpenMP_Serial-release (test failed)

In the IBM builds, blas_openmp and blas_serial failed, but that's not related to these changes.
