Fix fallback gemm for complex scalar types #550

seheracer · 2019-12-18T21:47:41Z

Description:
This PR fixes the fallback gemm resulting in large diffs with complex scalar type when the entries in the input random matrices have nonzero imaginary parts.

This defect was exposed when the upper bound of the random number generator (for A, B, and C matrices) in KokkosKernels gemm unit test was changed from 10 to Kokkos::Random_XorShift64_Pool<...>::max(), which is equal to 1+1j for complex<double>. This PR includes this change too.

Testing:
[sacer@white29 testing]$ ../scripts/test_all_sandia --spot-check --arch=Power8,Pascal60

Running on machine: white
Going to test compilers:  gcc/6.4.0 gcc/7.2.0 gcc/7.4.0 ibm/16.1.0 cuda/9.2.88 cuda/10.0.130
Testing compiler gcc/6.4.0
Testing compiler gcc/7.2.0
  Starting job gcc-6.4.0-OpenMP_Serial-release
  Starting job gcc-7.2.0-OpenMP-release
  PASSED gcc-7.2.0-OpenMP-release
  Starting job gcc-7.2.0-Serial-release
  PASSED gcc-6.4.0-OpenMP_Serial-release
Testing compiler gcc/7.4.0
  Starting job gcc-7.2.0-OpenMP_Serial-release
  PASSED gcc-7.2.0-Serial-release
Testing compiler ibm/16.1.0
  Starting job gcc-7.4.0-OpenMP-release
  PASSED gcc-7.4.0-OpenMP-release
Testing compiler cuda/9.2.88
  Starting job ibm-16.1.0-Serial-release
  PASSED gcc-7.2.0-OpenMP_Serial-release
  PASSED ibm-16.1.0-Serial-release
Testing compiler cuda/10.0.130
  Starting job cuda-9.2.88-Cuda_OpenMP-release
  PASSED cuda-9.2.88-Cuda_OpenMP-release
  Starting job cuda-10.0.130-Cuda_Serial-release
  PASSED cuda-10.0.130-Cuda_Serial-release
#######################################################
PASSED TESTS
#######################################################
cuda-10.0.130-Cuda_Serial-release build_time=1203 run_time=1057
cuda-9.2.88-Cuda_OpenMP-release build_time=1269 run_time=685
gcc-6.4.0-OpenMP_Serial-release build_time=587 run_time=1048
gcc-7.2.0-OpenMP-release build_time=444 run_time=299
gcc-7.2.0-OpenMP_Serial-release build_time=605 run_time=1359
gcc-7.2.0-Serial-release build_time=276 run_time=638
gcc-7.4.0-OpenMP-release build_time=385 run_time=300
ibm-16.1.0-Serial-release build_time=1439 run_time=968
#######################################################
FAILED TESTS
#######################################################

…dom numbers are allowed

ndellingwood

Great catch, thanks for reintroducing the test improvements!

vqd8a · 2019-12-18T22:05:45Z

@ndellingwood @seheracer I have just noticed that Kokkos::rand<typename Kokkos::Random_XorShift64_Pool<execution_space>::generator_type,ScalarA>::max()= 1 when ScalarA is Kokkos::complex<double>. Am I correct? So you just did reduce from 10 to 1.

ndellingwood · 2019-12-18T22:10:27Z

@vqd8a in the code @seheracer is using Kokkos::Random_XorShift64_Pool<execution_space>::generator_type,ScalarA>::max(); in the fill_random calls, which were previously using 10 (cast to the scalar type). Where is the value 1 you are concerned about?

vqd8a · 2019-12-18T22:14:07Z

@ndellingwood It was Kokkos::fill_random(A, rand_pool,ScalarA(10)).
Now it is Kokkos::fill_random(A, rand_pool, Kokkos::rand<Kokkos::Random_XorShift64<execution_space>,Kokkos::complex<double> >::max()) where Kokkos::rand<Kokkos::Random_XorShift64<execution_space>,Kokkos::complex<double> >::max()=1
So, I think we actually reduce from 10 to 1, not increasing. Correct me if I am wrong.

seheracer · 2019-12-18T22:14:43Z

@vqd8a This is very surprising to me! Changing it from 10 to max results in larger diffs so that several bugs were exposed. That's why I thought max should be larger than 10.

Vinh, can you point us to the code where you saw "1" as the returned value for max? Thanks!

vqd8a · 2019-12-18T22:16:27Z

@seheracer I just printed out Kokkos::rand<Kokkos::Random_XorShift64<execution_space>,Kokkos::complex<double> >::max() and saw that value. Actually it is 1+1j in the complex case.

ndellingwood · 2019-12-18T22:18:07Z

@crtrott can you confirm what Vinh is seeing is as expected?

Kokkos::rand<Kokkos::Random_XorShift64<execution_space>,Kokkos::complex<double> >::max() and saw that value. Actually it is 1+1j in the complex case.

vqd8a · 2019-12-18T22:21:55Z

I think it is 1+1j. See the file https://github.com/kokkos/kokkos/blob/develop/algorithms/src/Kokkos_Random.hpp, around line 511 to 515

crtrott · 2019-12-18T22:45:12Z

That is correct. So probably the additional exposure happens because you now have actual complex values, instead of complex values where the imaginary part is zero.

seheracer · 2019-12-18T22:53:06Z

Thanks @vqd8a @crtrott @ndellingwood !!

I think having the upper bound this way (1+1j) is better than 10 to catch errors concerning the imaginary parts. In addition, this is how Tpetra is testing gemm, so the tests are consistent now. I think we should keep ::max() instead of 10.

mhoemmen · 2019-12-19T16:57:36Z

@seheracer wrote:

In addition, this is how Tpetra is testing gemm, so the tests are consistent now.

Wait, Tpetra is calling a hand-rolled GEMM ? Tpetra only tests for types that the BLAS and cuBLAS supports. It's great that we have a fall-back implementation, but why aren't we calling cuBLAS?

Fix fallback gemm for failing tests with complex types when large ran…

57c7a26

…dom numbers are allowed

seheracer requested a review from ndellingwood December 18, 2019 21:47

seheracer added the bug fixed-in-develop label Dec 18, 2019

seheracer self-assigned this Dec 18, 2019

ndellingwood approved these changes Dec 18, 2019

View reviewed changes

ndellingwood merged commit e63acda into kokkos:develop Dec 18, 2019

seheracer deleted the fix_fallback_gemm branch December 18, 2019 21:58

brian-kelley mentioned this pull request Dec 19, 2019

Various complex<double> unit test fails with XL/16.1 OpenMP build #344

Open

seheracer mentioned this pull request Dec 19, 2019

KokkosKernels: Fix multiple gemm issues with complex type trilinos/Trilinos#6472

Merged

ndellingwood mentioned this pull request Jan 28, 2020

Kokkos + KokkosKernels Promotion To 2.9.99 trilinos/Trilinos#6671

Merged

This was referenced May 16, 2024

Fix no return warnings #2203

Closed

Interface for LAPACK geqrf() #2205

Open

c++17: add [[fallthrough]] attribute #1493

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix fallback gemm for complex scalar types #550

Fix fallback gemm for complex scalar types #550

seheracer commented Dec 18, 2019 •

edited

Loading

ndellingwood left a comment

vqd8a commented Dec 18, 2019

ndellingwood commented Dec 18, 2019

vqd8a commented Dec 18, 2019

seheracer commented Dec 18, 2019 •

edited

Loading

vqd8a commented Dec 18, 2019

ndellingwood commented Dec 18, 2019

vqd8a commented Dec 18, 2019

crtrott commented Dec 18, 2019

seheracer commented Dec 18, 2019

mhoemmen commented Dec 19, 2019

Fix fallback gemm for complex scalar types #550

Fix fallback gemm for complex scalar types #550

Conversation

seheracer commented Dec 18, 2019 • edited Loading

ndellingwood left a comment

Choose a reason for hiding this comment

vqd8a commented Dec 18, 2019

ndellingwood commented Dec 18, 2019

vqd8a commented Dec 18, 2019

seheracer commented Dec 18, 2019 • edited Loading

vqd8a commented Dec 18, 2019

ndellingwood commented Dec 18, 2019

vqd8a commented Dec 18, 2019

crtrott commented Dec 18, 2019

seheracer commented Dec 18, 2019

mhoemmen commented Dec 19, 2019

seheracer commented Dec 18, 2019 •

edited

Loading

seheracer commented Dec 18, 2019 •

edited

Loading