gemv: call fallback impl in transpose mode with M==0 #551

Merged 2 commits on Dec 20, 2019

ndellingwood
Contributor

Addresses issue #540: problems with degenerate matrix cases when
BLAS TPLs are enabled.

@ndellingwood
Contributor Author

Detected during testing on the Waterman and Blake testbeds with TPLs enabled (the default case in Trilinos). The fix is needed for the 3.0 release.

@ndellingwood
Contributor Author

Tests now pass in the Trilinos integration test builds on Waterman.

Contributor

@brian-kelley brian-kelley left a comment

lgtm

Contributor

@mhoemmen mhoemmen left a comment

Is the fall-back implementation host-only, or does it do the right thing with device Views?

@ndellingwood
Contributor Author

ndellingwood commented Dec 20, 2019

@mhoemmen It will fall back in either case; I can update it to do so only for host execution.

fix unused variable causing -Werror
Fix for #539
@ndellingwood
Contributor Author

@mhoemmen Fixed in a recent commit: the fallback is no longer used when Cuda is the exec space. Thanks for the catch.
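
For readers following along, the dispatch discussed above can be sketched roughly as below. This is only an illustration under stated assumptions, not the code in this PR: `Matrix`, `Path`, `gemv_transpose_fallback`, `gemv_transpose_tpl_stub`, `gemv_transpose`, and the `on_host` flag are all hypothetical names, M is assumed to be the row count of A, and the TPL path is replaced by a plain loop because a real BLAS call cannot be made in a self-contained snippet.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a dense column-major M x N matrix (not a Kokkos View).
struct Matrix {
  std::size_t rows;          // M
  std::size_t cols;          // N
  std::vector<double> data;  // column-major storage, size rows * cols
};

// Records which code path the dispatcher took (illustration only).
enum class Path { Tpl, Fallback };

// Reference loop for y = beta*y + alpha * A^T * x.
// With rows == 0 the inner loop is empty, so y is only scaled by beta.
void gemv_transpose_fallback(double alpha, const Matrix& A,
                             const std::vector<double>& x, double beta,
                             std::vector<double>& y) {
  assert(x.size() == A.rows && y.size() == A.cols);
  for (std::size_t j = 0; j < A.cols; ++j) {
    double sum = 0.0;
    for (std::size_t i = 0; i < A.rows; ++i) {
      sum += A.data[i + j * A.rows] * x[i];
    }
    // Follow the BLAS convention that beta == 0 overwrites y.
    y[j] = (beta == 0.0 ? 0.0 : beta * y[j]) + alpha * sum;
  }
}

// Placeholder for the vendor BLAS (TPL) call; ASSUMPTION: a real build would
// call the library here. It reuses the reference loop so the sketch runs.
void gemv_transpose_tpl_stub(double alpha, const Matrix& A,
                             const std::vector<double>& x, double beta,
                             std::vector<double>& y) {
  gemv_transpose_fallback(alpha, A, x, beta, y);
}

// Dispatcher: on the host, route the degenerate M == 0 transpose case to the
// fallback; everything else goes to the TPL path.
Path gemv_transpose(bool on_host, double alpha, const Matrix& A,
                    const std::vector<double>& x, double beta,
                    std::vector<double>& y) {
  if (on_host && A.rows == 0) {
    gemv_transpose_fallback(alpha, A, x, beta, y);
    return Path::Fallback;
  }
  gemv_transpose_tpl_stub(alpha, A, x, beta, y);
  return Path::Tpl;
}
```

Calling `gemv_transpose(true, 2.0, A, x, 0.5, y)` with a 0 x 3 matrix `A`, an empty `x`, and `y = {1, 2, 3}` takes the fallback path and leaves `y` scaled by `beta`. Passing `on_host = false` skips the fallback, mirroring the change for the Cuda exec space.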

@ndellingwood
Contributor Author

Spot-check passed on kokkos-dev-2:

Running on machine: kokkos-dev-2
Going to test compilers:  gcc/7.3.0 gcc/9.1 intel/18.0.5 clang/8.0 cuda/10.1
Testing compiler gcc/7.3.0
Testing compiler gcc/9.1
  Starting job gcc-7.3.0-OpenMP-release
  Starting job gcc-7.3.0-Pthread-release
  PASSED gcc-7.3.0-Pthread-release
  Starting job gcc-9.1-OpenMP-release
  PASSED gcc-7.3.0-OpenMP-release
Testing compiler intel/18.0.5
  Starting job gcc-9.1-Serial-release
  PASSED gcc-9.1-OpenMP-release
Testing compiler clang/8.0
  Starting job intel-18.0.5-OpenMP-release
  PASSED gcc-9.1-Serial-release
  Starting job clang-8.0-Cuda_OpenMP-release
  PASSED intel-18.0.5-OpenMP-release
Testing compiler cuda/10.1
  Starting job clang-8.0-Pthread_Serial-release
  PASSED clang-8.0-Cuda_OpenMP-release
  PASSED clang-8.0-Pthread_Serial-release
  Starting job cuda-10.1-Cuda_OpenMP-release
  PASSED cuda-10.1-Cuda_OpenMP-release
#######################################################
PASSED TESTS
#######################################################
clang-8.0-Cuda_OpenMP-release build_time=587 run_time=515
clang-8.0-Pthread_Serial-release build_time=240 run_time=334
cuda-10.1-Cuda_OpenMP-release build_time=630 run_time=482
gcc-7.3.0-OpenMP-release build_time=235 run_time=175
gcc-7.3.0-Pthread-release build_time=219 run_time=178
gcc-9.1-OpenMP-release build_time=249 run_time=160
gcc-9.1-Serial-release build_time=227 run_time=174
intel-18.0.5-OpenMP-release build_time=435 run_time=159
#######################################################

3 participants