GEMV unit test failures on Ponte Vecchio with oneMKL #2063

Open
cwpearson opened this issue Dec 7, 2023 · 1 comment
@cwpearson
Contributor

The GEMV unit tests are failing on Blake's Ponte Vecchio GPUs with oneMKL enabled.
All failures occur when beta=0. A possible cause is that beta=0 is not clearing NaN from the input vector.
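For context on why beta=0 needs special handling: under IEEE 754 arithmetic `0 * NaN` is `NaN`, so an update that literally computes `beta*y + alpha*A*x` keeps any NaN already in y. The reference BLAS documentation for GEMV states that y need not be set on input when beta is zero, which implies y is overwritten rather than scaled. A minimal sketch of the two behaviors follows; the helper names `scale_always` and `scale_blas_convention` are hypothetical and are not part of oneMKL or Kokkos Kernels.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helper: always multiplies the old y by beta.
// With beta == 0 and y_old == NaN this returns NaN, because 0 * NaN is NaN.
inline double scale_always(double beta, double y_old, double alpha_ax) {
  return beta * y_old + alpha_ax;
}

// Hypothetical helper: follows the BLAS convention and ignores the old y
// entirely when beta == 0, so NaN in y cannot propagate.
inline double scale_blas_convention(double beta, double y_old, double alpha_ax) {
  return (beta == 0.0) ? alpha_ax : beta * y_old + alpha_ax;
}
```

This only illustrates the arithmetic; whether oneMKL takes the first path on this hardware is the suspected cause, not a confirmed one.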

To reproduce, clone Kokkos into kokkos and Kokkos Kernels into kokkos-kernels, then run:

export KOKKOS_SRC="kokkos"
export KOKKOS_BUILD="kokkos-build"
export KOKKOS_INSTALL="kokkos-install"

export KERNELS_SRC="kokkos-kernels"
export KERNELS_BUILD="kokkos-kernels-build"

source /projects/x86-64-icelake-rocky8/spack-config/blake-setup-user-module-env.sh
module load cmake
module load intel-oneapi-compilers/2023.1.0 intel-oneapi-dpl/2022.1.0 intel-oneapi-mkl/2023.1.0
export ZES_ENABLE_SYSMAN=1

## Configure Kokkos
cmake -S "$KOKKOS_SRC" -B "$KOKKOS_BUILD" \
-DCMAKE_INSTALL_PREFIX="$KOKKOS_INSTALL" \
-DCMAKE_CXX_STANDARD=17 \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_CXX_COMPILER=icpx \
-DKokkos_ENABLE_SYCL=ON \
-DKokkos_ARCH_SPR=ON \
-DKokkos_ARCH_INTEL_PVC=ON \
-DKokkos_ENABLE_ONEDPL=OFF \
-DCMAKE_CXX_FLAGS="-fp-model=precise -fno-finite-math-only -mavx512f" \
-DBUILD_SHARED_LIBS=ON

## Build & Install Kokkos
cmake --build "$KOKKOS_BUILD" -j "$(nproc)" -t install

## Configure Kernels
cmake -S "$KERNELS_SRC" -B "$KERNELS_BUILD" \
-DKokkos_DIR="$KOKKOS_INSTALL/lib64/cmake/Kokkos" \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_CXX_COMPILER=icpx \
-DKokkosKernels_ENABLE_TESTS=ON \
-DKokkosKernels_ENABLE_PERFTESTS=ON \
-DKokkosKernels_ENABLE_BENCHMARK=ON \
-DKokkosKernels_ENABLE_TPL_MKL=ON \
-DCMAKE_CXX_FLAGS="-fp-model=precise -mavx512f" \
-DBUILD_SHARED_LIBS=ON

## Build Kernels
VERBOSE=1 make -C "$KERNELS_BUILD/blas/unit_test" -j "$(nproc)" 

srun -N 1 -p PV --exclude=blake15 -n1 -t 60 ctest --test-dir "$KERNELS_BUILD" -V -R blas |& tee "$LOG_DIR/ctest.log"

9: [ RUN      ] sycl_test.gemv_double
9: /projects/cwpears/kk-mkl/kokkos-kernels/blas/unit_test/Test_Blas2_gemv.hpp:141: expected(0)=-1.50128, h_y(0)=nan, eps=2.22045e-16, 1024*2*eps=4.54747e-13
9: /projects/cwpears/kk-mkl/kokkos-kernels/blas/unit_test/Test_Blas2_gemv.hpp:141: expected(1)=1.1445, h_y(1)=nan, eps=2.22045e-16, 1024*2*eps=4.54747e-13
9: /projects/cwpears/kk-mkl/kokkos-kernels/blas/unit_test/Test_Blas2_gemv.hpp:141: expected(2)=0.0141285, h_y(2)=nan, eps=2.22045e-16, 1024*2*eps=4.54747e-13
9: /projects/cwpears/kk-mkl/kokkos-kernels/blas/unit_test/Test_Blas2_gemv.hpp:141: expected(3)=-0.466376, h_y(3)=nan, eps=2.22045e-16, 1024*2*eps=4.54747e-13
9: /projects/cwpears/kk-mkl/kokkos-kernels/blas/unit_test/Test_Blas2_gemv.hpp:141: expected(4)=3.11782, h_y(4)=nan, eps=2.22045e-16, 1024*2*eps=4.54747e-13
9: /projects/cwpears/kk-mkl/kokkos-kernels/blas/unit_test/Test_Blas2_gemv.hpp:141: expected(5)=-2.10281, h_y(5)=nan, eps=2.22045e-16, 1024*2*eps=4.54747e-13
9: /projects/cwpears/kk-mkl/kokkos-kernels/blas/unit_test/Test_Blas2_gemv.hpp:141: expected(6)=-3.70048, h_y(6)=nan, eps=2.22045e-16, 1024*2*eps=4.54747e-13
9: /projects/cwpears/kk-mkl/kokkos-kernels/blas/unit_test/Test_Blas2_gemv.hpp:141: expected(7)=-3.07327, h_y(7)=nan, eps=2.22045e-16, 1024*2*eps=4.54747e-13
9: /projects/cwpears/kk-mkl/kokkos-kernels/blas/unit_test/Test_Blas2_gemv.hpp:141: expected(8)=-1.42057, h_y(8)=nan, eps=2.22045e-16, 1024*2*eps=4.54747e-13
9: /projects/cwpears/kk-mkl/kokkos-kernels/blas/unit_test/Test_Blas2_gemv.hpp:141: expected(9)=1.28458, h_y(9)=nan, eps=2.22045e-16, 1024*2*eps=4.54747e-13
9: /projects/cwpears/kk-mkl/kokkos-kernels/blas/unit_test/Test_Blas2_gemv.hpp:141: expected(10)=2.60594, h_y(10)=nan, eps=2.22045e-16, 1024*2*eps=4.54747e-13
9: /projects/cwpears/kk-mkl/kokkos-kernels/blas/unit_test/Test_Blas2_gemv.hpp:141: expected(11)=-5.60001, h_y(11)=nan, eps=2.22045e-16, 1024*2*eps=4.54747e-13
9: /projects/cwpears/kk-mkl/kokkos-kernels/blas/unit_test/Test_Blas2_gemv.hpp:141: expected(12)=-0.193979, h_y(12)=nan, eps=2.22045e-16, 1024*2*eps=4.54747e-13
9: /projects/cwpears/kk-mkl/kokkos-kernels/blas/unit_test/Test_Blas2_gemv.hpp:147: Failure
9: Value of: 0
9: Expected: numErrors
9: Which is: 13
9: beta = 0, input contains NaN, A is 13x13, mode N: gemv incorrect
@cwpearson added the bug label Dec 7, 2023
@cwpearson
Contributor Author

@lucbv says this has been reported to Intel. In the meantime, we can apply beta=0 to the input vector ourselves as a workaround.
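A rough sketch of what that workaround could look like, in plain C++ rather than Kokkos so it stands alone: when beta is zero, fill y with zeros before handing it to the backend, so any NaN is gone before the backend scales y. Everything here is an assumption for illustration. `backend_gemv` is a hypothetical stand-in that always computes `beta*y` (it is not the oneMKL API), and `gemv_beta_zero_safe` is a hypothetical wrapper name, not the actual Kokkos Kernels change.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical stand-in for a backend GEMV (row-major m x n matrix, no
// transpose) that always scales y by beta, so NaN in y survives beta == 0.
inline void backend_gemv(std::size_t m, std::size_t n, double alpha,
                         const std::vector<double>& A,
                         const std::vector<double>& x, double beta,
                         std::vector<double>& y) {
  for (std::size_t i = 0; i < m; ++i) {
    double acc = 0.0;
    for (std::size_t j = 0; j < n; ++j) acc += A[i * n + j] * x[j];
    y[i] = beta * y[i] + alpha * acc;
  }
}

// Hypothetical workaround wrapper: apply beta == 0 ourselves by zero-filling
// y first, then call the backend. After the fill, 0 * y[i] is 0, not NaN.
inline void gemv_beta_zero_safe(std::size_t m, std::size_t n, double alpha,
                                const std::vector<double>& A,
                                const std::vector<double>& x, double beta,
                                std::vector<double>& y) {
  if (beta == 0.0) std::fill(y.begin(), y.end(), 0.0);
  backend_gemv(m, n, alpha, A, x, beta, y);
}
```

The extra fill costs one pass over y and only runs when beta is zero, so other beta values go straight to the backend unchanged.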
