Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nightly Blas unit test failures with llvm/14, llvm/15 #2005

Closed
ndellingwood opened this issue Oct 18, 2023 · 10 comments
Closed

Nightly Blas unit test failures with llvm/14, llvm/15 #2005

ndellingwood opened this issue Oct 18, 2023 · 10 comments
Assignees
Labels

Comments

@ndellingwood
Copy link
Contributor

ndellingwood commented Oct 18, 2023

The following Blas unit tests fail in Serial builds with debugging when compiled with llvm/15.0.7, openblas/0.3.23

KokkosKernels_blas_serial

./blas/unit_test/KokkosKernels_blas_serial --gtest_filter=serial.asum_complex_double
[ RUN      ] serial.asum_complex_double
/home/ndellin/kokkos-kernels/test_common/KokkosKernels_TestUtils.hpp:159: Failure
Expected: ((double)AT1::abs(val1 - val2)) <= ((double)AT3::abs(tol)), actual: 119.541 vs 1.19541e-05
/home/ndellin/kokkos-kernels/test_common/KokkosKernels_TestUtils.hpp:159: Failure
Expected: ((double)AT1::abs(val1 - val2)) <= ((double)AT3::abs(tol)), actual: 119.541 vs 1.19541e-05
[  FAILED  ] serial.asum_complex_double (54 ms)

...

./blas/unit_test/KokkosKernels_blas_serial --gtest_filter=serial.nrm1_complex_double
[ RUN      ] serial.nrm1_complex_double
/home/ndellin/kokkos-kernels/test_common/KokkosKernels_TestUtils.hpp:159: Failure
Expected: ((double)AT1::abs(val1 - val2)) <= ((double)AT3::abs(tol)), actual: 119.541 vs 1.19541e-05
/home/ndellin/kokkos-kernels/test_common/KokkosKernels_TestUtils.hpp:159: Failure
Expected: ((double)AT1::abs(val1 - val2)) <= ((double)AT3::abs(tol)), actual: 119.541 vs 1.19541e-05
[  FAILED  ] serial.nrm1_complex_double (8 ms)

Reproducer (blake "all" queue):

salloc -N 1 -p all

module purge
module load cmake llvm openblas/0.3.23

$KOKKOSKERNELS_PATH/cm_generate_makefile.bash --with-serial --compiler=clang++ --with-tpls=blas,lapack --debug --with-scalars=complex_double,double

llvm/15.0.7 is the only version available on blake, but I'll test with different versions on another machine

@ndellingwood ndellingwood changed the title Blas unit test failures with llvm/15.0.7, openblas/0.3.23 Nightly Blas unit test failures with llvm/15.0.7, openblas/0.3.23 Nov 14, 2023
@ndellingwood
Copy link
Contributor Author

I tested and reproduced this with the following in debug builds:

  • llvm/15.0.7 + openblas/0.3.23 (blake)
  • llvm/15.0.7 + openblas/0.3.20 (blake)
  • sems-clang/14.0.6 + sems-openblas/0.3.21 (kokkos-dev-2)
  • sems-clang/14.0.6 , no tpls

The tests pass with sems-clang/11.0.1

The tests also pass in the cases above with debugging disabled

@ndellingwood ndellingwood changed the title Nightly Blas unit test failures with llvm/15.0.7, openblas/0.3.23 Nightly Blas unit test failures with llvm/14, llvm/15 Jun 4, 2024
@ndellingwood
Copy link
Contributor Author

For reference, the test seems to fail for non-power-of-two cases,

Test::impl_test_asum<view_type_a_ll, Device>(13);
; I modified the test to run with 14 (fail) and 16 (pass); for the failing cases asum returned a value of 0

@cwpearson
Copy link
Contributor

cwpearson commented Oct 15, 2024

I have reproduced thin on blake - vectors of length [1, 15] produce a result of 0. Everything else (0, 16 and larger) work.

I can't reproduce it on my workstation with the same LLVM and OpenBLAS.

@cwpearson
Copy link
Contributor

cwpearson commented Oct 15, 2024

There have been some suspicious changes to asum and/or SPR since OpenBLAS 0.3.23:

  • 0.3.25:
    • added AVX512 optimizations for ?ASUM on Sapphire Rapids and Cooper Lake
  • 0.3.26:
    • fixed computation of CASUM on SkylakeX and newer targets in the special case that AVX512 is not supported by the compiler or operating environment
    • fixed potential undefined behaviour in the CASUM/ZASUM kernels for AVX512 targets
  • 0.3.27:
    • fixed cpu handling fallbacks for Sapphire Rapids with disabled AVX2 in DYNAMIC_ARCH mode

@lucbv
Copy link
Contributor

lucbv commented Oct 15, 2024

Maybe try the latest OpenBLAS to see if the test passes with it? Alternatively you can try BLIS

@ndellingwood
Copy link
Contributor Author

I noticed the test failures with sems-clang/14.0.6 without use of tpls (in addition to failures with various versions of openblas pre-0.3.25), just in case OpenBlas is a red herring

@ndellingwood
Copy link
Contributor Author

Also, building with debug mode was a common factor, the tests passed for me with release mode with sems-clang/14.0.6 and llvm/15.0.7
The tests were not problematic with older clang e.g. sems-clang/11.0.1 (the tests passed in release and debug mode)

@cwpearson
Copy link
Contributor

cwpearson commented Oct 15, 2024

I compiled a few of my own OpenBLAS on Blake (TARGET=SKYLAKEX, I couldn't use Sapphire Rapids due to the compiler being too old, I think).

Version Tests Passed?
0.3.23 No
0.3.25 No
0.3.26 Yes
0.3.28 Yes

I couldn't get the tests to fail with TPLs disabled, and the tests failed regardless of whether I use debug or release build.

@cwpearson
Copy link
Contributor

cwpearson commented Oct 15, 2024

Openblas does define a version macro in openblas_config.h (only if it's installed, not just built). However, the version macro is like this...

#define OPENBLAS_VERSION " OpenBLAS 0.3.26 "

It's possible to write a compile-time string parsing thing in C++ 17 to make a constexpr comparison with this version:

#include <string_view>
#include <array>
#include <tuple>

#define OPENBLAS_VERSION " OpenBLAS 0.3.26 "

struct Version {
    int major;
    int minor;
    int patch;
};

constexpr int parse_int(std::string_view sv) {
    int result = 0;
    for (char c : sv) {
        if (c >= '0' && c <= '9') {
            result = result * 10 + (c - '0');
        } else {
            break;
        }
    }
    return result;
}

constexpr Version parse_version(std::string_view sv) {
    Version v{0, 0, 0};
    size_t start = sv.find_first_of("0123456789");
    if (start == std::string_view::npos) return v;

    size_t end = sv.find_first_not_of("0123456789", start);
    v.major = parse_int(sv.substr(start, end - start));

    start = sv.find_first_of("0123456789", end);
    if (start == std::string_view::npos) return v;

    end = sv.find_first_not_of("0123456789", start);
    v.minor = parse_int(sv.substr(start, end - start));

    start = sv.find_first_of("0123456789", end);
    if (start == std::string_view::npos) return v;

    end = sv.find_first_not_of("0123456789", start);
    v.patch = parse_int(sv.substr(start, end - start));

    return v;
}

constexpr bool operator>=(const Version& lhs, const Version& rhs) {
    return std::tie(lhs.major, lhs.minor, lhs.patch) >= std::tie(rhs.major, rhs.minor, rhs.patch);
}

template <size_t N>
constexpr bool check_openblas_version_gte(const char (&version)[N], const Version& min_version) {
    return parse_version(std::string_view(version, N - 1)) >= min_version;
}

int main(void) {
    static_assert(check_openblas_version_gte(OPENBLAS_VERSION, Version{0, 3, 27}), "OpenBLAS version is too old");
}

However, I think we'd have to have a separate configure-time check to see if our host blas is OpenBLAS or not, and if so, include the appropriate header file that defines OPENBLAS_VERSION.

@cwpearson
Copy link
Contributor

cwpearson commented Oct 15, 2024

And/or a workaround like this:

template <>
double HostBlas<std::complex<double> >::asum(KK_INT n, const std::complex<double>* x, KK_INT x_inc) {

  // workaround for clang 15.0.7 + openblas 0.3.23 + sapphire rapids
  // this combo doesn't work for vectors < 16 long, so copy into a vector of length 16
  // and put zeros in the rest
  std::complex<double> scratch[16];

  const std::complex<double> *p;

  if (n > 0 && n < 16) {
    for (int i = 0; i < 16; ++i) {
      if (i < n) {
        scratch[i] = x[i];
      } else {
        scratch[i] = std::complex<double>{0,0};
      }
    }
    p = scratch;
    n = 16;
  } else {
    p = x;
  }

  return F77_FUNC_DZASUM(&n, p, &x_inc);

}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants