CLBlast 1.6.0

CNugteren released this 21 May 19:22
· 31 commits to master since this release
b0b3028

CLBlast version 1.6.0. Changes since previous release (version 1.5.3):

  • Improved performance on Qualcomm Adreno GPUs:
    • Unique database entries for specific Adreno devices
    • Toggle OpenCL kernel compilation options for Adreno
    • New preprocessor directive RELAX_WORKGROUP_SIZE
  • Fixed a bug in handling of #undef in CLBlast loop unrolling and array-to-register mapping functions
  • Fixed a bug in XAMAX/XAMIN routines where the increment and offset were inadvertently included in the result
  • Fixed a bug in XAMAX/XAMIN routines that caused only the real part of a complex number to be taken into account
  • Fixed a bug that prevented the tests from properly checking integer outputs (for XAMAX/XAMIN)
  • Fixed a minor issue with the expected input buffer size in the TRMV/TBMV/TPMV/TRSV routines
  • Fixed an issue with crashes on Android related to calling clReleaseProgram
  • Fixed two small issues in the plotting script
  • Fixed a documentation bug in the 'ld' requirements
  • Enabled GitHub Actions CI builds for testing and releasing
  • Various minor fixes and enhancements
  • Added tuned parameters for various devices (see doc/tuning.md)
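
To illustrate the two XAMAX fixes listed above, here is a minimal host-side sketch of the intended semantics. It is not CLBlast code: the function name `reference_iamax`, the 0-based result, and the reference-BLAS magnitude convention |re| + |im| for complex values are all assumptions made for this example. The point is that the returned index counts logical vector elements (so it does not include the offset or increment) and that both the real and imaginary parts contribute to the comparison.

```python
def reference_iamax(n, x, x_offset=0, x_inc=1):
    """Hypothetical reference for XAMAX semantics (not part of CLBlast).

    Returns the 0-based logical index i in [0, n) of the element with the
    largest magnitude, reading element i from x[x_offset + i * x_inc].
    """
    best_index = 0
    best_value = -1.0
    for i in range(n):
        element = x[x_offset + i * x_inc]
        # Assumed convention (as in reference BLAS): |re| + |im|.
        # Works for real inputs too, since their imaginary part is 0.
        magnitude = abs(element.real) + abs(element.imag)
        if magnitude > best_value:  # strict '>' keeps the first maximum
            best_value = magnitude
            best_index = i
    return best_index


# Two logical elements stored at buffer positions 1 and 3.
buffer = [0j, 3 + 0j, 0j, 1 + 4j]
index = reference_iamax(2, buffer, x_offset=1, x_inc=2)
```

In this example the second logical element (1 + 4j) wins, so the result is the logical index 1 rather than the buffer position 3. A comparison that looked only at the real part would have picked the first element instead.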