Releases: NVIDIA/AMGX
v2.4.0
Changes:
- Increased maximum CUDA version to 12.2, and now supporting HPC SDK 23.7
- Fixed issue preventing parameters from being updated after config initialisation
- Restructured all source files (now under src) and removed plugin feature
- Replaced custom memory pool with
cudaMallocAsync
when definingUSE_CUDAMALLOCASYNC
- Changed the cuSPARSE SpMV algorithm choice to
CUSPARSE_CSRMV_ALG1
, which should improve solve performance for recent versions of cuSPARSE - Added single-kernel
csrmv
that is invoked when total number of rows in the local matrix falls below 3 times the number of SMs on the target GPUs - Changes to thrust
- Increased thrust version to 2.1.0
- Added specific tested version of thrust as a submodule, please usegit clone --recursive
to pull AmgX from v2.4.0 onwards
- Wrapped thrust in namespace to avoid shared library sharing issues referenced here https://github.com/NVIDIA/thrust/releases/tag/1.14.0
- Removed many superfluous points of synchronisation introduced by thrust - Improved performance of writing matrices to file
- Improved Clang compatibility
- Add a divergence check, providing new config parameter
rel_div_tolerance
- Added compile-time definition to avoid exception handling, in order to improve experience when debugging (
DISABLE_EXCEPTION_HANDLING
) - Fixed multiple synchronisation issues that can show up on newer GPU architectures (
sm_70
+) - Fixed partition reordering for block_sizes > 1
- Fixed build issue that arose when AmgX is built as a subproject
- Fixed issue with OpenMP and NO_MPI linking
- Replaced some inline asm with intrinsics
- Fixed issue with
exact_coarse_solve
grid sizing - Fixed issue with
use_sum_stopping_criteria
- Fixed
SIGFPE
that could occur when the initial norm is 0 - Added a new API call
AMGX_matrix_check_symmetry
, that tests if a matrix is structurally or completely symmetric
Tested configurations:
Linux x86-64:
-- Ubuntu 20.04, Ubuntu 22.04
-- NVHPC 23.7, GCC 9.4.0, GCC 12.1
-- OpenMPI 4.0.x
-- CUDA 11.2, 11.8, 12.2
-- A100, H100
Note that while AMGX has support for building in Windows, testing on Windows is very limited.
v2.3.0
Changes:
- Increased minimum CMake version to 3.18 and adapted to use CUDA as a language, making it possible to compile with HPC SDK
- Improved performance of compute_values_kernel by ~1.3x
- Optimised block tuning for aggressive coarsening
- Added an exact coarse solve, accessible via the default scope flag "exact_coarse_solve"
- Fixed issue where latency hiding could be enabled/disabled asymmetrically across available ranks
- Fixed bug with SpGEMM fallback that deleted cuSPARSE handle incorrectly
- Fixed bug with use of shared memory in estimate_c_hat_kernel
Tested configurations:
- Linux x86-64:
-- Ubuntu 20.04, Ubuntu 18.04
-- gcc 7.4.0, gcc 9.3.0
-- OpenMPI 4.0.x
-- CUDA 11.0, 11.2 - Windows 10 x86-64:
-- MS Visual Studio 2019 (msvc 19.28)
-- MS MPI v10.1.2
-- CUDA 11.0
Note that while AMGX has support for building in Windows, testing on Windows is very limited.
v2.2.0
AMGX v2.2.0
Changes:
- Fixing GPU Direct support (now correct results and better perf)
- Fixing latency hiding (general perf and couple bugfixes in some specific cases)
- Tunings for Volta for agg and classical setup phase
- Gauss-siedel perf improvements on Volta+
- Ampere support
- Minor bugfixes and enhancements including reported/requested by community
Tested configurations:
- Linux x86-64:
-- Ubuntu 20.04, Ubuntu 18.04
-- gcc 7.4.0, gcc 9.3.0
-- OpenMPI 4.0.x
-- CUDA 10.2, 11.0, 11.2 - Windows 10 x86-64:
-- MS Visual Studio 2019 (msvc 19.28)
-- MS MPI v10.1.2
-- CUDA 10.2, 11.0
Note that while AMGX has support for building in Windows, testing on Windows is very limited.
v2.1.0
AMGX v2.1.0
Changelog:
- Added new API that allows user to provide distributed matrix partitioning information in a new way - offset to the partition's first row in a matrix. Works only if partitions own continuous rows in matrix. Added example case for this new API (see examples/amgx_mpi_capi_cla.c)
- Distributed code improvements
Tested configurations:
- gcc 7.5, gcc 6.4
- CUDA 9.0, CUDA 10.0
- OpenMPI 4.0
- NVIDIA V100
Note that while AMGX has support for building in Windows, it is not actively tested and may malfunction or has issues building the library.
v2.0.1
AMGX v2.0.1
Thanks to community a lot of updates/fixes on initial release were added and incorporated to this release.
Changelog:
- Dropped CUDA 7.x support and added CUDA 10.0 support
- Build fixes when using MS VS 2015 and 2017
- Build fixes when using CUDA 8 and CUDA 9
- Fixed DEBUG build configuration
- Minor code improvements
Tested configurations:
v2.0.0
AMGX first open-source release, versioned v2.0.0
Tested build configurations:
- Linux: gcc 4.8.3, OpenMPI
- Windows: MS VS 2015, MPICH
- CUDA versions 7.0 - 9.0
Tested on Kepler - Volta GPU families.
API's documentation and examples contained document in doc
directory