Releases · NVIDIA/AMGX

25 Oct 11:58

mattmartineau

v2.4.0

2b4762f

v2.4.0 Latest

Changes:

Increased maximum CUDA version to 12.2, and now supporting HPC SDK 23.7
Fixed issue preventing parameters from being updated after config initialisation
Restructured all source files (now under src) and removed plugin feature
Replaced the custom memory pool with cudaMallocAsync when USE_CUDAMALLOCASYNC is defined
Changed the cuSPARSE SpMV algorithm choice to CUSPARSE_CSRMV_ALG1, which should improve solve performance for recent versions of cuSPARSE
Added single-kernel csrmv that is invoked when total number of rows in the local matrix falls below 3 times the number of SMs on the target GPUs
Changes to thrust
- Increased thrust version to 2.1.0
- Added a specific tested version of thrust as a submodule; please use git clone --recursive to pull AmgX from v2.4.0 onwards
- Wrapped thrust in namespace to avoid shared library sharing issues referenced here https://github.com/NVIDIA/thrust/releases/tag/1.14.0
- Removed many superfluous points of synchronisation introduced by thrust
Improved performance of writing matrices to file
Improved Clang compatibility
Added a divergence check, controlled by the new config parameter rel_div_tolerance
Added compile-time definition to avoid exception handling, in order to improve experience when debugging (DISABLE_EXCEPTION_HANDLING)
Fixed multiple synchronisation issues that can show up on newer GPU architectures (sm_70+)
Fixed partition reordering for block_sizes > 1
Fixed build issue that arose when AmgX is built as a subproject
Fixed issue with OpenMP and NO_MPI linking
Replaced some inline asm with intrinsics
Fixed issue with exact_coarse_solve grid sizing
Fixed issue with use_sum_stopping_criteria
Fixed SIGFPE that could occur when the initial norm is 0
Added a new API call, AMGX_matrix_check_symmetry, which tests whether a matrix is structurally or completely symmetric
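
To illustrate the two kinds of symmetry that AMGX_matrix_check_symmetry distinguishes, here is a minimal Python sketch on a CSR matrix. It is not the AmgX implementation and does not call AmgX. The function name `csr_symmetry`, its arguments, and the `tol` parameter are hypothetical. It also assumes that "structurally symmetric" means the sparsity pattern is symmetric and that "completely symmetric" means the stored values match as well.

```python
def csr_symmetry(row_ptr, col_idx, values, tol=0.0):
    """Classify a square CSR matrix (hypothetical helper, not an AmgX call).

    Returns (structurally_symmetric, completely_symmetric).
    """
    n = len(row_ptr) - 1

    # Map every stored (row, col) position to its value.
    entries = {}
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            entries[(i, col_idx[k])] = values[k]

    complete = True
    for (i, j), v in entries.items():
        if (j, i) not in entries:
            # A stored entry has no stored mirror: the pattern is not symmetric,
            # so the matrix cannot be completely symmetric either.
            return False, False
        if abs(v - entries[(j, i)]) > tol:
            # Pattern matches, but the mirrored values differ.
            complete = False
    return True, complete


# 2x2 examples in CSR form (row_ptr, col_idx, values).
symmetric = ([0, 2, 4], [0, 1, 0, 1], [2.0, 1.0, 1.0, 3.0])
pattern_only = ([0, 2, 4], [0, 1, 0, 1], [2.0, 1.0, 5.0, 3.0])
upper_triangular = ([0, 2, 3], [0, 1, 1], [2.0, 1.0, 3.0])
```

Under these assumptions, `symmetric` passes both tests, `pattern_only` is structurally but not completely symmetric, and `upper_triangular` fails both.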

Tested configurations:

Linux x86-64:
-- Ubuntu 20.04, Ubuntu 22.04
-- NVHPC 23.7, GCC 9.4.0, GCC 12.1
-- OpenMPI 4.0.x
-- CUDA 11.2, 11.8, 12.2
-- A100, H100

Note that while AMGX has support for building in Windows, testing on Windows is very limited.

29 Jun 23:24

marsaev

v2.3.0

32e1f44

v2.3.0

Changes:

Increased minimum CMake version to 3.18 and adapted to use CUDA as a language, making it possible to compile with HPC SDK
Improved performance of compute_values_kernel by ~1.3x
Optimised block tuning for aggressive coarsening
Added an exact coarse solve, accessible via the default scope flag "exact_coarse_solve"
Fixed issue where latency hiding could be enabled/disabled asymmetrically across available ranks
Fixed bug with SpGEMM fallback that deleted cuSPARSE handle incorrectly
Fixed bug with use of shared memory in estimate_c_hat_kernel

Tested configurations:

Linux x86-64:
-- Ubuntu 20.04, Ubuntu 18.04
-- gcc 7.4.0, gcc 9.3.0
-- OpenMPI 4.0.x
-- CUDA 11.0, 11.2
Windows 10 x86-64:
-- MS Visual Studio 2019 (msvc 19.28)
-- MS MPI v10.1.2
-- CUDA 11.0

Note that while AMGX has support for building in Windows, testing on Windows is very limited.

06 Apr 21:05

marsaev

v2.2.0

861ba5a

v2.2.0

AMGX v2.2.0

Changes:

Fixed GPU Direct support (results are now correct and performance is better)
Fixed latency hiding (general performance improvements and a couple of bug fixes for specific cases)
Tuned the aggregation and classical setup phases for Volta
Gauss-Seidel performance improvements on Volta and newer
Added Ampere support
Minor bug fixes and enhancements, including those reported or requested by the community

Tested configurations:

Linux x86-64:
-- Ubuntu 20.04, Ubuntu 18.04
-- gcc 7.4.0, gcc 9.3.0
-- OpenMPI 4.0.x
-- CUDA 10.2, 11.0, 11.2
Windows 10 x86-64:
-- MS Visual Studio 2019 (msvc 19.28)
-- MS MPI v10.1.2
-- CUDA 10.2, 11.0

Note that while AMGX has support for building in Windows, testing on Windows is very limited.

21 Mar 00:54

marsaev

v2.1.0

cf285c1

v2.1.0

AMGX v2.1.0

Changelog:

Added a new API that lets the user provide distributed matrix partitioning information as offsets: the offset of each partition's first row in the matrix. This works only if each partition owns contiguous rows of the matrix. Added an example for this new API (see examples/amgx_mpi_capi_cla.c)
Distributed code improvements
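
The offset-based partitioning described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the AmgX API: the helper names `partition_offsets` and `owner_of_row` are hypothetical, and the layout with one trailing entry holding the total row count is an assumption of this sketch rather than something the release notes state.

```python
from bisect import bisect_right
from itertools import accumulate


def partition_offsets(rows_per_rank):
    """Offset of each rank's first global row, plus the total row count at the end.

    Assumes rank r owns the contiguous global rows [offsets[r], offsets[r + 1]).
    """
    return [0] + list(accumulate(rows_per_rank))


def owner_of_row(offsets, global_row):
    """Return the rank that owns global_row under the contiguous layout."""
    if not 0 <= global_row < offsets[-1]:
        raise IndexError("row index outside the global matrix")
    # The owner is the last rank whose first row is <= global_row.
    return bisect_right(offsets, global_row) - 1


# Three ranks owning 4, 3 and 5 rows respectively.
offsets = partition_offsets([4, 3, 5])
```

Because each rank owns a contiguous block, a single offsets array is enough to recover the owner of any row with a binary search, which is why the contiguity restriction matters.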

Tested configurations:

gcc 7.5, gcc 6.4
CUDA 9.0, CUDA 10.0
OpenMPI 4.0
NVIDIA V100

Note that while AMGX has support for building in Windows, it is not actively tested, and the library may malfunction or fail to build.

21 Mar 00:36

marsaev

v2.0.1

6025603

v2.0.1

AMGX v2.0.1

Thanks to the community, many updates and fixes to the initial release have been incorporated into this release.

Changelog:

Dropped CUDA 7.x support and added CUDA 10.0 support
Build fixes when using MS VS 2015 and 2017
Build fixes when using CUDA 8 and CUDA 9
Fixed DEBUG build configuration
Minor code improvements

Tested configurations:

Windows
- MSVS 2017 (fixed and reported by @ftvkun), MSVS 2015 (fixed and reported by @ibaned)
Linux
- CUDA 9.0, CUDA 10.0
- gcc 4.8, gcc 5.3
- OpenMPI 3.1.3
Tested on Volta GPU arch

21 Mar 00:16

marsaev

v2.0.0

6af2390

v2.0.0

AMGX first open-source release, versioned v2.0.0

Tested build configurations:

Linux: gcc 4.8.3, OpenMPI
Windows: MS VS 2015, MPICH
CUDA versions 7.0 - 9.0

Tested on Kepler - Volta GPU families.

API documentation and examples are contained in the document in the doc directory
