Home
AmgX uses CMake, so you can follow a standard CMake build process.
A key parameter is CUDA_ARCH, which accepts the numerical value of the target architecture (e.g., 80 for SM80). The parameter currently accepts values up to 90.
An example script for building AmgX:
#!/bin/bash -ex
BUILD_TYPE=RelWithTraces
mkdir $BUILD_TYPE
cd $BUILD_TYPE
cmake -DCMAKE_INSTALL_PREFIX=../install/ \
-DCMAKE_C_COMPILER=mpicc \
-DCMAKE_CXX_COMPILER=mpic++ \
-DCMAKE_CUDA_HOST_COMPILER=mpic++ \
-DCUDA_ARCH=90 \
-DCMAKE_BUILD_TYPE=$BUILD_TYPE ..
make VERBOSE=true -j
make install
You can find examples of the AmgX API in the examples directory, e.g., amgx_mpi_capi.c.
MPI_Init(&argc, &argv); // Initialise MPI if you require multi-GPU
cudaSetDevice(local_rank); // Set the CUDA device, potentially to the local rank (if 1 rank per GPU)
AMGX_SAFE_CALL(AMGX_initialize()); // Initialise AmgX
AMGX_SAFE_CALL(AMGX_register_print_callback(&print_callback)); // Pass AmgX a print callback to handle its output
Example print callback function that ensures only 1 MPI process outputs to stdout:
void print_callback(const char *msg, int length)
{
int rank;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
if (rank == 0) { printf("%s", msg); }
}
- "min_rows_latency_hiding": <number_of_rows>
  - Declared in the outermost scope of the configuration file (i.e., in the same scope as "config_version": 2).
  - Enables latency hiding, which is switched off once the number of rows drops below <number_of_rows>.
  - Latency hiding overlaps communication with computation; the <number_of_rows> threshold allows the feature to be disabled when there is not enough computation to overlap.
  - A value of around 30-50000 is typically reasonable, but the best choice depends on the problem and the GPU.
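As a minimal sketch of the placement described above, the configuration fragment below puts min_rows_latency_hiding next to "config_version": 2 in the outermost scope. The threshold of 40000 and the contents of the "solver" block are illustrative assumptions, not recommended settings; substitute your own solver configuration.

```json
{
    "config_version": 2,
    "min_rows_latency_hiding": 40000,
    "solver": {
        "solver": "AMG",
        "max_iters": 100
    }
}
```

Because the parameter sits outside the "solver" block, it is not tied to any nested solver scope.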
For debugging and sharing test cases, API calls are provided to output matrices to file:
AMGX_write_system(const AMGX_matrix_handle mtx, const AMGX_vector_handle rhs, const AMGX_vector_handle sol, const char *filename)
AMGX_write_system_distributed(const AMGX_matrix_handle mtx, const AMGX_vector_handle rhs, const AMGX_vector_handle sol, const char *filename, int allocated_halo_depth, int num_partitions, const int *partition_sizes, int partition_vector_size, const int *partition_vector)
To control the type of output, set the configuration parameter matrix_writer to either matrixmarket or binary. The Matrix Market format is a simple, human-readable ASCII format, useful if you want to inspect the matrix, while the binary format is better suited to cases where the matrix data is large.
To set the writer to binary, add the following to the outermost scope of the configuration file:
"matrix_writer": "binary"