Profiling CUDA UVM host device memory transfers using Caliper

brian-kelley edited this page Mar 24, 2020 · 2 revisions

David Poliakoff's experimental branch of the Caliper profiling library can track the automatic transfers of UVM pages between host and device. Caliper is a standalone dynamic library that conforms to the Kokkos profiling interface. It can do much more than just UVM tracking, but these instructions are only about UVM.

Limitation: UVM tracking does not work with MPI yet; full MPI support in the Tpetra stack is in progress.

Installation

module load $cuda_stuff

Build your app with -DKokkos_ENABLE_PROFILING=ON, and otherwise use your usual CUDA configuration. -DCMAKE_BUILD_TYPE=Debug is recommended so that kernel names in backtraces are accurate.

Download Caliper and check out the UVM branch (an experimental extension of https://github.com/LLNL/Caliper):

git clone git@github.com:DavidPoliakoff/caliper
cd caliper
git checkout feature/uvm

Build and install Caliper. The install prefix can be anywhere; here it is called $CALIPER_ROOT.

mkdir build
cd build
cmake -DWITH_KOKKOS_PROFILING=ON -DWITH_CUPTI=ON -DCUDA_TOOLKIT_ROOT_DIR=$CUDA_ROOT -DCUPTI_PREFIX=$CUDA_ROOT/extras/CUPTI -DCMAKE_INSTALL_PREFIX=$CALIPER_ROOT ..
make install
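
Before moving on, it can help to confirm that the install step actually produced the profiling library. The sketch below assumes the library is named libcaliper-serial.so and lands in either lib64 or lib under the install prefix; which one depends on the platform, so both are checked. The helper name find_caliper_lib is hypothetical and not part of Caliper.

```shell
# Hypothetical helper (not part of Caliper): print the path of
# libcaliper-serial.so under an install prefix, or fail with a message.
# Assumption: the library is installed under lib64/ or lib/.
find_caliper_lib() {
  caliper_prefix="$1"
  for caliper_libdir in lib64 lib; do
    caliper_candidate="$caliper_prefix/$caliper_libdir/libcaliper-serial.so"
    if [ -f "$caliper_candidate" ]; then
      printf '%s\n' "$caliper_candidate"
      return 0
    fi
  done
  echo "libcaliper-serial.so not found under $caliper_prefix" >&2
  return 1
}
```

For example, `find_caliper_lib "$CALIPER_ROOT"` prints the full path if the install succeeded, and returns a nonzero status otherwise.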

Basic usage: list kernels, views, transfer direction and size in bytes

export KOKKOS_PROFILE_LIBRARY=$CALIPER_ROOT/lib64/libcaliper-serial.so
export CALI_CALIPER_ATTRIBUTE_DEFAULT_SCOPE=process
export CALI_LOG_VERBOSITY=1
export CALI_REPORT_CONFIG="SELECT SUM(cupti.uvm.bytes),* GROUP BY alloc.label#cupti.uvm.address,cupti.uvm.direction,function FORMAT TABLE"
export CALI_ALLOC_RESOLVE_ADDRESSES=TRUE
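
If you would rather not leave these variables set in your login shell, one option is to scope them to a single run. The following is a minimal sketch, assuming $CALIPER_ROOT is already set and the library sits in lib64 as above; the function name run_with_caliper_uvm is hypothetical.

```shell
# Hypothetical wrapper: run one command with the Caliper UVM settings from
# this page, without changing the environment of the calling shell.
# Assumption: CALIPER_ROOT points at the Caliper install prefix.
run_with_caliper_uvm() {
  (
    # The subshell keeps the exports local to this one invocation.
    export KOKKOS_PROFILE_LIBRARY="${CALIPER_ROOT:?set CALIPER_ROOT first}/lib64/libcaliper-serial.so"
    export CALI_CALIPER_ATTRIBUTE_DEFAULT_SCOPE=process
    export CALI_LOG_VERBOSITY=1
    export CALI_REPORT_CONFIG="SELECT SUM(cupti.uvm.bytes),* GROUP BY alloc.label#cupti.uvm.address,cupti.uvm.direction,function FORMAT TABLE"
    export CALI_ALLOC_RESOLVE_ADDRESSES=TRUE
    "$@"
  )
}
```

Usage would look like `run_with_caliper_uvm ./my_test.exe`, where my_test.exe stands in for any Kokkos+CUDA+UVM executable.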

Now, just run a program that uses Kokkos+CUDA+UVM. After Kokkos is finalized, Caliper will print out a report. Here is an example line from running the TpetraExt MatrixMatrix unit tests:

KokkosSparse::StructureC::GPU_EXEC   BYTES_TRANSFER_HTOD   98304   Tpetra::CrsMatrix::val

This 4-column format (defined in $CALI_REPORT_CONFIG) lists, in order: the demangled kernel/functor name, the transfer direction (here, host to device), the total number of bytes transferred, and the Kokkos::View label.

The reported byte count is the sum over every transfer that occurred in the same kernel, on the same view, and in the same direction.
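
For long reports, a per-direction total can be a useful first summary. The sketch below is not part of Caliper; it assumes the report was saved to a text file, that every data row looks like the example line above (whitespace-separated fields with no spaces inside a field), and that the direction field always starts with BYTES_TRANSFER_ and is immediately followed by the byte count. Rows that do not match, such as a header, are skipped. The function name sum_uvm_bytes is hypothetical.

```shell
# Hypothetical helper: total the UVM bytes per transfer direction from a
# saved Caliper table report (files given as arguments, or stdin).
# Assumption: the byte count is the field right after BYTES_TRANSFER_*.
sum_uvm_bytes() {
  awk '
    {
      for (i = 1; i < NF; i++) {
        if ($i ~ /^BYTES_TRANSFER_/ && $(i + 1) ~ /^[0-9]+$/) {
          total[$i] += $(i + 1)   # accumulate bytes for this direction
          break
        }
      }
    }
    END {
      for (d in total) printf "%s %d\n", d, total[d]
    }
  ' "$@" | sort
}
```

With the report saved to a file, here given the placeholder name report.txt, `sum_uvm_bytes report.txt` prints one line per direction found in the report.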