LIKWID and SLURM

Jobs on HPC systems are nowadays commonly managed through the job scheduler SLURM. If you want access to the performance monitoring features of your system, some cluster-specific flags might be required when submitting a job. When a job is started, it is commonly restricted to the requested resources, which might interfere with the execution of LIKWID (and other tools). This page contains helpful hints for users as well as configuration ideas for administrators.

Using LIKWID in SLURM jobs

likwid-perfctr

Check with your compute center whether they are running some sort of job-specific monitoring that might interfere with reading hardware performance counters. Usually, there will be a way to disable the job-specific monitoring for individual jobs with additional parameters during job submission.

likwid-mpirun

Enabling CPU performance monitoring

Using the accessdaemon mode

Using the perf_event mode

To avoid possible security and privacy concerns, it is advisable to set the paranoid value (see likwid-perfctr) to 0 if a compute job has allocated a compute node exclusively. Exclusive usage also avoids contention issues in shared local resources during benchmarking.
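From inside a job you can quickly verify the current setting yourself; a minimal check, reading the procfs entry mentioned below:

```sh
# Print the current perf_event paranoid setting.
# LIKWID's perf_event backend needs a sufficiently low value (see likwid-perfctr).
cat /proc/sys/kernel/perf_event_paranoid
```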

A suitable way for HPC clusters with Slurm is to configure a prolog that detects if a job is running exclusively on a node and then sets /proc/sys/kernel/perf_event_paranoid to 0. Correspondingly, an epilog is needed that sets it back to the default value of 2.
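A minimal sketch of such a prolog is shown below. How exclusivity is detected is site-specific, so the job_is_exclusive function is only a hypothetical placeholder; the rest simply writes the values named above to procfs.

```sh
#!/bin/bash
# Sketch of a SLURM prolog (run as root by slurmd on the compute node).

# Site-specific check whether the job has the node exclusively,
# e.g. by comparing the job's allocated CPUs with the node's CPUs.
# Hypothetical placeholder -- adapt to your cluster.
job_is_exclusive() {
    return 1
}

if job_is_exclusive; then
    # Allow unprivileged reading of hardware performance counters.
    echo 0 > /proc/sys/kernel/perf_event_paranoid
fi
```

The corresponding epilog restores the restrictive default:

```sh
#!/bin/bash
# Sketch of the matching SLURM epilog: restore the default value.
echo 2 > /proc/sys/kernel/perf_event_paranoid
```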

Enabling Nvidia GPU performance monitoring

Reading GPU performance counters as a non-admin user is often disabled on HPC clusters due to security concerns. When trying to read them, you will get a message referring you to the Nvidia documentation about ERR_NVGPUCTRPERM.

A suitable way for HPC clusters with Slurm is to configure a prolog that detects if a job is running exclusively on a node and then

  1. stops all systemd services accessing the GPU devices, for example nvidia-persistenced,
  2. then unloads all relevant nvidia kernel modules, for example modprobe -r nvidia_uvm nvidia_drm nvidia_modeset nv_peer_mem nvidia,
  3. then reloads the nvidia kernel module with the parameter NVreg_RestrictProfilingToAdminUsers set to 0, i.e. modprobe nvidia NVreg_RestrictProfilingToAdminUsers=0,
  4. and finally starts the services again.
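
A sketch of the GPU-related part of such a prolog, using only the service, modules, and module parameter named in the list above (whether the node is exclusively allocated again has to be checked in a site-specific way):

```sh
#!/bin/bash
# Sketch of the GPU part of a SLURM prolog (run as root), to be executed
# only when the job has the node exclusively.

# 1. Stop systemd services that access the GPU devices.
systemctl stop nvidia-persistenced

# 2. Unload the relevant nvidia kernel modules.
modprobe -r nvidia_uvm nvidia_drm nvidia_modeset nv_peer_mem nvidia

# 3. Reload the nvidia module with profiling enabled for non-admin users.
modprobe nvidia NVreg_RestrictProfilingToAdminUsers=0

# 4. Start the services again.
systemctl start nvidia-persistenced
```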

A corresponding epilog also needs to be created, in which modprobe nvidia NVreg_RestrictProfilingToAdminUsers=1 is used instead. Be warned that such a prolog and epilog increase the job start/end duration, because especially the restart of the nvidia systemd services can take some time, likely up to one minute. A workaround would be to add a SPANK plugin that makes enabling access to the performance counters optional via a job submission parameter.
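
A sketch of the matching epilog under the same assumptions, differing only in the module parameter:

```sh
#!/bin/bash
# Sketch of the GPU part of the SLURM epilog: restore the restricted default.
systemctl stop nvidia-persistenced
modprobe -r nvidia_uvm nvidia_drm nvidia_modeset nv_peer_mem nvidia
modprobe nvidia NVreg_RestrictProfilingToAdminUsers=1
systemctl start nvidia-persistenced
```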
