Map GPU operation to GPU ID #455

berceanu · 2021-02-07T23:33:11Z

The flow project contains one GPU operation per job. The goal is to track the GPU memory consumption during a run, on a per-job, per-GPU basis.

Workaround for SLURM not yet having a --gpus-per-task option:

{% set cmd_suffix = cmd_suffix|default('') ~ (' &' if parallel else '') %}
{% for operation in operations %}
export CUDA_VISIBLE_DEVICES={{ loop.index0 }}
{{ operation.cmd }}{{ cmd_suffix }}
{% endfor %}

In this way, each GPU operation executes on a distinct GPU. I would like to have a flow operation that launches simultaneously with each GPU operation and tracks its memory usage.
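
For reference, here is a minimal Python sketch of what the template loop above expands to. It is only an illustration: `render_gpu_commands` is a hypothetical helper, not part of signac-flow, and the operation commands in the usage example are placeholders.

```python
def render_gpu_commands(commands, parallel=True, cmd_suffix=""):
    """Return the shell lines the template loop would emit.

    Mirrors the Jinja loop: the operation at position i gets
    CUDA_VISIBLE_DEVICES=i (loop.index0), and ' &' is appended
    when operations run in parallel.
    """
    suffix = cmd_suffix + (" &" if parallel else "")
    lines = []
    for index, cmd in enumerate(commands):
        lines.append(f"export CUDA_VISIBLE_DEVICES={index}")
        lines.append(f"{cmd}{suffix}")
    return lines


if __name__ == "__main__":
    # Placeholder commands; real ones come from operation.cmd in the template.
    for line in render_gpu_commands(["op_a", "op_b"]):
        print(line)
```

Each backgrounded command inherits the value of CUDA_VISIBLE_DEVICES that was exported just before it started, so a later export does not affect operations that are already running.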

Current approach: periodically get the PIDs of the running GPU compute processes, and map them to signac job IDs by grepping the ps aux output for the PID and the -j <job_id> argument.
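
A rough sketch of this polling approach in Python, not the code actually used in the project. It assumes that each operation's command line contains `-j <job_id>` with a 32-character hexadecimal job ID, and that `nvidia-smi` and `ps` are available on the compute node. All function names are hypothetical.

```python
import re
import subprocess

# Assumption: the job ID appears on the command line as "-j <32 hex chars>".
JOB_ID_PATTERN = re.compile(r"-j\s+([0-9a-f]{32})\b")


def parse_compute_apps(csv_text):
    """Parse CSV rows of 'pid, gpu_uuid, used_memory' (MiB, no header, no units)."""
    records = []
    for line in csv_text.strip().splitlines():
        pid, gpu_uuid, used = [field.strip() for field in line.split(",")]
        records.append({
            "pid": int(pid),
            "gpu_uuid": gpu_uuid,
            # nvidia-smi may report a non-numeric value such as "[N/A]".
            "used_mib": int(used) if used.isdigit() else None,
        })
    return records


def job_id_for_command(cmdline):
    """Extract the job ID from a command line, or None if there is none."""
    match = JOB_ID_PATTERN.search(cmdline)
    return match.group(1) if match else None


def map_usage_to_jobs(records, cmdlines):
    """Return {(job_id, gpu_uuid): used_mib} given a {pid: command line} dict."""
    usage = {}
    for rec in records:
        job_id = job_id_for_command(cmdlines.get(rec["pid"], ""))
        if job_id is not None:
            usage[(job_id, rec["gpu_uuid"])] = rec["used_mib"]
    return usage


def snapshot():
    """Take one sample of per-job, per-GPU memory usage on this node."""
    smi = subprocess.run(
        ["nvidia-smi",
         "--query-compute-apps=pid,gpu_uuid,used_memory",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    records = parse_compute_apps(smi)
    cmdlines = {}
    for rec in records:
        ps = subprocess.run(["ps", "-o", "args=", "-p", str(rec["pid"])],
                            capture_output=True, text=True)
        cmdlines[rec["pid"]] = ps.stdout.strip()
    return map_usage_to_jobs(records, cmdlines)
```

Calling `snapshot()` in a loop with a sleep between samples gives a usage trace per job. The sketch reports GPU UUIDs; translating them to GPU indices would need a second query such as `nvidia-smi --query-gpu=index,uuid --format=csv,noheader`.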


vyasr · 2021-02-08T19:00:17Z

This issue is closely related to #115 regarding the best way to handle GPU assignment and management within signac-flow.

berceanu · 2021-02-13T14:35:30Z

Indeed, it's the same problem, but now using SLURM.
However, I did find a solution: use srun instead of mpirun and specify --gres=gpu:x for each srun command.
Example SLURM script here. The --mpi=pmi2 flag was needed for srun to pick up MPICH which was installed by mpi4py in the conda environment.
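
The linked script is not reproduced in this thread, so the following is only a guess at the shape of the srun-based launch lines, written as a small Python generator. The `--gres=gpu:x` and `--mpi=pmi2` flags come from the comment above; `render_srun_commands`, the trailing `wait`, and the placeholder commands are assumptions. Whether each step really lands on a distinct GPU depends on the cluster's SLURM configuration.

```python
def render_srun_commands(commands, gpus_per_step=1, mpi_plugin="pmi2"):
    """Return shell lines that launch each command as its own srun job step.

    Each step requests its own GPUs via --gres=gpu:<n>. The steps are
    backgrounded, and a final 'wait' keeps the batch script alive until
    all of them finish.
    """
    lines = [
        f"srun --gres=gpu:{gpus_per_step} --mpi={mpi_plugin} {cmd} &"
        for cmd in commands
    ]
    lines.append("wait")
    return lines


if __name__ == "__main__":
    # Placeholder commands standing in for the real operation commands.
    for line in render_srun_commands(["op_a", "op_b"]):
        print(line)
```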

berceanu changed the title ~~Map flow GPU operation to GPU UUID?~~ Map GPU operation to GPU ID Feb 8, 2021

berceanu mentioned this issue Feb 8, 2021

Add before_operations block to base_script.sh #457

Closed

berceanu closed this as completed Feb 13, 2021

berceanu mentioned this issue Feb 13, 2021

CUDA initialized before forking #115

Closed

joaander mentioned this issue Dec 15, 2021

Fix Expanse parallel gpu submission #594

Closed

12 tasks

vyasr mentioned this issue Jan 8, 2022

Clarify limits of submit --bundle --parallel glotzerlab/signac-docs#157

Merged

3 tasks

joaander mentioned this issue Nov 3, 2023

Refactor directives. #785

Open
