Map GPU operation to GPU ID #455

Closed
berceanu opened this issue Feb 7, 2021 · 2 comments

Comments

@berceanu
Contributor

berceanu commented Feb 7, 2021

The flow project contains one GPU operation per job. The goal is to track the GPU memory consumption during a run, on a per-job, per-GPU basis.

Workaround for SLURM not yet having a --gpus-per-task option:

{% set cmd_suffix = cmd_suffix|default('') ~ (' &' if parallel else '') %}
{% for operation in operations %}
export CUDA_VISIBLE_DEVICES={{ loop.index0 }}
{{ operation.cmd }}{{ cmd_suffix }}
{% endfor %}

This way, each GPU operation executes on a distinct GPU. I would like to have a flow operation that launches simultaneously with each GPU operation and tracks its memory usage.

Current approach: periodically get the PIDs of the running GPU compute processes and map them to signac job IDs by grepping the ps aux output for each PID and its -j <job_id> argument.
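A minimal Python sketch of this approach, assuming a Linux node with nvidia-smi on the PATH and that the command line of the GPU process (or of one of its parent processes) contains -j followed by a 32-character signac job ID. All helper names (parse_compute_apps, parse_gpu_indices, job_id_from_cmdline, find_job_id, snapshot) are hypothetical. Instead of grepping ps aux, it reads /proc/<pid>/cmdline and walks up the PPid chain, so GPU work running in a child of the flow command is still attributed to its job. One caveat for mapping to GPU IDs: nvidia-smi numbers GPUs in PCI bus order, while the indices in CUDA_VISIBLE_DEVICES follow CUDA's enumeration order, which is only guaranteed to match nvidia-smi when CUDA_DEVICE_ORDER=PCI_BUS_ID is set.

```python
import re
import subprocess
from pathlib import Path

# Assumption: signac job IDs are 32-character hexadecimal strings passed as "-j <id>".
JOB_ID_PATTERN = re.compile(r"-j\s+([0-9a-f]{32})\b")


def parse_compute_apps(csv_text):
    """Parse `nvidia-smi --query-compute-apps=pid,gpu_uuid,used_memory
    --format=csv,noheader,nounits` output into (pid, gpu_uuid, used_mib) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        fields = [field.strip() for field in line.split(",")]
        if len(fields) != 3:
            continue  # skip anything that is not a data row
        pid, gpu_uuid, used = fields
        # Memory can be reported as "[N/A]" on some platforms; keep it as None.
        rows.append((int(pid), gpu_uuid, int(used) if used.isdigit() else None))
    return rows


def parse_gpu_indices(csv_text):
    """Parse `nvidia-smi --query-gpu=index,uuid --format=csv,noheader` into {uuid: index}."""
    indices = {}
    for line in csv_text.strip().splitlines():
        index, uuid = (field.strip() for field in line.split(","))
        indices[uuid] = int(index)
    return indices


def job_id_from_cmdline(cmdline):
    """Return the job ID passed as `-j <job_id>`, or None if there is none."""
    match = JOB_ID_PATTERN.search(cmdline)
    return match.group(1) if match else None


def read_cmdline(pid):
    """Read a process's arguments from /proc (Linux only); they are NUL-separated."""
    raw = Path(f"/proc/{pid}/cmdline").read_bytes()
    return raw.replace(b"\0", b" ").decode(errors="replace").strip()


def parent_pid(pid):
    """Return the parent PID listed in /proc/<pid>/status, or 0 if not found."""
    for line in Path(f"/proc/{pid}/status").read_text().splitlines():
        if line.startswith("PPid:"):
            return int(line.split()[1])
    return 0


def find_job_id(pid):
    """Walk from `pid` up through its ancestors until a `-j <job_id>` argument is found."""
    while pid > 1:
        try:
            job_id = job_id_from_cmdline(read_cmdline(pid))
            if job_id is not None:
                return job_id
            pid = parent_pid(pid)
        except OSError:  # the process exited (or /proc is unavailable)
            return None
    return None


def query(*args):
    """Run nvidia-smi with the given arguments and return its stdout."""
    return subprocess.run(["nvidia-smi", *args], capture_output=True,
                          text=True, check=True).stdout


def snapshot():
    """Return (job_id, gpu_index, used_mib) for every GPU process tied to a signac job."""
    indices = parse_gpu_indices(query("--query-gpu=index,uuid", "--format=csv,noheader"))
    apps = parse_compute_apps(query("--query-compute-apps=pid,gpu_uuid,used_memory",
                                    "--format=csv,noheader,nounits"))
    result = []
    for pid, gpu_uuid, used_mib in apps:
        job_id = find_job_id(pid)
        if job_id is not None:
            result.append((job_id, indices.get(gpu_uuid), used_mib))
    return result
```

Calling snapshot() every few seconds from a background loop and appending each tuple to a per-job log would give a per-job, per-GPU memory trace; returning a list rather than a dict avoids silently dropping entries if one job ever uses several GPU processes.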

berceanu changed the title from "Map flow GPU operation to GPU UUID?" to "Map GPU operation to GPU ID" on Feb 8, 2021
@vyasr
Contributor

vyasr commented Feb 8, 2021

This issue is closely related to #115 regarding the best way to handle GPU assignment and management within signac-flow.

@berceanu
Contributor Author

berceanu commented Feb 13, 2021

Indeed, it's the same problem, but now with SLURM.
However, I did find a solution: use srun instead of mpirun and specify --gres=gpu:x (x being the number of GPUs) for each srun command.
Example SLURM script here. The --mpi=pmi2 flag was needed for srun to pick up the MPICH that was installed by mpi4py in the conda environment.
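A small Python sketch of what the body of such a batch script might look like, assuming one srun job step per GPU operation, each requesting its own GPU(s) through --gres and launched in the background so the steps run concurrently. The final wait keeps the batch script, and therefore the allocation, alive until every step has finished. The function name srun_lines and the example commands are hypothetical, and the author's actual script may differ; depending on the SLURM version, further srun options may be needed for concurrent steps to share an allocation, so check the srun documentation for your cluster.

```python
def srun_lines(commands, gpus_per_step=1, mpi="pmi2"):
    """Return batch-script lines launching each command as a separate srun job step."""
    lines = [
        # Each step requests its own GPU(s); "&" backgrounds it so steps run concurrently.
        f"srun --gres=gpu:{gpus_per_step} --mpi={mpi} {cmd} &"
        for cmd in commands
    ]
    # Without this, the batch script would exit and end the job while steps are running.
    lines.append("wait")
    return lines


if __name__ == "__main__":
    # Hypothetical operation commands, one per signac job.
    ops = ["python project.py run -o simulate -j JOB_A",
           "python project.py run -o simulate -j JOB_B"]
    print("\n".join(srun_lines(ops)))
```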
