Added multi-GPU recipe. #27
Conversation
Added recipe for running operations in parallel on multiple GPUs.
@joaander I believe CUDA_VISIBLE_DEVICES will only work for certain configurations, correct? Is it the recommended solution on modern GPU configurations? Are there caveats we should include here?
Unfortunately, there is no general way to solve the problem of GPU scheduling. There are some specific cases where one can schedule bundled signac-flow operations to GPUs in a mostly general way:
- If you use the SLURM scheduler, you can use ``srun`` to schedule tasks to individual GPUs within a job.
- If you run on Summit, use ``jsrun``.
How to run parallel tasks on a multi-GPU machine
================================================
When using **signac-flow** on a multi-GPU system via ``python project.py run --parallel``, all parallel tasks are sent to the same GPU. To run a single task per GPU while using all GPUs at the same time, use ``python project.py submit --bundle=N --parallel --test | /bin/bash``, where ``N`` is the number of (free) GPUs on the machine. The ``--test`` switch generates a script, which is then piped to the ``bash`` interpreter for execution. To inspect the script first, redirect it to a file instead. For this recipe to work, your project folder must contain a ``templates/script.sh`` file with the following contents:
This implementation will not choose among free GPUs; it will always choose GPUs 0, 1, ..., N whether they are free or not.
{% set cmd_suffix = cmd_suffix|default('') ~ (' &' if parallel else '') %}
{% for operation in operations %}
export CUDA_VISIBLE_DEVICES={{ loop.index0 }}
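The template exports a single device index per bundled operation. As a sanity check, an operation can read that index back from the environment; a minimal sketch (the helper name below is ours, not part of signac-flow):

```python
import os


def visible_gpu():
    """Return the GPU index exported via CUDA_VISIBLE_DEVICES, or None.

    In this recipe each bundled operation sees exactly one exported
    index, so the first (and only) comma-separated entry suffices.
    """
    value = os.environ.get("CUDA_VISIBLE_DEVICES", "")
    if not value:
        return None
    return int(value.split(",")[0])
```

An operation can log this value to verify that each bundled task landed on a distinct device.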
``CUDA_VISIBLE_DEVICES`` should not be used on systems that already make use of it for scheduling (e.g. SDSC Comet). It is less general, but a workable solution is to pass the loop index into the operation and select that GPU with the API of whatever tool the operation invokes.
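The alternative suggested here, passing the loop index into the operation, could look like the following sketch (``assign_gpus`` is a hypothetical helper, not signac-flow API):

```python
def assign_gpus(operations, n_gpus):
    """Map each operation name to a GPU index, round-robin.

    Instead of exporting CUDA_VISIBLE_DEVICES, hand each operation an
    explicit device index and let it select the GPU through the API of
    whatever tool it invokes (for example, a device-selection call in
    the underlying simulation library).
    """
    return {op: i % n_gpus for i, op in enumerate(operations)}
```

The operation then receives its index as an argument and performs the device selection itself, which keeps the recipe working on clusters that manage ``CUDA_VISIBLE_DEVICES`` for you.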
Hey @berceanu, thanks for the PR. Unfortunately, there isn't a very general solution for this problem. In the future, it might be possible to suggest using a cluster-specific run utility where supported, e.g. ``srun`` or ``jsrun``.