Update mig2gpu.sh to work with MIG on Cgroups on YARN #151
…_GPUS_FOR_CGROUPS
value returned from nvidia-smi
Did a quick test and it works fine for me:
mig2gpu_migGpu_out+=("$mig2gpu_gpuMinorNumber")
# if using this with CGROUP workaround we need the minor number to be from nvidia-caps access
if [[ "$ENABLE_MIG_GPUS_FOR_CGROUPS" == 1 ]]; then
  mig_minor_dev_num=`cat /proc/driver/nvidia-caps/mig-minors | grep gpu$mig2gpu_gpuIdx/gi$mig2gpu_migGpuInstanceId/access | cut -d ' ' -f 2`
I was striving to use only bash-intrinsic operators. We could use regex matching of the input lines instead of cat, grep & cut. https://github.com/NVIDIA/spark-rapids-examples/blob/branch-22.04/examples/MIG-Support/yarn-unpatched/scripts/mig2gpu.sh#L30
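For illustration, here is a minimal sketch of what a bash-only lookup could look like. It assumes each relevant line of /proc/driver/nvidia-caps/mig-minors has the form `gpu<idx>/gi<id>/access <minor>`, which is what the cat/grep/cut pipeline in the diff implies. The function name `mig_access_minor` and its optional third argument (a path override for testing) are hypothetical and not part of mig2gpu.sh.

```shell
# Hypothetical helper (not in mig2gpu.sh): print the minor number for
# gpu<idx>/gi<id>/access using only bash built-ins (read loop plus [[ =~ ]]).
# Args: GPU index, GPU instance id, optional path to the mig-minors file.
mig_access_minor() {
  local gpu_idx=$1 gi_id=$2
  local minors_file=${3:-/proc/driver/nvidia-caps/mig-minors}
  # Assumed line format: "gpu0/gi1/access 12". Anchoring at both ends keeps
  # gi1 from matching gi11 and skips the gpuX/giY/ciZ/access lines.
  local re="^gpu${gpu_idx}/gi${gi_id}/access[[:space:]]+([0-9]+)[[:space:]]*\$"
  local line
  while IFS= read -r line; do
    if [[ $line =~ $re ]]; then
      printf '%s\n' "${BASH_REMATCH[1]}"
      return 0
    fi
  done < "$minors_file"
  # No matching entry found.
  return 1
}
```

In the diff above, the assignment could then read `mig_minor_dev_num=$(mig_access_minor "$mig2gpu_gpuIdx" "$mig2gpu_migGpuInstanceId")`, which avoids spawning cat, grep, and cut for every MIG device.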
Thanks @gerashegalov, I'll follow up on this and see if I can use regex.
Part of NVIDIA/spark-rapids#5261. The issue describes how this fits in.
This adds an option to mig2gpu.sh to publish the minor device number of the MIG device as the GI access device number.
This will be used with other scripts to allow MIG to work when YARN is configured with GPU scheduling and cgroups. Docs are not updated here because more work is required and this option will be used as part of other scripts.