Update mig2gpu.sh to work with MIG on Cgroups on YARN #151

Merged: 6 commits, Apr 20, 2022


tgravescs
Collaborator

Part of NVIDIA/spark-rapids#5261. The issue describes how this change fits in.

This adds an option to mig2gpu.sh (enabled with ENABLE_MIG_GPUS_FOR_CGROUPS=1) that publishes, as the minor device number of each MIG device, the minor number of its GPU instance (gi) access device.

This will be used with other scripts to allow MIG to work when YARN is configured with GPU scheduling and cgroups. The docs are not updated here because more work is required, and this option will be used as part of other scripts.

@tgravescs tgravescs added the enhancement New feature or request label Apr 18, 2022
@tgravescs tgravescs self-assigned this Apr 18, 2022
@tgravescs tgravescs merged commit 22732e0 into NVIDIA:branch-22.06 Apr 20, 2022
@tgravescs tgravescs deleted the migcgroups branch April 20, 2022 16:03
@viadea
Collaborator

viadea commented Apr 20, 2022

I did a quick test and it works fine for me.

Generate the output with the new and the old script:
nvidia-smi -q -x | ENABLE_MIG_GPUS_FOR_CGROUPS=1 ./mig2gpu.sh > /tmp/new.txt
nvidia-smi -q -x | ENABLE_MIG_GPUS_FOR_CGROUPS=1 ./mig2gpu.sh.old > /tmp/old.txt
Compare the new and old output:
# diff /tmp/new.txt /tmp/old.txt
9c9
< 		<minor_number>30</minor_number>
---
> 		<minor_number>0</minor_number>
35c35
< 		<minor_number>39</minor_number>
---
> 		<minor_number>0</minor_number>
61c61
< 		<minor_number>48</minor_number>
---
> 		<minor_number>0</minor_number>

mig2gpu_migGpu_out+=("$mig2gpu_gpuMinorNumber")
# if using this with CGROUP workaround we need the minor number to be from nvidia-caps access
if [[ "$ENABLE_MIG_GPUS_FOR_CGROUPS" == 1 ]]; then
mig_minor_dev_num=`cat /proc/driver/nvidia-caps/mig-minors | grep gpu$mig2gpu_gpuIdx/gi$mig2gpu_migGpuInstanceId/access | cut -d ' ' -f 2`
Collaborator

gerashegalov commented Apr 27, 2022

I was striving to use only bash-intrinsic operators. We could use regex matching of the input lines instead of cat, grep & cut. https://github.com/NVIDIA/spark-rapids-examples/blob/branch-22.04/examples/MIG-Support/yarn-unpatched/scripts/mig2gpu.sh#L30

Collaborator Author

Thanks @gerashegalov, I'll follow up on this and see if I can use regex.
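
For reference, the regex approach discussed in this thread might look roughly like the sketch below. It is not the code that was merged. It assumes that each line of /proc/driver/nvidia-caps/mig-minors has the form `gpuN/giM/access <minor>`, which is what the `cut -d ' ' -f 2` in the reviewed line implies. The function name `lookup_gi_access_minor` and the optional third argument (a file path, added to make the function testable without a GPU) are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: find the minor number of a GPU instance's access device using only
# bash builtins (read and [[ =~ ]]) instead of cat, grep and cut.
# lookup_gi_access_minor is a hypothetical helper, not part of mig2gpu.sh.

lookup_gi_access_minor() {
  local gpu_idx=$1 gi_id=$2
  # Assumption: the file defaults to the path used in the reviewed code.
  local minors_file=${3:-/proc/driver/nvidia-caps/mig-minors}
  # Anchored so that only the exact gpuN/giM/access line matches;
  # compute instance lines (gpuN/giM/ciK/access) are skipped.
  local pattern="^gpu${gpu_idx}/gi${gi_id}/access[[:space:]]+([0-9]+)[[:space:]]*$"
  local line
  while IFS= read -r line || [[ -n $line ]]; do
    if [[ $line =~ $pattern ]]; then
      # The first capture group holds the minor number.
      printf '%s\n' "${BASH_REMATCH[1]}"
      return 0
    fi
  done < "$minors_file"
  # No matching entry was found.
  return 1
}
```

Inside mig2gpu.sh, the lookup could then be written as `mig_minor_dev_num=$(lookup_gi_access_minor "$mig2gpu_gpuIdx" "$mig2gpu_migGpuInstanceId")`. The non-zero return status lets the caller detect a missing entry instead of silently publishing an empty value.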
