[FEA] Create MIG with Cgroups on YARN Dataproc scripts #5261
Comments
One way we can get this to work is by making YARN use the major/minor device numbers of the MIG GI access nodes when it denies access via Cgroups. Denying access to a GI also denies access to all CIs underneath it.
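As a rough illustration of the idea above, here is a minimal Python sketch that turns GI access minors into Cgroup deny entries. It assumes cgroup v1 `devices.deny` syntax (`c major:minor rwm`) and assumes the minors are listed in lines such as `gpu0/gi1/access 12`, as in `/proc/driver/nvidia-caps/mig-minors`. The function names and the sample numbers are hypothetical and are not taken from YARN or from this issue.

```python
def parse_gi_access_minors(text):
    """Parse lines like 'gpu0/gi1/access 12' into {(gpu, gi): minor}.

    Assumed input format; CI-level lines and other capabilities are skipped,
    because denying the GI already covers the CIs underneath it.
    """
    minors = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue
        path, minor = parts
        fields = path.split("/")
        is_gi_access = (
            len(fields) == 3
            and fields[0].startswith("gpu")
            and fields[1].startswith("gi")
            and fields[2] == "access"
        )
        if is_gi_access:
            minors[(int(fields[0][3:]), int(fields[1][2:]))] = int(minor)
    return minors


def deny_entries(major, minors, allowed):
    """Build cgroup v1 devices.deny lines for every GI not in `allowed`."""
    return [
        f"c {major}:{minor} rwm"
        for key, minor in sorted(minors.items())
        if key not in allowed
    ]


# Hypothetical sample data: the major and minor numbers are made up.
sample = """config 1
monitor 2
gpu0/gi1/access 12
gpu0/gi1/ci0/access 13
gpu0/gi2/access 21
"""
gi_minors = parse_gi_access_minors(sample)
print(deny_entries(238, gi_minors, allowed={(0, 1)}))
```

Here a container that is allowed GI 1 on GPU 0 gets a deny entry only for GI 2. On a real node the major number would have to be looked up (for example from the `nvidia-caps` entry in `/proc/devices`) rather than hard-coded.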
With the above changes, YARN internally passes the major and minor device numbers for MIG GI access to the container-executor, which adds every MIG GPU device the container should not have access to into that container's Cgroup deny list. To do this on Dataproc, we need an option for the user to enable MIG, with scripts to set it up. This requires A100s, and MIG must be enabled and followed by a reboot for it to take effect. The only way I found to do that is the --metadata=startup-script-url= option: the script installs the GPU drivers, enables MIG, and then reboots. On reboot the script runs again, and this time it has to configure the MIG CI instances per user and then do the normal YARN and Spark RAPIDS plugin install and initialization.
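The two-pass startup flow described above could be sketched roughly as follows. This is only an outline in Python, not the actual Dataproc script: the function name, the step descriptions, and the GI profile IDs are hypothetical. The `nvidia-smi -mig 1` and `nvidia-smi mig -cgi ... -C` invocations follow the NVIDIA MIG tooling; the right profile IDs would depend on how the user wants the GPU partitioned.

```python
def startup_steps(mig_enabled, gi_profiles):
    """Return the ordered steps for one run of the startup script.

    First boot (MIG not yet enabled): install drivers, enable MIG, reboot.
    Second boot (MIG enabled): create the instances, then do the normal setup.
    """
    if not mig_enabled:
        return [
            "install GPU drivers",
            "nvidia-smi -mig 1",  # enable MIG mode; takes effect after reboot
            "reboot",
        ]
    profiles = ",".join(gi_profiles)
    return [
        # -cgi creates the GPU instances, -C also creates the compute instances
        f"nvidia-smi mig -cgi {profiles} -C",
        "install and configure YARN",
        "install and initialize the Spark RAPIDS plugin",
    ]


# Hypothetical profile IDs, chosen only for illustration.
for step in startup_steps(mig_enabled=False, gi_profiles=["9", "9"]):
    print("first boot:", step)
for step in startup_steps(mig_enabled=True, gi_profiles=["9", "9"]):
    print("second boot:", step)
```

Because the same script runs on every boot, it needs some way to tell which pass it is in; checking whether MIG mode is already enabled is one option, and a marker file written before the reboot would be another.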
Is your feature request related to a problem? Please describe.
We found a way to make MIG work with Cgroups on YARN without modifications to YARN. Get this running on Dataproc and create the necessary initialization scripts.