This repo contains a Nomad device plugin that exposes a configurable number of virtual GPUs (vGPUs) for each physical GPU present on the machine. This enables running workloads that don't need a whole GPU.
This plugin needs the following dependencies to function:
- Nomad 0.9+
- GNU/Linux x86_64 with kernel version > 3.10
- NVIDIA GPU with Architecture > Fermi (2.1)
- NVIDIA drivers >= 340.29 with the `nvidia-smi` binary
- Docker v19.03+
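The driver version requirement is easiest to verify numerically, since a lexical string comparison would order `"99"` after `"340.29"`. A small sketch of such a check (the version strings below are illustrative, not taken from any particular machine):

```python
def version_tuple(v: str) -> tuple:
    """Convert a dotted version string like "340.29" into a comparable tuple."""
    return tuple(int(part) for part in v.split("."))

MIN_DRIVER = "340.29"

def driver_ok(reported: str) -> bool:
    """True if the reported NVIDIA driver version meets the minimum."""
    return version_tuple(reported) >= version_tuple(MIN_DRIVER)

print(driver_ok("535.104.05"))  # → True  (newer driver passes)
print(driver_ok("331.20"))      # → False (older driver fails)
```

The reported version can be obtained from `nvidia-smi --query-gpu=driver_version --format=csv,noheader` on the target machine.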
Copy the plugin binary into Nomad's plugin directory and configure the plugin in the client configuration. The requirements for the official Nomad NVIDIA device plugin also apply.
```hcl
plugin "nvidia-vgpu" {
  config {
    ignored_gpu_ids    = ["uuid1", "uuid2"]
    fingerprint_period = "5s"
    vgpus              = 16
  }
}
```
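Nomad discovers external device plugins from the agent's `plugin_dir`, so the stanza above belongs in the client agent configuration next to that setting. A minimal client config sketch (the directory path is an assumption; substitute your own plugin directory):

```hcl
# Client agent configuration; the nvidia-vgpu binary lives in plugin_dir.
plugin_dir = "/opt/nomad/plugins"

client {
  enabled = true
}

plugin "nvidia-vgpu" {
  config {
    vgpus = 16
  }
}
```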
Use the `device` stanza in the task's resources to schedule with device support.
```hcl
job "gpu-test" {
  datacenters = ["dc1"]
  type        = "batch"

  group "smi" {
    task "smi" {
      driver = "docker"

      config {
        image   = "nvidia/cuda:11.0-base"
        command = "nvidia-smi"
      }

      resources {
        device "letmutx/gpu" {
          count = 1

          # Add an affinity for a particular model
          affinity {
            attribute = "${device.model}"
            value     = "Tesla K80"
            weight    = 50
          }
        }
      }
    }
  }
}
```
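Besides soft affinities, Nomad's `device` stanza also supports hard constraints on fingerprinted device attributes. A sketch (the `memory` attribute name and unit are assumptions; check which attributes this plugin actually fingerprints with `nomad node status -verbose`):

```hcl
device "letmutx/gpu" {
  count = 2

  # Only place the task on vGPUs whose backing device reports enough memory.
  constraint {
    attribute = "${device.attr.memory}"
    operator  = ">="
    value     = "2 GiB"
  }
}
```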
- GPU memory allocation/usage is handled cooperatively. This means that a single misbehaving GPU process that uses more memory than it was assigned can starve the other processes sharing the device.
- Managing memory isolation per task is left to the user. It depends on many factors, such as MPS and GPU architecture. This doc has some information.
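Because memory limits are cooperative, operators may want to monitor per-process GPU memory usage themselves. A minimal sketch that shells out to `nvidia-smi --query-compute-apps` and parses its CSV output (the sample output below is illustrative, not captured from a real device):

```python
import csv
import io
import subprocess

def parse_compute_apps(csv_text: str):
    """Parse `nvidia-smi --query-compute-apps=pid,used_memory --format=csv`
    output into a list of (pid, used_mib) tuples."""
    rows = list(csv.reader(io.StringIO(csv_text.strip())))
    # The first row is the header: "pid, used_gpu_memory [MiB]"
    return [(int(pid), int(mem.strip().split()[0])) for pid, mem in rows[1:]]

def compute_apps():
    """Query the local GPU; only works on a machine with nvidia-smi installed."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-compute-apps=pid,used_memory", "--format=csv"],
        text=True,
    )
    return parse_compute_apps(out)

# Illustrative sample of the CSV format:
sample = """pid, used_gpu_memory [MiB]
1234, 512 MiB
5678, 2048 MiB
"""
print(parse_compute_apps(sample))  # → [(1234, 512), (5678, 2048)]
```

Comparing these figures against each task's assigned share makes it possible to alert on processes exceeding their allocation.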
The best way to test the plugin is to run it on a target machine with an NVIDIA GPU using Nomad's plugin launcher:

```shell
make eval
```