k8s-device-plugin fails with k8s static CPU policy #145

Closed

johnathanhegge opened this issue Nov 13, 2019 · 15 comments

@johnathanhegge

1. Issue or feature description

Kubelet configured with a static CPU policy (e.g. --cpu-manager-policy=static --kube-reserved cpu=0.1) causes nvidia-smi to fail inside GPU containers after a short delay.

Configure a test pod to request an nvidia.com/gpu resource, then run a simple nvidia-smi command as "sleep 30; nvidia-smi"; this always fails with:
"Failed to initialize NVML: Unknown Error"

Running the same command without the sleep works, and nvidia-smi returns the expected info.
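
For reference, the same static policy can be enabled through the kubelet configuration file; this is a minimal illustrative sketch, not the exact configuration used on the affected node:

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cpuManagerPolicy: static          # same effect as --cpu-manager-policy=static
kubeReserved:
  cpu: "100m"                     # same effect as --kube-reserved cpu=0.1
# note: switching the CPU manager policy usually requires removing
# /var/lib/kubelet/cpu_manager_state and restarting the kubelet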

2. Steps to reproduce the issue

Kubernetes 1.14
$ kubelet --version
Kubernetes v1.14.8
Device plugin: nvidia/k8s-device-plugin:1.11 (also with 1.0.0-beta4)
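
To deploy the plugin, the README's static DaemonSet manifest can be applied directly (a sketch; pin the tag in the URL to the plugin version you actually run):

$ kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/1.0.0-beta4/nvidia-device-plugin.yml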

Apply the DaemonSet for the NVIDIA device plugin (for example with the command above), then apply a pod YAML for a pod requesting one device:

apiVersion: v1
kind: Pod
metadata:
  name: gputest
spec:
  containers:
  - command:
    - /bin/bash
    args:
    - -c
    - "sleep 30; nvidia-smi"
    image: nvidia/cuda:8.0-runtime-ubuntu16.04
    name: app
    resources:
      limits:
        cpu: "1"
        memory: 1Gi
        nvidia.com/gpu: "1"
      requests:
        cpu: "1"
        memory: 1Gi
        nvidia.com/gpu: "1"
  restartPolicy: Never
  tolerations:
  - effect: NoSchedule
    operator: Exists
  nodeSelector:
    beta.kubernetes.io/arch: amd64

Then follow the pod logs (e.g. kubectl logs -f gputest):

Failed to initialize NVML: Unknown Error

The pod persists in this state.

3. Information to attach (optional if deemed irrelevant)

Common error checking:

  • The output of nvidia-smi -a on your host

==============NVSMI LOG==============

Timestamp                           : Tue Nov 12 12:22:08 2019
Driver Version                      : 390.30

Attached GPUs                       : 1
GPU 00000000:03:00.0
    Product Name                    : Tesla M2090
    Product Brand                   : Tesla
    Display Mode                    : Disabled
    Display Active                  : Disabled
    Persistence Mode                : Disabled
    Accounting Mode                 : N/A
    Accounting Mode Buffer Size     : N/A
    Driver Model
        Current                     : N/A
        Pending                     : N/A
    Serial Number                   : 0320512020115
    GPU UUID                        : GPU-f473d23b-0a01-034e-933b-58d52ca40425
    Minor Number                    : 0
    VBIOS Version                   : 70.10.46.00.01
    MultiGPU Board                  : No
    Board ID                        : 0x300
    GPU Part Number                 : N/A
    Inforom Version
        Image Version               : N/A
        OEM Object                  : 1.1
        ECC Object                  : 2.0
        Power Management Object     : 4.0
    GPU Operation Mode
        Current                     : N/A
        Pending                     : N/A
    GPU Virtualization Mode
        Virtualization mode         : None
    PCI
        Bus                         : 0x03
        Device                      : 0x00
        Domain                      : 0x0000
        Device Id                   : 0x109110DE
        Bus Id                      : 00000000:03:00.0
        Sub System Id               : 0x088710DE
        GPU Link Info
            PCIe Generation
                Max                 : 2
                Current             : 1
            Link Width
                Max                 : 16x
                Current             : 16x
        Bridge Chip
            Type                    : N/A
            Firmware                : N/A
        Replays since reset         : N/A
        Tx Throughput               : N/A
        Rx Throughput               : N/A
    Fan Speed                       : N/A
    Performance State               : P12
    Clocks Throttle Reasons         : N/A
    FB Memory Usage
        Total                       : 6067 MiB
        Used                        : 0 MiB
        Free                        : 6067 MiB
    BAR1 Memory Usage
        Total                       : N/A
        Used                        : N/A
        Free                        : N/A
    Compute Mode                    : Default
    Utilization
        Gpu                         : 0 %
        Memory                      : 0 %
        Encoder                     : N/A
        Decoder                     : N/A
    Encoder Stats
        Active Sessions             : 0
        Average FPS                 : 0
        Average Latency             : 0
    Ecc Mode
        Current                     : Disabled
        Pending                     : Disabled
    ECC Errors
        Volatile
            Single Bit
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : N/A
            Double Bit
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : N/A
        Aggregate
            Single Bit
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : N/A
            Double Bit
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : N/A
    Retired Pages
        Single Bit ECC              : N/A
        Double Bit ECC              : N/A
        Pending                     : N/A
    Temperature
        GPU Current Temp            : N/A
        GPU Shutdown Temp           : N/A
        GPU Slowdown Temp           : N/A
        GPU Max Operating Temp      : N/A
        Memory Current Temp         : N/A
        Memory Max Operating Temp   : N/A
    Power Readings
        Power Management            : Supported
        Power Draw                  : 29.81 W
        Power Limit                 : 225.00 W
        Default Power Limit         : N/A
        Enforced Power Limit        : N/A
        Min Power Limit             : N/A
        Max Power Limit             : N/A
    Clocks
        Graphics                    : 50 MHz
        SM                          : 101 MHz
        Memory                      : 135 MHz
        Video                       : 135 MHz
    Applications Clocks
        Graphics                    : N/A
        Memory                      : N/A
    Default Applications Clocks
        Graphics                    : N/A
        Memory                      : N/A
    Max Clocks
        Graphics                    : 650 MHz
        SM                          : 1301 MHz
        Memory                      : 1848 MHz
        Video                       : 540 MHz
    Max Customer Boost Clocks
        Graphics                    : N/A
    Clock Policy
        Auto Boost                  : N/A
        Auto Boost Default          : N/A
    Processes                       : None
  • Your docker configuration file (e.g: /etc/docker/daemon.json)
{
    "experimental": true,
    "storage-driver": "overlay2",
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
  • The k8s-device-plugin container logs
2019/11/11 19:10:56 Loading NVML
2019/11/11 19:10:56 Fetching devices.
2019/11/11 19:10:56 Starting FS watcher.
2019/11/11 19:10:56 Starting OS watcher.
2019/11/11 19:10:56 Starting to serve on /var/lib/kubelet/device-plugins/nvidia.sock
2019/11/11 19:10:56 Registered device plugin with Kubelet
  • The kubelet logs on the node (e.g: sudo journalctl -r -u kubelet)
    repeated:
Nov 12 12:32:21 dal1k8s-worker-06 kubelet[8053]: E1112 12:32:21.880196    8053 cpu_manager.go:252] [cpumanager] reconcileState: failed to add container (pod: kube-proxy-bm82q, container: kube-proxy, container id: 92273ce7687ead38fb1c59b18934179183ea1b9e4f59107e92eec2f987bb91be, error: rpc error: code = Unknown desc
Nov 12 12:32:21 dal1k8s-worker-06 kubelet[8053]: I1112 12:32:21.880175    8053 policy_static.go:195] [cpumanager] static policy: RemoveContainer (container id: 92273ce7687ead38fb1c59b18934179183ea1b9e4f59107e92eec2f987bb91be)
Nov 12 12:32:21 dal1k8s-worker-06 kubelet[8053]: : unknown
Nov 12 12:32:21 dal1k8s-worker-06 kubelet[8053]: E1112 12:32:21.880153    8053 cpu_manager.go:183] [cpumanager] AddContainer error: rpc error: code = Unknown desc = failed to update container "92273ce7687ead38fb1c59b18934179183ea1b9e4f59107e92eec2f987bb91be": Error response from daemon: Cannot update container 92273
Nov 12 12:32:21 dal1k8s-worker-06 kubelet[8053]: : unknown
Nov 12 12:32:21 dal1k8s-worker-06 kubelet[8053]: E1112 12:32:21.880081    8053 remote_runtime.go:350] UpdateContainerResources "92273ce7687ead38fb1c59b18934179183ea1b9e4f59107e92eec2f987bb91be" from runtime service failed: rpc error: code = Unknown desc = failed to update container "92273ce7687ead38fb1c59b1893417918

Additional information that might help better understand your environment and reproduce the bug:

  • Docker version from docker version
    Version: 18.09.1

  • Docker command, image and tag used

  • Kernel version from uname -a

Linux dal1k8s-worker-06 4.4.0-135-generic #161-Ubuntu SMP Mon Aug 27 10:45:01 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
  • Any relevant kernel output lines from dmesg
[    2.840610] nvidia: module license 'NVIDIA' taints kernel.
[    2.879301] nvidia-nvlink: Nvlink Core is being initialized, major device number 245
[    2.911779] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  390.30  Wed Jan 31 21:32:48 PST 2018
[    2.912960] [drm] [nvidia-drm] [GPU ID 0x00000300] Loading driver
[   13.893608] nvidia-uvm: Loaded the UVM driver in 8 mode, major device number 242
  • NVIDIA packages version from dpkg -l '*nvidia*' or rpm -qa '*nvidia*'
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                                                                      Version                                   Architecture                              Description
+++-=========================================================================-=========================================-=========================================-=======================================================================================================================================================
ii  libnvidia-container-tools                                                 1.0.1-1                                   amd64                                     NVIDIA container runtime library (command-line tools)
ii  libnvidia-container1:amd64                                                1.0.1-1                                   amd64                                     NVIDIA container runtime library
ii  nvidia-390                                                                390.30-0ubuntu1                           amd64                                     NVIDIA binary driver - version 390.30
ii  nvidia-container-runtime                                                  2.0.0+docker18.09.1-1                     amd64                                     NVIDIA container runtime
ii  nvidia-container-runtime-hook                                             1.4.0-1                                   amd64                                     NVIDIA container runtime hook
un  nvidia-current                                                            <none>                                    <none>                                    (no description available)
un  nvidia-docker                                                             <none>                                    <none>                                    (no description available)
ii  nvidia-docker2                                                            2.0.3+docker18.09.1-1                     all                                       nvidia-docker CLI wrapper
un  nvidia-driver-binary                                                      <none>                                    <none>                                    (no description available)
un  nvidia-legacy-340xx-vdpau-driver                                          <none>                                    <none>                                    (no description available)
un  nvidia-libopencl1-390                                                     <none>                                    <none>                                    (no description available)
un  nvidia-libopencl1-dev                                                     <none>                                    <none>                                    (no description available)
un  nvidia-opencl-icd                                                         <none>                                    <none>                                    (no description available)
ii  nvidia-opencl-icd-390                                                     390.30-0ubuntu1                           amd64                                     NVIDIA OpenCL ICD
un  nvidia-persistenced                                                       <none>                                    <none>                                    (no description available)
ii  nvidia-prime                                                              0.8.2                                     amd64                                     Tools to enable NVIDIA's Prime
ii  nvidia-settings                                                           410.79-0ubuntu1                           amd64                                     Tool for configuring the NVIDIA graphics driver
un  nvidia-settings-binary                                                    <none>                                    <none>                                    (no description available)
un  nvidia-smi                                                                <none>                                    <none>                                    (no description available)
un  nvidia-vdpau-driver                                                       <none>                                    <none>                                    (no description available)
  • NVIDIA container library version from nvidia-container-cli -V
version: 1.0.1
build date: 2019-01-15T23:24+00:00
build revision: 038fb92d00c94f97d61492d4ed1f82e981129b74
build compiler: gcc-5 5.4.0 20160609
build platform: x86_64
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections


  • NVIDIA container library logs (see the troubleshooting guide: https://github.com/NVIDIA/nvidia-docker/wiki/Troubleshooting)
@klueska
Contributor

klueska commented Nov 13, 2019

This is a known issue that has been reported before:
NVIDIA/nvidia-container-toolkit#138

Unfortunately, there is no upstream fix for this yet. The plan is to address it as part of the upcoming redesign of the device plugins: https://docs.google.com/document/d/1wPlJL8DsVpHnbVbTaad35ILB-jqoMLkGFLnQpWWNduc/edit

@johnathanhegge
Author

Thank you for the links. I read through the ticket and its further links to gain more context, including your PRs for Kubernetes upstream. This is a more nuanced issue than I expected. In our case, we'd like the static policy but it's not required, so we'll watch as this develops.

Glad to see the document about the device plugin redesign; I'd been wondering where that was heading while also dealing with a forked plugin for RDMA.

@klueska
Contributor

klueska commented Jun 15, 2020

I should have updated this issue back in April with this comment:
NVIDIA/nvidia-docker#966 (comment)

@hSATAC

hSATAC commented Dec 16, 2020

Is this fixed?
We still encounter this issue with nvidia-docker 2.5.0-1 and k8s-device-plugin 0.7.0 on A100, in both MIG mixed mode and single mode.

@klueska
Contributor

klueska commented Dec 16, 2020

There was a flag added a while back called compatWithCPUManager. It should be explained in the README. Are you setting this when you run the plugin?
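
For reference, one way to set this when deploying through the plugin's Helm chart (a sketch based on the README; adjust repo, release and namespace names to your setup):

$ helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
$ helm repo update
$ helm upgrade -i nvdp nvdp/nvidia-device-plugin \
    --namespace kube-system \
    --set compatWithCPUManager=true

Under the hood this option should map to the plugin's --pass-device-specs flag, which makes the plugin send the GPU device nodes back to the kubelet explicitly.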

@hSATAC

hSATAC commented Dec 16, 2020

Yes, we set compatWithCPUManager=true. It works without MIG enabled, but not when MIG is enabled, in either single or mixed mode. MIG works fine without the CPU manager policy.

@klueska
Contributor

klueska commented Dec 16, 2020

I see. I think I can picture what the issue might be. Let me confirm it later today and I’ll provide an update here. Thanks.

@klueska
Contributor

klueska commented Dec 16, 2020

Yes, I can confirm that this is an issue.

MIG support in the k8s-device-plugin was tested together with the compatWithCPUManager flag when it first came out, and it worked just fine. However, since that time, the way the underlying GPU driver exposes MIG to a container has changed. It was originally based on something called /proc based nvidia-capabilities, and it is now based on /dev based nvidia-capabilities (more info on this here).

Without going into too much detail, when the underlying driver switched its implementation for this, it broke compatWithCPUManager in the k8s-device-plugin when MIG is enabled.

The fix should be fairly straightforward and will involve listing out the set of device nodes associated with the nvidia-capabilities that grant access to the MIG device being allocated -- and sending them back to the kubelet (the same way the device nodes for full GPUs are sent back here).
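
Conceptually, that amounts to adding the extra capability device nodes to the plugin's Allocate response. The Go sketch below only illustrates that mechanism against the standard device plugin API; the helper name and example paths are hypothetical, not the actual patch:

package plugin

import (
	pluginapi "k8s.io/kubelet/pkg/apis/deviceplugin/v1beta1"
)

// allocateResponseWithCaps is a hypothetical helper: it returns the MIG device via the
// usual environment variable and also lists the relevant device nodes explicitly, so the
// kubelet knows about them when it (re)writes the container's device cgroup.
func allocateResponseWithCaps(migDevice string, capDeviceNodes []string) *pluginapi.ContainerAllocateResponse {
	resp := &pluginapi.ContainerAllocateResponse{
		Envs: map[string]string{
			"NVIDIA_VISIBLE_DEVICES": migDevice,
		},
	}
	for _, node := range capDeviceNodes {
		// e.g. the /dev based nvidia-capabilities nodes under /dev/nvidia-caps/
		resp.Devices = append(resp.Devices, &pluginapi.DeviceSpec{
			HostPath:      node,
			ContainerPath: node,
			Permissions:   "rw",
		})
	}
	return resp
}

With the device nodes listed this way, they become part of the container spec the kubelet manages, so they survive the CPU manager's UpdateContainerResources calls instead of being dropped from the device cgroup.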

I have added this to our list of tasks for v0.8.0 which will be released sometime in January.

In the meantime, if you need this to work today, you can follow the advice in "Working with nvidia-capabilities"
and flip your driver settings from /dev based nvidia-capabilities to /proc based nvidia-capabilities via:

$ modprobe nvidia nv_cap_enable_devfs=0

That should get things working again until a fix comes out. It is not a long-term fix, however, as support for /proc based nvidia-capabilities will disappear in a future driver release.
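
Note that the module option only takes effect when the nvidia module is loaded, so on a node where the driver is already up you would typically reload it, and persist the setting across reboots with a modprobe.d entry. A rough sketch, assuming the GPU is idle and using an arbitrary file name:

$ sudo rmmod nvidia_drm nvidia_modeset nvidia_uvm nvidia   # unload order matters
$ sudo modprobe nvidia nv_cap_enable_devfs=0
$ echo "options nvidia nv_cap_enable_devfs=0" | sudo tee /etc/modprobe.d/nvidia-caps.conf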

Thanks for reporting!

@hSATAC

hSATAC commented Dec 16, 2020

Thanks for your reply! It's good to confirm this is an issue in the current version.
We did try the nv_cap_enable_devfs=0 approach, but to no avail; maybe we'll give it another shot.

@klueska
Contributor

klueska commented Dec 16, 2020

Please do -- and if it doesn't work, let me know (it should).

@xial-thu

Another tricky workaround is to disable runc's modification of the cgroup device list when it sets the cgroup cpuset. See https://github.com/NVIDIA/nvidia-container-runtime/pull/55/files. Apply the patch to runc and it works; I'm using it in my system.

I don't have a V100 GPU, so I'm not sure whether it works with the current version.

@klueska
Contributor

klueska commented Dec 28, 2020

@xial-thu the nvidia-container-runtime moved away from a fork of runc in early 2019, so your patch unfortunately no longer applies.

@xial-thu

xial-thu commented Dec 31, 2020

> @xial-thu the nvidia-container-runtime moved away from a fork of runc in early 2019, so your patch unfortunately no longer applies.

Doing the same thing to runc still works. After all, the root of the issue is that the kubelet's operation bypasses the nvidia-container-runtime.

@klueska
Contributor

klueska commented Feb 24, 2021

The PR to fix this is tested and ready to be merged.
It will be included in the upcoming v0.9.0 release.

https://gitlab.com/nvidia/kubernetes/device-plugin/-/merge_requests/80

@klueska
Contributor

klueska commented Feb 25, 2021

This has now been merged.

klueska closed this as completed on Feb 25, 2021.