Hanging resource name after edit configmap #245

Closed
harper1011 opened this issue Jun 20, 2020 · 4 comments

harper1011 commented Jun 20, 2020

What happened?

Deploy "sriov-network-device-plugin" with default configmap as

apiVersion: v1
kind: ConfigMap
metadata:
  name: sriovdp-config
  namespace: kube-system
data:
  config.json: |
    {
        "resourceList": [{
                "resourceName": "intel_sriov_hostdevice",
                "selectors": {
                    "vendors": ["8086"],
                    "devices": ["154c", "10ed"],
                    "drivers": ["i40evf", "ixgbevf", "iavf"]
                }
            },
            {
                "resourceName": "intel_sriov_dpdk_device",
                "selectors": {
                    "vendors": ["8086"],
                    "devices": ["154c", "10ed"],
                    "drivers": ["vfio-pci", "igb_uio"]
                }
            }
        ]
    }
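
For reference, a minimal sketch of deploying the plugin with this configmap, assuming the example manifest names configMap.yaml and sriovdp-daemonset.yaml from the plugin repository (actual file names and paths may differ):

# kubectl create -f configMap.yaml
# kubectl create -f sriovdp-daemonset.yaml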

After all Pods are running, delete the old configmap and create a new one as follows:

apiVersion: v1
kind: ConfigMap
metadata:
  name: sriovdp-config
  namespace: kube-system
data:
  config.json: |
    {
        "resourceList": [{
                "resourceName": "intel_sriov_left",
                "selectors": {
                    "vendors": ["8086"],
                    "drivers": ["i40evf", "ixgbevf", "iavf"],
                    "pfNames": ["enp24s0f0", "enp175s0f0"]
                }
            },
            {
                "resourceName": "intel_sriov_right",
                "selectors": {
                    "vendors": ["8086"],
                    "drivers": ["i40evf", "ixgbevf", "iavf"],
                    "pfNames": ["enp24s0f1", "enp175s0f1"]
                }
            },
            {
                "resourceName": "intel_sriov_dpdk_left",
                "selectors": {
                    "vendors": ["8086"],
                    "drivers": ["vfio-pci", "igb_uio"],
                    "pfNames": ["enp24s0f0", "enp134s0f0", "enp175s0f0"]
                }
            },
            {
                "resourceName": "intel_sriov_dpdk_right",
                "selectors": {
                    "vendors": ["8086"],
                    "drivers": ["vfio-pci", "igb_uio"],
                    "pfNames": ["enp24s0f1", "enp134s0f1", "enp175s0f1"]                }
            },
            {
               "resourceName": "mlnx_sriov_rdma_left",
               "resourcePrefix": "mellanox.com",
               "selectors": {
                   "vendors": ["15b3"],
                   "devices": ["1018"],
                   "drivers": ["mlx5_core"],
                   "pfNames": ["enp24s0f0", "enp134s0f0"], 
                   "isRdma": true
                }
            },
            {
               "resourceName": "mlnx_sriov_rdma_right",
               "resourcePrefix": "mellanox.com",
               "selectors": {
                  "vendors": ["15b3"],
                  "devices": ["1018"],
                  "drivers": ["mlx5_core"],
                  "pfNames": ["enp24s0f1", "enp134s0f1"], 
                  "isRdma": true
                }
            },
            {
               "resourceName": "mlnx_dpdk_rdma_left",
               "resourcePrefix": "mellanox.com",
               "selectors": {
                   "vendors": ["15b3"],
                   "devices": ["1018"],
                   "drivers": ["vfio-pci", "igb_uio"],
                   "pfNames": ["enp24s0f0", "enp134s0f0"], 
                   "isRdma": true
                }
            },
            {
               "resourceName": "mlnx_dpdk_rdma_right",
               "resourcePrefix": "mellanox.com",
               "selectors": {
                  "vendors": ["15b3"],
                  "devices": ["1018"],
                  "drivers": ["vfio-pci", "igb_uio"],
                  "pfNames": ["enp24s0f1", "enp134s0f1"], 
                  "isRdma": true
                }
            }
        ]
    }

Then restart all sriov-network-device-plugin Pods so that the new configmap is taken into use.
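
A minimal sketch of the restart, assuming the plugin DaemonSet pods carry the app=sriovdp label from the example deployment (the label selector may differ per deployment):

# kubectl -n kube-system delete pods -l app=sriovdp
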
However, from the node's "status.allocatable" we still see entries with the resourceName values from the old configmap:

# kubectl describe node pool2-76d6c64449-j2dwz | grep intel 
 intel.com/intel_sriov_dpdk_device: 0
 intel.com/intel_sriov_dpdk_left: 4
 intel.com/intel_sriov_dpdk_right: 4
 intel.com/intel_sriov_hostdevice: 0
 intel.com/intel_sriov_left: 12
 intel.com/intel_sriov_right: 12
 intel.com/mlnx_dpdk_rdma_left: 0
 intel.com/mlnx_dpdk_rdma_right: 0
 intel.com/mlnx_sriov_rdma_left: 0
 intel.com/mlnx_sriov_rdma_right: 0

What did you expect to happen?

The node "status.allocatable" entries should be updated with new configmap only.
Or we would like to know whether this is expected behavior.

What are the minimal steps needed to reproduce the bug?

  • The same as above in "What happened?"

Anything else we need to know?

Component Versions

Please fill in the below table with the version numbers of components used.

Component                      Version
SR-IOV Network Device Plugin   3.1 + the PR of #195
SR-IOV CNI Plugin              2.2
Multus                         3.4.1
Kubernetes                     1.17.3
OS                             SLES15-SP1

Config Files

Config file locations may be config dependent.

  • Device pool config file location (Try '/etc/pcidp/config.json')
  • Multus config (Try '/etc/cni/multus/net.d')
  • CNI config (Try '/etc/cni/net.d/')
  • Kubernetes deployment type (Bare Metal, Kubeadm etc.): Bare Metal
  • Kubeconfig file
  • SR-IOV Network Custom Resource Definition

Logs

  • SR-IOV Network Device Plugin Logs (use kubectl logs $PODNAME)
  • Multus logs (If enabled. Try '/var/log/multus.log')
  • Kubelet logs (journalctl -u kubelet)
@adrianchiris
Contributor

This seems to be related to the kubelet side of things: it does not remove resources from the node that are no longer reported by the device plugin, but rather updates their capacity/allocatable to 0.

@harper1011
Author

This seems to be related to the kubelet side of things: it does not remove resources from the node that are no longer reported by the device plugin, but rather updates their capacity/allocatable to 0.

Thanks for your reply.
Is this a confirmed issue in kubelet, or do we need to confirm it from the kubelet side ourselves?

@adrianchiris
Contributor

It's just from skimming through the kubelet code, so you would need to confirm that on the kubelet side.

Also, in general a device plugin reports to kubelet the resources that it has, not the ones that it doesn't :)

https://github.com/kubernetes/community/blob/master/contributors/design-proposals/resource-management/device-plugin.md

@killianmuldoon
Collaborator

It'll be interesting to see what the feedback is from the kubelet side. I have had this issue when running fuzz tests on device plugins, which leaves hundreds of 0-quantity resources listed as part of the node info object.
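
As an illustrative workaround only (not a confirmed fix), stale extended-resource entries can usually be cleared manually by patching the node status subresource through the API server. The node name and resource name below are taken from the output earlier in this issue, and '~1' is the JSON Pointer escape for '/':

# kubectl proxy &
# curl --header "Content-Type: application/json-patch+json" \
       --request PATCH \
       --data '[{"op": "remove", "path": "/status/capacity/intel.com~1intel_sriov_hostdevice"}]' \
       http://127.0.0.1:8001/api/v1/nodes/pool2-76d6c64449-j2dwz/status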
