Add support for VMS -- Remove PF dependency
atyronesmith committed Nov 16, 2020
1 parent b26d306 commit ec2b90a
Showing 9 changed files with 419 additions and 25 deletions.
2 changes: 1 addition & 1 deletion Makefile
@@ -60,7 +60,7 @@ _build-%:
_plugin-%: vet
@hack/build-plugins.sh $*

plugins: _plugin-intel _plugin-mellanox _plugin-generic
plugins: _plugin-intel _plugin-mellanox _plugin-generic _plugin-virtual

clean:
@rm -rf $(TARGET_DIR)
33 changes: 21 additions & 12 deletions README.md
@@ -1,18 +1,22 @@
# sriov-network-operator

The Sriov Network Operator is design to help user to provision and configure SR-IOV CNI plugin and Device plugin in Openshift cluster.
The Sriov Network Operator is designed to help users provision and configure the SR-IOV CNI plugin and device plugin in an Openshift cluster.

## Motivation

SR-IOV network is an optional feature of Openshift cluster. To make it work, it requires different components to be provisioned and configured accordingly. It makes sense to have one operator to coordinate those relevant components in one place, instead of having them managed by different operators. And also, to hide the complexity, we should provide an elegant user interface to simplify the process of enabling SR-IOV.
SR-IOV network is an optional feature of an Openshift cluster. To make it work, it requires different components to be provisioned and configured accordingly. It makes sense to have one operator to coordinate those relevant components in one place, instead of having them managed by different operators. And also, to hide the complexity, we should provide an elegant user interface to simplify the process of enabling SR-IOV.

## Features

- Initialize the supported SR-IOV NIC types on selected nodes.
- provision/upgrade SR-IOV device plugin executable on selected node.
- provision/upgrade SR-IOV CNI plugin executable on selected nodes.
- manage configuration of SR-IOV device plugin on host.
- generate net-att-def CRs for SR-IOV CNI plugin
- Provision/upgrade SR-IOV device plugin executable on selected node.
- Provision/upgrade SR-IOV CNI plugin executable on selected nodes.
- Manage configuration of SR-IOV device plugin on host.
- Generate net-att-def CRs for SR-IOV CNI plugin
- Supports operation in a virtualized Kubernetes deployment
- Discovers VFs attached to the Virtual Machine (VM)
- Does not require the associated PFs to be attached
- VFs can be associated to SriovNetworks by selecting the appropriate PciAddress as the RootDevice in the SriovNetworkNodePolicy

## Quick Start

@@ -30,7 +34,7 @@ The SR-IOV network operator introduces following new CRDs:

### SriovNetwork

A custom resource of SriovNetwork could represent the a layer-2 broadcast domain where some SR-IOV devices attach to. It is primarily used to generate the a NetworkAttachmentDefinition CR with SR-IOV CNI plugin configuration.
A SriovNetwork custom resource represents a layer-2 broadcast domain to which some SR-IOV devices are attached. It is primarily used to generate a NetworkAttachmentDefinition CR with an SR-IOV CNI plugin configuration.

This SriovNetwork CR also contains the ‘resourceName’, which is aligned with the ‘resourceName’ of the SR-IOV device plugin. One SriovNetwork object maps to one ‘resourceName’, but one ‘resourceName’ can be shared by different SriovNetwork CRs.

@@ -101,8 +105,8 @@ spec:

The custom resource to represent the SR-IOV interface states of each host, which should only be managed by the operator itself.

- The ‘spec’ of this CR represents the desired configuration which should be apply to the interfaces and SR-IOV device plugin.
- The ‘status’ contains current states of those PFs, and the states of the VFs. It helps user to discover SR-IOV network hardware on node.
- The ‘spec’ of this CR represents the desired configuration which should be applied to the interfaces and SR-IOV device plugin.
- The ‘status’ contains the current states of those PFs (baremetal only) and the states of the VFs. It helps the user discover SR-IOV network hardware on the node, or the attached VFs in the case of a virtual deployment.

The spec is rendered by sriov-policy-controller and consumed by sriov-config-daemon. Sriov-config-daemon is responsible for updating the ‘status’ field to reflect the latest status; this information can be used as input to create a SriovNetworkNodeConfigPolicy CR.

@@ -154,13 +158,13 @@ status:
vendor: "8086"
```

From this example, in status field, user can find out there are 2 SRIOV capable NICs on node 'work-node-1'; in spec field, user can learn what the expected configure is generated from the combination of SriovNetworkNodeConfigPolicy CRs.
In this example, the status field shows that there are 2 SR-IOV capable NICs on node 'work-node-1'; the spec field shows the expected configuration generated from the combination of SriovNetworkNodeConfigPolicy CRs. In the virtual deployment case, a single VF will be associated with each device.

### SriovNetworkNodeConfigPolicy

This CRD is the key of SR-IOV network operator. This custom resource should be managed by cluster admin, to instruct the operator to:

1. Render the spec of SriovNetworkNodeState CR for selected node, to configure the SR-IOV interfaces.
1. Render the spec of SriovNetworkNodeState CR for selected node, to configure the SR-IOV interfaces. In virtual deployment, the VF interface is read-only.
2. Deploy SR-IOV CNI plugin and device plugin on selected node.
3. Generate the configuration of SR-IOV device plugin.

@@ -187,7 +191,12 @@ spec:
resourceName: intelnics
```

In this example, user selected the nice from vendor '8086' which is intel, device module is '1583' which is XL710 for 40GbE, on nodes labeled with 'network-sriov.capable' equals 'true'. Then for those PFs, create 4 VFs each, set mtu to 1500 and the load the vfio-pci driver to those virtual functions.
In this example, the user selected the NICs from vendor '8086' (Intel) with device model '1583' (XL710 for 40GbE) on nodes where the label 'network-sriov.capable' equals 'true'. For each of those PFs, the operator creates 4 VFs, sets the MTU to 1500, and loads the vfio-pci driver for those virtual functions.

In a virtual deployment:
- The mtu of the PF is set by the underlying virtualization platform and cannot be changed by the sriov-network-operator.
- The numVfs parameter has no effect, as there is always 1 VF.
- The deviceType field depends upon whether the underlying device/driver is [native-bifurcating or non-bifurcating](https://doc.dpdk.org/guides/howto/flow_bifurcation.html). For example, the supported Mellanox devices support native-bifurcating drivers, and therefore deviceType should be netdevice (default). The supported Intel devices are non-bifurcating, and deviceType should be set to vfio-pci.
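
The deviceType rule for virtual deployments can be sketched in Go as a small lookup. This is only an illustration: `deviceTypeFor` is a hypothetical helper that is not part of the operator, and it assumes the PCI vendor IDs '15b3' (Mellanox) and '8086' (Intel) that appear elsewhere in this commit are enough to tell the two driver families apart.

```go
package main

import "fmt"

// deviceTypeFor is a hypothetical helper (not part of the operator).
// It returns the deviceType suggested by the README text for a given
// PCI vendor ID in a virtual deployment.
func deviceTypeFor(vendorID string) string {
	switch vendorID {
	case "8086":
		// Intel: non-bifurcating driver, so the VF is bound to vfio-pci.
		return "vfio-pci"
	case "15b3":
		// Mellanox: native-bifurcating driver, so the default applies.
		return "netdevice"
	default:
		// Assumption: fall back to the documented default.
		return "netdevice"
	}
}

func main() {
	for _, vendor := range []string{"15b3", "8086"} {
		fmt.Printf("vendor %s -> deviceType %s\n", vendor, deviceTypeFor(vendor))
	}
}
```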

## Components and design

27 changes: 25 additions & 2 deletions cmd/sriov-network-config-daemon/start.go
@@ -1,10 +1,12 @@
package main

import (
"context"
"flag"
"fmt"
"net"
"os"
"strings"
"time"

"github.com/golang/glog"
@@ -13,6 +15,7 @@ import (
"github.com/openshift/sriov-network-operator/pkg/daemon"
"github.com/openshift/sriov-network-operator/pkg/version"
"github.com/spf13/cobra"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/client-go/kubernetes"
"k8s.io/client-go/kubernetes/scheme"
"k8s.io/client-go/rest"
@@ -32,6 +35,11 @@ var (
kubeconfig string
nodeName string
}

// platformMap contains the supported platforms for virtual VFs
platformMap = map[string]daemon.PlatformType{
"openstack": daemon.Virtual,
}
)

func init() {
@@ -107,9 +115,23 @@ func runStartCmd(cmd *cobra.Command, args []string) {
destdir = "/host/etc"
}

platformType := daemon.Baremetal

nodeInfo, err := kubeclient.CoreV1().Nodes().Get(context.Background(), startOpts.nodeName, v1.GetOptions{})
if err == nil {
for key, pType := range platformMap {
if strings.Contains(strings.ToLower(nodeInfo.Spec.ProviderID), strings.ToLower(key)) {
platformType = pType
}
}
} else {
glog.Warningf("Failed to fetch node state %s, %v!", startOpts.nodeName, err)
}
glog.V(0).Infof("Running on platform: %s", platformType.String())

// block the daemon process until nodeWriter finishes its first run
nodeWriter.Run(stopCh, refreshCh, syncCh, destdir, true)
go nodeWriter.Run(stopCh, refreshCh, syncCh, "", false)
nodeWriter.Run(stopCh, refreshCh, syncCh, destdir, true, platformType)
go nodeWriter.Run(stopCh, refreshCh, syncCh, "", false, platformType)

glog.V(0).Info("Starting SriovNetworkConfigDaemon")
err = daemon.New(
@@ -120,6 +142,7 @@ func runStartCmd(cmd *cobra.Command, args []string) {
stopCh,
syncCh,
refreshCh,
platformType,
).Run(stopCh, exitCh)
if err != nil {
glog.Errorf("failed to run daemon: %v", err)
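
The platform detection added to `runStartCmd` in this file can be isolated as a pure function for illustration. The sketch below is a minimal standalone version, not the commit's code: `detectPlatform` is a hypothetical name, the types are redeclared locally so the example compiles on its own, and the provider ID strings in `main` are made-up examples rather than values taken from a real cluster.

```go
package main

import (
	"fmt"
	"strings"
)

// PlatformType mirrors the enum introduced in pkg/daemon/daemon.go.
type PlatformType int

const (
	Baremetal PlatformType = iota
	Virtual
)

// platformMap mirrors the map added in start.go.
var platformMap = map[string]PlatformType{
	"openstack": Virtual,
}

// detectPlatform is a hypothetical helper: it returns the platform whose
// key occurs (case-insensitively) in the node's provider ID, and falls
// back to Baremetal when nothing matches.
func detectPlatform(providerID string) PlatformType {
	id := strings.ToLower(providerID)
	for key, pType := range platformMap {
		if strings.Contains(id, strings.ToLower(key)) {
			return pType
		}
	}
	return Baremetal
}

func main() {
	// Illustrative provider IDs only.
	fmt.Println(detectPlatform("openstack:///example-id") == Virtual)
	fmt.Println(detectPlatform("") == Baremetal)
}
```

As in the diff, a failed or empty lookup leaves the daemon on the Baremetal default, so existing deployments keep their previous behavior.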
35 changes: 33 additions & 2 deletions pkg/daemon/daemon.go
@@ -41,6 +41,27 @@ import (
"github.com/openshift/sriov-network-operator/pkg/utils"
)

// PlatformType identifies the kind of platform the daemon is running on
type PlatformType int

const (
// Baremetal platform
Baremetal PlatformType = iota
// Virtual platform
Virtual
)

func (e PlatformType) String() string {
switch e {
case Baremetal:
return "Baremetal"
case Virtual:
return "Virtual"
default:
return fmt.Sprintf("%d", int(e))
}
}

const (
// updateDelay is the baseline speed at which we react to changes. We don't
// need to react in milliseconds as any change would involve rebooting the node.
@@ -60,6 +81,8 @@ type Daemon struct {
name string
namespace string

platform PlatformType

client snclientset.Interface
// kubeClient allows interaction with Kubernetes, including the node we are running on.
kubeClient *kubernetes.Clientset
@@ -128,9 +151,11 @@ func New(
stopCh <-chan struct{},
syncCh <-chan struct{},
refreshCh chan<- Message,
platformType PlatformType,
) *Daemon {
return &Daemon{
name: nodeName,
platform: platformType,
client: client,
kubeClient: kubeClient,
exitCh: exitCh,
@@ -544,8 +569,14 @@ func (dn *Daemon) restartDevicePluginPod() error {
}

func (dn *Daemon) loadVendorPlugins(ns *sriovnetworkv1.SriovNetworkNodeState) error {
pl := registerPlugins(ns)
pl = append(pl, GenericPlugin)
var pl []string

if dn.platform == Virtual {
pl = append(pl, VirtualPlugin)
} else {
pl = registerPlugins(ns)
pl = append(pl, GenericPlugin)
}
dn.LoadedPlugins = make(map[string]VendorPlugin)

for _, pn := range pl {
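
The plugin selection in `loadVendorPlugins` comes down to one branch: a virtual platform loads only the virtual plugin, while baremetal keeps the vendor plugins plus the generic one. The following standalone sketch shows that branch. `selectPlugins` is a hypothetical function, and the vendor lookup is an assumption standing in for `registerPlugins`, whose body is not shown in this diff.

```go
package main

import "fmt"

const (
	GenericPlugin = "generic_plugin"
	VirtualPlugin = "virtual_plugin"
)

// vendorPlugins mirrors the baremetal entries of pluginMap in plugin.go.
var vendorPlugins = map[string]string{
	"8086": "intel_plugin",
	"15b3": "mellanox_plugin",
}

// selectPlugins is a hypothetical stand-in for the selection logic.
// The vendor lookup is an assumed approximation of registerPlugins.
func selectPlugins(virtual bool, vendors []string) []string {
	if virtual {
		// Virtual platforms load only the virtual plugin.
		return []string{VirtualPlugin}
	}
	var pl []string
	seen := make(map[string]bool)
	for _, v := range vendors {
		if name, ok := vendorPlugins[v]; ok && !seen[name] {
			seen[name] = true
			pl = append(pl, name)
		}
	}
	// Baremetal always loads the generic plugin last.
	return append(pl, GenericPlugin)
}

func main() {
	fmt.Println(selectPlugins(true, []string{"8086"}))
	fmt.Println(selectPlugins(false, []string{"8086", "8086", "15b3"}))
}
```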
6 changes: 4 additions & 2 deletions pkg/daemon/plugin.go
@@ -22,13 +22,15 @@ type VendorPlugin interface {
}

var pluginMap = map[string]string{
"8086": "intel_plugin",
"15b3": "mellanox_plugin",
"8086": "intel_plugin",
"15b3": "mellanox_plugin",
"virtual": "virtual_plugin",
}

const (
SpecVersion = "1.0"
GenericPlugin = "generic_plugin"
VirtualPlugin = "virtual_plugin"
)

// loadPlugin loads a single plugin from a file path
19 changes: 13 additions & 6 deletions pkg/daemon/writer.go
@@ -40,12 +40,12 @@ func NewNodeStateStatusWriter(c snclientset.Interface, n string, f func()) *Node

// Run reads from the writer channel and sets the interface status. It will
// return if the stop channel is closed. Intended to be run via a goroutine.
func (writer *NodeStateStatusWriter) Run(stop <-chan struct{}, refresh <-chan Message, syncCh chan<- struct{}, destDir string, runonce bool) {
func (writer *NodeStateStatusWriter) Run(stop <-chan struct{}, refresh <-chan Message, syncCh chan<- struct{}, destDir string, runonce bool, platformType PlatformType) {
glog.V(0).Infof("Run(): start writer")
msg := Message{}
if runonce {
glog.V(0).Info("Run(): once")
if err := writer.pollNicStatus(); err != nil {
if err := writer.pollNicStatus(platformType); err != nil {
glog.Errorf("Run(): first poll failed: %v", err)
}
ns, _ := writer.setNodeStateStatus(msg)
@@ -59,7 +59,7 @@ func (writer *NodeStateStatusWriter) Run(stop <-chan struct{}, refresh <-chan Me
return
case msg = <-refresh:
glog.V(0).Info("Run(): refresh trigger")
if err := writer.pollNicStatus(); err != nil {
if err := writer.pollNicStatus(platformType); err != nil {
continue
}
writer.setNodeStateStatus(msg)
@@ -68,17 +68,24 @@ func (writer *NodeStateStatusWriter) Run(stop <-chan struct{}, refresh <-chan Me
}
case <-time.After(30 * time.Second):
glog.V(2).Info("Run(): period refresh")
if err := writer.pollNicStatus(); err != nil {
if err := writer.pollNicStatus(platformType); err != nil {
continue
}
writer.setNodeStateStatus(msg)
}
}
}

func (writer *NodeStateStatusWriter) pollNicStatus() error {
func (writer *NodeStateStatusWriter) pollNicStatus(platformType PlatformType) error {
glog.V(2).Info("pollNicStatus()")
iface, err := utils.DiscoverSriovDevices()
var iface []sriovnetworkv1.InterfaceExt
var err error

if platformType == Virtual {
iface, err = utils.DiscoverSriovDevicesVirtual()
} else {
iface, err = utils.DiscoverSriovDevices()
}
if err != nil {
return err
}