Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add vDPA support #306

Merged
merged 7 commits into from
Mar 9, 2022
Merged

Conversation

amorenoz
Copy link
Contributor

@amorenoz amorenoz commented Dec 10, 2020

vDPA devices can be exposed to Pods in two different ways:
If they are bound to the virtio_vdpa driver, they are exposed as normal netdevs
If they are bound to the vhost_vdpa driver they are exposed as vhost-vdpa devices. These are char devices that support the vhost-vdpa extensions to the vhost uAPI

The implementation of the vDPA support is done by implementing the DeviceInfoProvider interface.
A new string field is added to the 'NetDeviceSelectors': vdpaType. It can optionally have two values: vhost or virtio. This selector is used to determine the way the devices are exposed to the Pods as well as to filter the discovered devices based on their vdpa driver.
For the low level vdpa bus handling, we depend on github.com/redhat-virtio-net/govdpa a golang library to interact with the vdpa subsystem.

This PR also adds support for writing the vdpa DeviceInfo file as per the specification.
Fixes: #305

@amorenoz
Copy link
Contributor Author

Still in WIP state because I'd like to post a PR on the SRIOV-CNI with some code that is needed to test the E2E solution.
But reviews and comments can begin.

One specific design decision that I'm not 100% sure about the selector value, current state is:

{ ...
"vdpaType": "vhost"
or
"vdpaType": "virtio"
...}

We could use vdpaDriver: ["vhost_vdpa" , "virtio_vdpa"]. I finally went with the former in the hope of:

  • Avoid redundancy of the "_vdpa" part
  • Convey the message that this is not only a selector (i.e: a filter) based on the vDPA driver. It also modifies the way the device is exposed.

The not so good side of this is: we have to add some vdpaType to vdpaDriver boilerplate code here and there

WDYT?

@amorenoz
Copy link
Contributor Author

Works in combination with k8snetworkplumbingwg/sriov-cni#163

Copy link
Contributor

@mcoquelin mcoquelin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only minor cosmetic comments, otherwise loks good to me.

Thanks,
Maxime

docs/vdpa/README.md Outdated Show resolved Hide resolved
pkg/netdevice/vdpa.go Outdated Show resolved Hide resolved
Copy link
Collaborator

@zshi-redhat zshi-redhat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amorenoz just to understand the scope, this vDPA support is for SR-IOV + vDPA, it will/can not be used for SF/ADI vDPA, correct?

pkg/factory/factory.go Outdated Show resolved Hide resolved
docs/vdpa/README.md Outdated Show resolved Hide resolved
docs/vdpa/README.md Outdated Show resolved Hide resolved
@amorenoz
Copy link
Contributor Author

amorenoz commented Jan 7, 2021

@amorenoz just to understand the scope, this vDPA support is for SR-IOV + vDPA, it will/can not be used for SF/ADI vDPA, correct?

Correct, this PR implements SR-IOV + vDPA

SF/ADI is a different story: will it even be supported by DevicePlugin?

@zshi-redhat
Copy link
Collaborator

@amorenoz just to understand the scope, this vDPA support is for SR-IOV + vDPA, it will/can not be used for SF/ADI vDPA, correct?

Correct, this PR implements SR-IOV + vDPA

SF/ADI is a different story: will it even be supported by DevicePlugin?

Haven't discussed this, my understanding is perhaps not.

@zshi-redhat
Copy link
Collaborator

/cc @martinkennelly

@amorenoz
Copy link
Contributor Author

amorenoz commented Jan 8, 2021

That is one strange CI error. Has anyone seen this before?

Resource manager

/home/travis/gopath/src/github.com/k8snetworkplumbingwg/sriov-network-device-plugin/.gopath/src/github.com/k8snetworkplumbingwg/sriov-network-device-plugin/cmd/sriovdp/manager_test.go:48

  discovering devices

  /home/travis/gopath/src/github.com/k8snetworkplumbingwg/sriov-network-device-plugin/.gopath/pkg/mod/github.com/onsi/ginkgo@v1.12.0/extensions/table/table.go:92

    no devices [It]

    /home/travis/gopath/src/github.com/k8snetworkplumbingwg/sriov-network-device-plugin/.gopath/pkg/mod/github.com/onsi/ginkgo@v1.12.0/extensions/table/table_entry.go:43

    Unexpected error:

        <*errors.errorString | 0xc0004c5990>: {

            s: "discoverDevices(): error getting PCI info: gzip: invalid header",

        }

        discoverDevices(): error getting PCI info: gzip: invalid header

    occurred

    /home/travis/gopath/src/github.com/k8snetworkplumbingwg/sriov-network-device-plugin/.gopath/src/github.com/k8snetworkplumbingwg/sriov-network-device-plugin/cmd/sriovdp/manager_test.go:315

------------------------------

@zshi-redhat
Copy link
Collaborator

That is one strange CI error. Has anyone seen this before?

Resource manager

/home/travis/gopath/src/github.com/k8snetworkplumbingwg/sriov-network-device-plugin/.gopath/src/github.com/k8snetworkplumbingwg/sriov-network-device-plugin/cmd/sriovdp/manager_test.go:48

  discovering devices

  /home/travis/gopath/src/github.com/k8snetworkplumbingwg/sriov-network-device-plugin/.gopath/pkg/mod/github.com/onsi/ginkgo@v1.12.0/extensions/table/table.go:92

    no devices [It]

    /home/travis/gopath/src/github.com/k8snetworkplumbingwg/sriov-network-device-plugin/.gopath/pkg/mod/github.com/onsi/ginkgo@v1.12.0/extensions/table/table_entry.go:43

    Unexpected error:

        <*errors.errorString | 0xc0004c5990>: {

            s: "discoverDevices(): error getting PCI info: gzip: invalid header",

        }

        discoverDevices(): error getting PCI info: gzip: invalid header

    occurred

    /home/travis/gopath/src/github.com/k8snetworkplumbingwg/sriov-network-device-plugin/.gopath/src/github.com/k8snetworkplumbingwg/sriov-network-device-plugin/cmd/sriovdp/manager_test.go:315

------------------------------

I think @martinkennelly mentioned this in last resource mgmt meeting, he might be working on the fix (to update ghw lib).
@adrianchiris suspected this might be due to the VM image update in travis.

@amorenoz amorenoz changed the title WIP: Add vDPA support Add vDPA support Jan 12, 2021
@martinkennelly
Copy link
Member

martinkennelly commented Jan 12, 2021

Couldn't reproduce consistently locally so doesn't seem like ghw issue. I was hoping it was transient error so gave up. @adrianchiris suggested upgrading to bionic (18.04) VM.

Ill post a test PR with updated travis file to see if it fixes it.
Edit: Updating travis VM failed.

I looked at the Ghw code and it seems to fetch pci-ids file in gzip format from a network location if it cannot find pci ids file locally: https://github.com/jaypipes/pcidb/blob/98ef3ee36c69f20650fe7855047ad042338c6983/discover.go#L36 referenced by const PCIIDS_URI. If this fails, you'd see the error shown by @amorenoz

Copy link
Contributor

@adrianchiris adrianchiris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some comments on the first two commits

docs/vdpa/README.md Outdated Show resolved Hide resolved
docs/vdpa/README.md Show resolved Hide resolved
docs/vdpa/README.md Outdated Show resolved Hide resolved
pkg/netdevice/vdpa.go Show resolved Hide resolved
pkg/netdevice/vdpa.go Outdated Show resolved Hide resolved
pkg/netdevice/vdpa.go Outdated Show resolved Hide resolved
pkg/types/types.go Show resolved Hide resolved
pkg/netdevice/pciNetDevice.go Outdated Show resolved Hide resolved
@jaypipes
Copy link
Contributor

Couldn't reproduce consistently locally so doesn't seem like ghw issue. I was hoping it was transient error so gave up. @adrianchiris suggested upgrading to bionic (18.04) VM.

Ill post a test PR with updated travis file to see if it fixes it.
Edit: Updating travis VM failed.

I looked at the Ghw code and it seems to fetch pci-ids file in gzip format from a network location if it cannot find pci ids file locally: https://github.com/jaypipes/pcidb/blob/98ef3ee36c69f20650fe7855047ad042338c6983/discover.go#L36 referenced by const PCIIDS_URI. If this fails, you'd see the error shown by @amorenoz

Hi folks! Hey, please check out this PR from @pearsonk in the pcidb library that addresses systems that store the PCIIDS file in gzip-format:

jaypipes/pcidb#23

I plan to merge that today along with some other PRs from @martinkennelly and cut a new release for pcidb and the upstream ghw project today.

Best,
-jay

@jaypipes
Copy link
Contributor

/cc @fromanirh you might be interested in this repo/PR as well

pkg/netdevice/vdpa.go Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
pkg/netdevice/netResourcePool.go Outdated Show resolved Hide resolved
pkg/netdevice/pciNetDevice.go Show resolved Hide resolved
pkg/netdevice/vdpa.go Show resolved Hide resolved
pkg/netdevice/vdpa.go Outdated Show resolved Hide resolved
@amorenoz amorenoz force-pushed the vdpaInfoProvider branch 2 times, most recently from d287384 to b17bc28 Compare August 19, 2021 15:44
docs/vdpa/README.md Outdated Show resolved Hide resolved
pkg/netdevice/netResourcePool.go Outdated Show resolved Hide resolved
pkg/netdevice/pciNetDevice_test.go Outdated Show resolved Hide resolved
pkg/netdevice/vdpa.go Show resolved Hide resolved
@amorenoz
Copy link
Contributor Author

Updated to pull govdpa from NPWG organization

docs/vdpa/README.md Outdated Show resolved Hide resolved
@amorenoz
Copy link
Contributor Author

@adrianchiris , do you have any further comments on this PR?

Copy link

@Billy99 Billy99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
Few minor things in the documentation.

docs/vdpa/README.md Outdated Show resolved Hide resolved
docs/vdpa/README.md Outdated Show resolved Hide resolved
docs/vdpa/README.md Outdated Show resolved Hide resolved
docs/vdpa/README.md Outdated Show resolved Hide resolved
docs/README.md Outdated Show resolved Hide resolved
@amorenoz
Copy link
Contributor Author

Thanks @Billy99

@amorenoz
Copy link
Contributor Author

@adrianchiris if you want to try this PR, I've tested it with linux v5.14.

@adrianchiris
Copy link
Contributor

@amorenoz planning to look at it today, sorry for the delay

Copy link
Contributor

@adrianchiris adrianchiris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on it Adrian !

overall i think it looks good, added some comments

pkg/types/types.go Outdated Show resolved Hide resolved
pkg/netdevice/vdpa.go Show resolved Hide resolved
pkg/netdevice/vdpa.go Outdated Show resolved Hide resolved
pkg/netdevice/pciNetDevice.go Show resolved Hide resolved
pkg/netdevice/netDeviceProvider.go Show resolved Hide resolved
pkg/netdevice/netResourcePool.go Show resolved Hide resolved
@amorenoz
Copy link
Contributor Author

@adrianchiris at your convenience, PTAL

Copy link
Contributor

@adrianchiris adrianchiris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a couple of nits but otherwise LGTM.

Thank you for your patience Adrian !


// VdpaDevice is an interface to access vDPA device information
type VdpaDevice interface {
GetDriver() string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetDriver() is currently not being used by sriov-network-device-plugin. i think we can remove it for now WDYT ?

when filling device information i see that the VDPA type is used as driver:

in netResourcePool.go L#88
Driver: string(vdpaDev.GetType()),

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I'll remove it from the interface

@@ -159,6 +171,8 @@ type DeviceProvider interface {
GetDevices(*ResourceConfig) []PciDevice

GetFilteredDevices([]PciDevice, *ResourceConfig) ([]PciDevice, error)

ValidConfig(*ResourceConfig) bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could you add a docstring to this interface method ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was not addressed, however will not block on it. it can be addressed as followup.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I added it to the implementations in netDeviceProvider and accelDeviceProvider

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now added it to the interface method

Low level vdpa management is performed on
github.com/k8snetworkplumbingwg/govdpa

Two new types are introduced:

- VdpaDevice is an interface to expose vDPA device information.

- VdpaType is used to define the types of supported vdpa devices to use
It can have two values: "vhost" or "virtio".

Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
The vdpaInfoProvider determines device information (as required by the
API) from a vdpa device.

Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
If a vdpaType is given, select the appropriate infoProvider
Also, filter devices based on the vdpaType configured

Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Copy link
Contributor

@adrianchiris adrianchiris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM !

Add a config validation function to the DeviceProvider API so each one
of them can do DeviceType-specific config validation.

Specifically, the netDevice config validation checks that both IsRdma
and VdpaType are not configured simultaneously.

Doing this at the manager level makes the rest of the code cleaner.

Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
@adrianchiris
Copy link
Contributor

@zshi-redhat @SchSeba PTAL

Its been a long ride with this PR :)

@zshi-redhat
Copy link
Collaborator

/cc @bn222

@amorenoz
Copy link
Contributor Author

amorenoz commented Mar 9, 2022

@zshi-redhat do you have any further comments on this PR?

@zshi-redhat zshi-redhat merged commit 6b69239 into k8snetworkplumbingwg:master Mar 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add vDPA support
9 participants