DRA for 1.31 #125488

pohly · 2024-06-13T13:15:30Z

What type of PR is this?

/kind feature

What this PR does / why we need it:

This is an implementation of the revised API and features from kubernetes/enhancements#4709.

Which issue(s) this PR fixes:

Related-to:

Fixes: #125665, #124041, #125210

Special notes for your reviewer:

Several of these changes were already proposed earlier in separate PRs and/or might get split out. Here's a list:

In this PR, please only review commits starting with "DRA: remove immediate allocation".

Does this PR introduce a user-facing change?

DRA: new API and several new features

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- [KEP]:  https://github.com/kubernetes/enhancements/issues/4381

k8s-ci-robot · 2024-06-13T13:15:32Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

k8s-ci-robot · 2024-06-13T13:15:38Z

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

This is a complete revamp of the original API. Some of the key differences:

- refocused on structured parameters and allocating devices
- support for constraints across devices
- support for allocating "all" or a fixed amount of similar devices in a single request
- no class for ResourceClaims, instead individual device requests are associated with a mandatory DeviceClass

For the sake of simplicity, optional basic types (ints, strings) where the null value is the default are represented as values in the API types. This makes Go code simpler because it doesn't have to check for nil (consumers) and values can be set directly (producers). The effect is that in protobuf, these fields always get encoded because `opt` only has an effect for pointers.

The roundtrip test data for v1.29.0 and v1.30.0 changes because of the new "request" field. This is considered acceptable because the entire `claims` field in the pod spec is still alpha.

The implementation is complete enough to bring up the apiserver. Adapting other components follows.
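The value-versus-pointer trade-off described in this commit message can be illustrated with a minimal sketch. The types `requestWithPointer` and `requestWithValue` and the helper functions are hypothetical and are not taken from the actual v1alpha3 API; they only show why consumers need a nil check in one case and not in the other.

```go
package main

import "fmt"

// requestWithPointer is a hypothetical type that models an optional field
// as a pointer: nil means "unset".
type requestWithPointer struct {
	Count *int64
}

// requestWithValue is a hypothetical type that models the same field as a
// plain value: the zero value means "use the default".
type requestWithValue struct {
	Count int64
}

// countFromPointer has to guard against nil before dereferencing.
func countFromPointer(r requestWithPointer) int64 {
	if r.Count == nil {
		return 0
	}
	return *r.Count
}

// countFromValue can read the field directly.
func countFromValue(r requestWithValue) int64 {
	return r.Count
}

func main() {
	// Producers need a temporary variable to take the address of a value.
	n := int64(2)
	withPtr := requestWithPointer{Count: &n}

	// With a value field the number can be set directly.
	withVal := requestWithValue{Count: 2}

	fmt.Println(countFromPointer(withPtr), countFromPointer(requestWithPointer{}))
	fmt.Println(countFromValue(withVal), countFromValue(requestWithValue{}))
}
```

The cost of the value form is that "unset" and "explicitly zero" cannot be told apart, which is only acceptable when the zero value is the default, as stated above.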

Publishing ResourceSlices now supports network-attached devices and the new v1alpha3 API. The logic for splitting up across different slices is missing.

This adds the ability to select specific requests inside a claim for a container.

NodePrepareResources is always called, even if the claim is not used by any container. This could be useful for drivers where that call has some effect other than injecting CDI device IDs into containers. It also ensures that drivers can validate configs.

The pod resource API can no longer report a class for each claim because there is no such 1:1 relationship anymore. Instead, that API reports claim, API devices (with driver/pool/device as ID) and CDI device IDs. The kubelet itself doesn't extract that information from the claim. Instead, it relies on drivers to report this information when the claim gets prepared. This isolates the kubelet from API changes.

Because of a faulty E2E test, kubelet was told to contact the wrong driver for a claim. This was not visible in the kubelet log output. Now changes to the claim info cache are getting logged. While at it, naming of variables and some existing log output gets harmonized.

Co-authored-by: Oksana Baranova <oksana.baranova@intel.com>
Co-authored-by: Ed Bartosh <eduard.bartosh@intel.com>
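To make the per-request selection described in this commit message more concrete, here is a rough sketch. The `preparedDevice` type and `cdiDevicesFor` function are hypothetical and do not mirror the kubelet's real data structures. The sketch also assumes that an empty request name means "everything from the claim", which should be checked against the actual API documentation.

```go
package main

import "fmt"

// preparedDevice is a hypothetical record of what a driver reports for one
// device after a claim has been prepared.
type preparedDevice struct {
	Request      string   // name of the request inside the claim
	CDIDeviceIDs []string // CDI device IDs to inject into containers
}

// cdiDevicesFor returns the CDI device IDs a container should get for a
// claim. An empty request name selects devices from all requests
// (an assumption made for this sketch).
func cdiDevicesFor(prepared []preparedDevice, request string) []string {
	var ids []string
	for _, d := range prepared {
		if request == "" || d.Request == request {
			ids = append(ids, d.CDIDeviceIDs...)
		}
	}
	return ids
}

func main() {
	prepared := []preparedDevice{
		{Request: "gpu", CDIDeviceIDs: []string{"example.com/gpu=0"}},
		{Request: "nic", CDIDeviceIDs: []string{"example.com/nic=0"}},
	}
	fmt.Println(cdiDevicesFor(prepared, "gpu"))
	fmt.Println(cdiDevicesFor(prepared, ""))
}
```

Because the device list comes from what the driver reported at prepare time, a lookup like this never has to parse the claim spec, which matches the goal of isolating the kubelet from API changes.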

The advantages of using a validation admission policy (VAP) are that no changes are needed in Kubernetes and that admins have full flexibility if and how they want to control which users are allowed to use "admin access" in their requests.

The downside is that without admins taking action, the feature is enabled out-of-the-box in a cluster. Documentation for DRA will have to make it very clear that something needs to be done in multi-tenant clusters.

The test/e2e/testing-manifests/dra/admin-access-policy.yaml shows how to do this. The corresponding E2E test ensures that it actually works as intended.

For some reason, adding the namespace to the message expression leads to a type check error, so it's currently commented out.

The resource claim controller is completely agnostic to the claim spec. It doesn't care about classes or devices, therefore it needs no changes in 1.31 besides the v1alpha2 -> v1alpha3 renaming from a previous commit.

The structured parameter allocation logic was written from scratch in staging/src/k8s.io/dynamic-resource-allocation/structured where it might be useful for out-of-tree components.

Besides the new features (amount, admin access) and API it now supports backtracking when the initial device selection doesn't lead to a complete allocation of all claims.

Co-authored-by: Ed Bartosh <eduard.bartosh@intel.com>
Co-authored-by: John Belamaric <jbelamaric@google.com>
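The backtracking idea mentioned in this commit message can be sketched as follows. This is not the allocator from staging/src/k8s.io/dynamic-resource-allocation/structured. The `device` and `request` types and the `allocate` function are hypothetical simplifications in which each request needs exactly one device and matching is done on a model string.

```go
package main

import "fmt"

// device is a hypothetical, simplified stand-in for an allocatable device.
type device struct {
	Name  string
	Model string
}

// request is a hypothetical device request that accepts certain models.
type request struct {
	Name   string
	Models []string
}

// matches reports whether the device satisfies the request.
func (r request) matches(d device) bool {
	for _, m := range r.Models {
		if m == d.Model {
			return true
		}
	}
	return false
}

// allocate assigns one distinct device to every request. It returns the
// mapping from request name to device name, or false if none exists.
func allocate(requests []request, devices []device) (map[string]string, bool) {
	used := make([]bool, len(devices))
	result := make(map[string]string, len(requests))

	var try func(i int) bool
	try = func(i int) bool {
		if i == len(requests) {
			return true // every request has a device
		}
		for j, d := range devices {
			if used[j] || !requests[i].matches(d) {
				continue
			}
			used[j] = true
			result[requests[i].Name] = d.Name
			if try(i + 1) {
				return true
			}
			// Backtrack: this choice blocked a later request, undo it.
			used[j] = false
			delete(result, requests[i].Name)
		}
		return false
	}

	if !try(0) {
		return nil, false
	}
	return result, true
}

func main() {
	devices := []device{
		{Name: "gpu-0", Model: "a"},
		{Name: "gpu-1", Model: "b"},
	}
	requests := []request{
		{Name: "flexible", Models: []string{"a", "b"}},
		{Name: "strict", Models: []string{"a"}},
	}
	alloc, ok := allocate(requests, devices)
	fmt.Println(alloc, ok)
}
```

In this example a purely greedy pass would give gpu-0 to "flexible" and then fail for "strict". With backtracking, the first choice is undone and "flexible" ends up with gpu-1 instead.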

In the API, the effect of the feature gate is that alpha fields get dropped on create. They get preserved during updates if already set. The PodSchedulingContext registration is *not* restricted by the feature gate. This enables deleting stale PodSchedulingContext objects after disabling the feature gate.

The scheduler checks the new feature gate before setting up an informer for PodSchedulingContext objects and when deciding whether it can schedule a pod. If any claim depends on a control plane controller, the scheduler bails out, leading to:

    Status: Pending
    ...
    Warning FailedScheduling 73s default-scheduler 0/1 nodes are available: resourceclaim depends on disabled DRAControlPlaneController feature. no new claims to deallocate, preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.

The rest of the changes prepare for testing the new feature separately from "structured parameters". The goal is to have base "dra" jobs which just enable and test those, then "classic-dra" jobs which add DRAControlPlaneController.
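The "drop on create, preserve on update" rule for alpha fields can be sketched as below. The `claimSpec` type, its `Controller` field, and the `dropDisabledFields` function are hypothetical simplifications written for illustration; they are not the code from this PR.

```go
package main

import "fmt"

// claimSpec is a hypothetical, simplified spec with one alpha field.
type claimSpec struct {
	Controller string // alpha field guarded by the feature gate
}

// dropDisabledFields clears the alpha field in newSpec when the feature
// gate is off, unless the stored object (oldSpec) already uses the field.
// oldSpec is nil on create.
func dropDisabledFields(gateEnabled bool, newSpec, oldSpec *claimSpec) {
	if gateEnabled {
		return // feature enabled: keep everything
	}
	if oldSpec != nil && oldSpec.Controller != "" {
		return // update of an object that already uses the field: preserve it
	}
	newSpec.Controller = ""
}

func main() {
	// Create with the gate disabled: the field is dropped.
	created := &claimSpec{Controller: "example.com/driver"}
	dropDisabledFields(false, created, nil)
	fmt.Printf("create: %q\n", created.Controller)

	// Update with the gate disabled: the field was already set, so it stays.
	old := &claimSpec{Controller: "example.com/driver"}
	updated := &claimSpec{Controller: "example.com/driver"}
	dropDisabledFields(false, updated, old)
	fmt.Printf("update: %q\n", updated.Controller)
}
```

Preserving the field on update keeps existing objects valid after the gate is turned off, which fits with leaving the PodSchedulingContext registration unrestricted so that stale objects can still be deleted.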

This fixes the message (node name and "cluster-scoped" were switched) and simplifies the VAP:

- a single matchCondition short circuits completely unless they're a user we care about
- variables to extract the userNodeName and objectNodeName once (using optionals to gracefully turn missing claims and fields into empty strings)
- leaves very tiny concise validations

Co-authored-by: Jordan Liggitt <liggitt@google.com>

Some of the E2E node tests were flaky. Their timeout apparently was chosen under the assumption that kubelet would retry immediately after a failed gRPC call, with a factor of 2 as safety margin. But according to kubernetes@0449cef, kubelet has a different, higher retry period of 90 seconds, which was exactly the test timeout. The test timeout has to be higher than that.

As the tests don't use the gRPC call timeout anymore, it can be made private. While at it, the name and documentation get updated.

Huang-Wei

/approve

for scheduler changes.

PS: there was one nit for integration test coverage. For now DRA is using scheduler_perf tests for both perf and functionality tests. However, scheduler_perf is not included in a PR's presubmit-CI, so it makes sense to add some integration tests under test/integration/scheduler. Not a blocker though.

pohly · 2024-07-22T17:25:52Z

/test pull-kubernetes-node-e2e-crio-cgrpv2-dra

Setup failed with a flake (? "Error while dialing: dial unix /var/run/crio/crio.sock: connect: no such file or directory").

#125488 (review):

However, scheduler_perf is not included in a PR's presubmit-CI

It is enabled, and there is one test scenario ("SchedulingWithResourceClaimTemplateStructured") for the new API. More can and will be added once we start investigating performance a bit more. Right now, it's more about functionality.

@klueska: would you do us the honor and LGTM together with lifting the hold?

We have positive reviews and approval from (not collecting links, but they are there...):

@thockin (API)
@johnbelamaric (scheduling logic, diverse aspects)
@Huang-Wei, @kerthcet (SIG Scheduling)
@liggitt (SIG Auth, api-machinery)
@klueska (SIG Node)
and several others

Thanks everyone!

klueska · 2024-07-22T17:35:51Z

Gladly. Thanks everyone! One step closer...

/approve
/lgtm
/unhold

k8s-ci-robot · 2024-07-22T17:35:59Z

LGTM label has been added.

Git tree hash: ce1d7771db93f922ecda8bf61041a6be1c84de6d

k8s-ci-robot · 2024-07-22T17:36:28Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Huang-Wei, klueska, pohly, thockin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~api/OWNERS~~ [thockin]
~~cmd/kube-controller-manager/OWNERS~~ [thockin]
~~hack/OWNERS~~ [pohly,thockin]
~~pkg/api/OWNERS~~ [thockin]
~~pkg/apis/OWNERS~~ [thockin]
~~pkg/controller/resourceclaim/OWNERS~~ [klueska,pohly,thockin]
~~pkg/controlplane/OWNERS~~ [thockin]
~~pkg/features/OWNERS~~ [thockin]
~~pkg/generated/openapi/OWNERS~~ [thockin]
~~pkg/kubeapiserver/authorizer/OWNERS~~ [thockin]
~~pkg/kubectl/OWNERS~~ [thockin]
~~pkg/kubelet/OWNERS~~ [klueska,thockin]
~~pkg/printers/OWNERS~~ [thockin]
~~pkg/registry/OWNERS~~ [thockin]
~~pkg/scheduler/OWNERS~~ [Huang-Wei,thockin]
~~plugin/pkg/admission/noderestriction/OWNERS~~ [thockin]
~~plugin/pkg/auth/authorizer/OWNERS~~ [thockin]
~~staging/publishing/OWNERS~~ [thockin]
~~staging/src/k8s.io/api/OWNERS~~ [thockin]
~~staging/src/k8s.io/apiserver/pkg/cel/OWNERS~~ [thockin]
~~staging/src/k8s.io/cli-runtime/OWNERS~~ [thockin]
~~staging/src/k8s.io/client-go/OWNERS~~ [thockin]
~~staging/src/k8s.io/dynamic-resource-allocation/OWNERS~~ [klueska,pohly,thockin]
~~staging/src/k8s.io/kubelet/OWNERS~~ [klueska,thockin]
~~test/OWNERS~~ [pohly,thockin]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

This behavior was useful for kubernetes#125488 but wasn't obvious when reading the documentation.

The manual deep comparison code is hard to maintain (would need to be updated in kubernetes#125488) and error prone. In fact, one test case failed when doing a full automatic comparison with cmp.Diff because it wasn't setting allMemory.

k8s-ci-robot requested review from bart0sh and chendave June 13, 2024 13:16

This was referenced Jun 14, 2024

DRA: structured parameters kubernetes/enhancements#4381

Open

DRA: control plane controller ("classic DRA") kubernetes/enhancements#3063

Open

pohly force-pushed the dra-1.31 branch from aa4ef25 to 283fe0e Compare June 18, 2024 11:13

pohly and others added 10 commits July 22, 2024 18:09

DRA: update helper packages

20f98f3

Publishing ResourceSlices now supports network-attached devices and the new v1alpha3 API. The logic for splitting up across different slices is missing.

DRA e2e: adapt to v1alpha3 API

0b62bfb

DRA resource claim controller: update test

0fc78b9

The resource claim controller is completely agnostic to the claim spec. It doesn't care about classes or devices, therefore it needs no changes in 1.31 besides the v1alpha2 -> v1alpha3 renaming from a previous commit.

pohly force-pushed the dra-1.31 branch from 43567fc to d11b58e Compare July 22, 2024 16:09

Huang-Wei reviewed Jul 22, 2024

k8s-ci-robot assigned klueska Jul 22, 2024

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 22, 2024

k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 22, 2024

k8s-ci-robot merged commit d21b172 into kubernetes:master Jul 22, 2024
22 checks passed

k8s-ci-robot added this to the v1.31 milestone Jul 22, 2024

pohly added a commit to pohly/kubernetes that referenced this pull request Jul 31, 2024

scheduler: document behavior of Error status returned by Filter

d71d59b

This behavior was useful for kubernetes#125488 but wasn't obvious when reading the documentation.

pohly mentioned this pull request Jul 31, 2024

scheduler: document behavior of Error status returned by Filter #126476

Merged

hungnguyen243 pushed a commit to hungnguyen243/kubernetes that referenced this pull request Aug 16, 2024

scheduler: document behavior of Error status returned by Filter

db7cfc2

This behavior was useful for kubernetes#125488 but wasn't obvious when reading the documentation.

SoulPancake pushed a commit to SoulPancake/k8s that referenced this pull request Sep 11, 2024

scheduler: document behavior of Error status returned by Filter

b7209c7

This behavior was useful for kubernetes#125488 but wasn't obvious when reading the documentation.

klueska mentioned this pull request Sep 16, 2024

DRA API: consumable capacity in v1beta1. #127386

Closed

thockin mentioned this pull request Oct 9, 2024

KEP-4816: DRA Prioritized Alternatives in Device Requests kubernetes/enhancements#4871

Merged
