Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add defaults configurable runtimeClassName #15271

Conversation

BobyMCbobs
Copy link
Member

@BobyMCbobs BobyMCbobs commented May 29, 2024

allow the setting of Pod RuntimeClassName via defaults

Fixes #12729

Proposed Changes

  • add configuration in config-defaults for setting RuntimeClassName via selectors

Release Note

Pod runtime-class-names can be set to run Services which matches specified labels with an available RuntimeClass. This feature can be configured with the property `runtime-class-name` in the config-deployment configmap.

inspired by

func NewDomainFromConfigMap(configMap *corev1.ConfigMap) (*Domain, error) {
c := Domain{Domains: map[string]DomainConfig{}}
hasDefault := false
for k, v := range configMap.Data {
if k == configmap.ExampleKey {
continue
}
internalConfig := domainInternalConfig{}
err := yaml.Unmarshal([]byte(v), &internalConfig)
if err != nil {
return nil, err
}
if len(internalConfig.Selector) == 0 {
hasDefault = true
internalConfig.Type = DomainTypeWildcard
}
c.Domains[k] = DomainConfig{
Selector: &LabelSelector{Selector: internalConfig.Selector},
Type: internalConfig.Type,
}
}
if !hasDefault {
c.Domains[DefaultDomain] = DomainConfig{Selector: &LabelSelector{}, Type: DomainTypeWildcard}
}
return &c, nil
}
// LookupDomainForLabels returns a domain given a set of labels.
// Since we reject configuration without a default domain, this should
// always return a value.
func (c *Domain) LookupDomainForLabels(labels map[string]string) string {
domain := ""
specificity := -1
// If we see VisibilityLabelKey sets with VisibilityClusterLocal, that
// will take precedence and the route will get a Cluster's Domain Name.
if labels[networking.VisibilityLabelKey] == serving.VisibilityClusterLocal {
return "svc." + network.GetClusterDomainName()
}
for k, v := range c.Domains {
// Ignore if selector doesn't match, or decrease the specificity.
if !v.Selector.Matches(labels) || v.Selector.specificity() < specificity {
continue
}
if v.Selector.specificity() > specificity || strings.Compare(k, domain) < 0 {
domain = k
specificity = v.Selector.specificity()
}
}
return domain
}

@knative-prow knative-prow bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label May 29, 2024
@knative-prow knative-prow bot requested review from dprotaso and ReToCode May 29, 2024 05:19
@BobyMCbobs
Copy link
Member Author

Some other things I can add, please advise your thoughts:

  • Determine if RuntimeClass exists (webhook)
  • e2e test for values set in Pod on live cluster

Copy link

codecov bot commented May 29, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 84.58%. Comparing base (09b4cd3) to head (f0832f4).
Report is 32 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #15271      +/-   ##
==========================================
- Coverage   84.79%   84.58%   -0.22%     
==========================================
  Files         218      219       +1     
  Lines       13504    13584      +80     
==========================================
+ Hits        11451    11490      +39     
- Misses       1687     1727      +40     
- Partials      366      367       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@BobyMCbobs BobyMCbobs force-pushed the add-runtimeclassname-defaults-configuration branch from 400da99 to dd292b5 Compare May 29, 2024 05:36
@dprotaso
Copy link
Member

dprotaso commented Jun 4, 2024

I think we should follow what we've been doing with podAffinity here - #15250

Specifically defaulting runtimeClass seems like an operator concern so I don't think we should expose that knob was set to the user.

  1. Move the option to config-deployment
  2. Not default when creating/updating the service/config/revision but instead apply the defaults when we reconcile the revision to a deployment.

@dprotaso
Copy link
Member

dprotaso commented Jun 4, 2024

Determine if RuntimeClass exists (webhook)

Currently, how does setting runtime class surface up through a pod up to a deployment?

e2e test for values set in Pod on live cluster

I think a unit test is fine - given that it's simply a passthrough to the PodSpec vs. something more complex

@BobyMCbobs BobyMCbobs force-pushed the add-runtimeclassname-defaults-configuration branch from dd292b5 to f63e7f9 Compare June 5, 2024 19:48
@BobyMCbobs
Copy link
Member Author

BobyMCbobs commented Jun 5, 2024

@dprotaso, thank you kindly for your review.

I think we should follow what we've been doing with podAffinity here - #15250

1. Move the option to `config-deployment`

2. Not default when creating/updating the service/config/revision but instead apply the defaults when we reconcile the revision to a deployment.

This makes a lot more sense and I've updated my changes accordingly.

Specifically defaulting runtimeClass seems like an operator concern so I don't think we should expose that knob was set to the user.

Currently, that's defaulted to a non-accessible field in the Service spec, which makes sense.

Determine if RuntimeClass exists (webhook)

Currently, how does setting runtime class surface up through a pod up to a deployment?

With a Pod, it is immediately rejected. With a Deployment, there's a condition like the following:

  - lastTransitionTime: "2024-06-05T19:53:46Z"
    lastUpdateTime: "2024-06-05T19:53:46Z"
    message: 'pods "nginx-7c4446d454-" is forbidden: pod rejected: RuntimeClass "kata"
      not found'
    reason: FailedCreate
    status: "True"
    type: ReplicaFailure

e2e test for values set in Pod on live cluster

I think a unit test is fine - given that it's simply a passthrough to the PodSpec vs. something more complex

Yeah this is reasonable.

@BobyMCbobs BobyMCbobs force-pushed the add-runtimeclassname-defaults-configuration branch from f63e7f9 to 9924b19 Compare June 5, 2024 20:16
@knative-prow-robot knative-prow-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 6, 2024
@BobyMCbobs BobyMCbobs force-pushed the add-runtimeclassname-defaults-configuration branch from 9924b19 to 136d743 Compare June 6, 2024 20:39
@knative-prow-robot knative-prow-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 6, 2024
@BobyMCbobs BobyMCbobs force-pushed the add-runtimeclassname-defaults-configuration branch 2 times, most recently from ad884ce to e03d298 Compare June 6, 2024 20:55
pkg/deployment/config.go Outdated Show resolved Hide resolved
@BobyMCbobs BobyMCbobs force-pushed the add-runtimeclassname-defaults-configuration branch 5 times, most recently from 18b6943 to d639d68 Compare June 8, 2024 00:17
if runtimeClassName == "" {
return nil
}
return ptr.String(runtimeClassName)
Copy link
Contributor

@skonto skonto Jun 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to validate with https://github.com/kubernetes/kubernetes/blob/master/pkg/apis/core/validation/validation.go#L314 and fail fast?
For example right now random names pass eg. "runtime-class-name": "_ : {}", although they should print:

error : [a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')]

Tested with

{
		name:    "runtime class name should fail",
		wantErr: false,
		data: map[string]string{
			QueueSidecarImageKey: defaultSidecarImage,
			"runtime-class-name": "_ : {}",
		},
		wantConfig: &Config{
			DigestResolutionTimeout:        digestResolutionTimeoutDefault,
			ProgressDeadline:               ProgressDeadlineDefault,
			QueueSidecarCPURequest:         &QueueSidecarCPURequestDefault,
			QueueSidecarImage:              defaultSidecarImage,
			QueueSidecarTokenAudiences:     sets.New(""),
			RegistriesSkippingTagResolving: sets.New("kind.local", "ko.local", "dev.local"),
			RuntimeClassNames:              map[string]RuntimeClassNameLabelSelector{"_": RuntimeClassNameLabelSelector{}},
			DefaultAffinityType:            defaultAffinityTypeValue,
		},
	},

in config_test.

Copy link
Member Author

@BobyMCbobs BobyMCbobs Jun 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@skonto, thank you for your thoughts. More validation is essential and will make it more robust.

I've tried adding in

+			if err := apivalidation.ValidateRuntimeClassName(class, field.NewPath(".data."+RuntimeClassNameKey)); err != nil {
+				return nil, fmt.Errorf("%v invalid: %v", RuntimeClassNameKey, err)
+			}

but now have an import failure. Haven't gotten one of these errors in a bit

$ go mod tidy
go: finding module for package k8s.io/kubernetes/pkg/apis/core/validation
go: found k8s.io/kubernetes/pkg/apis/core/validation in k8s.io/kubernetes v1.30.2
go: downloading k8s.io/component-helpers v0.0.0
go: downloading k8s.io/kubelet v0.0.0
go: downloading k8s.io/cloud-provider v0.0.0
go: finding module for package k8s.io/client-go/features
go: found k8s.io/client-go/features in k8s.io/client-go v0.30.2
go: knative.dev/serving/pkg/deployment imports
        k8s.io/kubernetes/pkg/apis/core/validation imports
        k8s.io/component-helpers/node/util/sysctl: reading k8s.io/component-helpers/go.mod at revision v0.0.0: unknown revision v0.0.0
go: knative.dev/serving/pkg/deployment imports
        k8s.io/kubernetes/pkg/apis/core/validation imports
        k8s.io/component-helpers/scheduling/corev1: reading k8s.io/component-helpers/go.mod at revision v0.0.0: unknown revision v0.0.0
go: knative.dev/serving/pkg/deployment imports
        k8s.io/kubernetes/pkg/apis/core/validation imports
        k8s.io/kubelet/pkg/apis: reading k8s.io/kubelet/go.mod at revision v0.0.0: unknown revision v0.0.0
go: knative.dev/serving/pkg/deployment imports
        k8s.io/kubernetes/pkg/apis/core/validation imports
        k8s.io/kubernetes/pkg/cluster/ports imports
        k8s.io/cloud-provider/options: reading k8s.io/cloud-provider/go.mod at revision v0.0.0: unknown revision v0.0.0

from adding the dependencies

	"k8s.io/apimachinery/pkg/util/validation/field"
	apivalidation "k8s.io/kubernetes/pkg/apis/core/validation"

I suspect it is something with version pinning and will continue investigating further.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can just import apimachineryvalidation.NameIsDNSSubdomain(name, false) and it should be fine.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated in 4adbced

wantErr: true,
data: map[string]string{
QueueSidecarImageKey: defaultSidecarImage,
"runtime-class-name": ` ???; 231424 `,
Copy link
Contributor

@skonto skonto Jun 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear the error here is captured due to yaml unmarshaling:

error unmarshaling JSON: while decoding JSON: json: cannot unmarshal string into Go value of type map[string]deployment.RuntimeClassNameLabelSelector

and does not cover the K8s validation aspects see comment above.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to implement the use of ValidateRuntimeClassName as suggested.
If the YAML cannot be parsed, how should the validation be handled?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to emphasize that some K8s validation might be needed. After yaml is parsed we can check for a valid name according to K8s.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, it makes a lot of sense to have it.

if !v.Matches(lbs) || v.specificity() < specificity {
continue
}
if v.specificity() > specificity || strings.Compare(k, runtimeClassName) < 0 {
Copy link
Contributor

@skonto skonto Jun 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is string comparison a rule to select a class when they have equal specificity?
Looks a bit ad hoc imho. Btw could you add a test for this and some scenario with multiple runtime classes?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation comes from the existing domain selection code for a Service

func NewDomainFromConfigMap(configMap *corev1.ConfigMap) (*Domain, error) {
c := Domain{Domains: map[string]DomainConfig{}}
hasDefault := false
for k, v := range configMap.Data {
if k == configmap.ExampleKey {
continue
}
internalConfig := domainInternalConfig{}
err := yaml.Unmarshal([]byte(v), &internalConfig)
if err != nil {
return nil, err
}
if len(internalConfig.Selector) == 0 {
hasDefault = true
internalConfig.Type = DomainTypeWildcard
}
c.Domains[k] = DomainConfig{
Selector: &LabelSelector{Selector: internalConfig.Selector},
Type: internalConfig.Type,
}
}
if !hasDefault {
c.Domains[DefaultDomain] = DomainConfig{Selector: &LabelSelector{}, Type: DomainTypeWildcard}
}
return &c, nil
}
// LookupDomainForLabels returns a domain given a set of labels.
// Since we reject configuration without a default domain, this should
// always return a value.
func (c *Domain) LookupDomainForLabels(labels map[string]string) string {
domain := ""
specificity := -1
// If we see VisibilityLabelKey sets with VisibilityClusterLocal, that
// will take precedence and the route will get a Cluster's Domain Name.
if labels[networking.VisibilityLabelKey] == serving.VisibilityClusterLocal {
return "svc." + network.GetClusterDomainName()
}
for k, v := range c.Domains {
// Ignore if selector doesn't match, or decrease the specificity.
if !v.Selector.Matches(labels) || v.Selector.specificity() < specificity {
continue
}
if v.Selector.specificity() > specificity || strings.Compare(k, domain) < 0 {
domain = k
specificity = v.Selector.specificity()
}
}
return domain
}
.

TLDR; the one with most matching labels wins.

Good thinking. I've added two tests for this behaviour in 4adbced#diff-2a14f9c49eb40c3a55958c8e2d57523cd618232821a9d310c1ddd7cc64d91ef9R476-R517

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TLDR; the one with most matching labels wins.

Also a classname can't have a label (key, value) that does not exist in the service labels.

On top of that, if you have two class names that have the same specificity, lexicographic order in class names is what matters:
("three" < "two"):

{
		name: "priority with multiple label selectors and three labels set",
		serviceLabels: map[string]string{
			"needs-two":   "yes",
			"needs-three": "yes",
			"needs-four":  "yes",
		},
		runtimeClassNames: map[string]RuntimeClassNameLabelSelector{
			"one": {},
			"two": {
				Selector: map[string]string{
					"needs-two":   "yes",
					"needs-three": "yes",
				},
			},
			"three": {
				Selector: map[string]string{
					"needs-two":   "yes",
					"needs-three": "yes",
				},
			},
		},
		want: ptr.String("three"),
	},

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also a classname can't have a label (key, value) that does not exist in the service labels.

I'm not sure I understand. Are you meaning to validate a selector against the labels of Services that exist?
If so, that wouldn't make a lot of sense to me.

On top of that, if you have two class names that have the same specificity, lexicographic order in class names is what matters:
("three" < "two"):

Yes that is true, but I would think an uncommon practical issue.

Do you have any alternative patterns for selected which I may consider?

Copy link
Contributor

@skonto skonto Jun 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand.

		serviceLabels: map[string]string{
			"needs-two":   "yes",
			"needs-three": "yes",
			"needs-four":  "yes",
		},
		runtimeClassNames: map[string]RuntimeClassNameLabelSelector{
			"one": {},
			"two": {
				Selector: map[string]string{
					"needs-two":   "yes",
					"needs-five": "yes",
				},
			},

two is ignored due to the v.MatchLabels condition. Matches has this loop that checks against the service labels:

	for label, expectedValue := range s.Selector {
		value, ok := labels[label]
		if !ok || expectedValue != value {
			return false
		}
	}

I am trying to summarize the key points for the logic. It is not just "the one with most matching labels wins."

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have any alternative patterns for selected which I may consider?

I am ok as long as we document it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That appears to be correct behaviour, the service labels in the above sample don't include "need-five".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@skonto
Copy link
Contributor

skonto commented Jun 14, 2024

The feature needs proper documentation as well. Pls create a PR against knative/docs after this is merged.

@BobyMCbobs BobyMCbobs force-pushed the add-runtimeclassname-defaults-configuration branch 2 times, most recently from 4adbced to 3726752 Compare June 17, 2024 20:09
func (d Config) PodRuntimeClassName(lbs map[string]string) *string {
runtimeClassName := ""
specificity := -1
for k, v := range d.RuntimeClassNames {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we create a func for calculating specificity and also unify types: LabelSelector (domain) and RuntimeClassNameLabelSelector to avoid duplication?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say do it in a follow up

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@skonto, apologies since I missed this message. Agree, that would be helpful!

@skonto
Copy link
Contributor

skonto commented Jun 19, 2024

@dprotaso gentle ping for review.

Comment on lines 240 to 242
if slices.Contains(apimachineryvalidation.NameIsDNSSubdomain(class, false), class) {
return nil, fmt.Errorf("%v %v selector not valid DNSSubdomain", RuntimeClassNameKey, class)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is confusing and it doesn't seem right.

Looking at the code any string returned by apimachineryvalidation.NameIsDNSSubdomain implies there's an error.

https://github.com/kubernetes/apimachinery/blob/37988e577e160bf7749437141fee1d6b8ca15dd6/pkg/util/validation/validation.go#L215-L223

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm having some weird results with the function.

package main

import (
        "fmt"

        apimachineryvalidation "k8s.io/apimachinery/pkg/api/validation"
)

func main() {
        for _, v := range []string{"a", "gvisor", "kata"} {
                fmt.Println(v, apimachineryvalidation.NameIsDNSSubdomain(v, false))
        }
}

results in

🐚(2) go run .
a []
gvisor []
kata []

looking at the tests, a should succeed
https://github.com/kubernetes/apimachinery/blob/master/pkg/util/validation/validation_test.go#L55

I set it how it is currently around the tests, as the tests seems to match up right.
Happy to update and understand what's happening here!

Copy link
Member

@dprotaso dprotaso Jun 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

empty array means it's valid - so I think we should just error out if there is a non-empty array returned

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, revisited it again and according to the tests the behaviour should be in place now.
Updated in 6b6ed60

@BobyMCbobs BobyMCbobs force-pushed the add-runtimeclassname-defaults-configuration branch from 3726752 to 6b6ed60 Compare June 25, 2024 06:48
@@ -175,6 +228,19 @@ func NewConfigFromMap(configMap map[string]string) (*Config, error) {
return nil, fmt.Errorf("unsupported %s value: %q", defaultAffinityTypeKey, affinity)
}
}
if err := yaml.Unmarshal([]byte(runtimeClassNames), &nc.RuntimeClassNames); err != nil {
return nil, fmt.Errorf("%v cannot be parsed, please check the format", RuntimeClassNameKey)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should include the error messages in the error we are returning

Copy link
Member Author

@BobyMCbobs BobyMCbobs Jun 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good thinking. Updated in fa369bb f0832f4

@BobyMCbobs BobyMCbobs force-pushed the add-runtimeclassname-defaults-configuration branch 2 times, most recently from fa369bb to 0cc8edc Compare June 25, 2024 19:51
allow the setting of Pod RuntimeClassName via defaults
@BobyMCbobs BobyMCbobs force-pushed the add-runtimeclassname-defaults-configuration branch from 0cc8edc to f0832f4 Compare June 25, 2024 19:52
@dprotaso
Copy link
Member

/lgtm
/approve

thanks for all the changes!

@knative-prow knative-prow bot added the lgtm Indicates that a PR is ready to be merged. label Jun 25, 2024
Copy link

knative-prow bot commented Jun 25, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: BobyMCbobs, dprotaso

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@knative-prow knative-prow bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 25, 2024
@BobyMCbobs
Copy link
Member Author

/lgtm /approve

thanks for all the changes!

@dprotaso @skonto, thank you kindly for your patience and review! Excited to have this in!

@knative-prow knative-prow bot merged commit 1dff15d into knative:main Jun 25, 2024
67 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow setting of default runtimeClassName
4 participants