Add multi-container probing #14853
Conversation
cc @skonto
some inline comments to help with reviews
```yaml
#
# WARNING: Cannot safely be disabled once enabled.
# See: https://knative.dev/docs/serving/configuration/feature-flags/#multiple-container-probing
multi-container-probing: "disabled"
```
I think we might not want to just enable this when multi-container is enabled: it changes the QP spec (an array instead of a single probe), which causes a redeploy of all Knative Services. Hence a new flag.
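The shape change referred to here can be sketched as follows. Today the queue-proxy's `SERVING_READINESS_PROBE` env var carries a single probe encoded as JSON; with multi-container probing it carries a slice. This is a minimal illustration with a stand-in struct (the real type is `corev1.Probe`, and the field shown is made up): since the env var is part of the pod spec, changing its encoded shape would roll out every Knative Service, which is why the behaviour is gated behind its own flag.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Probe is a stripped-down, hypothetical stand-in for corev1.Probe,
// used only to illustrate the serialized shape of the env var.
type Probe struct {
	TCPPort int `json:"tcpPort"`
}

// encodeSingle mimics the old shape: one probe object.
func encodeSingle(p Probe) string {
	b, _ := json.Marshal(p)
	return string(b)
}

// encodeMulti mimics the new shape: a slice with one entry per probed container.
func encodeMulti(ps []Probe) string {
	b, _ := json.Marshal(ps)
	return string(b)
}

func main() {
	fmt.Println(encodeSingle(Probe{TCPPort: 8080}))                     // {"tcpPort":8080}
	fmt.Println(encodeMulti([]Probe{{TCPPort: 8080}, {TCPPort: 8090}})) // [{"tcpPort":8080},{"tcpPort":8090}]
}
```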
Also, it is a good idea for people to test this first; we should remove the flag once it is mature enough. We need to add this to the docs under https://knative.dev/docs/serving/configuration/feature-flags.
Added docs in knative/docs#5855
pkg/apis/serving/k8s_validation.go
Outdated
```go
	}
} else if cfg.Features.MultiContainerProbing == config.Enabled {
	// Liveness Probes
	errs = errs.Also(validateProbe(container.LivenessProbe, nil, true).ViaField("livenessProbe"))
```
If enabled, we validate and default the probes the same way as on the user-container.
pkg/apis/serving/k8s_validation.go
Outdated
```go
if !isSidecar {
	getPort := h.HTTPGet.Port
	if getPort.StrVal != "" && getPort.StrVal != port.Name {
		errs = errs.Also(apis.ErrInvalidValue(getPort.String(), "httpGet.port", "Probe port must match container port"))
	}
}
```
On sidecars we do not want to rewrite the port to the container-port.
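The rule being discussed can be condensed into a small sketch (a hypothetical helper, not the actual validation code): only the user-container's probe port is forced to match the container port, while sidecar probes keep whatever port they declare.

```go
package main

import "fmt"

// probePortValid is a hypothetical condensation of the check discussed
// above: the user-container's named httpGet port must match its container
// port, whereas a sidecar may probe any port it names.
func probePortValid(isSidecar bool, probePortName, containerPortName string) bool {
	if isSidecar {
		return true // sidecar: port is taken as declared
	}
	// user-container: an empty name means "use the container port";
	// a non-empty name must match it.
	return probePortName == "" || probePortName == containerPortName
}

func main() {
	fmt.Println(probePortValid(false, "metrics", "http1")) // false: user-container must match
	fmt.Println(probePortValid(true, "metrics", "http1"))  // true: sidecars are free
}
```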
```diff
@@ -132,9 +132,10 @@ func (rs *RevisionSpec) applyDefault(ctx context.Context, container *corev1.Cont
 	// If there are multiple containers then default probes will be applied to the container where user specified PORT
 	// default probes will not be applied for non serving containers
 	if len(rs.PodSpec.Containers) == 1 || len(container.Ports) != 0 {
-		rs.applyProbesWithDefaults(container)
+		rs.applyGRPCProbeDefaults(container)
+		rs.applyUserContainerDefaultReadinessProbe(container)
```
We only enforce a readiness probe on the user-container; we do not enforce it on sidecars.
Can we call this ServingContainer?
```go
}

func (*RevisionSpec) applyReadinessProbeDefaults(container *corev1.Container) {
	if container.ReadinessProbe == nil {
```
But if a readiness probe is defined on sidecars, we want the same defaulting behaviour as for the user-container.
```diff
@@ -256,6 +265,19 @@ func makeContainer(container corev1.Container, rev *v1.Revision) corev1.Containe
 	if container.TerminationMessagePolicy == "" {
 		container.TerminationMessagePolicy = corev1.TerminationMessageFallbackToLogsOnError
 	}
+
+	if container.ReadinessProbe != nil {
```
We want this on sidecars as well, so it was moved out of makeServingContainer.
```go
userProbe := container.ReadinessProbe.DeepCopy()
applyReadinessProbeDefaultsForExec(userProbe, probePort)

var err error
```
Moved this block out, as we now want to assemble multiple probes.
```go
case sc.ReadinessProbe.GRPC != nil && sc.ReadinessProbe.GRPC.Port > 0:
	probePort = sc.ReadinessProbe.GRPC.Port
default:
	return nil, fmt.Errorf("sidecar readiness probe does not define probe port on container: %s", sc.Name)
```
Sidecars explicitly need to define which port to probe; we cannot just rely on defaulting to the user-port as we do on the user-container.
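For illustration, a hypothetical Service manifest under this rule (names and images made up, flag assumed enabled): the serving container's probe port may be defaulted from its containerPort, while the sidecar's probe must name a port explicitly.

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: example            # hypothetical name
spec:
  template:
    spec:
      containers:
        - image: example.com/user-image     # serving container
          ports:
            - containerPort: 8080
          readinessProbe:
            httpGet:
              path: /healthz                # port defaults to the container port
        - image: example.com/sidecar-image  # sidecar
          readinessProbe:
            httpGet:
              path: /healthz
              port: 8090                    # must be explicit, or validation fails
```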
```go
} else {
	readinessProbeJSON, err = readiness.EncodeSingleProbe(userContainerReadinessProbe)
```
If disabled, we keep the env var as it is.
```go
Ports:        ports,
StartupProbe: execProbe,
```
execProbe was never set; it confused me, so I removed it here.
Codecov Report. Attention: Patch coverage is

```
@@            Coverage Diff             @@
##             main   #14853      +/-   ##
==========================================
- Coverage   85.81%   85.74%   -0.07%
==========================================
  Files         198      198
  Lines       15117    15226     +109
==========================================
+ Hits        12972    13055      +83
- Misses       1824     1844      +20
- Partials      321      327       +6
```

☔ View full report in Codecov by Sentry.
```go
}),
data: map[string]string{
	"multi-container-probing": "Disabled",
},
```
For consistency reasons I think you need another test with name "multi-container-probing Enabled".
I think we have it here
Typically what we have for other features is this:

```go
{
	name:    "kubernetes.podspec-topologyspreadconstraints Allowed",
	wantErr: false,
	wantFeatures: defaultWith(&Features{
		PodSpecTopologySpreadConstraints: Allowed,
	}),
	data: map[string]string{
		"kubernetes.podspec-topologyspreadconstraints": "Allowed",
	},
}, {
	name:    "kubernetes.podspec-topologyspreadconstraints Enabled",
	wantErr: false,
	wantFeatures: defaultWith(&Features{
		PodSpecTopologySpreadConstraints: Enabled,
	}),
	data: map[string]string{
		"kubernetes.podspec-topologyspreadconstraints": "Enabled",
	},
}, {
	name:    "kubernetes.podspec-topologyspreadconstraints Disabled",
	wantErr: false,
	wantFeatures: defaultWith(&Features{
		PodSpecTopologySpreadConstraints: Disabled,
	}),
	data: map[string]string{
		"kubernetes.podspec-topologyspreadconstraints": "Disabled",
	},
},
```
I copied the "multi-container" one. It has only two cases, I think because Enabled is already covered by the features-Enabled test. But I'm fine with adding that as well.
```diff
 // Liveness Probes
-errs = errs.Also(validateProbe(container.LivenessProbe, port).ViaField("livenessProbe"))
+errs = errs.Also(validateProbe(container.LivenessProbe, &port, true).ViaField("livenessProbe"))
```
You could also pass the context to validateProbe and define a simple func IsUserContainer(ctx); we use that pattern elsewhere in this file. In this case it is simple to pass the flag, but in general that might be better where we have multiple conditions. 🤔
Yeah, maybe a bit of overkill here for one private bool param 🤷
```diff
@@ -798,7 +807,7 @@ func validateReadinessProbe(p *corev1.Probe, port corev1.ContainerPort) *apis.Fi
 	return errs
 }

-func validateProbe(p *corev1.Probe, port corev1.ContainerPort) *apis.FieldError {
+func validateProbe(p *corev1.Probe, port *corev1.ContainerPort, isUserContainer bool) *apis.FieldError {
```
There are a few functions from "k8s.io/apimachinery/pkg/util/validation" (which we already import) that we could use for some early checking: validation.IsValidPortNum() and validation.IsValidPortName(). validation is the same package the K8s project uses for its own validation. So I am wondering if we want to fail fast, and if so, which part of the logic here we want to re-use or copy: https://github.com/kubernetes/kubernetes/blob/master/pkg/apis/core/validation/validation.go#L3057
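As a rough illustration of the fail-fast checks being suggested, here is an approximation of the rules that validation.IsValidPortNum and validation.IsValidPortName enforce, reimplemented with only the standard library; note the real functions return a slice of error strings rather than a bool, so this is a sketch of the rules, not a drop-in replacement.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// isValidPortNum approximates validation.IsValidPortNum: a port number
// must be in the range 1-65535.
func isValidPortNum(port int) bool {
	return port >= 1 && port <= 65535
}

var portNameCharset = regexp.MustCompile(`^[a-z0-9-]+$`)

// isValidPortName approximates validation.IsValidPortName (IANA_SVC_NAME):
// at most 15 chars, lowercase alphanumerics and '-', at least one letter,
// and no leading, trailing, or consecutive hyphens.
func isValidPortName(name string) bool {
	if name == "" || len(name) > 15 {
		return false
	}
	if !portNameCharset.MatchString(name) {
		return false
	}
	if strings.Contains(name, "--") || name[0] == '-' || name[len(name)-1] == '-' {
		return false
	}
	// at least one letter
	return strings.IndexFunc(name, func(r rune) bool { return r >= 'a' && r <= 'z' }) >= 0
}

func main() {
	fmt.Println(isValidPortNum(8012))          // true
	fmt.Println(isValidPortName("http-probe")) // true
	fmt.Println(isValidPortName("9090"))       // false: no letter
}
```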
Probably better addressed in a separate issue/PR? This one is about adding the same logic to multiple containers. If we use the K8s functionality, we also change the existing validation.
```go
	}},
	},
},
},
```
Could we add another test case to compare against liveness probes, for which we don't do any defaulting:
```go
{
	name: "multiple containers with probes and overrides",
	in: &Revision{
		Spec: RevisionSpec{
			PodSpec: corev1.PodSpec{
				Containers: []corev1.Container{{
					ReadinessProbe: &corev1.Probe{
						ProbeHandler: corev1.ProbeHandler{
							Exec: &corev1.ExecAction{
								Command: []string{"echo", "hi"},
							},
						},
					},
				}, {
					LivenessProbe: &corev1.Probe{
						PeriodSeconds: 10,
						ProbeHandler: corev1.ProbeHandler{
							TCPSocket: &corev1.TCPSocketAction{
								Host: "127.0.0.2",
							},
						},
					},
				}},
			},
		},
	},
	want: &Revision{
		Spec: RevisionSpec{
			TimeoutSeconds:       ptr.Int64(config.DefaultRevisionTimeoutSeconds),
			ContainerConcurrency: ptr.Int64(config.DefaultContainerConcurrency),
			PodSpec: corev1.PodSpec{
				Containers: []corev1.Container{{
					Name:      config.DefaultUserContainerName + "-0",
					Resources: defaultResources,
					ReadinessProbe: &corev1.Probe{
						SuccessThreshold: 1,
						ProbeHandler: corev1.ProbeHandler{
							Exec: &corev1.ExecAction{
								Command: []string{"echo", "hi"},
							},
						},
					},
				}, {
					Name:      config.DefaultUserContainerName + "-1",
					Resources: defaultResources,
					LivenessProbe: &corev1.Probe{
						SuccessThreshold: 0, // Not touched, only for readiness probes
						PeriodSeconds:    10,
						ProbeHandler: corev1.ProbeHandler{
							TCPSocket: &corev1.TCPSocketAction{
								Host: "127.0.0.2",
							},
						},
					},
				}},
			},
		},
	},
}
```

Also, should we care about proper defaulting to avoid the issues reported in #13204?
I'd also argue to tackle this in a separate issue/PR; I'd rather not change existing behaviour in this one.
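The distinction the test case above highlights can be sketched like this (a simplified, hypothetical stand-in for the actual defaulting code, not Knative's implementation): SuccessThreshold is filled in only for readiness probes, while liveness probes are left untouched.

```go
package main

import "fmt"

// Probe is a minimal, hypothetical stand-in for corev1.Probe.
type Probe struct {
	SuccessThreshold int32
}

// defaultReadinessProbe mirrors the behaviour discussed above: a readiness
// probe gets SuccessThreshold defaulted to 1 when unset. No such function
// is run for liveness probes.
func defaultReadinessProbe(p *Probe) {
	if p != nil && p.SuccessThreshold == 0 {
		p.SuccessThreshold = 1
	}
}

func main() {
	readiness := &Probe{}
	liveness := &Probe{} // deliberately left alone: no defaulting for liveness

	defaultReadinessProbe(readiness)

	fmt.Println(readiness.SuccessThreshold) // 1
	fmt.Println(liveness.SuccessThreshold)  // 0
}
```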
```go
	t.Fatalf("Failed to create initial Service: %v: %v", names.Service, err)
}

url := resources.Route.Status.URL.URL()
```
I think you can use something similar to:

```go
// Validate State after Creation
if err := validateControlPlane(t, clients, names, "1" /*1 is the expected generation value*/); err != nil {
	t.Error(err)
}
if err := validateDataPlane(t, clients, names, test.MultiContainerResponse); err != nil {
	t.Error(err)
}
```

For example, you can get the URL from names.URL.
This test is basically this with multiple probes.
I don't think I get this comment. Can you elaborate a bit more?
The code there testing the ksvc response is already provided elsewhere. You don't need to call pkgTest.CheckEndpointState explicitly; check the functions above. For example, you can get the URL from names.URL, so there is no need for resources.Route.Status.URL.URL().
```go
	v1test "knative.dev/serving/test/v1"
)

func TestMultiContainerReadiness(t *testing.T) {
```
Do we want to add more scenarios, like triggering a container's readiness failure (a sidecar failure), assuming we have scale 1, checking the service status, and then getting it back to a healthy status?
Yeah, we can, but we should add the same without the feature enabled. Our current e2e testing of probes in general is lacking, so we should tackle this in a separate issue, as it exceeds the scope of this change. (We also need to extend the readiness web-server with endpoints for this and deploy a proxy to do it for non-containerPort ports, like what I have on my GH.)
--> #14869
@dprotaso gentle ping.
Thanks @ReToCode! Looks good to me on first sight; I'm wondering what still needs to be done to get it merged? 🤔
Small question and non-blocking nits.
Also note multi-container error reporting is broken; see this related PR/issue: #14744
```go
		},
	},
},
want: apis.ErrDisallowedFields("livenessProbe", "readinessProbe", "readinessProbe.failureThreshold", "readinessProbe.periodSeconds", "readinessProbe.successThreshold", "readinessProbe.timeoutSeconds"),
```
Any reason why this list isn't just livenessProbe and readinessProbe? The extra fields in the error seem noisy.
Not sure, that error handling existed before: https://github.com/knative/serving/pull/14853/files#diff-874e991d6178734f4fec8bb5d3823e1e1582d313e761c21c390a78c2aa91477cL507.
```diff
@@ -132,9 +132,10 @@ func (rs *RevisionSpec) applyDefault(ctx context.Context, container *corev1.Cont
 	// If there are multiple containers then default probes will be applied to the container where user specified PORT
 	// default probes will not be applied for non serving containers
 	if len(rs.PodSpec.Containers) == 1 || len(container.Ports) != 0 {
-		rs.applyProbesWithDefaults(container)
+		rs.applyGRPCProbeDefaults(container)
+		rs.applyUserContainerDefaultReadinessProbe(container)
```
Can we call this ServingContainer?
@dprotaso I started renaming to
I think it might be better to stick to that, or rename all of these things in a separate PR.
Sounds good, let's do it in a follow-up. /lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: dprotaso, ReToCode. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files.
Approvers can indicate their approval by writing
* Add multi-container probing * Add e2e test for multi container probing (cherry picked from commit a194cb2)
* Add multi-container probing (knative#14853) * Add multi-container probing * Add e2e test for multi container probing (cherry picked from commit a194cb2) * Run generate-release.sh
Fixes #8949

Proposed Changes

* Add a new feature flag multi-container-probing to enable the new behaviour
* Extend SERVING_READINESS_PROBE to also allow containing a slice of readiness probes
* Extend the Probe implementation to support multiple checks

Release Note

Documentation

knative/docs#5855