Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add startup probe support to Knative Service #15309

Merged
merged 4 commits into from
Jul 8, 2024

Conversation

ReToCode
Copy link
Member

@ReToCode ReToCode commented Jun 6, 2024

Fixes #10037

Context

see feature document: https://docs.google.com/document/d/1TmimPy7qNLtc5IHVFKEme8X-NiIUBtVgR44GlaTqoWs/edit?usp=sharing

Proposed Changes

  • Adds startup probe support to Knative Service
  • Startup Probes disable the Knative's probe optimisation, same as exec probes do, as they are executed on the user-container by the Kubelet

Release Note

Knative Service now supports setting startup probes in the spec. Please note that this increases the cold-start time of your service (more info in docs).

@knative-prow knative-prow bot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Jun 6, 2024
@knative-prow knative-prow bot requested review from dprotaso and skonto June 6, 2024 12:04
@ReToCode
Copy link
Member Author

ReToCode commented Jun 6, 2024

/assign @skonto
/assign @dprotaso
/assign @izabelacg

More info in the links above.

Copy link

codecov bot commented Jun 6, 2024

Codecov Report

Attention: Patch coverage is 33.33333% with 2 lines in your changes missing coverage. Please review.

Project coverage is 84.61%. Comparing base (a310476) to head (be536f3).
Report is 1 commits behind head on main.

Files Patch % Lines
pkg/activator/net/revision_backends.go 0.00% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #15309      +/-   ##
==========================================
+ Coverage   84.56%   84.61%   +0.05%     
==========================================
  Files         219      219              
  Lines       13584    13587       +3     
==========================================
+ Hits        11487    11497      +10     
+ Misses       1727     1724       -3     
+ Partials      370      366       -4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@knative-prow-robot knative-prow-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 6, 2024
@knative-prow-robot knative-prow-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 7, 2024
@skonto
Copy link
Contributor

skonto commented Jun 19, 2024

@dprotaso gentle ping.

Comment on lines 373 to 390
startupProbeMaxDuration := int32(0)
for _, container := range podSpec.Containers {
if container.StartupProbe != nil {
maxSuccessDuration := container.StartupProbe.PeriodSeconds *
container.StartupProbe.SuccessThreshold *
container.StartupProbe.TimeoutSeconds

maxFailDuration := container.StartupProbe.PeriodSeconds *
container.StartupProbe.FailureThreshold *
container.StartupProbe.TimeoutSeconds

maxDuration := max(maxSuccessDuration, maxFailDuration)
if maxDuration > startupProbeMaxDuration {
startupProbeMaxDuration = container.StartupProbe.InitialDelaySeconds + maxDuration
}
}
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorta inclined to think that the progressdeadline encapsulates all the (startup time + ready time).

So I'm thinking of not adding this extra time here just because there's a startup probe.

Copy link
Contributor

@skonto skonto Jun 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably we can omit that and keep the K8s UX. It seems that the only validation that happens at the K8s side for progressdeadline is about MinReadySeconds, https://github.com/kubernetes/kubernetes/blob/master/pkg/apis/apps/validation/validation.go#L592. I am wondering how that field affects our stuff in case would want to set it (right now it is not exposed). 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm I think we can, just means a user has to be aware of it and set that field himself accordingly. As startup probes are potentially used for slow-starting things the increase is quite necessary. Otherwise we will never see the pod starting up. Im fine with both ways.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dprotaso @skonto I removed the increase. I'll add it to the docs. PTAL.

@knative-prow knative-prow bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jul 8, 2024
@dprotaso
Copy link
Member

dprotaso commented Jul 8, 2024

/lgtm
/approve

@knative-prow knative-prow bot added the lgtm Indicates that a PR is ready to be merged. label Jul 8, 2024
Copy link

knative-prow bot commented Jul 8, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dprotaso, ReToCode

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@knative-prow knative-prow bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 8, 2024
@dprotaso
Copy link
Member

dprotaso commented Jul 8, 2024

A follow question I have is - what happens when a user container restarts and it has a startup probe?

@knative-prow knative-prow bot merged commit ac7f585 into knative:main Jul 8, 2024
67 checks passed
@ReToCode
Copy link
Member Author

ReToCode commented Jul 9, 2024

A follow question I have is - what happens when a user container restarts and it has a startup probe?

I think K8s runs the startup probe on every container creating (including a restarting one): kubernetes/kubernetes#102230

ReToCode added a commit to ReToCode/serving that referenced this pull request Jul 9, 2024
* Add startup probe to CRDs

* Allow startup probes in Knative Services

* Increase ProgressDeadlineSeconds when startup probe is present

* Remove ProgressDeadlineSeconds increase when startup probe is present

(cherry picked from commit ac7f585)
openshift-merge-bot bot pushed a commit to openshift-knative/serving that referenced this pull request Jul 9, 2024
* Add startup probe support to Knative Service (knative#15309)

* Add startup probe to CRDs

* Allow startup probes in Knative Services

* Increase ProgressDeadlineSeconds when startup probe is present

* Remove ProgressDeadlineSeconds increase when startup probe is present

(cherry picked from commit ac7f585)

* Run make generated files
@dprotaso
Copy link
Member

dprotaso commented Jul 9, 2024

A follow question I have is - what happens when a user container restarts and it has a startup probe?

I think K8s runs the startup probe on every container creating (including a restarting one): kubernetes/kubernetes#102230

I guess to confirm does Knative behave correctly with your changes?

@ReToCode
Copy link
Member Author

ReToCode commented Jul 10, 2024

I guess to confirm does Knative behave correctly with your changes?

I think it looks good: https://github.com/ReToCode/knative-multicontainer-probing/blob/main/TESTING_STARTUP_PROBES.md#testing-startup-probe-with-liveness-failures

# Logs of the user container
user-container Liveness probe called, responding with:  true
user-container Liveness is now:  false
user-container Liveness probe called, responding with:  false
user-container Liveness probe called, responding with:  false
user-container Liveness probe called, responding with:  false
queue-proxy {"severity":"INFO","timestamp":"2024-07-10T07:18:02.386311375Z","logger":"queueproxy","caller":"sharedmain/handlers.go:109","message":"Attached drain handler from user-container&{GET /wait-for-drain HTTP/1.1 1 1 map[Accept:[*/*] Accept-Encoding:[gzip] User-Agent:[kube-lifecycle/1.28]] {} <nil> 0 ] false 10.244.2.47:8022 map] map] <nil> map] 10.244.2.1:57410 /wait-for-drain <nil> <nil> <nil> 0x4000289cc0 0x400013f260 ] map]}","commit":"2156812","knative.dev/key":"default/runtime-00001","knative.dev/pod":"runtime-00001-deployment-5d6cf5bbc9-brpdg"}
Stream closed EOF for default/runtime-00001-deployment-5d6cf5bbc9-brpdg (user-container) # restarted by K8s

# restarted user-container
user-container Starting server. Listening on port:  8080
queue-proxy {"severity":"INFO","timestamp":"2024-07-10T07:11:02.57859614Z","logger":"queueproxy","caller":"sharedmain/main.go:271","message":"Starting queue-proxy","commit":"2156812","knative.dev/key":"default/runtime-00001","knative.dev/pod":"runtime-00001-deployment-5d6cf5bbc9-brpdg"}
queue-proxy {"severity":"INFO","timestamp":"2024-07-10T07:11:02.57867039Z","logger":"queueproxy","caller":"sharedmain/main.go:277","message":"Starting http server metrics:9090","commit":"2156812","knative.dev/key":"default/runtime-00001","knative.dev/pod":"runtime-00001-deployment-5d6cf5bbc9-brpdg"}
queue-proxy {"severity":"INFO","timestamp":"2024-07-10T07:11:02.57870039Z","logger":"queueproxy","caller":"sharedmain/main.go:277","message":"Starting http server admin:8022","commit":"2156812","knative.dev/key":"default/runtime-00001","knative.dev/pod":"runtime-00001-deployment-5d6cf5bbc9-brpdg"}
queue-proxy {"severity":"INFO","timestamp":"2024-07-10T07:11:02.578705224Z","logger":"queueproxy","caller":"sharedmain/main.go:277","message":"Starting http server main:8012","commit":"2156812","knative.dev/key":"default/runtime-00001","knative.dev/pod":"runtime-00001-deployment-5d6cf5bbc9-brpdg"}
user-container Startup probe called, responding with:  false
user-container Startup probe called, responding with:  false

After the user-container is restarted, the startup probe is executed again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support startupProbe in webhook
5 participants