Use user-defined readinessProbe in queue-proxy #4731

joshrider · 2019-07-12T14:33:45Z

Signed-off-by: Shash Reddy shashwathireddy@gmail.com
Co-authored-by: Shash Reddy shashwathireddy@gmail.com

Proposed Changes

- Add a default readiness probe to the revision spec when the user does not specify one.
- Remove HTTP and TCP readiness probes from the user-container when creating Deployments; instead, translate them into a probe that the queue-proxy performs against the user-container.
- When the user specifies an Exec readiness probe, it stays on the user-container, and the queue-proxy performs a TCP probe against the user-container to ensure a path is open.
- Have the handler the activator uses to check that the pod is ready apply the same readiness criteria the user defined.
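The probe placement described above can be sketched in Go as follows. This is a minimal sketch, not the PR's code: `Probe`, `HTTPGetAction`, `TCPSocketAction` and `ExecAction` are simplified, hypothetical stand-ins for the Kubernetes `corev1` types, `splitReadinessProbe` and `defaultProbe` are hypothetical names, and the default probe (a TCP check on the user port) is an assumption, since the description does not say what the default looks like. It also assumes validation has already ensured that exactly one handler is set.

```go
package main

// Hypothetical, simplified stand-ins for the Kubernetes probe types.
type HTTPGetAction struct {
	Path string
	Port int
}

type TCPSocketAction struct {
	Port int
}

type ExecAction struct {
	Command []string
}

type Probe struct {
	HTTPGet          *HTTPGetAction
	TCPSocket        *TCPSocketAction
	Exec             *ExecAction
	PeriodSeconds    int
	SuccessThreshold int
}

// defaultProbe is an ASSUMED default: a TCP check against the user port.
func defaultProbe(userPort int) *Probe {
	return &Probe{TCPSocket: &TCPSocketAction{Port: userPort}, SuccessThreshold: 1}
}

// splitReadinessProbe decides which probe stays on the user container and
// which probe the queue-proxy runs against the user container.
func splitReadinessProbe(user *Probe, userPort int) (onUserContainer, forQueueProxy *Probe) {
	if user == nil {
		user = defaultProbe(userPort)
	}
	if user.Exec != nil {
		// Exec probes have to run inside the user container, so they stay
		// there; the queue-proxy still checks that the port accepts connections.
		qp := *user
		qp.Exec = nil
		qp.TCPSocket = &TCPSocketAction{Port: userPort}
		return user, &qp
	}
	// HTTPGet and TCPSocket probes move entirely to the queue-proxy.
	qp := *user
	return nil, &qp
}
```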

NOTE: for the activator's probe, we are using the same count of "successful probes" as the pod's usual readiness probe. That is, if the activator and "kubelet" are both probing concurrently and the probe's SuccessThreshold is 4, they will only need 4 consecutive successes collectively (as opposed to 4 each). Please poke holes in this.
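To make the NOTE concrete, here is a minimal sketch of a single consecutive-success counter shared by every prober. The `sharedReadiness` type and its `record` method are hypothetical names; the point is only that kubelet-driven and activator-driven probes feed the same count, so with `SuccessThreshold` 4 any mix of four consecutive successes marks the pod ready, and any failure resets the count.

```go
package main

import "sync"

// sharedReadiness is a hypothetical counter of consecutive successful probes,
// shared by all callers (kubelet and activator alike).
type sharedReadiness struct {
	mu               sync.Mutex
	successThreshold int
	consecutive      int
}

// record registers one probe result from any caller and reports whether the
// pod should now be considered ready.
func (s *sharedReadiness) record(ok bool) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if !ok {
		// Any failure, from any caller, resets the shared streak.
		s.consecutive = 0
		return false
	}
	s.consecutive++
	return s.consecutive >= s.successThreshold
}
```

With one shared counter, a burst of activator probes can reach the threshold sooner than the kubelet's periodic probes alone would, which is the trade-off the NOTE asks reviewers to examine.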

Release Note

HTTP and TCP readinessProbes are performed by the queue-proxy against the user-container

googlebot · 2019-07-12T14:33:47Z

So there's good news and bad news.

👍 The good news is that everyone that needs to sign a CLA (the pull request submitter and all commit authors) have done so. Everything is all good there.

😕 The bad news is that it appears that one or more commits were authored or co-authored by someone other than the pull request submitter. We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that here in the pull request.

Note to project maintainer: This is a terminal state, meaning the cla/google commit status will not change from this state. It's up to you to confirm consent of all the commit author(s), set the cla label to yes (if enabled on your project), and then merge this pull request when appropriate.

ℹ️ Googlers: Go here for more info.

knative-prow-robot

@joshrider: 0 warnings.

In response to this:

Signed-off-by: Shash Reddy shashwathireddy@gmail.com
Co-authored-by: Shash Reddy shashwathireddy@gmail.com

Fixes #4014

Proposed Changes

add default readiness probe to revision spec when user does not specify one

remove HTTP and TCP readiness probes from user-container when creating deployments, instead translate them into probe performed by queue-proxy against user-container

when user specifies an Exec readiness probe, it will stay on the user-container and the queue-proxy will perform a TCP probe against the user-container to ensure a path is open

have the handler used by the activator (to check that the pod is ready) use the same readiness criteria defined by the user

NOTE: for the activator's probe, we are using the same count of "successful probes" as the pod's usual readiness probe. That is, if the activator and "kubelet" are both probing concurrently and the probe's SuccessThreshold is 4, they will only need 4 consecutive successes collectively (as opposed to 4 each). Please poke holes in this.

Release Note
HTTP and TCP readinessProbes are performed by the queue-proxy against the user-container

joshrider · 2019-07-12T15:31:56Z

/test pull-knative-serving-integration-tests

knative-metrics-robot · 2019-07-12T17:42:51Z

The following is the coverage report on pkg/.
Say /test pull-knative-serving-go-coverage to re-run this coverage report

| File | Old coverage | New coverage | Delta |
|---|---|---|---|
| pkg/apis/serving/k8s_validation.go | 98.9% | 98.6% | -0.3 |
| pkg/apis/serving/v1beta1/revision_defaults.go | 87.5% | 89.5% | 2.0 |

shashwathi · 2019-07-13T15:36:37Z

/test pull-knative-serving-integration-tests

markusthoemmes

A few flyby comments. I have a really hard time keeping track of what calls what, which probes go where and which retries are applied at which spots.

Do you mind drawing a picture of where we want to apply which retry? The nested retrying feels a little odd to me; maybe there's room for an interim change there as well, as this PR is pretty big.

Thanks for doing this though, this is great stuff 🙂

markusthoemmes · 2019-07-15T12:20:25Z

cmd/queue/main.go

```diff
-			if probeUserContainer() {
-				// Respond with the name of the component handling the request.
-				w.Write([]byte(queue.Name))
+			if prober != nil {
```


Maybe in a separate PR: Is there a reason why we don't return the state from healthState here? Seems unnecessarily redundant to probe on this path 🤔

@greghaynes do you need that for your "direct to ip" work?

That seems like an excellent suggestion. 👍

joshrider · 2019-07-15T13:30:52Z

/test pull-knative-serving-smoke-tests

shashwathi · 2019-07-16T05:30:26Z

@mattmoor: Addressed all your comments. Ready for another review 👍

vagababov · 2019-07-16T05:38:08Z

cmd/queue/main.go

```diff
-	// started as early as possible while still wanting to give the container some breathing
-	// room to get up and running.
-	timeoutErr := wait.PollImmediate(25*time.Millisecond, timeout, func() (bool, error) {
+	timeoutErr := wait.PollImmediateUntil(aggressivePollInterval, func() (bool, error) {
```


Though I know Matt suggested it, I liked the previous version more; it's shorter :)
🤷‍♀

What'd I suggest?

mattmoor · 2019-07-16T14:38:30Z

I think things largely look good. Going to give others a chance to leave comments, but if nothing comes up I'll do a final pass later so we can get this baking. It may be worth checking out the data race failure above, since this PR touches the queue logic. Thanks for all the work leading up to this!

joshrider · 2019-07-16T15:02:26Z

Sounds good. Neither of us has been able to recreate that data race locally. We'd be curious to hear if someone else knows how it happened.

/test pull-knative-serving-unit-tests

mattmoor

/lgtm
/approve
🎉

googlebot · 2019-07-16T19:18:57Z

A Googler has manually verified that the CLAs look good.

(Googler, please make sure the reason for overriding the CLA status is clearly documented in these comments.)

mattmoor · 2019-07-16T19:50:57Z

/approve

knative-prow-robot · 2019-07-16T19:51:08Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: joshrider, mattmoor

Needs approval from an approver in each of these files:

~~cmd/queue/OWNERS~~ [mattmoor]
~~pkg/apis/OWNERS~~ [mattmoor]
~~pkg/queue/OWNERS~~ [mattmoor]
~~pkg/reconciler/OWNERS~~ [mattmoor]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

This patch makes a tiny fix that removes an invalid setting from a configuration example. After knative#4731, `periodSeconds` must be set together with `failureThreshold` and `timeoutSeconds`. This patch simply removes `periodSeconds` from the config.
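The constraint stated here (after #4731, `periodSeconds` also requires `failureThreshold` and `timeoutSeconds`) can be expressed as a small validation function. This is a hedged sketch of the stated rule only, not Knative's `k8s_validation.go`: `readinessProbe` is a hypothetical trimmed-down spec, `validatePeriod` is an illustrative name, and treating a zero `periodSeconds` as "unset" is an assumption.

```go
package main

import "fmt"

// readinessProbe is a hypothetical spec holding only the fields the rule mentions.
type readinessProbe struct {
	PeriodSeconds    int32
	TimeoutSeconds   int32
	FailureThreshold int32
}

// validatePeriod rejects a probe that sets periodSeconds without also setting
// timeoutSeconds and failureThreshold.
func validatePeriod(p readinessProbe) error {
	if p.PeriodSeconds == 0 {
		// ASSUMPTION: zero means periodSeconds is unset; nothing else is checked here.
		return nil
	}
	if p.TimeoutSeconds < 1 {
		return fmt.Errorf("timeoutSeconds must be >= 1 when periodSeconds is set, got %d", p.TimeoutSeconds)
	}
	if p.FailureThreshold < 1 {
		return fmt.Errorf("failureThreshold must be >= 1 when periodSeconds is set, got %d", p.FailureThreshold)
	}
	return nil
}
```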

googlebot added the cla: no label Jul 12, 2019

knative-prow-robot added the size/XXL label (1000+ changed lines, ignoring generated files) Jul 12, 2019

knative-prow-robot requested review from markusthoemmes and mdemirhan July 12, 2019 14:33

knative-prow-robot reviewed Jul 12, 2019

knative-prow-robot added area/API API objects and controllers area/networking labels Jul 12, 2019

joshrider changed the title ~~use user-defined readinessprobe in queue-proxy~~ Use user-defined readinessProbe in queue-proxy Jul 12, 2019

vagababov reviewed Jul 12, 2019

pkg/reconciler/revision/resources/queue.go (outdated, resolved)

taragu reviewed Jul 12, 2019

pkg/apis/serving/k8s_validation.go (outdated, resolved)

joshrider force-pushed the queue-probe branch from e75388c to 41e095d July 12, 2019 20:41

shashwathi force-pushed the queue-probe branch from 41e095d to 08cd78e July 12, 2019 21:19

joshrider force-pushed the queue-probe branch from 08cd78e to 24a1e5c July 12, 2019 21:25

markusthoemmes reviewed Jul 15, 2019

joshrider force-pushed the queue-probe branch from 24a1e5c to dbe0ec8 July 15, 2019 13:50

mattmoor assigned mattmoor and dgerd Jul 15, 2019

shashwathi force-pushed the queue-probe branch from c141077 to a37f22b July 15, 2019 23:54

mattmoor reviewed Jul 16, 2019

cmd/queue/main.go (outdated, resolved)

pkg/apis/serving/k8s_validation.go (outdated, resolved)

pkg/reconciler/revision/resources/queue.go (outdated, resolved)

joshrider and others added 6 commits July 15, 2019 21:31

use user-defined readinessprobe in queue-proxy

5f8ac75

Signed-off-by: Shash Reddy <shashwathireddy@gmail.com> Co-authored-by: Shash Reddy <shashwathireddy@gmail.com>

validate probe handler count

b51d131

Co-authored-by: Shash Reddy <shashwathireddy@gmail.com>

move guard to switch

d7ab18d

Co-authored-by: Shash Reddy <shashwathireddy@gmail.com>

remove unreachable branch

08b12d9

Address comments

3f9dbe2

- merge logic for knative probes and user defined probes
- use probe-period as argument name
- pass probe as environment variable instead of container args

Signed-off-by: Shash Reddy <shashwathireddy@gmail.com>

Address comments

7ab8681

- Use context for timeout
- do not override exec probe
- simplify the logic for errors when multiple probes are mentioned

Signed-off-by: Shash Reddy <shashwathireddy@gmail.com>

shashwathi force-pushed the queue-probe branch from 468c0cd to 7ab8681 July 16, 2019 05:25

vagababov reviewed Jul 16, 2019

joshrider and others added 2 commits July 16, 2019 09:48

use url.Parse in queue test

dc9663d

Co-authored-by: Shash Reddy <shashwathireddy@gmail.com>

use 'probe-period' as flag in queue binary

63295fe

Co-authored-by: Shash Reddy <shashwathireddy@gmail.com>

Add comment for using pollImmediateUntil instead of pollImmediate

8eaf076

Signed-off-by: Shash Reddy <shashwathireddy@gmail.com>

mattmoor reviewed Jul 16, 2019

knative-prow-robot added lgtm and approved labels Jul 16, 2019

mattmoor added cla: yes and removed approved and cla: no labels Jul 16, 2019

knative-prow-robot added the approved label Jul 16, 2019

knative-prow-robot merged commit f53271a into knative:master Jul 16, 2019

nak3 mentioned this pull request Jul 17, 2019

Fix invalid helloworld example #4780

Merged

joshrider deleted the queue-probe branch August 6, 2019 15:17

joshrider mentioned this pull request Aug 6, 2019

Bubble up probe serialisation error #5074

Merged

joshrider mentioned this pull request Sep 30, 2019

Translate to tcp probe in queue-proxy when exec probe was used for readinessprobe in user-container #5712

Merged

antoineco mentioned this pull request May 27, 2020

Remove periodSeconds from examples knative/docs#2515

Merged

knative-prow-robot pushed a commit to knative/docs that referenced this pull request Jun 5, 2020

Remove periodSeconds from examples (#2515)

c70670e

Since knative/serving#4731, periodSeconds also requires failureThreshold and timeoutSeconds to be set.
