Use user-defined readinessProbe in queue-proxy #4731
Conversation
So there's good news and bad news. 👍 The good news is that everyone who needs to sign a CLA (the pull request submitter and all commit authors) has done so. Everything is all good there. 😕 The bad news is that it appears that one or more commits were authored or co-authored by someone other than the pull request submitter. We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that here in the pull request. Note to project maintainer: This is a terminal state, meaning the CLA status will not change from this state. ℹ️ Googlers: Go here for more info.
@joshrider: 0 warnings.
In response to this:
Signed-off-by: Shash Reddy <shashwathireddy@gmail.com>
Co-authored-by: Shash Reddy <shashwathireddy@gmail.com>
Fixes #4014
Proposed Changes
- add default readiness probe to revision spec when user does not specify one
- remove HTTP and TCP readiness probes from the user-container when creating deployments, and instead translate them into a probe performed by the queue-proxy against the user-container (see the sketch after this list)
- when user specifies an Exec readiness probe, it will stay on the user-container and the queue-proxy will perform a TCP probe against the user-container to ensure a path is open
- have the handler used by the activator (to check that the pod is ready) use the same readiness criteria defined by the user
NOTE: for the activator's probe, we are using the same count of "successful probes" as the pod's usual readiness probe. That is, if the activator and "kubelet" are both probing concurrently and the probe's SuccessThreshold is 4, they will only need 4 consecutive successes collectively (as opposed to 4 each). Please poke holes in this.
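To make the translation above concrete, here is a minimal sketch. The helper name `applyReadinessProbe`, the `SERVING_READINESS_PROBE` env var, and the port are illustrative assumptions, not the PR's exact code: the user's HTTP/TCP readiness probe is moved off the user-container and handed to the queue-proxy, serialized as an environment variable, so the queue-proxy can perform it against the user-container.

```go
package deployment

import (
	"encoding/json"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// applyReadinessProbe is a hypothetical sketch of the probe rewrite described
// in the Proposed Changes; names and the port are illustrative only.
func applyReadinessProbe(userContainer, queueProxy *corev1.Container) error {
	probe := userContainer.ReadinessProbe
	if probe == nil {
		// Default readiness probe when the user does not specify one:
		// a TCP check against the user-container's port.
		probe = &corev1.Probe{}
		probe.TCPSocket = &corev1.TCPSocketAction{Port: intstr.FromInt(8080)}
	}
	if probe.Exec != nil {
		// Exec probes stay on the user-container; the queue-proxy only
		// performs a TCP check to make sure a path to the container is open.
		fallback := &corev1.Probe{}
		fallback.TCPSocket = &corev1.TCPSocketAction{Port: intstr.FromInt(8080)}
		probe = fallback
	} else {
		// HTTP and TCP probes are removed from the user-container and
		// performed by the queue-proxy instead.
		userContainer.ReadinessProbe = nil
	}
	encoded, err := json.Marshal(probe)
	if err != nil {
		return fmt.Errorf("failed to serialize readiness probe: %v", err)
	}
	queueProxy.Env = append(queueProxy.Env, corev1.EnvVar{
		Name:  "SERVING_READINESS_PROBE",
		Value: string(encoded),
	})
	return nil
}
```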
Release Note
HTTP and TCP readinessProbes are performed by the queue-proxy against the user-container
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/test pull-knative-serving-integration-tests
The following is the coverage report on pkg/.
/test pull-knative-serving-integration-tests
A few flyby comments. I have a really hard time keeping track of what calls what, which probes go where and which retries are applied at which spots.
Do you mind drawing a picture of where we want to apply which retry? The nested retrying feels a little odd to me; maybe there's room for an interim change there as well, since this PR is pretty big.
Thanks for doing this though, this is great stuff 🙂
if probeUserContainer() {
// Respond with the name of the component handling the request.
w.Write([]byte(queue.Name))
if prober != nil {
Maybe in a separate PR: is there a reason why we don't return the state from healthState here? Seems unnecessarily redundant to probe on this path 🤔
@greghaynes do you need that for your "direct to ip" work?
That seems like an excellent suggestion. 👍
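For illustration, the suggested simplification might look roughly like the following. The `healthState` interface, its `IsHealthy` method, and the handler name are assumptions based on the comment, not the actual queue-proxy code:

```go
package queueprobe

import "net/http"

// queueName stands in for queue.Name from the diff above.
const queueName = "queue-proxy"

// healthState is a hypothetical view of the state the comment refers to.
type healthState interface {
	IsHealthy() bool
}

// probeHandler responds from the cached health state instead of re-probing
// the user-container on every request.
func probeHandler(hs healthState) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if hs.IsHealthy() {
			// Respond with the name of the component handling the request.
			w.Write([]byte(queueName))
			return
		}
		w.WriteHeader(http.StatusServiceUnavailable)
	}
}
```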
/test pull-knative-serving-smoke-tests
Signed-off-by: Shash Reddy <shashwathireddy@gmail.com> Co-authored-by: Shash Reddy <shashwathireddy@gmail.com>
Co-authored-by: Shash Reddy <shashwathireddy@gmail.com>
Co-authored-by: Shash Reddy <shashwathireddy@gmail.com>
- merge logic for knative probes and user defined probes - use probe-period as argument name - pass probe as environment variable instead of container args Signed-off-by: Shash Reddy <shashwathireddy@gmail.com>
- Use context for timeout - do not override exec probe - simplify the logic for errors when multiple probes are mentioned Signed-off-by: Shash Reddy <shashwathireddy@gmail.com>
@mattmoor: Addressed all your comments. Ready for another review 👍
// started as early as possible while still wanting to give the container some breathing
// room to get up and running.
timeoutErr := wait.PollImmediate(25*time.Millisecond, timeout, func() (bool, error) {
timeoutErr := wait.PollImmediateUntil(aggressivePollInterval, func() (bool, error) {
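For context on the two forms in the diff above, here is a rough sketch of how the two wait helpers from k8s.io/apimachinery differ; the interval, timeout handling, and function name are illustrative, not the PR's actual constants:

```go
package queueprobe

import (
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

// waitForReady polls at an aggressive interval so readiness is noticed as
// early as possible, while still bounding the overall wait.
func waitForReady(probe func() bool, timeout time.Duration) error {
	// Previous form: the timeout is an explicit argument.
	//   wait.PollImmediate(25*time.Millisecond, timeout, cond)
	//
	// New form: the wait is bounded by a stop channel instead; here we
	// close it after the same timeout.
	stopCh := make(chan struct{})
	timer := time.AfterFunc(timeout, func() { close(stopCh) })
	defer timer.Stop()

	return wait.PollImmediateUntil(25*time.Millisecond, func() (bool, error) {
		return probe(), nil
	}, stopCh)
}
```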
Though I know Matt suggested it, I liked the previous version more; it's shorter :)
🤷♀
What'd I suggest?
Co-authored-by: Shash Reddy <shashwathireddy@gmail.com>
Co-authored-by: Shash Reddy <shashwathireddy@gmail.com>
I think things largely look good. Going to give others a chance to leave comments, but if nothing comes up I'll do a final pass later so we can get this baking. It may be worth checking out the data race failure above, since this PR touches the queue logic. Thanks for all the work leading up to this!
Sounds good. Neither of us has been able to recreate that data race locally. Would be curious to hear if someone else knows how it happened. /test pull-knative-serving-unit-tests
Signed-off-by: Shash Reddy <shashwathireddy@gmail.com>
/lgtm
/approve
🎉
A Googler has manually verified that the CLAs look good. (Googler, please make sure the reason for overriding the CLA status is clearly documented in these comments.) ℹ️ Googlers: Go here for more info.
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: joshrider, mattmoor. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
This patch makes a tiny fix which removes invalid setting in configuration example. After #4731, `periodSeconds` needs to be set with `failureThreshold` and `timeoutSeconds`. This patch simply removes `periodSeconds` from the config.
Since knative/serving#4731, periodSeconds also requires failureThreshold and timeoutSeconds to be set.
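A hedged example of a probe that satisfies the constraint mentioned above; the field values, path, and port are illustrative, not taken from the referenced configs:

```go
package queueprobe

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// exampleReadinessProbe sets periodSeconds together with failureThreshold and
// timeoutSeconds, as required after this change.
func exampleReadinessProbe() *corev1.Probe {
	p := &corev1.Probe{
		PeriodSeconds:    1,
		FailureThreshold: 3,
		TimeoutSeconds:   1,
	}
	p.HTTPGet = &corev1.HTTPGetAction{
		Path: "/healthz",
		Port: intstr.FromInt(8080),
	}
	return p
}
```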