
Fix invalid helloworld example #4780

Merged (1 commit) on Sep 29, 2019

Conversation

@nak3 (Contributor) commented Jul 17, 2019

This patch makes a tiny fix that removes an invalid setting from the configuration example.

After #4731, periodSeconds needs to be set together with failureThreshold and timeoutSeconds. This patch simply removes periodSeconds from the config.

Current error:

$ ko apply -f test/test_images/helloworld/helloworld.yaml 
...
2019/07/17 17:58:13 Published gcr.io/gcp-compute-engine-223401/helloworld-edca531b677458dd5cb687926757a480@sha256:88311e84da104e258959a2417f067a81009065aad93bab2240bfc79969e94056
route.serving.knative.dev/route-example configured
Error from server (InternalError): error when creating "STDIN": Internal error occurred: admission webhook "webhook.serving.knative.dev" denied the request: mutation failed: expected 1 <= 0 <= 2147483647: spec.template.spec.containers[0].readinessProbe.failureThreshold, spec.template.spec.containers[0].readinessProbe.timeoutSeconds
2019/07/17 17:58:14 error executing 'kubectl apply': exit status 1

Proposed Changes

  • Remove periodSeconds from the config, as sketched below.
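
For illustration only (the exact probe in helloworld.yaml is not shown in this thread, so the "before" values are assumptions; the removal of periodSeconds is the actual fix), the shape of the change is roughly:

    # Before: periodSeconds is set, but failureThreshold and timeoutSeconds
    # are left at 0, which the webhook now rejects (see the error above).
    readinessProbe:
      httpGet:
        path: /          # assumed probe type/path for illustration
      periodSeconds: 1   # hypothetical value; its presence is what matters

    # After: with periodSeconds removed, the probe falls back to Knative's
    # aggressive defaults and passes webhook validation.
    readinessProbe:
      httpGet:
        path: /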

Release Note

NONE

@googlebot added the cla: yes label (the PR's author has signed the CLA) on Jul 17, 2019
@knative-prow-robot added the size/XS label (changes 0-9 lines, ignoring generated files) on Jul 17, 2019
@knative-prow-robot (Contributor)

Hi @nak3. Thanks for your PR.

I'm waiting for a knative member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@knative-prow-robot added the needs-ok-to-test and area/test-and-release labels on Jul 17, 2019
@markusthoemmes (Contributor)

/ok-to-test

That sounds like a breaking change though! Should we revisit that logic and make sure we apply defaulting instead of potentially breaking users?

@knative-prow-robot added the ok-to-test label and removed the needs-ok-to-test label on Jul 17, 2019
@joshrider (Contributor)

I think it makes sense to apply k8s's probe defaults for TimeoutSeconds and FailureThreshold in the cases where PeriodSeconds != 0.

@adrcunha requested review from joshrider and removed the review request for adrcunha on Jul 17, 2019
@nak3 (Contributor, Author) commented Jul 17, 2019

Please allow me to confirm: it is not only the PeriodSeconds != 0 case; #4731 also potentially breaks users in other ways.
For example, the following configs no longer work:

            readinessProbe:
              httpGet:
                path: /
              timeoutSeconds: 1

            readinessProbe:
              httpGet:
                path: /
              failureThreshold: 1

            readinessProbe:
              httpGet:
                path: /
              failureThreshold: 1
              timeoutSeconds: 1

(The example below may not need defaulting, but I think there were many users who had port in their probe, and it got broken.)

            readinessProbe:
              httpGet:
                port: 80
                path: /

Should we apply defaulting to cover all of them?

@joshrider (Contributor) commented Jul 17, 2019

In the cases where failureThreshold or timeoutSeconds are greater than 0, but periodSeconds is unset, I could see assuming that the user wants a standard k8s-style probe and setting the normal k8s defaults for whichever of periodSeconds, failureThreshold, and timeoutSeconds are unset.

I guess we'd then reserve our aggressive probe for when periodSeconds, failureThreshold, and timeoutSeconds are all 0.
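
A sketch of how that might look in practice (illustrative only; the filled-in numbers are the standard K8s defaults listed later in this thread, and the mapping itself is the proposal, not current behavior):

    # What the user writes: only failureThreshold is set
    readinessProbe:
      httpGet:
        path: /
      failureThreshold: 1

    # What the proposed dynamic defaulting would produce: treat it as a
    # standard K8s-style probe and fill in the unset fields with the
    # usual K8s defaults
    readinessProbe:
      httpGet:
        path: /
      failureThreshold: 1
      periodSeconds: 10   # K8s default
      timeoutSeconds: 1   # K8s default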

Thoughts? @mattmoor @dgerd @markusthoemmes

Also, did the 3rd example work before? I thought we always disallowed port.

@mattmoor (Member)

I think my concern is that specifying these is a subtle way of degrading your cold start time.

I feel like erroring out and simply requiring a value for periodSeconds seemed reasonable. Happy to discuss more, but that's my $0.02.

@dgerd commented Jul 17, 2019

So K8s defaulting behavior is:

  • periodSeconds: Default 10, Min 1
  • timeoutSeconds: Default 1, Min 1
  • failureThreshold: Default 3, Min 1
  • successThreshold: Default 1, Min 1

In #4731 we allowed 0 for periodSeconds and changed the Knative periodSeconds default to 0. Given that the K8s defaults for timeoutSeconds and failureThreshold are not compatible with our new periodSeconds default, there is no way to get around this change being potentially breaking.

Given the stage our project is at, I would like to make sure that our top goal here is to select behavior that is unsurprising to customers specifying probes. Secondary to that goal, we should minimize impact to users today.

@nak3 (Contributor, Author) commented Jul 22, 2019

> In the cases where failureThreshold or timeoutSeconds are greater than 0, but periodSeconds is unset, I could see assuming that the user wants a standard k8s-style probe and setting the normal k8s defaults for whichever of periodSeconds, failureThreshold, and timeoutSeconds are unset.

I see. I have opened a PR for that here: #4864

> Also, did the 3rd example work before? I thought we always disallowed port.

Sorry, it was my mistake!

@dgerd commented Jul 24, 2019

To facilitate a better discussion today at the API WG, I am adding a few options here to this issue, with an attempt to lay out the pros/cons.

Option 1 - Dynamic defaults

This is the change in #4864. If fields are set that are incompatible with our default probe, then assume the user wants a regular probe and adjust.

Pros:

  • Enables the current behavior of providing the least amount of information necessary (i.e. only specify failureThreshold)
  • Provides most intelligence around interpreting intent
  • Most K8s probes get translated over correctly to Knative

Cons:

  • Changing one setting has a side-effect on other settings (can cascade if we do this behavior elsewhere)
  • More difficult to document (Cannot just document min, max, default)

Option 2 - Bimodal defaults (What we do in HEAD)

If periodSeconds == 0 then we have one set of defaults. If periodSeconds != 0 then we have another set of defaults. This is similar to Option 1, but tries to make fewer assumptions about user intent.
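
For illustration, the two modes might look roughly like this (a sketch: the K8s-style values come from the defaults listed above, while the aggressive-mode internals are Knative-defined and not spelled out in this thread):

    # periodSeconds != 0: K8s-style defaults apply to whatever is unset
    readinessProbe:
      httpGet:
        path: /
      periodSeconds: 10
      timeoutSeconds: 1    # K8s default
      failureThreshold: 3  # K8s default

    # periodSeconds == 0 (the Knative default): aggressive/sub-second
    # probing, with timeoutSeconds and failureThreshold left at 0
    readinessProbe:
      httpGet:
        path: /
      periodSeconds: 0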

Pros:

  • K8s style probes that specify periodSeconds work without breakage
  • Easy to document in a 2 column table

Cons:

  • Requires users to always set periodSeconds to use kubernetes style probes
  • Changing periodSeconds has a side-effect on other settings (can cascade if we do this behavior elsewhere)

Option 3 - Unspecified is special (not defaulted)

This option is to take a step back on our defaulting behavior. Instead of periodSeconds being defaulted to 0 and shown in the API, we interpret an unspecified probe to mean full defaults on readinessProbe. With this, users cannot customize successThreshold or other settings, as otherwise we would slip back into the two cases above. :(

Pros:

  • Keeps API very similar to previous releases

Cons:

  • Less visibility of probing in our API
  • Less customizable probing and undoes some of our work on improving probing here

Option 4 - Introduce new fields

See kubernetes/kubernetes#76951

I would not recommend this unless we can eventually get the behavior into upstream K8s, but we could add periodMilliseconds or similar here instead of extending the interpretation of existing fields.

Pros:

  • Keeps us synced with upstream K8s
  • Can eventually remove our queue-proxy probe

Cons:

  • Uncertainty of reception or feasibility
  • Will have to maintain fork until our client catches up to upstream K8s

@dgerd commented Jul 26, 2019

@joshrider @nak3 @shashwathi @markusthoemmes

Wanted to get your thoughts on taking a step back in functionality and proceeding with the following:

  1. Only an unspecified readinessProbe means "probe aggressively/sub-second" with system-defined defaults (i.e. our spec does not show '0' for periodSeconds, nor does it allow it); see the sketch after this list.
  2. User-defined probes continue to be passed through, as added in #4731 (Use user-defined readinessProbe in queue-proxy).
  3. If/when kubernetes/kubernetes#76951 (Allow probes to run on a more granular timer) lands, switch to using that instead of our own implementation.
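
A sketch of the two cases under this proposal (illustrative only; the image name is a placeholder and the aggressive probing is handled internally by the queue-proxy):

    # Case 1: readinessProbe omitted entirely; Knative probes
    # aggressively/sub-second with its system-defined defaults.
    containers:
      - image: example.com/my-app   # placeholder image

    # Case 2: readinessProbe specified; it is passed through as a
    # regular K8s-style probe, as in #4731.
    containers:
      - image: example.com/my-app   # placeholder image
        readinessProbe:
          httpGet:
            path: /
          periodSeconds: 10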

I believe that takes us a step back in:

  1. Sub-second probes are not "visible" in the API (We could potentially address this in other ways)
  2. successThreshold cannot be customized for sub-second probes. (How important do you all think this is?)

It moves us forward in that:

  1. We break none of the examples that @nak3 posted above
  2. Migration to a K8s defined sub-second probe will be easier

This keeps the value of:

  1. All probes get re-written through the queue-proxy
  2. Exec probe in queue-proxy still allows more granular probing of user container

Anything else missing here?

@joshrider (Contributor)

I think that covers most of it.

One thing I'd add to the "takes us a step back" pile is that we lose the option of using an httpGet instead of tcpSocket in the sub-second probe. I'm not sure how important being able to customise successThreshold on a sub-second probe will be, but maybe having the option to pick between TCP/HTTP is more relevant?

Overall, I don't have much of a lean either way.

With regard to rolling back some of the functionality, I can see value in not breaking anyone's existing stuff. If minimising surprises for users is a top priority, having a user's probe simulate K8S's semantics seems desirable. There didn't seem to be much enthusiasm for kubernetes/kubernetes#76951, so I'm unsure of how much value we get out of smoothing the way for an eventual adoption of a sub-second K8S probe.

On the side of leaving things mostly intact, I can appreciate wanting to trim down startup time wherever possible.

In summation, 🤷‍♂️

@dgerd commented Jul 30, 2019

/cc @mattmoor

@mattmoor (Member)

> Only unspecified readinessProbes means "probe aggressively/sub-second"

This means we can't support aggressively probing with httpGet, which feels problematic.

@markusthoemmes (Contributor)

@dgerd did we get any closure on the decision-making above? It might've fallen through the cracks, or I've just forgotten the resolution.

@markusthoemmes (Contributor)

I vaguely recall something like: well, it's a very small break, and we'd rather take this breaking change than try to work around it at this point? Seems like we should get closure on this pre-v1.

@markusthoemmes (Contributor)

/assign @dgerd
/assign @mattmoor

@markusthoemmes (Contributor)

@dgerd Pingeroo.

@dgerd commented Sep 23, 2019

We decided to take Option #2 with improved error messages.

That said, this change looks good.

/approve
/lgtm

@knative-prow-robot added the lgtm label on Sep 23, 2019
@knative-prow-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dgerd, nak3

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@knative-prow-robot added the approved label on Sep 23, 2019
@knative-test-reporter-robot

The following jobs failed:

Test name: pull-knative-serving-unit-tests
Triggers: pull-knative-serving-unit-tests
Retries: 1/3

Automatically retrying due to test flakiness...
/test pull-knative-serving-unit-tests

@nak3 (Contributor, Author) commented Sep 29, 2019

/test pull-knative-serving-unit-tests
