Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update HTTPRoute and other status definitions and documentation #1077

Closed
youngnick opened this issue Mar 28, 2022 · 20 comments
Closed

Update HTTPRoute and other status definitions and documentation #1077

youngnick opened this issue Mar 28, 2022 · 20 comments
Assignees
Labels
kind/feature Categorizes issue or PR as related to a new feature.
Milestone

Comments

@youngnick
Copy link
Contributor

What would you like to be added:

In implementing HTTPRoute and Gateway, implementors have found things that we've missed or that are not clear in the definitions of how status should be updated.

There are some condition types that need to be added, some edge cases that need to be addressed, and so on.

This issue covers doing a pass over this area before we finalize v1beta1.

This top-level comment will be updated with a checklist of tasks as we find them.

Why this is needed:
Having a clear explanation of the status flow is vital to making sure the user experience for Gateway API is preserved.

@youngnick youngnick added the kind/feature Categorizes issue or PR as related to a new feature. label Mar 28, 2022
@youngnick youngnick added this to the v1beta1 milestone Mar 28, 2022
@youngnick
Copy link
Contributor Author

/assign

@youngnick
Copy link
Contributor Author

Candidates for tasks so far:

  • Add a Ready condition type for HTTPRoute, with explanation of what Ready means. This has conformance test implications.
  • Add a NotImplemented or NotSupported condition type for all resources so that there's another place to hold more information about why something was not accepted, with explanations of what that means.

@youngnick
Copy link
Contributor Author

Need to fold feedback from #935 in here as well.

@evankanderson
Copy link
Contributor

There's also feedback in #1080 (comment) (sorry @nathancoleman for diverting your PR).

I'd also like to specify some update semantics of the Gateway API -- in particular, it would be nice if Gateway API consumers could update from one "working" (not serving 500s) configuration to another without serving 500s in between. IIRC, at least some Envoy-based implementations may update the RouteConfiguration before updating the Cluster configuration, leading to routes which point to not-yet defined clusters. It would be nice if the Gateway API and conformance tests could make it clear that this is not a correct implementation. (We work around this in Knative by creating a dummy route with the next set of Clusters, and probe that route until it has live backends, but it would be better if this could be handled by the Gateway implementation).

cc @dprotaso @carlisia

@youngnick
Copy link
Contributor Author

We discussed this in our community meeting today and agreed that:

  • Many implementations (not just Envoy-based ones) can't guarantee that the configuration requested has made it to the entire data plane, so a strict Ready condition (as @evankanderson reasonably asked for on Support eventual consistency in conformance tests #1080) is not possible for them.
  • Because of this, we can't mandate a Ready condition, because people will bring the expectation of strict behavior because of the name.
  • We're going to work on another condition type (proposed names: Configured, Programmed or similar, there was the usual naming discussion, we'll need to bikeshed discuss it some more. :) ) that indicates that the the controller has finished working and that the data plane should be ready "soon".
  • We also agreed that we should say what a Ready condition implies, and that implementations may use one, but if so, it must mean that the presence of Ready means that all configuration is present on the data plane.

I agree that in principle the updating from one working config to another is very desirable. I think that ensuring that the ordering happens in the correct order is not straightforward, but we should be working towards it. I'm reluctant to mandate a behavior like this without a pretty clear conformance test to check it, which seems like it will be tricky to write.

@evankanderson
Copy link
Contributor

+1 on not mandating any behavior that doesn't have a conformance test.

What we've done in Knative is to set up a "load test" (e.g. 20 rps GET requests to the destination) while updating the routes, and checking that all returned results are 200 and not 500. Obviously, there are many different permutations here (replace existing route, add new more-specific route, update certain routing parameters, update the actual Service definition), but it seems like it should be possible to enumerate some of the known-tricky cases and test those.

@robscott
Copy link
Member

I think #1110, #1111, and #1112 by @mikemorris significantly overlap with this issue. Not sure if there's room for consolidation, just wanted to make sure these were all linked together.

@youngnick
Copy link
Contributor Author

Agreed, I think that @mikemorris' issues handle a large portion of the things I was thinking about when I raised this.

Interested to hear what everyone thinks about the idea of a generic NotImplemented or Unsupported Condition that could be used across any resource to indicate that something you're doing is unsupported?

@evankanderson
Copy link
Contributor

I'm skeptical that a generic Unsupported Condition will always work, assuming that fields may be added to the API that implementations may not be aware of. I suppose implementations could be conservative here and check for any spec fields that they don't recognize, and report them as Unsupported even if they contain a default value that matches the existing behavior.

@robscott
Copy link
Member

@evankanderson I agree that it would be nearly impossible for an implementation to recognize configuration they are not aware of and populate a "Unsupported" or "NotImplemented" condition based on that. I do think there's still value in having a condition like that though. This API includes many "extended" features that will not be implementable by everyone. That means it would be very useful for implementations to be able to communicate that there is configuration they are aware of in a Gateway or Route but have not implemented.

@skriss
Copy link
Contributor

skriss commented May 2, 2022

One specific request for HTTPRoute status additions: Listeners have a possible condition of "Ready: false" with reason "Invalid" to generically cover syntactically/semantically incorrect Listeners. It would be nice to have something similar for routes (could be "Accepted: false" with reason "Invalid"). The specific case I ran into recently for this is if syntactically invalid hostnames make it to the controller (theoretically this should be prevented by kubebuilder/webhook validation, but we can't rely on that) -- there are no obviously appropriate condition types/reasons to use for something like this. I can imagine other situations related to invalid matches/filters/etc. that I would want to use this for as well.

@youngnick youngnick removed their assignment May 2, 2022
@mikemorris
Copy link
Contributor

@skriss I added a few suggested RouteConditionReasons to the "not accepted" section of #1112 - InvalidRouteHostname sounds like it could reasonably cover your use case in addition to the listener hostname mismatch.

@carlisia
Copy link
Contributor

carlisia commented May 10, 2022

I extracted the case for a "Ready" Route condition into a new issue, as requested in a recent meeting:

Indicate a new Route "Readiness" condition.

@youngnick
Copy link
Contributor Author

Moving this out of the v0.5.0 milestone, since we've hit the parts of this issue required for that release.

@youngnick youngnick modified the milestones: v0.5.0, v0.6.0 May 19, 2022
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 17, 2022
@youngnick
Copy link
Contributor Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 18, 2022
@youngnick
Copy link
Contributor Author

Doing this as part of #1364.

/assign

@mikemorris
Copy link
Contributor

Re-read this and it feels like #1364 covers all the points in this sufficiently that this could likely be closed as a separate issue now.

@youngnick
Copy link
Contributor Author

I agree that this one is covered by the other issues, with followup work covered under #1364. Closing.

/close

@k8s-ci-robot
Copy link
Contributor

@youngnick: Closing this issue.

In response to this:

I agree that this one is covered by the other issues, with followup work covered under #1364. Closing.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature Categorizes issue or PR as related to a new feature.
Projects
None yet
Development

No branches or pull requests

8 participants