-
Notifications
You must be signed in to change notification settings - Fork 490
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Route handling of invalid BackendRefs needs clarification #1112
Comments
So, just to chime in, overall the part of the spec that gives me some of the biggest headaches is how route validation is reflected in statuses. You pointed out a number of areas where something about the route status, as specified in the spec, seems to need to be propagated onto the Gateway's ListenerStatus. For example, the current usage for
But as you point out other areas of the spec (i.e. under The notion overall that the ListenerStatus itself should ever reflect the status of underlying routes that it has successfully bound doesn't really make sense to me. Why? Because the Listener itself isn't bad, the Route is, and its status should be updated accordingly. The only time I can see that propagating something to a ListenerStatus based on a Route's binding makes sense is if somehow the underlying behavior of the Listener can't fully be determined and thus the entire thing is in an invalid state. For example, in our current implementation of TCPRoute binding in the Consul API Gateway we consider a TCP-based listener invalid if it has more than a single route bound to it due to our inability to currently load-balance between multiple TCP backends. |
This is great content @mikemorris and @andrewstucki!
I think this is an artifact of the v1alpha1 API where listeners selected Routes. While I think it could potentially be helpful for a Gateway owner to understand that something is attempting to attach to it but can't, it seems like the primary way to communicate that failure in status should be on the route. I like a lot of the suggestion above for addition to status. Historically we have not required GEPs for new reasons being added to status, I don't think we have any precedent for independent changes that introduced new conditions, not sure what to do there. Maybe a PR that introduces the reasons suggested above would be a good starting point, and we could separately discuss the idea of a "Ready" and/or "Programmed" condition? |
This is an outstanding summary, thanks @mikemorris! Reading through this, there are a few places where I have assumptions that aren't written down. I may have spoke about these before and assumed they're in the spec, happy to talk about if they're valid assumptions. I'll go through the questions in the order I read them. "not accepted"In general, I think we've tried to have the default status behavior be that if things aren't a valid reference, the referred object is not updated. So, we don't update the ListenerStatus with the names of Routes that don't attach. However, I agree that I think that, as Rob said, a lot of the weirdness here is us not coming back and updating this after the switch to route-initiated attachment. "not found"I think that having zero backends inside otherwise-valid backendRefs should not mean that the backendRef is not found. The backendRef is there, but it's empty. If we want to reflect that in the Route status, it should come under The assumption I had here, that I think is not documented, is that the rules for backendRefs are:
Note that these two rules also require the "zero endpoints doesn't make things unattached" rule. However, there's some interaction with the backendRefs field docs in that set of rules - should that empty config produce 503s? I'm inclined to say no, personally. Maybe a better way to summarize this is "Routes that generate some config in the underlying data plane must attach, and routes that generate no config in the underlying data plane must not."? That would cover the 503 issue, because it's talking about routes that have at least some config. If we agree here, this absolutely needs better documentation, since it's plainly not present now. "not permitted"I agree that this should behave the same as the "not found" case in respect to partial or no validity. This does mean that a Route that loses its ReferencePolicy will flip to not attached, but that seems desirable to me. "invalid" and "not ready yet"I agree with all of this. I think the next step here is for everyone to read what I think should happen and see if we agree. If so, we should move to a PR. |
👍 , this makes sense to me too. I added a few specific TODO items to the "not found" section for new RouteConditionReasons to help communicate these rejection cases. We should additionally still clarify if accepted routes should affect listener status, but this at least narrows the scope a bit.
Alright - I think this makes sense but will likely require a docs update to remove the mention of
👍 LGTM
I'd like to hear from additional implementations on this - I could see a plausible use case for an operator or team wanting to configure routing rules before actually deploying a new or updated backend service.
@youngnick What I'd like to avoid (expanded on in #1143) is the case where revoking a ReferencePolicy permitting one backend reference on a route with several would have a large blast radius, potentially dropping a large amount of permitted traffic instead of only rejecting the newly-unpermitted traffic. I'm uncertain if revoking the ReferencePolicy for the only backend reference on a route should be handled differently and reject/detach it (similar use case to the "not found" zero valid backend refs case). |
- fixes path prefix match YAML - removes check for HTTPRoute to be Accepted pending kubernetes-sigs#1112 Signed-off-by: Steve Kriss <krisss@vmware.com>
I still feel pretty strongly that the guideline of "no config in the data plane means not accepted" is the easiest one to both understand and implement. But interested to talk more about it. By that logic, I agree that the test in #1143 makes sense, and also agree that revoking the ReferencePolicy for a BackendRef that is the only target in a Route should mean that the Route is no longer Accepted. That is, if revoking a ReferencePolicy means there's no config to add, the Route is not Accepted. |
I'm trying to figure out how exactly the statuses of backendRefs influence the status of parentRefs. Let's see if I understand:
If I were a user I would find it confusing if a parentRef reported RefNotPermitted when the actual parentRef was OK but a backendRef wasn't. The HTTPRouteInvalidReferenceGrant conformance test is an example of this, I think. I think that the implication of this would be that a Route shouldn't attempt to connect to a parent until it has at least one valid backendRef. Maybe if a Route had no valid backendRefs it could report the parentRef status as something like RefNotAttempted? |
I like this idea. |
I think that isn't quite what this is intended to express. RouteParentStatus is intended to describe the status of the route, from the route's perspective - the relation to ParentRef is mostly to contextualize conditions when a route targets multiple parents, such as setting Whether to bubble this status up to a status field on the parent itself is a separate question, as described in #1112 (comment) (where it probably does not make sense to populate
I believe this would be a separate logical change, and #1211 addresses some of the implications (and implementation challenges) of this in more detail. [Oops, didn't mean to close.] |
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
I think most of these questions have been clarified now in GEP-1364, and we should use this issue to track updating the documentation to reflect those decisions and add this to the v0.6.0 milestone. Issues I had been tracking as related to clarifying this behavior:
/remove-lifecycle stale |
I've reviewed those issues and PRs @mikemorris was tracking, thanks Mike. I think we've largely got this handled, and that invalid backendRefs are pretty well clarified now. The last remaining piece is #1143 - which I think should absolutely be updated, but we should also add or update tests to ensure that partial validity is checked in other cases too. |
With the closing of #1143, I think we can call this one done (yay!) /close |
@youngnick: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
What would you like to be added:
Add documentation, possibly a new page in the Reference section of the website, and additional
RouteConditionReason
andListenerConditionReason
types as needed to clarify intended handling of invalid BackendRefs configurations. This is a bit more complicated than just #1109, as the implications of initially applying or resolving changes to a Route's BackendRefs or a ReferencePolicy may require triggering actions or settings statuses across several objects.Why this is needed:
There are a few distinct cases by which a
BackendRef
could be or become invalid, and I think it may be helpful to define and use more specific terms than "invalid" in the spec, as well as centralize and specify the intended behavior for each case more explicitly.One specific difficulty is the lack of specificity around when a Route
Accepted
condition should be set, and whether anAccepted { status: false, reason: Invalid }
condition should be set in any of these cases (and/or reflected in aListenerConditionType
status) given the lack of granularity in indicating status per BackendRef versus the status of the Route or Listener as a whole.It's also slightly unclear if
attachedRoutes
onListenerStatus
is intended to communicate the same thing as a route having anAccepted { status: true, reason: Accepted }
route parent status condition referencing that listener because of the difference between the words used ("attached" vs "accepted", refs #1111)."not accepted"
This case is when the Route has not been accepted by a Gateway Listener. The
HTTPRouteInvalidCrossNamespace
conformance test illustrates one example where this is an intended outcome, but it's not clear how this status should be communicated. The test suggests that no RouteParentStatus should be set on the Route because, having been rejected, it does not have an actual parent, only an intended parent - setting anAccepted { status: False }
condition (what RouteConditionReason?) could be an alternative way to communicate this status though.RouteParentStatus conditions
Accepted { status: False }
Listener handling
allowedRoutes
should be rejected by the GatewayController.InvalidRouteKinds
case, but I'm guessing that no similar status should be set due to cross-namespace security concerns because a Route in another namespace shouldn't be able to have any impact on a Gateway to which it is not allowed to bind.ListenerStatus
attachedRoutes
should not count this RouteReady { status: True, reason: Ready }
TODO:
HTTPRouteInvalidCrossNamespace
conformance test to check for anAccepted {status: False}
condition. This feels reasonable enough because even though the route was not accepted by the gateway (and therefore shouldn't be able to affect the gateway in any way), there should be a definitive controller which evaluated and rejected the route which has sufficient privileges to manage a route status appropriately.InvalidRouteKind
RouteConditionReason and update theHTTPRouteDisallowedKind
conformance test for communicating the case where a route is rejected because its kind is not specified in theallowedRoutes.kinds
field of the listener it is targeting.InvalidRouteNamespace
RouteConditionReason and conformance test for communicating the cases (Selector
,Same
) where a route is rejected because it lives in a namespace that is not allowed by theallowedRoutes.namespaces
field of the listener it is targeting.InvalidRouteHostname
RouteConditionReason and update theHTTPRouteListenerHostnameMatching
conformance test or add a new one for communicating the case where a route is rejected because it does not have at least one intersecting hostname with the listener it is targeting."unpermitted"
This case is when a Route has a BackendRef to an object in another namespace, where the object in the other namespace does not have a ReferencePolicy explicitly allowing the reference. This could reasonably be expected to be checked when the Route is created, and should be clarified to note that it should additionally be checked whenever the Route is modified or a ReferencePolicy with a
to
field resolving to a BackendRef of the route is deleted, modified, or created.This should be expected to be dynamic - revoking a ReferencePolicy should stop routing traffic to a BackendRef.
RouteParentStatus conditions
Accepted { status: True }
ResolvedRefs { status: False, reason: RefNotPermitted }
Ready { status: False, reason: Invalid }
[Not Implemented]Listener handling
As described in
HTTPBackendRef
This wording is likewise a bit confusing in that it only describes how each specific BackendRef should be handled, and not whether this should impact if the entire Route is determined to be "invalid". Notably, this case omits any mention of "dropping" anything from a Gateway as the "not found" case describes.
Should likely implement the same behavior described above in the "not found" section for sending HTTP 503 responses for invalid unpermitted BackendRefs.
ListenerStatus
attachedRoutes
should count this Route as long as it has RouteConditionTypeAccepted { status: True }
ResolvedRefs { status: False, reason: RefNotPermitted }
Ready { status: False, reason: Invalid }
if any single BackendRef on an accepted route is not permitted?TODO:
HTTPRouteInvalidCrossNamespaceBackendRef
conformance test to assert that a route with zero permitted backends:Accepted {status: True}
orAccepted {status: False}
route conditionattachedRoutes
ResolvedRefs {status: False, reason: ReasonNotPermitted}
listener condition if route issues of accepted routes should "bubble up" to listeners"not found"
This case is when the Route has at least one BackendRef that does not exist or cannot be resolved. This could reasonably be expected to be checked when the Route is created, but unclear if/when state changes could cause this to need to be updated.
UNRESOLVED:
RouteParentStatus conditions
ResolvedRefs { status: False, reason: InvalidBackendRef }
Listener handling
As described in
HTTPBackendRef
:The part that's a bit confusing with this wording is that it only describes how each specific BackendRef should be handled, and not whether this should impact if the entire Route is determined to be "invalid". The term "dropped" seems to imply active management of some object directly related to a Gateway (as opposed to the underlying implementation), which seems to be overly broad behavior if inferred to mean removing an entire Route from a Listener if one BackendRef became invalid.
The
backendRefs
field on HTTPRouteRule additionally states:(cobbling this information together from disjointed sources is an example of the difficulty described in docs: API specification information architecture obscures important content #1031)
ListenerStatus
attachedRoutes
should count this Route as long as it has RouteConditionTypeAccepted { status: True }
ResolvedRefs { status: False, reason: InvalidBackendRef }
InvalidBackendRef
ListenerConditionReason
Ready { status: False, reason: Invalid }
if any single BackendRef on an accepted route is not found?TODO:
InvalidBackendRef
RouteConditionReason, with a description similar to theInvalidCertificateRef
ListenerConditionReason"invalid"
This case is when the Route is syntactically or semantically invalid.
NOTE:
RouteParentStatus conditions
Accepted { status: False, reason: Invalid | InvalidParameters }
Listener handling
ListenerStatus
attachedRoutes
should not count this RouteReady { status: True, reason: Ready }
"not ready yet"
This case is when the Route is syntactically or semantically valid, all BackendRefs exist, can be resolved, and are permitted by a ReferencePolicy as applicable, and the route has been accepted by the GatewayController, but the underlying implementation is not fully configured or routing traffic yet. Expected to be a transient state.
NOTE:
RouteParentStatus conditions
Accepted { status: True }
Ready { status: False, reason: Pending }
[Not Implemented]Listener handling
ListenerStatus
attachedRoutes
should count this RouteReady { status: True, reason: Ready }
The text was updated successfully, but these errors were encountered: