Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Listener isolation or cascading behavior #2416

Closed
youngnick opened this issue Sep 21, 2023 · 11 comments
Closed

Listener isolation or cascading behavior #2416

youngnick opened this issue Sep 21, 2023 · 11 comments
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@youngnick
Copy link
Contributor

youngnick commented Sep 21, 2023

When talking about Listener acceptance and merging behavior as part of #2288, we discovered that there was a particularly important part of the spec that we had not clarified enough, and that this has created a behavior variance between implementations.

I (@youngnick) summarized this as follows:

That concept is important because, for better or for worse, it has always been intended that the traffic-matching between Listeners and Routes is a two stage process. First you match a Listener, then you look for matching routes. In the case that there's a wildcard hostname, it was definitely not intended that a request could match both (for example, the cafe.example.com and *.example.com Listeners.) That is, having different values at all for hostname has been intended to make the Listeners distinct and unmergeable. We certainly have not done enough to explain that this is the intention, and I'm sorry for that.

That is, it was the original intention in the spec that, if there is a precise hostname match in one Listener, and a wildcard match in another, that traffic that matches the precise hostname will arrive only at Routes bound to that Listener.

Another way to put this is that if a precise hostname match is present, then a wildcard match should never also match that precise hostname.

Because we were not clear about this in the spec originally, many implementations have implemented their rules such that requests that match a precise hostname may, if there are no matching Routes for the request, match Routes attached to the wildcard hostname.

To use an example, here is a Gateway that @pleshakov supplied in #2288.

apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: gateway
spec:
  gatewayClassName: nginx
  listeners:
  - name: example
    port: 80
    protocol: HTTP
    hostname: "*.example.com"
  - name: cafe-http
    port: 80
    protocol: HTTP
    hostname: "cafe.example.com"

And here are some Routes that illustrate the config that may result in cascading behavior. I've left out hostnames on the HTTPRoutes so that only the Listener hostname is relevant.

apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: coffee
spec:
  parentRefs:
  - name: gateway
    sectionName: cafe-http
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /coffee
    backendRefs:
    - name: backend
      port: 3000
---
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: tea
spec:
  parentRefs:
  - name: gateway
    sectionName: example 
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /tea
    backendRefs:
    - name: backend 
      port: 3000

In this example, cafe.example.com has one HTTPRoute attached, /coffee, and *.example.com has one HTTPRoute attached, /tea.

The original intent of the spec is that the only valid request using cafe.example.com is http://cafe.example.com/coffee. Any request that does not match the set of Routes attached to the cafe-http Listener should be 404ed.

However, many implementations allow the request to cascade between Listeners, which means that request to either http://cafe.example.com/coffee or http://cafe.example.com/tea will succeed.

This issue is to discuss how to proceed here.

My personal preference is to tighten up the language here, and make the originally-implied behavior explicit and compulsory, with conformance tests to back it up. I'm aware that this will be painful for a number of implementations, but I really think that this is better in the long run. This is a very subtle difference that will be very difficult to explain to end users, and I think that it will significantly decrease portability to have this be optional in an extended feature.

@youngnick youngnick added the kind/bug Categorizes issue or PR as related to a bug. label Sep 21, 2023
@shaneutt shaneutt added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Sep 21, 2023
@shaneutt
Copy link
Member

This is a very subtle difference that will be very difficult to explain to end users, and I think that it will significantly decrease portability to have this be optional in an extended feature.

I agree that this is subtle, and fairly difficult to explain to users.

My personal preference is to tighten up the language here, and make the originally-implied behavior explicit and compulsory, with conformance tests to back it up. I'm aware that this will be painful for a number of implementations, but I really think that this is better in the long run.

While I can see the sense in it, it remains unclear to me if this would be "better in the long run". If we have a number of implementations that are responsible for building this API, that have been working on conformance and generally just about ready for GA and last minute we make this change: the implications could be similar to a sudden, just-before-release backwards incompatible change and could significantly impact the trust we depend on from our community for our project to succeed. That said, I still see the sense in it and I do think this spec change is for the better, in the long term.

It seems clear to me that we will need to discuss the options available to us at length, verify just how impactful this is (or isn't) for implementations, and find the best balance (and in some kind of hurry, with GA looming).

I think anyone with a stake in this should chime in with their thoughts and considerations, as we need to hear it all.

@robscott
Copy link
Member

robscott commented Sep 22, 2023

Met with @shaneutt and @youngnick this morning, and we agreed on the following approach here:

  1. Add a new GatewayListenerIsolation Extended feature (and conformance tests) that will describe the ideal state that we'll want everyone to work towards supporting
  2. Some new features like Client Cert validation (Adds API to GEP-91: Client Certificate Validation #2273) are likely going to require support for Listener Isolation
  3. We don’t want to leave anyone behind, but want to encourage everyone to work towards support for Listener Isolation so that we can graduate this to a Core feature in the future

@shaneutt shaneutt added priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. triage/accepted Indicates an issue or PR is ready to be actively worked on. release-blocker MUST be completed to complete the milestone and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Sep 22, 2023
@shaneutt shaneutt added this to the v1.0.0 milestone Sep 22, 2023
@pleshakov
Copy link
Contributor

The example in the description makes a lot of sense. As well as the proposed solution.

However, there is an edgier edge case that I find a bit troubling -- it is the case when hostnames are defined in both listeners and HTTPRoute. I also found that implementations (I tried 4) didn't enforce listener isolation for this case.

I'd like to double check how to handle that case below. Hopefully, we can also define it in a conformance test.

The example below iteratively builds upon a simple Gateway.

(1)
Gateway:

  • listener-1 with empty hostname with HTTPRoute (hostname and one path) a.b.example.com/listener-1 that references the listener in its parent ref.

Expectation:

curl http://a.b.example.com/listener-1 -> 200

Now let's add another listener to the Gateway:

(2)
Gateway:

  • listener-1 empty hostname with HTTPRoute a.b.example.com/listener-1 that references the listener in its parent ref.
  • listener-2 hostname=*.example.com

Must listener-2 now prevent routing requests to listener-1?

curl http://a.b.example.com/listener-1 -> ?

Now let's attach an HTTPRoute to listener 2:

(3)
Gateway:

  • listener-1 with empty hostname with HTTPRoute a.b.example.com/listener-1 that references the listener in its parent ref.
  • listener-2 hostname=*.example.com with HTTPRoute a.b.example.com/listener-2 that references the listener in its parent ref.

I believe listener isolation rules must work like below here:

curl http://a.b.example.com/listener-1 -> 404
curl http://a.b.example.com/listener-2 -> 200 

Now let's attach another listener:

(4)
Gateway:

  • listener-1 with empty hostname with HTTPRoute a.b.example.com/listener-1 that references the listener in its parent ref.
  • listener-2 hostname=*.example.com with HTTPRoute a.b.example.com/listener-2 that references the listener in its parent ref.
  • listener-3 hostname=*.b.example.com

Must listener-3 now prevent routing requests to both listener-1 and listener-2?

curl http://a.b.example.com/listener-1 -> ?
curl http://a.b.example.com/listener-2 -> ?

Now let's attach an HTTPRoute to listener-3:

(5)
Gateway:

  • listener-1 with empty hostname with HTTPRoute a.b.example.com/listener-1 that references the listener in its pararent ref.
  • listener-2 hostname=*.example.com with HTTPRoute a.b.example.com/listener-2 that references the listener in its pararent ref.
  • listener-3 hostname=*.b.example.com with HTTPRoute a.b.example.com/listener-3 that references the listener in its pararent ref.

I believe listener isolation rules must work like below here:

curl http://a.b.example.com/listener-1 -> 404
curl http://a.b.example.com/listener-2 -> 404 
curl http://a.b.example.com/listener-3 -> 200 

Final Thoughts

  • I ran this example on a few implementations (4 in total) and I didn't see the isolation in most of them (see Expand and clarify Listeners definition #2288 (comment))
  • In my view, the fact that the hostname field in a listener can include another listener (*.example.com includes *.a.example.com) makes isolation harder to implement.
  • It might also be confusing for users - because a new more specific listener in a Gateway can suddenly lead to 404 for HTTPRoutes attached to a less specific listener.

@robscott
Copy link
Member

/assign

@youngnick
Copy link
Contributor Author

@pleshakov thanks for the detailed write-up.

Again, we obviously haven't explained this well enough, but the intent of the hostname field in HTTPRoute is for that to be a determinant of which Listener a HTTPRoute can attach to - that hostname is really only relevant for routing in the event that the Listener hostname field is empty or wildcard.

One thing that's implied but not explicit in the hostname docs is that the * in a hostname is greedy:

	// Hostnames that are prefixed with a wildcard label (`*.`) are interpreted
	// as a suffix match. That means that a match for `*.example.com` would match
	// both `test.example.com`, and `foo.test.example.com`, but not `example.com`.

We haven't specifically said what to do when there are overlapping wildcards - but in either case, the idea is that only one of the wildcards can be matched if a hostname matches both.

We need to tighten this up as well, and have two options, either "most dots to the right of the wildcard wins" (that is, that a more specific wildcard beats a less-specific one), or that less specific wildcard wins ("least dots to the right of the wildcard wins"). I probably prefer the first, personally, which makes your deductions correct.

I don't think it's too big a deal that implementations don't do this yet - as we've never been precise or had conformance tests for this use case which is, as you say, pretty edgy.

I'm already working on another round of language updates to the Listener docs, I will include this in my todo list.

Comments on if we should have more specific (my preferred option) or less specific wildcards win welcomed.

I should also note that the significant security implications for TLS routing of cascading Listeners do mean that it's very unlikely that we will revisit the decision to move towards requiring isolated Listeners long-term, regardless of these edge-case issues.

@robscott robscott removed priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. release-blocker MUST be completed to complete the milestone labels Oct 9, 2023
@robscott
Copy link
Member

robscott commented Oct 9, 2023

With #2465 merged, we've addressed the most critical portion of this. Leaving this issue open to track any additional follow ups like conformance tests, but it is no longer a release blocker.

@pleshakov
Copy link
Contributor

I'd like to volunteer to write a conformance test for this.
@robscott based on your last comments, sounds like no new issue is needed to be created. Would it be possible to reassign this one to me?

@robscott
Copy link
Member

That's great, thanks @pleshakov!

/assign @pleshakov

@sunjayBhatia
Copy link
Member

One thing we may want to do as part of this effort is add to the changes from #2465 and add some documentation on the Route hostname spec so users are aware of the fact that all of the requested hostnames in a route may not actually match up with the Listener they are attached to (due to isolation) and may not result in actual routing rules, per the points made here: #2416 (comment)

particularly: the intent of the hostname field in HTTPRoute is for that to be a determinant of which Listener a HTTPRoute can attach to - that hostname is really only relevant for routing in the event that the Listener hostname field is empty or wildcard.

@keithmattix
Copy link
Contributor

keithmattix commented May 3, 2024

Is this issue completed now that the conformance tests pass?

@robscott
Copy link
Member

robscott commented May 3, 2024

I think we can probably close this out - we still don't have coverage for HTTPS listener isolation, but that's tracked in #2803.

@robscott robscott closed this as completed May 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants