Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Client side ALPN support (upstream cluster) #3431

Closed
mandarjog opened this issue May 17, 2018 · 27 comments · Fixed by #13922
Closed

Client side ALPN support (upstream cluster) #3431

mandarjog opened this issue May 17, 2018 · 27 comments · Fixed by #13922
Assignees
Labels
enhancement Feature requests. Not bugs or questions. help wanted Needs help!

Comments

@mandarjog
Copy link
Contributor

Title: Envoy should support client side protocol negotiation via alpn

Description:
At present if the an upstream cluster has http2_protocolOptions, it will always send h2 traffic upstream. When used with ALPN = {"h2", "http/1.1"} the expectation is that it will be negotiated down to http/1.1if the server only supporthttp/1.1` I don't think this is happening.

We need this in Istio so that we can set "ALPN" options on all outbound tls connections and upgrade.

We also need to have http/1.1 --> h2c, there is a separate issue #1502 filed for this.

@mandarjog
Copy link
Contributor Author

@PiotrSikora

@mattklein123 mattklein123 added enhancement Feature requests. Not bugs or questions. help wanted Needs help! labels May 17, 2018
@mattklein123
Copy link
Member

Optimally, this would be done as part of also solving #1502. The right solution here is to have a new connection pool implementation that can do both HTTP/1.1, HTTP/2, ALPN negotiation, upgrade, etc. I would prefer to not complicate the existing connection pool implementations with this logic.

@incfly
Copy link
Contributor

incfly commented Jul 22, 2019

I did some study on this bug.

Using new connection pool for http1/http2 protocol selection with ALPN information seems feasible to me. High level things to do

ALPN upgrade

  1. Add a new connection pool impl, maybe HttpAutoConnectionPool. This impl is supposed to maintain list of the both http1 and http2 codec client, and auto select based on alpn value.
  2. We use include::envoy::http::protocol
    to dispatch the conn pool. We can add another one, Http1And2 to indicate the auto selection under h2c upgrade and ALPN.
  3. Plumb the ClientConnectionImpl::nextProtocol information with ClusterManagerImpl when selecting http conn pool.

H2C upgrade

I don't find any Envoy own implementation on HTTP its own auto-upgrade, h2c. I assume the h2 auto upgrade over plaintext has to be implemented, and then ingtegrated with this new HttpAutoConnectionPool.

H2C upgrade part I'm not that familiar. I hope we can focus on implementing ALPN part first, and the proposed changes structurally fit the enhancement later.

@PiotrSikora @mattklein123
Let me know what you think, thanks!

@mattklein123
Copy link
Member

@incfly this is on the right track.

For H2C upgrade, nghttp2 supports this internally to the library. See https://nghttp2.org/documentation/nghttp2_session_upgrade2.html#c.nghttp2_session_upgrade2. This can then be coupled with an HTTP1 codec to start the process, which then switches over to an HTTP2 codec after the upgrade. There will need to be some work/thought put into how this is done, most likely by having specific h2 codec constructors that are meant for upgrade scenarios.

@incfly
Copy link
Contributor

incfly commented Jul 30, 2019

I've started spending some time on this FR and @mattklein123 maybe we can update assignees to avoid potential duplicate work?

One question, what will be a feature guarding mechanism recommended to control this pool selection behavior change? A flag or a field under api/cluster or something else?

@mattklein123
Copy link
Member

@incfly I assigned the issue to you. In terms of configuration, yeah, I think we need to look at some config on the cluster, similar to how we configure http/1 vs. http/2 today. TBH I'm not thrilled with how this is all configured today, but we can clean this up in v3 cc @htuch. I would take a stab at something and we can talk about it in CR. Thank you!

@incfly
Copy link
Contributor

incfly commented Aug 9, 2019

Right now the code is organized as

  1. We create codeclient, deciding it's http1 or http2
  2. Within that codeclient constructor, we establish networking level connection.
  3. Later on when network connection and transport socket is ready, only by then we know the negotiated ALPN value, but it's too late. Codec is already chosen in step 1.

Appears to me we can defer the codec creation in onConnectionEvent, by then we'll have enough information about ALPN.

I guess, this won't be an issue, HTTP req/resp can't make any progress untill transport socket is ready.

This might require some refactoring, and http_integration.cc test seem a good qua-lifer to see if anything break.

@incfly
Copy link
Contributor

incfly commented Aug 22, 2019

A very very rough POC can be found at https://github.com/envoyproxy/envoy/compare/master...incfly:httpx-pool?expand=1
Mostly this httpx connection pool go through upgrade() in case of connected && ALPN == H2.

It'll be great to get some quick feedback so that I tell it's right direct to continue polishing it.

This POC duplicate lots of code from both http1 and http2 connection pool, which can cause some maintainece overhead for future changes on http1 and http2 implementation. But to certain extend, this is expected, since we don't want to mess up and modify on existing http1 connection pool. And overtime, we can deprecate HTTP1 connection pool.

The upgrade() is invoked in the case of connection event and ALPN is negotiated as "h2".

With this change, I set up a client and server envoy, for /http1 routes to a cluster without any alpn configured, for /http2 routes to alpn h2 configured. Then I can verify from server side stats, the downstream_rq_http2 and http1 metrics increment correspondingly. (client side still has some to be done work to get stats working)

@mattklein123
Copy link
Member

@incfly at a very high level this looks like it's on the right track, but I feel that we need to reconcile the code duplication, especially as we already have this issue in the context of #7852 cc @conqerAtapple. I would like to see a plan to come up with a base class that can be used across all of the implementations with potentially just the callouts needed to do the upgrade?

@mooperd
Copy link

mooperd commented Nov 20, 2019

+1 - Could we have an update on this?

@eddiewang
Copy link

eddiewang commented Nov 21, 2019

+1 spent hours tracing this down, but on the new Istio 1.4 around half of my external services stopped being accessible, here's the comment I wrote in the istio forum:

i can confirm that with a downgrade to istio v1.3.5 everything works as expected again. I was trying an upgrade to v1.4 so it was using whatever Envoy that version of Istio’s mesh uses. Thanks for tracing down this issue, very interesting bug because around half of my services were still working (probably because they supported HTTP2).

Drove me crazy because HTTP still worked, and I thought it was a mismanaged cert or an issue with istio’s SDS service.

@mooperd
Copy link

mooperd commented Nov 24, 2019

@eddiewang I don't think this is your specific problem.

@snowp
Copy link
Contributor

snowp commented Apr 15, 2020

@incfly Is this something you're still planning on working on?

Are we still thinking that a new connection pool is the right way to handle this? Not sure if there is an easier way of achieving this with other changes to the connection pool logic. The alternative that comes to mind would be including the desired ALPN in the connection pool hash (used today for socket settings), though that might not provide a good story for downgrades.

I think there's both a desire to support downgrading a cluster configured to do HTTP/1.1 when HTTP/2 ALPN fails, but also to be able to adjust the ALPN protocols depending on which protocol was selected (when using USE_DOWNSTREAM_PROTOCOL) to make sure that the selected HTTP protocol aligns with the selected ALPN protocol.

@lizan @ggreenway @mattklein123 for thoughts

@lizan
Copy link
Member

lizan commented Apr 15, 2020

Yes I think a new connection pool is still needed for this.

@incfly
Copy link
Contributor

incfly commented Apr 15, 2020

@snowp Sorry I was get distracted to something else on our team project. I got to a point where ALPN plumbing to connection pool creation time. Feel free re-assign if anyone feel the urgency.

To provide some outcome of my research/investigation: by the time I looked at this, the HTTP1 and HTTP2 connection has quite some duplication, like ActiveClient, etc.

This new connection, per requirement, need to handle upgrade and able to switch between 1 and 2, without further code duplication. That's the part I feel we will have to spend quite some time to refactor to unblock this new connection pool.

@snowp
Copy link
Contributor

snowp commented Apr 15, 2020

@incfly Does the work done here #9668 make things easier?

No immediate rush here, just trying to get a sense for where we're at and if someone's working on this

@mattklein123
Copy link
Member

@snowp I think there are 2 ways of thinking about this feature:

  1. Setting ALPN to match what you want to happen (to match downstream for example). This I think could be done with transport socket options and a different connection pool hash. So I don't think we would need a new CP in that case.
  2. What this issue is mainly tracking which is to allow h2, http/1.1 and then actually properly negotiate which one. This is more closely related to how we would do http/1.1 -> h2 upgrade. For that per @lizan I think we do need a new CP.

cc @htuch ^ is related to the h1 -> h2 upgrade intern project I listed.

@alyssawilk
Copy link
Contributor

I'm really not sure what the plan was w.r.t delegated connection pool. Even if we had an outer pool wrapping an inner HTTP/1 and inner HTTP/2 pool we'd need to create a connection somewhere, and hope ALPN matched? Or we need to create raw TCP connections and once they're established, stick that connection into a codec?

I think we want to go go the latter route, and if we do I don't think there's much of a need to have a delegated connection pool. Given almost all the logic between TCP, HTTP/1 and HTTP/2 is shared in the connection pool, and almost all of the application-specific logic is actually in the ActiveClient class, I think we could have a MixedConnectionPool. If alpn is cached, just creates a HTTP/1 or HTTP/2 type based on alpn (with graceful failure if alpn changes). If there's no cached ALPN, it creates raw TCP ActiveClients, and once they're connected, pulls the network::connection from the TCP active client and sticks it in the appropriate HTTP/1 or HTTP/2 ActiveClient.

I think it'd take a little bit of fiddling to move the last bit of pool-specific logic into the active client classes, and a bit of work to be able to pull the connection out of ActiveTcpClient and into anything else, so wanted to check in and make sure this doesn't sound too sketchy before I tackle it.

@alyssawilk
Copy link
Contributor

Basically something like mixed_connection_pool.* here
https://github.com/envoyproxy/envoy/compare/master...alyssawilk:h1_h2?expand=1
but without all the FIXMEs :-P

@mattklein123
Copy link
Member

@alyssawilk yes +1 I agree with the implementation you have outlined and this is roughly how I would have implemented it also. Do you want to just open a PR with your draft to take a look there? Exciting!

@alyssawilk
Copy link
Contributor

I'll go ahead and fix the FIXMES and get some tests working before I send it out. Just wanted to check in early in since it sounded to me like it was a different plan than folks had already agreed on :-)

@alyssawilk alyssawilk self-assigned this Oct 26, 2020
@snowp
Copy link
Contributor

snowp commented Oct 26, 2020

Yeah the idea was to make a raw connection to perform ALPN negotiation before deciding which delegated pool to use, then handing that connection over to the pool once ALPN was decided. My motivation for doing a delegated pool was to be able to reuse the protocol specific handling in each pool (e.g. number of allowed concurrent streams), but if the consolidation of the connection pool has come far enough that might no longer be necessary.

alyssawilk added a commit that referenced this issue Oct 28, 2020
Adding some basic functionality to the Network::Connection needed to disassociate TCP connections from their connection pool before handing them off to a codec client.

Additional Description:
I really wanted to add Network::Connection Connecting as a state, and I did a PR to that effect and cleaned up the 30 or so call sites in Envoy, but then realized that because it broke nearly every use of state, we either needed to have a separate boolean, or change all the enum values so folks would have to fix their downstream code. I chose the former as the lesser of two evils.

I also waffled a bit about removing the callbacks or sticking with the remove pattern in codec_helper.h and went with consistency over cleanup.

Risk Level: low (not yet used)
Testing: unit tests
Docs Changes: n/a
Release Notes: n/a
Platform Specific Features: n/a
Part of #3431

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
alyssawilk added a commit that referenced this issue Nov 3, 2020
As part of #3431 we need to move the logic from HTTP/1 and HTTP/2 connection pools to the active client, so the new mixed connection pool can just create an active client of the right type and have all the logic in the client, not split between the client and the pool.

This removes functions for the HTTP/2 pool not moved in #13867

Risk Level: Medium (data plane refactor)
Testing: existing tests pass
Docs Changes: n/a
Release Notes: n/a
Part of #3431

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
ggreenway pushed a commit that referenced this issue Nov 4, 2020
As part of #3431 we need to move the logic from HTTP/1 and HTTP/2 connection pools to the active client,
so the new mixed connection pool can just create an active client of the right type.

This removes member variables from the HTTP/1.1 connection pool in preparation for that move.
a follow-up PR will do the same for HTTP/2, then the two classes will be removed in favor of the base http connection
pool being created with an instantiate_active_client_ closure which creates the right type of clients.

The alpn pool will eventually create an active client of the right type based on initial ALPN.

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
ggreenway pushed a commit that referenced this issue Nov 4, 2020
…l) (#13889)

As part of #3431 making sure the ALPN pool can create raw TCP active clients by allowing them to be created by a generic connection pool.

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
alyssawilk added a commit that referenced this issue Nov 5, 2020
Extending the TransportSocketOptions to allow for multiple fallback ALPN for the upcoming connection pool which supports
both HTTP/1 and HTTP/2

Risk Level: Low (data plane refactor, but a pretty minor one)
Testing: new unit tests
Docs Changes: n/a
Release Notes: n/a
Part of #3431

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Feature requests. Not bugs or questions. help wanted Needs help!
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants