Enable TLS 1.3 on the client-side by default #9300
Comments
Just for the record, authentication failures (where the server decides that the client cert is invalid) are in general not retryable. Resending the handshake message is basically guaranteed to fail.
To the same endpoint, yes. But retries are usually sent to different endpoints than the one that failed.
cc @PiotrSikora @mattklein123 @lizan @derekargueta I tested the sidecar/envoyproxy (1.15) in Istio 1.7 as a client (istio sleep + sidecar) with TLS 1.3 (configured via an EnvoyFilter) for TLS origination (configured with a ServiceEntry and DestinationRule) to connect to some HTTP/2 + TLS 1.3 servers, and it works, as shown below.
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
name: google-com
namespace: hobby
spec:
hosts:
- www.google.com
location: MESH_EXTERNAL
ports:
- name: http-port-for-tls-origination
number: 443
protocol: HTTP
resolution: NONE
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
name: google-com
namespace: hobby
spec:
host: www.google.com
trafficPolicy:
loadBalancer:
simple: ROUND_ROBIN
portLevelSettings:
- port:
number: 443
tls:
mode: SIMPLE
---
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
name: google-com
namespace: hobby
spec:
configPatches:
- applyTo: CLUSTER
match:
cluster:
portNumber: 443
service: www.google.com
context: SIDECAR_OUTBOUND
patch:
operation: MERGE
value:
transport_socket:
typed_config:
'@type': type.googleapis.com/envoy.api.v2.auth.UpstreamTlsContext
common_tls_context:
tls_params:
tls_maximum_protocol_version: TLSv1_3
tls_minimum_protocol_version: TLSv1_3
workloadSelector:
labels:
app: sleep # modify this for your pod

The test curl command is as below. I also updated this thread: https://discuss.istio.io/t/configuring-tls-versions/2273/13
TLS 1.3 works fine, but it's not enabled by default, because there are some edge cases around retries (e.g. when using client certificates) that are not currently handled by Envoy.
Yes, I applied the EnvoyFilter to enable TLSv1.3 and it seems to be working fine. I did a couple of tests after applying the retry policies in both the simple TLS and mTLS modes, but could not find any of the issues you were mentioning, @PiotrSikora. I also tried certificate rotation and querying the server with an invalid certificate, but they seem to work as expected. Could you please explain those edge cases around retries that would not work with TLSv1.3 on the client-side? It would be really helpful, thanks!
@PiotrSikora is this a BoringSSL issue or an Envoy issue? Regarding the HelloRetryRequest getter: "Adds getter indicating whether HelloRetryRequest was triggered during TLSv1.3 handshake." Thanks for the clarification.
Istio updated it here: istio/istio#37540
I'm seeing the same thing; has anyone had success proxying TLS 1.3 using an Envoy static config? If so, would you mind sharing your config? Here's my current config for reference.
From a quick test here locally, I was also getting errors connecting one Envoy to another using TLS 1.3. If I set on the transport_socket:

name: envoy.transport_sockets.tls
typed_config:
  '@type': type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
  common_tls_context:
    tls_params:
      tls_minimum_protocol_version: TLSv1_3

and try to hit the path that proxies to envoy2:

curl -k https://localhost:1443/envoy2
upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: TLS error: 268435736:SSL routines:OPENSSL_internal:NO_SUPPORTED_VERSIONS_ENABLED

the access logs only have information about the first Envoy:

envoy1 | [2023-04-27T12:33:28,323Z] [INFO] downstream.tls.version=TLSv1.3 upstream.tls.version=- http.protocol=HTTP/1.1 http.request.method=GET url.path=/ http.response.status_code=503

Now, if I set on the transport_socket:

name: envoy.transport_sockets.tls
typed_config:
  '@type': type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
  common_tls_context:
    tls_params:
      tls_minimum_protocol_version: TLSv1_3
      tls_maximum_protocol_version: TLSv1_3

then I get a success:

curl -k https://localhost:1443/envoy2
Hello from envoy2

envoy1 | [2023-04-27T12:35:29,887Z] [INFO] downstream.tls.version=TLSv1.3 upstream.tls.version=TLSv1.3 http.protocol=HTTP/1.1 http.request.method=GET url.path=/ http.response.status_code=200
envoy2 | [2023-04-27T12:35:29,901Z] [INFO] downstream.tls.version=TLSv1.3 upstream.tls.version=- http.protocol=HTTP/2 http.request.method=GET url.path=/ http.response.status_code=200

For some reason Envoy breaks if you do not set both tls_minimum_protocol_version and tls_maximum_protocol_version. Someone mentioned this same thing here: https://phabricator.wikimedia.org/T246083
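For anyone looking for a full static example (per the question above), here is a minimal sketch of a cluster carrying that working transport socket; the cluster name, address, and port are placeholders, not taken from this thread:

# Hypothetical static cluster proxying to a TLS 1.3 upstream, with both the
# minimum and maximum protocol versions pinned to TLSv1_3 as in the working
# configuration above.
clusters:
- name: envoy2_upstream                  # placeholder name
  connect_timeout: 1s
  type: STRICT_DNS
  load_assignment:
    cluster_name: envoy2_upstream
    endpoints:
    - lb_endpoints:
      - endpoint:
          address:
            socket_address: { address: envoy2, port_value: 8443 }   # placeholder address/port
  transport_socket:
    name: envoy.transport_sockets.tls
    typed_config:
      '@type': type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
      common_tls_context:
        tls_params:
          tls_minimum_protocol_version: TLSv1_3
          tls_maximum_protocol_version: TLSv1_3

The failure with only the minimum set is presumably because the client-side maximum still defaults to TLSv1_2 (the subject of this issue), which leaves no enabled protocol versions and produces the NO_SUPPORTED_VERSIONS_ENABLED error shown above.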
Reference for the TLS 1.3 behavior discussed: https://www.rfc-editor.org/rfc/rfc8446#appendix-E.1.2

This should only be an issue when doing mTLS. If the application protocol (such as HTTP) does retries, it should work fine. The cases where it could be problematic are things like tcp_proxy, which only retries when Envoy detects a failure to establish a connection, and that is exactly what cannot be detected in this case.
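For HTTP traffic, the application-level retry mentioned above maps to Envoy's route-level retry policy. A minimal sketch follows; the virtual host and cluster names are assumptions, not taken from this issue:

# Hypothetical route configuration fragment: retry on connection-level
# failures and stream resets so a request that fails against one endpoint
# can be re-sent to another endpoint in the cluster.
route_config:
  virtual_hosts:
  - name: example_vhost            # placeholder name
    domains: ["*"]
    routes:
    - match: { prefix: "/" }
      route:
        cluster: upstream_tls13    # placeholder cluster
        retry_policy:
          retry_on: "connect-failure,reset"
          num_retries: 2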
I experimented with this by using an untrusted client certificate to trigger the difference between the TLSv1.2 and TLSv1.3 handshakes. In the HTTP request use case, we currently lose the detailed error message. With TLSv1.2 (the default) the error message has the TLS alert info:

but with TLSv1.3 it is not shown:

The above messages are coming from here. @ggreenway wrote

In general, I can see the problem with the TLSv1.3 handshake from our perspective: it requires the client to read before it can discover the error. But I'm still puzzled about retry specifically becoming a problem, since an authentication failure is not likely to be temporary and worth retrying. Do you have any insights on this @ggreenway, @PiotrSikora?
The problem is that authentication failure is not part of the TLS 1.3 handshake, so if the proxy "successfully" connects to the upstream and starts forwarding data from downstream, then that data is not stored in the proxy. So, if the proxy later receives a TLS alert that its client certificate was rejected, it has no way to retry with another upstream, because the data that was supposed to be forwarded is gone. Even if we wanted to buffer that data for some period of time, the amount is effectively unbounded, since in TLS 1.3 there is no positive signal that the client certificate was accepted. While an authentication failure is unlikely to be temporary with a given endpoint, the whole point of load balancers and retrying is that another endpoint in the cluster might accept it. This is especially true when running multi-cluster/-region/-cloud deployments with canaries.
Here is a summary of the conditions under which the missing retry attempt can be observed (for those who want to reproduce the issue):

Cluster:
HTTP connection manager:
TCP proxy

After the upstream refuses the connection attempt by sending a TLS alert and closing the connection, a retry towards another endpoint is triggered for TLSv1.2 connections but not for TLSv1.3 connections.
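The exact cluster, HTTP connection manager, and TCP proxy settings are not reproduced above, so the following is only a rough sketch of the upstream side of such a setup (a client certificate presented over TLS 1.3, and more than one endpoint so a retry has somewhere else to go); every name, address, and file path is an assumption, not the reporter's configuration:

# Hypothetical upstream cluster for reproducing the missing retry:
# mTLS with a client certificate, TLS 1.3 forced, and two endpoints
# so that a retry could be attempted against a different host.
clusters:
- name: upstream_mtls13                  # placeholder name
  connect_timeout: 1s
  type: STRICT_DNS
  load_assignment:
    cluster_name: upstream_mtls13
    endpoints:
    - lb_endpoints:
      - endpoint:
          address:
            socket_address: { address: backend-1.internal, port_value: 8443 }   # placeholder
      - endpoint:
          address:
            socket_address: { address: backend-2.internal, port_value: 8443 }   # placeholder
  transport_socket:
    name: envoy.transport_sockets.tls
    typed_config:
      '@type': type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
      common_tls_context:
        tls_params:
          tls_minimum_protocol_version: TLSv1_3
          tls_maximum_protocol_version: TLSv1_3
        tls_certificates:
        - certificate_chain: { filename: /etc/certs/client-cert.pem }   # placeholder path
          private_key: { filename: /etc/certs/client-key.pem }          # placeholder path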
Currently, TLS 1.3 is NOT enabled by default on the client-side, because the handshake changed a bit in TLS 1.3: it is considered complete on the client-side as soon as the requested client certificate is sent, i.e. before the server validates the presented client certificate. This means that the client-side transport socket reports the connection as established, and the client starts sending data without knowing whether the server is going to accept the client certificate. This makes handling failures a bit tricky with respect to retries and buffering, and makes enabling TLS 1.3 on the client-side significantly more difficult than simply changing the maximum supported protocol version.
cc @mattklein123 @lizan @derekargueta
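Until the default changes, TLS 1.3 can be opted into per cluster by raising only the maximum supported version on the upstream TLS context. A minimal sketch (nothing here is specific to this issue; the enclosing cluster is omitted):

# Hypothetical opt-in on an upstream cluster: raising only the maximum
# version lets the client negotiate TLS 1.3 where the server supports it,
# while leaving the default minimum untouched.
transport_socket:
  name: envoy.transport_sockets.tls
  typed_config:
    '@type': type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
    common_tls_context:
      tls_params:
        tls_maximum_protocol_version: TLSv1_3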