Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable TLS 1.3 on the client-side by default #9300

Open
PiotrSikora opened this issue Dec 10, 2019 · 14 comments
Open

Enable TLS 1.3 on the client-side by default #9300

PiotrSikora opened this issue Dec 10, 2019 · 14 comments
Labels

Comments

@PiotrSikora
Copy link
Contributor

Currently, TLS 1.3 is NOT enabled by default on the client-side, because the handshake changed a bit in TLS 1.3, and is considered completed on the client-side as soon as the requested client certificate is sent, i.e. before server validates the presented client certificate. This means that the client-side transport socket reports connection as established, and the client starts sending data without knowing if the server is going to accept the client certificate, which makes handling failures a bit tricky with respect to retries and buffering, and makes enabling TLS 1.3 on client-side significantly more difficult than simply changing the maximum supported protocol version.

cc @mattklein123 @lizan @derekargueta

@yaronf
Copy link

yaronf commented Jul 24, 2020

Just for the record, authentication failures (where the server decides that the client cert is invalid) are in general not retryable. Resending the handshake message is basically guaranteed to fail.

@PiotrSikora
Copy link
Contributor Author

Just for the record, authentication failures (where the server decides that the client cert is invalid) are in general not retryable. Resending the handshake message is basically guaranteed to fail.

To the same endpoint, yes. But retries are usually sent to different endpoints than the one that failed.

@hobbytp
Copy link

hobbytp commented Sep 22, 2020

cc @PiotrSikora @mattklein123 @lizan @derekargueta

I tried to test sidecar/envoyproxy(1.15) in istio1.7 as client (istio sleep + sidecar ) with TLS1.3 (configured by envoy filter) for TLS origination (configure ServiceEntry and DestinationRule) to connect to some http2/TLS1.3 servers, and it can work as below.
Do you know why TLS1.3 for client can work? Because as I understand, with this issue, it shall not work, or any thing I missed?

  1. www.google.com, and www.facebook.com can work with TLS1.3 with envoyfilter + TLS origination.
  2. banzaicloud.com can work, but still need to add sni in DestinationRule.
  3. http2.pro need to add a fixing (envoy filter patch in http alpn override applied to external services breaks connections istio/istio#24619) before it can work.
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: google-com
  namespace: hobby
spec:
  hosts:
  - www.google.com
  location: MESH_EXTERNAL
  ports:
  - name: http-port-for-tls-origination
    number: 443
    protocol: HTTP
  resolution: NONE
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: google-com
  namespace: hobby
spec:
  host: www.google.com
  trafficPolicy:
    loadBalancer:
      simple: ROUND_ROBIN
    portLevelSettings:
    - port:
        number: 443
      tls:
        mode: SIMPLE
---
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: goole-com
  namespace: hobby
spec:
  configPatches:
  - applyTo: CLUSTER
    match:
      cluster:
        portNumber: 443
        service: www.google.com
      context: SIDECAR_OUTBOUND
    patch:
      operation: MERGE
      value:
        transport_socket:
          typed_config:
            '@type': type.googleapis.com/envoy.api.v2.auth.UpstreamTlsContext
            common_tls_context:
              tls_params:
                tls_maximum_protocol_version: TLSv1_3
                tls_minimum_protocol_version: TLSv1_3
  workloadSelector:
    labels:
      app: sleep  # modify this for your pod

The test curl command is as below:
kubectl exec -it sleep-8f795f47d-ddzr9 -n hobby -- curl --http2-prior-knowledge -v -sL -o /dev/null -D - http://banzaicloud.com:443

I also update here : https://discuss.istio.io/t/configuring-tls-versions/2273/13

@PiotrSikora
Copy link
Contributor Author

Do you know why TLS1.3 for client can work? Because as I understand, with this issue, it shall not work, or any thing I missed?

TLS 1.3 works fine, but it's not enabled by default, because there are some edge cases around retries (e.g. when using client certificates), that are not currently handled by Envoy.

@sha-rath
Copy link

Yes, I applied the EnvoyFilter to enable TLSv1.3 and it seems to be working fine. I did a couple of tests after applying the retry policies both in the simple TLS and mTLS modes, but could not find any issues you were mentioning @PiotrSikora . I also tried certificate rotation and querying the server with an invalid certificate, but they seem to work fine as expected. Could you please explain those edge cases around retires that would not work with TLSv1.3 on the client-side? It would be really helpful, thanks!

@hobbytp
Copy link

hobbytp commented Aug 10, 2021

@PiotrSikora is this issue a boringssl issue or envoy proxy issue?
Is it related to issue https://boringssl-review.googlesource.com/c/boringssl/+/37304

HelloRetryRequest getter: 

Adds getter indicating whether HelloRetryRequest was triggered
during TLSv1.3 handshake.

thanks for clarification.

@romanholidaypancakes
Copy link

My traffic passes through envoy to nginx, once nginx sets ssl_protocols TLSv1.3;, the following error appears
image

I set ssl_protocols TLSv1.2; everything works fine

@hobbytp
Copy link

hobbytp commented Mar 15, 2022

istio update it here: istio/istio#37540

@inflatador
Copy link

My traffic passes through envoy to nginx, once nginx sets ssl_protocols TLSv1.3;, the following error appears image

I set ssl_protocols TLSv1.2; everything works fine

I'm seeing the same thing; has anyone had success proxying TLS 1.3 using envoy static config? If so, would you mind sharing your config?

Here's my current config for reference .

@Brunomachadob
Copy link

Brunomachadob commented Apr 27, 2023

My traffic passes through envoy to nginx, once nginx sets ssl_protocols TLSv1.3;, the following error appears image
I set ssl_protocols TLSv1.2; everything works fine

I'm seeing the same thing; has anyone had success proxying TLS 1.3 using envoy static config? If so, would you mind sharing your config?

Here's my current config for reference .

From a quick test here locally, I was also getting errors to connect one envoy to another using 1.3
My test setup consists of 2 Envoys (envoy1 & envoy2), both v1.23.1, the first proxying to the second which does a direct response.

If I set on the envoy1:

      transport_socket:
        name: envoy.transport_sockets.tls
        typed_config:
          '@type': type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
          common_tls_context:
            tls_params:
              tls_minimum_protocol_version: TLSv1_3

and try to hit the path that proxies to the envoy2, I get:

curl -k https://localhost:1443/envoy2 
upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: TLS error: 268435736:SSL routines:OPENSSL_internal:NO_SUPPORTED_VERSIONS_ENABLED%

and on access logs, only information about the first envoy:

envoy1  | [2023-04-27T12:33:28,323Z] [INFO] downstream.tls.version=TLSv1.3 upstream.tls.version=- http.protocol=HTTP/1.1 http.request.method=GET url.path=/ http.response.status_code=503

Now, if I set on the envoy1:

      transport_socket:
        name: envoy.transport_sockets.tls
        typed_config:
          '@type': type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
          common_tls_context:
            tls_params:
              tls_minimum_protocol_version: TLSv1_3
              tls_maximum_protocol_version: TLSv1_3

Then I get a success

curl -k https://localhost:1443/envoy2
Hello from envoy2 
envoy1  | [2023-04-27T12:35:29,887Z] [INFO] downstream.tls.version=TLSv1.3 upstream.tls.version=TLSv1.3 http.protocol=HTTP/1.1 http.request.method=GET url.path=/ http.response.status_code=200
envoy2  | [2023-04-27T12:35:29,901Z] [INFO] downstream.tls.version=TLSv1.3 upstream.tls.version=- http.protocol=HTTP/2 http.request.method=GET url.path=/ http.response.status_code=200

For some reason envoy breaks if you do not set both tls_minimum_protocol_version and tls_maximum_protocol_version.

Someone mentioned this same thing here: https://phabricator.wikimedia.org/T246083

@ggreenway
Copy link
Contributor

Reference for the TLS 1.3 behavior discussed: https://www.rfc-editor.org/rfc/rfc8446#appendix-E.1.2

This should only be an issue when doing mTLS. If the application protocol (such as HTTP) does retries, it should work fine. The cases where it could be problematic are things like tcp_proxy, where it will only retry when Envoy detects a failure to establish a connection, which is what cannot be detected in this case.

@tsaarni
Copy link
Member

tsaarni commented Aug 9, 2023

I experimented with this by using untrusted client certificate to trigger the difference between TLSv1.2 and TLSv1.3 handshake. In HTTP request use case, we currently lose the detailed error message:

With TLSv1.2 (the default) the error message has the TLS alert info:

upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: TLS error: 268436498:SSL routines:OPENSSL_internal:SSLV3_ALERT_BAD_CERTIFICATE

but with TLSv1.3 it will be not shown:

upstream connect error or disconnect/reset before headers. reset reason: connection termination

The above messages are coming from here.

@ggreenway wrote

If the application protocol (such as HTTP) does retries, it should work fine. The cases where it could be problematic are things like tcp_proxy, where it will only retry when Envoy detects a failure to establish a connection, which is what cannot be detected in this case.

In case of tcp_proxy, wouldn't TLS handshake be between the downstream client and upstream server, and Envoy would be unaware of the TLS version? Edit: I now realized you meant re-encrypting at Envoy.

In general, I can see the problem with the TLSv1.3 handshake from our perspective, requiring client to read before it can discover the error. But I'm still puzzled about retry specifically becoming a problem, since authentication failure is not likely expected to be temporary and retried. Do you have any insights in this @ggreenway, @PiotrSikora?

@PiotrSikora
Copy link
Contributor Author

PiotrSikora commented Aug 9, 2023

In general, I can see the problem with the TLSv1.3 handshake from our perspective, requiring client to read before it can discover the error. But I'm still puzzled about retry specifically becoming a problem, since authentication failure is not likely expected to be temporary and retried. Do you have any insights in this @ggreenway, @PiotrSikora?

The problem is that authentication failure is not part of the TLS 1.3 handshake, so if the proxy "successfully" connects to the upstream and starts forwarding data from downstream, then that data is not stored in the proxy. So, if the proxy later receives TLS alert that its client certificate was rejected, then is has no way to retry with another upstream, because the data that is supposed to forwarded is gone.

Even if we wanted to buffer that data for some period of time, that amount is effectively unbounded, since in TLS 1.3 there is no positive signal that the client certificate was accepted.

While the authentication failure is unlikely to be temporary with a given endpoint, the whole point of load balancers and retrying is that another endpoint in the cluster might accept it. This is especially true when running multi-cluster/-region/-cloud deployments with canaries.

@tsaarni
Copy link
Member

tsaarni commented Aug 11, 2023

Here is a summary of the conditions which the missing retry attempt can be observed (for those who want to reproduce the issue):

Cluster:

  • Cluster is configured with envoy.transport_sockets.tls in order to originate TLS connection from Envoy towards upstream cluster.
  • UpstreamTlsContext...tls_maximum_protocol_version is set to TLSv1_3.
  • Upstream service selects TLSv1.3 when offered.
  • UpstreamTlsContext is configured to use client certificate for mutual TLS authentication.
  • Upstream service rejects the client certificate.
  • Cluster has been configured with more than one endpoint.

HTTP connection manager:

  • HTTPConnectionManager...retry_policy.retry_on is set to connection-failure (not e.g. reset which triggers retry also with TLSv1.3).

TCP proxy

  • TcpProxy.max_connect_attempts is set to value greater than 1 (default is 1).

After upstream refuses the connection attempt by sending TLS alert and closing the connection, retry towards another endpoint is triggered for TLSv1.2 connections but not for TLSv1.3 connections.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

10 participants