
TLS Handshake error with vault agent injector #275

Closed
rishabh-arya95 opened this issue Aug 3, 2021 · 12 comments · Fixed by #350
Labels
bug Something isn't working

Comments

@rishabh-arya95

rishabh-arya95 commented Aug 3, 2021

I am running the Vault Agent Injector with auto-TLS enabled, configured to use an external Vault server running on my host:

helm install vault hashicorp/vault \
    --set "injector.externalVaultAddr=http://${HOST_PRIVATE_IP}:8200"

Everything was working fine, but after about 24 hours I suddenly started getting this bad-certificate error.

I have even tried using the `vault.hashicorp.com/tls-skip-verify` annotation, but the result is the same.
These are the agent injector logs:

kubectl logs -f vault-agent-injector-688d969fd6-fnxg5 -n vault
2021-08-02T13:21:51.952Z [INFO]  handler: Starting handler..
2021-08-02T13:21:51.961Z [INFO]  handler.auto-tls: Generated CA
Listening on ":8080"...
2021-08-02T13:21:51.968Z [INFO]  handler.certwatcher: Updated certificate bundle received. Updating certs...
2021-08-02T14:24:33.851Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=10s
2021-08-02T14:29:55.076Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=10s
2021-08-02T14:39:55.056Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=10s
2021-08-02T14:40:23.280Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=10s
2021-08-02T14:43:21.314Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=10s
2021-08-02T14:43:46.477Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=10s
2021-08-02T14:57:04.122Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=10s
2021-08-02T14:57:27.107Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=10s
2021-08-02T15:01:42.361Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=10s
2021-08-02T15:02:09.823Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=10s
2021-08-02T15:11:48.065Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=10s
2021-08-02T15:21:37.688Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=10s
2021-08-02T15:35:41.022Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=10s
2021-08-02T15:43:42.305Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=10s
2021-08-02T15:44:04.952Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=10s
2021-08-02T15:45:44.332Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=10s
2021-08-02T15:50:16.360Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=10s
2021-08-02T15:51:39.813Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=10s
2021-08-02T15:55:00.744Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=10s
2021-08-02T15:55:18.417Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=10s
2021-08-02T16:03:29.690Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=10s
2021-08-02T16:04:04.751Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=10s
2021-08-02T16:04:25.749Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=10s
2021-08-02T16:08:38.321Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=10s
2021-08-02T16:12:49.566Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=10s
2021-08-02T16:19:58.982Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=10s
2021-08-02T16:20:17.434Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=10s
2021-08-02T16:21:46.631Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=10s
2021-08-03T07:49:19.399Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=10s
2021-08-03T07:52:39.347Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=10s
2021-08-03T10:49:37.464Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=10s
2021-08-03T10:50:45.909Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=10s
2021-08-03T11:04:06.630Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=10s
2021-08-03T11:05:08.257Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=10s
2021-08-03T11:11:42.147Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=10s
2021-08-03T11:12:00.599Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=10s
2021-08-03T11:18:34.290Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=10s
2021-08-03T11:20:16.163Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=10s
2021-08-03T11:23:51.858Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=10s
2021-08-03T13:01:44.956Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=10s
2021-08-03T13:36:24.566Z [ERROR] handler: http: TLS handshake error from 172.17.0.1:17971: remote error: tls: bad certificate
2021-08-03T13:51:32.001Z [ERROR] handler: http: TLS handshake error from 172.17.0.1:20802: remote error: tls: bad certificate
2021-08-03T13:56:14.122Z [ERROR] handler: http: TLS handshake error from 172.17.0.1:36180: remote error: tls: bad certificate
2021-08-03T13:57:10.726Z [ERROR] handler: http: TLS handshake error from 172.17.0.1:4425: remote error: tls: bad certificate
2021-08-03T14:01:01.632Z [ERROR] handler: http: TLS handshake error from 172.17.0.1:10077: remote error: tls: bad certificate
2021-08-03T14:01:26.954Z [ERROR] handler: http: TLS handshake error from 172.17.0.1:50072: remote error: tls: bad certificate
2021-08-03T14:01:54.899Z [ERROR] handler: http: TLS handshake error from 172.17.0.1:17536: remote error: tls: bad certificate
2021-08-03T14:12:29.850Z [ERROR] handler: http: TLS handshake error from 172.17.0.1:12749: remote error: tls: bad certificate
2021-08-03T14:12:52.509Z [ERROR] handler: http: TLS handshake error from 172.17.0.1:1626: remote error: tls: bad certificate
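
One way to check whether the CA bundle registered with the Kubernetes API server has drifted from what the injector is actually serving is to compare their validity dates. A rough sketch, assuming the default Helm release name `vault` installed into the `vault` namespace (resource names may differ in other setups):

```sh
# Decode the CA bundle the API server uses to verify the injector webhook
# and print its validity window.
kubectl get mutatingwebhookconfiguration vault-agent-injector-cfg \
  -o jsonpath='{.webhooks[0].clientConfig.caBundle}' \
  | base64 -d | openssl x509 -noout -subject -dates

# Compare it with the certificate the injector pod is serving right now
# (the injector listens on container port 8080, per the log above).
kubectl -n vault port-forward deploy/vault-agent-injector 8443:8080 &
sleep 2
openssl s_client -connect 127.0.0.1:8443 </dev/null 2>/dev/null \
  | openssl x509 -noout -subject -dates
kill %1
```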
rishabh-arya95 added the bug label Aug 3, 2021
@RakeshRaj97

Did you get this sorted?

@birelian

Hi,

Any update on this? We can confirm it's happening with 0.16.0 deployed on AKS 1.21.

Thanks!

@kevinlmadison

Having the same issue with Vault 1.10 on Kubernetes 1.22.6 (RKE2).

@swenson

swenson commented May 20, 2022

The auto-TLS certificate regenerates every 24 hours, which sounds like it is probably related to the problem.

I'm having trouble reproducing this, even on the helm chart version 0.16.0.

Are there any other steps you take to see this issue?

What I am doing:

* Vault running on my machine
* Set up the Kubernetes auth method as per https://learn.hashicorp.com/tutorials/vault/kubernetes-external-vault?in=vault/kubernetes#install-the-vault-helm-chart-configured-to-address-an-external-vault
* Get the injector running with

helm install vault vault --repo https://helm.releases.hashicorp.com \
  --version=0.16.0 \
  --set server.enabled=false \
  --set injector.enabled=true \
  --set "injector.externalVaultAddr=http://192.168.65.2:8200"

* Continually deploy and delete a pod that injects a secret (a minimal loop for this step is sketched at the end of this comment)

But I never see the failure mentioned after the certificate is refreshed.
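
A minimal version of that deploy/delete loop might look like the sketch below. The role name (`my-role`) and secret path (`secret/data/myapp/config`) are placeholders and need to match an existing Kubernetes auth role and secret in Vault:

```sh
# Repeatedly create and delete a pod that requests secret injection, so the
# mutating webhook keeps getting exercised across the 24h certificate rotation.
while true; do
  cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: injector-test
  annotations:
    vault.hashicorp.com/agent-inject: "true"
    vault.hashicorp.com/role: "my-role"
    vault.hashicorp.com/agent-inject-secret-config: "secret/data/myapp/config"
spec:
  containers:
    - name: app
      image: nginx
EOF
  sleep 60
  kubectl delete pod injector-test --wait=true
done
```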

@xanmanning

We are seeing the same on the clusters we've upgraded; however, it seems less frequent on the two clusters where we have a one-minute cronjob continuously deploying and deleting a pod that injects a secret.

Running on GKE 1.21

Deployment is effectively:

helm install vault vault --repo https://helm.releases.hashicorp.com \
  --set server.enabled=false \
  --set injector.enabled=true \
  --set injector.image.tag=0.16.0 \
  --set "injector.externalVaultAddr=https://SOME_VAULT_ADDRESS:8200"

I'm going to perform some tests on a K3D cluster to see if there's a pattern.

@birelian


Sorry @swenson for not being precise. When I said 0.16.0 I was talking about the vault-k8s version. The configuration that failed was Helm Chart vault-0.18.0 with Vault 1.9.6 and injector 0.16.0. This combination was failing in two different AKS clusters running K8s 1.21.

After we downgraded the injector to 0.15.0, the error seems to be gone in both clusters (at least during the last week).

Thanks!

@eddiehoffman


We have the same issue and have had to revert to 0.15.0.

@xanmanning

Managed to re-create this using a k3d cluster locally.

I created a cluster and deployed Vault + the Vault Agent Injector.

I set up a cronjob pulling Vault secrets every minute for over 24 hours with no issue. I then stopped the cronjob and noted the time the certificate was last updated (15:58 UTC on the 22nd), waited until ~16:57 UTC on the 23rd (about 5 minutes ago), and ran a job from my cronjob.

Screenshot from 2022-05-23 17-56-46

Screenshot from 2022-05-23 18-02-36

swenson pushed a commit that referenced this issue May 23, 2022
Once the `time.NewTimer()` expires, calls to `timer.Stop()` will return
`false`, but the channel will have nothing in it, causing `<-timer.C` to
hang forever.

This is hinted at by the docs, even though they suggest `timer.Stop()`
should return true in that case.

We change to a non-blocking drain so that we won't block forever.

This manifests in never updating the certificate after it expires,
causing TLS handshake errors.

Fixes #275
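
For reference, the hang and the non-blocking drain described in the commit message look roughly like this. A minimal standalone sketch of the general `time.Timer` idiom, not the actual vault-k8s certificate-watcher code:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	d := 50 * time.Millisecond
	timer := time.NewTimer(d)

	// Simulate the state after the timer has already fired and its tick has
	// been consumed elsewhere (e.g. by a select loop).
	<-timer.C

	// The blocking drain would hang here: Stop() returns false because the
	// timer already fired, but timer.C is empty, so "<-timer.C" never returns.
	//
	//   if !timer.Stop() {
	//       <-timer.C // blocks forever
	//   }

	// Non-blocking drain: try to empty the channel, but never wait on it.
	if !timer.Stop() {
		select {
		case <-timer.C:
		default:
		}
	}

	// The timer can now be reset safely and will fire again.
	timer.Reset(d)
	fmt.Println("fired again at", <-timer.C)
}
```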
@swenson

swenson commented May 23, 2022

I believe we have found the underlying cause for this and fixed it in the last few PRs.

I think we'll cut a new release of vault-k8s soon to address these issues (not sure exactly when, but I'd like to do it after this week, possibly once a few more fixes get in).

@birelian


Thank you!

@Preen

Preen commented May 25, 2022

Can confirm I also had the problems described above and that downgrading to 0.15 worked.

@lucasscheepers

Still experiencing this problem: the Vault Agent Injector throws a `tls: bad certificate` error every 24 hours.

@swenson In which version did you fix this bug?
