Error: failed to download "https://10.43.0.1:443/static/charts/traefik-1.81.0.tgz" #1817

Closed
badaniya opened this issue May 21, 2020 · 14 comments

Comments

@badaniya

badaniya commented May 21, 2020

Version:
k3s version v1.18.2+k3s1 (698e444)

K3s arguments:
ExecStart=/usr/local/bin/k3s server --kube-controller-manager-arg pod-eviction-timeout=1m --disable local-storage,metrics-server --disable-cloud-controller --data-dir /var/lib/rancher/k3s --disable traefik --kube-apiserver-arg feature-gates="ServiceTopology=true,EndpointSlice=true" --datastore-endpoint postgres://postgres:postgres@10.177.205.14:5432/kubernetes

Describe the bug
During k3s service startup, we have a custom traefik manifest under the /var/lib/rancher/k3s/server/manifests directory so that the helm job installs the custom traefik service. While the helm job was installing the custom traefik manifest, retrieval of the traefik helm chart from the local kubernetes clusterIP failed. This appears to be due to an internal kubernetes certificate validation failure.

To Reproduce
It is hard to reproduce but comes up on occasion.

  1. Start k3s with above arguments (--disable traefik, but provide a custom traefik manifest at /var/lib/rancher/k3s/server/manifests)
  2. Monitor helm install job in kube-system namespace.

Expected behavior
Retrieval of the traefik helm chart from the local kubernetes cluster should succeed so that the custom traefik manifest can be installed.

Actual behavior
The retrieval of the traefik helm chart failed.

Additional context / logs

HELM INSTALL JOB FAILURE:
========================
root@node1:~# k3s kubectl -n kube-system get pods -o wide
NAME                         READY   STATUS             RESTARTS   AGE     IP          NODE    NOMINATED NODE   READINESS GATES
coredns-8655855d6-hnltp      1/1     Running            0          7m46s   10.42.1.2   node1   <none>           <none>
helm-install-traefik-4gqsl   0/1     CrashLoopBackOff   6          7m51s   10.42.0.2   node1   <none>           <none>

HELM CHART RETRIEVAL FAILURE:
=============================
root@node1:~# k3s kubectl -n kube-system logs helm-install-traefik-4gqsl
[storage/driver] 2020/05/21 17:21:46 list: failed to list: Get https://10.43.0.1:443/api/v1/namespaces/kube-system/secrets?labelSelector=OWNER%3DTILLER: dial tcp 10.43.0.1:443: connect: no route to host
Error: failed to download "https://10.43.0.1:443/static/charts/traefik-1.81.0.tgz" (hint: running `helm repo update` may help)

APPEARS TO BE A CERT ISSUE:
===========================
root@node2:~# k3s kubectl -n kube-system run --generator=run-pod/v1 --rm utils -it --image arunvelsriram/utils bash
Flag --generator has been deprecated, has no effect and will be removed in the future.
If you don't see a command prompt, try pressing enter.
root@utils:/# curl -X GET https://10.43.0.1:443/api/v1/namespaces/kube-system/secrets?labelSelector=OWNER%3DTILLER
curl: (60) SSL certificate problem: unable to get local issuer certificate
More details here: https://curl.haxx.se/docs/sslcerts.html

curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.

root@utils:/# wget https://10.43.0.1:443/static/charts/traefik-1.81.0.tgz
--2020-05-21 19:33:20--  https://10.43.0.1/static/charts/traefik-1.81.0.tgz
Connecting to 10.43.0.1:443... connected.
ERROR: cannot verify 10.43.0.1's certificate, issued by ‘CN=k3s-server-ca@1590081432’:
  Unable to locally verify the issuer's authority.
To connect to 10.43.0.1 insecurely, use `--no-check-certificate'.

openssl s_client -connect 10.43.0.1:443

root@node1:~# openssl s_client -connect 10.43.0.1:443
CONNECTED(00000005)
depth=0 O = k3s, CN = k3s
verify error:num=20:unable to get local issuer certificate
verify return:1
depth=0 O = k3s, CN = k3s
verify error:num=21:unable to verify the first certificate
verify return:1
---
Certificate chain
 0 s:O = k3s, CN = k3s
   i:CN = k3s-server-ca@1590081432
---
Server certificate
-----BEGIN CERTIFICATE-----
MIIB0jCCAXigAwIBAgIIQr14kBWP90IwCgYIKoZIzj0EAwIwIzEhMB8GA1UEAwwY
azNzLXNlcnZlci1jYUAxNTkwMDgxNDMyMB4XDTIwMDUyMTE3MTcxMloXDTIxMDUy
MTE3MTczNVowHDEMMAoGA1UEChMDazNzMQwwCgYDVQQDEwNrM3MwWTATBgcqhkjO
PQIBBggqhkjOPQMBBwNCAAQixr+yKVz1HdHoOMaFDjL+5dwXEetmDZTas14Cy1iR
pA3SRlMQ8djnm4wuWNuGKyMhLyUpgiegLCl36YmR8Kw+o4GcMIGZMA4GA1UdDwEB
/wQEAwIFoDATBgNVHSUEDDAKBggrBgEFBQcDATByBgNVHREEazBpggprdWJlcm5l
dGVzghJrdWJlcm5ldGVzLmRlZmF1bHSCJGt1YmVybmV0ZXMuZGVmYXVsdC5zdmMu
Y2x1c3Rlci5sb2NhbIIJbG9jYWxob3N0hwQKsc2ShwQKsc0PhwQKKwABhwR/AAAB
MAoGCCqGSM49BAMCA0gAMEUCIQDt2ENxLFNIDu5yC8BO9vc2U7WabU+OIA9artyx
o4YhNQIgZ5Zhp0mzxftHGQnIkhL6+em7W8eJZTrx3GDH3UzVfcs=
-----END CERTIFICATE-----
subject=O = k3s, CN = k3s

issuer=CN = k3s-server-ca@1590081432

---
No client certificate CA names sent
Requested Signature Algorithms: RSA-PSS+SHA256:ECDSA+SHA256:Ed25519:RSA-PSS+SHA384:RSA-PSS+SHA512:RSA+SHA256:RSA+SHA384:RSA+SHA512:ECDSA+SHA384:ECDSA+SHA512:RSA+SHA1:ECDSA+SHA1
Shared Requested Signature Algorithms: RSA-PSS+SHA256:ECDSA+SHA256:Ed25519:RSA-PSS+SHA384:RSA-PSS+SHA512:RSA+SHA256:RSA+SHA384:RSA+SHA512:ECDSA+SHA384:ECDSA+SHA512
Peer signing digest: SHA256
Peer signature type: ECDSA
Server Temp Key: X25519, 253 bits
---
SSL handshake has read 908 bytes and written 421 bytes
Verification error: unable to verify the first certificate
---
New, TLSv1.3, Cipher is TLS_AES_256_GCM_SHA384
Server public key is 256 bit
Secure Renegotiation IS NOT supported
Compression: NONE
Expansion: NONE
No ALPN negotiated
Early data was not sent
Verify return code: 21 (unable to verify the first certificate)
---
---
Post-Handshake New Session Ticket arrived:
SSL-Session:
    Protocol  : TLSv1.3
    Cipher    : TLS_AES_256_GCM_SHA384
    Session-ID: 3285BD4E72665E39FBB50BF56A0140D200720F9F5029914842F87FD933FD64DF
    Session-ID-ctx:
    Resumption PSK: 04E65FC6AF49DEAE428FB9ACAE936C4F8460880F18D7819712462C22CEBE9242A54D3CC7FABB1E10CE5BE299AA2133FE
    PSK identity: None
    PSK identity hint: None
    SRP username: None
    TLS session ticket lifetime hint: 604800 (seconds)
    TLS session ticket:
    0000 - eb 92 15 cb 8c 92 ee d8-cf 49 78 21 f0 d0 db d1   .........Ix!....
    0010 - bc b3 b9 b7 f5 98 cc 6e-1e 4d 4e 57 d8 6c 0b 72   .......n.MNW.l.r
    0020 - ff 60 f7 41 c1 7e cb fe-a0 cd 4f a7 11 85 52 ee   .`.A.~....O...R.
    0030 - 12 10 de 57 20 de 14 fe-eb ac d4 8e e7 8c d7 98   ...W ...........
    0040 - 0a a2 da d2 5e 5f bc d4-36 fc 03 23 4e 89 27 61   ....^_..6..#N.'a
    0050 - 2d 44 69 f9 20 05 20 d7-c6 0a ce 78 35 ca fa 32   -Di. . ....x5..2
    0060 - fc 5b 22 c7 04 bd 50 2d-4a df d6 75 2b fc c0 02   .["...P-J..u+...
    0070 - c9 1e 5f f0 39 90 52 b9-77 c9 67 fa 87 7c 86 82   .._.9.R.w.g..|..
    0080 - 10                                                .

    Start Time: 1590089846
    Timeout   : 7200 (sec)
    Verify return code: 21 (unable to verify the first certificate)
    Extended master secret: no
    Max Early Data: 0
---
read R BLOCK

CUSTOM TRAEFIK MANIFEST:
========================
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: traefik
  namespace: kube-system
spec:
  chart: https://%{KUBERNETES_API}%/static/charts/traefik-1.81.0.tgz
  set:
    rbac.enabled: "true"
    ssl.enabled: "true"
    metrics.prometheus.enabled: "true"
    kubernetes.ingressEndpoint.useDefaultPublishedService: "true"
    image: "rancher/library-traefik"
    ssl.tlsMinVersion: "VersionTLS12"
    ssl.cipherSuites: "{TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA,TLS_RSA_WITH_AES_128_GCM_SHA256,TLS_RSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA}"
    ssl.defaultCert: <Our own self-signed base64 encoded cert> 
    ssl.defaultKey: <Our own self-signed base64 encoded key> 
@samirsss

Bump, since I am also seeing it.

Not installing Tiller due to 'client-only' flag having been set
Happy Helming!
++ helm_v2 ls --all '^traefik$' --output json
++ jq -r '.Releases | length'
[main] 2020/05/23 00:52:51 Starting Tiller v2.12.3 (tls=false)
[main] 2020/05/23 00:52:51 GRPC listening on 127.0.0.1:44134
[main] 2020/05/23 00:52:51 Probes listening on :44135
[main] 2020/05/23 00:52:51 Storage driver is Secret
[main] 2020/05/23 00:52:51 Max history per release is 0
[storage] 2020/05/23 00:52:51 listing all releases with filter
[storage/driver] 2020/05/23 00:52:55 list: failed to list: Get https://10.43.0.1:443/api/v1/namespaces/kube-system/secrets?labelSelector=OWNER%3DTILLER: dial tcp 10.43.0.1:443: connect: no route to host
Error: Get https://10.43.0.1:443/api/v1/namespaces/kube-system/secrets?labelSelector=OWNER%!D(MISSING)TILLER: dial tcp 10.43.0.1:443: connect: no route to host

+ EXIST=
+ '[' '' == 1 ']'
+ '[' '' == v2 ']'
+ helm_repo_init
+ grep -q -e 'https?://'
+ echo 'chart path is a url, skipping repo update'
+ helm_v3 repo remove stable
chart path is a url, skipping repo update
Error: no repositories configured
+ true
+ return
+ helm_update install --set-string image=rancher/library-traefik --set kubernetes.ingressEndpoint.useDefaultPublishedService=true --set metrics.prometheus.enabled=true --set rbac.enabled=true --set-string 'ssl.cipherSuites={TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA,TLS_RSA_WITH_AES_128_GCM_SHA256,TLS_RSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA}' --set ssl.enabled=true --set-string ssl.tlsMinVersion=VersionTLS12
+ '[' helm_v3 == helm_v3 ']'
++ helm_v3 ls --all-namespaces --all -f '^traefik$' --output json
++ jq -r '"\(.[0].app_version),\(.[0].status)"'
++ tr '[:upper:]' '[:lower:]'
+ LINE=null,null
++ echo null,null
++ cut -f1 -d,
+ INSTALLED_VERSION=null
++ echo null,null
++ cut -f2 -d,
+ STATUS=null
+ '[' -e /config/values.yaml ']'
+ '[' install = delete ']'
+ '[' -z null ']'
+ '[' null = deployed ']'
+ '[' null = failed ']'
+ '[' null = deleted ']'
+ helm_v3 install --set-string image=rancher/library-traefik --set kubernetes.ingressEndpoint.useDefaultPublishedService=true --set metrics.prometheus.enabled=true --set rbac.enabled=true --set-string 'ssl.cipherSuites={TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA,TLS_RSA_WITH_AES_128_GCM_SHA256,TLS_RSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA}' --set ssl.enabled=true --set-string ssl.tlsMinVersion=VersionTLS12 traefik https://10.43.0.1:443/static/charts/traefik-1.81.0.tgz
Error: failed to download "https://10.43.0.1:443/static/charts/traefik-1.81.0.tgz" (hint: running `helm repo update` may help)

k3s kubectl -n kube-system get pods
NAME READY STATUS RESTARTS AGE
coredns-8655855d6-vczbg 1/1 Running 0 3m35s
helm-install-traefik-k4bld 0/1 CrashLoopBackOff 4 3m55s

After deleting the helm traefik install pod

k3s kubectl -n kube-system delete pod helm-install-traefik-k4bld
pod "helm-install-traefik-k4bld" deleted

The traefik pods get installed

k3s kubectl -n kube-system get pods
NAME READY STATUS RESTARTS AGE
coredns-8655855d6-vczbg 1/1 Running 0 4m6s
traefik-bcccd7df9-98twt 0/1 ContainerCreating 0 4s
svclb-traefik-8z5sk 0/2 ContainerCreating 0 4s
helm-install-traefik-rswk7 0/1 Completed 0 12s
svclb-traefik-j9gvn 2/2 Running 0 4s
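
The workaround shown above (delete the crash-looping helm install pod so that a replacement is created) could be scripted roughly as follows. This is only a sketch: the function name `crashlooping_helm_pods` is made up for this example, and it assumes the default `kubectl get pods` column layout where STATUS is the third column. It works around the symptom and does not address the underlying connectivity problem.

```shell
#!/bin/sh
# Hypothetical helper (not part of k3s): read `kubectl get pods` output on
# stdin and print the names of helm-install-* pods whose STATUS column
# (third field) is CrashLoopBackOff. The header line never matches.
crashlooping_helm_pods() {
  awk '$1 ~ /^helm-install-/ && $3 == "CrashLoopBackOff" { print $1 }'
}

# Usage against a live cluster (left commented out because it deletes pods;
# `xargs -r` skips the delete when nothing matched and may not exist on
# every platform):
# k3s kubectl -n kube-system get pods --no-headers \
#   | crashlooping_helm_pods \
#   | xargs -r k3s kubectl -n kube-system delete pod
```

Fed the second pod listing from this comment, the filter prints only the helm-install-traefik-k4bld pod name.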

@brandond
Member

brandond commented May 22, 2020

I see a lot of dial tcp 10.43.0.1:443: connect: no route to host which normally would indicate that something's wrong with iptables or connectivity between the agent and server. How many nodes do you have, and are they agent or server?

I also see that you are getting SSL errors in other places when running curl commands - which is expected. The certificates used internally by kubernetes are self-signed and would not be trusted by curl unless you've taken steps to trust the k3s root CA. Additionally, the api server requires authentication, which your curl command does not supply.

This command will still fail because your utils pod's service account doesn't have access to read secrets, but it'll show you that things are working:

kubectl -n kube-system run --rm utils -it --image arunvelsriram/utils -- /bin/bash

curl -X GET 'https://10.43.0.1:443/api/v1/namespaces/kube-system/secrets?labelSelector=OWNER%3DTILLER' --header "Authorization: Bearer $(cat /run/secrets/kubernetes.io/serviceaccount/token)" --cacert /run/secrets/kubernetes.io/serviceaccount/ca.crt
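
The distinction drawn in this comment, between a connectivity failure and an expected TLS verification failure, can be sketched as a small log classifier. This is an illustrative helper only: the function name `classify_failure` and the labels `network`, `tls`, and `unknown` are invented for this example, and the patterns are just the error strings quoted earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of k3s, helm, or curl): map one error line
# to a rough category based on the messages seen in this issue.
classify_failure() {
  case "$1" in
    *"no route to host"*)
      # The packet never reached the apiserver: look at iptables,
      # firewalls, and node-to-node connectivity.
      echo network ;;
    *"unable to get local issuer certificate"*|*"unable to verify the first certificate"*|*"cannot verify"*)
      # The client does not trust the cluster CA: expected unless the
      # CA bundle is supplied (for example with curl --cacert).
      echo tls ;;
    *)
      echo unknown ;;
  esac
}
```

For example, `classify_failure "$(k3s kubectl -n kube-system logs <pod> | tail -n 1)"` would label the last log line of a failing pod, where `<pod>` is a placeholder.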

@badaniya
Author

@brandond I have two nodes; both are servers.

I realized later on about my silly misdiagnosis of a cert validation issue here. Thanks for providing the right curl parameters to test it properly.

Any more insights on how to debug this connectivity issue?

@brandond
Member

A 'no route to host' error from the apiserver endpoint is pretty odd. What distro and architecture are you on? Does k3s check-config show anything unusual?

@alekc

alekc commented May 24, 2020

also, those 2 nodes, are they running in the same network environment? Are they behind nat? I was hit recently by a bug (feature?) in flannel related to private/public addresses which would explain your issue. #1824

@samirsss

Hi @brandond this is on ubuntu 18.04 server/vms that @badaniya and I have been seeing issues on. And @alekc yes both the nodes are on the same network environment.

@brandond
Member

Is there a local firewall (ufw) or cloud provider firewall (security groups, etc) that might be blocking some traffic between nodes?

@samirsss

@brandond no ufw. Also, this is on-prem for us, so there is no cloud provider firewall either. For now we've added some retries to see if that helps fix the issue for us.
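
The comment does not say what exactly was retried or how. As a rough sketch of the general idea, a generic retry wrapper in POSIX shell might look like this; the function name `retry` and its argument order are assumptions made for this example.

```shell
#!/bin/sh
# Hypothetical helper: retry MAX DELAY CMD...
# Runs CMD until it succeeds or MAX attempts have been made, sleeping DELAY
# seconds between attempts. Returns 0 on success and 1 if every attempt failed.
# Note: plain sh has no local variables, so max, delay, and attempt are global.
retry() {
  max=$1
  delay=$2
  shift 2
  attempt=1
  while ! "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 0
}

# Example (commented out because it needs a cluster): wait for the API to
# answer before doing anything that depends on it.
# retry 10 5 k3s kubectl get nodes
```

Retries can hide a transient startup race, but they will not help if a firewall rule is permanently dropping traffic to 10.43.0.1.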

@Orabig

Orabig commented Aug 23, 2020

I have the exact same error with a very simple configuration with one node only.

k3s was installed on a CentOS Linux (7.8) with curl -sfL https://get.k3s.io | sh -s - --docker (because docker was already installed on this host).

It seems to work for the most part, as it is possible to deploy pods, services, persistent volumes... However, traefik is not working:

$ kubectl get pods -n kube-system
NAME                                     READY   STATUS             RESTARTS   AGE
coredns-7944c66d8d-l4vzf                 0/1     Running            0          5h45m
metrics-server-7566d596c8-5wf84          0/1     CrashLoopBackOff   8          11m
local-path-provisioner-6d59f47c7-2pbjw   0/1     CrashLoopBackOff   73         5h45m
helm-install-traefik-xthfq               0/1     CrashLoopBackOff   73         5h45m

The logs in "helm-install-traefik-xthfq" are the same as the ones posted by samirsss above.

@Gerthum

Gerthum commented Feb 25, 2021

Ran into the same issue and after some investigation this solved it for me:

https://forums.docker.com/t/no-route-to-host-network-request-from-container-to-host-ip-port-published-from-other-container/39063/17

Stopping the firewalld service confirmed the issue for me. I then had to allow the k3s subnet (10.42.x.x) access to the host to get things working while the firewalld service is running.
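
The comment does not give the exact firewalld commands. One possible way to allow the pod subnet is to add it as a source in firewalld's `trusted` zone; the sketch below only prints the commands so they can be reviewed first. Assumptions: 10.42.0.0/16 is the pod network (the k3s default, matching the 10.42.x.x addresses above), the `trusted` zone is acceptable in your environment, and the function name `firewalld_trust_cmds` is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the firewalld commands that add a
# source CIDR to the trusted zone and reload the firewall.
firewalld_trust_cmds() {
  cidr=$1
  echo "firewall-cmd --permanent --zone=trusted --add-source=$cidr"
  echo "firewall-cmd --reload"
}

# Assumed default k3s pod network; adjust if the cluster CIDR was changed.
firewalld_trust_cmds 10.42.0.0/16
```

After reviewing the output, it can be piped to `sh` as root to apply it.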

@ChristianCiach

I had the same issue on my freshly provisioned CentOS 8 box. The output of kubectl get pods -n kube-system looked exactly like the example by @Orabig.

Even though my CentOS installation was completely new, one thing differed from a vanilla installation of CentOS 8: for reasons unknown to me, our IT people insist on installing the package iptables-services and enabling the iptables service it contains. I am not completely sure what this service does, but it changes some iptables rules. Please see the attached files, which show the output of iptables-save with the iptables-services package installed (where containers cannot connect to 10.43.0.1) and without it (where k3s works fine).

iptables-save_without-service.txt
iptables-save_with-service.txt

@ChristianCiach

ChristianCiach commented Apr 16, 2021

If iptables-services is installed and the iptables service is running, we had to add this line to /etc/sysconfig/iptables:

-A INPUT -p tcp -m state --state NEW -m multiport --dports 443,6445 -s 10.42.0.0/16 -d 127.0.0.1/32 -j ACCEPT

Shoutout to #566 (comment) for pointing me in this direction. I agree with the author of that comment that k3s should do this automatically.

EDIT: Actually, I am not too sure anymore that this was the solution...

@stale

stale bot commented Oct 13, 2021

This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 180 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.

@stale stale bot added the status/stale label Oct 13, 2021
@stale stale bot closed this as completed Oct 27, 2021
@slim-bean

For posterity: I hit this error too, but I believe for different reasons.

I jacked up my cluster pretty good and ended up deleting all my server nodes and re-adding them. This was also an upgrade, which I think may have contributed to my problems. After re-adding the last node, I was getting this error: Error: failed to download "https://10.43.0.1:443/static/charts/traefik-1.81.0.tgz"

I was able to work around this by deleting the traefik helm deployment with the command I found here: #717 (comment)

And then I restarted the k3s systemd unit and it recreated the traefik.yaml file (which was missing??) from the /manifests directory and successfully re-installed traefik for me.
