etcd2 cluster doesn't start #62
Comments
@baldeynz Hi, thanks for trying kube-aws! Unfortunately, kube-aws doesn't support hostnames coming from Route53 private DNS (i.e. anything other than the DNS names AWS provides by default) for etcd nodes. To provide some context, would you mind looking into https://github.com/coreos/kube-aws/blob/master/config/config.go#L254-L321, especially Anyway, I believe this is something we'd like to support in the future.
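For reference, the EC2-provided private DNS names that etcd expects here follow a fixed pattern, e.g. ip-172-19-76-198.ap-southeast-2.compute.internal as seen in this thread. A minimal Go sketch of that pattern, assuming the standard EC2 naming scheme (defaultPrivateDNSName is a hypothetical helper, not the code in config.go):

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// defaultPrivateDNSName builds the EC2-provided private DNS hostname for an
// IPv4 address, e.g. 172.19.76.198 in ap-southeast-2 becomes
// ip-172-19-76-198.ap-southeast-2.compute.internal. us-east-1 is the one
// region that uses the shorter ec2.internal suffix.
// Hypothetical helper for illustration; not kube-aws code.
func defaultPrivateDNSName(ip, region string) (string, error) {
	parsed := net.ParseIP(ip)
	if parsed == nil || parsed.To4() == nil {
		return "", fmt.Errorf("not an IPv4 address: %q", ip)
	}
	host := "ip-" + strings.ReplaceAll(parsed.To4().String(), ".", "-")
	if region == "us-east-1" {
		return host + ".ec2.internal", nil
	}
	return host + "." + region + ".compute.internal", nil
}

func main() {
	name, err := defaultPrivateDNSName("172.19.76.198", "ap-southeast-2")
	if err != nil {
		panic(err)
	}
	fmt.Println(name)
}
```

A Route53 name such as ip-172-19-76-198.sandbox.testwaikato.kiwi never matches this pattern, which is the crux of the mismatch discussed here.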
Thanks for the quick response :)
Was directed here via feedback from another issue: coreos/coreos-kubernetes#675 (comment)
Disclaimer: I'm using the latest. I tried to follow the steps noted above by @baldeynz.
This appears to be what the etcd2 service is complaining about: validation fails because the .pem files are 0 bytes (empty):
Unfortunately, this process did not work for me with the latest version (RC3). Perhaps I should try reverting to the RC version used above (RC2) and see if I have better luck with it? Thanks @baldeynz and @mumoshu for your information and assistance.
Hi @cmcconnell1, thanks as always! First, there's no significant change between rc.2 and rc.3 in how etcd nodes are provisioned, so the issue is probably unrelated to the version difference. Two things:
@cmcconnell1 Also, could you check Putting full log files on gist.github.com would help, as we're still unsure which part of the whole system is failing (i.e., which part is closest to the root cause).
Hello @mumoshu
I also repeated the manual hack described above, modifying the cloud-init-etcd file to hardcode the expected etcd nodes' AWS DNS names, which resolve on the etcd node.
Full output from the requested systemctl and journalctl commands is in the gists below. As for the "full log files", can you specify the locations and names of all the logs you would like to see?
Thanks again
It seems your nodes can't connect to the internet to download the awscli image.
Hi @cmcconnell1, thanks as always! I agree with @pieterlange about a possible source of your issue. More concretely, I suspect that:
Excuse me if I'm repeating what you may already know, but AFAIK ephemeral ports are used to "receive" packets from the other end of a TCP session. Forbidding them in the ACL/SG outbound rule would end up dropping all returning data, such as HTTP responses. I've had experience troubleshooting a similar issue caused by blocking ephemeral ports in coreos/coreos-kubernetes#744 (comment).
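A minimal sketch of why blocking ephemeral ports breaks ordinary traffic, assuming a stateless rule list like an AWS network ACL (NACLs are stateless, so return traffic must be allowed explicitly). The portRange type and helpers are hypothetical, not an AWS API; 32768-60999 is the Linux default ephemeral range (net.ipv4.ip_local_port_range):

```go
package main

import "fmt"

// portRange is an inclusive TCP port range, as in an ACL rule.
// Hypothetical type for illustration, not an AWS SDK type.
type portRange struct{ from, to int }

// allowed reports whether port falls inside any of the rule ranges.
func allowed(rules []portRange, port int) bool {
	for _, r := range rules {
		if port >= r.from && port <= r.to {
			return true
		}
	}
	return false
}

// firstBlocked returns the first port in [lo, hi] that no rule allows, or -1.
func firstBlocked(rules []portRange, lo, hi int) int {
	for p := lo; p <= hi; p++ {
		if !allowed(rules, p) {
			return p
		}
	}
	return -1
}

func main() {
	// A client talking to port 443 uses a local ephemeral port, and the
	// server's response is addressed to that ephemeral port.
	onlyWeb := []portRange{{80, 80}, {443, 443}}
	withEphemeral := []portRange{{80, 80}, {443, 443}, {1024, 65535}}

	fmt.Println(firstBlocked(onlyWeb, 32768, 60999))       // 32768: responses dropped
	fmt.Println(firstBlocked(withEphemeral, 32768, 60999)) // -1: all allowed
}
```

Allowing only the service ports (80/443) lets requests out but drops the replies, which from the node looks exactly like "can't reach the internet".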
Hi
I'm not sure whether I have something configured incorrectly or whether this is a bug, so apologies if this is posted as an issue and it's not one. I have looked for docs on this config and can't see anything obviously wrong, but I'm happy to read more if someone can point me in the right direction.
The problem is that, using the default config, the etcd2 cluster won't start.
I'm using v0.9.1-rc.2.
Below is the etcd2 config in the userdata/cloud-config-etcd file, which is unchanged from what the kube-aws render command generated:
units:
  - name: etcd2.service
    drop-ins:
      - name: 20-etcd2-aws-cluster.conf
        content: |
          [Unit]
          Requires=decrypt-tls-assets.service
          After=decrypt-tls-assets.service
My cluster.yaml contains this:
hostedZoneId: "sandbox.testwaikato.kiwi"
and
etcdCount: 3
All other DNS and etcd settings in it are the defaults.
When etcd2 tries to start with this config, it fails with this error:
Nov 15 20:10:52 ip-172-19-76-198.sandbox.testwaikato.kiwi systemd[1]: Starting etcd2...
Nov 15 20:10:54 ip-172-19-76-198.sandbox.testwaikato.kiwi etcd2[1265]: recognized and used environment variable ETCD_ADVERTISE_CLIENT_URLS=https://ip-172-19-76-198.sandbox.testwaikato.kiwi:2379
Nov 15 20:10:54 ip-172-19-76-198.sandbox.testwaikato.kiwi etcd2[1265]: recognized and used environment variable ETCD_CERT_FILE=/etc/etcd2/ssl/etcd.pem
Nov 15 20:10:54 ip-172-19-76-198.sandbox.testwaikato.kiwi etcd2[1265]: recognized and used environment variable ETCD_CLIENT_CERT_AUTH=true
Nov 15 20:10:54 ip-172-19-76-198.sandbox.testwaikato.kiwi etcd2[1265]: recognized and used environment variable ETCD_DATA_DIR=/var/lib/etcd2
Nov 15 20:10:54 ip-172-19-76-198.sandbox.testwaikato.kiwi etcd2[1265]: recognized and used environment variable ETCD_ELECTION_TIMEOUT=1200
Nov 15 20:10:54 ip-172-19-76-198.sandbox.testwaikato.kiwi etcd2[1265]: recognized and used environment variable ETCD_INITIAL_ADVERTISE_PEER_URLS=https://ip-172-19-76-198.sandbox.testwaikato.kiwi:2380
Nov 15 20:10:54 ip-172-19-76-198.sandbox.testwaikato.kiwi etcd2[1265]: recognized and used environment variable ETCD_INITIAL_CLUSTER=ip-172-19-76-198.ap-southeast-2.compute.internal=https://ip-172-19-76-198.ap-southeast-2.compute.intern
Nov 15 20:10:54 ip-172-19-76-198.sandbox.testwaikato.kiwi etcd2[1265]: recognized and used environment variable ETCD_INITIAL_CLUSTER_STATE=new
Nov 15 20:10:54 ip-172-19-76-198.sandbox.testwaikato.kiwi etcd2[1265]: recognized and used environment variable ETCD_KEY_FILE=/etc/etcd2/ssl/etcd-key.pem
Nov 15 20:10:54 ip-172-19-76-198.sandbox.testwaikato.kiwi etcd2[1265]: recognized and used environment variable ETCD_LISTEN_CLIENT_URLS=https://ip-172-19-76-198.sandbox.testwaikato.kiwi:2379
Nov 15 20:10:54 ip-172-19-76-198.sandbox.testwaikato.kiwi etcd2[1265]: recognized and used environment variable ETCD_LISTEN_PEER_URLS=https://ip-172-19-76-198.sandbox.testwaikato.kiwi:2380
Nov 15 20:10:54 ip-172-19-76-198.sandbox.testwaikato.kiwi etcd2[1265]: recognized and used environment variable ETCD_NAME=ip-172-19-76-198.sandbox.testwaikato.kiwi
Nov 15 20:10:54 ip-172-19-76-198.sandbox.testwaikato.kiwi etcd2[1265]: recognized and used environment variable ETCD_PEER_CERT_FILE=/etc/etcd2/ssl/etcd.pem
Nov 15 20:10:54 ip-172-19-76-198.sandbox.testwaikato.kiwi etcd2[1265]: recognized and used environment variable ETCD_PEER_KEY_FILE=/etc/etcd2/ssl/etcd-key.pem
Nov 15 20:10:54 ip-172-19-76-198.sandbox.testwaikato.kiwi etcd2[1265]: recognized and used environment variable ETCD_PEER_TRUSTED_CA_FILE=/etc/etcd2/ssl/ca.pem
Nov 15 20:10:54 ip-172-19-76-198.sandbox.testwaikato.kiwi etcd2[1265]: recognized and used environment variable ETCD_TRUSTED_CA_FILE=/etc/etcd2/ssl/ca.pem
Nov 15 20:10:54 ip-172-19-76-198.sandbox.testwaikato.kiwi etcd2[1265]: etcd Version: 2.3.7
Nov 15 20:10:54 ip-172-19-76-198.sandbox.testwaikato.kiwi etcd2[1265]: Git SHA: fd17c91
Nov 15 20:10:54 ip-172-19-76-198.sandbox.testwaikato.kiwi etcd2[1265]: Go Version: go1.7.1
Nov 15 20:10:54 ip-172-19-76-198.sandbox.testwaikato.kiwi etcd2[1265]: Go OS/Arch: linux/amd64
Nov 15 20:10:54 ip-172-19-76-198.sandbox.testwaikato.kiwi etcd2[1265]: setting maximum number of CPUs to 1, total number of available CPUs is 1
Nov 15 20:10:54 ip-172-19-76-198.sandbox.testwaikato.kiwi etcd2[1265]: found invalid file/dir lost+found under data dir /var/lib/etcd2 (Ignore this if you are upgrading etcd)
Nov 15 20:10:54 ip-172-19-76-198.sandbox.testwaikato.kiwi etcd2[1265]: peerTLS: cert = /etc/etcd2/ssl/etcd.pem, key = /etc/etcd2/ssl/etcd-key.pem, ca = , trusted-ca = /etc/etcd2/ssl/ca.pem, client-cert-auth = false
Nov 15 20:10:54 ip-172-19-76-198.sandbox.testwaikato.kiwi etcd2[1265]: listening for peers on https://ip-172-19-76-198.sandbox.testwaikato.kiwi:2380
Nov 15 20:10:54 ip-172-19-76-198.sandbox.testwaikato.kiwi etcd2[1265]: clientTLS: cert = /etc/etcd2/ssl/etcd.pem, key = /etc/etcd2/ssl/etcd-key.pem, ca = , trusted-ca = /etc/etcd2/ssl/ca.pem, client-cert-auth = true
Nov 15 20:10:54 ip-172-19-76-198.sandbox.testwaikato.kiwi etcd2[1265]: listening for client requests on https://ip-172-19-76-198.sandbox.testwaikato.kiwi:2379
Nov 15 20:10:54 ip-172-19-76-198.sandbox.testwaikato.kiwi etcd2[1265]: stopping listening for client requests on https://ip-172-19-76-198.sandbox.testwaikato.kiwi:2379
Nov 15 20:10:54 ip-172-19-76-198.sandbox.testwaikato.kiwi etcd2[1265]: stopping listening for peers on https://ip-172-19-76-198.sandbox.testwaikato.kiwi:2380
Nov 15 20:10:54 ip-172-19-76-198.sandbox.testwaikato.kiwi etcd2[1265]: couldn't find local name "ip-172-19-76-198.sandbox.testwaikato.kiwi" in the initial cluster configuration
Nov 15 20:10:54 ip-172-19-76-198.sandbox.testwaikato.kiwi systemd[1]: etcd2.service: Main process exited, code=exited, status=1/FAILURE
Nov 15 20:10:54 ip-172-19-76-198.sandbox.testwaikato.kiwi systemd[1]: Failed to start etcd2.
Nov 15 20:10:54 ip-172-19-76-198.sandbox.testwaikato.kiwi systemd[1]: etcd2.service: Unit entered failed state.
Nov 15 20:10:54 ip-172-19-76-198.sandbox.testwaikato.kiwi systemd[1]: etcd2.service: Failed with result 'exit-code'.
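The fatal line is "couldn't find local name ... in the initial cluster configuration": ETCD_NAME (expanded from %H to the Route53 hostname) is not one of the member names in ETCD_INITIAL_CLUSTER, which uses the EC2 default names. A minimal Go sketch of that membership check, written for illustration only (it is not etcd's source; the function names are hypothetical, and because the logged ETCD_INITIAL_CLUSTER is truncated, the sample value uses the member names quoted further down in this issue):

```go
package main

import (
	"fmt"
	"strings"
)

// parseInitialCluster splits an ETCD_INITIAL_CLUSTER value of the form
// name1=url1,name2=url2,... into a name -> peer URLs map.
func parseInitialCluster(s string) (map[string][]string, error) {
	members := make(map[string][]string)
	for _, entry := range strings.Split(s, ",") {
		name, url, ok := strings.Cut(entry, "=")
		if !ok || name == "" || url == "" {
			return nil, fmt.Errorf("malformed entry %q", entry)
		}
		members[name] = append(members[name], url)
	}
	return members, nil
}

// checkLocalName mimics the failing startup check: ETCD_NAME must be one
// of the names listed in ETCD_INITIAL_CLUSTER.
func checkLocalName(name, initialCluster string) error {
	members, err := parseInitialCluster(initialCluster)
	if err != nil {
		return err
	}
	if _, ok := members[name]; !ok {
		return fmt.Errorf("couldn't find local name %q in the initial cluster configuration", name)
	}
	return nil
}

func main() {
	cluster := "ip-172-19-76-198.ap-southeast-2.compute.internal=https://ip-172-19-76-198.ap-southeast-2.compute.internal:2380," +
		"ip-172-19-77-197.ap-southeast-2.compute.internal=https://ip-172-19-77-197.ap-southeast-2.compute.internal:2380"
	// %H expanded to the Route53 name: the lookup fails.
	fmt.Println(checkLocalName("ip-172-19-76-198.sandbox.testwaikato.kiwi", cluster))
	// The EC2 default name matches: prints <nil>.
	fmt.Println(checkLocalName("ip-172-19-76-198.ap-southeast-2.compute.internal", cluster))
}
```

Both names resolving in DNS does not help: the comparison is a plain string match on the member name.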
Running hostname on each etcd server returns the non-AWS hostname, e.g.
ip-172-19-76-198.sandbox.testwaikato.kiwi
If I then go onto my etcd2 servers and edit the /etc/systemd/system/etcd2.service.d/20-etcd2-aws-cluster.conf file, replacing the %H references with the AWS DNS name for the host, e.g.
Environment=ETCD_LISTEN_CLIENT_URLS=https://ip-172-19-76-198.ap-southeast-2.compute.internal:2379
Environment=ETCD_ADVERTISE_CLIENT_URLS=https://ip-172-19-76-198.ap-southeast-2.compute.internal:2379
Environment=ETCD_LISTEN_PEER_URLS=https://ip-172-19-76-198.ap-southeast-2.compute.internal:2380
Environment=ETCD_INITIAL_ADVERTISE_PEER_URLS=https://ip-172-19-76-198.ap-southeast-2.compute.internal:2380
and then start the service, it works.
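systemd expands %H in unit files and drop-ins to the machine's host name, which is why the drop-in picked up the Route53 name that hostname returns. A small Go sketch of that substitution: only %H and %% (a literal %) are handled, the helper name is hypothetical, and the sample line "https://%H:2380" is an assumption modelled on the edited Environment= lines above:

```go
package main

import (
	"fmt"
	"strings"
)

// expandHostSpecifier replaces %H with hostname and %% with a literal %,
// leaving every other character (including other specifiers) untouched.
func expandHostSpecifier(line, hostname string) string {
	var b strings.Builder
	for i := 0; i < len(line); i++ {
		if line[i] == '%' && i+1 < len(line) {
			switch line[i+1] {
			case 'H':
				b.WriteString(hostname)
				i++
				continue
			case '%':
				b.WriteByte('%')
				i++
				continue
			}
		}
		b.WriteByte(line[i])
	}
	return b.String()
}

func main() {
	line := "Environment=ETCD_LISTEN_PEER_URLS=https://%H:2380"
	// What the node produced, given its Route53 hostname:
	fmt.Println(expandHostSpecifier(line, "ip-172-19-76-198.sandbox.testwaikato.kiwi"))
	// What the manual edit effectively hardcodes:
	fmt.Println(expandHostSpecifier(line, "ip-172-19-76-198.ap-southeast-2.compute.internal"))
}
```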
It looks to me like the Environment=ETCD_INITIAL_CLUSTER=ip-172-19-76-198.ap-southeast-2.compute.internal=https://ip-172-19-76-198.ap-southeast-2.compute.internal:2380,ip-172-19-77-197.ap-southeast-2.compute.internal=https://ip-172-19-77-197.ap-southeast-2.compute.internal:2380,ip-172-19-76-199.ap-southeast-2.compute.internal=https://ip-172-19-76-199.ap-southeast-2.compute.internal:2380 line uses the AWS DNS entries, but because the user data uses %H, I get ip-172-19-76-199.sandbox.testwaikato.kiwi in my config. Even though both names resolve, is this why etcd won't start?
So is this a bug, or is there some way to set the config in userdata/cloud-config-etcd to use either my local DNS names OR the AWS hostnames?