c55f778 to 74c2647
It seems the tests are failing.
My hypothesis is that the linkerd installation has not completed by the time we try to create pods, so pod creation fails on a webhook call. There is no retry logic in pod creation.
Now the CI tasks fail with the following error:
The errors seem to be further up:
b5b7cf9 to 53fb95e
On CI, the linkerd installation gets stuck and the following happens.

Events from the linkerd namespace:
linkerd 14m Warning FailedCreate replicaset/linkerd-controller-7c7d9bf56 Error creating: Internal error occurred: failed calling webhook "linkerd-proxy-injector.linkerd.io": Post https://linkerd-proxy-injector.linkerd.svc:443/?timeout=30s: dial tcp 10.3.192.109:443: connect: connection refused
linkerd 36m Normal ScalingReplicaSet deployment/linkerd-controller Scaled up replica set linkerd-controller-7c7d9bf56 to 2
linkerd 14m Warning FailedCreate replicaset/linkerd-destination-84b94d4df5 Error creating: Internal error occurred: failed calling webhook "linkerd-proxy-injector.linkerd.io": Post https://linkerd-proxy-injector.linkerd.svc:443/?timeout=30s: dial tcp 10.3.192.109:443: connect: connection refused
linkerd 36m Normal ScalingReplicaSet deployment/linkerd-destination Scaled up replica set linkerd-destination-84b94d4df5 to 2
linkerd 14m Warning FailedCreate replicaset/linkerd-grafana-8648658f4b Error creating: Internal error occurred: failed calling webhook "linkerd-proxy-injector.linkerd.io": Post https://linkerd-proxy-injector.linkerd.svc:443/?timeout=30s: dial tcp 10.3.192.109:443: connect: connection refused
linkerd 36m Normal ScalingReplicaSet deployment/linkerd-grafana Scaled up replica set linkerd-grafana-8648658f4b to 1
linkerd 14m Warning FailedCreate replicaset/linkerd-identity-6c7b88bcf8 Error creating: Internal error occurred: failed calling webhook "linkerd-proxy-injector.linkerd.io": Post https://linkerd-proxy-injector.linkerd.svc:443/?timeout=30s: dial tcp 10.3.192.109:443: connect: connection refused
linkerd 36m Normal ScalingReplicaSet deployment/linkerd-identity Scaled up replica set linkerd-identity-6c7b88bcf8 to 2
linkerd 14m Warning FailedCreate replicaset/linkerd-prometheus-748b48c8c9 Error creating: Internal error occurred: failed calling webhook "linkerd-proxy-injector.linkerd.io": Post https://linkerd-proxy-injector.linkerd.svc:443/?timeout=30s: dial tcp 10.3.192.109:443: connect: connection refused
linkerd 36m Normal ScalingReplicaSet deployment/linkerd-prometheus Scaled up replica set linkerd-prometheus-748b48c8c9 to 1
linkerd 14m Warning FailedCreate replicaset/linkerd-proxy-injector-6b979484f Error creating: Internal error occurred: failed calling webhook "linkerd-proxy-injector.linkerd.io": Post https://linkerd-proxy-injector.linkerd.svc:443/?timeout=30s: dial tcp 10.3.192.109:443: connect: connection refused
linkerd 36m Normal ScalingReplicaSet deployment/linkerd-proxy-injector Scaled up replica set linkerd-proxy-injector-6b979484f to 2
linkerd 14m Warning FailedCreate replicaset/linkerd-sp-validator-6896bc59c4 Error creating: Internal error occurred: failed calling webhook "linkerd-proxy-injector.linkerd.io": Post https://linkerd-proxy-injector.linkerd.svc:443/?timeout=30s: dial tcp 10.3.192.109:443: connect: connection refused
linkerd 36m Normal ScalingReplicaSet deployment/linkerd-sp-validator Scaled up replica set linkerd-sp-validator-6896bc59c4 to 2
linkerd 14m Warning FailedCreate replicaset/linkerd-tap-8f79d6c5c Error creating: Internal error occurred: failed calling webhook "linkerd-proxy-injector.linkerd.io": Post https://linkerd-proxy-injector.linkerd.svc:443/?timeout=30s: dial tcp 10.3.192.109:443: connect: connection refused
linkerd 36m Normal ScalingReplicaSet deployment/linkerd-tap Scaled up replica set linkerd-tap-8f79d6c5c to 2
linkerd 14m Warning FailedCreate replicaset/linkerd-web-8558657b69 Error creating: Internal error occurred: failed calling webhook "linkerd-proxy-injector.linkerd.io": Post https://linkerd-proxy-injector.linkerd.svc:443/?timeout=30s: dial tcp 10.3.192.109:443: connect: connection refused
linkerd 36m Normal ScalingReplicaSet deployment/linkerd-web Scaled up replica set linkerd-web-8558657b6

These are the apiserver errors:
I0707 11:56:57.236263 1 trace.go:116] Trace[1183028621]: "Create" url:/api/v1/namespaces/linkerd/pods,user-agent:kube-controller-manager/v1.18.3 (linux/amd64) kubernetes/2e7996e/system:serviceaccount:kube-system:replicaset-controller,client:10.0.4.35 (started: 2020-07-07 11:56:56.213473697 +0000 UTC m=+2383.280605006) (total time: 1.02276867s):
I0707 11:56:57.363868 1 trace.go:116] Trace[1840209148]: "Call mutating webhook" configuration:linkerd-proxy-injector-webhook-config,webhook:linkerd-proxy-injector.linkerd.io,resource:/v1, Resource=pods,subresource:,operation:CREATE,UID:301278b9-e5bc-411b-a77f-46c1718a9a51 (started: 2020-07-07 11:56:56.344687474 +0000 UTC m=+2383.411818776) (total time: 1.019130746s):
W0707 11:56:57.363946 1 dispatcher.go:181] Failed calling webhook, failing closed linkerd-proxy-injector.linkerd.io: failed calling webhook "linkerd-proxy-injector.linkerd.io": Post https://linkerd-proxy-injector.linkerd.svc:443/?timeout=30s: dial tcp 10.3.192.109:443: connect: connection refused
I was able to reproduce this error locally on an HA AWS cluster. It does not occur on any form of Packet cluster.
53fb95e to c830dfd
Now it fails with the following error:

linkerd-existence
-----------------
√ 'linkerd-config' config map exists
√ heartbeat ServiceAccount exist
√ control plane replica sets are ready
× no unschedulable pods
linkerd-identity-76dd9cc969-9tdt4: 0/4 nodes are available: 2 Insufficient cpu, 2 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate.
see https://linkerd.io/checks/#l5d-existence-unschedulable-pods for hints
3fece05 to ed9ff4a
Just some nits, otherwise LGTM
79f9f82 to 1baa756
8f80a0c to e978585
PTAL: @iaguis
LGTM, except I'd still change `generateCertificates` to not have a receiver and simply return a `cert{}` struct.
e978585 to 383cef7
This commit adds assets for the linkerd2 chart at version `stable-2.8.1`.
Signed-off-by: Suraj Deshmukh <suraj@kinvolk.io>
- Use cert creation API from linkerd.
- Vendor `linkerd` libraries needed for cert generation.
- Add content from the `values-ha.yaml` file to `manifest.go` to add support for HA installation.

In HA mode the user can specify the number of controllers to create using the variable `controller_replicas`.

**Caution**: If the number of linkerd controllers is greater than the number of available worker nodes, some pods may stay in the `Pending` state.
Signed-off-by: Suraj Deshmukh <suraj@kinvolk.io>
On AWS, install the non-HA setup. CI on AWS currently does not have enough resources to run multiple replicas of the linkerd pods.
Signed-off-by: Suraj Deshmukh <suraj@kinvolk.io>
- Add a test to verify that the deployments are up and running.
- Add a test to run `linkerd check` against the apiserver.
Signed-off-by: Suraj Deshmukh <suraj@kinvolk.io>
If pod creation fails on the first attempt, try again.
Signed-off-by: Suraj Deshmukh <suraj@kinvolk.io>
- Add ServiceMonitors to monitor the linkerd control plane pods.
- Add an entry to the components' Prometheus metrics list to test whether the linkerd metrics are being scraped.
Signed-off-by: Suraj Deshmukh <suraj@kinvolk.io>
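The caution about `controller_replicas` exceeding the worker count can be illustrated with a small sketch. It assumes that in HA mode at most one replica of a component is scheduled per worker node (for example through a hard pod anti-affinity rule). That assumption is not stated in the commit message, and `pendingReplicas` is a hypothetical helper.

```go
package main

import "fmt"

// pendingReplicas returns how many replicas would stay Pending, ASSUMING that
// at most one replica of a component can run on each worker node.
func pendingReplicas(replicas, workers int) int {
	if workers < 0 {
		workers = 0
	}
	if replicas <= workers {
		return 0
	}
	return replicas - workers
}

func main() {
	// controller_replicas = 3 on clusters with 1 to 4 worker nodes.
	const replicas = 3
	for workers := 1; workers <= 4; workers++ {
		fmt.Printf("workers=%d replicas=%d pending=%d\n",
			workers, replicas, pendingReplicas(replicas, workers))
	}
}
```

A check like this could be used to warn before installation, but it ignores other scheduling constraints such as CPU requests and taints.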
383cef7 to 4bc0d8b
Looks great @surajssd 🎉
To test manually, create the following config in your `lokocfg` file:

TODOs:
- This installs a lot of extras, such as Grafana and Prometheus, which could be disabled and configured so that we can use the Prometheus operator. It is not currently possible to decouple the Prometheus shipped with linkerd, but there is ongoing work on this: Move Prometheus as an Add-On linkerd/linkerd2#4362. Once that is merged and released, we can revisit it.
- For testing random patch/dev versions, we need a way to specify which version of linkerd to deploy.

Tests: