This repository has been archived by the owner on Jun 29, 2022. It is now read-only.

component: Add linkerd #690

Merged
surajssd merged 6 commits into master from surajssd/add-linkerd2
Aug 27, 2020

Member

@surajssd surajssd commented Jul 1, 2020

To test manually, add the following config to your lokocfg file:

component "experimental-linkerd" {}

TODOs:

  • This installs extra components such as Grafana and Prometheus. Ideally these would be disabled and linkerd configured to use the Prometheus Operator instead. It is not currently possible to decouple the Prometheus shipped with linkerd, but there is ongoing work on this: "Move Prometheus as an Add-On" (linkerd/linkerd2#4362). Once that is merged and released, we can revisit this.
  • Use the values-ha.yaml file instead of the normal values.yaml file.
  • I think that with the current approach new certs are generated on every apply. Check whether this affects the general operation of the workloads.
  • For testing arbitrary patch/dev versions, we need a way to specify which version of linkerd to deploy.

Tests:

  • Add tests to verify things with and without linkerd installation.

@surajssd surajssd force-pushed the surajssd/add-linkerd2 branch 5 times, most recently from c55f778 to 74c2647 Compare July 3, 2020 08:49
@surajssd surajssd marked this pull request as ready for review July 3, 2020 08:52
@invidian
Member

invidian commented Jul 3, 2020

It seems the tests are failing.

@surajssd
Member Author

surajssd commented Jul 3, 2020

> It seems the tests are failing.

My hypothesis is that the linkerd installation is not yet complete when we try to create pods, so they fail webhook validation. There is no retry logic in pod creation.

@surajssd
Member Author

surajssd commented Jul 3, 2020

Now the CI tasks fail with the following error:

+ echo 'Sleeping for 30 seconds. Waiting for external-dns to clear DNS records.'
Sleeping for 30 seconds. Waiting for external-dns to clear DNS records.
+ sleep 30
+ '[' 2 = 0 ']'

@iaguis
Contributor

iaguis commented Jul 3, 2020

The errors seem to be further up:

--- FAIL: TestNoExtraSSHKeysOnNodes (300.11s)

--- FAIL: TestFileCreatedByCLCSnippetExistsOnNodes (300.11s)

@surajssd surajssd force-pushed the surajssd/add-linkerd2 branch 7 times, most recently from b5b7cf9 to 53fb95e Compare July 7, 2020 09:33
@surajssd
Member Author

surajssd commented Jul 7, 2020

On CI the linkerd installation is stuck, and the following happens:

Events from the linkerd namespace:

linkerd       14m         Warning   FailedCreate              replicaset/linkerd-controller-7c7d9bf56              Error creating: Internal error occurred: failed calling webhook "linkerd-proxy-injector.linkerd.io": Post https://linkerd-proxy-injector.linkerd.svc:443/?timeout=30s: dial tcp 10.3.192.109:443: connect: connection refused
linkerd       36m         Normal    ScalingReplicaSet         deployment/linkerd-controller                        Scaled up replica set linkerd-controller-7c7d9bf56 to 2
linkerd       14m         Warning   FailedCreate              replicaset/linkerd-destination-84b94d4df5            Error creating: Internal error occurred: failed calling webhook "linkerd-proxy-injector.linkerd.io": Post https://linkerd-proxy-injector.linkerd.svc:443/?timeout=30s: dial tcp 10.3.192.109:443: connect: connection refused
linkerd       36m         Normal    ScalingReplicaSet         deployment/linkerd-destination                       Scaled up replica set linkerd-destination-84b94d4df5 to 2
linkerd       14m         Warning   FailedCreate              replicaset/linkerd-grafana-8648658f4b                Error creating: Internal error occurred: failed calling webhook "linkerd-proxy-injector.linkerd.io": Post https://linkerd-proxy-injector.linkerd.svc:443/?timeout=30s: dial tcp 10.3.192.109:443: connect: connection refused
linkerd       36m         Normal    ScalingReplicaSet         deployment/linkerd-grafana                           Scaled up replica set linkerd-grafana-8648658f4b to 1
linkerd       14m         Warning   FailedCreate              replicaset/linkerd-identity-6c7b88bcf8               Error creating: Internal error occurred: failed calling webhook "linkerd-proxy-injector.linkerd.io": Post https://linkerd-proxy-injector.linkerd.svc:443/?timeout=30s: dial tcp 10.3.192.109:443: connect: connection refused
linkerd       36m         Normal    ScalingReplicaSet         deployment/linkerd-identity                          Scaled up replica set linkerd-identity-6c7b88bcf8 to 2
linkerd       14m         Warning   FailedCreate              replicaset/linkerd-prometheus-748b48c8c9             Error creating: Internal error occurred: failed calling webhook "linkerd-proxy-injector.linkerd.io": Post https://linkerd-proxy-injector.linkerd.svc:443/?timeout=30s: dial tcp 10.3.192.109:443: connect: connection refused
linkerd       36m         Normal    ScalingReplicaSet         deployment/linkerd-prometheus                        Scaled up replica set linkerd-prometheus-748b48c8c9 to 1
linkerd       14m         Warning   FailedCreate              replicaset/linkerd-proxy-injector-6b979484f          Error creating: Internal error occurred: failed calling webhook "linkerd-proxy-injector.linkerd.io": Post https://linkerd-proxy-injector.linkerd.svc:443/?timeout=30s: dial tcp 10.3.192.109:443: connect: connection refused
linkerd       36m         Normal    ScalingReplicaSet         deployment/linkerd-proxy-injector                    Scaled up replica set linkerd-proxy-injector-6b979484f to 2
linkerd       14m         Warning   FailedCreate              replicaset/linkerd-sp-validator-6896bc59c4           Error creating: Internal error occurred: failed calling webhook "linkerd-proxy-injector.linkerd.io": Post https://linkerd-proxy-injector.linkerd.svc:443/?timeout=30s: dial tcp 10.3.192.109:443: connect: connection refused
linkerd       36m         Normal    ScalingReplicaSet         deployment/linkerd-sp-validator                      Scaled up replica set linkerd-sp-validator-6896bc59c4 to 2
linkerd       14m         Warning   FailedCreate              replicaset/linkerd-tap-8f79d6c5c                     Error creating: Internal error occurred: failed calling webhook "linkerd-proxy-injector.linkerd.io": Post https://linkerd-proxy-injector.linkerd.svc:443/?timeout=30s: dial tcp 10.3.192.109:443: connect: connection refused
linkerd       36m         Normal    ScalingReplicaSet         deployment/linkerd-tap                               Scaled up replica set linkerd-tap-8f79d6c5c to 2
linkerd       14m         Warning   FailedCreate              replicaset/linkerd-web-8558657b69                    Error creating: Internal error occurred: failed calling webhook "linkerd-proxy-injector.linkerd.io": Post https://linkerd-proxy-injector.linkerd.svc:443/?timeout=30s: dial tcp 10.3.192.109:443: connect: connection refused
linkerd       36m         Normal    ScalingReplicaSet         deployment/linkerd-web                               Scaled up replica set linkerd-web-8558657b6  

These are the apiserver errors:

I0707 11:56:57.236263       1 trace.go:116] Trace[1183028621]: "Create" url:/api/v1/namespaces/linkerd/pods,user-agent:kube-controller-manager/v1.18.3 (linux/amd64) kubernetes/2e7996e/system:serviceaccount:kube-system:replicaset-controller,client:10.0.4.35 (started: 2020-07-07 11:56:56.213473697 +0000 UTC m=+2383.280605006) (total time: 1.02276867s):
I0707 11:56:57.363868       1 trace.go:116] Trace[1840209148]: "Call mutating webhook" configuration:linkerd-proxy-injector-webhook-config,webhook:linkerd-proxy-injector.linkerd.io,resource:/v1, Resource=pods,subresource:,operation:CREATE,UID:301278b9-e5bc-411b-a77f-46c1718a9a51 (started: 2020-07-07 11:56:56.344687474 +0000 UTC m=+2383.411818776) (total time: 1.019130746s):
W0707 11:56:57.363946       1 dispatcher.go:181] Failed calling webhook, failing closed linkerd-proxy-injector.linkerd.io: failed calling webhook "linkerd-proxy-injector.linkerd.io": Post https://linkerd-proxy-injector.linkerd.svc:443/?timeout=30s: dial tcp 10.3.192.109:443: connect: connection refused

@surajssd
Member Author

surajssd commented Jul 7, 2020

I was able to reproduce this error locally on an HA AWS cluster. It does not occur on any Packet configuration.

@surajssd surajssd force-pushed the surajssd/add-linkerd2 branch from 53fb95e to c830dfd Compare July 7, 2020 14:18
@surajssd
Member Author

surajssd commented Jul 7, 2020

Now it fails with the following error:

        linkerd-existence
        -----------------
        √ 'linkerd-config' config map exists
        √ heartbeat ServiceAccount exist
        √ control plane replica sets are ready
        × no unschedulable pods
            linkerd-identity-76dd9cc969-9tdt4: 0/4 nodes are available: 2 Insufficient cpu, 2 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate.
            see https://linkerd.io/checks/#l5d-existence-unschedulable-pods for hints

@surajssd surajssd force-pushed the surajssd/add-linkerd2 branch 3 times, most recently from 3fece05 to ed9ff4a Compare July 10, 2020 10:37
Member

@invidian invidian left a comment

Just some nits, otherwise LGTM

@surajssd surajssd force-pushed the surajssd/add-linkerd2 branch 2 times, most recently from 79f9f82 to 1baa756 Compare August 19, 2020 12:20
@surajssd surajssd force-pushed the surajssd/add-linkerd2 branch 2 times, most recently from 8f80a0c to e978585 Compare August 24, 2020 14:02
@surajssd surajssd requested a review from invidian August 25, 2020 11:08
@surajssd
Member Author

PTAL: @iaguis

invidian previously approved these changes Aug 25, 2020
Member

@invidian invidian left a comment

LGTM, except I'd still change `generateCertificates` to not have a receiver and simply return a `cert{}` struct.

invidian previously approved these changes Aug 26, 2020
@surajssd surajssd requested a review from knrt10 August 26, 2020 13:50
This commit adds assets for linkerd2 chart at version `stable-2.8.1`.

Signed-off-by: Suraj Deshmukh <suraj@kinvolk.io>

- Use cert creation API from linkerd.

- Vendor `linkerd` libraries needed for cert generation.

- Add content from the `values-ha.yaml` file to `manifest.go` to add
support for HA installation. In HA mode, the user can specify the number
of controllers to create using the variable
`controller_replicas`.

**Caution**: If the linkerd controller count is higher than the number of
available worker nodes, some pods may stay in the `Pending` state.

Signed-off-by: Suraj Deshmukh <suraj@kinvolk.io>
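
The caution in the commit message above can be expressed as simple arithmetic. The sketch assumes that the HA values enforce a required per-node anti-affinity, so that at most one replica of a given deployment lands on each worker node. The commit message does not state this, and `pendingReplicas` is a hypothetical helper.

```go
package main

import "fmt"

// pendingReplicas estimates how many replicas of one deployment cannot be
// scheduled, assuming at most one replica fits on each worker node.
func pendingReplicas(replicas, workers int) int {
	if replicas <= workers {
		return 0
	}
	return replicas - workers
}

func main() {
	// Example values: 3 requested replicas on a cluster with 2 workers.
	fmt.Println("pending:", pendingReplicas(3, 2))
}
```

CPU requests and taints, as seen in the `linkerd check` output earlier in this thread, can leave pods `Pending` even when the replica count does not exceed the worker count, so this is a lower bound at best.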
On AWS, install the non-HA setup. CI on AWS currently does not have enough
resources to run multiple replicas of the linkerd pods.

Signed-off-by: Suraj Deshmukh <suraj@kinvolk.io>

- Add test to verify if the deployments are up and running.
- Add test to run `linkerd check` against the apiserver.

Signed-off-by: Suraj Deshmukh <suraj@kinvolk.io>

If pod creation fails on the first attempt, try again.

Signed-off-by: Suraj Deshmukh <suraj@kinvolk.io>

- Adds ServiceMonitors to monitor linkerd control plane pods.

- Adds entry in the components prometheus metrics list to test if the
linkerd metrics are being scraped.

Signed-off-by: Suraj Deshmukh <suraj@kinvolk.io>
Member

@knrt10 knrt10 left a comment

Looks great @surajssd 🎉 .

@knrt10 knrt10 requested a review from invidian August 26, 2020 14:30
@surajssd
Member Author

Tests pass 🎉 Thanks for your patient review @invidian and @knrt10.

@surajssd surajssd merged commit df92e35 into master Aug 27, 2020
@surajssd surajssd deleted the surajssd/add-linkerd2 branch August 27, 2020 06:57
@invidian invidian mentioned this pull request Aug 27, 2020