Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modifies integration tests to be runnable outside of AWS #667

Merged
merged 1 commit into from
Dec 23, 2022

Conversation

BennettJames
Copy link
Contributor

@BennettJames BennettJames commented Dec 22, 2022

This patch reworks the test and CI system for integration tests to make the execution of the tests more platform agnostic. A system to manage and inject credentials is added that makes the tests runnable outside of an AWS context, and substantial adaptations are made to the test setup and runtime to make them run on a pure github action or linux environment.

Example integration test run: https://github.com/BennettJames/aws-app-mesh-controller-for-k8s/actions/runs/3756588966/jobs/6382826436
Example beta release run: https://github.com/BennettJames/aws-app-mesh-controller-for-k8s/actions/runs/3760250999/jobs/6391541156

Issue #, if available:

Description of changes:

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

IMAGE: "${{ secrets.CI_AWS_ACCOUNT }}.dkr.ecr.us-west-2.amazonaws.com/amazon/appmesh-controller"
IMAGE_TAG: "${{ github.event.inputs.tag }}"
IMAGE_TAG_AMD: "${{ github.event.inputs.tag }}-linux_amd64"
IMAGE_TAG_ARM: "${{ github.event.inputs.tag }}-linux_arm64"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick note on the variables here: before merging, I'd want to standardize on using just the beta account here. I'd like to get the patch mostly approved / reviewed, then create the appropriate roles in the beta account, then move over.

@@ -0,0 +1,78 @@
name: "Run Integration Test"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a more elaborate integration test setup that runs on plain ubuntu github actions. The setup behavior is much more complex than before (or rather - the complexity is explicit, rather than having been done out of sight on ec2 runners), and has been moved to a separate action so as to be callable from multiple locations.

--platform linux/arm64 \
--build-arg GOPROXY="$GOPROXY" \
-t "${IMAGE}:${IMAGE_TAG_ARM}" \
. --push
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had some earlier logic that did a proper multiarch deploy, but took it out to avoid the already rather broad scope. I can dial back the logic even further if desired to make this look more like the original action.

COPY apis/ ./apis/
COPY controllers/ ./controllers/
COPY mocks/ ./mocks/
COPY webhooks/ ./webhooks/
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This wasn't strictly related, but I tweaked this dockerfile to be mildly more performant under some circumstances. Particularly: it uses the host build architecture for the building (which leads to fewer total actions in cross-building scenarios), and narrowed the scope of copied files to avoid spurious rebuilds.

// If set, will not enable ipv6 on any of the pods. This is needed to handle
// testing on github actions, where ipv6 is not supported -
// https://github.com/actions/runner-images/issues/668.
DisableIPv6 bool
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a pretty annoying last minute surprise. I'm not happy with how that turned out - there's a lot of ugly logic in here to force ipv6 off in the integration tests to make it run on github actions. Otherwise, any test involving an actual envoy will hang during the proxyinit step displaying any error.

I'm tempted to add a separate flag to the controller itself to disable ipv6 on envoys. I'm worried that any additional work on the tests will result in future developers hitting the same problem, whereas a hard off switch for ipv6 in the controller itself would be more robust.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think both flag or hard off switch are acceptable. Since (I imagine) other customers have hit this ipv6 problem, what do you think about documenting this issue in a readme soemwhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I may introduce a single flag w/ docs in a follow-up patch.

--set "env.AWS_SESSION_TOKEN='$AWS_SESSION_TOKEN'" \
--set sidecar.envoyAwsAccessKeyId="$AWS_ACCESS_KEY_ID" \
--set sidecar.envoyAwsSecretAccessKey="$AWS_SECRET_ACCESS_KEY" \
--set sidecar.envoyAwsSessionToken="$AWS_SESSION_TOKEN"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick note on the permissions here:

This forwards the same set of credentials to two different locations: one, to the environment on the controller itself, and two, to the environment on the envoy containers. This could be finer grained - the envoy's need narrower permissions than the controller. This is needed as the runners lack the same AWS setup that let credentials be bootstrapped off the environment / metadata service.

I'm not thrilled by the system I added, and am very open to criticisms / suggestions here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks reasonable to me for now. Since you have the credentials as separate here, we can always introduce another role later.

kubectl get pod -n appmesh-system

echo -n "running integration test type $__test ... "
ginkgo --flakeAttempts=2 -v -r $__test_dir -- \
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I never like doing this, but I added an auto-retry to tests here w/ the flakeAttempts flag. At present, the test suite is just not super robust, and this increases the success rate considerably.

@BennettJames BennettJames force-pushed the master branch 2 times, most recently from 3db0a61 to e24509b Compare December 22, 2022 19:38
Copy link
Contributor

@bendu bendu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm really happy to see this PR. Nice work!

amazing

--set "env.AWS_SESSION_TOKEN='$AWS_SESSION_TOKEN'" \
--set sidecar.envoyAwsAccessKeyId="$AWS_ACCESS_KEY_ID" \
--set sidecar.envoyAwsSecretAccessKey="$AWS_SECRET_ACCESS_KEY" \
--set sidecar.envoyAwsSessionToken="$AWS_SESSION_TOKEN"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks reasonable to me for now. Since you have the credentials as separate here, we can always introduce another role later.

@@ -60,5 +60,13 @@ ginkgo -v -r test/integration/virtualnode/ -- --cluster-kubeconfig=/Users/xxxx/.
In case of failures, refer to [Troubleshooting](https://github.com/aws/aws-app-mesh-controller-for-k8s/blob/master/docs/guide/troubleshooting.md) guide.


### integration test suite with kind
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding instructions here. This will really help out the next person

// If set, will not enable ipv6 on any of the pods. This is needed to handle
// testing on github actions, where ipv6 is not supported -
// https://github.com/actions/runner-images/issues/668.
DisableIPv6 bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think both flag or hard off switch are acceptable. Since (I imagine) other customers have hit this ipv6 problem, what do you think about documenting this issue in a readme soemwhere?

vpc_id:
description: "aws vpc id to use for the test"
required: true
account_id:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are the inputs passed in? Will the account_id and role still be marked as secret?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - all inputs are elided in the same way (I think secrets are hidden based off of just pattern matching, so the origin doesn't really matter). You can see this in the raw output.

@bendu bendu requested a review from sshver December 22, 2022 21:26
This patch reworks the test and CI system for integration tests to make the
execution of the tests more platform agnostic. A system to manage and inject
credentials is added that makes the tests runnable outside of an AWS context,
and substantial adaptations are made to the test setup and runtime to
make them run on a pure github action or linux environment.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants