Modifies integration tests to be runnable outside of AWS #667
Conversation
IMAGE: "${{ secrets.CI_AWS_ACCOUNT }}.dkr.ecr.us-west-2.amazonaws.com/amazon/appmesh-controller" | ||
IMAGE_TAG: "${{ github.event.inputs.tag }}" | ||
IMAGE_TAG_AMD: "${{ github.event.inputs.tag }}-linux_amd64" | ||
IMAGE_TAG_ARM: "${{ github.event.inputs.tag }}-linux_arm64" |
Quick note on the variables here: before merging, I'd want to standardize on using just the beta account here. I'd like to get the patch mostly approved / reviewed, then create the appropriate roles in the beta account, then move over.
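For illustration, the standardized form might look like the snippet below; the `BETA_AWS_ACCOUNT` secret name is an assumption, not an existing secret in this repo:

```yaml
# Hypothetical: a single beta-account secret driving the image repo.
env:
  IMAGE: "${{ secrets.BETA_AWS_ACCOUNT }}.dkr.ecr.us-west-2.amazonaws.com/amazon/appmesh-controller"
  IMAGE_TAG: "${{ github.event.inputs.tag }}"
```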
@@ -0,0 +1,78 @@
name: "Run Integration Test"
This is a more elaborate integration test setup that runs on plain ubuntu github actions runners. The setup behavior is much more complex than before (or rather - the complexity is now explicit, rather than happening out of sight on ec2 runners), and it has been moved to a separate action so it can be called from multiple locations.
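As a rough sketch of what such an extracted action looks like (the file path, input names, and steps here are illustrative guesses, not the actual contents of the new action), a composite action can be invoked from any workflow via `uses:`:

```yaml
# Hypothetical skeleton, e.g. at .github/actions/integration-test/action.yaml
# (path is illustrative).
name: "Run Integration Test"
description: "Stand up a local cluster and run one integration test suite"
inputs:
  test_type:
    description: "which integration test suite to run"
    required: true
runs:
  using: "composite"
  steps:
    - name: Create kind cluster
      shell: bash
      run: kind create cluster --name integration-test
    - name: Run suite
      shell: bash
      run: ginkgo -v -r "test/integration/${{ inputs.test_type }}"
```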
.github/workflows/beta-release.yaml
--platform linux/arm64 \
--build-arg GOPROXY="$GOPROXY" \
-t "${IMAGE}:${IMAGE_TAG_ARM}" \
. --push
I had some earlier logic that did a proper multi-arch deploy, but I took it out to avoid widening the already rather broad scope. I can dial the logic back even further if desired, to make this look more like the original action.
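For reference, the dropped multi-arch variant would have looked roughly like this (a sketch, assuming buildx with QEMU emulation is already set up; flag values mirror the single-arch commands above):

```yaml
- name: Build and push multi-arch image
  run: |
    # Builds both architectures and pushes a single manifest list,
    # instead of one tag per architecture as in the diff above.
    docker buildx build \
      --platform linux/amd64,linux/arm64 \
      --build-arg GOPROXY="$GOPROXY" \
      -t "${IMAGE}:${IMAGE_TAG}" \
      . --push
```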
COPY apis/ ./apis/
COPY controllers/ ./controllers/
COPY mocks/ ./mocks/
COPY webhooks/ ./webhooks/
This wasn't strictly related, but I tweaked this dockerfile to be mildly more performant under some circumstances. In particular: it builds on the host architecture (which leads to fewer total steps in cross-building scenarios), and it narrows the set of copied files to avoid spurious rebuilds.
// If set, will not enable ipv6 on any of the pods. This is needed to handle
// testing on github actions, where ipv6 is not supported -
// https://github.com/actions/runner-images/issues/668.
DisableIPv6 bool
This was a pretty annoying last-minute surprise. I'm not happy with how it turned out - there's a lot of ugly logic in here to force ipv6 off in the integration tests so they can run on github actions. Otherwise, any test involving an actual envoy will hang during the proxyinit step without displaying any error.
I'm tempted to add a separate flag to the controller itself to disable ipv6 on envoys. I'm worried that any additional work on the tests will result in future developers hitting the same problem, whereas a hard off switch for ipv6 in the controller itself would be more robust.
I think either a flag or a hard off switch is acceptable. Since (I imagine) other customers have hit this ipv6 problem, what do you think about documenting this issue in a readme somewhere?
I think I may introduce a single flag w/ docs in a follow-up patch.
--set "env.AWS_SESSION_TOKEN='$AWS_SESSION_TOKEN'" \ | ||
--set sidecar.envoyAwsAccessKeyId="$AWS_ACCESS_KEY_ID" \ | ||
--set sidecar.envoyAwsSecretAccessKey="$AWS_SECRET_ACCESS_KEY" \ | ||
--set sidecar.envoyAwsSessionToken="$AWS_SESSION_TOKEN" |
Quick note on the permissions here: this forwards the same set of credentials to two different locations - one, to the environment on the controller itself, and two, to the environment on the envoy containers. This could be finer grained - the envoys need narrower permissions than the controller. It's needed because the runners lack the AWS setup that previously let credentials be bootstrapped off the environment / metadata service.
I'm not thrilled by the system I added, and am very open to criticisms / suggestions here.
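For context, the full invocation is shaped roughly like the sketch below. The chart and release names, and the controller-side `env.*` flags other than the session token, are assumptions extrapolated from the diff lines above:

```yaml
- name: Install controller with forwarded credentials
  run: |
    # One set of credentials lands in two places: the controller's
    # environment (env.*) and the injected envoy sidecars (sidecar.envoyAws*).
    helm upgrade --install appmesh-controller eks/appmesh-controller \
      --namespace appmesh-system \
      --set "env.AWS_ACCESS_KEY_ID='$AWS_ACCESS_KEY_ID'" \
      --set "env.AWS_SECRET_ACCESS_KEY='$AWS_SECRET_ACCESS_KEY'" \
      --set "env.AWS_SESSION_TOKEN='$AWS_SESSION_TOKEN'" \
      --set sidecar.envoyAwsAccessKeyId="$AWS_ACCESS_KEY_ID" \
      --set sidecar.envoyAwsSecretAccessKey="$AWS_SECRET_ACCESS_KEY" \
      --set sidecar.envoyAwsSessionToken="$AWS_SESSION_TOKEN"
```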
I think this looks reasonable to me for now. Since you have the credentials as separate here, we can always introduce another role later.
kubectl get pod -n appmesh-system

echo -n "running integration test type $__test ... "
ginkgo --flakeAttempts=2 -v -r $__test_dir -- \
I never like doing this, but I added an auto-retry to tests here with the `flakeAttempts` flag. At present, the test suite is just not super robust, and this increases the success rate considerably.
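As a minimal sketch of the retry behavior (the suite path mirrors the README example; everything else is illustrative):

```yaml
- name: Run one suite with a single retry on flakes
  run: |
    # --flakeAttempts=2 reruns a failing spec once before reporting failure.
    ginkgo --flakeAttempts=2 -v -r test/integration/virtualnode/ -- \
      --cluster-kubeconfig="$HOME/.kube/config"
```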
Force-pushed from 3db0a61 to e24509b.
@@ -60,5 +60,13 @@ ginkgo -v -r test/integration/virtualnode/ -- --cluster-kubeconfig=/Users/xxxx/.
In case of failures, refer to [Troubleshooting](https://github.com/aws/aws-app-mesh-controller-for-k8s/blob/master/docs/guide/troubleshooting.md) guide.

### integration test suite with kind
Thanks for adding instructions here. This will really help out the next person.
vpc_id:
  description: "aws vpc id to use for the test"
  required: true
account_id:
How are the inputs passed in? Will the account_id and role still be marked as secret?
Yes - all inputs are redacted in the same way (I think secrets are hidden based on simple pattern matching, so the origin doesn't really matter). You can see this in the raw output.
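To illustrate, a caller might wire the inputs as below; anything sourced from `secrets.*` stays masked in the logs. The action path and the `CI_VPC_ID` secret name are hypothetical (only `CI_AWS_ACCOUNT` appears elsewhere in this PR):

```yaml
- uses: ./.github/actions/integration-test   # hypothetical local action path
  with:
    vpc_id: ${{ secrets.CI_VPC_ID }}         # assumed secret name
    account_id: ${{ secrets.CI_AWS_ACCOUNT }}
```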
This patch reworks the test and CI system for integration tests to make test execution more platform agnostic. It adds a system to manage and inject credentials that makes the tests runnable outside of an AWS context, and makes substantial adaptations to the test setup and runtime so they run in a plain GitHub Actions or Linux environment.
Example integration test run: https://github.com/BennettJames/aws-app-mesh-controller-for-k8s/actions/runs/3756588966/jobs/6382826436
Example beta release run: https://github.com/BennettJames/aws-app-mesh-controller-for-k8s/actions/runs/3760250999/jobs/6391541156
Issue #, if available:
Description of changes:
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.