-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Conversation
Image pushed to quay.io/squaremo/flux:issue-539-native-ecr-auth-b76ac67 should anyone wish to try this out ... |
Please don't consider my comments as blockers for the PR, those are just a few thoughts on how we can improve things in the future... |
The question is, if we released this as is, would it limit our ability to improve it? Probably not -- we can always deprecate those arguments, or reinterpret them to mean "just auth this region / these account IDs". But it would be good to get it right the first time, too :-) |
Two more points:
With those in mind, I propose that I change the arguments:
|
39cd52f
to
dd7e7cd
Compare
Image built from the most recent commit at quay.io/squaremo/flux:issue-539-native-ecr-auth-dd7e7cd |
@squaremo I tried the new image but I still get logs like this: |
Thank you!
Yep, this is the problem I was referring to in
The way to shut it up is to mention your own account ID in the argument Having said that, it looks like it's the same account ID causing log noise in eu-central-1 as it is in eu-west-1, so I wonder if I could make the default to ignore that account ID ... |
Yes please ignore it by default this seams to be an aws account of the eks team. |
if len(config.Regions) == 0 { | ||
// this forces the AWS SDK to load config, so we can get the default region | ||
sess := session.Must(session.NewSessionWithOptions(session.Options{ | ||
SharedConfigState: session.SharedConfigEnable, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's a good start, hopefully it will be sufficient long term and we won't need to add too many input variables, as it will always run in a container and won't need to support a host of configuration modes that eksctl
has to support (environment variables, flags and ~/.aws/config
file with profiles and MFA etc).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tried to leave space for mounting config files (e.g., by attempting to load config before using the ec2 metadata service). You are right that contexts etc. complicate this :-/ We'll see ..
site/faq.md
Outdated
- Google Container Registry works this way; Flux will | ||
automatically attempt to use platform-provided credentials when | ||
scanning images in GCR. | ||
- (Amazon) Elastic Container Registry has its own authentication |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would change this to
Amazon Elastic Container Registry (ECR) has its own authentication;
Flux will attempt to scan for images in ECR, if EC2 instance it is running
on has access to ECR.
Which also suggest that we might want to consider to support for ECR outside of EC2, but I don't think kubelet supports that kind of mode yet (IIRC, it was mentioned in the comment in kubelet code linked from here).
As far as EC2 itself is concerned, there is a possibility that user may wish to avoid using instance roles and pass credentials to flux directly (but we should check if kubelet supports that also). The reason someone may wish to do that is security, i.e. for the lack of real IAM role enforcement for pods, someone may prefer to disallow access to ECR for any but the chose process in their cluster. One can use separate sets of nodes to isolate privileged/unprivileged pods, but that won't work in the case, as kubelets will have to fetch images on all nodes. However, it's worse noting that fine-grain IAM roles can be created to avoid the possibility of certain images being fetched on certain nodes.
Just a few thoughts (well, more of a brain dump on ECR and IAM), perhaps we could open an issue for now, if it's worse it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be possible (though ungainly) to use credentials on the host to grant kubelet access to ECR when not running in EC2. And, I think it'll be possible (with this PR) to make credentials available to flux too, by providing a ~/.aws/credentials
.
But I'd expect this to be well off the beaten path. I agree that it's important to set out the caveats early -- I'll try to rephrase.
site/faq.md
Outdated
service accounts, platform-provided credentials on GCP or AWS, and | ||
a Docker config file if you mount one into the fluxd container (see | ||
the [command-line usage](./daemon.md)). | ||
- When using images in ECR, from AWS, the IAM account used to run the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rephrase as:
- When using images in AWS ECR, the EC2 instance role of the node
fluxd container runs on must have permissions to query the ECR registry
or registries in question. In turns, this is a requirement for kubelet also,
so if your cluster can run containers from ECR, Flux should work too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it definitely the EC2 instance role that decides whether you have access? I confess I skipped lightly over the question of how the container gets any IAM permissions at all (and Just Made It Work TM). I'm happy to be more specific about the mechanism.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few more comments, hopefully helpful.
This takes the essential workings of #1455 and creates a helper for using AWS authorization with ECR (the AWS container registry).
Instead of asking for the AWS region and set of registry IDs (really account IDs), take these from the registry URLs as they come up. This means we have to make a token request for each region; so, keep track of the expiry, per region. And so we don't keep asking when there's a failure, embargo requests for a region (for ten minutes) whenever we fail to get a token. The arguments `--aws-region` and `--aws-registry-id` are now ignored, since the values are taken from the registry URLs; but, the plumbing for those values is left in, because the arguments are useful with an altered meaning of "only these regions/accountIDs". This change will be made in a subsequent commit.
After the last commit, which takes the region and account ID from the registry URL, this commit repurposes the arguments --aws-region and (renamed) --aws-account-id to be "allow lists". They differ in the default policy: if no account IDs are given, all accounts IDs are allowed; whereas, if no region is given, the allowed regions is taken to be the "cluster region". What is the cluster region? It's the region as detected from configuration (so it can be overridden), or as given by the EC2 metadata service, as a last resort. The rationale for always restricting the region is that using images from a different region will incurs costs, so it's better to encourage people to be deliberate about it. - [ ] does it still work with minikube (or does it fail at creating wrapper)
This removes a duplicate label from the AWS-specific logger.
dd7e7cd
to
4c156aa
Compare
This introduces the flag `--registry-ecr-exclude-id`, and renames `--registry-ecr-account-id` to `--registry-ecr-include-id` for consistency. Oddly, `--registry-ecr-exclude-id` is most useful when you don't supply it, or its "include" sibling, since then fluxd will scan everything except the registry used for EKS system images (which will always fail, so far as I can determine).
Image that works as described in the newly-edited description: quay.io/squaremo/flux:issue-539-native-ecr-auth-1b5a3f2 This build
|
@errordeveloper I've tweaked the wording in the FAQ (two places) -- is it any better, do you think? |
Many thanks @errordeveloper and @hiddeco ! |
awesome sauce! |
just waiting for a new release... |
…d): elaborate on how to enable autologin for ECR fluxcd/flux#1619 fluxcd/flux2#107 fluxcd#11 fluxcd#139 fluxcd#147 fluxcd#174 fluxcd#179 fluxcd#180 fluxcd/website#601 fluxcd#193 fluxcd/website#611 https://fluxcd.io/docs/guides/image-update/#aws-elastic-container-registry Signed-off-by: lloydchang <lloydchang@gmail.com>
…d): elaborate on how to enable autologin for ECR fluxcd/flux#1619 fluxcd/flux2#107 fluxcd#11 fluxcd#139 fluxcd#147 fluxcd#174 fluxcd#179 fluxcd#180 fluxcd/website#601 fluxcd#193 fluxcd/website#611 https://fluxcd.io/docs/guides/image-update/#aws-elastic-container-registry Signed-off-by: lloydchang <lloydchang@gmail.com>
…d): elaborate on how to enable autologin for ECR fluxcd/flux#1619 fluxcd/flux2#107 fluxcd#11 fluxcd#139 fluxcd#147 fluxcd#174 fluxcd#179 fluxcd#180 fluxcd/website#601 fluxcd#193 fluxcd/website#611 https://fluxcd.io/docs/guides/image-update/#aws-elastic-container-registry Signed-off-by: lloydchang <lloydchang@gmail.com> Signed-off-by: Kingdon Barrett <kingdon@weave.works>
…d): elaborate on how to enable autologin for ECR fluxcd/flux#1619 fluxcd/flux2#107 fluxcd#11 fluxcd#139 fluxcd#147 fluxcd#174 fluxcd#179 fluxcd#180 fluxcd/website#601 fluxcd#193 fluxcd/website#611 https://fluxcd.io/docs/guides/image-update/#aws-elastic-container-registry Signed-off-by: lloydchang <lloydchang@gmail.com> Signed-off-by: Kingdon Barrett <kingdon@weave.works>
Taking inspiration (and code) from #1455, this PR builds support for fetching authorization tokens into the registry credentials code.
The design is simple: it's a wrapper for ImageCreds which keeps an authentication token, refreshing when it needs to via the AWS API. This is pretty similar to
ImageCredsWithDefaults
, and is composed with that in fluxd's main.go. The arguments--registry-ecr-region
,--registry-ecr-include-id
and--registry-ecr-exclude-id
control which images it will scan:None of the arguments are necessary; the default will be "scan images from any registry in the cluster region, except the EKS system images".
If fluxd is not running on EC2 (or hasn't been rigged so that it can authenticate as an AWS user, to access ECR), the wrapper will fail to be initialised, a message will be logged, and image scanning will proceed as before.
Fixes #539.
(EDIT: update to reflect latest changes)