-
Notifications
You must be signed in to change notification settings - Fork 69
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Get credentials from GCP/Azure when needed #194
Conversation
This is a note for later but I think it would be better to get these implementations out of the controller, maybe moved to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good, @somtochiama 👍
Added some comments regarding a few minor improvements.
internal/azure/exchanger.go
Outdated
req, err := http.NewRequest("POST", exchangeUrl, strings.NewReader(parameters.Encode())) | ||
if err != nil { | ||
return "", fmt.Errorf("error constructing exchange request: %w", err) | ||
} | ||
|
||
req.Header.Add("Content-Type", "application/x-www-form-urlencoded") | ||
req.Header.Add("Content-Length", strconv.Itoa(len(parameters.Encode()))) | ||
|
||
client := &http.Client{} | ||
resp, err := client.Do(req) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a helper method that can be used for form POSTing in http
. Here's the short variant of all of this 😄
req, err := http.NewRequest("POST", exchangeUrl, strings.NewReader(parameters.Encode())) | |
if err != nil { | |
return "", fmt.Errorf("error constructing exchange request: %w", err) | |
} | |
req.Header.Add("Content-Type", "application/x-www-form-urlencoded") | |
req.Header.Add("Content-Length", strconv.Itoa(len(parameters.Encode()))) | |
client := &http.Client{} | |
resp, err := client.Do(req) | |
resp, err := http.PostForm(exchangeUrl, parameters) |
internal/azure/exchanger.go
Outdated
} | ||
|
||
type Exchanger struct { | ||
scheme string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't seem like something that would ever be changed or overridden, should be safe to hardcode this as https
in the resulting URL.
var authConfig authn.AuthConfig | ||
gcpDefaultTokenURL := "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token" | ||
|
||
request, err := http.NewRequest("GET", gcpDefaultTokenURL, nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe use the const here
request, err := http.NewRequest("GET", gcpDefaultTokenURL, nil) | |
request, err := http.NewRequest(http.MethodGet, gcpDefaultTokenURL, nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left just a few more comments.
internal/azure/exchanger.go
Outdated
return "", fmt.Errorf("failed to send token exchnage request: %w", err) | ||
} | ||
|
||
if resp.StatusCode != 200 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if resp.StatusCode != 200 { | |
if resp.StatusCode != http.StatusOK { |
internal/azure/exchanger.go
Outdated
} | ||
|
||
if resp.StatusCode != 200 { | ||
responseBytes, err := ioutil.ReadAll(resp.Body) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be a good idea to not use ioutil.ReadAll
here, we don't know how large the response body can be and this can cause memory issues. See my comment below.
Do we know what type of responses we get? If it's JSON maybe we can extract only the relevant bits for the error message. If not, maybe try to only read at a maximum just a few bytes worth of data.
internal/azure/exchanger.go
Outdated
responseBytes, err := ioutil.ReadAll(resp.Body) | ||
if err != nil { | ||
return "", fmt.Errorf("failed to read response body: %w", err) | ||
} | ||
|
||
var tokenResp tokenResponse | ||
err = json.Unmarshal(responseBytes, &tokenResp) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ioutil.ReadAll
loads everything into memory and it should be avoided whenever possible. Using the json Decoder would be more efficient.
responseBytes, err := ioutil.ReadAll(resp.Body) | |
if err != nil { | |
return "", fmt.Errorf("failed to read response body: %w", err) | |
} | |
var tokenResp tokenResponse | |
err = json.Unmarshal(responseBytes, &tokenResp) | |
var tokenResp tokenResponse | |
decoder := json.NewDecoder(resp.Body) | |
if err := decoder.Decode(&tokenResp); err != nil { | |
return "", err | |
} |
internal/azure/exchanger.go
Outdated
} | ||
|
||
func (e *Exchanger) ExchangeACRAccessToken(armToken string) (string, error) { | ||
exchangeUrl := fmt.Sprintf("%s://%s/oauth2/exchange", "https", e.acrFQDN) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exchangeUrl := fmt.Sprintf("%s://%s/oauth2/exchange", "https", e.acrFQDN) | |
exchangeUrl := fmt.Sprintf("https://%s/oauth2/exchange", e.acrFQDN) |
I've done some testing against the image
and on GCP:
The testing involved was to follow basically the first half of the Image Update guide: create an ImageRepo and ImagePolicy, pointed at the private repository. Remove the flag to confirm that access does not happen without the newly added feature. When the flag is present, there should be no secretRef or other access management required, it should "just work." I created the Azure cluster in an existing Resource Group through the portal with a "System-Managed Identity" and associated the cluster with a Container Registry on the same RG. I used Azure CNI and Calico policy enforcement, which probably doesn't matter, (but just to be complete) with Kubernetes 1.21 under the hood. The image repo created was definitely private. I used the Quickstart guide from Azure docs to "build and push" some example Podinfo images https://docs.microsoft.com/en-us/azure/container-registry/container-registry-quickstart-task-cli The Dockerfiles for example are in here: https://github.com/kingdonb/bootstrap-repo/tree/main/apps/base/podinfo - they are very sparse. I just needed two images so I could confirm they were both discovered successfully when the ImageRepo is reconciled. I did the same test on GCR basically, creating a GKE cluster with Workload Identity enabled, then building the same images from the tutorial for Google Cloud Build. This testing method made it possible for me to avoid using my local network or any local build nodes 👍 I placed the images in a new repository in the same https://cloud.google.com/build/docs/building/build-containers The Workload Identity automatically authorized the project repository in the same workspace (project) and as I was able to push images with the gcloud client, that were then discovered without issue by the image-reflector-controller, with a set of manifests in and centered around this one: https://github.com/kingdonb/bootstrap-repo/blob/main/apps/production/podinfo/imagerepo.yaml After confirming these both worked successfully with the flags: https://github.com/kingdonb/bootstrap-repo/blob/75fc7f8822f1850d4046700ce4281c39ac5e4bfd/clusters/gke-geekingdon/flux-system/kustomization.yaml#L16 I'm happy to report the feature works without issue on both of these! Removing the flag made for an authentication error that looked reasonable, (I didn't capture it.) I'll be repeating the testing later today, in case there are any questions or issues in the critical path that other reviewers wanted to make sure are properly addressed. Please let me know. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, thank you @somtochiama! 💯
Could you please also resolve the conflicts? |
I have some surprising results after testing further on GKE, I created a new cluster and enabled Workload Identity, but the reflector controller can no longer reach the auth, or something else has gone wrong:
This while the images can be deployed in the cluster, so apparently the project association is correct, something else has gone wrong. I suspect this is probably a configuration error, although I'm not sure what can have gone wrong, I have the flag enabled and my image is at:
but the registry cannot be scanned for ImageRepository resources. Is there somewhere else to check for debugging errors? |
@somtochiama sent me some docs about Workload Identity that I hadn't seen before. It looks like my previous test was not using workload identity, but the default service account instead. I will go back and repeat it after I work out my Workload Identity issues to confirm. Anyway just to add to this report, my configuration issue is definitely a configuration issue. |
24233a1
to
59cea8a
Compare
59cea8a
to
cffd1ff
Compare
cffd1ff
to
d5255a5
Compare
I am adding some notes here for testing this on GCP. I will write more detailed documentation next and make a commit.
Example of
|
The bootstrap docs along with #193 should be placed in https://fluxcd.io/docs/guides/image-update/#imagerepository-cloud-providers-authentication |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Solid work @somtochiama, thank you. I've made a few suggestions and raised some questions -- nothing drastic, so I've approved this PR.
(The first commit seems to have a duplicate sign-off -- something to amend if you end up doing further work here.)
}, nil | ||
} | ||
|
||
// List from https://github.com/kubernetes/kubernetes/blob/master/pkg/credentialprovider/azure/azure_credentials.go |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably better to link to the code at a specific revision, or (more readably) a release tag, otherwise it can change out from under you.
ClientCert = "certFile" | ||
ClientKey = "keyFile" | ||
CACert = "caFile" | ||
azureCloudConfigJsonFile = "/etc/kubernetes/azure.json" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This const isn't mentioned in the rest of the diff -- perhaps it's a holdover from the "mount the file from the host" code?
if err != nil { | ||
return authConfig, err | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Provided Do
didn't return an error, you should Close()
the response body, and probably read it to the end too (see the comment in https://pkg.go.dev/net/http#Client.Do). You can do io.Copy(io.Discard, response.Body)
(or perhaps better, io.Copy(io.Discard, &io.LimitedReader{R: response.Body, N: maxResponseLength})
for some maxResponseLength
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should really do this in all cases other than getting an error back from Do
. As it stands, the body will only be read to the end and closed if the status was OK and a JSON value could be decoded; but, it's entirely possible, e.g., that a 40x status would have body content.
It's a pain I know, because you are forced to either repeat code or defer (and throw away errors) -- since no (other) I/O happens after the request, I think using defer
is fine.
|
||
// getAzureLoginAuth returns authentication for ACR. The details needed for authentication | ||
// are gotten from environment variable so there is not need to mount a host path. | ||
func getAzureLoginAuth(ref name.Reference) (authn.AuthConfig, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be wise to pass the context from Reconcile through to this func, for it to use in requests (and likewise for the GCR -- and ECR, for that matter). What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so too.
|
||
For [<abbr title="Elastic Kubernetes Service">EKS</abbr>][EKS] and [<abbr title="Elastic Container Registry">ECR</abbr>][ECR], | ||
the flag is `--aws-autologin-for-ecr`. | ||
For [<abbr title="Googke Kubernetes Enginer">GKE</abbr>][GKE] and [<abbr title="Google Container Registry">GCR</abbr>][GCR], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For [<abbr title="Googke Kubernetes Enginer">GKE</abbr>][GKE] and [<abbr title="Google Container Registry">GCR</abbr>][GCR], | |
For [<abbr title="Google Kubernetes Engine">GKE</abbr>][GKE] and [<abbr title="Google Container Registry">GCR</abbr>][GCR], |
<3 for using <abbr>
consistently!
the flag is `--aws-autologin-for-ecr`. | ||
For [<abbr title="Googke Kubernetes Enginer">GKE</abbr>][GKE] and [<abbr title="Google Container Registry">GCR</abbr>][GCR], | ||
the flag is `--gcp-autologin-for-gcr`. | ||
For [<abbr title="Azure Kubernetes Service">AKS</abbr>][AKS] and [<abbr title="Azure Container Registry">EKS</abbr>][ACR], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For [<abbr title="Azure Kubernetes Service">AKS</abbr>][AKS] and [<abbr title="Azure Container Registry">EKS</abbr>][ACR], | |
For [<abbr title="Azure Kubernetes Service">AKS</abbr>][AKS] and [<abbr title="Azure Container Registry">ACR</abbr>][ACR], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So .. many .. TLAs ... 😄
(to be clear, I'm commenting on Tech, not on the docs here)
It's noted elsewhere, but following up that I have tested GKE Workload Identity to connect with gcr.io registries successfully However I was not able to get artifact registry working through workload identity, even after separately granting the role for
I guess the culprit is here: https://github.com/fluxcd/image-reflector-controller/pull/194/files#diff-922fcbfbe2565443329918a2b0d49b2b03588007ec6464a20d3feb824f3f80a2R735-R737 it needs to match |
Yey, that worked:
Some research indicates artifact registry OCI addresses for GCP will always be of the form Here is the commit that enables artifact registry, if you'd like to pull it into your branch:
The page that clued me into the structure is here, https://cloud.google.com/artifact-registry/docs/access-control – it would be good to have a more canonical source than that, as there may still be other addresses that can host Artifact Registry and they will not work without further changes. Hopefully this address format is invariant and we can count on it everywhere 🤷 I tested this version and it works with both |
The other thing I might want to add to the docs:
It was not very difficult to discover these roles, but we can help the users by spelling it out. I'm not sure I have selected the minimal roles either. There is no
Someone who knows GCP well enough to say how to select the minimal roles should review that information for correctness. (Perhaps the nasty mess of roles required just to get "container registry / reader" access is part of the motivation to deprecate gcr.io - I'm not sure if I'm the one that added cloud build role or if it came with |
This was the doc from GCP re: Workload Identity that gave the most clarity about what roles were available and what role primitives composed them: https://cloud.google.com/iam/docs/understanding-roles#predefined_roles |
@somtochiama can you please rebase and run |
Signed-off-by: Somtochi Onyekwere <somtochionyekwere@gmail.com> Signed-off-by: Somtochi Onyekwere <somtochionyekwere@gmail.com>
44325e6
to
fcd6c92
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks @somtochiama 🏅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for following through on my comments -- one adjustment, then I think this is good to go.
if err != nil { | ||
return authConfig, err | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should really do this in all cases other than getting an error back from Do
. As it stands, the body will only be read to the end and closed if the status was OK and a JSON value could be decoded; but, it's entirely possible, e.g., that a 40x status would have body content.
It's a pain I know, because you are forced to either repeat code or defer (and throw away errors) -- since no (other) I/O happens after the request, I think using defer
is fine.
@somtochiama can you please open a new PR to address @squaremo comment, sorry I missed it. |
Closes #180
Closes #179
This pull request adds two flags (
gcp-autologin-for-gcr
andazure-autologin-for-acr
) for autologin on GCP and Azure to their container registries.This has been tested on GKE clusters with workload identities enabled and default service account.
This has been tested on an AKS w Managed Identities.
Further refactoring could be done to move implementation details of various cloud providers into a dedicated package.
Signed-off-by: Somtochi Onyekwere somtochionyekwere@gmail.com