Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE] Azure Workload Identity Support #378

Open
pinkfloydx33 opened this issue Jul 3, 2022 · 23 comments · Fixed by #442
Open

[FEATURE] Azure Workload Identity Support #378

pinkfloydx33 opened this issue Jul 3, 2022 · 23 comments · Fixed by #442
Labels
enhancement New feature or request

Comments

@pinkfloydx33
Copy link

pinkfloydx33 commented Jul 3, 2022

Azure recently added Azure Workload Identities (see https://github.com/Azure/azure-workload-identity and https://azure.github.io/azure-workload-identity/docs/) to AKS. It's currently technically in preview, but it is stable and there are many already using it in production. Currently in private-preview is support for managed identities (in addition to app registrations).

Important: This is the replacement to aad pod-identity and will be offered as a first-class add-on in AKS

Describe the solution you'd like
Support for Azure Workload Identity. I believe that this can be implemented on top of the existing environment-variable driven configuration with recognition of the new settings. I think it should mostly be handled by the azure client, though see linked issues below as there may be some monitoring that needs to occur.

We'd also need helm chart support to allow annotating service accounts, but I'll open that as a separate issue in the helm repo.

Additional context
See cert-manager and external-dns for similar requests
See here for support added in go/auto-rest
See here for implementation in KEDA for key-vault and Azure scalers

@pinkfloydx33 pinkfloydx33 added the enhancement New feature or request label Jul 3, 2022
@pinkfloydx33
Copy link
Author

Any chance this will get looked at in the near future? For us, this (along with external-dns) is a blocker to complete adoption of workload identity.

If not, would you accept a PR for it? I'm not an expert in go but I would be willing to give it a shot. Upon a quick glance, it looks like it may be a matter of some updates to azure/credentialprovider. Though while these are environment variable based settings, azure/auth/EnvironmentSettings doesn't include any new helpers for it (likely since it's a multistep process). So I think it'd require some manual switching on the precense of the appropriate variables, but perhaps that's not the best solution. It may also require an updated version of the azure clients, I'm not sure just yet.

In either case, inspiration can be drawn from the go-msal AZWI keyvault example as well as KEDA's implementation linked in the OP.

@hargut
Copy link

hargut commented Oct 27, 2022

As of October 24th 2022 the https://github.com/Azure/aad-pod-identity was as deprecated.

@weisdd
Copy link

weisdd commented Dec 3, 2022

Since I have some experience with making adal work well with federated identity, decided to give it a go for this project as well.
In my lab, controller was able to pull a secret. I still need to test injector (I'll probably have some time for that next week). If that scenario works as well, I'll prepare a PR.

@mohamedmzid
Copy link

hello @weisdd, glad to hear that's working with workload identity, could you share the branch/tag so that we can test from our side as well ?

@cgroschupp
Copy link
Contributor

cgroschupp commented Dec 9, 2022

Hi @mohamedmzid,
i have linked my two PRs here.

This is my value file, please use the chart from my repo:

controller:
  image:
    repository: cgroschupp/akv2k8s-controller
    tag: workload-identity-support
  keyVaultAuth: environment
  logLevel: debug
  podLabels:
    azure.workload.identity/use: "true"
  serviceAccount:
    create: true
    labels:
      azure.workload.identity/use: "true"
    annotations:
      azure.workload.identity/client-id: <replace>
env_injector:
  enabled: false  

@weisdd
Copy link

weisdd commented Dec 9, 2022

@mohamedmzid I was working on it locally, so haven't been posted before, but since you're interested, added it here:
#443

I haven't had time yet to test it against ACR and env injection, for the rest it should work just fine.

@cgroschupp As you can see, it's done in a different way with explicit request to use Workload Identity. That's something that can be done in different ways, I guess it's up to maintainers to choose between the two.
Though, the main issue with your PR is that it relies only on built-in features of adal, while it doesn't support access token renewals for federated identity out-of-the-box (federated token has to be dynamically re-read from disk as it expires after 1h), so any long-standing session will fail after 24h due to the access token expiration. azure-key-vault-to-kubernetes is not necessarily affected by that, but it's always important to keep that in mind in your implementations.

@weisdd
Copy link

weisdd commented Dec 9, 2022

Tested secret creation, env injection, and private ACR access. The latter didn't work, required a tiny fix.

I have a terraform lab prepared for the test environment + a modified chart (some changes that are not covered by SparebankenVest/public-helm-charts#84). I'll try to publish them this weekend, just need to write some instructions and reorganize the code a bit.

@weisdd
Copy link

weisdd commented Dec 10, 2022

Terraform lab with my custom builds is shared here: https://github.com/weisdd/akv2k8s-pr-443

@torresdal
Copy link
Collaborator

For your information we're looking into adding support for Workload Identity now and we will review any pull requests and comments related to this. Will get back to you shortly.

@cgroschupp
Copy link
Contributor

@torresdal workload identity support is already added, see #442

@torresdal
Copy link
Collaborator

He😅 Been off this project a while and missed this. So this can be closed?

@pinkfloydx33
Copy link
Author

Though AZWI support is not released yet, it's supposedly closed out with that PR.

This issue has been around for a while. The PR should've been linked to this issue for closure purposes IMO.

@torresdal
Copy link
Collaborator

See now that it's available in 1.5-beta. I'll look into what's missing to get out a new 1.5 release.

@181192
Copy link
Collaborator

181192 commented Feb 12, 2023

@torresdal We are missing support for authenticating with the ACR when using env-injector, when the command is not set in the k8s manifest. The code then needs to get the metadata from the container registry in order to fetch the entrypoint of the container so it can set the envs...
#457

@181192 181192 linked a pull request Feb 12, 2023 that will close this issue
@181192
Copy link
Collaborator

181192 commented Feb 12, 2023

I suspect k8schain from go-containerregistry is not playing well here...

@cgroschupp
Copy link
Contributor

cgroschupp commented Apr 17, 2023

@181192 After adding the AcrPull role to the identity I used for the env_injector, it is working for me.

Tested with 1.5.0-beta5

But you must do the same if you are using the pod identity.

@cgroschupp
Copy link
Contributor

@181192 Have you tested it too?

@181192
Copy link
Collaborator

181192 commented May 31, 2023

Yes I have, cant find any issues, so want to make a release

@cmergenthaler
Copy link

Any plans when the new version will be released?

@tspearconquest
Copy link
Contributor

tspearconquest commented Jun 22, 2023

I'm having troubles with this.

I have created a User Assigned Managed Identity, and enabled Azure Workload Identity on a test cluster. On that cluster, I've installed AKV2K8S and configured it with the below values:

akv2k8s:
  name: akv2k8s

  global:
    logLevel: info
    logFormat: json
    keyVaultAuth: cloudConfig
    userDefinedMSI:
      enabled: true
      msi: "my-identity-object-id"
      subscriptionId: "my-subscription-id"
      tenantId: "my-tenant-id"
      azureCloudType: "AzurePublicCloud"
    metrics:
      enabled: true

  rbac:
    create: true

  watchAllNamespaces: true

  controller:
    enabled: false

  env_injector:
    enabled: true
    name: env-injector
    keyVaultAuth: cloudConfig
    authService: true
    image:
      repository: my-repo/akv-webhook
      tag: v1.5.0-beta.6
      pullPolicy: Always
    replicaCount: 3
    envImage:
      repository: my-repo/akv-envinjector
      tag: v1.5.0-beta.6
      pullPolicy: Always
    logLevel: info
    logFormat: json
    certificate:
      useCertManager: true
      custom:
        enabled: false
    serviceAccount:
      create: false
      name: akv2k8s-envinjector
    labels:
      app: akv2k8s
    podLabels:
      app: akv2k8s
      "azure.workload.identity/use": "true"
    podAnnotations:
      kubectl.kubernetes.io/default-container: "webhook"
    podDisruptionBudget:
      enabled: false

    failurePolicy: Fail

    namespaceSelector:
      matchExpressions:
      - key: control-plane
        operator: DoesNotExist
      - key: name
        operator: NotIn
        values:
        - kube-system

    resources:
      limits:
        memory: 500Mi
      requests:
        cpu: 5m
        memory: 50Mi
    tolerations:
    - key: CriticalInfra
      operator: Exists
    affinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - akv2k8s
          topologyKey: "kubernetes.io/hostname"

    extraVolumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: akv2k8s-envinjector
      readOnly: true

    extraVolumes:
    - name: akv2k8s-envinjector
      projected:
        defaultMode: 292
        sources:
        - configMap:
            name: kube-root-ca.crt
        - downwardAPI:
            items:
            - fieldRef:
                apiVersion: v1
                fieldPath: metadata.namespace
              path: namespace
        - serviceAccountToken:
            expirationSeconds: 3600
            path: token

After deploying, the env-injector comes up properly and I can see the Azure Workload Identity webhook mutated the pod to include the projected volume for the workload identity token in addition to the normal service account token projected volume which is defined in the above values:

  - name: akv2k8s-envinjector
    projected:
      defaultMode: 292
      sources:
      - configMap:
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
      - serviceAccountToken:
          expirationSeconds: 3600
          path: token
  - name: azure-identity-token
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          audience: api://AzureADTokenExchange
          expirationSeconds: 3600
          path: azure-identity-token

Edit: I hit submit too early.

After verifying the above, I tried to deploy one of our services which uses AKV2K8S injected vars, and I'm getting the below on the replicaset for that service:

Warning FailedCreate 6m45s (x90 over 20h) replicaset-controller Error creating: Internal error occurred: failed calling webhook "pods.env-injector.admission.spv.no": failed to call webhook: Post "https://akv2k8s-envinjector.akvtest.svc:443/pods?timeout=10s": EOF

@tspearconquest
Copy link
Contributor

I discovered last week that we had an egress NetworkPolicy which blocked connectivity to the IMDS endpoint. Removing that seems to get me closer to a working state. It can connect to AD and gets rejected with a message indicating that the identity was not found. I'm working on adding the identity to the VMSS now which I hope will solve the issue.

@tspearconquest
Copy link
Contributor

It would be great if the below happens:

  1. Add some docs to make it clear that blocking access to the IMDS endpoint will break this functionality
  2. Add some check at startup of the akv2k8s env-injector service to connect with the MSI and verify that the token can be retrieved. If unsuccessful, then output enough logs about the problem to allow the user to troubleshoot before they move on to trying to start a workload with injected environment vars.

@tspearconquest
Copy link
Contributor

Just writing to update that I got my setup working. It turns out I didn't need AZWI for my AKV2K8S use-case, as I am simply using a custom UMI, and AKV2K8S supports it natively in the helm chart.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
9 participants