Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add pod identity support for namespaces and per-resource scoped auth #3187

Merged
merged 14 commits into from
Aug 30, 2023

Conversation

super-harsh
Copy link
Collaborator

@super-harsh super-harsh commented Aug 13, 2023

Closes #3167

What this PR does / why we need it:
This PR adds support for allowing single-operator multi-tenant environments to authenticate with different Managed Identities through AAD Pod Identity by adding another secret variable(USE_POD_IDENTITY_AUTH) for scoped credentials to instruct the operator to use Pod Identity authentication.

If applicable:

  • this PR contains documentation
  • this PR contains tests

@codecov-commenter
Copy link

codecov-commenter commented Aug 14, 2023

Codecov Report

Merging #3187 (9ee8b18) into main (55e6ec8) will increase coverage by 0.04%.
Report is 1 commits behind head on main.
The diff coverage is 0.00%.

@@            Coverage Diff             @@
##             main    #3187      +/-   ##
==========================================
+ Coverage   54.42%   54.46%   +0.04%     
==========================================
  Files        1428     1446      +18     
  Lines      609210   616083    +6873     
==========================================
+ Hits       331582   335576    +3994     
- Misses     223364   225541    +2177     
- Partials    54264    54966     +702     
Files Changed Coverage Δ
v2/internal/identity/credential_provider.go 47.36% <0.00%> (-7.37%) ⬇️

... and 80 files with indirect coverage changes

@super-harsh super-harsh self-assigned this Aug 14, 2023
AZURE_TENANT_ID: "$AZURE_TENANT_ID"
AZURE_CLIENT_ID: "$IDENTITY_CLIENT_ID"
USE_POD_IDENTITY_AUTH: "true"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already use a boolean for USE_WORKLOAD_IDENTITY for global secret and omit that for scoped credentials. Suggestions on using the same here for consistency and making forcing the Managed Identity to be a default in absence of ClientSecret?

Copy link
Collaborator Author

@super-harsh super-harsh Aug 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thoughts on not doing the above :

  1. Users using Workload Identity would have to update their scoped secrets to add a boolean
  2. Pod Identity is deprecated, so did not want to use it as default.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with having multiple boolean configurations like this is that we end up with oddities - what should happen if both USER_POD_IDENTITY_AUTH and USE_WORKLOAD_IDENTITY are true at the same time?

I'd like us to consider using a single new configuration setting and deprecating USE_WORKLOAD_IDENTITY.

What if we introduced IDENTITY_AUTH with permitted values of pod, workload or scoped, with scoped as the default if not set.
For back compat, USE_WORKLOAD_IDENTITY: "true" would change the default to workload.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with having multiple boolean configurations like this is that we end up with oddities - what should happen if both USER_POD_IDENTITY_AUTH and USE_WORKLOAD_IDENTITY are true at the same time?

AFAIK we don't actually support USE_POD_IDENTITY_AUTH and USE_WORKLOAD_IDENTITY_AUTH on the same secrets.
USE_POD_IDENTITY_AUTH is allowed only in the scoped credential secret (namespace or per-resource). It's not allowed at the global secret level from what I can tell.

USE_WORKLOAD_IDENTITY is only at the global secret level, it's not supported at the per-resource/per-namespace secret (@super-harsh correct me if I am wrong here). Instead, it's always inferred if the user omits AZURE_CLIENT_SECRET that they want workload identity.

So there's no technical ambiguity here even if there may be some user-ambiguity. It would be better if things were exactly the same between the different secret formats probably, but I do think how we got here is somewhat reasonable because:

  1. The global secret used the "if no AZURE_CLIENT_SECRET, assume AAD Pod Identity" approach, since it previously supported AAD Pod Identity. When we added workload identity support we needed a way to differentiate between the existing supported AAD Pod Identity and Workload Identity. We possibly should've added an enum instead of a boolean here, but the choice of "evolve without breaking" seems right to me.
  2. The per-namespace and per-resource secrets didn't yet exist, so didn't support AAD Pod Identity, so didn't need a boolean until now. Now, we want to add support for AAD Pod Identity in a nonbrekaing way, so we need to differentiate between workload identity (already supported, chosen by omitting the AZURE_CLIENT_SECRET) and this new (legacy) mode of AAD Pod Identity.

I think the direction we want to head (eventually) is that AAD Pod Identity is fully deprecated and we can drop all these flags and just have the "if client_secret omitted, workload identity" - though maybe we want an enum just to allow future evolution. I don't see how we get:

  1. Global and per-namespace/per-resource secrets are consistently shaped (support the same fields/flags).
  2. No breaking changes from what we have today.

Without doing something like what @super-harsh has done here (though agree maybe boolean worse than enum)

AZURE_TENANT_ID: "$AZURE_TENANT_ID"
AZURE_CLIENT_ID: "$IDENTITY_CLIENT_ID"
USE_POD_IDENTITY_AUTH: "true"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with having multiple boolean configurations like this is that we end up with oddities - what should happen if both USER_POD_IDENTITY_AUTH and USE_WORKLOAD_IDENTITY are true at the same time?

AFAIK we don't actually support USE_POD_IDENTITY_AUTH and USE_WORKLOAD_IDENTITY_AUTH on the same secrets.
USE_POD_IDENTITY_AUTH is allowed only in the scoped credential secret (namespace or per-resource). It's not allowed at the global secret level from what I can tell.

USE_WORKLOAD_IDENTITY is only at the global secret level, it's not supported at the per-resource/per-namespace secret (@super-harsh correct me if I am wrong here). Instead, it's always inferred if the user omits AZURE_CLIENT_SECRET that they want workload identity.

So there's no technical ambiguity here even if there may be some user-ambiguity. It would be better if things were exactly the same between the different secret formats probably, but I do think how we got here is somewhat reasonable because:

  1. The global secret used the "if no AZURE_CLIENT_SECRET, assume AAD Pod Identity" approach, since it previously supported AAD Pod Identity. When we added workload identity support we needed a way to differentiate between the existing supported AAD Pod Identity and Workload Identity. We possibly should've added an enum instead of a boolean here, but the choice of "evolve without breaking" seems right to me.
  2. The per-namespace and per-resource secrets didn't yet exist, so didn't support AAD Pod Identity, so didn't need a boolean until now. Now, we want to add support for AAD Pod Identity in a nonbrekaing way, so we need to differentiate between workload identity (already supported, chosen by omitting the AZURE_CLIENT_SECRET) and this new (legacy) mode of AAD Pod Identity.

I think the direction we want to head (eventually) is that AAD Pod Identity is fully deprecated and we can drop all these flags and just have the "if client_secret omitted, workload identity" - though maybe we want an enum just to allow future evolution. I don't see how we get:

  1. Global and per-namespace/per-resource secrets are consistently shaped (support the same fields/flags).
  2. No breaking changes from what we have today.

Without doing something like what @super-harsh has done here (though agree maybe boolean worse than enum)


// IdentityAuthMode enum is used to determine if we're using Pod Identity or Workload Identity
//authentication for namespace and per-resource scoped credentials
IdentityAuthMode = "IDENTITY_AUTH_MODE"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering if it would make more sense to call this:
AUTH_MODE and have the values be workloadidentity, podidentity, or (possibly in the future) sp or cert? Calling it IDENTITY_AUTH_MODE might limit us?

workloadIdentity IdentityAuthModeOption = "workload"

// IdentityAuthMode enum is used to determine if we're using Pod Identity or Workload Identity
//authentication for namespace and per-resource scoped credentials
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
//authentication for namespace and per-resource scoped credentials
// authentication for namespace and per-resource scoped credentials

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not actually updated?

Copy link
Member

@theunrepentantgeek theunrepentantgeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A nitpick on validation of configuration, but otherwise looks good. Thanks for making the changes, much appreciated.

Comment on lines -503 to -510
helm upgrade --install --devel aso2 aso2/azure-service-operator \
--create-namespace \
--namespace=azureserviceoperator-system \
--set azureSubscriptionID=$AZURE_SUBSCRIPTION_ID \
--set aadPodIdentity.enable=true \
--set aadPodIdentity.azureManagedIdentityResourceId=${IDENTITY_RESOURCE_ID} \
--set azureClientID=${IDENTITY_CLIENT_ID} \
--set crdPattern='resources.azure.com/*;containerservice.azure.com/*;keyvault.azure.com/*;managedidentity.azure.com/*;eventhub.azure.com/*'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see this helm command present below this point - should it have been removed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have moved it above on line 455

Comment on lines 338 to 343
func authModeOrDefault(mode string) AuthModeOption {
if mode == string(podIdentity) {
return podIdentity
}
return workloadIdentity
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This treats anything other than podIdentity as workloadIdentity - including foo, bang, and oh-oh.

A misconfigured value should at least trigger a log warning, if not an actual error.

I'm also a fan of being case insensitive for configuration values, though @matthchr doesn't always agree.

Suggested change
func authModeOrDefault(mode string) AuthModeOption {
if mode == string(podIdentity) {
return podIdentity
}
return workloadIdentity
}
func authModeOrDefault(mode string) (AuthModeOption, error) {
if strings.EqualFold(mode, string(podIdentity)) {
return podIdentity, nil
}
if strings.EqualFold(mode, string(workloadIdentity)) {
return workloadIdentity, nil
}
return "", errors.Errorf("authorization mode %q not valid", mode)
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also a fan of being case insensitive for configuration values, though @matthchr doesn't always agree.

I don't have a philosophical objection here one way or the other. I do think that @theunrepentantgeek is correct we should have a warning or error if the user gets it wrong. I think from a consistency perspective, being case-sensitive here might make sense given other fields (such as enums in the CRDs themselves) are also case-sensitive, and case-sensitivity seems to be "the standard" in the Kubernetes world (look at labels which are case-sensitive, or names which only allow lowercase and thus are case-sensitive).

My dislike of case-insensitivity comes primarily from REST, where attempting to be case-insensitive is awkward because JSON keys are stereotypically case-sensitive (so the expectation is not necessarily case-insensitivity) and REST entities are both set by the user and returned to the user. In the "returned to the user" scenario, case insensitivity gets awkward because you want to preserve what they've sent you, but internally you want to store it in a single canonical form. Accomplishing this ends up being:

  1. More work in your backend (need to remember their case)
  2. Confusing. Which of their cases do you remember? The latest PUT? Or the first PUT? Or something else?

Postel's law ends up not really applying because usually in REST you have to return exactly what you accept (the expectation is that what you PUT is what you GET, basically), so you can't actually be liberal in what you accept but conservative in what you return. They are one and the same. You have to pick a winner and I tend to favor picking the easier to implement (and simpler to explain) approach and just give an error in all other cases.

type AuthModeOption string

const (
podIdentity AuthModeOption = "pod"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better to call it podidentity and workloadidentity?

@matthchr matthchr added this to the v2.3.0 milestone Aug 28, 2023
@@ -34,6 +34,17 @@ const (
FederatedTokenFilePath = "/var/run/secrets/tokens/azure-identity"
)

type AuthModeOption string

const (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could these constants be made available to import like #3171?

@@ -300,3 +326,15 @@ func (c *credentialProvider) getSecret(ctx context.Context, namespace string, se
func getSecretNameFromAnnotation(credentialFrom string, resourceNamespace string) types.NamespacedName {
return types.NamespacedName{Namespace: resourceNamespace, Name: credentialFrom}
}

func authModeOrDefault(mode string) (config.AuthModeOption, error) {
if strings.EqualFold(mode, string(config.WorkloadIdentityAuthMode)) || mode == "" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think it's more consistent if we're case-sensitive (see my other comment), but approving as-is as not gonna block it on this.

@super-harsh super-harsh enabled auto-merge (squash) August 30, 2023 02:38
@super-harsh super-harsh merged commit 9061839 into main Aug 30, 2023
@super-harsh super-harsh deleted the feature/scoped-pod-identity branch August 30, 2023 03:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

Successfully merging this pull request may close these issues.

Allow multi-tenant environments to authenticate with different Managed Identities through AAD Pod Identity
5 participants