Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug: RoleAssignment UUID clashes #3637

Closed
Roman-Galeev opened this issue Dec 7, 2023 · 6 comments · Fixed by #4196
Closed

Bug: RoleAssignment UUID clashes #3637

Roman-Galeev opened this issue Dec 7, 2023 · 6 comments · Fixed by #4196
Assignees
Labels
high-priority Issues we intend to prioritize (security, outage, blocking bug)
Milestone

Comments

@Roman-Galeev
Copy link

Roman-Galeev commented Dec 7, 2023

Version of Azure Service Operator
v2.4.0

Describe the bug
We have development, staging and production subscriptions for deployments, and each of them has a managed kubernetes service (AKS), so the algorithm used to calculate "stable" AzureName UUID produces clashes. This is the function used currently:

		assignment.Spec.AzureName = randextensions.MakeUUIDName(
			ownerGK,
			assignment.Spec.Owner.Name,
			gk,
			assignment.Namespace,
			assignment.Name)

I believe that "uniqueness" of the UUID could be easily improved by including into the seed string subscription ID (which is known to ASO anyways). However, this will not cover the case when there are more than one cluster in the subcription, but handling that is a more complex story, e.g. ASO could generate an instance ID on install, and use it as a seed.

@matthchr
Copy link
Member

matthchr commented Dec 8, 2023

but handling that is a more complex story, e.g. ASO could generate an instance ID on install, and use it as a seed.

We considered this but avoided it because one of the goals for the autogenerated UUID was that it be stable across instances of ASO to support migration stories where a resource was moved from one ASO cluster to another.

I assume you must be running multiple instances of ASO across multiple clusters (maybe 1 for prod, 1 for staging, 1 for dev?). The current workaround is documented here (in the code)

// In the rare case users have multiple ASO instances with resources in the same namespace in each cluster
// having the same name but not actually pointing to the same Azure resource (maybe in a different subscription?)
// they can avoid name conflicts by explicitly specifying AzureName for their RoleAssignment.

Maybe this case is not as rare as we anticipated.

I'm open to moving to include subscription in this too, although we'd need to think about how to do so in a way which is backwards compatible as we won't want to orphan existing RoleAssignments.

@matthchr
Copy link
Member

matthchr commented Dec 8, 2023

I'm open to moving to include subscription in this too, although we'd need to think about how to do so in a way which is backwards compatible as we won't want to orphan existing RoleAssignments.

Actually we wouldn't break existing RoleAssignments because defaulting only applies when they create a new one, it would break the case where they move RoleAssignments between two instances of ASO w/ the same name. Possibly we could document our way around that.

A (likely bigger) problem is that, at the point the webhook runs we don't actually know the subscription and it's actually possible that the subscription configuration doesn't exist yet, so it might be difficult for the webhook to take it into account.

Have you considered just using something like kustomize to make your dev/test/prod env roleassignments have slightly different names?

-dev, -test and -prod suffix on the RoleAssignment would solve the problem without you needing to fully manage the UUID yourselves.

@Roman-Galeev
Copy link
Author

Roman-Galeev commented Dec 8, 2023

I assume you must be running multiple instances of ASO across multiple clusters (maybe 1 for prod, 1 for staging, 1 for dev?).

Yeah, each cluster has its own instance of ASO. Currently we generate UUIDs, but this is not really convenient, and UUID clashes are totally not nice, because it works, say, in dev, and then all of a sudden breaks in the next cluster.

-dev, -test and -prod suffix on the RoleAssignment would solve the problem without you needing to fully manage the UUID yourselves.

The idea is that deployment of ASO resources should be the same across environments, so we can promote changes between them in gitops way by making merge requests, so adding environment-specific suffixes to names breaks the assumption. It's possible to override the UUID or resource name on deployment, but IMO this is what operator should do, and it does, kind of.

we don't actually know the subscription

I think we do, as it's a configuration parameter of ASO itself. Actually, azureClientID used by ASO is the unique instance ID, so if we could use it as UUID5 namespace, then all UUIDs would be unique up to the instance.

@Roman-Galeev
Copy link
Author

Roman-Galeev commented Dec 8, 2023

I would rather prefer dealing with errors like "assignment exists already" in case of reinstalling ASO with the new azureClientID and redeploying everything, than with RoleAssignmentUpdateNotPermitted, as the latter breaks the deployment, and the former is kind of expected.

@Roman-Galeev
Copy link
Author

...and another idea is including principalID into role UUID seed, instead of owner.

related: #3318

@theunrepentantgeek theunrepentantgeek added high-priority Issues we intend to prioritize (security, outage, blocking bug) and removed needs-triage 🔍 labels Dec 11, 2023
@theunrepentantgeek theunrepentantgeek added this to the v2.6.0 milestone Dec 11, 2023
@theunrepentantgeek
Copy link
Member

There's an interesting conflict here between the two scenarios. In one case, generating a stable UUID for the role identity is crucial to allowing resources to be handed off between ASO instances; in the other, having a different UUID for each installation is needed to keep dev/test/prod environments separate.

I suspect there's no one-size-fits-all solution here; maybe we need to add some configuration to RoleAssignment to control how the stable UUIDs are generated. I'll drop this issue into our next milestone and tag as high-priority so we don't lose sight of it.

@theunrepentantgeek theunrepentantgeek modified the milestones: v2.9.0, v2.11.0 Aug 19, 2024
@github-project-automation github-project-automation bot moved this from Backlog to Recently Completed in Azure Service Operator Roadmap Aug 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
high-priority Issues we intend to prioritize (security, outage, blocking bug)
Projects
Development

Successfully merging a pull request may close this issue.

4 participants