Support Auth Key Rotation #3849

Closed
mehighlow opened this issue Mar 6, 2024 · 6 comments · Fixed by #4110

@mehighlow
Contributor

mehighlow commented Mar 6, 2024

Describe the current behavior
There might be a requirement (due to a security breach or protocol) to rotate authentication or access keys for an Azure resource, such as CosmosDB or a Storage account.
There is no option to trigger key rotation, nor does the operator automatically select new keys upon reconciliation if one triggers rotation externally (through the portal, using an az-cli command, or via API interaction).

Describe the improvement

  1. I'd say an idempotent annotation would be a solution. For example, if I need to rotate keys, I could patch a resource by adding a serviceoperator.azure.com/reconcile-policy: rotate-keys annotation. The operator would then rotate a key, update operatorSpec.secrets, and set a timestamp to ensure it doesn't rotate the key again. To rotate keys once more, I would have to delete the last known rotation timestamp that the operator set.
  2. And/or have the operator pick up new keys when they change.
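
The idempotency check in option 1 could be sketched roughly like this. It only illustrates the proposal and is not ASO behavior: the rotate-keys value is the one suggested above, the last-key-rotation timestamp annotation is a hypothetical name, and `rotate` stands in for whatever call would actually regenerate the key.

```python
from datetime import datetime, timezone
from typing import Callable, Dict

# Annotation key and value as proposed in this issue (not an existing ASO feature).
POLICY_ANNOTATION = "serviceoperator.azure.com/reconcile-policy"
ROTATE_VALUE = "rotate-keys"
# Hypothetical name for the timestamp the operator would write after rotating.
ROTATED_AT_ANNOTATION = "serviceoperator.azure.com/last-key-rotation"


def reconcile_rotation(
    annotations: Dict[str, str], rotate: Callable[[], None]
) -> Dict[str, str]:
    """Rotate at most once per request and return the annotations to store."""
    wants_rotation = annotations.get(POLICY_ANNOTATION) == ROTATE_VALUE
    already_done = ROTATED_AT_ANNOTATION in annotations
    if not wants_rotation or already_done:
        # Nothing requested, or the request was already handled: no-op.
        return dict(annotations)

    rotate()  # placeholder for the real key regeneration call
    updated = dict(annotations)
    updated[ROTATED_AT_ANNOTATION] = datetime.now(timezone.utc).isoformat()
    return updated
```

Repeated reconciles leave the resource alone until someone deletes the timestamp annotation, which is the "rotate once more" step described above.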
@mehighlow
Contributor Author

mehighlow commented Mar 7, 2024

Update: The operator does update secrets once a key has been rotated externally. It just takes time (how long depends on the number of resources managed by ASO, which determines API call throttling, I believe).

Maybe adding one more metadata.lastUpdateTimestamp next to the existing metadata.creationTimestamp would improve visibility, as this data gets retrieved constantly.

Also, you never know how fast a service (application/deployment) will recover once keys are rotated via an external call. Restarting the ASO deployment helps to speed up the process.

@buzzaII

buzzaII commented Mar 9, 2024

For key rotation in our environment we go through a process of alternating the primaryMasterKey and secondaryMasterKey into the SAME secret destination, as seen below. We then have reloader configured to monitor the secret change. This gives us zero downtime, as there is no period in which the secrets in k8s are invalid. This was implemented based on some lightweight guidance here.

The rotation process is then:

  1. Rotate the secondary key in the Azure portal / via script
  2. Update the manifest SecretDestination to use the secondaryMasterKey.
  3. Deploy workload
  4. Rotate the primary key in the Azure portal / via script
  5. Update the manifest SecretDestination to use the primaryMasterKey.
  6. Deploy workload
operatorSpec:
    secrets:
      primaryMasterKey:
        name: cosmos-db
        key: key

or

operatorSpec:
    secrets:
      secondaryMasterKey:
        name: cosmos-db
        key: key
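
The six steps can also be written down as data, which may help if the rotation is scripted. This is a minimal sketch under assumptions: the helper names are made up for illustration, the field names are the primaryMasterKey and secondaryMasterKey mentioned in this comment, and nothing here calls Azure or Kubernetes.

```python
from typing import Dict, List, Tuple

# Field names as referred to in this comment.
PRIMARY = "primaryMasterKey"
SECONDARY = "secondaryMasterKey"


def other_key(in_use: str) -> str:
    """Return the key that workloads are NOT currently using."""
    return SECONDARY if in_use == PRIMARY else PRIMARY


def secrets_fragment(key_field: str, secret_name: str = "cosmos-db") -> Dict:
    """Build the operatorSpec fragment that writes key_field into the shared secret."""
    return {
        "operatorSpec": {
            "secrets": {key_field: {"name": secret_name, "key": "key"}}
        }
    }


def rotation_plan(in_use: str) -> List[Tuple[str, str]]:
    """List (action, key) steps for one full cycle, starting from the key in use."""
    idle = other_key(in_use)
    return [
        ("rotate", idle),             # safe: no workload reads this key yet
        ("point-manifest-at", idle),  # same secret name, different source key
        ("deploy", idle),
        ("rotate", in_use),           # safe once workloads have moved to `idle`
        ("point-manifest-at", in_use),
        ("deploy", in_use),
    ]
```

Calling `rotation_plan("primaryMasterKey")` yields the same order as the numbered list: the key being regenerated is never the one the secret currently exposes.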

@mehighlow
Contributor Author

@buzzaII, thank you for sharing! It indeed appears to be a solution to separate secrets and ingest both primaryKey and secondaryKey as two separate values from two separate secrets. Reloader triggers deployment rollout, and an application just needs a fallback implementation to use the second key if the first one fails. However, you have to have a side script to rotate keys. In my case, with hundreds of Cosmos DB accounts, sequential key rotation takes far too much time, while parallel execution requires API throttle management, which makes the script more of a well-designed tool. So, I wonder what @matthchr thinks about that.
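
For the application-side fallback mentioned here, a rough sketch could look like the following. It assumes the application receives both keys as separate values; `connect` and `AuthError` are hypothetical stand-ins for whatever client constructor and authentication error the real SDK exposes.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


class AuthError(Exception):
    """Hypothetical error raised by `connect` when a key is rejected."""


def connect_with_fallback(keys: Sequence[str], connect: Callable[[str], T]) -> T:
    """Try each key in order and return the first successful connection."""
    last_error: Optional[AuthError] = None
    for key in keys:
        try:
            return connect(key)
        except AuthError as err:
            last_error = err  # this key was rejected, try the next one
    raise AuthError("all keys were rejected") from last_error
```

With the keys passed as `[primary, secondary]`, a freshly rotated primary only costs one failed attempt before the secondary is used.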

@buzzaII

buzzaII commented Mar 12, 2024

Just for clarity the above steps 2 and 5 create the same secret in k8s:

apiVersion: v1
kind: Secret
metadata:
  name: cosmos-db
type: Opaque
data:
  #either the primary or secondary secret value - because this changes it causes reloader to restart the deployment
  key: <<secret value>>

The other thing is you could actually rotate these on a schedule (say weekly), alternating each week between the primary and the secondary secrets.
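
One way to pick the key for a weekly schedule, as a sketch only: the function name is made up, and it counts Monday-aligned weeks from a fixed origin rather than using ISO week numbers, so that the alternation does not repeat a key across a year boundary.

```python
from datetime import date


def key_for_week(day: date) -> str:
    """Return which key to rotate (and then switch workloads to) in the week of `day`."""
    # date.toordinal() is 1 for 0001-01-01, which is a Monday, so this
    # index increases by exactly one every Monday.
    week_index = (day.toordinal() - 1) // 7
    return "primaryMasterKey" if week_index % 2 == 0 else "secondaryMasterKey"
```

A scheduled job could call `key_for_week(date.today())`, rotate that key, and then update the manifest to reference it.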

One thing I failed to describe above: if you move from step 3 to step 4 too quickly, you could create an outage (if you have a goal of zero downtime), as reloader might not have had a chance to restart the workloads and have them ready in time.

In saying all of this - we are making an effort to move to workload identities so we don't have secrets in the mix.

@mehighlow
Contributor Author

> In saying all of this - we are making an effort to move to workload identities so we don't have secrets in the mix.

For sure, that is the way to go. But there is always old stuff that no one has time to deal with that is still using keys and connection strings for auth.

@matthchr matthchr self-assigned this Mar 25, 2024
@matthchr matthchr added the documentation Improvements or additions to documentation label Apr 8, 2024
@matthchr matthchr added this to the v2.8.0 milestone Apr 22, 2024
matthchr added a commit to matthchr/azure-service-operator that referenced this issue Jun 18, 2024
@matthchr
Member

> However, you have to have a side script to rotate keys. In my case, with hundreds of Cosmos DB accounts, sequential key rotation takes far too much time, while parallel execution requires API throttle management, which makes the script more of a well-designed tool. So, I wonder what @matthchr thinks about that.

Yeah, I agree that this is challenging. The difficulty with ASO supporting secret rotation comes down to rendering the action of secret rotation in a declarative way. It's not impossible, but given the huge push from MSFT to move folks away from service principals and keys to Managed Identity, which is already declarative, it doesn't seem all that valuable for us to invest effort in solving this problem in ASO. The canonical Azure-wide solution for "it's hard to manage rotation of a large number of credentials" is Managed Identity, which ASO also already supports.

With that said, I've sent a PR to update our documentation based on the flow that @buzzaII outlined, as calling that pattern out explicitly would probably be more helpful to users than the very light set of guidance we had.

Once that documentation PR lands, this will be closed as at this point in time that's all we're planning on doing for this.

github-merge-queue bot pushed a commit that referenced this issue Jun 19, 2024
@github-project-automation github-project-automation bot moved this from Backlog to Recently Completed in Azure Service Operator Roadmap Jun 19, 2024
@theunrepentantgeek theunrepentantgeek moved this from Recently Completed to Ready for Release in Azure Service Operator Roadmap Jun 24, 2024

3 participants