KEP-3299: Update kmsv2 kep for beta #3804
2. Run the e2e suite against a kind cluster that has kms v2 encryption enabled (as defined below).
3. Compare `request_duration_seconds`, `request_terminations_total`, `request_aborts_total` API server metrics between the two runs. The acceptable delta should be less than 20%.
4. Observe metrics from the reference implementation to determine time taken at each step of the encryption/decryption process.
5. Observe API server startup time with and without kms encryption enabled.
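A minimal sketch of the comparison in step 3, assuming each run has already been reduced to a single number per metric (for example a p99 latency). The helper names and sample values are hypothetical, not measured data; only the 20% threshold comes from the text above.

```python
def relative_delta(baseline: float, candidate: float) -> float:
    """Return the relative change of candidate versus baseline (0.2 means 20%)."""
    if baseline == 0:
        # A zero baseline (e.g. a counter that never fired) has no relative delta.
        raise ValueError("baseline must be non-zero")
    return abs(candidate - baseline) / abs(baseline)


def within_threshold(baseline: float, candidate: float, threshold: float = 0.20) -> bool:
    """True when the relative change is strictly below the threshold."""
    return relative_delta(baseline, candidate) < threshold


# Hypothetical p99 request latencies in seconds for the two e2e runs.
baseline_p99 = 0.100   # run without kms v2 encryption
kms_p99 = 0.115        # run with kms v2 encryption
print(within_threshold(baseline_p99, kms_p99))
```

Counters such as `request_terminations_total` may legitimately be zero in the baseline run, so a real comparison would probably need an absolute tolerance for that case instead of raising.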
What is the acceptable delta in startup time for the apiserver w/ and w/o the feature enabled?
Since we defined the acceptable delta for API server metrics (latency) to be 20% we can do the same here. WDYT @enj?
@logicalhan @enj and I had a conversation, and this is the rough math we came up with:
Assuming each KMS RPC request takes 10ms to complete, every 5000 encrypted resources will add roughly a minute to the startup time (5000 * 10ms = 50s). Is this something you would like us to document in the KEP?
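The rough math above can be written as a small helper. This is only a sketch under the same assumptions as the comment, read here as one KMS RPC per encrypted resource, performed one after another with no caching or parallelism; the function name is hypothetical.

```python
def added_startup_seconds(encrypted_resources: int, rpc_latency_ms: float) -> float:
    """Estimate the extra startup time when every encrypted resource costs one KMS RPC."""
    return encrypted_resources * rpc_latency_ms / 1000.0


# The numbers from the comment: 5000 resources at 10ms per RPC.
print(added_startup_seconds(5000, 10))  # 50.0
```

The estimate scales linearly in both inputs, so doubling either the resource count or the per-RPC latency doubles the added time.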
1. Add new `identity` provider at the top of encryption config
1. Restart kube-apiserver
1. Run storage migration to migrate all the existing encrypted data to use the `identity` provider
Ah, does adding `identity` provider unencrypt the data in etcd?
How do you know if the migration has successfully completed?
> Ah, does adding identity provider unencrypt the data in etcd?
By setting `identity` as the first provider in the list, that'll be used for writes (no encryption). The rest of the providers will be used for read. When the user runs `kubectl get secrets --all-namespaces -o json | kubectl replace -f -`, the kms providers will be used to decrypt the data and on write the `identity` provider will be used. At the end of this, all the data will be stored unencrypted.
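To make the ordering rule concrete, here is a toy model of it. It is not the API server's implementation: the `Provider` type, the `toy-kms:` prefix, and the string-reversal "encryption" are all made up for illustration. It only shows that the first provider handles writes, any listed provider can serve reads, and a read-then-write pass over every object leaves the data in the first provider's format.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Provider:
    name: str
    prefix: str                    # hypothetical marker stored in front of the value
    encode: Callable[[str], str]   # applied on write
    decode: Callable[[str], str]   # applied on read


def write(providers: List[Provider], value: str) -> str:
    """Writes always go through the first provider in the list."""
    first = providers[0]
    return first.prefix + first.encode(value)


def read(providers: List[Provider], stored: str) -> str:
    """Reads use whichever listed provider recognises the stored prefix."""
    # Check longer prefixes first so the empty identity prefix only matches
    # values that no other provider claims.
    for p in sorted(providers, key=lambda p: len(p.prefix), reverse=True):
        if stored.startswith(p.prefix):
            return p.decode(stored[len(p.prefix):])
    raise ValueError("no provider can read this value")


def migrate(store: Dict[str, str], providers: List[Provider]) -> None:
    """Toy stand-in for storage migration: read every object and write it back."""
    for key in list(store):
        store[key] = write(providers, read(providers, store[key]))


identity = Provider("identity", "", lambda v: v, lambda v: v)
toy_kms = Provider("toy-kms", "toy-kms:", lambda v: v[::-1], lambda v: v[::-1])

# Data written while the toy KMS provider was first in the list.
store = {"db-password": write([toy_kms, identity], "hunter2")}

# Put identity first, keep the toy KMS provider for reads, then migrate.
migrate(store, [identity, toy_kms])
print(store)
```

After the migration pass, the toy KMS provider is no longer needed to read anything in `store`, which mirrors the point at which the KMS provider can be removed from the config.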
> How do you know if the migration has successfully completed?
We're proposing a metric to record the number of times a key is used for read and write (xref: kubernetes/kubernetes#115394). With this metric, when the API server is restarted after storage migration, the count for key used to read should be 0 indicating all the data is already unencrypted.
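A sketch of how such a check might look against the Prometheus text format. The metric name `example_kms_key_use_total` and its `operation` and `key_id` labels are placeholders invented for this example; the real name and labels are whatever the linked PR ends up defining.

```python
import re

# Matches lines such as: metric_name{label="value",...} 12
SAMPLE_LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
LABEL = re.compile(r'(\w+)="([^"]*)"')


def kms_read_count(metrics_text: str, metric_name: str) -> float:
    """Sum the samples of metric_name whose (hypothetical) operation label is "read"."""
    total = 0.0
    for line in metrics_text.splitlines():
        match = SAMPLE_LINE.match(line.strip())
        if not match or match.group("name") != metric_name:
            continue  # skips comments, blank lines, and other metrics
        labels = dict(LABEL.findall(match.group("labels")))
        if labels.get("operation") == "read":
            total += float(match.group("value"))
    return total


# Hypothetical scrape taken after the API server restart that follows migration.
AFTER_RESTART = '''\
# HELP example_kms_key_use_total Hypothetical counter of KMS key uses.
# TYPE example_kms_key_use_total counter
example_kms_key_use_total{operation="read",key_id="key-1"} 0
example_kms_key_use_total{operation="write",key_id="key-1"} 0
'''

reads = kms_read_count(AFTER_RESTART, "example_kms_key_use_total")
print("migration looks complete" if reads == 0 else "KMS key still used for reads")
```

A zero count is only meaningful once the server has actually read the affected resources since the restart, so the check is best run after some traffic or a full list of the encrypted resource types.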
Have you considered using a condition on the CRD to denote that the unencryption has completed?
We don't have a CRD for KMS. As a first step, we were thinking about exposing enough details in the metrics + documentation on how to use the metric, that could be used to determine if the rotation is complete.
We still have the storage-version-migrator. Explaining how to verify the writes with that API is worthwhile. A link in the PRR is sufficient to describe it.
1. Remove the KMS provider from the encryption config and restart kube-apiserver
2. At the end of these steps, all the data in etcd will be unencrypted.

More details are available [here](https://kubernetes.io/docs/tasks/administer-cluster/kms-provider/#disabling-encryption-at-rest)

Disabling this gate without first doing a storage migration to use a different encryption at rest mechanism will result in data loss.
It's not just data loss, right? It will basically break the cluster, since apiserver won't be able to read the objects from etcd.
> It's not just data loss, right? It will basically break the cluster, since apiserver won't be able to read the objects from etcd.
That is only when all the resources in etcd are encrypted using KMS. Typically the configuration is only for a subset of resources (secrets, configmaps). In that scenario, those resources can't be retrieved if the KMS providers are removed without running storage migration. If the user adds the KMS providers back (only for read), they would still be able to retrieve the resources.
Let's imagine we encrypted secrets that are mounted in pods. This will prevent any future pods referencing secrets from starting, right? Given that secrets are the most likely thing to encrypt, it's worth calling out the consequences of the failure here.
/cc @deads2k
Looks like there are a couple of PRR updates for beta and this will be good.
@deads2k Updated the PRR sections. PTAL when you get a chance!
Signed-off-by: Anish Ramasekar <anish.ramasekar@gmail.com> Co-authored-by: Mo Khan <i@monis.app> Signed-off-by: Anish Ramasekar <anish.ramasekar@gmail.com>
/lgtm
/approve for PRR
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: aramase, deads2k, enj
Signed-off-by: Anish Ramasekar <anish.ramasekar@gmail.com>
Co-authored-by: Mo Khan <i@monis.app>
fixes kubernetes/kubernetes#114318