# Design proposal: new migration process for backend-storage PVCs (#314)
# Overview
This proposal is about supporting the migration of all VMs that use a persistent feature, such as a persistent vTPM.

To keep this document as simple as possible, let's consider that storage in Kubernetes can either be filesystem (FS) or block mode, and either ReadWriteOnce (RWO) or ReadWriteMany (RWX) access mode, with storage classes supporting any number of combinations of both.

As of KubeVirt 1.3, here is how each of the 4 combinations is supported by backend-storage for persistent VM state:
- RWO FS: supported by backend-storage, but makes VMs non-migratable. **This entire design proposal is about removing that limitation.**
- RWX FS: the preferred combination, supported by backend-storage and compatible with VM migrations
- RWO block: not supported yet, but PRs exist to implement it in a non-migratable way
- RWX block: same as above, with migratability functional but considered unsafe, because it leads to 2 pods (the source and target virt-launchers) mounting the same ext3 filesystem concurrently

The above is the best we can do with the current implementation, which involves mounting the same partition on the migration source and target.
This proposal is about switching to a different model that would instead create a new backend-storage PVC on migration, enabling RWO FS.

Furthermore, the implementation of this design proposal would also allow a potential block-based implementation, compatible with both RWO and RWX and safely migratable.
That, however, is out of scope for this design proposal, and may never be needed, since most/all block storage classes also support RWO FS.

## Alternative approach
The alternative to this design is for the upstream components (libvirt, QEMU, swtpm) to add block storage support for TPM/EFI.
KubeVirt could then create one backend-storage PVC per feature (EFI/TPM) and pass them directly to libvirt.
That alternative approach would enable RWX block backend-storage (as opposed to RWO FS in this proposal).

Pros:
- Less code to write and maintain in KubeVirt
- No need to duplicate backend-storage PVCs on migrations

> **Review discussion (alternative approach):**
>
> *Reviewer:* Please file a public request on libvirt to add this functionality, and include the URL to it here. The simplest way to "kill" this alternative is if libvirt rejects it. I would think that this would be a mistake on their behalf, but that's their question to answer.
>
> *Author:* I discussed this with a few members of the libvirt team, and they actually don't appear to have strong objections against it. So really not too much is left to do overall; however, we would have to wait for the first release of every component that needs changes, which could take a long time.
>
> *Reviewer:* Please make sure to spell out all your reasoning as "cons". For example, I did not see the issue of timing written in the proposal. Let's have a formal request and link to it here. We should be explicit that, for the reasons spelled out here, we may not use it, or may not use it immediately. But let us give the libvirt team an opportunity to refute our assumptions - maybe they can deliver this functionality fast?
>
> *Author:* https://issues.redhat.com/browse/RHEL-54325
>
> *Reviewer:* Thanks. GitHub comments can easily get lost. Please add these to the design doc - the fact that they are not done is a reason against the alternative option.
>
> *Author:* Yes, I will rework this design proposal this week and include those links.

Cons:
- Relies on third-party projects adding and maintaining options to store config files in a block device without a filesystem
- Requires one PVC per feature for every VM
- Requires waiting for the first release of every upstream component that needs changes, which could take a long time

## Motivation/Goals
- Users want to be able to use any storage class for backend-storage
- Users want all VMs with persistent features to be potentially migratable

## Definition of Users
Any VM owner who wants to use persistent features, such as TPM or EFI.

## User Stories
As a user, I want to be able to seamlessly enable persistent features while keeping the migratability of my VMs.

## Repos
kubevirt/kubevirt

# Design
- When creating a new VM with persistent features, we use either (first match wins; see the selection sketch after this list):
  - The storage class specified in the KubeVirt Custom Resource
  - The storage class marked as the default for virtualization purposes, unless it has a StorageProfile that shows only block is supported
  - The storage class marked as the default for the cluster
- All new persistent-state PVCs will be created with the name `persistent-state-for-<vm_name>-<random_string>` and the new annotation `persistent-state-for-<vm_name>=true`
- When starting an existing VM with persistent features, we will look for any PVC with the annotation, or any PVC with the legacy name, for backwards compatibility (see the Update/Rollback section below)
- The volume mode will be Filesystem
- The access mode will be RWO, unless a StorageProfile exists for the storage class and shows that only RWX is supported
  - Note: the CR field documentation will be adjusted to reflect that RWX is no longer needed
- When a migration is created (see the PVC lifecycle sketch after this list):
  - We create a new empty PVC for the target, with the name `persistent-state-for-<vm_name>-<random_string>` and no annotation
  - In the migration object status, we store the names of both the source and target PVCs (see the API section below)
  - If the migration succeeds, we set the annotation `persistent-state-for-<vm_name>=true` on the new (target) PVC and delete the old (source) PVC, using the source PVC name from the migration object
  - If the migration fails, we delete the target backend-storage PVC that was just created, using the target PVC name from the migration object
  - In the unlikely event that a VM shuts down towards the very end of a migration, the migration finalizer will decide which PVC to keep and which one to get rid of
    - **Important note**: with the way the migration controller currently works, this has the potential of suffering from a race condition. We need to make absolutely sure that won't happen.
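
To make the selection rule concrete, here is a minimal sketch of the "first match wins" logic described above. The `StorageClassInfo` type and all helper names are hypothetical, pre-digested views of the cluster state, not actual KubeVirt or CDI APIs:

```go
package backendstorage

import "fmt"

// StorageClassInfo is a hypothetical, pre-digested view of a StorageClass
// and its (optional) StorageProfile.
type StorageClassInfo struct {
	Name             string
	IsVirtDefault    bool // marked as the default for virtualization purposes
	IsClusterDefault bool // storageclass.kubernetes.io/is-default-class
	BlockOnly        bool // StorageProfile advertises only the Block volume mode
}

// pickStorageClass implements the "first match wins" selection from the
// design: KubeVirt CR setting, then the virtualization default (unless its
// StorageProfile shows only block is supported), then the cluster default.
func pickStorageClass(kubevirtCRStorageClass string, classes []StorageClassInfo) (string, error) {
	if kubevirtCRStorageClass != "" {
		return kubevirtCRStorageClass, nil
	}
	for _, sc := range classes {
		if sc.IsVirtDefault && !sc.BlockOnly {
			return sc.Name, nil
		}
	}
	for _, sc := range classes {
		if sc.IsClusterDefault {
			return sc.Name, nil
		}
	}
	return "", fmt.Errorf("no suitable storage class found for backend-storage")
}
```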
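
And here is a sketch of the PVC lifecycle around a migration, using client-go. The function names, the random-suffix length, and the success/failure plumbing are assumptions for illustration; the real logic would live in the migration controller:

```go
package backendstorage

import (
	"context"
	"fmt"

	k8serrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/rand"
	"k8s.io/client-go/kubernetes"
)

// targetPVCName builds the proposed PVC name; the suffix length is an
// illustrative choice.
func targetPVCName(vmName string) string {
	return fmt.Sprintf("persistent-state-for-%s-%s", vmName, rand.String(5))
}

// OnMigrationFinished promotes the target PVC on success, or cleans it up on
// failure, using the PVC names recorded in the migration object's status.
func OnMigrationFinished(ctx context.Context, client kubernetes.Interface,
	namespace, vmName, sourcePVC, targetPVC string, succeeded bool) error {
	pvcs := client.CoreV1().PersistentVolumeClaims(namespace)
	if !succeeded {
		// Failed migration: the source PVC stays authoritative, so the
		// empty target PVC is simply dropped.
		if err := pvcs.Delete(ctx, targetPVC, metav1.DeleteOptions{}); err != nil && !k8serrors.IsNotFound(err) {
			return err
		}
		return nil
	}
	// Successful migration: mark the target PVC as the VM's persistent
	// state, then delete the source PVC.
	pvc, err := pvcs.Get(ctx, targetPVC, metav1.GetOptions{})
	if err != nil {
		return err
	}
	if pvc.Annotations == nil {
		pvc.Annotations = map[string]string{}
	}
	pvc.Annotations[fmt.Sprintf("persistent-state-for-%s", vmName)] = "true"
	if _, err := pvcs.Update(ctx, pvc, metav1.UpdateOptions{}); err != nil {
		return err
	}
	if err := pvcs.Delete(ctx, sourcePVC, metav1.DeleteOptions{}); err != nil && !k8serrors.IsNotFound(err) {
		return err
	}
	return nil
}
```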

## API
The only API change introduced is a pair of new status fields on the VirtualMachineInstanceMigration object:
- `SourcePersistentStatePVCName`
- `TargetPersistentStatePVCName`
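
A minimal sketch of the proposed addition, assuming the fields land in the migration object's status struct; the JSON tag names are illustrative:

```go
type VirtualMachineInstanceMigrationStatus struct {
	// ... existing status fields ...

	// SourcePersistentStatePVCName is the backend-storage PVC the VM was
	// using before the migration; deleted once the migration succeeds.
	SourcePersistentStatePVCName string `json:"sourcePersistentStatePVCName,omitempty"`

	// TargetPersistentStatePVCName is the freshly created backend-storage
	// PVC for the target virt-launcher; deleted if the migration fails.
	TargetPersistentStatePVCName string `json:"targetPersistentStatePVCName,omitempty"`
}
```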

## Scalability
No new scalability concerns will be introduced.

## Update/Rollback Compatibility
Since the name of the backend-storage PVC will change, we will keep fallback code that looks for the legacy PVC name, as sketched below.
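
A sketch of that fallback lookup, assuming the legacy PVC name is `persistent-state-for-<vm_name>` (without the random suffix) and that a per-namespace list is acceptable; the helper name is illustrative:

```go
package backendstorage

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	k8serrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// FindBackendStoragePVC looks for the VM's backend-storage PVC: first by the
// new per-VM annotation, then by the legacy fixed name, so that VMs created
// before this change keep working.
func FindBackendStoragePVC(ctx context.Context, client kubernetes.Interface,
	namespace, vmName string) (*corev1.PersistentVolumeClaim, error) {
	list, err := client.CoreV1().PersistentVolumeClaims(namespace).List(ctx, metav1.ListOptions{})
	if err != nil {
		return nil, err
	}
	annotation := fmt.Sprintf("persistent-state-for-%s", vmName)
	for i := range list.Items {
		if list.Items[i].Annotations[annotation] == "true" {
			return &list.Items[i], nil
		}
	}
	// Fallback: the legacy naming scheme used by current KubeVirt releases.
	legacyName := fmt.Sprintf("persistent-state-for-%s", vmName)
	for i := range list.Items {
		if list.Items[i].Name == legacyName {
			return &list.Items[i], nil
		}
	}
	return nil, k8serrors.NewNotFound(corev1.Resource("persistentvolumeclaims"), legacyName)
}
```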

## Functional Testing Approach
All existing backend-storage-related functional tests will still apply. More tests could be added to ensure that all 4 combinations
of FS/block and RWX/RWO work for backend-storage, if we think it's worth the additional stress on the e2e test lanes.

# Implementation Phases
The first phase is block support; that effort is already well underway.
The second phase is changing the PVC handling, which could be done as part of the first phase's PR or as a separate one.
Either way, it's important that both phases land in the same KubeVirt release.

> **Review discussion (implementation):**
>
> *Reviewer:* Sorry, I'm a bit late on this. Have we ever considered using a clustered filesystem like gfs2 instead of ext3?
>
> *Author:* That's a really good question. gfs2 somehow didn't pop up when I looked for potentially suitable filesystems for formatting the block storage, so it's probably worth investigating! That being said, this proposal (and even the alternative it contains) no longer considers the option of formatting the block storage ourselves, and not just because I couldn't find a suitable filesystem. The fact that it requires adding a bunch of tools to virt-launcher and virt-handler, and the general complexity of managing partitions ourselves, make it something we should avoid, in my opinion (see the draft PR).
>
> *Reviewer:* Yes, I agree. I read a bit about gfs2, which is supported by Red Hat, and it requires clvm. So I think this also excludes it as a solution.