Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Btrfs snapshotter implementation #1957

Merged
merged 17 commits into from
Feb 21, 2024

Conversation

davidcassany
Copy link
Contributor

@davidcassany davidcassany commented Feb 14, 2024

This PR includes the btrfs snapshotter implementation and required changes to specially in the boot process area (elemental-sysroot, elemental-setup, grub, etc.) to also support btrfs snapshots.

Fixes #1923
Fixes #1925

@davidcassany davidcassany requested a review from a team as a code owner February 14, 2024 10:46
@davidcassany davidcassany marked this pull request as draft February 14, 2024 10:47
@davidcassany
Copy link
Contributor Author

davidcassany commented Feb 14, 2024

For the time being this is missing the unit tests and few adaptations to fix/skip integration tests. Note that a btrfs deployment can't be downgraded to an image that does not apply the new btrfs related setup for booting.

Included now

@codecov-commenter
Copy link

codecov-commenter commented Feb 14, 2024

Codecov Report

Attention: 281 lines in your changes are missing coverage. Please review.

Comparison is base (3917e95) 72.04% compared to head (39992cb) 73.10%.

Files Patch % Lines
pkg/snapshotter/btrfs.go 73.24% 117 Missing and 28 partials ⚠️
pkg/snapshotter/loopdevice.go 66.25% 20 Missing and 7 partials ⚠️
pkg/action/upgrade.go 33.33% 21 Missing and 5 partials ⚠️
pkg/types/v1/snapshotter.go 24.00% 16 Missing and 3 partials ⚠️
pkg/action/build-disk.go 61.11% 11 Missing and 3 partials ⚠️
pkg/utils/common.go 52.00% 9 Missing and 3 partials ⚠️
pkg/action/mount.go 60.00% 6 Missing and 4 partials ⚠️
pkg/elemental/elemental.go 63.63% 5 Missing and 3 partials ⚠️
pkg/action/install.go 28.57% 4 Missing and 1 partial ⚠️
pkg/action/reset.go 16.66% 4 Missing and 1 partial ⚠️
... and 3 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1957      +/-   ##
==========================================
+ Coverage   72.04%   73.10%   +1.05%     
==========================================
  Files          72       74       +2     
  Lines        8041     8666     +625     
==========================================
+ Hits         5793     6335     +542     
- Misses       1784     1819      +35     
- Partials      464      512      +48     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Requires: xorriso >= 1.5.6
Requires: btrfsprogs
Requires: snapper
Requires: xorriso >= 1.5
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's effectively a xorriso downgrade !?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this is just making the constraint more relaxed. We do not care about the patch level version. Our real constraint is on v1.5.x. Leap 15.5 and SP5 ship 1.4.x.

@@ -301,7 +301,7 @@ func applyKernelCmdline(r *v1.RunConfig, mount *v1.MountSpec) error {
}

switch split[0] {
case "elemental.image", "cos-img/filename":
case "elemental.mode":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All these (breaking?) changes need good documentation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, boot process needs to be documented

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really a breaking chance since these kernel parameters are set in /etc/cos/bootargs.cfg which is always stored in the image which also contains the initrd that has to read them. In fact thats on of the reasons to not include any specific kernel argument in grub.cfg (which is stored in EFI partition and common to all installed images). Changes in grub.cfg are real breaking changes.

@@ -0,0 +1,3 @@
snapshotter:
type: btrfs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't "type" superfluous ? (are there any other types 🤔 )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it can use a loopdevice based approach which is the default to facilitate upgrades from 5.5

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the loop device approach I would assume snapshotter: to be absent. 😕

But

snapshotter:
  type: loopdevice

is strange since the old way didn't really do snapshots ?!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If snapshotter is undefined, not included within the config yaml defaults to loopdevice. My idea is including a specific config for SLE Micro images through the elemental package as in rancher/elemental#1213

Keeping type: btrfs needed IMHO makes sense as there is no reason to assume there can't be a additional snapshotter interface implementations in the future. I can easily dream on an overlayfs based approach 🤔

@@ -175,7 +175,7 @@ func SyncData(log v1.Logger, runner v1.Runner, fs v1.FS, source string, target s

log.Infof("Starting rsync...")

args := []string{"--progress", "--partial", "--human-readable", "--archive", "--xattrs", "--acls"}
args := []string{"--progress", "--partial", "--human-readable", "--archive", "--xattrs", "--acls", "--delete"}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to ensure we can track removed files across upgrades (image A contains a file which image B does not). Since in btrfs snapshots we upgrade over an already existing root-tree we need to make sure it perfectly mirrors the upgraded image. The library we use to extract an image does not perform a perfect sync, just overwrite existing files where needed.

currentSnapshotID int
activeSnapshotID int
bootloader v1.Bootloader
installing bool
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to track first snapshot deployment in most of the methods of the public snapshotter interface. This is caused by snapper, as it always requires a previous subvolume to operate from. Hence, for the first deployment we are not using snapper to create the first snapshot and relay on btrfs commands to create a subvolume that looks like a btrfs snapper snapshot. This causes having two different execution paths and having to wrap different set of tools for operations that are pretty similar from an Elemental PoV. This is also an inconvenience in SLE Micro, KIWI runs similar tricks to make first root as a snapshot of disk images.

snapper is mostly though to operate on a running system with certain pre-installed btrfs layoyt, but it is not really useful to run a clean installation and deploy the first root as a snapper snapshot.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to separate this into a Btrfs snapshotter and a snapper snapshotter?

@@ -301,7 +301,7 @@ func applyKernelCmdline(r *v1.RunConfig, mount *v1.MountSpec) error {
}

switch split[0] {
case "elemental.image", "cos-img/filename":
case "elemental.mode":
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, boot process needs to be documented

@@ -2,7 +2,7 @@
Description=Elemental system rootfs overlay mounts
DefaultDependencies=no
After=initrd-root-fs.target
Requires=initrd-root-fs.target
Wants=initrd-root-fs.target
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Relaxing this requirement so we could consider removing the RemainAfterExit if needed, which could be useful to track start and stop of the service.

set_volume
source (${volume})/etc/cos/bootargs.cfg
linux (${volume})${kernel} ${kernelcmd} ${extra_cmdline} ${extra_active_cmdline}
initrd (${volume})${initramfs}
}

for passive_snap in ${passive_snaps}; do
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

passive_snaps includes a space separated list of passive snapshots IDs. Then the path of the image is computed in set_volume function above.

@@ -301,7 +301,7 @@ func applyKernelCmdline(r *v1.RunConfig, mount *v1.MountSpec) error {
}

switch split[0] {
case "elemental.image", "cos-img/filename":
case "elemental.mode":
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really a breaking chance since these kernel parameters are set in /etc/cos/bootargs.cfg which is always stored in the image which also contains the initrd that has to read them. In fact thats on of the reasons to not include any specific kernel argument in grub.cfg (which is stored in EFI partition and common to all installed images). Changes in grub.cfg are real breaking changes.


if b.cfg.Snapshotter.Type == constants.BtrfsSnapshotterType {
if !b.spec.Expandable {
cfg.Logger.Errorf("Non expandable disk images are not supported for btrfs snapshotter")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't be hard to implement support for non expandable (full disk including all partitions) disk images in way that is compatible for btrfs snapshotter too, but requires some build-disk command refactor which is beyond the scope of this PR and also I don't think is really a relevant use case for Elemental.


rootMap := map[string]string{}

if b.spec.Expandable {
exclude = b.spec.Partitions.Persistent
excludes = append(excludes, b.spec.Partitions.Persistent, b.spec.Partitions.State)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also exclude State partition from the build since this is no longer required as the grub configuration is stored in EFI partition.

@@ -8,7 +8,6 @@ Before=initrd.target
[Service]
RootDirectory=/sysroot
BindPaths=/proc /sys /dev /run /tmp
BindPaths=-/oem
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verified that the mount command already mounts that on earlier stages. Former initrd /oem mountpoint is unmounted by stopping initrd.target before switching root (PartOf relationship).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, verified that initramfs stage is still executed for yaml files stored in /oem 👍

Requires: xorriso >= 1.5.6
Requires: btrfsprogs
Requires: snapper
Requires: xorriso >= 1.5
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this is just making the constraint more relaxed. We do not care about the patch level version. Our real constraint is on v1.5.x. Leap 15.5 and SP5 ship 1.4.x.

@davidcassany
Copy link
Contributor Author

Just got a green CI run here https://github.com/rancher/elemental/actions/runs/7932002562/job/21658757154 🎉
Gonna rebase, lauch CI again and move it to ready to review

@davidcassany
Copy link
Contributor Author

Requires rancher/elemental#1213 to be merged after/together

Signed-off-by: David Cassany <dcassany@suse.com>
Signed-off-by: David Cassany <dcassany@suse.com>
Signed-off-by: David Cassany <dcassany@suse.com>
…ition on expandable images

Signed-off-by: David Cassany <dcassany@suse.com>
Signed-off-by: David Cassany <dcassany@suse.com>
Signed-off-by: David Cassany <dcassany@suse.com>
Signed-off-by: David Cassany <dcassany@suse.com>
Signed-off-by: David Cassany <dcassany@suse.com>
…nfig constructor

Signed-off-by: David Cassany <dcassany@suse.com>
Signed-off-by: David Cassany <dcassany@suse.com>
Signed-off-by: David Cassany <dcassany@suse.com>
Signed-off-by: David Cassany <dcassany@suse.com>
Signed-off-by: David Cassany <dcassany@suse.com>
Signed-off-by: David Cassany <dcassany@suse.com>
Current loopdevice snapshotter does not require anymore
having symlinks to list passive snapshots. Current grub2
configuration configures the passive snapshot path based on
the snapshot ID.

Signed-off-by: David Cassany <dcassany@suse.com>
Signed-off-by: David Cassany <dcassany@suse.com>
Signed-off-by: David Cassany <dcassany@suse.com>
@davidcassany
Copy link
Contributor Author

UI CI run passes at: https://github.com/rancher/elemental/actions/runs/7988344296
CLI CI run passes for k3s at: https://github.com/rancher/elemental/actions/runs/7987631264

test-downgrade and test-recovery are not going to pass in this PR, incompatibility to downgrade from elemental v2 to v1.x. Workarounds for tests are to be included in separate and follow up PRs.

@davidcassany davidcassany merged commit 8f802fa into rancher:main Feb 21, 2024
17 of 19 checks passed
@davidcassany davidcassany deleted the btrfs_implementation branch July 11, 2024 14:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Do not include state partition in expandable disks Implement btrfs/snapper snapshotter
4 participants