Add upgrade-recovery subcommand #1974
Conversation
I prefer adding the upgrade-recovery command.
Adding flags that completely change the implementation of the upgrade command feels like a recipe for unintended consequences.
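As a minimal illustration of the concern (hypothetical cobra wiring, not the PR's actual code), a mode flag forces a single run function to silently switch between two unrelated implementations:

package main

import "github.com/spf13/cobra"

// Both run functions are hypothetical stand-ins for the two implementations.
func runUpgrade(cmd *cobra.Command) error         { return nil }
func runUpgradeRecovery(cmd *cobra.Command) error { return nil }

func main() {
	upgradeCmd := &cobra.Command{
		Use: "upgrade",
		RunE: func(cmd *cobra.Command, args []string) error {
			// One flag flips the whole code path: easy to trip over, and it
			// forces every other upgrade flag to be reinterpreted per mode.
			if recoveryOnly, _ := cmd.Flags().GetBool("recovery-only"); recoveryOnly {
				return runUpgradeRecovery(cmd)
			}
			return runUpgrade(cmd)
		},
	}
	upgradeCmd.Flags().Bool("recovery-only", false, "upgrade only the recovery system")

	root := &cobra.Command{Use: "elemental"}
	root.AddCommand(upgradeCmd)
	_ = root.Execute()
}

A dedicated subcommand avoids the branch entirely, which is the shape this PR converges on.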
Codecov Report
Attention: Patch coverage is
Additional details and impacted files:

@@            Coverage Diff             @@
##             main    #1974      +/-   ##
==========================================
- Coverage   73.04%   72.27%    -0.77%
==========================================
  Files          74       76        +2
  Lines        8713     8895      +182
==========================================
+ Hits         6364     6429       +65
- Misses       1831     1938      +107
- Partials      518      528       +10

☔ View full report in Codecov by Sentry.
From @davidcassany's other comment I got the impression adding the argument was better. This is indeed simpler to implement. (And adding a separate action here is probably not needed either.)
Part of rancher/elemental#1218
Force-pushed from 49451b4 to 3246717
Signed-off-by: Andrea Mazzotti <andrea.mazzotti@suse.com>
Force-pushed from 0e19aad to 825c402
Signed-off-by: Andrea Mazzotti <andrea.mazzotti@suse.com>
Nice work 👍
I think the only actual change request I have is to include the YAML loading unit test. I think the current sub-viper setup is wrong (speaking from memory, I might be missing some detail), but in any case a simple unit test on that front will tell us, and will verify it works once fixed, if it needs fixing.
The other issues are probably worth addressing in a separate PR, though I'm not so sure. If we don't face obvious breaks and regressions with this, I'd vote to merge it and iterate from that point. I think it might be easier this way.
pkg/action/upgrade-recovery.go
Outdated
statePath := filepath.Join(u.spec.Partitions.State.MountPoint, constants.InstallStateFile)
if u.spec.Partitions.State.MountPoint == "/" || u.spec.Partitions.State.MountPoint == "/.snapshots" {
	statePath = filepath.Join(constants.RunningStateDir, constants.InstallStateFile)
}
These lines are concerning; they likely explain why this does not work in a pod run. I got rid of this hack in upgrade because those paths shouldn't be hardcoded and we should try to minimize the suc-upgrade impact in toolkit code (so I'd say that testing against /host/ or /host/.snapshots is also not an option).
Probably a compromise to start walking is to not update the state.yaml in the State partition, then figure out the consequences of such an inconsistency and whether there are better alternatives to manage the state.yaml data (it could probably be stored somewhere else, such as in OEM; even though I am not a fan of the idea, it would make its handling way easier).
Good catch. I inherited the hack and didn't notice it had been removed.
It's gone now. Thank you
pkg/action/upgrade.go
Outdated
	return elementalError.NewFromError(err, elementalError.MountStatePartition)
}
cleanup.Push(umount)
umount, err = elemental.MountRWPartition(u.cfg.Config, u.spec.Partitions.State)
This is likely to produce an undesired effect or even fail: it will probably attempt to mount the current root as RW. By default /run/initramfs/elemental-state is already mounted RW on a running system with the current setup, so there should be no need to mount it RW again; it already is.
Also, on a running system on btrfs, the active subvolume is mounted by default on this partition and could also be the mount point reported by the ghw library. This is the actual reason this was only called in recovery mode in the former code. Probably we need to move or add the logic at
elemental-toolkit/pkg/snapshotter/btrfs.go
Lines 736 to 761 in 9538960

func findStateMount(runner v1.Runner, device string) (rootDir string, stateMount string, err error) {
	output, err := runner.Run("findmnt", "-lno", "SOURCE,TARGET", device)
	if err != nil {
		return "", "", err
	}
	r := regexp.MustCompile(`@/.snapshots/\d+/snapshot`)
	scanner := bufio.NewScanner(strings.NewReader(strings.TrimSpace(string(output))))
	for scanner.Scan() {
		lineFields := strings.Fields(scanner.Text())
		if len(lineFields) != 2 {
			continue
		}
		if strings.Contains(lineFields[1], constants.RunningStateDir) {
			stateMount = lineFields[1]
		} else if r.MatchString(lineFields[0]) {
			rootDir = lineFields[1]
		}
	}
	if stateMount == "" || rootDir == "" {
		err = fmt.Errorf("could not find expected mountpoints, findmnt output: %s", string(output))
	}
	return rootDir, stateMount, err
}
to the utils.GetAllPartitions method, so we can have a known and consistent mountpoint defined for the state partition. The problem is that the current Partitions struct and the underlying ghw library assume a single mountpoint per partition, while there can be more. In fact, in the upgrade pod the mountpoints under /host/run and /run are duplicated, and we have no criteria in elemental-toolkit to distinguish them or even report them.
At mid term I'd probably drop the ghw library in favor of an lsblk wrapper (we used to have one at some point) and report a list of mountpoints per partition. But this is a relatively ambitious refactor... especially in terms of testing.
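As a rough sketch of what such an lsblk wrapper could look like (hypothetical names, not the toolkit's actual code; assumes util-linux >= 2.37, where lsblk's JSON output reports a MOUNTPOINTS list per device):

package partitions

import (
	"encoding/json"
	"fmt"
	"os/exec"
)

// lsblkDevice mirrors the fields we need from `lsblk -J` output.
type lsblkDevice struct {
	Name        string   `json:"name"`
	Mountpoints []string `json:"mountpoints"`
}

type lsblkOutput struct {
	Blockdevices []lsblkDevice `json:"blockdevices"`
}

// ListMountpoints returns every mountpoint of the given partition device,
// instead of the single mountpoint ghw reports.
func ListMountpoints(device string) ([]string, error) {
	out, err := exec.Command("lsblk", "-J", "-p", "-o", "NAME,MOUNTPOINTS", device).Output()
	if err != nil {
		return nil, fmt.Errorf("lsblk failed for %s: %w", device, err)
	}
	var parsed lsblkOutput
	if err := json.Unmarshal(out, &parsed); err != nil {
		return nil, err
	}
	var mounts []string
	for _, dev := range parsed.Blockdevices {
		for _, mp := range dev.Mountpoints {
			if mp != "" { // lsblk emits null for unmounted entries
				mounts = append(mounts, mp)
			}
		}
	}
	return mounts, nil
}

With a full list available, callers could pick a known mountpoint (e.g. /run/initramfs/elemental-state) deterministically instead of trusting whichever one ghw happens to surface.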
Interesting. I actually changed this to always remount it because the CI tests were failing: /run/initramfs/elemental-state could not be found. I will revert this change here; most likely it was not the solution if there was an issue with it.
I also noticed the state partition was already mounted RW, which is why I added the debug log somewhere else to confirm it. Then the test just worked. 🤷
cmd/config/config.go
Outdated
if err != nil {
	return nil, fmt.Errorf("failed initializing upgrade recovery spec: %v", err)
}
vp := viper.Sub("upgrade-recovery")
Are we sure this is correct? Shouldn't it be vp := viper.Sub("upgrade")? IIRC this states the sub schema item, hence it maps whatever is loaded from the YAML under the upgrade-recovery stanza. I guess this could be tested and validated by checking that this method properly reads the upgrade section of a YAML file. I assume this falls into the scope of cmd/config/config_test.go.
I think this should be fixed, or at least validated with a unit test, in this PR.
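For reference, a minimal sketch of the kind of unit test being requested (the YAML layout and values are illustrative, not the toolkit's exact schema):

package config_test

import (
	"bytes"
	"testing"

	"github.com/spf13/viper"
)

// Verify which stanza the sub-viper actually maps when the config file
// defines an `upgrade` section.
func TestUpgradeSubSection(t *testing.T) {
	yaml := []byte(`
upgrade:
  recovery-system:
    uri: oci:example/image:latest
`)
	v := viper.New()
	v.SetConfigType("yaml")
	if err := v.ReadConfig(bytes.NewBuffer(yaml)); err != nil {
		t.Fatal(err)
	}

	// Sub returns nil for a stanza that does not exist in the yaml,
	// so Sub("upgrade-recovery") would silently load nothing here.
	if v.Sub("upgrade-recovery") != nil {
		t.Fatal("unexpected upgrade-recovery stanza")
	}

	vp := v.Sub("upgrade")
	if vp == nil {
		t.Fatal("upgrade stanza not found")
	}
	if got := vp.GetString("recovery-system.uri"); got != "oci:example/image:latest" {
		t.Fatalf("unexpected uri: %q", got)
	}
}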
Yes, sorry, I got confused (again :D) with it.
I removed it all and just added a (bit ugly) flag to sanitize the same stanza when it is used for recovery only, since that's the only logical difference in there.
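Something along these lines, as a hedged sketch (all identifiers are hypothetical simplifications, not the PR's actual types):

package action

import "fmt"

// Hypothetical, simplified stand-ins for the real spec types.
type ImageSource struct{ URI string }

type UpgradeSpec struct {
	System         *ImageSource // image for the active system
	RecoverySystem *ImageSource // image for the recovery system
}

// sanitizeUpgradeSpec validates the shared spec; when recoveryOnly is set,
// checks that only concern the active system upgrade are skipped.
func sanitizeUpgradeSpec(spec *UpgradeSpec, recoveryOnly bool) error {
	if spec.RecoverySystem == nil || spec.RecoverySystem.URI == "" {
		return fmt.Errorf("undefined image source for the recovery system")
	}
	if recoveryOnly {
		return nil
	}
	if spec.System == nil || spec.System.URI == "" {
		return fmt.Errorf("undefined image source for the system upgrade")
	}
	return nil
}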
Signed-off-by: Andrea Mazzotti <andrea.mazzotti@suse.com>
return u.cfg.WriteInstallState(
-	u.spec.State, statePath,
+	u.spec.State, filepath.Join(u.spec.Partitions.State.MountPoint, constants.InstallStateFile),
This is likely to cause failures in recovery-only upgrades, as for deployments using the btrfs snapshotter u.spec.Partitions.State.MountPoint is likely to point to / (if running directly on the host) or /host (if running in the system-upgrade-controller pod).
As said before, I think this can be addressed later in a follow-up PR; it is not a regression.
Scratch the above. This is now something in the middle.
elemental upgrade-recovery --recovery-system.uri <imgRef> is the final choice. elemental upgrade --recovery can still be used as usual to upgrade both recovery and system at the same time.
The upgrade-recovery subcommand shares the UpgradeSpec with upgrade to reuse the same configuration/initialization logic, but filters it at the command level. Only --recovery-system probably makes sense here; most of the other flags should be disabled. A rough sketch of this shape follows.
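Hypothetical wiring of that final shape (runUpgradeRecovery and the flag registration are illustrative, not the merged code): the subcommand reuses the shared spec plumbing but deliberately exposes only the recovery-system flag.

package main

import "github.com/spf13/cobra"

// Stand-in for the shared action: initializes the UpgradeSpec as the
// upgrade command would, then filters it for recovery-only use.
func runUpgradeRecovery(cmd *cobra.Command) error { return nil }

func main() {
	upgradeRecoveryCmd := &cobra.Command{
		Use:   "upgrade-recovery",
		Short: "Upgrade only the recovery system",
		RunE: func(cmd *cobra.Command, args []string) error {
			return runUpgradeRecovery(cmd)
		},
	}
	// Only the recovery-system flag is registered; the rest of the
	// upgrade flag set is intentionally left out.
	upgradeRecoveryCmd.Flags().String("recovery-system.uri", "", "image reference for the recovery system")

	root := &cobra.Command{Use: "elemental"}
	root.AddCommand(upgradeRecoveryCmd)
	_ = root.Execute()
}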