Switch to ostree-format: "oci" #1262

Closed
wants to merge 489 commits into from


cgwalters
Member

Part of coreos/fedora-coreos-tracker#812

In this initial step, we're merely switching the internal
tarball to be a different format.

A future step will change the FCOS pipeline to automatically
push this container to quay.io.

cgwalters and others added 30 commits May 18, 2021 12:08
This way it's a lot clearer under which conditions the generator
runs.
Even though it's on a comment line, because it's in a heredoc, bash does
try to execute this. This fails on FCOS thankfully because there is no
`man` on FCOS, but it still logs an error message. (And... any derived
system which does ship `man`, I think this actually would dump the
manpage into the unit.)
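
The heredoc behaviour described above can be reproduced in isolation. This is a minimal sketch, not the actual generator: the function names and unit text are made up for illustration. It shows that with an unquoted delimiter the shell still performs command substitution on backticks, even on a line that is only a comment in the generated file, while quoting the delimiter emits the body literally.

```shell
# Unquoted delimiter: backticks in the body are executed by the shell,
# even though the line is merely a comment in the generated output.
render_unquoted() {
    cat <<EOF
# See `echo SUBSTITUTED` for details
[Unit]
Description=Example
EOF
}

# Quoted delimiter: the body is emitted verbatim, backticks included.
render_quoted() {
    cat <<'EOF'
# See `echo SUBSTITUTED` for details
[Unit]
Description=Example
EOF
}
```

Quoting the delimiter is only one way to avoid the problem; dropping the backticks from the comment works as well.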
…stable repos

Signed-off-by: Clement Verna <cverna@tutanota.com>
We're currently gating on `ENV{DM_SUSPENDED}=="Active"` but
`10-dm.rules` does:

```
ENV{DM_SUSPENDED}=="Active", ENV{DM_SUSPENDED}="0"
ENV{DM_SUSPENDED}=="Suspended", ENV{DM_SUSPENDED}="1"
```

So what I think is happening here is that our rule happens to
run before that kicks in, so we make the links once, but not
thereafter.

Change the condition to match what `13-dm-disk.rules` from
LVM is doing.

Also slightly reorder the code and add some comments for extra clarity.
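
To make the ordering problem concrete, here is a small shell model of it. It is only a sketch: `normalize_dm_suspended` mirrors the two `10-dm.rules` lines quoted above, while `old_gate` and `new_gate` are hypothetical names, and the exact condition used by `13-dm-disk.rules` is assumed here to be "skip when the value is `1`".

```shell
# Model of the rewrite that 10-dm.rules applies to DM_SUSPENDED.
normalize_dm_suspended() {
    case "$1" in
        Active)    echo 0 ;;
        Suspended) echo 1 ;;
        *)         echo "$1" ;;
    esac
}

# Old-style gate: only matches the raw, pre-rewrite value.
old_gate() { [ "$1" = "Active" ]; }

# Assumed replacement gate: proceed unless the device is suspended ("1").
new_gate() { [ "$1" != "1" ]; }
```

Once the value has been rewritten to `0`, `old_gate` can never match again, which is consistent with the links being created once and not thereafter.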
If root is on multipath (which for CoreOS today always means `rd.multipath=default`)
then we *know* we must use it for `/boot`.  We're not going to
support "tearing" where `/boot` is on a non-mpath device but
`/` is on mpath.

The current code is, I believe, racy because at the time the generator
runs (and systemd generators run *early*), we're querying the
"current" properties of the device at
`/dev/disk/by-label/boot`.  But multipathd could still be in
the process of setting up and replacing the target of that
symlink.  This can cause systemd to tear down and reinitialize
the mount, causing races.

https://bugzilla.redhat.com/show_bug.cgi?id=1944660
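
Deciding from the kernel command line, rather than from the current state of `/dev/disk/by-label/boot`, avoids the race described above. A minimal sketch of such a check follows; `have_karg` is a hypothetical helper name, not necessarily what the generator uses, and the optional second argument exists only so the function can be exercised without reading `/proc/cmdline`.

```shell
# Return 0 if the exact kernel argument $1 appears on the command line.
# The command line is taken from $2 if given, otherwise from /proc/cmdline.
have_karg() {
    local want="$1" cmdline arg
    cmdline="${2:-$(cat /proc/cmdline)}"
    for arg in $cmdline; do
        if [ "$arg" = "$want" ]; then
            return 0
        fi
    done
    return 1
}

# Possible use in a generator (illustrative only):
#   if have_karg rd.multipath=default; then
#       ... always mount /boot from the multipath device ...
#   fi
```

Because the command line does not change during boot, the result is the same no matter how far multipathd has progressed.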
Hopefully in the future we'll create a nice `rdcore` like Rust
place for our generators.

For now let's factor out a little helper library.
There are some NetworkManager related changes and possibly others
that are causing failures in our bump-lockfile process. We need
to investigate these issues before promoting dracut-054.

coreos/fedora-coreos-tracker#842
Adjust the kernel arguments so that we're now using cgroups v2
in our testing-devel (and subsequently, testing and stable) stream(s).

Context: coreos/fedora-coreos-tracker#292
This will check if a system is still using cgroups v1 and generate a
message to be printed as part of CLHM.

Co-authored-by: Dusty Mabe <dusty@dustymabe.com>
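
One common way to detect the cgroup version is to look at the filesystem type mounted on `/sys/fs/cgroup`: `stat -fc %T /sys/fs/cgroup` reports `cgroup2fs` on a unified (v2) hierarchy and `tmpfs` on the legacy (v1) layout. The sketch below assumes that approach; the function names and the message wording are illustrative and not taken from the actual check.

```shell
# Map the filesystem type of /sys/fs/cgroup to a cgroup version.
cgroup_version() {
    case "$1" in
        cgroup2fs) echo 2 ;;
        tmpfs)     echo 1 ;;
        *)         echo unknown ;;
    esac
}

# Print a warning only when the legacy (v1) layout is detected.
warn_if_cgroups_v1() {
    if [ "$(cgroup_version "$1")" = "1" ]; then
        echo "Warning: this system is still using cgroups v1."
    fi
}

# Usage on a live system:
#   warn_if_cgroups_v1 "$(stat -fc %T /sys/fs/cgroup)"
```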
crun was explicitly included in [1] for Fedora CoreOS, but we don't use
it in RHCOS for now since we default to runc [2]. This moves it to the
fedora-coreos-base manifest to make it FCOS-only.

[1] coreos@45b0167
[2] openshift/os@80aa676
As we've learned in https://bugzilla.redhat.com/show_bug.cgi?id=1954025,
we can't assume that we can use any individual path before multipathd
unifies them. This means that it's not correct to turn it on from the
second boot onwards only, which is the current approach.

So we need to support multipath at first boot, and further, we need to
delay all I/O to the boot disk to *after* multipathd takes ownership.

This patch does this by introducing a generator and a target. We need to
use a target because `After=multipathd.service` is not enough to ensure
that multipathd finished setting up the devices. The target explicitly
waits for the multipathed boot device to appear. (Note though that we
*can't* assume that root is also directly on top of the multipath device
because of LUKS-on-root.)

This creates an awkward UX, which is that multipath is the only root
setup option which is *not* driven by Ignition but instead set up behind
its back.

This will be somewhat better once Ignition supports kernel arguments,
because the `rd.multipath` karg can be fed that way. But long-term, we
should consider teaching Ignition to configure multipath devices. This
would fix the configuration problem (right now, we only support
first-booting with default configs), and would free users from adding
the necessary kargs, which would be done by the rootmap code as usual.

But still, even in that world, there's a gap where the Ignition config
could be on the boot device, in which case multipathd would need to be
turned on beforehand.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1954025
`lsblk` doesn't know how to query the `PKNAME` and `PTUUID` on multipath
devices, so handle that case.
We need special handling here to grow multipathed devices. The `sfdisk`
line is the same as was added in coreos#392.
We shouldn't conditionalize on `DM_ACTIVATION` here. It's used by device
mapper to differentiate between different types of events, but it
doesn't mean the device isn't active.

For example, in a multipath "reload" event, `DM_ACTIVATION` will be 0,
and the symlinks shouldn't flicker through this.

This fixes udev CHANGE events (e.g. from a partition table reread)
sometimes causing our multipath symlinks to go away even if the
multipath devices themselves are completely fine.

See: https://sourceware.org/git/?p=lvm2.git;a=blob;f=udev/10-dm.rules.in;h=b4fa52ab766effb04fc198fd52e6181ad5758eef;hb=HEAD#l91
See: https://github.com/opensvc/multipath-tools/blob/23a01fa679481ff1144139222fbd2c4c863b78f8/multipath/11-dm-mpath.rules#L49
By design, `kola testiso` exercises the metal path. So adding "metal" to
the filenames of things is redundant.
This exercises the new support for multipath on firstboot:
coreos#1011
We should have done this from the very start.  I hit a few issues
here:

- Docs are missing the requirement for a `root` karg
- cosa needs patching to ignore the crash on the console
The new one causes issues and has taken too long to get a fix.
coreos/fedora-coreos-tracker#850

Let's unblock the lockfile bumper.
This allows us to skip a cycle and get the content faster than
waiting on the bot to do it.
coreosbot and others added 28 commits September 19, 2021 20:54
The var-mount test currently only fails on s390x because it uses LUKS
and requires a TPM. Thus add a simple test without TPM, and restrict
the LUKS version to non-s390x.

Also see 51ee72c ("tests: Enable TPM test for all arches except
s390x")

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Fedora Linux 33 is old. Let's use Fedora Linux 34.
The remote Ignition file will be used to verify BZ1980679; it injects
kernel arguments and writes something to /etc/testfile.
Update config.bu to include remote.ign.
Verify the kernel argument and that /etc/testfile exists.
This is a temporary workaround to remove a problematic stop command
from `multipathd.service`, until the already-merged proper fix gets
released in dracut.
Let's confirm for now that we're using iptables-legacy by default
until we switch to defaulting to iptables-nft.

See coreos/fedora-coreos-tracker#676
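
A check like this could be built on the output of `iptables --version`, which on iptables 1.8 and later names the backend in parentheses, for example `(legacy)` or `(nf_tables)`. The following is a sketch under that assumption; `iptables_backend` is a hypothetical helper and not the test added by this commit.

```shell
# Classify an `iptables --version` string by the backend it reports.
iptables_backend() {
    case "$1" in
        *"(legacy)"*)    echo legacy ;;
        *"(nf_tables)"*) echo nft ;;
        *)               echo unknown ;;
    esac
}

# Usage on a live system:
#   [ "$(iptables_backend "$(iptables --version)")" = legacy ] || exit 1
```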
This is now enabled via the `90-default.preset` from fedora-release
(included since commit 83f5e12).

See:
  - https://src.fedoraproject.org/rpms/fedora-release/pull-request/203
  - https://bugzilla.redhat.com/show_bug.cgi?id=1995495

This reverts commit 12ba5c2.
See coreos/fedora-coreos-tracker#966

There is a fix upstream but we need to wait for it to propagate
down into FCOS.
`gce` is the proper platform name, though we should consider
renaming it.
This reworks the test logic following review feedback to make it
easier to read.
This adds a dropin for 'multipathd.socket' adding the same start
conditions that are present on the service unit. It is a temporary
workaround that can be removed once the packaged one is fixed.

Ref: https://bugzilla.redhat.com/show_bug.cgi?id=2008098
It's just sitting there in Bodhi.

We want it for its own sake, but also for
coreos/rpm-ostree#3103 because of
fedora-silverblue/issue-tracker#210 which can
also apply to FCOS, even if having it as a layer is likely rarer here.
We do `udevadm settle` in a few places to e.g. wait for symlinks to
update based on whatever operation we just did. But if `udevadm settle`
fails, we shouldn't fail the boot for it. It could be failing on some
completely unrelated thing (since it waits for all events).

Ideally in those scripts, we'd wait only for the specific events that we
care about. But even so, we should just opportunistically keep booting.
As a plus, even if we end up failing further down, we'll get a clearer
error.

Related: https://bugzilla.redhat.com/show_bug.cgi?id=2009662
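
The "keep booting" behaviour can be sketched as a small wrapper that logs a warning instead of propagating the failure. `best_effort` is a hypothetical name used for illustration; the actual scripts may simply append `|| :` or similar.

```shell
# Run a command, but only warn (never fail) if it exits non-zero.
best_effort() {
    if ! "$@"; then
        echo "warning: '$*' failed; continuing" >&2
    fi
    return 0
}

# In a boot script one might then call:
#   best_effort udevadm settle
```

Under `set -e`, a bare `udevadm settle` that fails would abort the script; the wrapper always returns 0, so any later failure surfaces at the step that actually broke.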
Part of coreos/fedora-coreos-tracker#812

In this initial step, we're merely switching the internal
tarball to be a different format.

A future step will change the FCOS pipeline to automatically
push this container to quay.io.
@cgwalters cgwalters changed the base branch from testing-devel to rawhide October 4, 2021 18:46
@cgwalters cgwalters closed this Oct 4, 2021