Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support overlayfs whiteouts on checkout #2717

Merged
merged 1 commit into from
Sep 28, 2022

Conversation

mangelajo
Copy link
Contributor

Introduces an intermediate format for overlayfs storage, where
.wh-ostree. prefixed files will be converted into char 0:0
whiteout devices used by overlayfs to mark deletions across layers.

Related-Issue: #2712

@openshift-ci
Copy link

openshift-ci bot commented Sep 19, 2022

Hi @mangelajo. Thanks for your PR.

I'm waiting for a ostreedev member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@cgwalters
Copy link
Member

/ok-to-test

Copy link
Member

@cgwalters cgwalters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should also add ostree commit --retarget-whiteouts or so which would automatically do the inverse side. Now AIUI today osbuild uses rpm-ostree compose commit which could also use a similar API.

But neither of these are blockers.

@@ -35,6 +35,8 @@
#define WHITEOUT_PREFIX ".wh."
#define OPAQUE_WHITEOUT_NAME ".wh..wh..opq"

#define OVERLAYFS_WHITEOUT_PREFIX ".wh-ostree."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about

Suggested change
#define OVERLAYFS_WHITEOUT_PREFIX ".wh-ostree."
#define OSTREE_TARGET_WHITEOUT_PREFIX ".ostree-wh."

Here "ostree_" is first because it's a thing conceptually namespaced/scoped to ostree, not to overlayfs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

AT_SYMLINK_NOFOLLOW) < 0))
return glnx_throw_errno_prefix (error, "fchownat");
}
need_copy = FALSE;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could probably just return TRUE; here too - what we're skipping over is the caching stuff which isn't relevant for this.


g_assert (name[0] != '/'); /* Sanity */

if (mknodat(destination_dfd, name, (source_mode & ~S_IFMT) | S_IFCHR, (dev_t)0) < 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing to verify here is whether we also need to do a fchmodat - I think the mode we set is modified by umask, so in most cases we always unconditionally chmod.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point I will verify this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, looking at mknodat documentation:

The file mode is modified by the process's umask in the usual way: in the absence of a default ACL, the permissions of the created node are (mode & ~umask).

So we also need fchmodat

@@ -643,6 +647,32 @@ checkout_one_file_at (OstreeRepo *repo,

need_copy = FALSE;
}
else if (is_overlayfs_whiteout)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking we may want to make this behavior potentially opt-in? So far ostree has no "magical filenames" - this would be the first.

A concern I have here is that e.g. flatpak would suddenly gain exposure to this. And as we discussed it's probably fine...the worst I can think of is that someone using flatpak on top of overlayfs wouldn't be able to install an app that included a file with this name.

But still, by requiring an opt-in flag we're at least reducing the blast radius.

I think we can then hange the deploy path to pass this flag by default. This feature would clearly be useful for everyone using ostree-for-bootable-systems.

Though I think I'd also be OK having it opt-out (i.e. we add a flag to disable this) instead; then if we discover other use cases don't want it they could turn it off. But opt-in seems safer and only requires a little bit more work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that opt-in as you said,may require changes in the deployment path when we install or upgrade the system. Let me talk to the RHEL4Edge team and try figure out how easy/hard it's to include the flag, probably it's easy or even under the control of the image creator so not a big deal, but I'm not familiar with that.

By looking at our composer installer image I see:

ostreesetup --nogpg --osname=rhel --remote=edge --url=file:///run/install/repo/ostree/repo --ref=rhel/8/x86_64/edge

Do you know where ostreesetup lives?

Note: It seems like any regular user with no special privileges can create char 0:0 devices , and that device always returns error.

As you said the issue would be if trying to checkout inside an overlayfs system.

Copy link
Member

@cgwalters cgwalters Sep 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my proposal here, the ostree admin deploy code would by default pass this option down when it does the checkout. This means no modifications would be necessary for anything higher level than this project.

Conceptually ostree has two separate parts; there's a git-like core represented by OstreeRepo, and the sysroot/deploy bits in OstreeSysroot. The split isn't 100% clean (the magical ostree/1/0/1 type refs are part of that) but that's the idea.

Your modifications here are to OstreeRepo, which is the core of ostree that's used in flatpak and other places, whereas OstreeSysroot is scoped only to bootable host systems - and that's where we want to handle this.

Hmm actually looking at this we of course already have

typedef struct {
...
  gboolean process_whiteouts;
...
} OstreeRepoCheckoutAtOptions;

And...confusingly that's for a different model of whiteouts where we want ostree to handle them, not the container engine. Anyways so...basically I'm saying we'd add gboolean process_passthrough_whiteouts; there, and then in checkout_deployment_tree set that boolean or so. WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added an opt-out for now, but happy to change into opt-in if we see it's doable in rhel4edge or for any other reason.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added an opt-out for now, but happy to change into opt-in if we see it's doable in rhel4edge or for any other reason.

oh sorry I didn't see you had written comments in the meantime, reading ^

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, sounds like a good plan to me

Questions to make sure I understood right:

  • so, RHEL uses ostree admin underneath ?
  • I guess I also must verify that switch/upgrade will use the right flag too

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ostree admin is just a small CLI for the core libostree C shared library, which is also used by rpm-ostree (e.g. here), which is what's used by most things in Fedora/RHEL today. So modifying the code here will affect everything we want.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I also must verify that switch/upgrade will use the right flag too

Basically by design, every single ostree operation ends up reconstructing a new filesystem via hardlinks - it never tries to do an "in place" update. checkout_deployment_tree() is the function that does that, and everything else calls it.

Copy link
Contributor Author

@mangelajo mangelajo Sep 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, understood

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ostree_sysroot_deploy_tree -> ostree_sysroot_deploy_tree_with_options -> sysroot_initialize_deployment -> checkout_deploytment_tree -> ostree_repo_checkout_at

@cgwalters
Copy link
Member

Have you been able to prove out that this approach seems viable via a post-processing script that renames the whiteouts in this way? I'm OK landing a minimal patch that only does the checkout side.

@mangelajo
Copy link
Contributor Author

Have you been able to prove out that this approach seems viable via a post-processing script that renames the whiteouts in this way? I'm OK landing a minimal patch that only does the checkout side.

I have manually tested the creation of whiteout files and checkout, but I will verify it with the whole lifecycle

I also checked how overlayfs works for directory deletion and I guess we are covered there:

https://www.kernel.org/doc/Documentation/filesystems/overlayfs.txt

says:

whiteouts and opaque directories

In order to support rm and rmdir without changing the lower
filesystem, an overlay filesystem needs to record in the upper filesystem
that files have been removed. This is done using whiteouts and opaque
directories (non-directories are always opaque).

A whiteout is created as a character device with 0/0 device number.
When a whiteout is found in the upper level of a merged directory, any
matching name in the lower level is ignored, and the whiteout itself
is also hidden.

A directory is made opaque by setting the xattr "trusted.overlay.opaque"
to "y"
. Where the upper filesystem contains an opaque directory, any
directory in the lower filesystem with the same name is ignored.

@mangelajo
Copy link
Contributor Author

I'm OK landing a minimal patch that only does the checkout side.

Thanks Colin

@@ -193,6 +195,7 @@ process_one_checkout (OstreeRepo *repo,
checkout_options.force_copy = opt_force_copy;
checkout_options.force_copy_zerosized = opt_force_copy_zerosized;
checkout_options.bareuseronly_dirs = opt_bareuseronly_dirs;
checkout_options.process_passthrough_whiteouts = opt_process_passthrough_whiteouts;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm checking why this option is not getting down to the callee down the line in my local testing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+ ./ostree checkout --process-passthrough-whiteouts -v --repo=./testrepo-bare test/stable/x86_64 ./rootfs-out
.ostree-wh.whiteout is_overlayfs_whiteout: 1, enabled:0
not-a-whiteout-just-empty is_overlayfs_whiteout: 0, enabled:0
.ostree-wh.whiteout is_overlayfs_whiteout: 1, enabled:0
.ostree-wh.whiteout2 is_overlayfs_whiteout: 1, enabled:0
+ ls -la rootfs-out
total 8
drwxr-xr-x. 1 majopela majopela  102 Jan  1  1970 .
drwxr-xr-x. 1 majopela majopela 2058 Sep 20 14:24 ..
drwxr-xr-x. 1 majopela majopela   78 Jan  1  1970 another
-rw-r--r--. 1 majopela majopela    0 Sep 20 14:24 not-a-whiteout-just-empty
-rw-r--r--. 1 majopela majopela    0 Sep 20 14:24 .ostree-wh.whiteout
+ ls -la rootfs-out/another
total 0
drwxr-xr-x. 1 majopela majopela  78 Jan  1  1970 .
drwxr-xr-x. 1 majopela majopela 102 Jan  1  1970 ..
-rw-r--r--. 1 majopela majopela   0 Sep 20 14:24 .ostree-wh.whiteout
-rw-r--r--. 1 majopela majopela   0 Sep 20 14:24 .ostree-wh.whiteout2
+ getfattr -d -m '.*' rootfs-out/whiteout
getfattr: rootfs-out/whiteout: No such file or directory

I'm missing something here... ':D

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I separated those for debugging:

  const gboolean is_overlayfs_whiteout = (!is_symlink && g_str_has_prefix (destination_name, OVERLAYFS_WHITEOUT_PREFIX));

  printf("%s is_overlayfs_whiteout: %d, enabled:%d\n", destination_name, is_overlayfs_whiteout, options->process_passthrough_whiteouts);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh wait... :)

  /* This strange code structure is to preserve testing
   * coverage of both `ostree_repo_checkout_tree` and
   * `ostree_repo_checkout_at` until such time as we have a more
   * convenient infrastructure for testing C APIs with data.
   */
  if (opt_disable_cache || opt_whiteouts || opt_require_hardlinks ||
      opt_union_add || opt_force_copy || opt_force_copy_zerosized ||
      opt_bareuseronly_dirs || opt_union_identical ||
      opt_skiplist_file || opt_selinux_policy || opt_selinux_prefix ||
      opt_process_passthrough_whiteouts)
    {

@mangelajo
Copy link
Contributor Author

I used this as a small functional test:


[majopela@lenovo ostree]$ cat test-lifecycle.sh
#!/bin/sh
set -xe
sudo rm -rf ./tmp
podman pull --root ./tmp quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ba44dead03ea74107f90d58525106fb52d27a120b73c6cc8e2be31d37043ca1c

find ./tmp -type d -exec getfattr -d {} \;
# this chokes on directories or filenames with spaces
find ./tmp -type c -exec sh -c 'touch "$(dirname $1)/.ostree-wh.$(basename $1)"; rm -f "$1"; echo $1' _ {} \;

mode=bare
repo="./testrepo-${mode}"
sudo rm -rf $repo rootfs-out
./ostree --repo=$repo init --mode=$mode
sudo ./ostree commit --repo=$repo ./tmp --branch test/stable/x86_64
sudo ./ostree checkout --process-passthrough-whiteouts -v --repo=$repo test/stable/x86_64 ./rootfs-out

find ./rootfs-out -type c

and it looks good, I´m going to feed it into crio as additional storage now and see if the whiteouts work.

@mangelajo
Copy link
Contributor Author

ok, running the image from an additionalImageStore extracted with ostree checkout and the new flag works correctly.

I guess remaining parts for this patch would be:

  1. overwrite modes implemented
  2. testing , I accept advice here userland testing would be ok?

and as additional PRs:

a) on this repository, adding the ability to encode whiteouts as .ostree-wh. files automatically
b) QUESTION: would this still need rpm-ostree support? or does rpm-ostree simply consume this library.

@cgwalters
Copy link
Member

testing , I accept advice here userland testing would be ok?

Yeah that's fine

b) QUESTION: would this still need rpm-ostree support? or does rpm-ostree simply consume this library.

I'm 97.3% sure no changes to rpm-ostree are necessary for this.

@mangelajo
Copy link
Contributor Author

b) QUESTION: would this still need rpm-ostree support? or does rpm-ostree simply consume this library.

I'm 97.3% sure no changes to rpm-ostree are necessary for this.

🤣

@mangelajo
Copy link
Contributor Author

Ok, this should be now feature complete, writing tests.

Later down I can squash or reorganize the commits as necessary

@mangelajo mangelajo force-pushed the support-whiteouts branch 2 times, most recently from 1285328 to 3dba744 Compare September 26, 2022 16:32
GError **error)
{
guint32 file_mode = g_file_info_get_attribute_uint32 (file_info, "unix::mode");
if (mknodat(destination_dfd, destination_name, (file_mode & ~S_IFMT) | S_IFCHR, (dev_t)0) < 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we store the file type in the ostree mode too? Curious why you're explicitly forcing on S_IFCHR here.

Copy link
Contributor Author

@mangelajo mangelajo Sep 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In ostree it's stored as a regular file, otherwise in bare mode we would have to create character devices to get those bits preserved. Then it would look more like my original approach here #2713 which was more complex

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or are we extracting the S_IFCHR before this call? checking,..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, no, we just detect the .ostree-wh. prefix for a regular empty file, which removes the prefix and triggers this call

@cgwalters
Copy link
Member

Hm, looks like all the tests are failing with
2022-09-26T16:41:39.8776463Z error: Checking out deployment tree: fchmodat: Operation not supported...I wonder if this is related to the file type bits above.

@mangelajo mangelajo force-pushed the support-whiteouts branch 2 times, most recently from 3521d03 to 4de29a0 Compare September 27, 2022 13:38
@mangelajo
Copy link
Contributor Author

Hm, looks like all the tests are failing with 2022-09-26T16:41:39.8776463Z error: Checking out deployment tree: fchmodat: Operation not supported...I wonder if this is related to the file type bits above.

it seemed to be that not all glibs support the NO_FOLLOW_SYMLINK flag which we don't use in other parts of the code.

Comment on lines 628 to 642
if (_checkout_overlayfs_whiteout_at_no_overwrite(options, destination_dfd, destination_name,
file_info, xattrs, cancellable, error))
return TRUE;

if (!error || !g_error_matches(*error, G_IO_ERROR, G_IO_ERROR_EXISTS))
return FALSE;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't allowed by GError conventions. It's allowed to pass NULL to ignore errors, in the caller, though this isn't used much in the ostree stack.

This topic heavily relates to e.g. https://users.rust-lang.org/t/why-i-use-anyhow-error-even-in-libraries/68592 where I argue that we want to separate out the "expected/likely error" case from the "disk is failing" cases.

Specifically here, we need to pass back the "found an extant file" as a separate boolean. There's precedent for this in APIs like glnx_fstatat_allow_noent() and ot_openat_ignore_enoent etc.

Something a bit more like:

  gboolean found_extant_file = FALSE;
  if (!_checkout_overlayfs_whiteout_at_no_overwrite(options, destination_dfd, destination_name,
                                                   file_info, xattrs, &found_extant_file, cancellable, error))
    return FALSE;
 
  if (!found_extant_file)
    return TRUE;

  /* fallthrough */

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback, handling it

case OSTREE_REPO_CHECKOUT_OVERWRITE_NONE:
return FALSE;
case OSTREE_REPO_CHECKOUT_OVERWRITE_UNION_FILES:
g_clear_error(error);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And if we did the above, then we wouldn't need this.

Introduces an intermediate format for overlayfs storage, where
.wh-ostree. prefixed files will be converted into char 0:0
whiteout devices used by overlayfs to mark deletions across layers.

The CI scripts now uses a volume for the scratch directories
previously in /var/tmp otherwise we cannot create whiteout
devices into an overlayfs mounted filesystem.

Related-Issue: ostreedev#2712
@mangelajo
Copy link
Contributor Author

@cgwalters I changed the approach to error handling as recommended, let me know how it looks

Copy link
Member

@cgwalters cgwalters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@cgwalters cgwalters merged commit b04dde8 into ostreedev:main Sep 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants