composefs: random EINVALs #2042

Open
edsantiago opened this issue Jul 24, 2024 · 29 comments
Labels
area/composefs composefs related changes

Comments

@edsantiago (Member)

Sometimes on commit, sometimes on diff

# podman-remote [options] commit -q testcon newimage
Error: copying layers and metadata for container "cecc9c90f4107cf89808b8234df98e173e1bf052a006665f460a1e86f8d070b8": initializing source containers-storage:testcon: extracting layer "1745071114b3933b11b73a9f099d2b82d4e80262c1211b1b6e7f388466ebdf3c": invalid argument
# podman [options] --pull-option=enable_partial_images=true --pull-option=convert_images=true container checkpoint --create-image alpine-checkpoint-bmjmtt --keep e3793c9b35baf7d6a62a976fb4da4e4fc139c535edd159cd6edcf96a8b4e1d02
Error: exporting root file-system diff for "e3793c9b35baf7d6a62a976fb4da4e4fc139c535edd159cd6edcf96a8b4e1d02": invalid argument
  • rawhide : int podman rawhide root host sqlite
    • 07-24 07:14 in Podman checkpoint podman checkpoint --create-image with running container
  • rawhide : int remote rawhide root host sqlite [remote]
    • 07-22 11:56 in Podman commit podman commit adds exposed ports
Seen in: int(2) podman(1) rawhide(2) root(2) host(2) sqlite(2) remote(1)
cgwalters added a commit to cgwalters/storage that referenced this issue Jul 24, 2024
Motivated by containers#2042
where we just get a bare `invalid argument` out of the entire
storage stack.

My offhand guess, skimming some of the code, is that by far
the most likely culprit here is the raw `lgetxattr` call. It'd
be useful to know that for sure.

Signed-off-by: Colin Walters <walters@verbum.org>
@edsantiago (Member Author)

Another one on commit:

# podman-remote [options] commit -q -f docker --message testing-commit test1 foobar.com/test1-image:latest
Error: copying layers and metadata for container "0e4bc8c8f422fea0f64b850c57307c45d172c136e1e503a60c0f2ea1c0c6c894": initializing source containers-storage:test1: extracting layer "8a6b0c5bf45deb44539988fe8f0214bff28b79dd03cacc9246febc6c7b3a1dce": failed to mount erofs image at "/tmp/CI_uMrH/podman-e2e-1025971979/imagecachedir/overlay/03901b4a2ea88eeaad62dbe59b072b28b6efa00491962b8741081c5df50c65e0/composefs-layers/0": invalid argument

@giuseppe (Member)

this is more helpful, thanks! The erofs image could be corrupted

CC @cgwalters

@cgwalters (Contributor)

The erofs image could be corrupted

Hmm... you think the chain is possibly: container image -> mkcomposefs -> (internal mkfs.erofs-alike) -> kernel refusing to read the result?

I'm unfortunately not yet familiar with the podman test suite, but is this 100% reproducible, i.e. basically setting up composefs for c/storage and running the relevant test?

cgwalters added the area/composefs label Jul 29, 2024
@giuseppe (Member)

I'm unfortunately not yet familiar with the podman test suite, but is this 100% reproducible, i.e. basically setting up composefs for c/storage and running the relevant test?

I've tried to get as close as possible to the CI setup but I've not managed to reproduce it locally yet :/

Might be worth testing with a version of podman that stores somewhere the erofs image that fails on mount, so we can analyze it.

@cgwalters (Contributor)

Might be worth testing with a version of podman that stores somewhere the erofs image that fails on mount, so we can analyze it.

Maybe:

  • If the image fails to mount, rename it as .bad or so (or just create a symlink/hardlink with that name)
  • Change podman CI to scrape all such images and store them as cirrus CI artifacts

@giuseppe (Member)

A couple of improvements for composefs that could help detect potential failures when creating the erofs image:

@hsiangkao (Contributor)

Might be worth testing with a version of podman that stores somewhere the erofs image that fails on mount, so we can analyze it.

Maybe:

  • If the image fails to mount, rename it as .bad or so (or just create a symlink/hardlink with that name)

If an image fails to mount, it could simply have a corrupted on-disk superblock and/or root inode. fsck.erofs with -d9 could give more information.
Also, if dmesg is available, it would give more hints too.

@alexlarsson (Contributor)

Yeah, we probably need the dmesg to figure this one out.

@edsantiago (Member Author)

At the top of each log is a link to journal. Here is the latest flake list as of this morning:

  • rawhide : int podman rawhide root host sqlite
    • 08-02 08:11 in Podman checkpoint podman checkpoint container with export and try to change the runtime
    • 07-31 20:50 in Podman run podman run --seccomp-policy image (bogus profile)
    • 07-31 17:50 in Podman checkpoint podman checkpoint --create-image with running container
    • 07-31 11:20 in Podman checkpoint podman checkpoint and run exec in restored container
    • 07-26 07:53 in Podman images podman images filter intermediate
    • 07-24 07:14 in Podman checkpoint podman checkpoint --create-image with running container
  • rawhide : int remote rawhide root host sqlite [remote]
    • 08-05 07:47 in Podman run podman run --seccomp-policy image (block all syscalls)
    • 08-04 16:23 in Podman run podman run --seccomp-policy image (bogus profile)
    • 08-04 09:44 in Podman run entrypoint podman run entrypoint with cmd
    • 08-02 08:12 in Podman run entrypoint podman run entrypoint with user cmd no image cmd
    • 08-02 08:12 in Podman push podman push to local registry with authorization
    • 08-01 12:50 in Podman commit podman commit should not commit env secret
    • 08-01 12:50 in Podman checkpoint podman checkpoint --create-image with running container
    • 07-31 23:06 in Podman diff podman diff latest container
    • 07-31 20:51 in Podman checkpoint podman restore multiple containers from single checkpoint image
    • 07-31 17:50 in Podman images podman builder prune
    • 07-29 15:03 in Podman diff podman image diff
    • 07-24 14:13 in Podman commit podman commit container with message
    • 07-22 11:56 in Podman commit podman commit adds exposed ports
Seen in: int(19) remote(13) rawhide(19) root(19) host(19) sqlite(19) podman(6)

@alexlarsson (Contributor)

I looked at the first one, and it's clearly an error in this code:

if err := unix.Mount(loop.Name(), mountPoint, "erofs", unix.MS_RDONLY, mountOpts); err != nil {
	return fmt.Errorf("failed to mount erofs image at %q: %w", mountPoint, err)
}

Where mountOpts is either empty or "noacl". But I don't see any errors reported from erofs in the journal around that time (or indeed ever). So, where is the EINVAL coming from? If the image was somehow corrupt, shouldn't erofs give some kernel log? Is it the loop device name that is invalid or something?

@alexlarsson (Contributor)

The second one has:
msg="Unmount "/tmp/CI_5HQJ/podman-e2e-1833510854/imagecachedir/overlay/7ffe79913b6cd452b42b4aef7f51ff4eeecbdde76a464ef55c321a345faaa4a6/composefs-layers/0": invalid argument"

Now, we're failing to unmount too, with EINVAL. That is very strange...

@hsiangkao (Contributor)

The second one has: msg="Unmount "/tmp/CI_5HQJ/podman-e2e-1833510854/imagecachedir/overlay/7ffe79913b6cd452b42b4aef7f51ff4eeecbdde76a464ef55c321a345faaa4a6/composefs-layers/0": invalid argument"

Now, we're failing to unmount too, with EINVAL. That is very strange...

Anyway, erofs itself clearly doesn't return EINVAL on unmount(). I'm not sure how it happens; maybe the directory is already unmounted, so it's not a mount point.

@alexlarsson (Contributor)

Yeah, or maybe the mount failed.

@alexlarsson (Contributor)

From umount(2):
EINVAL: target is not a mount point.

@hsiangkao (Contributor)

Yeah, or maybe the mount failed.

Yeah; anyway, it'd be better to have some dmesg output, since it seems (I think) there are enough kernel prints in the mount failure path.

@alexlarsson (Contributor)

@hsiangkao Well, the thing is that we see these in the journal logs:

Aug 02 07:11:37 cirrus-task-6203064640602112 kernel: erofs: (device loop1): mounted with root inode @ nid 36.

So, should we not see if there were any other erofs error messages?

@hsiangkao (Contributor)

@hsiangkao Well, the thing is that we see these in the journal logs:

Aug 02 07:11:37 cirrus-task-6203064640602112 kernel: erofs: (device loop1): mounted with root inode @ nid 36.

So, should we not see if there were any other erofs error messages?

I think that if this is printed, the mount has already succeeded inside erofs itself, see:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/erofs/super.c?h=v6.10#n687

so mount() won't return EINVAL, at least not for that particular mount.

Is that the exact mount that failed?

@alexlarsson (Contributor)

@hsiangkao No, that is just an example of some dmesg output from the journal logs we have. What I meant was, if these successful mounts are reported (and they are), should not also failed mounts be reported in the logs?

I have been unable to find any other erofs log output other than copies of the above example.

@hsiangkao (Contributor)

hsiangkao commented Aug 6, 2024

@hsiangkao No, that is just an example of some dmesg output from the journal logs we have. What I meant was, if these successful mounts are reported (and they are), should not also failed mounts be reported in the logs?

I have been unable to find any other erofs log output other than copies of the above example.

If there is no "erofs" dmesg log in the kernel messages, I guess the EINVAL wasn't returned by erofs; maybe it's what you mentioned just now: the loop device is invalid, or similar...

since it seems that erofs currently has error messages on all error paths in erofs_fc_fill_super():
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/erofs/super.c?h=v6.10#n579

@hsiangkao (Contributor)
Oh, although I'm not sure how errorfc() actually works now...

@alexlarsson (Contributor)

alexlarsson commented Aug 6, 2024

Hmm, yeah, errorfc() seems to log to an error buffer in the fs context, at least in some cases, and you can then extract those messages via the context fd (as in e.g. https://github.com/torvalds/linux/blob/b446a2dae984fa5bd56dd7c3a02a426f87e05813/samples/vfs/test-fsmount.c#L20). Does this mean those errors are not logged to dmesg, though?

@hsiangkao (Contributor)

Hmm, yeah, errorfc() seems to log to the error buffer in the fs context, at least in some cases, and you can then extract these messages via the fd (as in e.g. https://github.com/torvalds/linux/blob/b446a2dae984fa5bd56dd7c3a02a426f87e05813/samples/vfs/test-fsmount.c#L20). Does this mean those errors are not logged though?

I haven't tried it, but I guess those errorfc()s are rarely triggered:
super.c: errorfc(fc, "failed to set initial blksize");
super.c: errorfc(fc, "unsupported blksize for fscache mode");
super.c: errorfc(fc, "failed to set erofs blksize");
super.c: errorfc(fc, "DAX unsupported by block device. Turning off DAX.");
super.c: errorfc(fc, "unsupported blocksize for DAX");

Anyway, I tend to guess that the EINVAL wasn't returned by erofs. It looks like an invalid device or something.
I think I will provide file-backed mounts in the next version.

@alexlarsson (Contributor)

If anything I'd expect it to be this:

super.c: ret = -EINVAL;
super.c- if (le32_to_cpu(dsb->magic) != EROFS_SUPER_MAGIC_V1) {
super.c- erofs_err(sb, "cannot find valid erofs superblock");
super.c- goto out;
super.c- }

@edsantiago (Member Author)

ping, what is the status of this?

  • rawhide : int podman rawhide root host sqlite
    • 08-11 09:26 in Podman checkpoint podman checkpoint container with export and try to change the runtime
    • 08-06 11:36 in Podman checkpoint podman restore multiple containers from single checkpoint image
    • 08-05 09:48 in Podman play kube with build Check that image is built using Containerfile
    • 08-02 08:11 in Podman checkpoint podman checkpoint container with export and try to change the runtime
    • 07-31 20:50 in Podman run podman run --seccomp-policy image (bogus profile)
    • 07-31 17:50 in Podman checkpoint podman checkpoint --create-image with running container
    • 07-31 11:20 in Podman checkpoint podman checkpoint and run exec in restored container
    • 07-26 07:53 in Podman images podman images filter intermediate
    • 07-24 07:14 in Podman checkpoint podman checkpoint --create-image with running container
  • rawhide : int remote rawhide root host sqlite [remote]
    • 08-14 11:43 in Podman checkpoint podman restore multiple containers from single checkpoint image
    • 08-13 16:48 in Podman checkpoint podman restore multiple containers from single checkpoint image
    • 08-06 17:29 in Podman commit podman commit container with change CMD flag
    • 08-06 17:29 in Podman checkpoint podman restore multiple containers from single checkpoint image
    • 08-06 17:29 in Podman checkpoint podman checkpoint a container started with --rm
    • 08-06 15:38 in Podman checkpoint podman restore multiple containers from multiple checkpoint images
    • 08-05 07:47 in Podman run podman run --seccomp-policy image (block all syscalls)
    • 08-04 16:23 in Podman run podman run --seccomp-policy image (bogus profile)
    • 08-04 09:44 in Podman run entrypoint podman run entrypoint with cmd
    • 08-02 08:12 in Podman run entrypoint podman run entrypoint with user cmd no image cmd
    • 08-02 08:12 in Podman push podman push to local registry with authorization
    • 08-01 12:50 in Podman commit podman commit should not commit env secret
    • 08-01 12:50 in Podman checkpoint podman checkpoint --create-image with running container
    • 07-31 23:06 in Podman diff podman diff latest container
    • 07-31 20:51 in Podman checkpoint podman restore multiple containers from single checkpoint image
    • 07-31 17:50 in Podman images podman builder prune
    • 07-29 15:03 in Podman diff podman image diff
    • 07-24 14:13 in Podman commit podman commit container with message
    • 07-22 11:56 in Podman commit podman commit adds exposed ports
Seen in: int(28) remote(19) rawhide(28) root(28) host(28) sqlite(28) podman(9)

@cgwalters (Contributor)

I'm not totally sure this is all related to composefs; for example:

Error: exporting root file-system diff for "9c8758400f1191794e89029144d40892208ac0b3b11d01dc6e21703dee4a2589": failed to move mount: invalid argument

@edsantiago can you elaborate a bit on the background on this? Basically two questions:

  • Are these tests only running against Fedora rawhide, and we have no data on whether they also appear on e.g. f40/c10s/c9s?
  • Are we sure they only reproduce when composefs is enabled for c/storage? i.e. did we just enable composefs for rawhide or are there separate composefs-only jobs?

@cgwalters (Contributor)

super.c- erofs_err(sb, "cannot find valid erofs superblock");

Right, I agree that seems like the most likely source of an EINVAL from the erofs layer - and https://github.com/containers/composefs/releases/tag/v1.0.5 contains containers/composefs@76b4da5, which has a "possible" fix for this (though if that is it, the question just becomes how corrupt composefs erofs files were being generated).

Given that some of the EINVALs we're seeing here are coming from what look like generic VFS operations, I think there's either a kernel bug/regression, or our use of composefs somehow tickles other generic VFS bugs. It'd be really useful to have a bit more data on these tests on other OS versions.

@edsantiago (Member Author)

Are these tests only running against Fedora rawhide, and we have no data on whether they also appear on e.g. f40/c10s/c9s?

We are only testing composefs in rawhide.

Are we sure they only reproduce when composefs is enabled for c/storage? i.e. did we just enable composefs for rawhide or are there separate composefs-only jobs?

When composefs is enabled, it applies to all tests, with the exception of any tests that use their own private $CONTAINERS_STORAGE_CONF; I only see two tests that do so.

@edsantiago (Member Author)

Still happening

@giuseppe (Member)

Thanks; the change in c/storage helped just a little bit. There are a bunch of reasons why move_mount could fail with EINVAL, but I can't see any obvious one we could hit, given that the mount happens with the storage locked.
