composefs?? random ENOENTS in CI #2033

Closed
edsantiago opened this issue Jul 11, 2024 · 7 comments · Fixed by #2036

@edsantiago
Member

Seeing a lot of these in my no-flake-retry PR, only on rawhide root, only today:

something something
Error: copying layers and metadata for container "cid":  \
    initializing source containers-storage:test1: \
    extracting layer "sha": \
    lgetxattr /tmp/CI_RQPB/podman-e2e-827543605/imagecachedir/overlay/sha/merged/usr/bin/killall: no such file or directory

or

Error: exporting root file-system diff for "sha": \
    lgetxattr /tmp/CI_RQPB/podman-e2e-827543605/imagecachedir/overlay/sha/merged/usr/bin/bunzip2: no such file or directory

or llistxattr or even just open. There's also a very weird one in podman diff that does not match the same pattern but I would bet is related.

Today's changes are (1) local registry and (2) test composefs on rawhide.

  • rawhide : int podman rawhide root host sqlite
    • 07-11 09:14 in Podman commit podman commit with volumes mounts and no include-volumes
    • 07-11 09:14 in Podman pod create podman create pod with --infra-image
    • 07-11 09:14 in Podman checkpoint podman restore multiple containers from single checkpoint image
  • rawhide : int remote rawhide root host sqlite [remote]
    • 07-11 09:14 in Podman run entrypoint podman run user entrypoint with command overrides image entrypoint and image cmd
int(4) podman(3) rawhide(4) root(4) host(4) sqlite(4)
remote(1)
@edsantiago
Member Author

One more run of my PR, lots more failures. This is looking really bad for composefs.

Note that one of those failures is in diff and it exhibits the same ENOENT symptom.

  • rawhide : int podman rawhide root host sqlite
    • 07-11 13:45 in Podman checkpoint podman checkpoint and restore container with root file-system changes
    • 07-11 13:45 in Podman run podman run --seccomp-policy image (bogus profile)
    • 07-11 13:45 in Podman build podman build --from, --add-host, --cap-drop, --cap-add
    • 07-11 09:14 in Podman commit podman commit with volumes mounts and no include-volumes
    • 07-11 09:14 in Podman pod create podman create pod with --infra-image
    • 07-11 09:14 in Podman checkpoint podman restore multiple containers from single checkpoint image
  • rawhide : int remote rawhide root host sqlite [remote]
    • 07-11 13:46 in Podman diff podman image diff
    • 07-11 09:14 in Podman run entrypoint podman run user entrypoint with command overrides image entrypoint and image cmd
int(8) podman(6) rawhide(8) root(8) host(8) sqlite(8)
remote(2)

@edsantiago
Member Author

Another CI run, another bunch of failures, I'm not going to bother posting them all.

One curious finding: so far, no failure in sys tests. Could it be that the magic command-line --pull-option is not working? Maybe one of those options is necessary, but e2e tests aren't actually getting them?

@Luap99
Member

Luap99 commented Jul 12, 2024

I don't think the problem would be the CLI option. I think one major difference is that the e2e tests use this special imagestore setup on the main regular store to share images.

@Luap99
Member

Luap99 commented Jul 12, 2024

@giuseppe PTAL

@giuseppe
Member

Sorry for not catching this at review time: we need to pass each flag with a different --pull-option (containers/podman#23257). That is not a fix for this case though, since we want to support non-composefs additional stores as well.
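
For illustration only, here is a minimal Go sketch of what "each flag with a different --pull-option" could look like when a test harness builds the command line. The helper name `pullOptionArgs` and the option names `option_a` / `option_b` are hypothetical placeholders, not real storage options; see containers/podman#23257 for the actual change.

```go
package main

import (
	"fmt"
	"sort"
)

// pullOptionArgs expands a set of pull options into one "--pull-option"
// flag per key=value pair, instead of packing several options into a
// single flag. This is a hypothetical helper, not podman test code.
func pullOptionArgs(opts map[string]string) []string {
	keys := make([]string, 0, len(opts))
	for k := range opts {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic order for reproducible command lines

	args := make([]string, 0, 2*len(keys))
	for _, k := range keys {
		args = append(args, "--pull-option", k+"="+opts[k])
	}
	return args
}

func main() {
	// "option_a" and "option_b" are placeholder names (assumptions).
	args := pullOptionArgs(map[string]string{
		"option_a": "true",
		"option_b": "true",
	})
	fmt.Println(args)
}
```

The point of the sketch is only the shape of the argument list: every option gets its own `--pull-option` entry.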

I've also bumped the c/storage dependency as there are two fixes for composefs that might be helpful in this case.

I am still investigating the problem though, trying to reproduce locally.

@edsantiago
Member Author

I've rebased my no-retries PR on main, which now includes containers/podman#23257, and I'm sorry to say that the flake persists.

@giuseppe giuseppe transferred this issue from containers/podman Jul 16, 2024
@giuseppe giuseppe added the jira label Jul 17, 2024
giuseppe added a commit to giuseppe/storage that referenced this issue Jul 17, 2024
when NaiveDiff is used, the Diff/Changes operations can trigger the
mount of the layer.  Prevent multiple processes from stepping on each
other, where one of them performs an unmount while another one is still
accessing the mount.

Closes: containers#2033

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
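
To make the race in the commit message above concrete, here is a rough in-process Go sketch of a reference-counted mount guard. It is not the c/storage change: goroutines stand in for processes, the "mount" is just a boolean, and `mountGuard` and `diffAll` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// mountGuard is a hypothetical stand-in for the locking described above:
// the layer is "mounted" by the first user and "unmounted" only when the
// last user is done, so nobody loses the mount while still reading it.
type mountGuard struct {
	mu      sync.Mutex
	users   int
	mounted bool
}

// Acquire registers a user and performs the simulated mount if needed.
func (g *mountGuard) Acquire() {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.users == 0 {
		g.mounted = true // a real driver would mount the layer here
	}
	g.users++
}

// Release drops a user and unmounts only when no users remain.
func (g *mountGuard) Release() {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.users--
	if g.users == 0 {
		g.mounted = false // a real driver would unmount the layer here
	}
}

// Mounted reports whether the simulated mount is currently present.
func (g *mountGuard) Mounted() bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	return g.mounted
}

// diffAll runs n concurrent simulated Diff/Changes callers and returns
// how many of them found the mount missing while they held a reference.
func diffAll(g *mountGuard, n int) int {
	var wg sync.WaitGroup
	var missMu sync.Mutex
	missing := 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			g.Acquire()
			defer g.Release()
			if !g.Mounted() {
				missMu.Lock()
				missing++
				missMu.Unlock()
			}
		}()
	}
	wg.Wait()
	return missing
}

func main() {
	g := &mountGuard{}
	fmt.Println("callers that saw a missing mount:", diffAll(g, 50))
}
```

Without the shared count, one caller's unmount could run while another caller is still walking the merged directory, which is the kind of interleaving that would surface as ENOENT.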
giuseppe added a commit to giuseppe/storage that referenced this issue Jul 17, 2024
use a private "merged" directory when mounting from an additional
store.

Operations like "Diff()" and "Changes()" cause an implicit mount when
the naive differ is used.

The issue was not observed earlier because native overlay can achieve
these operations without using a mount.

Since these mounts are performed read-only, and overlay supports
multiple mounts using the same lowerdirs, use a private location for
the "merged" directory.  The location is owned by the current
writeable store, that is locked for writing.

Closes: containers#2033

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
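
As a rough Go sketch of the "private merged directory" idea in the commit message above: layers that live in a read-only additional store get their mount point under the writeable store instead of next to the layer. The function name `mergedDirFor` and the `private-mounts` directory name are assumptions made for illustration; only the `overlay/<id>/merged` layout comes from the error paths quoted earlier in this issue.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// mergedDirFor returns where a layer's "merged" mount point would live.
// A layer from an additional (shared, read-only) store gets a private
// location owned by the current writeable store, so processes using
// different writeable stores never mount or unmount the same path.
// The "private-mounts" name is illustrative, not the real layout.
func mergedDirFor(writeableRoot, additionalRoot, layerID string, fromAdditionalStore bool) string {
	if fromAdditionalStore {
		// additionalRoot is deliberately unused here: the shared store
		// is read, but nothing is mounted inside it.
		return filepath.Join(writeableRoot, "private-mounts", layerID, "merged")
	}
	return filepath.Join(writeableRoot, "overlay", layerID, "merged")
}

func main() {
	// Both paths below are made-up examples.
	fmt.Println(mergedDirFor("/tmp/store-a", "/tmp/imagecachedir", "sha", true))
	fmt.Println(mergedDirFor("/tmp/store-a", "/tmp/imagecachedir", "sha", false))
}
```

Because the mounts are read-only and overlay allows several mounts over the same lowerdirs, two test processes with different writeable stores would each get their own mount point for the same shared layer.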
@giuseppe
Member

potential fix: #2036, I am still validating it

I'll update here when it is ready for testing/review
