Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: disable artifacts cache with composefs #23274

Merged
merged 3 commits into from
Jul 24, 2024

Conversation

giuseppe
Copy link
Member

layers restored from a tarball won't be converted to composefs (for now at least), so disable the cache when using composefs.

Does this PR introduce a user-facing change?

None

@openshift-ci openshift-ci bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Jul 15, 2024
@giuseppe giuseppe force-pushed the no-artifacts-composefs branch 3 times, most recently from 3712f6f to 11d155c Compare July 15, 2024 10:29
@giuseppe giuseppe marked this pull request as ready for review July 15, 2024 11:05
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 15, 2024
@giuseppe giuseppe marked this pull request as draft July 15, 2024 12:22
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 15, 2024
@giuseppe giuseppe force-pushed the no-artifacts-composefs branch 2 times, most recently from 28447aa to 5d6c264 Compare July 15, 2024 13:28
@@ -106,7 +107,7 @@ func (matcher *exitCleanlyMatcher) Match(actual interface{}) (success bool, err
}

// Exit status is 0. Now check for anything on stderr
if stderr != "" {
if stderr != "" && os.Getenv("NO_TEST_CACHE") == "" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yikes. Please add an enormous FIXME comment here explaining why the NO_TEST_CACHE exclusion is necessary and when it can be removed. I'm really really uncomfortable with this, is there any other way to be more selective about when/where warnings are silenced?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many tests fail right now with NO_TEST_CACHE because they perform an image pull.

This one was the easiest fix, the alternative would be to filter out any output generated by a pull, or pass -q to each command that could cause a pull.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought I already had added -q to any tests that might do a pull. Do you have a log handy, showing sample failures?

Also, in the other diff, please never say "for now". Please always include exact datestamps and, if possible, issue numbers, so a future reader can think "hmmm this should be fixed by now".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a "revert" patch to the current PR so we can get a list of the failing tests

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wrong, it seems we don't need this change

@giuseppe giuseppe marked this pull request as ready for review July 15, 2024 14:48
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 15, 2024
@edsantiago
Copy link
Member

Was this supposed to fix the lxgetattr flake? It does not seem to: https://api.cirrus-ci.com/v1/artifact/task/5139484897968128/html/int-podman-rawhide-root-host-sqlite.log.html

@giuseppe
Copy link
Member Author

Was this supposed to fix the lxgetattr flake? It does not seem to: https://api.cirrus-ci.com/v1/artifact/task/5139484897968128/html/int-podman-rawhide-root-host-sqlite.log.html

no, I am still trying to find a reproducer.

@giuseppe giuseppe marked this pull request as draft July 15, 2024 15:22
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 15, 2024
@giuseppe giuseppe force-pushed the no-artifacts-composefs branch 2 times, most recently from 235e406 to 99fca62 Compare July 15, 2024 20:41
@giuseppe
Copy link
Member Author

@edsantiago
Copy link
Member

OIC. Now I see what NO_TEST_CACHE does. That... is not a solution. We can't run CI like that. If that's the only way to get CI to pass, let's just disable composefs.

@giuseppe giuseppe force-pushed the no-artifacts-composefs branch 2 times, most recently from db9a07e to ebc67de Compare July 17, 2024 00:03
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 17, 2024
Copy link

Ephemeral COPR build failed. @containers/packit-build please check.

@giuseppe
Copy link
Member Author

This makes me super uncomfortable. It seems like a kludgy workaround for bugs in composefs, and I worry that some day someone else may use it to work around other future bugs. And this only works because of the new local registry; prior to that it would pull from the net, which would be slow and flaky.

the images are pulled only once, and stored in the additional image store so it would still work pulling them from the network.

It is a workaround for composefs, since it supports only pulling from a registry at the moment. We might need to support pulling from other sources to enable "conversion" of images. That was not initially needed for partial pulls, so it is something we still need to implement.

@giuseppe
Copy link
Member Author

The commit stack is also deeply troubling: I had a lot of trouble reviewing commit-by-commit because work in one is undone in the next. If this is to be approved, may I encourage you to give some thought to cleaning up the individual commits?

I've not noticed that these pieces were left in the second commit. Dropped them

Copy link

Cockpit tests failed for commit 1ab3e1c. @martinpitt, @jelly, @mvollmer please check.

@martinpitt
Copy link
Contributor

Wrt. the chpasswd breakage in rawhide: This was fallout from an intermediate bad shadow-utils upload in rawhide. See stratis-storage/stratisd#3623 (comment) ff. for details.

TL/DR: Please retry the failed tests here and in #23275 and other recent PRs. Thanks!

@giuseppe
Copy link
Member Author

@containers/podman-maintainers PTAL

test/e2e/e2e.test Outdated Show resolved Hide resolved
Comment on lines 57 to 58
# artifacts restored from a tarball won't be stored correctly
# in composefs, so we need to disable this feature.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the question is still open here, is this temporary or permanent?

contrib/cirrus/setup_environment.sh Outdated Show resolved Hide resolved
Copy link

Cockpit tests failed for commit 464b01d. @martinpitt, @jelly, @mvollmer please check.

@edsantiago
Copy link
Member

New flake (as in, one I've never seen before, and is happening on rawhide root:

# podman [options] --pull-option=enable_partial_images=true --pull-option=convert_images=true container checkpoint --create-image alpine-checkpoint-bmjmtt --keep e3793c9b35baf7d6a62a976fb4da4e4fc139c535edd159cd6edcf96a8b4e1d02
Error: exporting root file-system diff for "e3793c9b35baf7d6a62a976fb4da4e4fc139c535edd159cd6edcf96a8b4e1d02": invalid argument

I'm blaming composefs. Should I open a new issue for it?

@giuseppe
Copy link
Member Author

giuseppe commented Jul 24, 2024

yes please, I think it is the same root issue as #23274 (comment)

EDIT: you could open it directly in c/storage, it definitely looks like composefs

@edsantiago
Copy link
Member

containers/storage#2042

@Luap99
Copy link
Member

Luap99 commented Jul 24, 2024

Should we then not merge for now given there is a (new?) flake? Or does this fix enough other flakes/problems that this should be merged regardless?

@edsantiago
Copy link
Member

Speaking as the person running the no-flake-retry tests, this is a big improvement over the current state of composefs in podman. With this PR in #17831, I see fewer flakes. Still see flakes, and I fully expect there to be more, but as-is right now composefs is unusable. Incremental progress is ok with me.

@edsantiago
Copy link
Member

That said, if the EINVAL flake is an easy one to find/fix, it'd sure be nice to whack it

@giuseppe
Copy link
Member Author

without this PR we are not creating the files in the additional image store in the composefs format

@mheon
Copy link
Member

mheon commented Jul 24, 2024

LGTM

@giuseppe
Copy link
Member Author

let's wait for containers/storage#2043 and containers/storage#2044 before merging. Hopefully they can help to troubleshoot the "invalid argument" flake

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
the condition is based on the fact that podman save|podman load
doesn't recreate the same digest, thus it would fail if the image in
the additional store was pulled with a simple "podman pull".

The same sequence of commands would fail using podman manually after a
"podman pull alpine".

Ignore the cache and use only the images that were pulled in the main
store.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
layers restored from a tarball won't be converted to composefs so
disable the cache when using composefs.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
@giuseppe
Copy link
Member Author

patches for c/storage included, ready to merge

Copy link
Member

@Luap99 Luap99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 24, 2024
Copy link
Contributor

openshift-ci bot commented Jul 24, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: giuseppe, Luap99

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit 443b04b into containers:main Jul 24, 2024
89 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. release-note-none
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants