Revert "cmdlib: workaround rofiles-fuse mounts leaking" #3860

Merged 2 commits into coreos:main on Sep 5, 2024

Conversation

HuijingHei
Member

This reverts commit ba45b29.

@HuijingHei
Member Author

HuijingHei commented Aug 19, 2024

Do not merge this; it is only here to test #3848.

@marmijo
Member

marmijo commented Aug 19, 2024

CI was stuck in the build stage for hours so I restarted the job.

@jlebon
Member

jlebon commented Aug 20, 2024

CI is stuck because it's hitting the very issue that ba45b29 is working around.

If you log into the cluster, oc exec into that pod, and look at the contents of tmp/build/runvm-console.txt, you'll see the kernel panic and error messages mentioned in the commit message.
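For anyone following along, a minimal sketch of that check. The `has_kernel_panic` helper and the `<pod>` placeholder are made up for illustration, and the exact panic wording in the log is an assumption, so the match is case-insensitive:

```shell
# Hypothetical helper: succeed if a saved runvm console log mentions a kernel panic.
has_kernel_panic() {
    grep -qi 'kernel panic' "$1"
}

# Example usage (needs cluster access; <pod> is a placeholder for the stuck build pod):
#   oc exec <pod> -- cat tmp/build/runvm-console.txt > runvm-console.txt
#   has_kernel_panic runvm-console.txt && echo "panic found"
```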

@jlebon
Member

jlebon commented Aug 20, 2024

BTW, I didn't try very hard to reproduce this locally, but it might be reproducible (using FORCE_UNPRIVILEGED=1).

@HuijingHei
Member Author

HuijingHei commented Aug 21, 2024

Thanks @jlebon for the pointer!

I ran FORCE_UNPRIVILEGED=1 cosa build locally, but got the error Could not resolve host: kojipkgs.fedoraproject.org. Any suggestions?

Ignore the error. For testing, I need to temporarily remove RUNVM_NONET=1 from https://github.com/coreos/coreos-assembler/blob/main/src/cmd-build#L372, since it disables networking by default.

@jlebon
Member

jlebon commented Aug 21, 2024

Hmm, you shouldn't have to use RUNVM_NONET. When using FORCE_UNPRIVILEGED, make sure to use it for both cosa fetch and cosa build.
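One way to avoid setting the variable for only one of the two steps is a small wrapper. This is just a sketch; `cosa_unpriv` is a made-up name, not something cosa provides:

```shell
# Hypothetical wrapper: run any cosa subcommand with FORCE_UNPRIVILEGED=1 set.
cosa_unpriv() {
    FORCE_UNPRIVILEGED=1 cosa "$@"
}

# Example usage, so that fetch and build both take the unprivileged path:
#   cosa_unpriv fetch && cosa_unpriv build
```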

@HuijingHei
Member Author

Thanks @jlebon, your guidance is much appreciated.

I reverted #3848 and #3862 and did some testing.
I can reproduce the issue locally with rpm-ostree-2024.7-1.fc40.x86_64, but not with the latest continuous build, https://download.copr.fedorainfracloud.org/results/@CoreOS/continuous/fedora-40-x86_64/07922907-rpm-ostree/rpm-ostree-2024.7.32.g8dd6ec3b-1.fc40.x86_64.rpm. I tried three times and every build succeeded with these commands:

$ cosa init https://github.com/coreos/fedora-coreos-config.git
$ rpm -q rpm-ostree
rpm-ostree-2024.7.32.g8dd6ec3b-1.fc40.x86_64

$ FORCE_UNPRIVILEGED=1 cosa fetch; FORCE_UNPRIVILEGED=1 cosa build

I will keep an eye on this and do more testing.

@jlebon
Member

jlebon commented Aug 27, 2024

Ahhh interesting. That it'd be a regression in rpm-ostree seems surprising. Can you sanity-check that if you build tag v2024.7 locally, it fails? If so, then you can git bisect it (good: git main, bad: tag v2024.7).
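A rough sketch of how that bisect could be driven. Because the older ref (the tag) is the failing one and the newer ref (main) is the working one, plain good/bad would be the wrong way round for git, so this uses git bisect's custom terms instead. The `bisect_for_fix` name and the test command are placeholders, not part of rpm-ostree or cosa:

```shell
# Hypothetical helper: find the first commit where a known failure is fixed.
# Usage: bisect_for_fix <repo> <broken-ref> <fixed-ref> <test command...>
# The test command must exit 0 while the bug is still present and 1 once it
# is fixed, because `git bisect run` maps exit status 0 to the "old" term.
bisect_for_fix() {
    local repo=$1 broken=$2 fixed=$3
    shift 3
    git -C "$repo" bisect start --term-old=broken --term-new=fixed || return 1
    git -C "$repo" bisect broken "$broken" &&
        git -C "$repo" bisect fixed "$fixed" &&
        git -C "$repo" bisect run "$@"
    local rc=$?
    # Return the checkout to where it was before the bisect started.
    git -C "$repo" bisect reset
    return "$rc"
}

# Example usage (./still-fails.sh is a placeholder that builds and tests one commit):
#   bisect_for_fix ~/src/rpm-ostree v2024.7 main ./still-fails.sh
```

On success, git prints a line ending in "is the first fixed commit" for the commit that made the failure go away.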

@HuijingHei
Member Author

HuijingHei commented Sep 3, 2024

Sorry for the confusing result. I built rpm-ostree locally (from git main instead of the RPM), rsynced it into coreos-assembler, and did more testing.

So the root cause is that we use fusermount, but rpm-ostree-2024.7-1.fc40 drops the fuse dependency and includes fuse3 instead (see PR). I verified that coreos/rpm-ostree#5074 fixes this issue.

Updating to rpm-ostree-2024.7.38.gcd6fd88b-1.fc40.x86_64.rpm installs fuse as a dependency (not sure whether it is built from rpm-ostree.spec), which is why it does not hit the issue.
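A quick way to see which helper a given environment actually has is sketched below. It assumes the usual Fedora packaging, where fusermount comes from fuse and fusermount3 from fuse3; `report_fuse_helpers` is a made-up name:

```shell
# Hypothetical helper: report which FUSE unmount helpers are on PATH.
report_fuse_helpers() {
    local helper path
    for helper in fusermount fusermount3; do
        if path=$(command -v "$helper"); then
            echo "$helper: $path"
        else
            echo "$helper: missing"
        fi
    done
}

report_fuse_helpers
```

If fusermount shows up as missing while fusermount3 is present, that matches the situation described above.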

@jlebon
Member

jlebon commented Sep 4, 2024

Ahhh yes, of course. supermin respects RPM deps and so dutifully pulled in fusermount3. So yes, https://src.fedoraproject.org/rpms/rpm-ostree/c/3c602a23787fd2df873c0b18df3133c9fec4b66a is what broke us, and the timeline matches up, I think.

I just tagged https://koji.fedoraproject.org/koji/buildinfo?buildID=2540995 into f40-coreos-continuous. Rebased this PR and also added a revert of 5e7638f.

@HuijingHei
Member Author

Cool, thanks very much @jlebon for the pointer and guidance!

@HuijingHei
Member Author

/lgtm
/approve

@HuijingHei HuijingHei merged commit 74d2d13 into coreos:main Sep 5, 2024
5 checks passed
@HuijingHei HuijingHei deleted the debug-rpmostree-failed branch September 5, 2024 02:01