Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Add a verity_cleanup function to test helpers.bash #565

Merged
merged 1 commit into from
Dec 1, 2023

Conversation

raharper
Copy link
Contributor

@raharper raharper commented Nov 28, 2023

Add a cleanup function to unmount underlying atoms if one of the mount fails.

What type of PR is this?

bug

Which issue does this PR fix:

#541

What does this PR do / Why do we need it:

Add a cleanup function which tracks the successfully mounted atoms, and if one
fails, then it will unmount the successful atoms before returning the error.

If an issue # is not available please add repro steps and logs showing the issue:

Testing done on this change:

make check locally.

Automation added to e2e:

Will this break upgrades or downgrades?

Does this PR introduce any user-facing change?:

no


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@smoser
Copy link
Contributor

smoser commented Nov 28, 2023

Do you have evidence or experience that this actually fixes the issue?

Also, you should 'Fixes #541' in your commit message.

@raharper
Copy link
Contributor Author

Do you have evidence or experience that this actually fixes the issue?

Not yet, I didn't have a setup to run the CI locally (going to set that up now);

Also, you should 'Fixes #541' in your commit message.

Sure, will add.

@raharper raharper force-pushed the fix/debug-atomfs-unmount branch 3 times, most recently from 6bc6245 to d894e7f Compare November 29, 2023 20:22
@raharper raharper changed the title Add retry with backoff on dm device removal fix: Add a verity_cleanup function to test helpers.bash Nov 29, 2023
Copy link

codecov bot commented Nov 29, 2023

Codecov Report

Attention: 3 lines in your changes are missing coverage. Please review.

Comparison is base (3897848) 53.01% compared to head (489cfec) 57.14%.
Report is 1 commits behind head on main.

Files Patch % Lines
pkg/atomfs/molecule.go 75.00% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #565      +/-   ##
==========================================
+ Coverage   53.01%   57.14%   +4.12%     
==========================================
  Files          64       64              
  Lines        7505     7516      +11     
==========================================
+ Hits         3979     4295     +316     
+ Misses       2815     2479     -336     
- Partials      711      742      +31     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@raharper
Copy link
Contributor Author

ok 6 --no-squashfs-verity works in 2sec
ok 7 mount + umount works in 2sec
ok 8 mount + umount + mount a tree of images works in 3sec
ok 9 bad existing verity device is rejected in 12sec

\o/

Copy link
Contributor

@smoser smoser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you for digging on this. i just ran 'losetup -a' on my laptop and found:

$ losetup -a | grep stackertest |wc -l
33

But only 9 of those 33 had 'verity' .
there are several others.

$ losetup -a | sed -e '/stackertest/!d' -e 's,.*/stackertest-,,' -e 's,[.].*,,' | sort -u
test_bad_existing_verity_device_is_rejected
test_mount_-2b_umount_-2b_mount_a_tree_of_images_works

Do you know if this will also fix the 'mount a tree of images' ?

test/helpers.bash Outdated Show resolved Hide resolved
test/helpers.bash Outdated Show resolved Hide resolved
test/helpers.bash Outdated Show resolved Hide resolved
test/helpers.bash Outdated Show resolved Hide resolved
test/helpers.bash Outdated Show resolved Hide resolved
@raharper
Copy link
Contributor Author

thank you for digging on this. i just ran 'losetup -a' on my laptop and found:

$ losetup -a | grep stackertest |wc -l
33

But only 9 of those 33 had 'verity' . there are several others.

$ losetup -a | sed -e '/stackertest/!d' -e 's,.*/stackertest-,,' -e 's,[.].*,,' | sort -u
test_bad_existing_verity_device_is_rejected
test_mount_-2b_umount_-2b_mount_a_tree_of_images_works

Do you know if this will also fix the 'mount a tree of images' ?

Here's what I know...

In the bad image test, we use the busybox image, create a new top layer
then we fake the verity data and try to mount that up. The mount brings up
all layers of the image (top layer with a touch, and the real busybox layer)

Normally, when we call stacker atomfs unmount, it will walk the oci layer
list and tear down each verity device and backing loopdev. However, since the
mount failed, we have nothing to invoke stacker atomfs umount at.

Thinking about it now, it looks like we need to call umount internally if
the mount fails.

I'll look at doing that now.

To why we see the failure, I've not yet found out precisely the order, but I
believe bats runs tests within a .bats file in parallel (and we enable that).
In the atomfs.bats test, each uses the same base busybox image, and some
make modifications creating top layers above which are unique. Neccesarily,
when we mount any of these busybox images, then there is a common base busybox
layer that gets setup as a verity device. What I think is happening is that
when one of the tests goes to tear down the verity or loop device that the
base layer is using, it might get a "device in use" error.

I can't seem to reproduce that easily; I have previously, but not today it
seems

test/atomfs.bats Outdated Show resolved Hide resolved
@raharper raharper force-pushed the fix/debug-atomfs-unmount branch 2 times, most recently from c710b10 to f3643d3 Compare November 30, 2023 19:31
@raharper raharper requested a review from smoser November 30, 2023 22:14
Copy link
Contributor

@smoser smoser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does look great, but i have one more comment inline. I'm hope that I'm not wrong again and go 0-for-2 today on reading this PR.

pkg/atomfs/molecule.go Outdated Show resolved Hide resolved
The atomfs bad verity test case ends up leaving verity and loop
devices which then break other test cases.  The root cause is that
when the test case attempts to mount the invalid atomfs the underlying
atom was already mounted.  Once the verity fails, stacker exits but
does not remove any previously mounted at atoms in the molecule.  Fix
this by creating a cleanup function which runs if we fail to mount an
atom.

- Add a check to the atomfs.bats which checks for leftover loop mounts

Fixes: project-stacker#541

Signed-off-by: Ryan Harper <rharper@woxford.com>
Copy link
Contributor

@smoser smoser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks good. thanks.

@raharper raharper merged commit 123ba76 into project-stacker:main Dec 1, 2023
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants