VM loop device not cleaned up in CI #252

Open
zeha opened this issue Nov 19, 2023 · 12 comments
@zeha
Member

zeha commented Nov 19, 2023

 * Removing loopback mount of file /code/qemu-1.img.
previous state:
loop3p1	(253:0)
/dev/loop1: [2065]:72205 (/var/lib/snapd/snaps/core20_2015.snap)
/dev/loop2: [2065]:72206 (/var/lib/snapd/snaps/snapd_20290.snap)
/dev/loop0: [2065]:72204 (/var/lib/snapd/snaps/lxd_24322.snap)
/dev/loop3: [2065]:282883 (/code/qemu-1.img)
after kpartx-d
loop3p1	(253:0)
/dev/loop1: [2065]:72205 (/var/lib/snapd/snaps/core20_2015.snap)
/dev/loop2: [2065]:72206 (/var/lib/snapd/snaps/snapd_20290.snap)
/dev/loop0: [2065]:72204 (/var/lib/snapd/snaps/lxd_24322.snap)
/dev/loop3: [2065]:282883 (/code/qemu-1.img)
loop_part is: loop3p1
loop3p1	(253:0)
/dev/loop1: [2065]:72205 (/var/lib/snapd/snaps/core20_2015.snap)
/dev/loop2: [2065]:72206 (/var/lib/snapd/snaps/snapd_20290.snap)
/dev/loop0: [2065]:72204 (/var/lib/snapd/snaps/lxd_24322.snap)
/dev/loop3: [2065]:282883 (/code/qemu-1.img)
 * Finished execution of grml-debootstrap. Enjoy your Debian system.

At least in GitHub Actions the cleanup of the loop device doesn't seem to work properly.
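
The listing above looks like `losetup -a` output, and the symptom is that `/code/qemu-1.img` is still listed after cleanup. A minimal sketch of a check a CI step could use to catch this early; the helper name `image_still_attached` is made up for illustration and is not part of grml-debootstrap:

```shell
#!/bin/sh
# Hypothetical helper (not part of grml-debootstrap): reads `losetup -a`
# style output on stdin and succeeds if the image path is still listed
# as a backing file, e.g. "/dev/loop3: [2065]:282883 (/code/qemu-1.img)".
image_still_attached() {
    # -F: match the path literally; -q: no output, exit status only.
    grep -F -q -- "($1)"
}
```

It could be used as `losetup -a | image_still_attached /code/qemu-1.img && echo "still attached" >&2` right after the cleanup step.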

@zeha changed the title from "VM loopp" to "VM loop device not cleaned up in CI" on Nov 19, 2023
@adrelanos
Contributor

Also modprobe loop is failing as I mentioned in #248 (comment) - same issue or separate issue?

@zeha
Member Author

zeha commented Nov 20, 2023

Separate issue, I'd think. The loop device generally works there.

@adrelanos
Contributor

Got any (CI) log where this can be seen?

Maybe a github actions upstream bug?

Do you think you could come up with minimal code for reproduction? Then this could be reported to github actions.

@zeha
Member Author

zeha commented Dec 8, 2023

@adrelanos
Contributor

I don't fully understand that code. However, to report this bug to github actions we'd need a tiny script as minimal and simple as possible. Surely not using docker if avoidable and certainly not mentioning grml-debootstrap.

qemu-img, parted, kpartx, losetup, mount... Which are the minimal steps required to reproduce this on github CI?

Maybe there's already an open bug report:
https://github.com/actions/runner/issues

@adrelanos
Contributor

Maybe not a github actions bug.

Here people had a similar issue:

Someone indicated using losetup with -P --partscan might help.

-P, --partscan

Force the kernel to scan the partition table on a newly created loop device. Note that the partition table parsing depends on sector sizes. The default is sector size is 512 bytes, otherwise you need to use the option --sector-size together with --partscan.

A more important takeaway might be that one cannot (easily) mount the "same" image twice. Does your code attempt to mount both images at the same time?

It's not the same file but the images created by your scripts might look confusingly similar to the Linux coreutils.

Here is how others fixed a similar issue by using mount with sizelimit but I think this might not be applicable here.
ryankurte/docker-rpi-emu@a66a966

Would it be an option for you to modify your PR to mount only 1 image at a time to work around this bug?

From above forum topic a user suggested:

You don't need to create a loop device, using the "loop" parameter in the mount command suffice.
mount -o loop,offset=$((98304*512)),sizelimit=1753219072 /srv/raspi/current/2019-04-08-raspbian-stretch-lite.img /mnt

Not sure if grml-debootstrap could do something similar, i.e. avoid kpartx / losetup. Using offset might be more complicated and error-prone.
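
For reference, the `offset=` value in the quoted command is just the partition's start sector multiplied by the sector size. A minimal sketch of that arithmetic; the helper name `partition_offset_bytes` is hypothetical and the 512-byte default sector size is an assumption:

```shell
#!/bin/sh
# Hypothetical helper: byte offset of a partition from its start sector.
# The sector size defaults to 512 bytes, which is an assumption.
partition_offset_bytes() {
    start_sector=$1
    sector_size=${2:-512}
    echo $(( start_sector * sector_size ))
}

partition_offset_bytes 98304    # start sector from the quoted command; prints 50331648
```

The result could then be passed along as `mount -o loop,offset="$(partition_offset_bytes 98304)" image.img /mnt`, where `image.img` is a placeholder.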

@zeha
Member Author

zeha commented Dec 8, 2023

No, the problem here is like this:

  1. grml-debootstrap puts the img file onto a loop device, so it can modify the partitions in the image. And it really wants the loop device with partitions, so it can modify the EFI partition and the root filesystem, and delegate placement of everything to fdisk etc.
  2. When grml-debootstrap is done, the image should not be attached to a loop device. This fails for unknown reasons.
  3. Later the CI scripts try to mount the image again, and this "obviously" fails because step 2 failed.
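
The attach/detach cycle in these steps could be sketched roughly as follows, as a starting point for a minimal reproduction. The image path, size, and partition layout are made up, and the exact commands grml-debootstrap runs may differ. By default the script only prints the commands (dry run), because the real ones need root:

```shell
#!/bin/sh
# Sketch of the attach/detach cycle described above. Image name, size and
# partition layout are made up. With DRY_RUN=1 (the default here) commands
# are only printed; set DRY_RUN=0 and run as root to execute them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

loop_cycle() {
    img=$1
    run truncate -s 64M "$img"
    run parted -s "$img" mklabel msdos mkpart primary ext4 4MiB 100%
    run kpartx -a -s -v "$img"   # step 1: attach image, create partition mappings
    run kpartx -d -v "$img"      # step 2: remove the mappings again
    run losetup -a               # the image should no longer be listed here
}

loop_cycle "${1:-/tmp/repro.img}"
```

If the image still shows up in the final `losetup -a` output on a CI runner but not locally, that would point at the environment rather than at grml-debootstrap.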

If grml-debootstrap weren't a shell script I'd try replacing losetup/(k)partx/... with syscalls, but alas...

@adrelanos
Contributor

syscalls might help with debugging and finding out what the issue is but generally I think it's better to stick with the Linux coreutils.

There was a mysterious kpartx bug in the past that might still not be fully / cleanly fixed.
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=734794

If there's anything similar, it would be good to get that reported upstream.

Are you sure about the offset? I don't know where the number 4194304 is coming from.

Maybe replace the mount using offset with the usual way of doing this?

Could you add additional debug output please?

  • Always use kpartx with -v.
  • Always use losetup with -v.
  • Always use dmsetup with -v.
  • Run mount before and after.
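
A minimal sketch of such a debug dump, assuming a POSIX shell; the helper names `run_if_present` and `dump_state` are hypothetical. Commands that are not installed are skipped with a note instead of aborting the script:

```shell
#!/bin/sh
# Hypothetical debug helper: run a diagnostic command if it is installed,
# otherwise note that it was skipped instead of failing the script.
run_if_present() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "skipped: $1 not found" >&2
    fi
}

# Print loop devices, device-mapper entries and mounts under a label.
dump_state() {
    echo "== state: $1 =="
    run_if_present losetup -a
    run_if_present dmsetup ls
    run_if_present mount
}
```

Calling `dump_state "before kpartx -d"` and `dump_state "after kpartx -d"` around the `kpartx -d -v` call would give the before/after comparison.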

@zeha
Member Author

zeha commented Dec 8, 2023

> There was a mysterious kpartx bug in the past that might still not be fully / cleanly fixed.

Yeah, I was generally thinking we could switch from kpartx to partx, as that's in util-linux. But I haven't investigated this option.

> Are you sure about the offset? I don't know where the number 4194304 is coming from.

The offset is correct for the specific configuration tested; but this is exactly why I don't want to deal with offsets. (k)partx does this calculation, and I don't want to write code for parsing partition tables...
(Comment above the number explains where it comes from.)
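
One way to sanity-check such a number is to convert it back to a start sector. This is only a sketch: the helper name `offset_to_sector` is hypothetical, and the 512-byte sector size is an assumption, since the thread states neither the sector size nor the partition's start sector:

```shell
#!/bin/sh
# Hypothetical helper: convert a byte offset to a start sector, failing if
# the offset is not sector-aligned. 512-byte sectors are an assumption.
offset_to_sector() {
    offset=$1
    sector_size=${2:-512}
    if [ $(( offset % sector_size )) -ne 0 ]; then
        echo "offset $offset is not a multiple of $sector_size" >&2
        return 1
    fi
    echo $(( offset / sector_size ))
}

offset_to_sector 4194304   # prints 8192, i.e. the offset is 4 MiB
```

Under that assumption, 4194304 bytes corresponds to a partition starting at sector 8192, which can be compared against what a partitioning tool reports for the image.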

@zeha
Member Author

zeha commented Dec 11, 2023

https://github.com/grml/grml-debootstrap/actions/runs/7172550946/job/19529980137?pr=250#step:4:3166

This is from a run with more -v. You can see how kpartx -d apparently did nothing.

@adrelanos
Contributor

./tests/docker-test-b2b.sh: line 19: dmsetup: command not found

@zeha
Member Author

zeha commented Dec 15, 2023

> ./tests/docker-test-b2b.sh: line 19: dmsetup: command not found

Sure, but this is a long time after the problem occurred.
