Very frequent freezing of FreeBSD VM during teardown #61
Comments
Does this seem to be the same issue as #29? |
Hard to say, but for me it was rather deterministic: it was basically a miracle if the build finished successfully. I think the running script either doesn't know the VM got killed, or the kill itself is blocking it for some reason. I had to manually kill these builds after 5 hours. I was also wondering whether there is an rsync step that syncs files back from the VM to the host. Could that be turned off as well? I don't need that syncing; maybe that is what blocks forever? |
I forked Blend2D to make it easier to debug the CI workflow. I disabled all matrix entries except for FreeBSD and switched to a macOS runner. So far I have not been able to reproduce the issue. Here's an example of a CI run [1]. As you can see, I've run it five times. Did I do something wrong?
Yes, the action syncs back files to the host.
I guess I can add an option for that. Could you please create a separate issue for this?
In your first example, that is what seems to be happening. In your second example it gets stuck after syncing; it's the final forced shutdown of the VM that times out. As an extra precaution, the action kills the VM in case it fails to shut down using the regular shutdown command. You can easily see this in the CI log by enabling timestamps. [1] https://github.com/cross-platform-actions/blend2d/actions/runs/5899978065 |
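The "regular shutdown first, kill as a precaution" pattern described above can be sketched in Python as follows. This is only an illustration of the general technique on an ordinary child process, not the action's actual implementation; the names `stop_process` and `grace_seconds` are hypothetical.

```python
import subprocess


def stop_process(proc: subprocess.Popen, grace_seconds: float) -> str:
    """Ask a child process to exit, then force-kill it if it does not.

    Returns "graceful" if the process exited within the grace period,
    or "killed" if it had to be force-killed.
    """
    proc.terminate()  # polite request (SIGTERM on POSIX)
    try:
        proc.wait(timeout=grace_seconds)
        return "graceful"
    except subprocess.TimeoutExpired:
        proc.kill()  # forced stop (SIGKILL on POSIX)
        proc.wait()  # reap the process so it does not linger
        return "killed"
```

The important detail is that every wait has an upper bound: a teardown that waits without a timeout on either step can hang the whole job, which matches the symptom reported here.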
BTW, I see that you run all the *BSD workflows on Linux. I recommend running on macOS instead because it supports hardware accelerated nested virtualization, which the Linux runners don't. You can force using QEMU as the hypervisor on macOS using the [1] https://github.com/cross-platform-actions/action#inputs |
I have changed the runners to use Linux and QEMU as that seemed to be more stable in my case. |
BTW I have changed the runners to use macOS, but that doesn't solve the issue. It seems this doesn't really matter at all. I get very frequent build failures on FreeBSD because of this teardown issue. I know that there is a sync process that syncs files back from the VM after the run. Could this be the source of the problem? Could it be disabled by an option, to avoid syncing back if I don't need that functionality? |
I guess that's likely if it fails when syncing back the files.
Yeah, I guess so. |
@jacob-carlborg I've also noticed freezing on 0.19.1. When it does not freeze, I get an error related to syncing files back. Is there a way to disable syncing back to the host, or to ignore a teardown failure? https://github.com/chipsenkbeil/service-manager-rs/actions/runs/6553345333/job/17798792946 |
@chipsenkbeil in your case there's a clear error message. There are some files it doesn't have access to read. |
@kobalicek could you please try to enable debug output by setting the following variables: |
Yes, and it's interesting because these are files generated by a compiler when running a build command. As a user, I wasn't expecting to encounter an error like this. I suppose my only option is to delete them before finishing, because otherwise the sync fails. The freezing happens the majority of the time, which is why I flagged it here as a potentially reproducible situation. Will try to delete them before teardown and see if that helps. |
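To find the files that a sync would not be able to read, something like the following Python sketch could be run at the end of the build. It is a generic check and not part of the action; `find_unreadable` is a hypothetical helper, and it assumes the check runs as the same user that performs the sync.

```python
import os


def find_unreadable(root: str) -> list[str]:
    """Return paths under `root` that the current user cannot read."""
    unreadable = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # os.access follows symlinks, so broken symlinks are reported too.
            if not os.access(path, os.R_OK):
                unreadable.append(path)
    return sorted(unreadable)
```

The returned paths can then be deleted or have their permissions fixed before teardown. Note that when run as root the list is normally empty, since root can read everything.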
Today's failure looks like this:
I think in the end this must be related to syncing the files. |
Which is related to #64 |
@kobalicek if you get the error: |
@chipsenkbeil, @kobalicek I've created a new release which adds support for disabling file syncing: https://github.com/cross-platform-actions/action/releases/tag/v0.20.0. |
@jacob-carlborg fantastic! Thanks for rolling this out so quickly 😄 |
I would close this one - I don't have this problem at the moment, I would open a new issue if I face a similar issue in the future. |
I am also seeing this issue, I think. It happens most often with FreeBSD 12.4 for me. |
@kobalicek have you disabled file syncing? Ideally I would like to solve the issue without having to rely on disabling file syncing. |
@kobalicek @chipsenkbeil @manxorist I wonder if this could be related to how much memory the VM is using. I got another report that there might not be enough memory for the host #68. Could you please try reducing the memory to see if there's a difference? |
…platform-actions/action#61 (comment)>. git-svn-id: https://source.openmpt.org/svn/openmpt/trunk/OpenMPT@19903 56274372-70c3-4bfc-bfc3-4c3a0b034d27
@jacob-carlborg
I am still seeing hangs: https://github.com/OpenMPT/openmpt/actions/runs/6731368321/job/18295891777 |
@manxorist that's disappointing. In this case it's hanging when shutting down the VM. |
Merged revision(s) 19903-19905 from trunk/OpenMPT: [Mod] build: CI: GitHub: Try reducing CPA VM size to 4GB. See <cross-platform-actions/action#61 (comment)>. ........ [Mod] build: CI: GitHub: Try disabling syncing back files, which we do not need, for CPA builds. See <cross-platform-actions/action#65>. ........ [Mod] build: CI: GitHub: Update CPA to v0.21.1. ........ ........ git-svn-id: https://source.openmpt.org/svn/openmpt/branches/OpenMPT-1.30@19907 56274372-70c3-4bfc-bfc3-4c3a0b034d27
[Mod] build: CI: GitHub: Try reducing CPA VM size to 4GB. See <cross-platform-actions/action#61 (comment)>. ........ [Mod] build: CI: GitHub: Try disabling syncing back files, which we do not need, for CPA builds. See <cross-platform-actions/action#65>. ........ [Mod] build: CI: GitHub: Update CPA to v0.21.1. ........ git-svn-id: https://source.openmpt.org/svn/openmpt/branches/OpenMPT-1.31@19906 56274372-70c3-4bfc-bfc3-4c3a0b034d27
I switched FreeBSD to QEMU on macOS and the first 4 runs went without any problem so far. I will continue monitoring and report back if it indeed fixes the FreeBSD issue for me. I also tried switching OpenBSD to QEMU on macOS, and I am seeing VM startup issues there. See #73. |
This will only mitigate the issue; it doesn't fix the root cause. The action doesn't shut down the VM anymore. Since the action itself runs inside a VM, everything will be cleaned up automatically. Hopefully this will make the issue less likely to occur.
@kobalicek @manxorist @chipsenkbeil I've created a branch that skips shutting down the VM and just lets the action exit: https://github.com/cross-platform-actions/action/tree/no-vm-shutdown. It would be great if anyone could give it a try to see if it helps. Unfortunately I haven't been able to find the root cause but this might mitigate some of the problem. |
…tdown branch. See <cross-platform-actions/action#61 (comment)>. git-svn-id: https://source.openmpt.org/svn/openmpt/trunk/OpenMPT@19926 56274372-70c3-4bfc-bfc3-4c3a0b034d27
[Mod] build: CI: GitHub: Switch FreeBSD to experimental CPA no-vm-shutdown branch. See <cross-platform-actions/action#61 (comment)>. ........ git-svn-id: https://source.openmpt.org/svn/openmpt/branches/OpenMPT-1.31@19928 56274372-70c3-4bfc-bfc3-4c3a0b034d27
[Mod] build: CI: GitHub: Switch FreeBSD to experimental CPA no-vm-shutdown branch. See <cross-platform-actions/action#61 (comment)>. ........ git-svn-id: https://source.openmpt.org/svn/openmpt/branches/OpenMPT-1.30@19929 56274372-70c3-4bfc-bfc3-4c3a0b034d27
Well, ignore the last comment. I got confused about the various configurations and tested macOS/QEMU instead of macOS/xhyve. I will re-test. |
…tdown branch and xhyve. See <cross-platform-actions/action#61 (comment)>. git-svn-id: https://source.openmpt.org/svn/openmpt/trunk/OpenMPT@19930 56274372-70c3-4bfc-bfc3-4c3a0b034d27
I'll give it a try. Even with skipping the copying back of files, it was still hanging at times. What do I need to set after switching to this branch? Any specific flag? |
2 times 13.2 and 2 times 12.4 for now, all successful. |
@chipsenkbeil no flags, it's automatic. If you look at the output you can verify if it shuts down the VM or not. Here's an example of where it doesn't shut down the VM [1]. And in the next example [2], it shuts down the VM, you can see the output: [1] https://github.com/cross-platform-actions/action/actions/runs/6928693321/job/18844968427#step:3:2046 |
@jacob-carlborg switched over to the branch. Only one run thus far and it worked fine. Will jump in if it hangs again, but the repo using it has low volume of updates, so it may be a little while. |
As we already established in #67, the VMs are non-persistent and throw-away in the majority of (or all) use cases anyway, so is there a reason for properly shutting them down in the first place? I think for testability and correctness' sake, there should always be a mode available with proper file syncing barriers and proper shutdown in place, but in the default case, nobody cares what happens with the VM after the build files have (optionally) been synced back. |
I was going to say "no, there's no reason", and I was planning to merge this branch regardless of whether it helps with this issue, because it would be a good change anyway: fewer things for the action to do means the job finishes sooner. But now I started thinking: what if a job performs some additional major steps after the VM step? Then the VM will unnecessarily occupy resources like CPU and memory. |
I guess that's a fair point that I did not consider. Still, for users who just care about running something like a test suite (my use case), it really does not matter what happens with the VM, and the whole job does nothing else after running things inside the VM. So a general option would probably be a good idea to have. |
Yes, I agree. Perhaps default to not shutting down the VM? I think the only step I have after the VM step is one that uploads binaries to a GitHub release. |
Well, I think resource consumption for following steps is a valid concern, so the default should be to properly shut down the VM, and skipping the proper shutdown should only be optional. |
Hmm, I'm also thinking ahead to this feature request: #26. I'm trying to figure out what the API should look like. What you're suggesting would be the safest alternative, with no risk of breaking anything. But it would be more verbose if one used the action in multiple steps. I don't know how common that would be, or what to optimize for in the API. |
Added
- Added support for using the action in multiple steps in the same job ([cross-platform-actions#26](cross-platform-actions#26)). All the inputs need to be the same for all steps, except for the following inputs: `sync_files`, `shutdown_vm` and `run`.
- Added support for specifying that the VM should not shutdown after the action has run. This adds a new input parameter: `shutdown_vm`. When set to `false`, this will hopefully mitigate very frequent freezing of VM during teardown ([cross-platform-actions#61](cross-platform-actions#61), [cross-platform-actions#72](cross-platform-actions#72)).
Changed
- Always terminate VM instead of shutting down. This is more efficient and this will hopefully mitigate very frequent freezing of VM during teardown ([cross-platform-actions#61](cross-platform-actions#61), [cross-platform-actions#72](cross-platform-actions#72)).
- Use `unsafe` as the cache mode for QEMU disks. This should improve performance ([cross-platform-actions#67](cross-platform-actions#67)).
I have been experiencing very frequent freezing during teardown lately with FreeBSD VMs.
I was using xhyve with FreeBSD 13.2.
For example these two consecutive runs failed every time for the same reason:
I'm not sure what to do, because my builds are basically failing due to these issues. Temporarily I switched to QEMU virtualization and that seems to be more stable in my case.