Fix race condition in testCreateDownloadAnOS() on tumbleweed, disable broken SELinux on rawhide #1724
I actually just ran into this on a machine in our lab. It consistently crashed 3 times before the VM started.
So here we have debug output, with a broken test log. Locally, I have a working log. I'll do a deep dive after lunch.
In the working case, we see the call to
which mean (4) "Resumed" and (2) "Started". So these supersede the previous "Stopped" event. This explains the screenshot where the VM is indeed shown as "Running". So that's a typical race condition which is entirely timing dependent.

Testing a fix in commit 6399fdb which waits for the initial state, but it failed. Testing with a sleep in db28cac, just to make sure that this is really it (not a real solution of course). But this is still not enough. I also tried a "proper" wait:

--- test/check-machines-create
+++ test/check-machines-create
@@ -1342,7 +1342,9 @@ vnc_password= "{vnc_passwd}"
 user_login = virt_install_cmd_out.split("user-login=", 1)[1].split(",")[0].rstrip()
 self.assertIn(user_login, self.user_login)
-time.sleep(5)
+# wait for virt-install to finish
+if self.create_and_run:
+    self.machine.execute(f"while {virt_install_cmd}; do sleep 1; done", timeout=300)
 def fill(self):
     b = self.browser

With the default 60s timeout that fails (times out). With 300s timeout it eventually goes away. This is super annoying, but let's see how far it gets: commit 3cf37a4
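For readers comparing the fixed `time.sleep(5)` with the "proper" wait above, here is a minimal Python sketch of the general idea: poll a condition until it holds or a timeout expires. The names `wait_until` and `FakeProcess` are hypothetical and not part of the test suite; the real change runs a shell loop on the test machine instead.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout: float = 60.0,
               interval: float = 1.0) -> float:
    """Poll `condition` until it returns True and return the elapsed seconds.

    Raises TimeoutError if `timeout` seconds pass without success.
    """
    start = time.monotonic()
    while True:
        if condition():
            return time.monotonic() - start
        if time.monotonic() - start >= timeout:
            raise TimeoutError(f"condition not met within {timeout}s")
        time.sleep(interval)


class FakeProcess:
    """Hypothetical stand-in for a process that exits after a few polls."""

    def __init__(self, polls_until_exit: int) -> None:
        self.polls_until_exit = polls_until_exit
        self.polls = 0

    def has_exited(self) -> bool:
        self.polls += 1
        return self.polls >= self.polls_until_exit


if __name__ == "__main__":
    proc = FakeProcess(polls_until_exit=3)
    elapsed = wait_until(proc.has_exited, timeout=5, interval=0.01)
    print(f"exited after {proc.polls} polls, {elapsed:.3f}s")
```

Unlike a fixed sleep, this returns as soon as the condition holds and fails loudly when it never does, which matches the 60s versus 300s timeout behaviour described above.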
OK, succeeded. There's something wrong with virt-install on tumbleweed, but I'm running out of steam/time and domain knowledge here. Let's put that in.
…ed() Similar to what we do in other scenarios, make sure that the VM is in the expected state after creating the VM. This does not fix any flake, it's just a good additional check.
Thanks @martinpitt for taking a look and figuring it out! I should have time tomorrow to take a better look at this. Hopefully I get lucky and find what's going on with virt-install 🙂
Awesome -- this now fails everywhere except tumbleweed. So I'll restrict the workaround to tumbleweed for now. This failure is unrelated and also very high on the weather report. But one thing at a time.
Wait for `virt-install` to finish, so that it doesn't race with the UI shutting down and deleting the VM. This takes *very* long on current openSUSE Tumbleweed unfortunately, but that's better than almost always failing. `virt-install` doesn't end at all anywhere else, so for now we have to restrict this hack to tumbleweed.
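A rough sketch of how such an image-restricted workaround could be expressed, assuming the test can see the image name as a string. The helper `build_wait_loop`, the `opensuse-tumbleweed` prefix, and the `check_cmd` argument are assumptions for illustration; the actual commit may gate the wait differently.

```python
from typing import Optional

# Assumption: the affected test image name starts with this prefix.
AFFECTED_IMAGE_PREFIX = "opensuse-tumbleweed"


def build_wait_loop(image: str, check_cmd: str) -> Optional[str]:
    """Return a shell loop that waits while `check_cmd` succeeds.

    Returns None for images where the process never ends, so the caller
    skips the wait there instead of running into the timeout.
    """
    if not image.startswith(AFFECTED_IMAGE_PREFIX):
        return None
    return f"while {check_cmd}; do sleep 1; done"


if __name__ == "__main__":
    # "check-still-running" is a placeholder for whatever command reports
    # that virt-install is still alive.
    print(build_wait_loop("opensuse-tumbleweed", "check-still-running"))
    print(build_wait_loop("fedora-rawhide", "check-still-running"))
```

Returning `None` rather than an empty command keeps the "skip the wait" decision explicit at the call site.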
Meh, this rawhide failure is entirely unrelated and new, also happened in fedora-selinux/selinux-policy#2235:
This feels like the same issue as in https://issues.redhat.com/browse/RHEL-46893 - let's discuss in the selinux-policy PR.
Similar to commit 986b141, that broken version was uploaded to Rawhide. See https://bugzilla.redhat.com/show_bug.cgi?id=2297965
Filed https://bugzilla.redhat.com/show_bug.cgi?id=2297965. I added another commit to disable SELinux in rawhide, similar to our already existing RHEL 10 hack. sigh
@martinpitt, do you have an idea why?
Can't argue with the results. :-)
Unfortunately not. See my debugging notes above for everything else I've tried. The exact inner workings of virt-install are still mysterious to me -- sometimes it keeps running forever, sometimes it finishes, I can't predict it 😢 So this is indeed just a "throw hands into the air" hack, and I'm sure it'll bite back some day.
I tried to look into it but wasn't able to figure out anything either.
This test almost always fails on tumbleweed, but never ever locally. I even tried to reproduce the exact series of nondestructive tests:
The weather report also shows that this fails exclusively on tumbleweed, in more than 77% of runs.
Turning debugging to 11.
I'll give this a little bit of investigation, but only so much -- if I can't figure it out quickly, I'll disable the test for now.
@Nykseli @Lunarequest @SludgeGirl FYI