Transient network issues in guest VMs during install #514
Comments
I do believe this is still a problem, at least on latest master. Steps to reproduce:
observe
Looking at the code, it's clear why this is failing: the apt update commands should be run only against templates (see securedrop-workstation/dom0/sd-logging-setup.sls, lines 4 to 8 in 5482f2e), but instead they're outside the template block. Can someone else reproduce to confirm? Will submit PR with patch shortly.
Can reproduce reliably (3/3 attempts) with I'd expect the resolution failure during apt calls to appear in fresh prod installs, although I've not tested that locally yet.
I did not see this issue on a clean
Second run completed without error; however, note that I didn't do any downgrades prior to the re-run. Happy to re-test with targeted downgrades if that could make a difference.
Tried again, this time purging the template RPM from dom0 (which If no one else can reproduce, priority should be low, although I'd still advocate for review of #535 as a cleanup task.
It occurs to me that transient network errors on the apt-test server would be a sufficient explanation for the variable behavior we're seeing. In fact, @zenmonkeykstop reported trouble pulling from apt-test around the same time window I was observing the failures described above. Only the apt-test repo showed problems for me, none of the other upstream repos.
Also could not reproduce the error while following steps described in #514 (comment)
Ok, so I uninstalled my prod installation with I have a different failure:
Perhaps I did something wrong though.
I'm currently doing a Here's the log: There's no issue with the network, as far as I can tell, and manual updates appear to work fine.
According to the management VM logs, the error most recently happened with
The changes in #535 look like they'd resolve that case, although I don't have a good explanation for why the DNS resolution problem is sporadic: if the wrong VMs (i.e. AppVMs) are targeted for apt updates, then the failure should always occur.
Ah, thanks for the reminder about #535. Since I'm currently running prod I don't have that fix in yet. I think it may have been "sporadic" in the sense of only occurring when updates are available. I just applied all available updates and it successfully ran without errors. Will try to re-test for this case once I'm on staging.
I've run a few installs since #535 landed and not seen this issue since then. Tagging "needs repro" for now, we can close if we don't see evidence of it during the next QA cycle.
This appears to be resolved, feel free to reopen if you see it again.
Both @emkll and I have observed transient network issues in guest VMs during prod installs, which cause required repository operations to fail, causing the whole install to fail.
See here for report from 0.2.3-rpm QA. Reboot did not resolve:
During my install I saw it both for deb.qubes-os.org and for deb.debian.org. Default Qubes config, i.e. no package updates over Tor for regular VMs.
In spite of those failures, required packages appeared to be correctly installed. Only restarting all VMs once more and re-running resolved.
These issues are intermittent and we've not seen them for all installs.
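For anyone trying to pin down whether a failure like this is a name resolution problem in the VM or a flaky upstream server, here is a rough Python sketch of a repeated resolution probe. It is not part of securedrop-workstation; the function names `probe_resolution` and `classify`, the port, and the attempt and delay values are all made up for illustration. It only tests direct DNS resolution from wherever it runs and says nothing about apt traffic that is routed through a proxy.

```python
import socket
import time


def probe_resolution(hosts, attempts=3, delay=1.0,
                     resolver=socket.getaddrinfo, sleep=time.sleep):
    """Try to resolve each host several times.

    Returns a dict mapping each host to a list of booleans, one per
    attempt (True = resolved, False = resolution raised an error).
    The resolver and sleep arguments are hypothetical hooks so the
    function can be exercised without a network.
    """
    results = {}
    for host in hosts:
        outcomes = []
        for attempt in range(attempts):
            try:
                resolver(host, 443)
                outcomes.append(True)
            except OSError:
                # socket.gaierror (DNS failure) is a subclass of OSError
                outcomes.append(False)
            if attempt < attempts - 1:
                sleep(delay)
        results[host] = outcomes
    return results


def classify(outcomes):
    """Summarize per-attempt results as ok, failing, or intermittent."""
    if all(outcomes):
        return "ok"
    if not any(outcomes):
        return "failing"
    return "intermittent"
```

Calling `probe_resolution(["deb.qubes-os.org", "deb.debian.org"])` inside the affected VM and passing each result list to `classify` would give a quick hint: "failing" for every host points at the VM's own network setup, while "intermittent" or a single bad host looks more like a transient upstream problem.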