Wireguard - waiting for DNS before trying to start the interface #30459
Comments
This must be solved upstream. In my systemd-networkd pull request I do exponential backoff. |
cc @zx2c4 |
Any plans to finish that soon? Funny enough I changed the resolv algorithm a bit last night, actually: https://git.zx2c4.com/WireGuard/commit/?id=76a9bb3898fa7ce8574a32d587014ae91ab34703 This is based on the discussion here: https://sourceware.org/glibc/wiki/NameResolver
However, it appears that @sjau is receiving EAI_NONAME, which means the above patch won't help. Why is @sjau receiving a permanent error at this stage? @sjau - could you describe in depth your DNS configuration? |
I have a Turris Omnia router. On it I run dnsmasq to resolve the server's domain name (a dyndns address) locally, so that it resolves properly from both the LAN and the WAN. The dnsmasq entry in the /etc/hosts.add file on the router is:
|
The hardware used isn't relevant. What is in your |
Client doesn't use dnsmasq... that's the router. |
So either avahi's mdns or the glibc resolver returns this. |
I also remember that I had to treat every error as a transient error in networkd. |
In that case, I have no idea what belongs to what and what on earth you're talking about. So let's start over: Our issue is with the client. I don't want to hear about other computers. Just the client. Would you summarize in one post all of the relevant DNS information about the client? |
Well, you wanted to know in depth dns configuration. Since I don't run a dns server on the client, I assumed you meant the dns server that I use... so I gave you that. That's from the client, the information you requested: |
Okay, thanks for the clarification. So:
Are any of these tweaked by you? Or is this a standard NixOS situation? For comparison, my line (from a different distro) just looks like:
So one line in It looks like this is a NixOS-particularity: nixpkgs/nixos/modules/config/nsswitch.nix Line 21 in 72a64ea
It was caused by commit 987aac7 . This commit changed the more sensible This is only a hypothesis. We won't know until @sjau tests it out, by changing |
I did set to use Avahi:
but I don't really need it. |
Fixed by #30472. Thanks! |
actually, this hasn't been solved yet... it's still failing
Wouldn't it be better for wg to retry automatically? |
For some kinds of "error conditions" retrying isn't very meaningful, as described in https://git.zx2c4.com/WireGuard/commit/?id=76a9bb3898fa7ce8574a32d587014ae91ab34703 but from the log it's not clear what exactly the OS got back from DNS. |
How do I get a different log? That's from journalctl. And once NixOS has booted and I reissue again as root |
Looks like it's returning |
How to do that?
That's the resolv.conf once the system has booted up but no idea if that's the same at that point during boot. |
Something is probably different, given that restarting the service later fixes the problem. @sjau: what's your DNS setup? On the client you get all servers from DHCP only? DHCP is served by Turris Omnia router and these three addresses belong to it? Assuming the name is |
except for a few entries in the hosts file, I get everything from the TO router. |
Ok, I disabled IPv6 on my notebook and still the same:
|
Can you post a more substantial log to see the various other things happening? |
This is still happening for me :-( |
My workaround is this: https://github.com/sjau/nix-expressions/blob/master/wgStartFix.nix |
I can confirm that this is still an issue. A nicer workaround that doesn't require a separate service to watch wireguard may be:
networking.wireguard.interfaces.wg0 = {
  preSetup = ''
    # Try to access the DNS for up to 300s
    for i in {1..300}; do
      ${pkgs.iputils}/bin/ping -c1 '<insert domain to resolve here>' && break
      echo "Attempt $i: DNS still not available"
      sleep 1s
    done
  '';
  ...
}
On my first try, the wireguard service started successfully after 11 failed attempts (i.e. 11 seconds after network-online). |
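For setups with several peers or interfaces, the same wait can be factored into a small shell helper. The following is only a sketch under stated assumptions: `wait_for_dns`, `RESOLVE_CMD` and `WAIT_INTERVAL` are hypothetical names, not NixOS or WireGuard options, and it assumes glibc's `getent` is available. It uses `getent hosts` rather than `ping`, so the check depends only on name resolution and not on the host answering ICMP.

```shell
#!/usr/bin/env bash
# Hypothetical helper: block until every given host name resolves.
# RESOLVE_CMD and WAIT_INTERVAL are illustrative knobs, not NixOS options.
RESOLVE_CMD=${RESOLVE_CMD:-"getent hosts"}
WAIT_INTERVAL=${WAIT_INTERVAL:-1}

# wait_for_dns MAX_ATTEMPTS HOST...
# Returns 0 once all hosts resolve, 1 if any host still fails
# after MAX_ATTEMPTS tries.
wait_for_dns() {
  local max_attempts=$1
  shift
  local host attempt
  for host in "$@"; do
    attempt=1
    # RESOLVE_CMD is intentionally unquoted so "getent hosts" splits into words.
    until $RESOLVE_CMD "$host" >/dev/null 2>&1; do
      if [ "$attempt" -ge "$max_attempts" ]; then
        echo "Giving up on $host after $attempt attempts" >&2
        return 1
      fi
      echo "Attempt $attempt: $host does not resolve yet" >&2
      attempt=$((attempt + 1))
      sleep "$WAIT_INTERVAL"
    done
  done
  return 0
}
```

A call such as `wait_for_dns 300 vpn1.example.org vpn2.example.org` (placeholder names) would wait up to roughly 300 seconds per host. Inside a NixOS `preSetup` script the resolver command would have to be given by its full store path, as the `ping` call in the workaround is.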
The question is why this happens. Yours works too, but what if you have multiple wg interfaces up? |
Yes, that would be good to know. @Mic92 do you have any ideas?
Why would that make a difference? You'd need to add a similar |
After some research I thought the solution would be to add |
@timokau between our glibc resolver and the application is nscd. Maybe that alters the errno returned to the application. Apart from that anything in our |
I'm not sure I understand that. Why do you assume an error is altered? |
@timokau because the error returned by our setup seems not to match what other distributions return, when the resolver cannot be reached: #30459 (comment) |
Looks like I don't know enough about the subject to contribute :/ |
Hmmm, I just discovered that systemd.network now supports wireguard as well: Maybe it's easier to just set it up there? According to the NixOS options it's not yet supported, though. |
Coming back to this ancient problem: Last night gchristensen and I were talking a bit about it. In the end he suggested adding a
to the configuration.nix so that it will restart. However, during the rebuild it complained that restart on failure does not go hand in hand with the "oneshot" systemd type. (There's also an issue on the systemd tracker that asks for adding retry on failure to the oneshot type: systemd/systemd#2582 ). So I tried to change the oneshot type to simple by altering
to
in the Since I didn't know the proper syntax to add
to the After that I rebuilt and it works fine. It will now retry connecting to the wg VPNs even if the first attempt isn't successful. Currently my notebook connects to 3 wg VPNs with this, and I've had several reboots (for testing) and they always came back up again. So the question is: what is the benefit of "oneshot" compared to "simple", since "oneshot" seems to prevent wg from coming up properly when using a domain name as the server target instead of an IP address? |
This got fixed: |
While #61971 did fix the issue, later changes reintroduced the same problem for the wg peers. They will not get started properly because of the oneshot type and DNS not being available at boot. |
Thank you for your contributions. This has been automatically marked as stale because it has had no activity for 180 days. If this is still important to you, we ask that you leave a comment below. Your comment can be as simple as "still important to me". This lets people see that at least one person still cares about this. Someone will have to do this at most twice a year if there is no other activity. Here are suggestions that might help resolve this more quickly:
|
Still important to me |
Relevant to my interests as well, and I'm also a little confused as to why the systemd service runs as a 'oneshot' instead of continually retrying (maybe at exponential intervals up to 30 seconds) Edit: |
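The idea of retrying at exponentially growing intervals can be sketched in shell as follows. This is an illustration only, not what the NixOS module or systemd does: `backoff_delay` and `retry_with_backoff` are made-up names, and the 30-second cap is simply the figure suggested in the comment above.

```shell
#!/usr/bin/env bash
# backoff_delay ATTEMPT CAP
# Prints the delay in seconds before the next try: 1, 2, 4, 8, ...
# doubling per attempt and never exceeding CAP.
backoff_delay() {
  local attempt=$1 cap=$2 delay=1 i
  for (( i = 1; i < attempt && delay < cap; i++ )); do
    delay=$(( delay * 2 ))
  done
  if [ "$delay" -gt "$cap" ]; then
    delay=$cap
  fi
  echo "$delay"
}

# retry_with_backoff MAX_ATTEMPTS CAP COMMAND [ARGS...]
# Runs COMMAND until it succeeds, sleeping with exponential backoff
# between tries. Returns 1 if MAX_ATTEMPTS tries all fail.
retry_with_backoff() {
  local max_attempts=$1 cap=$2
  shift 2
  local attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      return 1
    fi
    sleep "$(backoff_delay "$attempt" "$cap")"
    attempt=$((attempt + 1))
  done
  return 0
}
```

For example, `retry_with_backoff 10 30 getent hosts vpn.example.org` (a placeholder host name) would try up to ten times, waiting 1, 2, 4, 8, 16 and then 30 seconds between tries.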
I marked this as stale due to inactivity. |
still important to me as well, an easier vpn setup |
Our whole The oneshot services setting up the interface do wait for @sjau the original issue seems to be solved. If you still experience problems, please open a new issue with a reproducer and up-to-date logs. |
This issue has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/why-do-i-have-to-restart-wireguard-on-every-reboot/46376/1 |
Just found this: https://github.com/sis2022/wireguard-dynip-update It seems to have been working quite reliably since I started using it a week ago. Running it on a handful of hosts now, I've had no more connection issues, yay! By default its cron job runs quite frequently, but I found it to be a no-brainer, and it is easily rescheduled to run less often if need be. |
Issue description
I have wireguard installed and I use a domain name for the connection to the wireguard server. The problem is that upon boot, wg doesn't seem to wait for DNS resolution, and hence starting the interface fails.
Steps to reproduce
Add wireguard client configuration like:
And often - not always - I get journalctl entries like this
From what I read, it can't resolve the DNS name properly.
Technical details
nixos-version (Ubuntu/Fedora: lsb_release -a, ...): 18.03pre117886.874a3c033c (Impala)
nix-env --version: nix-env (Nix) 1.11.15
nix-instantiate --eval "<nixpkgs>" -A lib.nixpkgsVersion: "18.03pre117939.3ee33f35f8"
grep build-use-sandbox /etc/nix/nix.conf: build-use-sandbox = false