wireguard: fail to start peers due dns resolution failure #260402

Open
datafoo opened this issue Oct 11, 2023 · 1 comment

Comments

@datafoo
Contributor

datafoo commented Oct 11, 2023

Describe the bug

Sometimes my WireGuard peers fail to start at boot because DNS resolution fails.

Steps To Reproduce

  1. Configure multiple WireGuard interfaces with networking.wireguard.interfaces. The more interfaces there are, the more likely the bug is to reproduce. On my machine, 2 interfaces were enough.
  2. Reboot the machine until systemctl list-units --failed shows that your WireGuard peer units failed to start.

Expected behavior

The peer should start successfully at every boot.

Additional context

The issue is explained in detail at https://discourse.nixos.org/t/name-resolution-fails-at-boot-time/33867 but here is a simplified explanation.

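The WireGuard interfaces cause network-online.target to be reached prematurely: in the log below, dhcpcd reports ready ("Started DHCP Client") as soon as wg0 has its static address, several seconds before ens33 acquires a carrier. The WireGuard peer units start immediately after network-online.target is reached, and they fail because the network is not actually online yet, so the endpoint hostname cannot be resolved.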

[root@mymachine:~]# journalctl -u wireguard-wg* -u dhcpcd.service -u network-online.target -b -3
Oct 09 16:11:30 mymachine systemd[1]: Starting DHCP Client...
Oct 09 16:11:30 mymachine dhcpcd[1323]: dhcpcd-9.4.1 starting
Oct 09 16:11:30 mymachine dhcpcd[1340]: dev: loaded udev
Oct 09 16:11:30 mymachine dhcpcd[1340]: DUID <REDACTED>
Oct 09 16:11:30 mymachine systemd[1]: Starting WireGuard Tunnel - wg0...
Oct 09 16:11:30 mymachine systemd[1]: Starting WireGuard Tunnel - wg1...
Oct 09 16:11:30 mymachine dhcpcd[1340]: no valid interfaces found
Oct 09 16:11:30 mymachine dhcpcd[1323]: no valid interfaces found
Oct 09 16:11:30 mymachine dhcpcd[1340]: ens33: waiting for carrier
Oct 09 16:11:30 mymachine dhcpcd[1340]: ens33: waiting for carrier
Oct 09 16:11:30 mymachine systemd[1]: Finished WireGuard Tunnel - wg0.
Oct 09 16:11:30 mymachine systemd[1]: Finished WireGuard Tunnel - wg1.
Oct 09 16:11:30 mymachine dhcpcd[1340]: wg0: waiting for carrier
Oct 09 16:11:30 mymachine dhcpcd[1340]: wg1: IAID <REDACTED>
Oct 09 16:11:30 mymachine dhcpcd[1340]: wg1: waiting for 3rd party to configure IP address
Oct 09 16:11:30 mymachine dhcpcd[1340]: wg0: carrier acquired
Oct 09 16:11:30 mymachine dhcpcd[1340]: wg0: IAID <REDACTED>
Oct 09 16:11:30 mymachine dhcpcd[1340]: wg0: using static address <REDACTED>/24
Oct 09 16:11:30 mymachine dhcpcd[1340]: wg0: adding route to <REDACTED>/24
Oct 09 16:11:30 mymachine systemd[1]: Started DHCP Client.
Oct 09 16:11:30 mymachine systemd[1]: Reached target Network is Online.
Oct 09 16:11:30 mymachine systemd[1]: Starting WireGuard Peer - wg0 - <REDACTED> (<REDACTED>)...
Oct 09 16:11:30 mymachine systemd[1]: Starting WireGuard Peer - wg1 - <REDACTED> (<REDACTED>)...
Oct 09 16:11:30 mymachine wireguard-wg0-peer-<REDACTED>-start[1604]: Name or service not known: `wireguard.example.com:51820'
Oct 09 16:11:30 mymachine wireguard-wg1-peer-<REDACTED>-start[1606]: Name or service not known: `wireguard.example.com:51821'
Oct 09 16:11:30 mymachine systemd[1]: wireguard-wg0-peer-<REDACTED>.service: Main process exited, code=exited, status=1/FAILURE
Oct 09 16:11:30 mymachine systemd[1]: wireguard-wg1-peer-<REDACTED>.service: Main process exited, code=exited, status=1/FAILURE
Oct 09 16:11:30 mymachine wireguard-wg0-peer-<REDACTED>-post-stop[1627]: RTNETLINK answers: No such process
Oct 09 16:11:30 mymachine systemd[1]: wireguard-wg0-peer-<REDACTED>.service: Control process exited, code=exited, status=2/INVALIDARGUMENT
Oct 09 16:11:30 mymachine systemd[1]: wireguard-wg0-peer-<REDACTED>.service: Failed with result 'exit-code'.
Oct 09 16:11:30 mymachine wireguard-wg1-peer-<REDACTED>-post-stop[1629]: RTNETLINK answers: No such process
Oct 09 16:11:30 mymachine systemd[1]: Failed to start WireGuard Peer - wg0 - <REDACTED> (<REDACTED>).
Oct 09 16:11:30 mymachine systemd[1]: wireguard-wg1-peer-<REDACTED>.service: Control process exited, code=exited, status=2/INVALIDARGUMENT
Oct 09 16:11:30 mymachine systemd[1]: wireguard-wg1-peer-<REDACTED>.service: Failed with result 'exit-code'.
Oct 09 16:11:33 mymachine systemd[1]: Failed to start WireGuard Peer - wg1 - <REDACTED> (<REDACTED>).
Oct 09 16:11:33 mymachine systemd[1]: Reached target WireGuard Tunnel - wg0.
Oct 09 16:11:33 mymachine systemd[1]: Reached target WireGuard Tunnel - wg1.
Oct 09 16:11:35 mymachine dhcpcd[1340]: ens33: carrier acquired
Oct 09 16:11:35 mymachine dhcpcd[1340]: ens33: IAID <REDACTED>
Oct 09 16:11:35 mymachine dhcpcd[1340]: ens33: adding address <REDACTED>
Oct 09 16:11:36 mymachine dhcpcd[1340]: ens33: soliciting an IPv6 router
Oct 09 16:11:37 mymachine dhcpcd[1340]: ens33: Router Advertisement from <REDACTED>
Oct 09 16:11:37 mymachine dhcpcd[1340]: ens33: adding address <REDACTED>/64
Oct 09 16:11:37 mymachine dhcpcd[1340]: ens33: adding route to <REDACTED>/64
Oct 09 16:11:37 mymachine dhcpcd[1340]: ens33: requesting DHCPv6 information
Oct 09 16:11:37 mymachine dhcpcd[1340]: ens33: adding default route via <REDACTED>
Oct 09 16:11:37 mymachine dhcpcd[1340]: ens33: rebinding lease of <REDACTED>
Oct 09 16:11:37 mymachine dhcpcd[1340]: ens33: probing address <REDACTED>/24
Oct 09 16:11:38 mymachine dhcpcd[1340]: ens33: REPLY6 received from <REDACTED>
Oct 09 16:11:38 mymachine dhcpcd[1340]: ens33: refresh in 86400 seconds
Oct 09 16:11:42 mymachine dhcpcd[1340]: ens33: leased <REDACTED> for 86400 seconds
Oct 09 16:11:42 mymachine dhcpcd[1340]: ens33: adding route to <REDACTED>/24
Oct 09 16:11:42 mymachine dhcpcd[1340]: ens33: adding default route via <REDACTED>

To work around the issue, that is, to stop dhcpcd from causing network-online.target to be reached prematurely, there are a few options:

  • Add the wireguard interfaces to networking.dhcpcd.denyInterfaces
  • Override the DHCP default from hardware-configuration.nix so that DHCP is enabled only on the physical interface:
    networking.useDHCP = false;
    networking.interfaces.ens33.useDHCP = true;

The result:

[root@mymachine:~]# journalctl -u wireguard-wg* -u dhcpcd.service -u network-online.target -b
Oct 10 09:15:04 mymachine systemd[1]: Starting DHCP Client...
Oct 10 09:15:04 mymachine dhcpcd[1357]: dhcpcd-9.4.1 starting
Oct 10 09:15:04 mymachine dhcpcd[1371]: dev: loaded udev
Oct 10 09:15:04 mymachine dhcpcd[1371]: DUID <REDACTED>
Oct 10 09:15:05 mymachine systemd[1]: Starting WireGuard Tunnel - wg0...
Oct 10 09:15:05 mymachine systemd[1]: Starting WireGuard Tunnel - wg1...
Oct 10 09:15:05 mymachine dhcpcd[1371]: no valid interfaces found
Oct 10 09:15:05 mymachine dhcpcd[1357]: no valid interfaces found
Oct 10 09:15:05 mymachine dhcpcd[1371]: ens33: waiting for carrier
Oct 10 09:15:05 mymachine dhcpcd[1371]: ens33: waiting for carrier
Oct 10 09:15:05 mymachine systemd[1]: Finished WireGuard Tunnel - wg1.
Oct 10 09:15:05 mymachine systemd[1]: Finished WireGuard Tunnel - wg0.
Oct 10 09:15:08 mymachine dhcpcd[1371]: ens33: carrier acquired
Oct 10 09:15:08 mymachine dhcpcd[1371]: ens33: IAID <REDACTED>
Oct 10 09:15:08 mymachine dhcpcd[1371]: ens33: adding address <REDACTED>
Oct 10 09:15:09 mymachine dhcpcd[1371]: ens33: soliciting an IPv6 router
Oct 10 09:15:10 mymachine dhcpcd[1371]: ens33: Router Advertisement from <REDACTED>
Oct 10 09:15:10 mymachine dhcpcd[1371]: ens33: adding address <REDACTED>/64
Oct 10 09:15:10 mymachine dhcpcd[1371]: ens33: adding route to <REDACTED>/64
Oct 10 09:15:10 mymachine dhcpcd[1371]: ens33: requesting DHCPv6 information
Oct 10 09:15:10 mymachine dhcpcd[1371]: ens33: adding default route via <REDACTED>
Oct 10 09:15:10 mymachine dhcpcd[1371]: ens33: rebinding lease of <REDACTED>
Oct 10 09:15:10 mymachine dhcpcd[1371]: ens33: probing address <REDACTED>/24
Oct 10 09:15:11 mymachine dhcpcd[1371]: ens33: REPLY6 received from <REDACTED>
Oct 10 09:15:11 mymachine dhcpcd[1371]: ens33: refresh in 86400 seconds
Oct 10 09:15:12 mymachine systemd[1]: Started DHCP Client.
Oct 10 09:15:12 mymachine systemd[1]: Reached target Network is Online.
Oct 10 09:15:12 mymachine systemd[1]: Starting WireGuard Peer - wg0 - <REDACTED> (<REDACTED>)...
Oct 10 09:15:12 mymachine systemd[1]: Starting WireGuard Peer - wg1 - <REDACTED> (<REDACTED>)...
Oct 10 09:15:12 mymachine systemd[1]: Finished WireGuard Peer - wg1 - <REDACTED> (<REDACTED>).
Oct 10 09:15:12 mymachine systemd[1]: Finished WireGuard Peer - wg0 - <REDACTED> (<REDACTED>).
Oct 10 09:15:12 mymachine systemd[1]: Reached target WireGuard Tunnel - wg0.
Oct 10 09:15:12 mymachine systemd[1]: Reached target WireGuard Tunnel - wg1.
Oct 10 09:15:14 mymachine dhcpcd[1371]: ens33: leased <REDACTED> for 86400 seconds
Oct 10 09:15:14 mymachine dhcpcd[1371]: ens33: adding route to <REDACTED>/24
Oct 10 09:15:14 mymachine dhcpcd[1371]: ens33: adding default route via <REDACTED>
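
For reference, the first workaround listed above might look like the following in configuration.nix. This is a minimal sketch, assuming the interfaces are named wg0 and wg1 as in the logs. networking.dhcpcd.denyInterfaces also accepts shell glob patterns, so a single "wg*" entry should work too if all WireGuard interfaces share that prefix (an assumption about naming, not something stated in this report).

```nix
{ ... }:

{
  # Tell dhcpcd to ignore the WireGuard interfaces, so that it should only
  # report readiness based on the physical interface (ens33 in the logs).
  # Interface names are taken from the journal output above.
  networking.dhcpcd.denyInterfaces = [ "wg0" "wg1" ];
}
```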

May be related to #171079, #63869

Notify maintainers

Metadata

Please run nix-shell -p nix-info --run "nix-info -m" and paste the result.

[root@mymachine:/etc/nixos]# nix-shell -p nix-info --run "nix-info -m"
 - system: `"x86_64-linux"`
 - host os: `Linux 6.1.55, NixOS, 23.05 (Stoat), 23.05.20230927.5cfafa1`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.13.5`
 - channels(me): `"home-manager-23.05.tar.gz"`
 - nixpkgs: `/nix/var/nix/profiles/per-user/root/channels/nixos`
@ryan27996

Disclaimer: I've only been using NixOS for 2 days...

I'm also having this issue on 23.11 on my laptop:

nix-shell -p nix-info --run "nix-info -m"
 - system: `"x86_64-linux"`
 - host os: `Linux 6.1.87, NixOS, 23.11 (Tapir), 23.11.6478.bc194f70731c`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.18.1`
 - channels(root): `"nixos-23.11"`
 - nixpkgs: `/nix/var/nix/profiles/per-user/root/channels/nixos`
journalctl --unit wg-quick-wgtest.service 
Apr 20 18:42:21 localhost systemd[1]: Starting wg-quick WireGuard Tunnel - wgtest...
Apr 20 18:42:21 localhost wg-quick-wgtest-start[2356]: Warning: `/nix/store/randomstuff-config-wgtest/wgtest.conf' is world accessible
Apr 20 18:42:21 localhost wg-quick-wgtest-start[2356]: [#] ip link add wgtest type wireguard
Apr 20 18:42:21 localhost wg-quick-wgtest-start[2356]: [#] wg setconf wgtest /dev/fd/63
Apr 20 18:42:21 localhost wg-quick-wgtest-start[2381]: Name or service not known: `wg.example.com:51820'
Apr 20 18:42:21 localhost wg-quick-wgtest-start[2381]: Configuration parsing error
Apr 20 18:42:21 localhost wg-quick-wgtest-start[2356]: [#] ip link delete dev wgtest
Apr 20 18:42:21 localhost systemd[1]: wg-quick-wgtest.service: Main process exited, code=exited, status=1/FAILURE
Apr 20 18:42:21 localhost systemd[1]: wg-quick-wgtest.service: Failed with result 'exit-code'.
Apr 20 18:42:21 localhost systemd[1]: Failed to start wg-quick WireGuard Tunnel - wgtest.

The problem could be related to the service being of type "oneshot", as defined in nixos/modules/services/networking/wg-quick.nix

I'm using the instructions from https://nixos.wiki/wiki/WireGuard, in the "Setting up WireGuard server/client with wg-quick and dnsmasq" section.

I fixed it by adding this to my nix config:

{ config, pkgs, lib, ... }:

{
  networking.wg-quick.interfaces = {
    # ... (other settings elided)
    wgtest = {
      address = [ "REDACTED" "REDACTED"];
      dns = [ "REDACTED" "REDACTED" ];
      privateKeyFile = "/root/wireguard-keys/privkey";

      peers = [
        {
          publicKey = "REDACTED";
          presharedKeyFile = "/root/wireguard-keys/wg_example_com-psk";
          allowedIPs = [ "REDACTED" "REDACTED" ];
          endpoint = "wg.example.com:51820";
          persistentKeepalive = 25;
        }
      ];
    };
  };
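  # Retry the tunnel unit when it fails, e.g. while the endpoint
  # hostname cannot be resolved yet.
  systemd.services.wg-quick-wgtest.serviceConfig = {
    Type = lib.mkForce "exec"; # override the module's "oneshot"
    RestartSec = 1; # seconds to wait between attempts
    Restart = "on-failure";
  };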
}

Not sure if that's the 'right' way to fix it, but it works, hope that helps someone!

Would it make sense to update the nix module from oneshot to exec? According to the systemd docs: "It is recommended to use Type=exec for long-running services, as it ensures that process setup errors [...] Also note it is generally not recommended to use idle or oneshot for long-running services."
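
If several wg-quick interfaces need the same retry behaviour, the override above could be generated from a list instead of being repeated per unit. This is only a sketch: the list wgQuickInterfaces and the name retryOnFailure are hypothetical helpers introduced here for illustration, not part of the NixOS module, and the three service settings are simply the ones from the config above.

```nix
{ lib, ... }:

let
  # Hypothetical: names of the wg-quick interfaces that should retry.
  wgQuickInterfaces = [ "wgtest" ];

  # Same settings as in the config above, shared by every listed unit.
  retryOnFailure = {
    serviceConfig = {
      Type = lib.mkForce "exec";
      Restart = "on-failure";
      RestartSec = 1;
    };
  };
in
{
  # Builds systemd.services."wg-quick-<name>" for each listed interface;
  # the module system merges this with the units the wg-quick module defines.
  systemd.services = lib.listToAttrs (
    map (name: lib.nameValuePair "wg-quick-${name}" retryOnFailure) wgQuickInterfaces
  );
}
```

Adding another interface would then only require appending its name to the list.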

cprussin added a commit to cprussin/dotfiles that referenced this issue Jul 8, 2024
This was happening because dhcpcd was trying to run on the wireguard interface,
which was leading to `network-online.target` coming up prematurely.

See NixOS/nixpkgs#260402 for details