Various software refuses to bind to it's listenaddress #71106

wucke13 · 2019-10-14T08:34:54Z

This is a weird bug. When a service is changed during nixos-rebuild and therefore restarted, it refuses to bind to it's given port. For Example:

starting the following units: accounts-daemon.service, cups.socket, mpd.service, nix-daemon.socket, systemd-modules-load.service, systemd-udevd-control.socket, systemd-udevd-kernel.socket, tlp.service, udisks2.service, zerotierone.service
Job failed. See "journalctl -xe" for details.
warning: the following units failed: cups.socket

● cups.socket - CUPS Scheduler
   Loaded: loaded (/nix/store/nkihl7vbzbqz990hq15q42h69c06pijg-cups-2.2.12/lib/systemd/system/cups.socket; enabled; vendor preset: enabled)
  Drop-In: /nix/store/za3k732cia0g9y21xc5sdfflvgfsxd9j-system-units/cups.socket.d
           └─overrides.conf
   Active: failed (Result: resources) since Mon 2019-10-14 10:17:56 CEST; 3s ago
   Listen: /run/cups/cups.sock (Stream)
           /run/cups/cups.sock (Stream)
           127.0.0.1:631 (Stream)
       IP: 2.1K in, 2.2K out
      CPU: 645us

Oct 14 10:17:58 zornpad systemd[1]: /nix/store/nkihl7vbzbqz990hq15q42h69c06pijg-cups-2.2.12/lib/systemd/system/cups.socket:6: ListenStream= references a path below legacy directory /var/run/, updating /var/run/cups/cups.sock → /run/cups/cups.sock; please update the unit file accordingly....
Oct 14 10:17:58 zornpad systemd[1]: cups.socket: Failed to receive listening socket (127.0.0.1:631): Input/output error
Oct 14 10:17:58 zornpad systemd[1]: cups.socket: Failed to listen on sockets: Input/output error
Oct 14 10:17:58 zornpad systemd[1]: cups.socket: Failed with result 'resources'.
Oct 14 10:17:58 zornpad systemd[1]: Failed to listen on CUPS Scheduler.
Oct 14 10:17:58 zornpad systemd[2307]: cups.socket: Failed to create listening socket (127.0.0.1:631): Cannot assign requested address
Oct 14 10:17:58 zornpad systemd[1]: cups.socket: Failed to receive listening socket (127.0.0.1:631): Input/output error
Oct 14 10:17:58 zornpad systemd[1]: cups.socket: Failed to listen on sockets: Input/output error
Oct 14 10:17:58 zornpad systemd[1]: cups.socket: Failed with result 'resources'.
Oct 14 10:17:58 zornpad systemd[1]: Failed to listen on CUPS Scheduler.
Hint: Some lines were ellipsized, use -l to show in full.
warning: error(s) occurred while switching to the new configuration
sudo nixos-rebuild test  18.28s user 4.83s system 7% cpu 5:23.89 total

Though nothing is listening on that port:

$ sudo ss -tlpn
State    Recv-Q   Send-Q            Local Address:Port        Peer Address:Port
LISTEN   0        128               127.0.0.53%lo:53               0.0.0.0:*       users:(("systemd-resolve",pid=1003,fd=17))
LISTEN   0        128                     0.0.0.0:22               0.0.0.0:*       users:(("sshd",pid=1028,fd=3))
LISTEN   0        128         139.174.76.3%wlp3s0:21915            0.0.0.0:*       users:(("zerotier-one",pid=2313,fd=17))
LISTEN   0        128         139.174.76.3%wlp3s0:21916            0.0.0.0:*       users:(("zerotier-one",pid=2313,fd=19))
LISTEN   0        5                       0.0.0.0:6600             0.0.0.0:*       users:(("mpd",pid=2301,fd=7))
LISTEN   0        128         139.174.76.3%wlp3s0:9993             0.0.0.0:*       users:(("zerotier-one",pid=2313,fd=15))
LISTEN   0        5                       0.0.0.0:4713             0.0.0.0:*       users:(("pulseaudio",pid=1418,fd=32))
LISTEN   0        128                     0.0.0.0:5355             0.0.0.0:*       users:(("systemd-resolve",pid=1003,fd=12))
LISTEN   0        128                        [::]:22                  [::]:*       users:(("sshd",pid=1028,fd=4))
LISTEN   0        5                          [::]:4713                [::]:*       users:(("pulseaudio",pid=1418,fd=10))
LISTEN   0        128                        [::]:5355                [::]:*       users:(("systemd-resolve",pid=1003,fd=14))

I have this problem with all kind of services, for example mpd. Even a local build of zola (a static site generator with a preview function) doesn't start at this point:

$ zola serve
No port available.

Changing the port in question doesn't change anything.
The symptomatics is always the same: according to ss the port is not in use yet the software claims that it cannot bind to the port. I'm desperate for input on how to debug this.

Edit 1:
After a fresh restart, things (zola serve) work! Then, after nixos-rebuild test, things (zola serve) still work. So, I still don't know how to reproduce this reliable.

Edit 2:
After a systemctl suspend, things (zola serve) won't work.

Expected behavior
Service starts listening on it's assigned and free port

Additional context
I've encountered this on two of my machines, one with linux 4 and one with linux 5

Metadata

system: "x86_64-linux"
host os: Linux 5.3.5, NixOS, 19.09.789.7952807791d (Loris)
multi-user?: yes
sandbox: yes
version: nix-env (Nix) 2.3
channels(root): "nixos-19.09.789.7952807791d, nixos-unstable-20.03pre197080.dd050270c78"
channels(wucke13): "nixos-unstable-20.03pre196480.8b46dcb3db5"
nixpkgs: /nix/var/nix/profiles/per-user/root/channels/nixos

The text was updated successfully, but these errors were encountered:

arianvp · 2019-10-14T08:58:37Z

are you running any large server workloads on this box? First thing I was thining of is that perhaps your file descriptor pool is exhausted?

arianvp · 2019-10-14T09:00:25Z

Could you show the output of $ strace zola serve (Or any other cli tool that tries to bind to a port?)
I want to know what syscall is failing and with what error code

wucke13 · 2019-10-14T09:50:35Z

are you running any large server workloads on this box? First thing I was thining of is that perhaps your file descriptor pool is exhausted?

Nope, one of the machines is my laptop with few services.

Could you show the output of $ strace zola serve (Or any other cli tool that tries to bind to a port?)
I want to know what syscall is failing and with what error code

Sure! trace.txt

At least for zola serve, the systemctl suspend seems to be a reliable way to reproduce this behavior.

arianvp · 2019-10-14T10:33:08Z

Thanks. could you also show the output of $ ip link when this issue happens?

From the logs it seems like your loopback interface is not available (which is odd)

socket(AF_INET, SOCK_STREAM|SOCK_CLOEXEC, IPPROTO_IP) = 3
setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
bind(3, {sa_family=AF_INET, sin_port=htons(8995), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EADDRNOTAVAIL (Cannot assign requested address)
close(3)                                = 0

``

wucke13 · 2019-10-14T10:51:37Z

Sure 😄

$ ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp0s25: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN mode DEFAULT group default qlen 1000
    link/ether <one of my mac addresses> brd ff:ff:ff:ff:ff:ff
3: wlp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DORMANT group default qlen 1000
    link/ether <one of my mac addresses> brd ff:ff:ff:ff:ff:ff
4: zt<zerotier id>: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2800 qdisc fq_codel state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether <one of my mac addresses> brd ff:ff:ff:ff:ff:ff
5: zt<zerotier id>: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2800 qdisc fq_codel state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether <one of my mac addresses> brd ff:ff:ff:ff:ff:ff

$ cat /etc/hosts
127.0.0.1 localhost
127.0.1.1 <hostname>
::1 localhost

Worth noting, I do use systemd-networkd to configure my interfaces.

arianvp · 2019-10-14T11:25:23Z

In that case, a networkctl status -a when the issue is manifested will also be useful ;)

wucke13 · 2019-10-14T11:41:10Z

As requested. Note that I removed the zerotier interfaces from the list, as I do not expect the issue to be related to them.

$ networkctl status -a
● 1: lo
             Link File: /nix/store/6snycpaz9zrs5m7xz6dixl1nl0ngdrma-systemd-243/lib/systemd/network/99-default.link
          Network File: /etc/systemd/network/99-main.network
                  Type: loopback
                 State: carrier (configured)
                   MTU: 65536
  Queue Length (Tx/Rx): 1/1

● 2: enp0s25
             Link File: /nix/store/6snycpaz9zrs5m7xz6dixl1nl0ngdrma-systemd-243/lib/systemd/network/99-default.link
          Network File: /etc/systemd/network/99-main.network
                  Type: ether
                 State: no-carrier (configuring)
                  Path: pci-0000:00:19.0
                Driver: e1000e
                Vendor: Intel Corporation
                 Model: Ethernet Connection (3) I218-LM
            HW Address: <my mac address> (LCFC(HeFei) Electronics Technology co., ltd)
                   MTU: 1500 (min: 68, max: 9000)
  Queue Length (Tx/Rx): 1/1
      Auto negotiation: yes
                 Speed: n/a
                  Port: tp

● 3: wlp3s0
             Link File: /nix/store/6snycpaz9zrs5m7xz6dixl1nl0ngdrma-systemd-243/lib/systemd/network/99-default.link
          Network File: /etc/systemd/network/99-main.network
                  Type: wlan
                 State: routable (configured)
                  Path: pci-0000:03:00.0
                Driver: iwlwifi
                Vendor: Intel Corporation
                 Model: Wireless 7265 (Dual Band Wireless-AC 7265)
            HW Address: <my mac address> (Intel Corporate)
                   MTU: 1500 (min: 256, max: 2304)
  Queue Length (Tx/Rx): 1/1
               Address: 10.0.0.17
                        fe80::6257:18ff:fe6e:65d1
               Gateway: 10.0.0.1 (<brand name of my gateway>)
                   DNS: 10.0.0.1

alexbakker · 2019-10-14T11:57:59Z

Looks like the loopback interface has lost its configuration as it doesn't have any addresses. This sounds like the same issue as I reported here.

wucke13 · 2019-10-14T12:03:16Z

@alexbakker Ah yes!

arianvp · 2019-10-14T12:57:26Z

That networkd is managing the lo interface by default seems undesirable to me. We should probably exclude it in the generated config. For completeness sake, could you post

$ cat /etc/systemd/network/99-main.network

here?

networkd managing the lo interface should be supported as far as I understand. That stuff goes bananas after suspend sounds like an upstream systemd-networkd issue to me. and should probably be filed as a bug with the systemd project. I'd like to try to reproduce this on a non-NixOS distro using your 99-main.network and then see if we can file an upstream issue for it.

For you to get further and not have a buggy experience, I advice to make lo explicitly unmanaged by default like in the issue @alexbakker linked. We should also probably make this the default; and backport it to 19.09.

wucke13 · 2019-10-14T16:54:11Z

@arianvp Sure!

[Network]
DHCP=yes
IPv6AcceptRA=no
DNSSEC=no

Many thanks for the information, this makes sence.

arianvp · 2019-10-14T17:05:21Z

Interesting. Could you show your networking config section of your nixos configuration? I'm kind of surprised it generated a file with DHCP=yes on all interfaces. I thought we removed this from 19.09. It should've given you an evaluation error. I wonder if this is also what is causing the issues. Asking for a DHCP lease on loopback is a bit weird.

as per release notes:

The networking.useDHCP option is unsupported in combination with networking.useNetworkd in anticipation of defaulting to it by default. It has to be set to false and enabled per interface with networking.interfaces..useDHCP = true;

The origianl PR:
#69302

globin · 2019-10-15T07:14:44Z

I have experienced the lo interface losing its IP addresses when managed by systemd, as far as I could tell at suspend/wakeup. Since, I have removed the 99-main for me, which we will remove soon for 20.03, too.

cc @fpletz

wucke13 · 2019-10-15T07:40:32Z

Could you show your networking config section of your nixos configuration?

Sure!

networking = {
    useNetworkd = true;
    useDHCP = false;
    firewall. enable = false;
};
systemd.services."systemd-networkd-wait-online".enable = false;
systemd.network = {
  enable = true;
  networks = {
    "10-unmanaged" = {
      matchConfig.Name = "docker* nlmon* tun* virbr* zt*";
      linkConfig.Unmanaged = true;
    };
    "99-main" = {
      DHCP = "yes";
      extraConfig = ''
        IPv6AcceptRA=no
        DNSSEC=no
      '';
    };
  };
};

So looks like it is my fault, as I have DHCP=yes. Would you mind sharing a basic idiomatic sample for a systemd-networkd config?

arianvp · 2019-10-15T07:46:28Z

    "99-main" = {
      MatchConfig.Name = "en* wl*"; 
      DHCP = "yes";
      extraConfig = ''
        IPv6AcceptRA=no
        DNSSEC=no
      '';
    };

Should do the job in your case. Explicitly only match interfaces for which you want to actually do DHCP.

Or you could add lo to your Umanaged list:

  networks = {
    "10-unmanaged" = {
      matchConfig.Name = "lo docker* nlmon* tun* virbr* zt*";
      linkConfig.Unmanaged = true;
    };
    "99-main" = {
      matchConfig.Name = "*";
      DHCP = "yes";
      extraConfig = ''
        IPv6AcceptRA=no
        DNSSEC=no
      '';
    };
  };

Both will work. One is blacklist based, the other whitelist based. Note that we'll probably move to a model where all interfaces are unmanaged by default in 20.03, so I would say method 1 is a bit more idiomatic.

arianvp · 2019-10-15T08:07:31Z

I was looking to see if this is a systemd bug, but this behaviour is documented. It's not a bug. From the systemd-networkd man page (emphasis mine):

systemd-networkd will manage network addresses and routes for any link for which it finds a .network file with an appropriate [Match] section, see systemd.network(5). For those links, it will flush existing network addresses and routes when bringing up the device.

The linux kernel brings lo up. So when systemd-network starts, there is nothing for it to do. However when you suspend, and unsuspend, lo will be down, and systemd-networkd tries to configure it, which will flush the addresses before configuring, making lo lose the 127.0.0.1 address. However this might be a buggy part. the lo interface isn't down when you unsuspend, I think networkd merely thinks it is down and then flushes its config. It's a bit of a weird edge case that we could classify as a bug perhaps?

Another way to fix this would be to have an explicit configuration for loopback, and make it a very high presedence. This will cause the loopback interface to always be correctly configured. But I think explicitly not managing the lo interface is probably better.

networks."00-loopback" = {  matchConfig.Name = "lo";   Address = "127.0.0.1"; };

alexbakker · 2019-10-15T08:13:10Z

I would personally much rather have the option to have networkd not touch any network interfaces by default, except for the ones that have a config in configuration.nix.

arianvp · 2019-10-15T08:21:59Z

Agreed. That is the option we're planning to implement soon. (See @globin 's reaciton)

wucke13 · 2019-10-15T08:36:32Z

Issue is solved by #71106 (comment), thank you all!

Just maching all network interfaces caused many breakages, see NixOS#18962 and NixOS#71106. We already don't support the global networking.useDHCP, networking.defaultGateway(6) options if networking.useNetworkd is enabled, but direct users to configure the per-device networking.interfaces.<name?>.… options.

wucke13 added the 0.kind: bug label Oct 14, 2019

wucke13 closed this as completed Oct 15, 2019

flokli mentioned this issue Oct 23, 2019

nixos/network: remove 99-main.network #71790

Merged

10 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Various software refuses to bind to it's listenaddress #71106

Various software refuses to bind to it's listenaddress #71106

wucke13 commented Oct 14, 2019 •

edited

Loading

arianvp commented Oct 14, 2019

arianvp commented Oct 14, 2019

wucke13 commented Oct 14, 2019

arianvp commented Oct 14, 2019

wucke13 commented Oct 14, 2019

arianvp commented Oct 14, 2019

wucke13 commented Oct 14, 2019

alexbakker commented Oct 14, 2019

wucke13 commented Oct 14, 2019

arianvp commented Oct 14, 2019

wucke13 commented Oct 14, 2019 •

edited

Loading

arianvp commented Oct 14, 2019 •

edited

Loading

globin commented Oct 15, 2019 •

edited

Loading

wucke13 commented Oct 15, 2019

arianvp commented Oct 15, 2019 •

edited

Loading

arianvp commented Oct 15, 2019 •

edited

Loading

alexbakker commented Oct 15, 2019

arianvp commented Oct 15, 2019 •

edited

Loading

wucke13 commented Oct 15, 2019

Various software refuses to bind to it's listenaddress #71106

Various software refuses to bind to it's listenaddress #71106

Comments

wucke13 commented Oct 14, 2019 • edited Loading

arianvp commented Oct 14, 2019

arianvp commented Oct 14, 2019

wucke13 commented Oct 14, 2019

arianvp commented Oct 14, 2019

wucke13 commented Oct 14, 2019

arianvp commented Oct 14, 2019

wucke13 commented Oct 14, 2019

alexbakker commented Oct 14, 2019

wucke13 commented Oct 14, 2019

arianvp commented Oct 14, 2019

wucke13 commented Oct 14, 2019 • edited Loading

arianvp commented Oct 14, 2019 • edited Loading

globin commented Oct 15, 2019 • edited Loading

wucke13 commented Oct 15, 2019

arianvp commented Oct 15, 2019 • edited Loading

arianvp commented Oct 15, 2019 • edited Loading

alexbakker commented Oct 15, 2019

arianvp commented Oct 15, 2019 • edited Loading

wucke13 commented Oct 15, 2019

wucke13 commented Oct 14, 2019 •

edited

Loading

wucke13 commented Oct 14, 2019 •

edited

Loading

arianvp commented Oct 14, 2019 •

edited

Loading

globin commented Oct 15, 2019 •

edited

Loading

arianvp commented Oct 15, 2019 •

edited

Loading

arianvp commented Oct 15, 2019 •

edited

Loading

arianvp commented Oct 15, 2019 •

edited

Loading