Services depending on keys.target can cause hanging boots on NixOS containers #67265

Closed
Ma27 opened this issue Aug 22, 2019 · 5 comments

Labels: 0.kind: bug · 6.topic: nixos · 6.topic: nixos-container (Imperative and declarative systemd-nspawn containers)

Comments

@Ma27
Member

Ma27 commented Aug 22, 2019

Describe the bug

When starting an imperative NixOS container deployed with the container backend from NixOps, with several secrets uploaded via the deployment.keys module and a dovecot2 unit from services.dovecot installed, the boot times out and the container fails: it waits indefinitely for keys.target (a systemd target that indicates whether all keys from NixOps were uploaded successfully).

This happens because several modules from nixpkgs (including dovecot) wait for keys.target by default while also being wanted by multi-user.target, which causes the system to wait until the unit has started up (which is only supposed to happen once keys.target is reached).
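
For illustration, the dependency in the dovecot module looks roughly like this (a simplified sketch; the exact module code may differ):

``` nix
systemd.services.dovecot2 = {
  # Started as part of the normal boot sequence ...
  wantedBy = [ "multi-user.target" ];
  # ... but only after the NixOps keys have been uploaded.
  wants = [ "keys.target" ];
  after = [ "keys.target" "network.target" ];
};
```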

The problem with NixOS containers is that, when using scripted networking, they don't have a proper uplink until container@name.service has fully started, because the ve-<name> interface on the host side is only configured after the container is up:

``` nix
postStartScript = (cfg:
  let
    ipcall = cfg: ipcmd: variable: attribute:
      if cfg.${attribute} == null then
        ''
          if [ -n "${variable}" ]; then
            ${ipcmd} add ${variable} dev $ifaceHost
          fi
        ''
      else
        ''${ipcmd} add ${cfg.${attribute}} dev $ifaceHost'';
    renderExtraVeth = name: cfg:
      if cfg.hostBridge != null then
        ''
          # Add ${name} to bridge ${cfg.hostBridge}
          ip link set dev ${name} master ${cfg.hostBridge} up
        ''
      else
        ''
          echo "Bring ${name} up"
          ip link set dev ${name} up
          # Set IPs and routes for ${name}
          ${optionalString (cfg.hostAddress != null) ''
            ip addr add ${cfg.hostAddress} dev ${name}
          ''}
          ${optionalString (cfg.hostAddress6 != null) ''
            ip -6 addr add ${cfg.hostAddress6} dev ${name}
          ''}
          ${optionalString (cfg.localAddress != null) ''
            ip route add ${cfg.localAddress} dev ${name}
          ''}
          ${optionalString (cfg.localAddress6 != null) ''
            ip -6 route add ${cfg.localAddress6} dev ${name}
          ''}
        '';
  in
    ''
      if [ -n "$HOST_ADDRESS" ] || [ -n "$LOCAL_ADDRESS" ] ||
         [ -n "$HOST_ADDRESS6" ] || [ -n "$LOCAL_ADDRESS6" ]; then
        if [ -z "$HOST_BRIDGE" ]; then
          ifaceHost=ve-$INSTANCE
          ip link set dev $ifaceHost up
          ${ipcall cfg "ip addr" "$HOST_ADDRESS" "hostAddress"}
          ${ipcall cfg "ip -6 addr" "$HOST_ADDRESS6" "hostAddress6"}
          ${ipcall cfg "ip route" "$LOCAL_ADDRESS" "localAddress"}
          ${ipcall cfg "ip -6 route" "$LOCAL_ADDRESS6" "localAddress6"}
        fi
        ${concatStringsSep "\n" (mapAttrsToList renderExtraVeth cfg.extraVeths)}
      fi
      # Get the leader PID so that we can signal it in
      # preStop. We can't use machinectl there because D-Bus
      # might be shutting down. FIXME: in systemd 219 we can
      # just signal systemd-nspawn to do a clean shutdown.
      machinectl show "$INSTANCE" | sed 's/Leader=\(.*\)/\1/;t;d' > "/run/containers/$INSTANCE.pid"
    ''
);
```

Because the container is unreachable until start-up has finished, it's impossible to send keys to it during an unattended reboot, so keys.target is never reached. The system then waits for dovecot2.service (which currently depends on keys.target), and the resulting timeout keeps the container from starting up properly.

In my case the issue wouldn't exist if dovecot2.service didn't depend on keys.target, as I currently only deploy secrets for services.borgbackup, so it's completely unnecessary for dovecot2.service to wait for that target.

My current workaround looks like this:

``` nix
{ lib, ... }: {
  systemd.services.dovecot2 = {
    wants = lib.mkForce [ ];
    after = lib.mkForce [ "network.target" ];
  };
}
```

To Reproduce
Steps to reproduce the behavior (a minimal deployment sketch follows the list):

  1. Deploy a container with deployment.targetEnv = "container";
  2. Deploy several secrets with deployment.keys and a dovecot instance using services.dovecot
  3. Try to reboot the container
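
A minimal NixOps deployment expression corresponding to these steps might look like this (a sketch; the machine name `mailbox`, the key name `borg-passphrase`, and the key contents are hypothetical placeholders):

``` nix
{
  mailbox = { ... }: {
    # 1. Use the NixOps container backend.
    deployment.targetEnv = "container";

    # 2. Upload a secret via deployment.keys (placeholder contents) ...
    deployment.keys."borg-passphrase".text = "not-a-real-secret";

    # ... and enable a service whose module waits for keys.target.
    services.dovecot2.enable = true;
  };
}
```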

Expected behavior

I originally expected that no service would wait for the keys on its own without being explicitly configured to do so, as the NixOps manual recommends using the <key-name>-key.service units and explicitly adding them to the units in question.

However, one might also argue that the actual issue is the broken uplink for NixOS containers at boot, so I'd like to gather some opinions before filing a patch :)

Maintainer information:

# a list of nixpkgs attributes affected by the problem
attribute:
# a list of nixos modules affected by the problem
module:
  - systemd
  - services.dovecot
  - services.httpd
  - services.nsd
  - services.strongswan
  - services.strongswan-swanctl

CCing @edolstra @hrdinka (for dovecot2) and @lheckemann (as we talked about this earlier that day)

@hrdinka
Contributor

hrdinka commented Aug 23, 2019

Hi,

Thanks for the detailed write-up. I added the keys.target dependency to dovecot back then. My problem at the time was that dovecot, and with it the whole system, would not start because the key files were missing. In principle every service could depend on a key deployed by NixOps, so a solution that fixes this problem for all services would be favorable.

While it would be great to have a proper replacement, finding one isn't easy for the reasons described above. Since NixOps now covers this in its documentation (it didn't back then), I am in favor of removing the keys.target dependency from all services. We should, however, add this to the NixOS release notes and wait for NixOS 19.09 rather than backporting it to the current stable release.

@lheckemann
Member

Absolutely agree that this shouldn't go into 19.03, but yeah, I'm also in favour of making the change on master before the feature freeze (7th September) :)

@Ma27
Member Author

Ma27 commented Aug 23, 2019

Thanks for the feedback! I'll open a PR tomorrow that removes the dependencies on keys.target from the modules in <nixpkgs/nixos>.

However, I'd keep this issue open after that until we've discussed whether keys.target should be declared in a module in NixOps.
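
For reference, declaring the target on the NixOps side could look roughly like this (a hypothetical sketch, not existing NixOps or nixpkgs code):

``` nix
# Hypothetical: a NixOps-provided module could declare the target itself,
# so that individual nixpkgs service modules no longer need to reference it.
{
  systemd.targets.keys = {
    description = "All NixOps keys have been uploaded";
  };
}
```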

Ma27 added a commit to Ma27/nixpkgs that referenced this issue Aug 27, 2019
The `keys.target` is used to indicate whether all NixOps keys were
successfully uploaded on an unattended reboot. However this can cause
startup issues e.g. with NixOS containers (see NixOS#67265) and can block
boots even though this might not be needed (e.g. with a dovecot2
instance running that doesn't need any of the NixOps keys).

As described in the NixOps manual[1], dependencies to keys should be
defined like this now:

``` nix
{
  systemd.services.myservice = {
    after = [ "secret-key.service" ];
    wants = [ "secret-key.service" ];
  };
}
```

However I'd leave the issue open until it's discussed whether or not to
keep `keys.target` in `nixpkgs`.

[1] https://nixos.org/nixops/manual/#idm140737322342384
@Ma27
Member Author

Ma27 commented Apr 16, 2020

The actual issue has been fixed for 19.09 already, so this should be closable now.

@Ma27 Ma27 closed this as completed Apr 16, 2020
@nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/issues-using-nixos-container-to-set-up-an-etcd-cluster/8438/2
