nixops deploy doesn't bring server in desired state because it doesn't start stopped systemd units #1063

Open
nh2 opened this issue Dec 15, 2018 · 7 comments

Comments

@nh2
Contributor

nh2 commented Dec 15, 2018

If my nginx shuts down because of some failure in another systemd unit that it depends on, and I fix the issue and deploy with nixops, my nginx isn't actually started.

In general, when I run nixops deploy, the desired state isn't reached if a unit stopped for some reason.

It's only reached with --force-reboot or nixops reboot.

I think this is because nixops deploy doesn't actually run systemctl isolate on any target such as multi-user.target (the default target that NixOS service modules list in requiredBy when a service should be started).

I propose that we should probably systemctl isolate multi-user.target during nixops deploy.

Any opposing views?

@nh2
Contributor Author

nh2 commented Dec 15, 2018

In other words, nixops deploy is not congruent in the terminology of https://blog.flyingcircus.io/2016/05/06/thoughts-on-systems-management-methods/.

@nh2
Contributor Author

nh2 commented Jan 10, 2019

Here's some more info on how things work right now:

https://github.com/NixOS/nixpkgs/blob/542ef2b182dff9756abf782a650f80599c515e4a/nixos/modules/system/activation/switch-to-configuration.pl#L69

getActiveUnits uses systemctl list-units --full --no-legend.

After multi-user.target has been activated successfully, that looks like this:

...
systemd-udevd-kernel.socket                                                              loaded active running   udev Kernel Socket                                                
basic.target                                                                             loaded active active    Basic System                                                      
encrypted-links.target                                                                   loaded active active    All Encrypted Links                                               
getty.target                                                                             loaded active active    Login Prompts                                                     
local-fs-pre.target                                                                      loaded active active    Local File Systems (Pre)                                          
local-fs.target                                                                          loaded active active    Local File Systems                                                
multi-user.target                                                                        loaded active active    Multi-User System                                                 
network-interfaces.target                                                                loaded active active    All Network Interfaces (deprecated)                               
network-online.target                                                                    loaded active active    Network is Online                                                 
network-pre.target                                                                       loaded active active    Network (Pre)                                                     
network.target                                                                           loaded active active    Network                                                           
nss-lookup.target                                                                        loaded active active    Host and Network Name Lookups                                     
...

and we can see:

# systemctl status multi-user.target
● multi-user.target - Multi-User System
   Loaded: loaded (/nix/store/3hmpbbcv1db42m9g34c9g4q6qinw50x4-systemd-237/example/systemd/system/multi-user.target; linked; vendor preset: enabled)
   Active: active since Thu 2019-01-10 21:00:22 UTC; 1min 15s ago
     Docs: man:systemd.special(7)

But as soon as one service having RequiredBy = [ "multi-user.target" ] stops, the multi-user.target is no longer active. For example, if I systemctl stop myservice then it looks like this:

# systemctl status multi-user.target
● multi-user.target - Multi-User System
   Loaded: loaded (/nix/store/3hmpbbcv1db42m9g34c9g4q6qinw50x4-systemd-237/example/systemd/system/multi-user.target; linked; vendor preset: enabled)
   Active: inactive (dead) since Thu 2019-01-10 21:02:27 UTC; 2s ago
     Docs: man:systemd.special(7)

That also makes it disappear from the systemctl list-units --full --no-legend list. Consequently, the code at https://github.com/NixOS/nixpkgs/blob/542ef2b182dff9756abf782a650f80599c515e4a/nixos/modules/system/activation/switch-to-configuration.pl#L174-L177 is not executed, and multi-user.target is not started by switch-to-configuration (which nixops calls).

So, if any service required by multi-user.target has stopped (thus stopping multi-user.target), nixops deploy currently will not start the declared units again, and thus it's not congruent.
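
To make the reasoning above concrete, here is a minimal Python sketch of the check as I understand it. It is not the actual Perl code from switch-to-configuration.pl; the function names `parse_active_units` and `would_start_target` are made up for illustration, and it assumes that a target is only queued for starting if it appears in the `systemctl list-units --full --no-legend` output.

```python
def parse_active_units(list_units_output):
    """Parse `systemctl list-units --full --no-legend` text.

    Returns a dict mapping unit name -> (load, active, sub).
    Hypothetical helper, loosely modelled on what getActiveUnits is
    described as doing; it is not a port of the Perl code.
    """
    units = {}
    for line in list_units_output.splitlines():
        # Columns: UNIT LOAD ACTIVE SUB DESCRIPTION (description may contain spaces).
        fields = line.split(None, 4)
        if len(fields) < 4:
            continue  # skip blank lines and elision markers such as "..."
        name, load, active, sub = fields[:4]
        units[name] = (load, active, sub)
    return units


def would_start_target(list_units_output, target="multi-user.target"):
    """Assumption: only targets present in the list are queued for starting."""
    return target in parse_active_units(list_units_output)
```

With the output shown earlier (multi-user.target listed as active) `would_start_target` returns True; once a required service has stopped and the target has dropped out of the list, it returns False, which matches the behaviour described in this comment.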

@nh2
Contributor Author

nh2 commented Jan 10, 2019

A remaining question is whether it should be nixops or switch-to-configuration that is to be made congruent.

nh2 added a commit to nh2/nixops that referenced this issue Jan 11, 2019
This makes `nixops deploy` congruent (the declared services
will always be running after deploy, no matter what the state
of the server was before the deploy).
@nh2
Contributor Author

nh2 commented Jan 11, 2019

PR at #1078

@allgreed

allgreed commented Aug 5, 2019

@nh2 until your work gets merged, do you think there's a better workaround than running systemctl isolate ... "by hand" after every deployment?

With regard to your question: would it hurt to have it in both places at some point, with nixops leading the way?

@deepfire
Contributor

I have the same issue with nixops: units do not reach their nominal state post-deploy, despite being enabled in the NixOps network definition:

   Loaded: loaded (/nix/store/00g5g2ws4032brlm4fwb7lakmdcgyi0z-unit-foo.service/foo.service; enabled; vendor preset: enabled)
   Active: inactive (dead)

@JosephLucas
Contributor

I stumbled on this issue too. I have to deploy like this to re-trigger multi-user.target:

nixops deploy -d <deployment> && sleep 5 && nixops ssh-for-each -d <deployment> "systemctl isolate multi-user.target"
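
For anyone who wants to script this rather than type it, here is a rough Python sketch of the same workaround. The function names `workaround_commands` and `deploy_and_isolate` are made up for illustration; it uses only the two nixops invocations shown above and assumes `nixops` is on the PATH.

```python
import subprocess
import time


def workaround_commands(deployment, target="multi-user.target"):
    """Build the two nixops invocations from the one-liner above."""
    deploy = ["nixops", "deploy", "-d", deployment]
    # The remote command is passed as one argument, as in the quoted shell form.
    isolate = ["nixops", "ssh-for-each", "-d", deployment,
               f"systemctl isolate {target}"]
    return [deploy, isolate]


def deploy_and_isolate(deployment, delay_seconds=5):
    """Run deploy, wait, then isolate the target on every machine."""
    deploy, isolate = workaround_commands(deployment)
    # check=True aborts on a non-zero exit status, mirroring `&&` in the shell.
    subprocess.run(deploy, check=True)
    time.sleep(delay_seconds)
    subprocess.run(isolate, check=True)
```

Calling `deploy_and_isolate("<deployment>")` with your deployment name behaves like the shell one-liner, including the 5 second pause.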

nh2 added a commit to nh2/nixops that referenced this issue Jul 1, 2021
This makes `nixops deploy` congruent (the declared services
will always be running after deploy, no matter what the state
of the server was before the deploy).
nh2 added a commit to nh2/nixpkgs that referenced this issue Nov 1, 2021
Details on NixOS/nixops#1063 (comment).

`partOf` means that if `smokeping.service` is stopped, `thttpd.service` will
be stopped as well.
(But not that `thttpd` will be started when `smokeping` is started.)

Once `thttpd.service` is stopped that way, `Restart = always` will not apply.

When the smokeping config options are changed, NixOS's `switch-to-configuration.pl`
will stop `smokeping` (which shuts down thttpd due to `partOf`), and then restart
smokeping; but this does not start thttpd.
As a result, thttpd will be off after changing the config, which isn't desired.

This commit fixes it by removing the `partOf`, which makes `Restart` work
as expected.
erictapen pushed a commit to NixOS/nixpkgs that referenced this issue Nov 1, 2021