This repository has been archived by the owner on Jun 20, 2024. It is now read-only.

systemd weave script issues #1607

Closed
WitchDoc42 opened this issue Oct 28, 2015 · 17 comments

@WitchDoc42

I have some issues with the systemd script. I run weave on a single system, without peers. The "docker attach weave" command is unnecessary in my case and, since no peers are found, hangs. (I'm not sure how long the time-out is; I have not waited for it.) I resolved this issue by adding a second "service" that executes docker attach weave and that requires weave.service. Since I do not have peers at the moment, this service is not enabled on my system, and I cannot test it either.

weave-peers.service;

[Unit]
Description=Connect Weave Network to peers
Documentation=http://docs.weave.works/weave/latest_release/
Requires=weave.service
After=weave.service
[Service]
EnvironmentFile=-/etc/sysconfig/weave
ExecStart=/usr/bin/docker attach weave
[Install]
WantedBy=multi-user.target

I removed the docker line from the original script and made weave launch the ExecStart target. That was not enough, because the ExecStop target is fired directly after ExecStart, killing the just-started weave instance. This was fixed by adding "RemainAfterExit=yes".

weave.service;

[Unit]
Description=Weave Network
Documentation=http://docs.weave.works/weave/latest_release/
Requires=docker.service
After=docker.service
[Service]
RemainAfterExit=yes
EnvironmentFile=-/etc/sysconfig/weave
ExecStart=/usr/local/bin/weave launch $PEERS
ExecStop=/usr/local/bin/weave stop
[Install]
WantedBy=multi-user.target
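
The ExecStop behaviour described above is consistent with how systemd handles a service whose ExecStart command exits: with the default Type=simple, the unit is treated as stopped once that process ends. A minimal sketch of a variant that states this intent explicitly follows. Type=oneshot is an addition that is not in the unit above, and the scratch directory and the UNIT_DIR variable are hypothetical; nothing is installed into systemd's search path.

```shell
# Sketch only: write a oneshot variant of weave.service to a scratch
# directory for review. UNIT_DIR is a hypothetical override.
unit_dir="${UNIT_DIR:-$(mktemp -d)}"

cat > "$unit_dir/weave.service" <<'EOF'
[Unit]
Description=Weave Network
Requires=docker.service
After=docker.service

[Service]
# "weave launch" starts its containers and then exits, so declare that a
# finished ExecStart is expected (oneshot) and that the unit should still
# count as active afterwards (RemainAfterExit).
Type=oneshot
RemainAfterExit=yes
EnvironmentFile=-/etc/sysconfig/weave
ExecStart=/usr/local/bin/weave launch $PEERS
ExecStop=/usr/local/bin/weave stop

[Install]
WantedBy=multi-user.target
EOF

echo "wrote $unit_dir/weave.service"
```

With this layout systemd does not keep a foreground process for the router, so systemctl status weave would not show the router's log output.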

I am running weave on an up-to-date CentOS 7 host. I do not know much about systemd and cannot say whether the above modification is desirable, but it works for my environment.

@WitchDoc42
Author

I would like to add that I moved the files to /usr/lib/systemd/system and made a symbolic link in /etc/systemd/system, as that seems to comply with existing systemd scripts.

@rade
Member

rade commented Oct 28, 2015

The "docker attach weave" command is unnecessary in my case and, since no peers are found, hangs

The attach command is supposed to "hang".

What exactly isn't working?

@WitchDoc42
Author

From the command line "weave launch" starts two docker containers: weave and weaveproxy. When I reboot my machine with the original systemd script, only the weave container is running and it cannot be stopped by systemctl. systemctl status weave gives the following errors:

weave.service operation timed out. exiting
Failed to start Weave Network.
Unit weave.service entered failed state.

With the modification above both containers are started.

@rade
Member

rade commented Oct 29, 2015

This doesn't make much sense, tbh. Alas I am not a systemd expert. @errordeveloper might be able to figure out what's going on here.

@errordeveloper
Contributor

First of all, @WitchDoc42, thank you for reporting this issue to us. As you said, you are new to systemd, so please let me try to demystify this a little bit.

As @rade said, the ExecStart=/usr/bin/docker attach weave statement in weave.service is intended to run in the foreground.

I am not sure why you say that it "hangs" when no peers are found. I assume you mean the case where the $PEERS variable is unset and expands to an empty string, which should work exactly the same as when $PEERS is set. Could you please tell me exactly how you determined that it "hangs"?

The way you split one unit into two (with RemainAfterExit=yes and ExecStart=/usr/local/bin/weave launch $PEERS in one, and ExecStart=/usr/bin/docker attach weave in the other) is definitely misleading. Once again, the point of docker attach weave is to keep a process running in the foreground, which keeps systemd happy and also forwards the logs to the journal, so when you run systemctl status weave you will see a few of the most recent log messages from the Weave router.
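
For comparison, the single-unit layout described above can be sketched as follows. This is an illustration assembled from the commands quoted in this thread, not a copy of the unit shipped with Weave; placing weave launch in ExecStartPre is an assumption, and the scratch directory and UNIT_DIR variable are hypothetical.

```shell
# Sketch only: one unit that launches weave and then stays attached to it.
# The file is written to a scratch directory, not installed.
unit_dir="${UNIT_DIR:-$(mktemp -d)}"

cat > "$unit_dir/weave.service" <<'EOF'
[Unit]
Description=Weave Network
Requires=docker.service
After=docker.service

[Service]
EnvironmentFile=-/etc/sysconfig/weave
# Start the containers first; this command returns once they are up.
ExecStartPre=/usr/local/bin/weave launch $PEERS
# Stay in the foreground so systemd has a main process to track and the
# router's output ends up in the journal.
ExecStart=/usr/bin/docker attach weave
ExecStop=/usr/local/bin/weave stop

[Install]
WantedBy=multi-user.target
EOF

echo "wrote $unit_dir/weave.service"
```

Here the attach command is expected to block for as long as the router runs, which is why it appears to "hang" when run by hand.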

You certainly don't need to put unit files in /usr/lib/systemd/system and make symlinks to those from /etc/systemd/system. That doesn't change anything really and it's fine to just put your unit files in /etc/systemd/system.

@errordeveloper
Contributor

From the command line "weave launch" starts two docker containers: weave and weaveproxy. When I reboot my machine with the original systemd script, only the weave container is running and it cannot be stopped by systemctl

Could you please provide the output of journalctl -u weave?
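
A small helper for collecting that output might look like the sketch below. The function name weave_journal is hypothetical; -u, -b and --no-pager are standard journalctl options that select the unit, limit the output to the current boot, and print directly to the terminal.

```shell
# Show the weave unit's journal for the current boot, without a pager.
# weave_journal is a hypothetical helper name; any extra journalctl
# options (for example -n 50) are passed through unchanged.
weave_journal() {
    journalctl -u weave -b --no-pager "$@"
}
```

For example, weave_journal -n 50 would print the last 50 entries, which is usually enough to paste into an issue.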

@rade
Member

rade commented Nov 4, 2015

@WitchDoc42 ping

@rade rade added this to the n/a milestone Nov 10, 2015
@rade
Member

rade commented Nov 10, 2015

@WitchDoc42 I am closing this; if you have more info please re-open.

@rade rade closed this as completed Nov 10, 2015
@DanielDent

I don't know if I'm experiencing the same issue as @WitchDoc42, but I am having problems getting systemd to start weave automatically on boot.

When I use the documented procedure (having systemd run "launch"), it hangs. Sometimes I get complaints about the weave plugin or the weave proxy having already started. I split my launch into three commands and start the plugin, proxy, and router on their own.

# systemctl status weave
● weave.service - Weave Network
   Loaded: loaded (/etc/systemd/system/weave.service; enabled)
   Active: failed (Result: timeout) since Fri 2016-01-08 18:11:11 PST; 51s ago
     Docs: http://docs.weave.works/weave/latest_release/
  Process: 880 ExecStartPre=/usr/local/bin/weave launch-plugin (code=exited, status=0/SUCCESS)
  Process: 678 ExecStartPre=/usr/local/bin/weave launch-proxy (code=exited, status=0/SUCCESS)
  Process: 418 ExecStartPre=/usr/local/bin/weave launch-router $WEAVE_IP_CONFIG $WEAVE_PEERS (code=killed, signal=TERM)

Jan 08 18:11:06 dev systemd[1]: weave.service start-pre operation timed out. Terminating.
Jan 08 18:11:09 dev weave[678]: 114bfa65993d11205fd02ce1e420124991d9ddba971f275371995c9d7cc8172d
Jan 08 18:11:11 dev weave[880]:     6992193e1d4cdd3ff7a48d238b3a2807fe197d0f65db78347ac177866ad2b85c
Jan 08 18:11:11 dev systemd[1]: Failed to start Weave Network.
Jan 08 18:11:11 dev systemd[1]: Unit weave.service entered failed state.

Weave still fails to start on boot. When I look, there is a docker run command which the weave launch started, and the docker run command simply hangs.

When I run the exact same weave launch command that hangs when executed by systemd, weave starts fine.

I thought there might be some kind of race condition with startup order. I tried having a 10 second delay before the weave launch is executed. This did not help.

@rade rade removed this from the n/a milestone Jan 10, 2016
@rade
Member

rade commented Jan 10, 2016

@DanielDent

When I look, there is a docker run command which the weave launch started, and the docker run command simply hangs.

Does weave report work at that point? If so, please post the output.

@rade rade reopened this Jan 10, 2016
@DanielDent

@rade Weave report just complains the weave container isn't running:

# weave report
weave container is not running.

Using an @reboot cronjob which waits 10 seconds and then launches weave works. Here are some logs of a failed systemd based startup.

docker[467]: time="2016-01-10T12:26:39.016425886-08:00" level=info msg="Loading containers: done."
docker[467]: time="2016-01-10T12:26:39.017143275-08:00" level=info msg="Daemon has completed initialization"
docker[467]: time="2016-01-10T12:26:39.017721693-08:00" level=info msg="Docker daemon" commit=a34a1d5 execdriver=native-0.2 graphdriver=aufs version=1.9.1
docker[467]: time="2016-01-10T12:26:39.464736343-08:00" level=info msg="GET /v1.21/version"
docker[467]: time="2016-01-10T12:26:39.466844550-08:00" level=info msg="GET /v1.21/version"
docker[467]: time="2016-01-10T12:26:39.468282614-08:00" level=info msg="GET /events"
docker[467]: time="2016-01-10T12:26:39.490367457-08:00" level=info msg="GET /v1.21/version"
docker[467]: time="2016-01-10T12:26:39.491880613-08:00" level=info msg="GET /v1.21/networks/weave"
docker[467]: time="2016-01-10T12:26:39.491989679-08:00" level=error msg="Handler for GET /v1.21/networks/weave returned error: network weave not found"
docker[467]: time="2016-01-10T12:26:39.492034486-08:00" level=error msg="HTTP Error" err="network weave not found" statusCode=404
docker[467]: time="2016-01-10T12:26:39.493591512-08:00" level=info msg="POST /v1.21/networks/create"
docker[467]: time="2016-01-10T12:26:39.497188687-08:00" level=info msg="GET /v1.21/containers/weave/json"
docker[467]: time="2016-01-10T12:26:39.501044513-08:00" level=warning msg="Unable to connect to plugin: /run/docker/plugins/weavemesh.sock, retrying in 1s"
docker[467]: time="2016-01-10T12:26:40.502040019-08:00" level=warning msg="Unable to connect to plugin: /run/docker/plugins/weavemesh.sock, retrying in 2s"
docker[467]: time="2016-01-10T12:26:42.502555755-08:00" level=warning msg="Unable to connect to plugin: /run/docker/plugins/weavemesh.sock, retrying in 4s"
docker[467]: time="2016-01-10T12:26:46.503128157-08:00" level=warning msg="Unable to connect to plugin: /run/docker/plugins/weavemesh.sock, retrying in 8s"
docker[467]: time="2016-01-10T12:26:54.503594194-08:00" level=error msg="Handler for POST /v1.21/networks/create returned error: Post http:///run/docker/plugins/weavemesh.sock/IpamDriver.RequestPool: http: ContentLength=79 with Body length 0"
docker[467]: time="2016-01-10T12:26:54.503655000-08:00" level=error msg="HTTP Error" err="Post http:///run/docker/plugins/weavemesh.sock/IpamDriver.RequestPool: http: ContentLength=79 with Body length 0" statusCode=500

I think the issue may have something to do with the weave network interfaces not existing/getting created.

@rade
Member

rade commented Jan 10, 2016

So this is definitely a different issue to the one reported by the OP since their report predates the incorporation of the Weave docker network plugin into weave launch, and that, according to the logs, is where things are going wrong.

Do you experience any issues when not launching the plugin? (Warning: docker becomes very unhappy if it cannot find a plugin that it thinks is still in use).

@DanielDent

Here's a boot where I reverted to doing a regular weave launch, and before rebooting, I docker rm -f'd all containers - even dead containers - such that the container with a docker restart policy isn't around.

systemd[1]: Starting Weave Network...
systemd[1]: weave.service start-pre operation timed out. Terminating.
systemd[1]: Failed to start Weave Network.
systemd[1]: Unit weave.service entered failed state.

To further narrow things down, I ran a fresh boot with all containers removed prior to boot. On this boot, the only thing systemd is running is weave launch-router (as opposed to weave launch, or each of the 3 services individually). I get the exact same output.

It feels to me like the weave router depends on some part of the boot process that has not yet completed when docker, and then weave, get launched.

@rade
Member

rade commented Jan 10, 2016

Please post the unit definitions for that most recent of your tests.

Are you sure Docker has forgotten about the weave network? It remembers things even across reboots. Make sure there isn't anything unusual in the docker daemon logs (i.e. the kind of stuff you saw in #1607 (comment)).

Also, starting weave as WEAVE_DEBUG=1 weave launch-router and capturing stdout and stderr should give us a clue where it is hanging.
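
One way to do that capture from a shell is sketched below. The function name capture_weave_launch and the default log path are hypothetical; the command itself is the one given above.

```shell
# Run the router launch with debug tracing and keep everything it prints.
# capture_weave_launch and the default log path are hypothetical names.
capture_weave_launch() {
    log="${1:-/tmp/weave-launch.log}"
    # "2>&1" after the file redirect sends stderr to the same file as stdout.
    WEAVE_DEBUG=1 weave launch-router > "$log" 2>&1
    status=$?
    echo "weave launch-router exited with status $status" >> "$log"
    return "$status"
}
```

If the launch hangs, the log file still holds the output up to that point. Inside a unit file the same variable could be set with Environment=WEAVE_DEBUG=1, in which case the output would go to the journal instead of a file.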

@DanielDent

Thanks for your help here, I really appreciate it. It turns out the issue had nothing to do with Weave and everything to do with a broken Docker installation.

When I spun up a fresh VM & ran my CM tools against it to set up a minimal test environment, the problem stopped being reproducible. I was able to identify the difference:

My test environment was an instance where I had allowed docker-machine to provision docker on the VM. It installed a systemd file in /etc/systemd/system with a problematic approach to sockets & dependencies (doesn't list any!). This systemd unit file was taking precedence over the systemd unit file in /lib.

Which meant that weave's systemd unit was having its docker dependency satisfied by the broken configuration that docker-machine put in, where niceties like sockets being carefully created and networks being available are ignored.
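
A quick check for this kind of shadowing is sketched below. It relies on systemd's documented search order, in which a unit file in /etc/systemd/system takes precedence over one of the same name in /run/systemd/system, which in turn takes precedence over /usr/lib/systemd/system (or /lib/systemd/system on some distributions). The function name and the optional root-prefix argument are hypothetical.

```shell
# Print every copy of a unit file found in the usual systemd search path,
# highest priority first. The optional second argument is a hypothetical
# root prefix, useful for inspecting a mounted image or a test tree.
# On systems where /lib is a symlink to /usr/lib, the same file is
# listed twice.
unit_candidates() {
    unit="$1"
    root="${2:-}"
    for dir in /etc/systemd/system /run/systemd/system \
               /usr/lib/systemd/system /lib/systemd/system; do
        if [ -e "$root$dir/$unit" ]; then
            echo "$root$dir/$unit"
        fi
    done
}
```

Running unit_candidates docker.service on a host in this state would list the /etc copy first, followed by the packaged one it overrides. On a live system, systemctl cat docker.service and systemd-delta give similar information.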

@etoews

etoews commented Jul 31, 2016

@DanielDent Would you please expand on "problematic approach to sockets"? What was the problem and what was the fix?

@DanielDent

I think you'll find the information you are looking for in the related issue docker/machine#2795 - if you have a more specific question, let me know.

5 participants