Fleet hangs on starting second instance of service #1334

srinath-imaginea · 2015-08-24T17:27:57Z

So I have a relatively simple CoreOS cluster with 5 machines. I need to boot up 2 instances of nginx as the load balancer. Here is my truncated unit file:

[Unit]
Description=Nginx LoadBalancer
After=docker.service
Requires=docker.service

[Service]
TimeoutStartSec=0
EnvironmentFile=/etc/environment
ExecStartPre=-/usr/bin/docker kill lb.%i
ExecStartPre=-/usr/bin/docker rm lb.%i
ExecStartPre=-/usr/bin/docker pull nginx
ExecStart=/usr/bin/docker run --name="lb.%i" --hostname="lb%i" -p 80:80 -p 443:443 lbv2
ExecStop=-/usr/bin/docker kill lb.%i

[X-Fleet]
Conflicts=lb@*.service
Conflicts=mongod@*.service

I was able to start lb@1.service without issues. But every time I try starting lb@2.service, fleetctl goes into a polling loop. Here is the debug info:

core@ip ~ $ fleetctl --debug start lb@2.service
2015/08/24 17:16:55 DEBUG http.go:28: HTTP GET http://domain-sock/fleet/v1/units/lb%402.service?alt=json
2015/08/24 17:16:55 DEBUG http.go:31: HTTP GET http://domain-sock/fleet/v1/units/lb%402.service?alt=json 404 Not Found
2015/08/24 17:16:55 DEBUG http.go:28: HTTP GET http://domain-sock/fleet/v1/units/lb%40.service?alt=json
2015/08/24 17:16:55 DEBUG http.go:31: HTTP GET http://domain-sock/fleet/v1/units/lb%40.service?alt=json 200 OK
2015/08/24 17:16:55 DEBUG http.go:28: HTTP PUT http://domain-sock/fleet/v1/units/lb%402.service?alt=json
2015/08/24 17:16:55 DEBUG http.go:31: HTTP PUT http://domain-sock/fleet/v1/units/lb%402.service?alt=json 201 Created
2015/08/24 17:16:55 DEBUG fleetctl.go:578: Created Unit(lb@2.service) in Registry
2015/08/24 17:16:55 DEBUG http.go:28: HTTP GET http://domain-sock/fleet/v1/units/lb%402.service?alt=json
2015/08/24 17:16:55 DEBUG http.go:31: HTTP GET http://domain-sock/fleet/v1/units/lb%402.service?alt=json 200 OK
2015/08/24 17:16:55 DEBUG fleetctl.go:715: Setting Unit(lb@2.service) target state to launched
2015/08/24 17:16:55 DEBUG http.go:28: HTTP PUT http://domain-sock/fleet/v1/units/lb%402.service?alt=json
2015/08/24 17:16:55 DEBUG http.go:31: HTTP PUT http://domain-sock/fleet/v1/units/lb%402.service?alt=json 204 No Content
2015/08/24 17:16:55 DEBUG http.go:28: HTTP GET http://domain-sock/fleet/v1/units/lb%402.service?alt=json
2015/08/24 17:16:55 DEBUG http.go:31: HTTP GET http://domain-sock/fleet/v1/units/lb%402.service?alt=json 200 OK
2015/08/24 17:16:56 DEBUG http.go:28: HTTP GET http://domain-sock/fleet/v1/units/lb%402.service?alt=json
2015/08/24 17:16:56 DEBUG http.go:31: HTTP GET http://domain-sock/fleet/v1/units/lb%402.service?alt=json 200 OK
2015/08/24 17:16:56 DEBUG http.go:28: HTTP GET http://domain-sock/fleet/v1/units/lb%402.service?alt=json
2015/08/24 17:16:56 DEBUG http.go:31: HTTP GET http://domain-sock/fleet/v1/units/lb%402.service?alt=json 200 OK
2015/08/24 17:16:57 DEBUG http.go:28: HTTP GET http://domain-sock/fleet/v1/units/lb%402.service?alt=json
2015/08/24 17:16:57 DEBUG http.go:31: HTTP GET http://domain-sock/fleet/v1/units/lb%402.service?alt=json 200 OK

And it just goes on forever. Both the services in the Conflicts section are already up and running. Any help appreciated!
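For readers following the log above: the repeated GET requests look like a client polling the unit until it reaches its target state. Below is a minimal Python sketch of that kind of wait loop with an attempt limit. It is an illustration only, not fleetctl's actual code; `wait_for_state`, `get_state`, and the sample state sequence are hypothetical names and values made up for the example.

```python
def wait_for_state(get_state, target, max_attempts):
    """Poll get_state() until it returns target.

    Returns the number of attempts used, or None if the target state
    was not reached within max_attempts polls.
    """
    for attempt in range(1, max_attempts + 1):
        if get_state() == target:
            return attempt
    return None


# Hypothetical unit that becomes "launched" on the third poll.
states = iter(["inactive", "inactive", "launched"])
print(wait_for_state(lambda: next(states), "launched", max_attempts=5))

# Hypothetical unit that is never placed: the bounded loop gives up
# instead of polling forever.
print(wait_for_state(lambda: "inactive", "launched", max_attempts=4))
```

With no attempt limit, the second case would poll indefinitely, which matches the behaviour in the log. fleetctl's start command has options for limiting or skipping this wait; check `fleetctl help start` on your version for the exact flags.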

Version info:

CoreOS stable (723.3.0)
fleetctl version 0.10.2
systemd 220


mischief · 2015-08-24T17:36:39Z

@srinath-imaginea what version of coreos? what version of fleetctl?

srinath-imaginea · 2015-08-24T17:39:30Z

@mischief, sorry should've mentioned that first.

CoreOS stable (723.3.0)
fleetctl version 0.10.2
systemd 220

srinath-imaginea · 2015-08-24T18:21:18Z

After some digging, it looks like there were too many Conflicts defined in the various unit files, and one of the machines in the cluster was down, even though it was listed by fleetctl list-machines. I rebooted that machine and the lb@2.service came up without a hitch.

A more user-friendly error message might be nice to have though.
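One plausible reading of the diagnosis above, sketched in Python: if every reachable machine already runs a unit matching a Conflicts pattern, the only remaining candidate is the machine that is still listed but actually down, so the unit never launches. This is a simplified model, not fleet's scheduler. The machine IDs, the placement of the mongod units, and the `listed`/`alive` fields are assumptions invented for the example; only the two Conflicts patterns come from the unit file in this issue.

```python
from fnmatch import fnmatchcase

# Conflicts patterns from the lb@.service template in this issue.
LB_CONFLICTS = ["lb@*.service", "mongod@*.service"]


def matches_any(patterns, unit_name):
    """True if unit_name matches at least one glob pattern."""
    return any(fnmatchcase(unit_name, p) for p in patterns)


def placement_candidates(new_unit, new_patterns, machines):
    """Return IDs of listed machines with no conflict in either direction.

    machines maps a machine ID to a dict with:
      "listed": bool, whether the scheduler believes the machine is present
      "alive":  bool, whether the machine is really up (not used here,
                because the scheduler cannot see it)
      "units":  dict of running unit name -> that unit's Conflicts patterns
    """
    candidates = []
    for machine_id, info in machines.items():
        if not info["listed"]:
            continue
        blocked = any(
            matches_any(new_patterns, name) or matches_any(patterns, new_unit)
            for name, patterns in info["units"].items()
        )
        if not blocked:
            candidates.append(machine_id)
    return candidates


# Hypothetical 5-machine layout: m5 is down but still listed.
machines = {
    "m1": {"listed": True, "alive": True, "units": {"lb@1.service": LB_CONFLICTS}},
    "m2": {"listed": True, "alive": True, "units": {"mongod@1.service": []}},
    "m3": {"listed": True, "alive": True, "units": {"mongod@2.service": []}},
    "m4": {"listed": True, "alive": True, "units": {"mongod@3.service": []}},
    "m5": {"listed": True, "alive": False, "units": {}},
}

candidates = placement_candidates("lb@2.service", LB_CONFLICTS, machines)
usable = [m for m in candidates if machines[m]["alive"]]
print("candidates:", candidates)
print("actually usable:", usable)
```

In this made-up layout the only candidate is the dead machine, so nothing usable is left. Bringing that machine back, or loosening the Conflicts rules, gives the scheduler a real target again.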

mischief · 2015-08-24T18:53:18Z

@srinath-imaginea perhaps you can tune agent_ttl to make the machine list react more quickly to machines that go down.

if you can, try fleet on the upcoming alpha (which is not out yet).
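To illustrate why a shorter agent_ttl would help here, the following Python sketch models a machine list that keeps a machine only while its last heartbeat is younger than the TTL. It is a toy model, not fleet's real presence mechanism; `listed_machines`, the heartbeat timestamps, and the TTL values of 30 and 10 seconds are all assumptions chosen for the example.

```python
def listed_machines(last_heartbeat, now, agent_ttl):
    """Return the sorted machine IDs whose last heartbeat is younger than agent_ttl.

    last_heartbeat maps machine ID -> timestamp in seconds.
    now and agent_ttl are also in seconds.
    """
    return sorted(m for m, t in last_heartbeat.items() if now - t < agent_ttl)


# Hypothetical timestamps: m5 stopped sending heartbeats 25 seconds ago.
heartbeats = {"m1": 100.0, "m5": 80.0}
now = 105.0

print(listed_machines(heartbeats, now, agent_ttl=30))  # m5 still listed
print(listed_machines(heartbeats, now, agent_ttl=10))  # m5 already dropped
```

The trade-off is the usual one for timeouts: a shorter TTL removes dead machines from the list sooner, but also makes it more likely that a slow but healthy machine is dropped.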

dongsupark mentioned this issue Oct 21, 2016

Running units on part of the cluster stopped and started after master disconnect #1690

Open
