This repository has been archived by the owner on Jan 30, 2020. It is now read-only.

Fleet hangs on starting second instance of service #1334

Open
srinath-imaginea opened this issue Aug 24, 2015 · 4 comments

Comments

@srinath-imaginea

I have a relatively simple CoreOS cluster with 5 machines, and I need to run 2 instances of nginx as the load balancer. Here is my truncated unit file:

[Unit]
Description=Nginx LoadBalancer
After=docker.service
Requires=docker.service

[Service]
TimeoutStartSec=0
EnvironmentFile=/etc/environment
ExecStartPre=-/usr/bin/docker kill lb.%i
ExecStartPre=-/usr/bin/docker rm lb.%i
ExecStartPre=-/usr/bin/docker pull nginx
ExecStart=/usr/bin/docker run  --name="lb.%i" --hostname="lb%i" -p 80:80 -p 443:443 lbv2
ExecStop=-/usr/bin/docker kill lb.%i

[X-Fleet]
Conflicts=lb@*.service
Conflicts=mongod@*.service

I was able to start lb@1.service without issues, but every time I try to start lb@2.service, fleetctl goes into a loop, polling the unit over and over. Here is the debug output:

core@ip ~ $ fleetctl --debug start lb@2.service
2015/08/24 17:16:55 DEBUG http.go:28: HTTP GET http://domain-sock/fleet/v1/units/lb%402.service?alt=json
2015/08/24 17:16:55 DEBUG http.go:31: HTTP GET http://domain-sock/fleet/v1/units/lb%402.service?alt=json 404 Not Found
2015/08/24 17:16:55 DEBUG http.go:28: HTTP GET http://domain-sock/fleet/v1/units/lb%40.service?alt=json
2015/08/24 17:16:55 DEBUG http.go:31: HTTP GET http://domain-sock/fleet/v1/units/lb%40.service?alt=json 200 OK
2015/08/24 17:16:55 DEBUG http.go:28: HTTP PUT http://domain-sock/fleet/v1/units/lb%402.service?alt=json
2015/08/24 17:16:55 DEBUG http.go:31: HTTP PUT http://domain-sock/fleet/v1/units/lb%402.service?alt=json 201 Created
2015/08/24 17:16:55 DEBUG fleetctl.go:578: Created Unit(lb@2.service) in Registry
2015/08/24 17:16:55 DEBUG http.go:28: HTTP GET http://domain-sock/fleet/v1/units/lb%402.service?alt=json
2015/08/24 17:16:55 DEBUG http.go:31: HTTP GET http://domain-sock/fleet/v1/units/lb%402.service?alt=json 200 OK
2015/08/24 17:16:55 DEBUG fleetctl.go:715: Setting Unit(lb@2.service) target state to launched
2015/08/24 17:16:55 DEBUG http.go:28: HTTP PUT http://domain-sock/fleet/v1/units/lb%402.service?alt=json
2015/08/24 17:16:55 DEBUG http.go:31: HTTP PUT http://domain-sock/fleet/v1/units/lb%402.service?alt=json 204 No Content
2015/08/24 17:16:55 DEBUG http.go:28: HTTP GET http://domain-sock/fleet/v1/units/lb%402.service?alt=json
2015/08/24 17:16:55 DEBUG http.go:31: HTTP GET http://domain-sock/fleet/v1/units/lb%402.service?alt=json 200 OK
2015/08/24 17:16:56 DEBUG http.go:28: HTTP GET http://domain-sock/fleet/v1/units/lb%402.service?alt=json
2015/08/24 17:16:56 DEBUG http.go:31: HTTP GET http://domain-sock/fleet/v1/units/lb%402.service?alt=json 200 OK
2015/08/24 17:16:56 DEBUG http.go:28: HTTP GET http://domain-sock/fleet/v1/units/lb%402.service?alt=json
2015/08/24 17:16:56 DEBUG http.go:31: HTTP GET http://domain-sock/fleet/v1/units/lb%402.service?alt=json 200 OK
2015/08/24 17:16:57 DEBUG http.go:28: HTTP GET http://domain-sock/fleet/v1/units/lb%402.service?alt=json
2015/08/24 17:16:57 DEBUG http.go:31: HTTP GET http://domain-sock/fleet/v1/units/lb%402.service?alt=json 200 OK

And it just goes on forever. Both the services in the Conflicts section are already up and running. Any help appreciated!

Version info:

CoreOS stable (723.3.0)
fleetctl version 0.10.2
systemd 220

@mischief
Contributor

@srinath-imaginea What version of CoreOS? What version of fleetctl?

@srinath-imaginea
Author

@mischief, sorry, I should've mentioned that first.

CoreOS stable (723.3.0)
fleetctl version 0.10.2
systemd 220

@srinath-imaginea
Author

After some digging, it looks like two things combined: there were too many Conflicts defined across the various unit files, and one of the machines in the cluster was down even though fleetctl list-machines still listed it. Presumably that left fleet with no reachable machine that satisfied the Conflicts rules for lb@2.service. I rebooted that machine and lb@2.service came up without a hitch.

A more user-friendly error message might be nice to have though.
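
Until fleetctl reports this more clearly, one way to spot a unit that is stuck waiting for a machine is to compare each unit's desired state with its current state. The snippet below is only a sketch: `pending_units` is a hypothetical helper name, and the `-no-legend` and `-fields` flags in the usage comment are assumed from fleetctl's help output, so check them against `fleetctl list-unit-files --help` on your version.

```shell
# Print the names of units whose desired state differs from their current state.
# Expects whitespace-separated rows of "unit dstate state" on stdin.
pending_units() {
  awk 'NF >= 3 && $2 != $3 { print $1 }'
}

# Example use on a cluster member (flag names are an assumption, see above):
#   fleetctl list-unit-files -no-legend -fields=unit,dstate,state | pending_units
```

A unit that shows up here and stays there is one fleet has been asked to launch but has not placed, which is the situation the endless polling above was hiding.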

@mischief
Contributor

@srinath-imaginea perhaps you can tune agent_ttl to make the machine list react more quickly to machines that go down.

If you can, try fleet on the upcoming alpha (which is not out yet).
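
For reference, a minimal sketch of what tuning agent_ttl might look like in fleet's config file. The path and the "10s" value are assumptions for illustration, not a recommendation from this thread; fleet's documented default is "30s".

```ini
# /etc/fleet/fleet.conf (assumed location of the fleet config file)
# An agent that has not checked in with the registry within this window
# is treated as dead. "10s" is an illustrative value only.
agent_ttl="10s"
```

A shorter TTL should make a dead machine drop out of the machine list sooner, at the cost of more frequent heartbeats to etcd and a higher chance that a briefly slow machine is treated as gone. fleet needs a restart to pick up the change.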
