FATAL error when metrics cannot be delivered #320
Comments
Looks like this is due to the domain name resolution failing in this function:

    func gmStatsDRegistry(prefix, addr string, interval time.Duration) (Registry, error) {
        if addr == "" {
            return nil, errors.New(" statsd addr missing")
        }
        // Resolution happens once; any DNS failure here is returned to the caller.
        a, err := net.ResolveUDPAddr("udp", addr)
        if err != nil {
            return nil, fmt.Errorf(" cannot connect to StatsD: %s", err)
        }
        r := gm.NewRegistry()
        go statsd.StatsD(r, interval, prefix, a)
        return &gmRegistry{r}, nil
    }

Because we use Consul as the DNS for our StatsD service, the address is unresolvable until the service has started and registered. Perhaps a configuration option to postpone or retry address resolution would be helpful.
This patch retries configuring metrics during startup to mitigate a race between fabio and metrics availability. Fixes #320
@simonsparks Indeed. Fabio waits for consul but not the metrics. I've hijacked the registry retry config parameters and made this more robust. Could you check whether that solves your issue?
We just ran into this issue this week when fabio came up before the local consul instance did. We're running with fabio 1.5.2 at the moment.
@pvandervelde I've taken the patch and added two proper metrics config parameters.
Sweet. I'll grab the next release :) Thanks for fixing it!
In our deployment scenario, Fabio is configured to deliver metrics to a remote StatsD collector.
When the infrastructure is provisioned, Fabio and other core services are started before the StatsD service so there is a period of time when metrics would not be collected.
The problem we found is that, if Fabio can't find the StatsD endpoint on startup, it logs a fatal error and exits. Presumably this might also happen after startup if the StatsD service was temporarily unavailable. We haven't tested whether this occurs for other supported metrics implementations as well.
I think it would be preferable for Fabio to continue operating without delivering its metrics rather than exiting.
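As a rough illustration of that preference, the sketch below falls back to a registry that discards everything when the real one cannot be created. All names here (`Registry`, `noopRegistry`, `newRegistryOrNoop`, `errUnavailable`) are hypothetical and simplified; this is not fabio's actual metrics interface.

```go
package main

import (
	"errors"
	"log"
)

// Registry is a hypothetical, cut-down stand-in for a metrics registry.
type Registry interface {
	Name() string
}

// noopRegistry accepts metrics and silently drops them.
type noopRegistry struct{}

func (noopRegistry) Name() string { return "noop" }

// errUnavailable is an example error a real constructor might return.
var errUnavailable = errors.New("metrics backend unavailable")

// newRegistryOrNoop calls create and, instead of treating a failure as
// fatal, logs a warning and returns a no-op registry so the proxy can
// keep serving traffic without metrics.
func newRegistryOrNoop(create func() (Registry, error)) Registry {
	r, err := create()
	if err != nil {
		log.Printf("[WARN] metrics disabled: %v", err)
		return noopRegistry{}
	}
	return r
}
```

The trade-off is that metrics stay disabled until the process is restarted, unless the fallback is combined with a periodic retry.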
An example log extract of the observed behaviour: