Prometheus Output causes panic on SIGHUP #1530

Closed
banks opened this issue Jul 21, 2016 · 3 comments · Fixed by #1753
Comments

banks commented Jul 21, 2016

Bug report

When using the prometheus output, a SIGHUP causes the process to panic and crash.

Relevant telegraf.conf:

# /etc/telegraf/telegraf.d/prometheus_output.conf
[outputs.prometheus_client]
listen = ":9126"

System info:

$ telegraf --version
Telegraf - version 0.13.1
$ uname -a
Linux mesos-us-east-1.router.0 3.19.0-33-generic #38~14.04.1-Ubuntu SMP Fri Nov 6 18:17:28 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Steps to reproduce:

  1. Start telegraf with prometheus output configured
  2. Send SIGHUP to reload config
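
For example, assuming a single running telegraf process, the signal can be sent with kill -HUP "$(pidof telegraf)".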

Expected behavior:

Config reloads gracefully

Actual behavior:

Panic with:

2016/07/21 13:16:49 Reloading Telegraf config
2016/07/21 13:16:49 Hang on, flushing any cached metrics before shutdown
2016/07/21 13:16:49 Output [prometheus_client] buffer fullness: 15 / 10000 metrics. Total gathered metrics: 3690785. Total dropped metrics: 0.
2016/07/21 13:16:49 Output [prometheus_client] wrote batch of 15 metrics in 743.762µs
panic: http: multiple registrations for /metrics

goroutine 1 [running]:
panic(0xe84ec0, 0xc820483830)
    /usr/local/go/src/runtime/panic.go:481 +0x3e6
net/http.(*ServeMux).Handle(0xc82005eab0, 0x1240ea0, 0x8, 0x7f51a09b4090, 0xc820b205a0)
    /usr/local/go/src/net/http/server.go:1926 +0x297
net/http.Handle(0x1240ea0, 0x8, 0x7f51a09b4090, 0xc820b205a0)
    /usr/local/go/src/net/http/server.go:1961 +0x4b
github.com/influxdata/telegraf/plugins/outputs/prometheus_client.(*PrometheusClient).Start(0xc8203f2de0, 0x0, 0x0)
    /home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/plugins/outputs/prometheus_client/prometheus_client.go:42 +0x7f
github.com/influxdata/telegraf/agent.(*Agent).Connect(0xc82016c040, 0x0, 0x0)
    /home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/agent/agent.go:53 +0x16a
main.main()
    /home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/cmd/telegraf/telegraf.go:237 +0x1c76

Additional info:

I guess this is because the prometheus metrics endpoint (the default /metrics in our case) is never freed from the HTTP ServeMux, and the code blindly tries to register it again on reload.
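
A minimal sketch of a pattern that avoids the double registration (hypothetical names, not necessarily what the eventual fix does): give each Start() its own ServeMux and http.Server instead of calling http.Handle on the package-level DefaultServeMux, so a stop/start cycle on SIGHUP never registers "/metrics" twice.

package example

import (
	"net"
	"net/http"
)

// promServer is a hypothetical stand-in for the output plugin's HTTP state.
type promServer struct {
	listener net.Listener
	server   *http.Server
}

// Start builds a fresh ServeMux every time, so restarting the output after a
// SIGHUP never re-registers "/metrics" on the shared http.DefaultServeMux.
func (p *promServer) Start(addr string, metrics http.Handler) error {
	mux := http.NewServeMux()
	mux.Handle("/metrics", metrics)

	ln, err := net.Listen("tcp", addr)
	if err != nil {
		return err
	}
	p.listener = ln
	p.server = &http.Server{Handler: mux}
	go p.server.Serve(ln) // Serve returns once the listener is closed in Stop.
	return nil
}

// Stop closes the listener so the next Start can bind the same address again.
func (p *promServer) Stop() error {
	return p.listener.Close()
}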

sparrc commented Jul 21, 2016

This has already been fixed in 1.0.0-beta3: #1339

sparrc closed this as completed Jul 21, 2016
shamil commented Sep 11, 2016

I'm still getting a similar panic (this time for a duplicate metrics collector registration) when reloading the agent in version 1.0.0:

2016/09/11 09:51:52 Reloading Telegraf config
2016/09/11 09:51:52 Hang on, flushing any cached metrics before shutdown
2016/09/11 09:51:52 Output [prometheus_client] buffer fullness: 0 / 1000 metrics. Total gathered metrics: 45. Total dropped metrics: 0.
panic: duplicate metrics collector registration attempted

goroutine 1 [running]:
panic(0x124fc20, 0xc82000f460)
        /usr/local/go/src/runtime/panic.go:481 +0x3e6
github.com/prometheus/client_golang/prometheus.MustRegister(0x7fce34783080, 0xc8203f2040)
        /home/ubuntu/telegraf-build/src/github.com/prometheus/client_golang/prometheus/registry.go:119 +0x6d
github.com/influxdata/telegraf/plugins/outputs/prometheus_client.(*PrometheusClient).Start(0xc8203f2040, 0x0, 0x0)
        /home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/plugins/outputs/prometheus_client/prometheus_client.go:31 +0x5c
github.com/influxdata/telegraf/agent.(*Agent).Connect(0xc820020060, 0x0, 0x0)
        /home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/agent/agent.go:51 +0x16a
main.reloadLoop(0xc820050780, 0x0, 0x0)
        /home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/cmd/telegraf/telegraf.go:232 +0x1987
main.main()
        /home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/cmd/telegraf/telegraf.go:321 +0x5d
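
This second panic comes from prometheus.MustRegister hitting the client library's package-level default registry a second time after the reload. A minimal sketch of one way around it, using a per-instance registry (hypothetical helper, not necessarily the approach taken in #1753):

package example

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// newMetricsHandler registers the collector on a registry scoped to one plugin
// instance, so a stop/start cycle never registers the same collector twice on
// the package-level default registry.
func newMetricsHandler(c prometheus.Collector) (http.Handler, error) {
	registry := prometheus.NewRegistry()
	if err := registry.Register(c); err != nil {
		return nil, err
	}
	return promhttp.HandlerFor(registry, promhttp.HandlerOpts{}), nil
}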

sparrc commented Sep 12, 2016

Slightly different issue, but I'll just reopen this one.

sparrc reopened this Sep 12, 2016
sparrc added a commit that referenced this issue Sep 12, 2016
sparrc added a commit that referenced this issue Sep 12, 2016
jackzampolin pushed a commit that referenced this issue Oct 7, 2016