v0.13.0: panic in influxdb input plugin "interface is json.Delim, not string" #1268

Closed
zarnovican opened this issue May 25, 2016 · 5 comments · Fixed by #1271
Labels
bug unexpected problem or unintended behavior

Comments

@zarnovican
Contributor

Bug report

The Telegraf process sporadically crashes with panic: interface conversion: interface is json.Delim, not string. I'm pretty sure that none of the crashes happened before the update to 0.13.0. Around the same time, I also updated InfluxDB to 0.13.0.

On the other hand, we have machines with the same Telegraf version (same OS and everything) that haven't crashed since the update.

System info:

telegraf 0.13.0
OS: 64-bit Ubuntu Trusty 14.04

Steps to reproduce:

Not consistently reproducible.

Additional info:

I have a feeling that this problem happens under stressful conditions (like high CPU usage on the monitored host). Note the "Command timed out" errors, which appear for several minutes before the panic:

2016/05/25 06:58:08 Gathered metrics, (1m0s interval), from 20 inputs in 8.949796949s
2016/05/25 06:58:19 Output [influxdb] buffer fullness: 99 / 10000 metrics. Total gathered metrics: 1040258. Total dropped metrics: 0.
2016/05/25 06:58:19 Output [influxdb] wrote batch of 99 metrics in 62.166429ms
2016/05/25 06:59:06 Error in input [exec]: exec: Command timed out. for command '/usr/lib/telegraf/scripts/sumproc.sh -u sentry'
2016/05/25 06:59:07 Error in input [exec]: exec: Command timed out. for command '/usr/lib/telegraf/scripts/sumproc.sh -f /var/run/icinga2/icinga2.pid'
2016/05/25 06:59:07 Error in input [exec]: exec: Command timed out. for command '/usr/lib/telegraf/scripts/sumproc.sh -f /var/run/postgresql/9.3-main.pid'
2016/05/25 06:59:07 Error in input [exec]: exec: Command timed out. for command '/usr/lib/telegraf/scripts/sumproc.sh -f /var/run/telegraf/telegraf.pid'
2016/05/25 06:59:07 Gathered metrics, (1m0s interval), from 20 inputs in 7.459498165s
2016/05/25 06:59:24 Output [influxdb] buffer fullness: 97 / 10000 metrics. Total gathered metrics: 1040355. Total dropped metrics: 0.
2016/05/25 06:59:24 Output [influxdb] wrote batch of 97 metrics in 156.915989ms
panic: interface conversion: interface is json.Delim, not string

goroutine 449482 [running]:
panic(0x10834e0, 0xc820677d80)
        /usr/local/go/src/runtime/panic.go:481 +0x3e6
github.com/influxdata/telegraf/plugins/inputs/influxdb.(*InfluxDB).gatherURL(0xc8200c8440, 0x7f05202373f8, 0xc8204ef280, 0xc8201bb621, 0x20, 0x0, 0x0)
        /root/go/src/github.com/influxdata/telegraf/plugins/inputs/influxdb/influxdb.go:157 +0x328
github.com/influxdata/telegraf/plugins/inputs/influxdb.(*InfluxDB).Gather.func1(0xc8201db340, 0xc8200c8440, 0x7f05202373f8, 0xc8204ef280, 0xc82004d800, 0xc8201bb621, 0x20)
        /root/go/src/github.com/influxdata/telegraf/plugins/inputs/influxdb/influxdb.go:45 +0xa5
created by github.com/influxdata/telegraf/plugins/inputs/influxdb.(*InfluxDB).Gather
        /root/go/src/github.com/influxdata/telegraf/plugins/inputs/influxdb/influxdb.go:48 +0x185
2016/05/25 08:44:19 Starting Telegraf (version 0.13.0)
2016/05/25 08:44:19 Loaded outputs: influxdb
2016/05/25 08:44:19 Loaded inputs: processes exec swap system cpu kernel mem net netstat disk diskio exec exec exec influxdb exec postgresql exec redis exec
2016/05/25 08:44:19 Tags enabled: Name=monitoring host=ip-10-73-170-141
2016/05/25 08:44:19 Agent Config: Interval:1m0s, Debug:false, Quiet:false, Hostname:"ip-10-73-170-141", Flush Interval:1m0.371046598s 

Reading the stack trace, the problem may be in the influxdb input plugin. This machine monitors the same InfluxDB instance that is also the target for the metrics. Telegraf config snippet:

[agent]
  interval = "60s"
  flush_interval = "60s"
  flush_jitter = "10s"

[[outputs.influxdb]]
  urls = ["http://xxxx:8086"]
  database = "telegraf"
  precision = "s"

[[inputs.influxdb]]
  urls = [ "http://127.0.0.1:8086/debug/vars", ]

Monitored InfluxDB is also on 0.13.0.

@sparrc sparrc added the bug unexpected problem or unintended behavior label May 25, 2016
@sparrc
Contributor

sparrc commented May 25, 2016

I'm not sure why this would only happen under load, but it is in fact an easy fix; I will have it in the next release.

This bug was added in 0.12.1 with this PR: #1008
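For readers unfamiliar with this class of panic: json.Decoder.Token returns a json.Token interface value, which can hold a json.Delim (one of the characters {, }, [ or ]) as well as a string, number, bool, or nil. An unchecked type assertion such as tok.(string) panics with exactly this message when the token turns out to be a delimiter. The following is a minimal sketch of the failure mode and of the checked "comma, ok" form that avoids it. It is not the plugin's code and not the fix from #1271; the function name firstTokenAsKey and the sample inputs are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// firstTokenAsKey reads the first JSON token from input and tries to use it
// as a string key. Hypothetical helper for illustration only; it is not
// taken from Telegraf.
func firstTokenAsKey(input string) (string, error) {
	dec := json.NewDecoder(strings.NewReader(input))

	tok, err := dec.Token()
	if err != nil {
		return "", err
	}

	// A bare tok.(string) would panic here whenever tok holds a json.Delim.
	// The two-value form reports the mismatch without panicking.
	key, ok := tok.(string)
	if !ok {
		return "", fmt.Errorf("expected string key, got %T", tok)
	}
	return key, nil
}

func main() {
	// The first token of an object is the delimiter '{', not a string.
	if _, err := firstTokenAsKey(`{"a": 1}`); err != nil {
		fmt.Println("handled without panic:", err)
	}

	// A plain JSON string yields a string token.
	key, err := firstTokenAsKey(`"memstats"`)
	fmt.Println(key, err)
}
```

With the checked assertion, an unexpected token becomes an ordinary error that the caller can log, so a single malformed or unexpected response does not take down the whole process.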

@jchauncey

@sparrc I'd like to get this fix ASAP. Any news on when 1.0 will be released?

@sparrc
Contributor

sparrc commented Jun 6, 2016

It's already in 0.13.1 I think?

@jchauncey

It's not in the changelog for 0.13.1, just in the one for 1.0.

@zarnovican
Contributor Author

It was fixed post-0.13.1:

$ git describe 5fe7e6e40e14beae680dbf698addb82a7c08b71b
0.13.1-9-g5fe7e6e
$ git tag --contains 5fe7e6e40e14beae680dbf698addb82a7c08b71b
1.0.0-beta1


3 participants