Docker input panic when decode fails #1052

Closed
sparrc opened this issue Apr 18, 2016 · 4 comments

Comments

@sparrc
Contributor

sparrc commented Apr 18, 2016

This bug was discovered by our cloud team. The Docker input panics with a nil pointer dereference when a decode error occurs:

panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x0 pc=0x57eaee]

goroutine 49 [running]:
github.com/influxdata/telegraf/plugins/inputs/docker.gatherContainerStats(0x0, 0x7f214d869138, 0xc820646f00, 0xc8206a4a20)
	/Users/nhaugo/src/go/src/github.com/influxdata/telegraf/plugins/inputs/docker/docker.go:238 +0x352e
github.com/influxdata/telegraf/plugins/inputs/docker.(*Docker).gatherContainer(0xc820149cc0, 0xc820684280, 0x40, 0xc820684300, 0x1, 0x4, 0xc8206a4570, 0x25, 0xc820684380, 0x40, ...)
	/Users/nhaugo/src/go/src/github.com/influxdata/telegraf/plugins/inputs/docker/docker.go:228 +0x965
github.com/influxdata/telegraf/plugins/inputs/docker.(*Docker).Gather.func1(0xc8206a99c0, 0xc820149cc0, 0x7f214d869138, 0xc820646f00, 0xc820684280, 0x40, 0xc820684300, 0x1, 0x4, 0xc8206a4570, ...)
	/Users/nhaugo/src/go/src/github.com/influxdata/telegraf/plugins/inputs/docker/docker.go:112 +0xa5
created by github.com/influxdata/telegraf/plugins/inputs/docker.(*Docker).Gather
	/Users/nhaugo/src/go/src/github.com/influxdata/telegraf/plugins/inputs/docker/docker.go:116 +0x585

@sparrc
Contributor Author

sparrc commented Apr 18, 2016

cc @nhaugo @goller

@ewales

ewales commented May 5, 2016

@sparrc I'm dealing with this problem now in 0.12.1, so in an attempt to patch it I added the error handling code you wrote (36d330f) to the 0.12.1 tag. This removes the exception and prevents telegraf from crashing, but I am still having issues. Telegraf runs fine until I deploy my software; then it hangs, showing no new entries in the log. (The deploy process kills a running container, then starts a new one with a new tag of the same image.) If I delete the killed container, then telegraf recovers and continues sending metrics (before the patch, this is where I would see the exception).

2016/05/05 18:56:03 Gathered metrics, (10s interval), from 10 inputs in 3.558962691s
2016/05/05 18:56:04 Wrote 167 metrics to output influxdb in 13.338655ms
2016/05/05 18:56:11 Gathered metrics, (10s interval), from 10 inputs in 1.564626701s
2016/05/05 18:56:18 Wrote 102 metrics to output influxdb in 10.956226ms
2016/05/05 18:56:33 Wrote 94 metrics to output influxdb in 12.116606ms
2016/05/05 19:32:25 Gathered metrics, (10s interval), from 10 inputs in 36m5.408395554s
2016/05/05 19:32:25 Wrote 24 metrics to output influxdb in 4.909009ms
2016/05/05 19:32:27 Gathered metrics, (10s interval), from 10 inputs in 2.153244874s
2016/05/05 19:32:33 Gathered metrics, (10s interval), from 10 inputs in 3.567692221s
2016/05/05 19:32:39 Wrote 171 metrics to output influxdb in 16.620942ms
2016/05/05 19:32:41 Gathered metrics, (10s interval), from 10 inputs in 1.563377367s

You can see in the logs that it hung gathering metrics for 36 minutes.

@sparrc
Contributor Author

sparrc commented May 5, 2016

Thanks for the report @ewales. This is most likely issue #1133, which will also be fixed in release 0.13.

@ewales

ewales commented May 5, 2016

Thanks @sparrc, I patched in the code from master using context.WithTimeout and it seems to have corrected the issue.
