Cannot find docker metrics in influxdb, can anyone help? #645
Comments
I finally fixed this problem. |
@asdfsx what was the issue/solution? |
@sparrc nothing but restarting telegraf several times; after that the metrics appeared in influxdb. But I think some problems still remain. When I execute this query:
|
might be a permissions issue, try
The upward slope is normal. Docker's CPU "usage" metric is actually just a counter of CPU ticks used. |
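Since the metric is a cumulative counter, the raw value always climbs; the useful figure is the difference between two samples divided by the time between them. A minimal Go sketch of that calculation follows. It assumes the counter is expressed in nanoseconds of CPU time, and `cpuPercent` is a hypothetical helper name, not something Telegraf or Docker provides.

```go
package main

import (
	"fmt"
	"time"
)

// cpuPercent turns two readings of a cumulative CPU counter into an average
// CPU percentage over the interval between them. The counter is assumed to
// be in nanoseconds of CPU time. It returns 0 when the interval is empty or
// the counter went backwards (for example after a container restart).
func cpuPercent(prevUsage, curUsage uint64, elapsed time.Duration) float64 {
	if elapsed <= 0 || curUsage < prevUsage {
		return 0
	}
	delta := float64(curUsage - prevUsage)
	return delta / float64(elapsed.Nanoseconds()) * 100
}

func main() {
	// 2.5 s of CPU time consumed during a 10 s window.
	fmt.Printf("%.1f%%\n", cpuPercent(1000000000, 3500000000, 10*time.Second)) // prints 25.0%
}
```

The same idea applies on the query side: graph the rate of change of the counter rather than the counter itself.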
I think you are right!
You can see the telegraf was started by user BTW. You said that
or I need some function else, like |
yes, that query would be fine |
works fine. However, the service fails to send the docker metrics and the log fills with multiple instances of
The default permission on the unix socket is 660 (UID: |
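With mode 660 only the socket's owner and members of its group can read and write it, so a telegraf process running as some other user outside that group is refused. The sketch below models that check in Go. `canReadWrite` is a hypothetical helper, the socket path is the usual default and an assumption here, and the model deliberately ignores root, ACLs and similar mechanisms.

```go
package main

import (
	"fmt"
	"os"
)

// canReadWrite reports whether a process may read and write a file with the
// given permission bits, depending on whether it runs as the file's owner or
// as a member of the file's group. This is a simplified model of the classic
// Unix permission check.
func canReadWrite(perm uint32, isOwner, inGroup bool) bool {
	switch {
	case isOwner:
		return perm&0600 == 0600
	case inGroup:
		return perm&0060 == 0060
	default:
		return perm&0006 == 0006
	}
}

func main() {
	const socketPath = "/var/run/docker.sock" // assumed default location

	info, err := os.Stat(socketPath)
	if err != nil {
		fmt.Println("cannot stat socket:", err)
		return
	}
	perm := uint32(info.Mode().Perm())
	fmt.Printf("mode %o, usable by an unrelated user: %v\n",
		perm, canReadWrite(perm, false, false))
}
```

For a 660 socket the last value is false, which matches the behaviour described above: the calling user has to own the socket or belong to its group.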
@sparrc I'm seeing the same issue with v0.10.3-1:
However, % docker ps; works just fine. |
I think this is most likely related to the Docker version, as I'm not seeing it on hosts with |
I've written a small app, which is a scaled-down version of the Telegraf plugin, and I'm getting the same error even on a host with Docker v1.8.3. However, I can see "requests" being made in the logs:
The message originates at https://github.com/influxdata/telegraf/blob/master/plugins/inputs/docker/docker.go#L119 I have no idea how to fix this issue, though. I have found no way to increase the timeout value or any related setting. It's possible, though, that I haven't dug deep enough. |
@zstyblik the docker plugin has a hardcoded timeout of 5s, https://github.com/influxdata/telegraf/blob/master/plugins/inputs/docker/docker.go#L108-L114 which I believe should be more than enough. I think that should be a configuration setting, but that's a topic for another discussion. |
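To illustrate what a configurable timeout could look like, here is a rough Go sketch using only the standard library. It is not the plugin's actual code: `newUnixClient` is a hypothetical helper, and the socket path is the usual default, assumed here.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"net/http"
	"time"
)

// newUnixClient returns an HTTP client that sends every request over the
// given unix socket and abandons a request once timeout has elapsed.
func newUnixClient(socketPath string, timeout time.Duration) *http.Client {
	return &http.Client{
		Timeout: timeout,
		Transport: &http.Transport{
			// Ignore the network and address derived from the URL and
			// always dial the unix socket instead.
			DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
				var d net.Dialer
				return d.DialContext(ctx, "unix", socketPath)
			},
		},
	}
}

func main() {
	// The timeout would come from the plugin configuration instead of a constant.
	client := newUnixClient("/var/run/docker.sock", 5*time.Second)

	// The host part of the URL is a placeholder; only the path matters.
	resp, err := client.Get("http://docker/_ping")
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```

Passing the duration as a parameter keeps the 5s value as a default while letting slow hosts raise it.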
@tripledes unfortunately, this setting isn't related to the issue. |
@zstyblik obviously, a closed pipe has nothing to do with a timeout, but you suggested to increase the timeout and I just provided information regarding it being hardcoded ❔ |
Seems like the closed pipe is a side effect of the timeout over docker socket. Looks to me like it might be some synchronisation issue in dockerClient, but just a guess. On the other hand, I've been looking how |
@sparrc should we keep this open? As the issue can be reproduced, I believe it should stay open until a fix is found. |
yep, sure |
Here is a first attempt at switching to Docker's engine-api; if anyone is willing to test it, it's here: https://github.com/zooplus/telegraf/tree/docker_engine_api Besides better compatibility, I think one of the advantages of engine-api is that it uses a context for every request, so failures can be handled better. I'd be very glad to have some feedback. I tried to keep the output as it was before, but the following items would need some love:
And if possible, I'd like to make the plugin a bit more flexible by using jsonflattener, so we don't need to specify all the metrics up front. But I guess this could be left for follow-ups. @sparrc what are your thoughts about the change? I think it could also be done with Go's standard library, but that would require a bigger effort to get the same functionality (context, API version compatibility, ...). |
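The flattening idea can be sketched as follows: walk the decoded stats object and emit one field per numeric leaf, named after the path to it. This is only an illustration under assumptions. `flatten` is a hypothetical function and not Telegraf's own flattener, and the sample stats keys are made up for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// flatten walks a decoded JSON object and stores every numeric leaf in out
// under a key built from the path to it, joined with underscores.
// Non-numeric leaves such as strings are skipped.
func flatten(prefix string, in map[string]interface{}, out map[string]float64) {
	for k, v := range in {
		key := k
		if prefix != "" {
			key = prefix + "_" + k
		}
		switch val := v.(type) {
		case map[string]interface{}:
			flatten(key, val, out)
		case float64: // encoding/json decodes JSON numbers as float64
			out[key] = val
		}
	}
}

func main() {
	// Made-up sample resembling a nested stats document.
	stats := map[string]interface{}{
		"cpu": map[string]interface{}{
			"usage": map[string]interface{}{"total": 12345.0},
		},
		"memory": map[string]interface{}{"usage": 2048.0, "limit": 4096.0},
		"name":   "web",
	}

	fields := map[string]float64{}
	flatten("", stats, fields)

	// Sort the keys so the output order is stable.
	keys := make([]string, 0, len(fields))
	for k := range fields {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Println(k, fields[k])
	}
}
```

With this approach a new counter added by a later Docker release shows up as a field without any change to the plugin.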
@tripledes I don't have time to test but this sounds fine with me. There is also a PR up for improving some of the docker metrics: #754, how does that fit in? |
@sparrc I currently have an instance of Telegraf with my changes running on our test env, no issues for now, just some blkio metric names that I need to check...other than that it's running fine, still some feedback from anyone involved on this issue would be very much appreciated :) @asdfsx @AdithyaBenny Regarding #754, I just had a quick look and I don't think it'd be an issue, I could reapply my changes on top of it once you get it merged. |
@tripledes if you make any change to the telegraf for this issue, I'd like to try |
@asdfsx here: https://github.com/zooplus/telegraf/tree/docker_engine_api you'd need to compile it yourself. I could provide a compiled binary if needed. |
@tripledes I just compiled it on Ubuntu and ran it with the following command: |
@asdfsx thanks! Just let us know whether you find any issue so it can be fixed before submitting a PR. |
Any changes concerning this issue? |
@sporokh I understand you're also hitting the issue, right? I'd like to have a PR ready by the end of the week...although cannot really promise, little bit short on time this week, but I'll try. |
@tripledes Thanks a lot Sergio!
|
@tripledes any possibilities of a PR by the end of this week? |
@sparrc sorry, I've been a little short on time lately; I'll try over the weekend. In case I don't manage to find the time, I'll ping you back. |
@sparrc Just finished modifying the input. I haven't done anything on the tests yet and have only run a manual test, but it looks promising. I'll get to the tests tomorrow. In the meantime, is anyone willing to test? https://github.com/tripledes/telegraf/tree/engine-api Feedback welcome 👍 |
thank you @tripledes, this has worked well for me |
@sparrc glad to hear! I'd like to take a closer look at the input plugin whenever I get a bit of time (quite busy lately at work), as I think it should check the API version and also have some kind of integration tests against the supported Docker API versions. Just some ideas. |
Check the syslogs (
then you have to add
|
To anyone looking for a solution on ARM based architecture... As root open the cmdline.txt file... Add the following to the end of the file... Reboot the system... Verify that the changes have worked! Hope this helps. |
Try running this command: sudo chmod 666 /var/run/docker.sock
I start telegraf with the following config
and start telegraf by following command:
but I cannot find any docker measurements in influxdb:
Actually, I can see docker data being collected by telegraf