wavefront output plugin: idle connection timeout causes lost metrics #7160
Comments
cc @puckpuck
I need to defer this one to @vikramraman and @prydin, who will be taking over the Wavefront plugins. On the surface, this could be an issue in the wavefront-sdk. Perhaps a version update is needed?
Is there any update on this open issue? I am seeing the same kind of problem: 2020-08-25T14:12:08Z E! [agent] Error writing to outputs.wavefront: Wavefront sending error: write tcp 10.x.xxx.xx:35042->172.20.56.248:2878: write: broken pipe
Running the Wavefront output in "plain socket mode" assumes a stable connection between Telegraf and the Wavefront proxy. That requirement is not satisfied if the load balancer resets the connection after 60 seconds. I would recommend using the HTTP protocol instead, as it is much more load-balancer friendly and doesn't require a stable connection.
@prydin, can you give the exact format we'd need to use for HTTP mode, and say which Telegraf version first supported it?
See workaround #2 in my description above for how to use HTTP mode. Essentially, you use a url field instead of host and port. I believe the Wavefront output plugin doc has an example.
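For reference, here is a minimal sketch of the HTTP-mode configuration mentioned above. It assumes the proxy hostname and port from the log lines in this issue (`wfproxy.example.net:2878`) and plain `http`. The option names follow the plugin's sample configuration as I understand it, so please check them against the README for your Telegraf version:

```toml
[[outputs.wavefront]]
  ## Socket mode: holds one long-lived TCP connection to the proxy, which
  ## a load balancer with an idle timeout can reset. Leave these unset
  ## when url is specified.
  # host = "wfproxy.example.net"
  # port = 2878

  ## HTTP mode: send to the proxy over HTTP instead.
  ## Hostname, port, and scheme here are assumptions taken from the logs
  ## in this issue; substitute your own proxy address.
  url = "http://wfproxy.example.net:2878"
```

The point of the change is that the output no longer depends on a single socket staying open across the load balancer's idle timeout.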
I think this was resolved in #11560.
I ran into a confusing issue that I'd like to see documented and, ideally, fixed so that others don't hit it. Running Telegraf 1.13.4 on an AWS EC2 instance, with traffic passing through an AWS internal ELB, I was seeing missing metrics and connection errors in the Telegraf logs.
Relevant telegraf.conf:
System info:
Telegraf 1.13.4
Amazon Linux 2
Connection through AWS Internal ELB with default 60-sec idle connection timeout
Steps to reproduce:
Expected behavior:
Metrics are reported normally, once every 60 seconds.
Actual behavior:
The first 3-4 minutes of metrics are lost, then a single set of metrics arrives, then another 3-4 minutes of data are lost.
Additional info:
The telegraf log shows the following connection reset message once every 3-4 minutes:
2020-02-28T21:47:13Z I! resetting wavefront proxy connection
2020-02-28T21:47:13Z I! write tcp 10.234.11.217:36870->10.234.245.107:2878: write: broken pipe
2020-02-28T21:48:10Z I! connected to Wavefront proxy at address: wfproxy.example.net:2878
Workarounds: