wavefront output plugin: idle connection timeout causes lost metrics #7160

Closed
randallt opened this issue Mar 12, 2020 · 7 comments
Labels: area/wavefront, docs (Issues related to Telegraf documentation and configuration descriptions)

Comments

@randallt

I ran into a very confusing issue that I'd like to see documented, and hopefully fixed, so others don't run into it too. Running Telegraf 1.13.4 on an AWS EC2 instance and sending metrics to a Wavefront proxy through an AWS Internal ELB, I was seeing missing metrics and connection errors in the Telegraf logs.

Relevant telegraf.conf:

[[outputs.wavefront]]
  host = "wfproxy.example.net"
  port = 2878
  metric_separator = "."
  convert_paths = true

System info:

Telegraf 1.13.4
Amazon Linux 2
Connection through AWS Internal ELB with default 60-sec idle connection timeout

Steps to reproduce:

  1. Create EC2 instance
  2. Install telegraf 1.13.4
  3. Configure to use WF Proxy endpoint through AWS Internal ELB
  4. Configure telegraf interval to 60 sec
  5. Start telegraf
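For reference, steps 3 and 4 correspond to a telegraf.conf fragment roughly like the one below. This is a minimal sketch: the `[agent]` table and its `interval` setting are standard Telegraf options, but leaving every other agent setting at its default is an assumption, not something verified in this report.

```toml
# Sketch of the reproduction config; agent settings not shown are assumed to be defaults.
[agent]
  interval = "60s"   # collection interval from step 4

[[outputs.wavefront]]
  host = "wfproxy.example.net"   # WF Proxy endpoint behind the AWS Internal ELB (step 3)
  port = 2878
```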

Expected behavior:

Metrics should appear normally, once every 60 sec.

Actual behavior:

The first 3-4 minutes of metrics are missed, followed by a single instance of metrics, followed by another 3-4 minutes of missed data.

Additional info:

The telegraf log shows the following connection reset message once every 3-4 minutes:
2020-02-28T21:47:13Z I! resetting wavefront proxy connection
2020-02-28T21:47:13Z I! write tcp 10.234.11.217:36870->10.234.245.107:2878: write: broken pipe
2020-02-28T21:48:10Z I! connected to Wavefront proxy at address: wfproxy.example.net:2878

Workarounds:

  1. If I raise the AWS Internal ELB idle connection timeout above 60 sec, then things seem to work normally.
  2. If I switch the Wavefront output plugin to 'http' mode by specifying the 'url' setting instead of 'host' and 'port', then it also seems to work normally (perhaps an http keep-alive is sent).
@danielnelson added the area/wavefront and docs labels on Mar 12, 2020
@danielnelson
Contributor

cc @puckpuck

@puckpuck
Contributor

puckpuck commented Mar 13, 2020

I need to defer this one to @vikramraman and @prydin, who will be taking over the Wavefront plugins.

On the surface, this could be an issue in the wavefront-sdk. Perhaps a version update is needed?

@KarthikAthisamy

Is there any update on this open issue? I am seeing the same kind of problem:

2020-08-25T14:12:08Z E! [agent] Error writing to outputs.wavefront: Wavefront sending error: write tcp 10.x.xxx.xx:35042->172.20.56.248:2878: write: broken pipe
2020-08-25T14:12:12Z E! [agent] Error writing to outputs.wavefront: Wavefront sending error: write tcp 10.x.xxx.xx:35042->172.20.56.248:2878: write: broken pipe
2020-08-25T14:12:13Z I! resetting wavefront proxy connection
2020-08-25T14:12:13Z I! write tcp 10.x.xxx.xx:35042->172.xx.xx.xxx:2878: write: broken pipe

@prydin
Contributor

prydin commented Aug 27, 2020

Running the Wavefront output in "plain socket mode" assumes a stable connection between Telegraf and the Wavefront proxy. That requirement is not satisfied if the load balancer resets the connection after 60 seconds of idle time. I would recommend using the HTTP protocol instead, as it is much more load-balancer friendly and doesn't require a stable connection.

@randallt
Author

@prydin, can you give the exact format we'd need to use for http mode, and say since which Telegraf version it has been supported?

@randallt
Author

See my workaround #2 in the description above for how to use http mode. Essentially, you use a url field instead of host and port. I believe the Wavefront output plugin doc has an example.
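As a rough sketch, the config from the description would change along these lines. The `url` setting is the one named in workaround #2; the `http://` scheme and the reuse of port 2878 for HTTP are assumptions carried over from the socket-mode config, so check them against the plugin doc and your proxy setup.

```toml
[[outputs.wavefront]]
  ## HTTP mode: set url instead of host and port.
  ## Scheme and port are assumptions based on the socket-mode config above.
  url = "http://wfproxy.example.net:2878"
  metric_separator = "."
  convert_paths = true
```

The `host` and `port` lines are dropped because `url` takes their place in this mode.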

@sspaink
Contributor

sspaink commented Nov 2, 2022

I think this was resolved in: #11560

@sspaink closed this as completed on Nov 2, 2022
6 participants