Use timeout smaller than 10 seconds #959

Closed
wants to merge 1 commit into from


PierreF
Contributor

@PierreF PierreF commented Apr 4, 2016

The Mongo and Prometheus inputs use a timeout of 10 seconds. At least with Mongo, a 10-second timeout causes strange behavior, probably because metrics collection takes longer than the metric collection interval (10 seconds in the default config):

Data written by Telegraf is no longer "rounded" to 10 seconds, which creates a hole when you bucket the points using group by time(10s):

> select usage_idle from cpu where cpu='cpu-total' and time >= '2016-04-04T09:56:37Z' and time <= '2016-04-04T09:57:23Z'
name: cpu
---------
time                    usage_idle
2016-04-04T09:56:37Z     88.7699366396944
2016-04-04T09:56:49Z     84.20123565754106
2016-04-04T09:57:00Z     86.28820960698596
2016-04-04T09:57:12Z     85.76434515993311
2016-04-04T09:57:23Z     82.51209854815667

> select mean(usage_idle) from cpu where cpu='cpu-total' and time >= '2016-04-04T09:56:37Z' and time <= '2016-04-04T09:57:23Z'  group by time(10s)
name: cpu
---------
time                    mean
2016-04-04T09:56:30Z     88.7699366396944
2016-04-04T09:56:40Z     84.20123565754106
2016-04-04T09:56:50Z
2016-04-04T09:57:00Z     86.28820960698596
2016-04-04T09:57:10Z     85.76434515993311
2016-04-04T09:57:20Z     82.51209854815667

Telegraf logs:

2016/04/04 11:55:31 Starting Telegraf (version 0.11.1-75-g357849c)
[...]
2016/04/04 11:56:40 Wrote 35 metrics to output influxdb in 5.402099ms
error dialing over ssl, no reachable servers
2016/04/04 11:56:49 Error in input [mongodb]: Unable to connect to MongoDB, no reachable servers
2016/04/04 11:56:49 Gathered metrics, (10s interval), from 11 inputs in 11.507262321s
2016/04/04 11:56:50 Wrote 35 metrics to output influxdb in 4.578914ms
error dialing over ssl, no reachable servers
2016/04/04 11:57:00 Error in input [mongodb]: Unable to connect to MongoDB, no reachable servers
2016/04/04 11:57:00 Gathered metrics, (10s interval), from 11 inputs in 11.513853869s
[NOTE: no data written at 11:57:00]
2016/04/04 11:57:10 Wrote 35 metrics to output influxdb in 4.51493ms

This PR uses a slightly smaller timeout (8 seconds), which makes Telegraf behave as usual: a datapoint is sent every 10 seconds, with timestamps rounded to multiples of 10 seconds (0, 10, 20, ...).

@sebito91
Contributor

sebito91 commented Apr 4, 2016

Isn't this handled in the [agent] section of the config?

https://github.com/influxdata/telegraf/blob/master/docs/CONFIGURATION.md

@PierreF
Contributor Author

PierreF commented Apr 4, 2016

I don't know which option in the [agent] section you are referring to.

The timeouts of Mongo and Prometheus (and all other timeouts) are hard-coded values in the .go files. My issue is that when the metric gathering timeout is the same as the metric interval, I get the strange behavior described above.

I do not want to increase the metric interval; I want to keep the default of 10 seconds.

@sparrc
Contributor

sparrc commented Apr 4, 2016

maybe we should set it to 5s? since that's what we have all of our http/tcp dial timeouts set to

@sparrc sparrc closed this in 5fe8903 Apr 4, 2016
@PierreF PierreF deleted the timeout2 branch August 4, 2018 13:20