[0.11.1] Ordering of config exposed potential race in addOutput() #1090

wrigtim · 2016-04-25T12:23:24Z

Currently running 0.11.1 with a few internal patches. I noticed our metric_buffer_limit config was not being respected on some systems, and with some more testing found what looks like a race on startup where the config isn't always honored.

I've since re-ordered our standard config to define [agent] before [outputs], which seems to make this harder to reproduce (no scientific analysis here as yet!).

[root@host1 ~]# grep metric_buffer /etc/telegraf/telegraf.conf 
  metric_buffer_limit = 32768

Starting the agent twice in quick succession: the first run starts with a metric_buffer_limit of 32768, and the second starts with zero...

[root@host1 ~]# telegraf -config /etc/telegraf/telegraf.conf 
2016/04/25 07:16:41 Creating runningoutput with metric_buffer_limit 32768
2016/04/25 07:16:41 Creating runningoutput with metric_buffer_limit 32768
2016/04/25 07:16:41 Starting Telegraf (version 0.11.3-4)
2016/04/25 07:16:41 Loaded outputs: influxdb influxdb
2016/04/25 07:16:41 Loaded inputs: cpu disk diskio mem swap system netif
2016/04/25 07:16:41 Agent Config: Interval:9s, Debug:false, Quiet:false, Hostname:"host1", Flush Interval:17.308906277s 
2016/04/25 07:16:41 Gathered metrics, (separate 5m0s interval), from disk in 442.32µs
2016/04/25 07:16:42 Hang on, flushing any cached metrics before shutdown
2016/04/25 07:16:42 Wrote 9 metrics to output influxdb in 78.334µs
2016/04/25 07:16:42 Wrote 9 metrics to output influxdb in 89.99µs
^C

[root@host1 ~]# telegraf -config /etc/telegraf/telegraf.conf 
2016/04/25 07:16:43 Creating runningoutput with metric_buffer_limit 0
2016/04/25 07:16:43 Creating runningoutput with metric_buffer_limit 0
2016/04/25 07:16:43 Starting Telegraf (version 0.11.3-4)
2016/04/25 07:16:43 Loaded outputs: influxdb influxdb
2016/04/25 07:16:43 Loaded inputs: disk diskio mem swap system netif cpu
2016/04/25 07:16:43 Agent Config: Interval:9s, Debug:false, Quiet:false, Hostname:"host1", Flush Interval:16.260703541s 
2016/04/25 07:16:43 Gathered metrics, (separate 5m0s interval), from disk in 426.628µs
2016/04/25 07:16:44 Hang on, flushing any cached metrics before shutdown
2016/04/25 07:16:44 Wrote 9 metrics to output influxdb in 82.83µs
2016/04/25 07:16:44 Wrote 9 metrics to output influxdb in 177.271µs
2016/04/25 07:16:45 Gathered metrics, (9s interval), from 6 inputs in 1.971490474s

PS: the 'Creating runningoutput' line was added with:

--- a/telegraf/src/github.com/influxdata/telegraf/internal/config/config.go
+++ b/telegraf/src/github.com/influxdata/telegraf/internal/config/config.go
@@ -422,6 +422,7 @@ func (c *Config) addOutput(name string, table *ast.Table) error {
        }

        ro := internal_models.NewRunningOutput(name, output, outputConfig)
+       log.Printf("Creating runningoutput with metric_buffer_limit %d\n", c.Agent.MetricBufferLimit)
        if c.Agent.MetricBufferLimit > 0 {
                ro.MetricBufferLimit = c.Agent.MetricBufferLimit
        }
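
The guard in the diff above only copies the agent setting when it is greater than zero. A minimal sketch of what that implies if the output is created before the agent section has been populated. The types, the constructor, and the default of 1000 are hypothetical stand-ins for illustration, not Telegraf's actual code:

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the agent config and a running output.
type agentConfig struct{ MetricBufferLimit int }
type runningOutput struct{ MetricBufferLimit int }

// defaultLimit is a made-up default, used only for this illustration.
const defaultLimit = 1000

// newRunningOutput mirrors the guard in the diff: the agent value is
// applied only when it is greater than zero.
func newRunningOutput(agent agentConfig) *runningOutput {
	ro := &runningOutput{MetricBufferLimit: defaultLimit}
	if agent.MetricBufferLimit > 0 {
		ro.MetricBufferLimit = agent.MetricBufferLimit
	}
	return ro
}

func main() {
	var agent agentConfig // agent section not parsed yet: zero value

	early := newRunningOutput(agent) // keeps the default

	agent.MetricBufferLimit = 32768 // agent section parsed
	late := newRunningOutput(agent) // picks up the configured limit

	fmt.Println(early.MetricBufferLimit, late.MetricBufferLimit)
}
```

In this sketch an output built too early silently keeps its default, with no error, which would match a configured limit that is respected on some starts and not others.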


sparrc · 2016-04-25T15:42:25Z

thanks for the report, I'll dig into this

closes #1090

sparrc · 2016-04-29T22:19:45Z

Found the problem: I was relying on map iteration order to parse the various portions of the config. The sections are usually returned in order, but sometimes (about 15% of the time) they are returned out of order.

When out-of-order, the output configs would get loaded before the agent config, resulting in outputs that didn't have the agent config applied!

thanks for the report, I'll have the fix merged soon (pr #1130)

closes #1090

sparrc · 2016-04-29T22:40:45Z

BTW @wrigtim @sebito91 there is also a change coming to the metric_buffer_limit config option that simplifies configuration and buffering. See the 2nd release note here: https://github.com/influxdata/telegraf/blob/master/CHANGELOG.md

sparrc added the bug label Apr 25, 2016

sparrc added a commit that referenced this issue Apr 29, 2016

map return order agent config bug fix

561b5b4

closes #1090

sparrc mentioned this issue Apr 29, 2016

agent and tags configs sometimes not applied #1130

Closed

sparrc added a commit that referenced this issue Apr 29, 2016

map return order agent config bug fix

304fcd2

closes #1090

sparrc added a commit that referenced this issue Apr 29, 2016

agent and tags configs sometimes not applied

556ac60

closes #1090

sparrc closed this as completed in 4e9798d Apr 30, 2016
