
Panic from multiple graphite outputs #1432

Closed
kostasb opened this issue Jul 1, 2016 · 2 comments

Labels
bug unexpected problem or unintended behavior

Comments

kostasb commented Jul 1, 2016

Version tested: 1.0-beta2

Running Telegraf with multiple graphite output plugins results in a panic under load (>1000 points per second).

Config file:

[agent]
  interval = "10s"
  flush_interval = "10s"
  metric_batch_size = 10000
  metric_buffer_limit = 100000
  omit_hostname = true

[[inputs.tcp_listener]]
  service_address = ":2003"
  allowed_pending_messages = 100000
  max_tcp_connections = 1000
  data_format = "graphite"
  templates=["measurement.host.field*"]

[[outputs.graphite]]
  servers = ["graphiteserver1:2004"]
  prefix = ""
  template = "measurement.host.field"
  timeout = 5

[[outputs.graphite]]
  servers = ["graphiteserver2:2003"]
  prefix = ""
  template = "measurement.host.field"
  timeout = 5 
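
The load arrives through the tcp_listener above as graphite-formatted lines. A minimal generator along these lines (a sketch with placeholder metric names, not the actual harness used) is enough to push well past 1000 points per second:

// loadgen.go: sketch of a graphite-line load generator; metric names,
// host count, and rate are placeholders, not the actual test setup.
package main

import (
	"fmt"
	"net"
	"time"
)

func main() {
	// Connect to the tcp_listener configured above.
	conn, err := net.Dial("tcp", "localhost:2003")
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	// Send well over 1000 points per second in graphite line format:
	// <measurement>.<host>.<field> <value> <timestamp>
	for {
		now := time.Now().Unix()
		for i := 0; i < 2000; i++ {
			fmt.Fprintf(conn, "cpu.host%d.usage %d %d\n", i%10, i, now)
		}
		time.Sleep(time.Second)
	}
}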

Panic:

panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x0 pc=0x46521c]

goroutine 208 [running]:
panic(0x10dc2e0, 0xc8200100a0)
/usr/local/go/src/runtime/panic.go:481 +0x3e6
strings.Join(0xc820e19380, 0x3, 0x4, 0x1297330, 0x1, 0x0, 0x0)
/usr/local/go/src/strings/strings.go:364 +0x172
github.com/influxdata/telegraf/plugins/serializers/graphite.(*GraphiteSerializer).SerializeBucketName(0xc8204393e0, 0x0, 0x7, 0xc82233e960, 0x0, 0x0)
/root/go/src/github.com/influxdata/telegraf/plugins/serializers/graphite/graphite.go:100 +0x56b
github.com/influxdata/telegraf/plugins/serializers/graphite.(*GraphiteSerializer).Serialize(0xc8204393e0, 0x7fc3bea6f530, 0xc82182e540, 0x0, 0x0, 0x0, 0x0, 0x0)
/root/go/src/github.com/influxdata/telegraf/plugins/serializers/graphite/graphite.go:28 +0x16d
github.com/influxdata/telegraf/plugins/outputs/graphite.(*Graphite).Write(0xc820078300, 0xc82127e000, 0x1c43, 0x1c43, 0x0, 0x0)
/root/go/src/github.com/influxdata/telegraf/plugins/outputs/graphite/graphite.go:85 +0x275
github.com/influxdata/telegraf/internal/models.(*RunningOutput).write(0xc82000aff0, 0xc82127e000, 0x1c43, 0x1c43, 0x0, 0x0)
/root/go/src/github.com/influxdata/telegraf/internal/models/running_output.go:145 +0xcd
github.com/influxdata/telegraf/internal/models.(*RunningOutput).Write(0xc82000aff0, 0x0, 0x0)
/root/go/src/github.com/influxdata/telegraf/internal/models/running_output.go:131 +0x527
github.com/influxdata/telegraf/agent.(*Agent).flush.func1(0xc8215b1100, 0xc82000aff0)
/root/go/src/github.com/influxdata/telegraf/agent/agent.go:242 +0x6e
created by github.com/influxdata/telegraf/agent.(*Agent).flush
/root/go/src/github.com/influxdata/telegraf/agent/agent.go:247 +0xc0
2016/06/30 13:30:50 Starting Telegraf (version 1.1.0~n201606300825) 

Configuring both servers in the same output plugin seems stable so far.

[[outputs.graphite]]
  servers = ["graphiteserver1:2004","graphiteserver2:2003"]
  prefix = ""
  template = "measurement.host.field"
  timeout = 5
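
For what it's worth, the trace shows strings.Join dereferencing a bad pointer inside SerializeBucketName, which looks more like a data race on state shared between the two output goroutines than like bad input. A self-contained illustration of that failure mode (generic Go, not Telegraf code; run it with go run -race to see the race reported):

// race.go: generic sketch of the suspected failure mode. Two "output"
// goroutines rebuild and read the same shared slice with no synchronization;
// the race detector flags it, and under heavy load this kind of unsynchronized
// access can surface as corrupted slice headers and panics deep inside
// library calls such as strings.Join.
package main

import "strings"

func main() {
	shared := []string{"measurement", "host", "field"}

	done := make(chan struct{}, 2)
	for out := 0; out < 2; out++ {
		go func() {
			for i := 0; i < 100000; i++ {
				// Each "output" rewrites and then joins the shared name parts.
				shared = append(shared[:0], "measurement", "host", "field")
				_ = strings.Join(shared, ".")
			}
			done <- struct{}{}
		}()
	}
	<-done
	<-done
}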

Contributor

sparrc commented Jul 2, 2016

I haven't been able to reproduce this even at very high loads. Could you provide some log lines from just before the panic, so I can get an idea of the number of metrics being written?

Author

kostasb commented Jul 4, 2016

These are the lines we captured during the latest test:

2016/06/30 13:30:48 Output [graphite] wrote batch of 10000 metrics in 78.87758ms
2016/06/30 13:30:48 Output [graphite] wrote batch of 10000 metrics in 85.623467ms
2016/06/30 13:30:50 Output [graphite] buffer fullness: 7235 / 100000 metrics. Total gathered metrics: 87235. Total dropped metrics: 0.
2016/06/30 13:30:50 Output [graphite] buffer fullness: 7235 / 100000 metrics. Total gathered metrics: 87235. Total dropped metrics: 0.

sparrc added the bug label Jul 12, 2016

sparrc added several commits that referenced this issue on Jul 13 and 14, 2016, all with the same message:

This is for better thread-safety when running with multiple outputs,
which can cause very odd panics at very high loads

primarily this is to address #1432

closes #1432

sparrc closed this as completed in bfdd665 on Jul 14, 2016
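
The commits above describe making the output path thread-safe when multiple outputs are configured. The general pattern (a sketch only; Metric and Output here are placeholder types, not the real Telegraf interfaces) is to hand each output its own copy of every metric instead of sharing pointers across output goroutines:

// fanout.go: sketch of the "copy per output" pattern the commit message
// describes; Metric and Output are stand-ins, not Telegraf's actual types.
package main

import "fmt"

// Metric is a stand-in for a gathered data point.
type Metric struct {
	Name   string
	Tags   map[string]string
	Fields map[string]interface{}
}

// Copy returns a deep copy so each output can serialize it without
// sharing mutable state with any other output.
func (m *Metric) Copy() *Metric {
	c := &Metric{
		Name:   m.Name,
		Tags:   make(map[string]string, len(m.Tags)),
		Fields: make(map[string]interface{}, len(m.Fields)),
	}
	for k, v := range m.Tags {
		c.Tags[k] = v
	}
	for k, v := range m.Fields {
		c.Fields[k] = v
	}
	return c
}

// Output is a stand-in for a running output's metric buffer.
type Output struct{ buf []*Metric }

func (o *Output) AddMetric(m *Metric) { o.buf = append(o.buf, m) }

func main() {
	outputs := []*Output{{}, {}} // e.g. two graphite outputs

	m := &Metric{
		Name:   "cpu",
		Tags:   map[string]string{"host": "host1"},
		Fields: map[string]interface{}{"usage": 42},
	}

	// Fan out: each output gets its own copy rather than the shared pointer,
	// so concurrent flushes cannot trip over each other's data.
	for _, o := range outputs {
		o.AddMetric(m.Copy())
	}
	fmt.Println(len(outputs[0].buf), len(outputs[1].buf))
}

With a copy per output, one output's serialization can never observe another output's state, which also lines up with the observation above that a single output block listing both servers stayed stable.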