Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Telegraph writes 1 batch from MQTT to to InfluxDB and then restarts with memory error #6142

Closed
dogada opened this issue Jul 19, 2019 · 2 comments · Fixed by #6149
Closed
Labels
area/mqtt bug unexpected problem or unintended behavior upstream bug or issues that rely on dependency fixes
Milestone

Comments

@dogada
Copy link

dogada commented Jul 19, 2019

Relevant telegraf.conf:

[global_tags]
[agent]
  interval = "10s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "5s"
  precision = ""
  debug = true
  quiet = false
  hostname = "telegraf"
  omit_hostname = false

[[outputs.influxdb]]
  urls = ["http://influxdb:8086"] # required
  database = "telegraf" # default database
  retention_policy = ""
  namedrop = ["iot_*", "demo_*"]

[[outputs.influxdb]]
  urls = ["http://influxdb:8086"]
  database = "iot"
  retention_policy = ""
  namepass = ["iot_*"]

[[outputs.influxdb]]
  urls = ["http://influxdb:8086"]
  database = "demo" 
  retention_policy = ""
  namepass = ["demo_*"]


[[inputs.mqtt_consumer]]
  servers = ["ssl://hub:8883"]
  insecure_skip_verify = true   ## Use TLS but skip chain & host verification
  qos = 1
  topics = [
    "matrix/zenon/influx",
    "matrix/demo/influx"
  ]
  persistent_session = true
  client_id = "telegraf_matrix"
  username = "matrix"
  password = "..."
  data_format = "influx"

System info:

Telegraf 1.10.4 in Docker version 18.06.1-ce, build e68fc7a
Ubuntu 18.04 LTS

Steps to reproduce:

Start TICK stack, configure Telegraf to read from MQTT and write to Influx and see failures every 2-3 hours. It worked fine when traffic to MQTT was approx. 500 records/second, but then when MQTT traffic became 3000 records/seconds, telegraf started to fail constantly.

Expected behavior:

Should read from MQTT and write to Influx without stops.

Actual behavior:

Works 2-3 hours, than becomes unstable and constantly restarts.

telegraf_1    | 2019-07-19T10:10:10Z I! Starting Telegraf 1.10.4
telegraf_1    | 2019-07-19T10:10:10Z I! Using config file: /etc/telegraf/telegraf.conf
telegraf_1    | 2019-07-19T10:10:10Z I! Loaded inputs: mem diskio disk mqtt_consumer mqtt_consumer
telegraf_1    | 2019-07-19T10:10:10Z I! Loaded aggregators: 
telegraf_1    | 2019-07-19T10:10:10Z I! Loaded processors: 
telegraf_1    | 2019-07-19T10:10:10Z I! Loaded outputs: influxdb influxdb influxdb
telegraf_1    | 2019-07-19T10:10:10Z I! Tags enabled: host=telegraf
telegraf_1    | 2019-07-19T10:10:10Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"telegraf", Flush Interval:10s
telegraf_1    | 2019-07-19T10:10:10Z D! [agent] Connecting outputs
telegraf_1    | 2019-07-19T10:10:10Z D! [agent] Attempting connection to output: influxdb
telegraf_1    | 2019-07-19T10:10:10Z D! [agent] Successfully connected to output: influxdb
telegraf_1    | 2019-07-19T10:10:10Z D! [agent] Attempting connection to output: influxdb
telegraf_1    | 2019-07-19T10:10:10Z D! [agent] Successfully connected to output: influxdb
telegraf_1    | 2019-07-19T10:10:10Z D! [agent] Attempting connection to output: influxdb
telegraf_1    | 2019-07-19T10:10:10Z D! [agent] Successfully connected to output: influxdb
telegraf_1    | 2019-07-19T10:10:10Z D! [agent] Starting service inputs
telegraf_1    | 2019-07-19T10:10:10Z I! [inputs.mqtt_consumer] Connected [ssl://hub:8883]
telegraf_1    | 2019-07-19T10:10:10Z I! [inputs.mqtt_consumer] Connected [ssl://hub:8883]
telegraf_1    | 2019-07-19T10:10:20Z D! [outputs.influxdb] wrote batch of 1000 metrics in 68.453001ms
telegraf_1    | 2019-07-19T10:10:20Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
telegraf_1    | panic: runtime error: invalid memory address or nil pointer dereference
telegraf_1    | [signal SIGSEGV: segmentation violation code=0x1 addr=0x38 pc=0x1247983]
telegraf_1    | 
telegraf_1    | goroutine 91 [running]:
telegraf_1    | github.com/influxdata/telegraf/vendor/github.com/eclipse/paho%2emqtt%2egolang.outgoing(0xc00040f500)
telegraf_1    |         /go/src/github.com/influxdata/telegraf/vendor/github.com/eclipse/paho.mqtt.golang/net.go:202 +0xb33
telegraf_1    | created by github.com/influxdata/telegraf/vendor/github.com/eclipse/paho%2emqtt%2egolang.(*client).Connect.func1
telegraf_1    |         /go/src/github.com/influxdata/telegraf/vendor/github.com/eclipse/paho.mqtt.golang/client.go:295 +0x6e7
telegraf_1    | 2019-07-19T10:11:20Z I! Starting Telegraf 1.10.4
telegraf_1    | 2019-07-19T10:11:20Z I! Using config file: /etc/telegraf/telegraf.conf
telegraf_1    | 2019-07-19T10:11:20Z I! Loaded inputs: mqtt_consumer mqtt_consumer mem diskio disk
telegraf_1    | 2019-07-19T10:11:20Z I! Loaded aggregators: 
telegraf_1    | 2019-07-19T10:11:20Z I! Loaded processors: 
telegraf_1    | 2019-07-19T10:11:20Z I! Loaded outputs: influxdb influxdb influxdb
telegraf_1    | 2019-07-19T10:11:20Z I! Tags enabled: host=telegraf
telegraf_1    | 2019-07-19T10:11:20Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"telegraf", Flush Interval:10s
telegraf_1    | 2019-07-19T10:11:20Z D! [agent] Connecting outputs
telegraf_1    | 2019-07-19T10:11:20Z D! [agent] Attempting connection to output: influxdb
telegraf_1    | 2019-07-19T10:11:20Z D! [agent] Successfully connected to output: influxdb
telegraf_1    | 2019-07-19T10:11:20Z D! [agent] Attempting connection to output: influxdb
telegraf_1    | 2019-07-19T10:11:20Z D! [agent] Successfully connected to output: influxdb
telegraf_1    | 2019-07-19T10:11:20Z D! [agent] Attempting connection to output: influxdb
telegraf_1    | 2019-07-19T10:11:20Z D! [agent] Successfully connected to output: influxdb
telegraf_1    | 2019-07-19T10:11:20Z D! [agent] Starting service inputs
telegraf_1    | 2019-07-19T10:11:20Z I! [inputs.mqtt_consumer] Connected [ssl://hub:8883]
telegraf_1    | 2019-07-19T10:11:20Z I! [inputs.mqtt_consumer] Connected [ssl://hub:8883]
telegraf_1    | 2019-07-19T10:11:30Z D! [outputs.influxdb] wrote batch of 1000 metrics in 25.164782ms
telegraf_1    | 2019-07-19T10:11:30Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
telegraf_1    | panic: runtime error: invalid memory address or nil pointer dereference
telegraf_1    | [signal SIGSEGV: segmentation violation code=0x1 addr=0x38 pc=0x1247983]
telegraf_1    | 
telegraf_1    | goroutine 70 [running]:
telegraf_1    | github.com/influxdata/telegraf/vendor/github.com/eclipse/paho%2emqtt%2egolang.outgoing(0xc000090700)
telegraf_1    |         /go/src/github.com/influxdata/telegraf/vendor/github.com/eclipse/paho.mqtt.golang/net.go:202 +0xb33
telegraf_1    | created by github.com/influxdata/telegraf/vendor/github.com/eclipse/paho%2emqtt%2egolang.(*client).Connect.func1
telegraf_1    |         /go/src/github.com/influxdata/telegraf/vendor/github.com/eclipse/paho.mqtt.golang/client.go:295 +0x6e7
@dogada
Copy link
Author

dogada commented Jul 20, 2019

As it stated at eclipse-paho/paho.mqtt.golang#341 (comment) above error may be caused by outdated paho.mqtt.golang dependency. Can it be updated to 1.2 version?

@GeorgeMac GeorgeMac added area/mqtt bug unexpected problem or unintended behavior upstream bug or issues that rely on dependency fixes labels Jul 22, 2019
@danielnelson danielnelson added this to the 1.12.0 milestone Jul 22, 2019
@dogada
Copy link
Author

dogada commented Sep 3, 2019

We tested nightly builds as asked in #6149 in our environment and we don't see stack trace mentioned in #6142 but the problem still persists but with complete silence in logs now. Once per 3-4 days Telegraf just stops to receive data from MQTT broker (we use Mosqitto) and we don't see any diagnostic messages in the logs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/mqtt bug unexpected problem or unintended behavior upstream bug or issues that rely on dependency fixes
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants