Telegraf 0.10.1 on CoreOS crashes after a few minutes #612

Closed
jjungnickel opened this issue Jan 29, 2016 · 2 comments · Fixed by #615

@jjungnickel

Telegraf 0.10.1 is running in a docker container that has /var/run/docker.sock mounted.

2016-01-29T09:11:50+0000  docker[6043]: 2016/01/29 09:11:50 Wrote 152 metrics to output influxdb in 65.38961ms
2016-01-29T09:11:55+0000  docker[6043]: panic: runtime error: invalid memory address or nil pointer dereference
2016-01-29T09:11:55+0000  docker[6043]: [signal 0xb code=0x1 addr=0x0 pc=0x53436b]
2016-01-29T09:11:55+0000  docker[6043]: goroutine 1569 [running]:
2016-01-29T09:11:55+0000  docker[6043]: github.com/influxdata/telegraf/plugins/inputs/docker.gatherContainerStats(0x0, 0x7f3fc0753660, 0xc8203656c0, 0xc8206771d0)
2016-01-29T09:11:55+0000  docker[6043]: /home/telebuild/go/src/github.com/influxdata/telegraf/plugins/inputs/docker/docker.go:135 +0x2c1b
2016-01-29T09:11:55+0000  docker[6043]: github.com/influxdata/telegraf/plugins/inputs/docker.(*Docker).gatherContainer(0xc8200d9170, 0xc8206bcec0, 0x40, 0xc8206bd000, 0x32, 0xc82055c2a0, 0x16, 0x56a73d65, 0xc820662390, 0x9, ...)
2016-01-29T09:11:55+0000  docker[6043]: /home/telebuild/go/src/github.com/influxdata/telegraf/plugins/inputs/docker/docker.go:125 +0x679
2016-01-29T09:11:55+0000  docker[6043]: github.com/influxdata/telegraf/plugins/inputs/docker.(*Docker).Gather.func1(0xc820662830, 0xc8200d9170, 0x7f3fc0753660, 0xc8203656c0, 0xc8206bcec0, 0x40, 0xc8206bd000, 0x32, 0xc82055c2a0, 0x16, ...)
2016-01-29T09:11:55+0000  docker[6043]: /home/telebuild/go/src/github.com/influxdata/telegraf/plugins/inputs/docker/docker.go:70 +0xa5
2016-01-29T09:11:55+0000  docker[6043]: created by github.com/influxdata/telegraf/plugins/inputs/docker.(*Docker).Gather
2016-01-29T09:11:55+0000  docker[6043]: /home/telebuild/go/src/github.com/influxdata/telegraf/plugins/inputs/docker/docker.go:74 +0x322
2016-01-29T09:11:55+0000  docker[6043]: goroutine 1 [semacquire]:
2016-01-29T09:11:55+0000  docker[6043]: sync.runtime_Semacquire(0xc8205ada4c)
2016-01-29T09:11:55+0000  docker[6043]: /usr/local/go/src/runtime/sema.go:43 +0x26
2016-01-29T09:11:55+0000  docker[6043]: sync.(*WaitGroup).Wait(0xc8205ada40)
2016-01-29T09:11:55+0000  docker[6043]: /usr/local/go/src/sync/waitgroup.go:126 +0xb4
2016-01-29T09:11:55+0000  docker[6043]: github.com/influxdata/telegraf.(*Agent).gatherParallel(0xc820026048, 0xc8201a0840, 0x0, 0x0)
2016-01-29T09:11:55+0000  docker[6043]: /home/telebuild/go/src/github.com/influxdata/telegraf/agent.go:148 +0x145
2016-01-29T09:11:55+0000  docker[6043]: github.com/influxdata/telegraf.(*Agent).Run(0xc820026048, 0xc8201a0540, 0x0, 0x0)
2016-01-29T09:11:55+0000  docker[6043]: /home/telebuild/go/src/github.com/influxdata/telegraf/agent.go:372 +0x8e6
2016-01-29T09:11:55+0000  docker[6043]: main.main()
2016-01-29T09:11:55+0000  docker[6043]: /home/telebuild/go/src/github.com/influxdata/telegraf/cmd/telegraf/telegraf.go:234 +0x1ca8
2016-01-29T09:11:55+0000  docker[6043]: goroutine 17 [syscall, 2 minutes, locked to thread]:
2016-01-29T09:11:55+0000  docker[6043]: runtime.goexit()
2016-01-29T09:11:55+0000  docker[6043]: /usr/local/go/src/runtime/asm_amd64.s:1721 +0x1
2016-01-29T09:11:55+0000  docker[6043]: goroutine 5 [syscall, 2 minutes]:
2016-01-29T09:11:55+0000  docker[6043]: os/signal.loop()
2016-01-29T09:11:55+0000  docker[6043]: /usr/local/go/src/os/signal/signal_unix.go:22 +0x18
2016-01-29T09:11:55+0000  docker[6043]: created by os/signal.init.1
2016-01-29T09:11:55+0000  docker[6043]: /usr/local/go/src/os/signal/signal_unix.go:28 +0x37
2016-01-29T09:11:55+0000  docker[6043]: goroutine 13 [select, 2 minutes, locked to thread]:
2016-01-29T09:11:55+0000  docker[6043]: runtime.gopark(0x13416f0, 0xc82001ef28, 0x1117150, 0x6, 0xc820266018, 0x2)
2016-01-29T09:11:55+0000  docker[6043]: /usr/local/go/src/runtime/proc.go:185 +0x163
2016-01-29T09:11:55+0000  docker[6043]: runtime.selectgoImpl(0xc82001ef28, 0x0, 0x18)
2016-01-29T09:11:55+0000  docker[6043]: /usr/local/go/src/runtime/select.go:392 +0xa64
2016-01-29T09:11:55+0000  docker[6043]: runtime.selectgo(0xc82001ef28)
2016-01-29T09:11:55+0000  docker[6043]: /usr/local/go/src/runtime/select.go:212 +0x12
2016-01-29T09:11:55+0000  docker[6043]: runtime.ensureSigM.func1()
2016-01-29T09:11:55+0000  docker[6043]: /usr/local/go/src/runtime/signal1_unix.go:227 +0x353
2016-01-29T09:11:55+0000  docker[6043]: runtime.goexit()
2016-01-29T09:11:55+0000  docker[6043]: /usr/local/go/src/runtime/asm_amd64.s:1721 +0x1
2016-01-29T09:11:55+0000  docker[6043]: goroutine 14 [chan receive, 2 minutes]:
2016-01-29T09:11:55+0000  docker[6043]: main.main.func1(0xc8201a05a0, 0xc8201a0540, 0xc8201a4000)
2016-01-29T09:11:55+0000  docker[6043]: /home/telebuild/go/src/github.com/influxdata/telegraf/cmd/telegraf/telegraf.go:206 +0x47
2016-01-29T09:11:55+0000  docker[6043]: created by main.main
2016-01-29T09:11:55+0000  docker[6043]: /home/telebuild/go/src/github.com/influxdata/telegraf/cmd/telegraf/telegraf.go:216 +0x1575
2016-01-29T09:11:55+0000  docker[6043]: goroutine 16 [select]:
2016-01-29T09:11:55+0000  docker[6043]: github.com/influxdata/telegraf.(*Agent).flusher(0xc820026048, 0xc8201a0540, 0xc8201a0840, 0x0, 0x0)
2016-01-29T09:11:55+0000  docker[6043]: /home/telebuild/go/src/github.com/influxdata/telegraf/agent.go:275 +0x31a
2016-01-29T09:11:55+0000  docker[6043]: github.com/influxdata/telegraf.(*Agent).Run.func1(0xc8204ed400, 0xc820026048, 0xc8201a0540, 0xc8201a0840)
2016-01-29T09:11:55+0000  docker[6043]: /home/telebuild/go/src/github.com/influxdata/telegraf/agent.go:337 +0x7f
2016-01-29T09:11:55+0000  docker[6043]: created by github.com/influxdata/telegraf.(*Agent).Run
2016-01-29T09:11:55+0000  docker[6043]: /home/telebuild/go/src/github.com/influxdata/telegraf/agent.go:341 +0x4b3
2016-01-29T09:11:55+0000  docker[6043]: goroutine 64 [IO wait]:
2016-01-29T09:11:55+0000  docker[6043]: net.runtime_pollWait(0x7f3fc0752bd0, 0x72, 0xc820014140)
2016-01-29T09:11:55+0000  docker[6043]: /usr/local/go/src/runtime/netpoll.go:157 +0x60
2016-01-29T09:11:55+0000  docker[6043]: net.(*pollDesc).Wait(0xc8200535d0, 0x72, 0x0, 0x0)
2016-01-29T09:11:55+0000  docker[6043]: /usr/local/go/src/net/fd_poll_runtime.go:73 +0x3a
2016-01-29T09:11:55+0000  docker[6043]: net.(*pollDesc).WaitRead(0xc8200535d0, 0x0, 0x0)
2016-01-29T09:11:55+0000  docker[6043]: /usr/local/go/src/net/fd_poll_runtime.go:78 +0x36
2016-01-29T09:11:55+0000  docker[6043]: net.(*netFD).Read(0xc820053570, 0xc820665000, 0x1000, 0x1000, 0x0, 0x7f3fc0746050, 0xc820014140)
2016-01-29T09:11:55+0000  docker[6043]: /usr/local/go/src/net/fd_unix.go:232 +0x23a
2016-01-29T09:11:55+0000  docker[6043]: net.(*conn).Read(0xc8200260b0, 0xc820665000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
2016-01-29T09:11:55+0000  docker[6043]: /usr/local/go/src/net/net.go:172 +0xe4
2016-01-29T09:11:55+0000  docker[6043]: net/http.noteEOFReader.Read(0x7f3fbe114308, 0xc8200260b0, 0xc82015c528, 0xc820665000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
2016-01-29T09:11:55+0000  docker[6043]: /usr/local/go/src/net/http/transport.go:1370 +0x67
2016-01-29T09:11:55+0000  docker[6043]: net/http.(*noteEOFReader).Read(0xc82065b5e0, 0xc820665000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
2016-01-29T09:11:55+0000  docker[6043]: <autogenerated>:126 +0xd0
2016-01-29T09:11:55+0000  docker[6043]: bufio.(*Reader).fill(0xc82021cf00)
2016-01-29T09:11:55+0000  docker[6043]: /usr/local/go/src/bufio/bufio.go:97 +0x1e9
2016-01-29T09:11:55+0000  docker[6043]: bufio.(*Reader).Peek(0xc82021cf00, 0x1, 0x0, 0x0, 0x0, 0x0, 0x0)
2016-01-29T09:11:55+0000  docker[6043]: /usr/local/go/src/bufio/bufio.go:132 +0xcc
2016-01-29T09:11:55+0000  docker[6043]: net/http.(*persistConn).readLoop(0xc82015c4d0)
2016-01-29T09:11:55+0000  docker[6043]: /usr/local/go/src/net/http/transport.go:876 +0xf7
2016-01-29T09:11:55+0000  docker[6043]: created by net/http.(*Transport).dialConn
2016-01-29T09:11:55+0000  docker[6043]: /usr/local/go/src/net/http/transport.go:685 +0xc78
2016-01-29T09:11:55+0000  docker[6043]: goroutine 1576 [runnable]:
2016-01-29T09:11:55+0000  docker[6043]: sync.runtime_Semacquire(0xc82066283c)
2016-01-29T09:11:55+0000  docker[6043]: /usr/local/go/src/runtime/sema.go:43 +0x26
2016-01-29T09:11:55+0000  docker[6043]: sync.(*WaitGroup).Wait(0xc820662830)
2016-01-29T09:11:55+0000  docker[6043]: /usr/local/go/src/sync/waitgroup.go:126 +0xb4
2016-01-29T09:11:55+0000  docker[6043]: github.com/influxdata/telegraf/plugins/inputs/docker.(*Docker).Gather(0xc8200d9170, 0x7f3fc0753660, 0xc8203656c0, 0x0, 0x0)
2016-01-29T09:11:55+0000  docker[6043]: /home/telebuild/go/src/github.com/influxdata/telegraf/plugins/inputs/docker/docker.go:76 +0x35d
2016-01-29T09:11:55+0000  docker[6043]: github.com/influxdata/telegraf.(*Agent).gatherParallel.func1(0xc8205ada40, 0xc8201a0840, 0xc820026048, 0x0, 0xc8200d91d0)
2016-01-29T09:11:55+0000  docker[6043]: /home/telebuild/go/src/github.com/influxdata/telegraf/agent.go:137 +0x490
2016-01-29T09:11:55+0000  docker[6043]: created by github.com/influxdata/telegraf.(*Agent).gatherParallel
2016-01-29T09:11:55+0000  docker[6043]: /home/telebuild/go/src/github.com/influxdata/telegraf/agent.go:141 +0x442
2016-01-29T09:11:55+0000  docker[6043]: goroutine 172 [select]:
2016-01-29T09:11:55+0000  docker[6043]: net/http.(*persistConn).writeLoop(0xc82015cb00)
2016-01-29T09:11:55+0000  docker[6043]: /usr/local/go/src/net/http/transport.go:1009 +0x40c
2016-01-29T09:11:55+0000  docker[6043]: created by net/http.(*Transport).dialConn
2016-01-29T09:11:55+0000  docker[6043]: /usr/local/go/src/net/http/transport.go:686 +0xc9d
2016-01-29T09:11:55+0000  docker[6043]: goroutine 65 [select]:
2016-01-29T09:11:55+0000  docker[6043]: net/http.(*persistConn).writeLoop(0xc82015c4d0)
2016-01-29T09:11:55+0000  docker[6043]: /usr/local/go/src/net/http/transport.go:1009 +0x40c
2016-01-29T09:11:55+0000  docker[6043]: created by net/http.(*Transport).dialConn
2016-01-29T09:11:55+0000  docker[6043]: /usr/local/go/src/net/http/transport.go:686 +0xc9d
2016-01-29T09:11:55+0000  docker[6043]: goroutine 171 [IO wait]:
2016-01-29T09:11:55+0000  docker[6043]: net.runtime_pollWait(0x7f3fc07545b8, 0x72, 0xc820014140)
2016-01-29T09:11:55+0000  docker[6043]: /usr/local/go/src/runtime/netpoll.go:157 +0x60
2016-01-29T09:11:55+0000  docker[6043]: net.(*pollDesc).Wait(0xc8201a45a0, 0x72, 0x0, 0x0)
2016-01-29T09:11:55+0000  docker[6043]: /usr/local/go/src/net/fd_poll_runtime.go:73 +0x3a
2016-01-29T09:11:55+0000  docker[6043]: net.(*pollDesc).WaitRead(0xc8201a45a0, 0x0, 0x0)
2016-01-29T09:11:55+0000  docker[6043]: /usr/local/go/src/net/fd_poll_runtime.go:78 +0x36
2016-01-29T09:11:55+0000  docker[6043]: net.(*netFD).Read(0xc8201a4540, 0xc8206d2000, 0x1000, 0x1000, 0x0, 0x7f3fc0746050, 0xc820014140)
2016-01-29T09:11:55+0000  docker[6043]: /usr/local/go/src/net/fd_unix.go:232 +0x23a
2016-01-29T09:11:55+0000  docker[6043]: net.(*conn).Read(0xc820026030, 0xc8206d2000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
2016-01-29T09:11:55+0000  docker[6043]: /usr/local/go/src/net/net.go:172 +0xe4
2016-01-29T09:11:55+0000  docker[6043]: net/http.noteEOFReader.Read(0x7f3fc0753050, 0xc820026030, 0xc82015cb58, 0xc8206d2000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
2016-01-29T09:11:55+0000  docker[6043]: /usr/local/go/src/net/http/transport.go:1370 +0x67
2016-01-29T09:11:55+0000  docker[6043]: net/http.(*noteEOFReader).Read(0xc8206a80a0, 0xc8206d2000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
2016-01-29T09:11:55+0000  docker[6043]: <autogenerated>:126 +0xd0
2016-01-29T09:11:55+0000  docker[6043]: bufio.(*Reader).fill(0xc8206963c0)
2016-01-29T09:11:55+0000  docker[6043]: /usr/local/go/src/bufio/bufio.go:97 +0x1e9
2016-01-29T09:11:55+0000  docker[6043]: bufio.(*Reader).Peek(0xc8206963c0, 0x1, 0x0, 0x0, 0x0, 0x0, 0x0)
2016-01-29T09:11:55+0000  docker[6043]: /usr/local/go/src/bufio/bufio.go:132 +0xcc
2016-01-29T09:11:55+0000  docker[6043]: net/http.(*persistConn).readLoop(0xc82015cb00)
2016-01-29T09:11:55+0000  docker[6043]: /usr/local/go/src/net/http/transport.go:876 +0xf7
2016-01-29T09:11:55+0000  docker[6043]: created by net/http.(*Transport).dialConn
2016-01-29T09:11:55+0000  docker[6043]: /usr/local/go/src/net/http/transport.go:685 +0xc78
2016-01-29T09:11:55+0000  docker[6043]: goroutine 1645 [runnable]:
2016-01-29T09:11:55+0000  docker[6043]: github.com/fsouza/go-dockerclient.(*Client).Stats.func3(0xc8206bcec0, 0x40, 0xc8201a0180, 0xc8201a0100, 0xc8201a01e0, 0x12a05f200, 0xc8200261f8, 0xc8201a02a0)
2016-01-29T09:11:55+0000  docker[6043]: /home/telebuild/go/src/github.com/fsouza/go-dockerclient/container.go:800 +0xe3
2016-01-29T09:11:55+0000  docker[6043]: created by github.com/fsouza/go-dockerclient.(*Client).Stats
2016-01-29T09:11:55+0000  docker[6043]: /home/telebuild/go/src/github.com/fsouza/go-dockerclient/container.go:806 +0x3a7
2016-01-29T09:11:55+0000  systemd[1]: telegraf.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
2016-01-29T09:11:55+0000  docker[7221]: telegraf
2016-01-29T09:11:55+0000  systemd[1]: telegraf.service: Unit entered failed state.
2016-01-29T09:11:55+0000  systemd[1]: telegraf.service: Failed with result 'exit-code'.
@jjungnickel

Configuration:

# Telegraf configuration

# Telegraf is entirely plugin driven. All metrics are gathered from the
# declared inputs.

# Even if a plugin has no configuration, it must be declared in here
# to be active. Declaring a plugin means just specifying the name
# as a section with no variables. To deactivate a plugin, comment
# out the name and any variables.

# Use 'telegraf -config telegraf.toml -test' to see what metrics a config
# file would generate.

# One rule that plugins conform to is wherever a connection string
# can be passed, the values '' and 'localhost' are treated specially.
# They indicate to the plugin to use their own builtin configuration to
# connect to the local system.

# NOTE: The configuration has a few required parameters. They are marked
# with 'required'. Be sure to edit those to make this configuration work.

# Tags can also be specified via a normal map, but only one form at a time:
[tags]
  # dc = "us-east-1"
  datacenter="eu-central-1"
  stack="****anonymized****"



# Configuration for telegraf agent
[agent]
  # Default data collection interval for all plugins
  interval = "10s"
  # Rounds collection interval to 'interval'
  # ie, if interval="10s" then always collect on :00, :10, :20, etc.
  round_interval = true

  # Default data flushing interval for all outputs. You should not set this below
  # interval. Maximum flush_interval will be flush_interval + flush_jitter
  flush_interval = "10s"
  # Jitter the flush interval by a random amount. This is primarily to avoid
  # large write spikes for users running a large number of telegraf instances.
  # ie, a jitter of 5s and interval 10s means flushes will happen every 10-15s
  flush_jitter = "0s"

  # Run telegraf in debug mode
  debug = false
  # Override default hostname, if empty use os.Hostname()
  hostname = "****anonymized****"


###############################################################################
#                                  OUTPUTS                                    #
###############################################################################

# Configuration for influxdb server to send metrics to
[[outputs.influxdb]]
  # The full HTTP or UDP endpoint URL for your InfluxDB instance.
  # Multiple urls can be specified but it is assumed that they are part of the same
  # cluster, this means that only ONE of the urls will be written to each interval.
  # urls = ["udp://localhost:8089"] # UDP endpoint example
  urls = ["****anonymized****"] # required
  # The target database for metrics (telegraf will create it if not exists)
  database = "telegraf" # required
  # Precision of writes, valid values are n, u, ms, s, m, and h
  # note: using second precision greatly helps InfluxDB compression
  precision = "s"

  # Connection timeout (for the connection with InfluxDB), formatted as a string.
  # If not provided, will default to 0 (no timeout)
  # timeout = "5s"
  # username = "telegraf"
  # password = "metricsmetricsmetricsmetrics"
  # Set the user agent for HTTP POSTs (can be useful for log differentiation)
  # user_agent = "telegraf"
  # Set UDP payload size, defaults to InfluxDB UDP Client default (512 bytes)
  # udp_payload = 512


###############################################################################
#                                  INPUTS                                     #
###############################################################################

# Read metrics about cpu usage
[[inputs.cpu]]
  # Whether to report per-cpu stats or not
  percpu = true
  # Whether to report total system cpu stats or not
  totalcpu = true
  # Comment this line if you want the raw CPU time metrics
  drop = ["cpu_time"]

# Read metrics about disk usage by mount point
[[inputs.disk]]
  # By default, telegraf gathers stats for all mountpoints.
  # Setting mountpoints will restrict the stats to the specified mountpoints.
  # Mountpoints=["/"]

# Read metrics about disk IO by device
[[inputs.diskio]]
  # By default, telegraf will gather stats for all devices including
  # disk partitions.
  # Setting devices will restrict the stats to the specified devices.
  # Devices=["sda","sdb"]
  # Uncomment the following line if you do not need disk serial numbers.
  # SkipSerialNumber = true

[[inputs.mem]]
[[inputs.swap]]
[[inputs.system]]
[[inputs.docker]]
  # Docker Endpoint
  #   To use TCP, set endpoint = "tcp://[ip]:[port]"
  #   To use environment variables (ie, docker-machine), set endpoint = "ENV"
  endpoint = "unix:///var/run/docker.sock"
  # Only collect metrics for these containers, collect all if empty
  container_names = []

[[inputs.ping]]
  urls = ["****anonymized****"]
  count = 1

sparrc added a commit that referenced this issue Jan 29, 2016
@sparrc

sparrc commented Jan 29, 2016

Thanks for the report, @jjungnickel. I can see the problem here; I'll have a fix in 0.10.2.
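
For readers following the trace: the panicking frame is gatherContainerStats at docker.go:135, and its first argument word is 0x0, which suggests a nil stats pointer was passed in and then dereferenced. The sketch below shows one way such a call could be guarded. It is not the change made in #615; the Stats and Accumulator types, the latestStats helper, and the idea that the stats stream closed before delivering a value are all assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Stats is a hypothetical, trimmed-down stand-in for the stats struct
// delivered by the Docker client; the real type has many more fields.
type Stats struct {
	MemoryUsage uint64
}

// Accumulator is a hypothetical stand-in for Telegraf's accumulator.
type Accumulator struct {
	Fields map[string]uint64
}

var errNilStats = errors.New("docker: received nil stats")

// gatherContainerStats records fields from stat. It returns an error
// instead of dereferencing a nil pointer.
func gatherContainerStats(stat *Stats, acc *Accumulator) error {
	if stat == nil {
		return errNilStats
	}
	if acc.Fields == nil {
		acc.Fields = make(map[string]uint64)
	}
	acc.Fields["mem_usage"] = stat.MemoryUsage
	return nil
}

// latestStats drains ch and returns the last value received, or nil
// if the channel was closed without delivering anything.
func latestStats(ch <-chan *Stats) *Stats {
	var last *Stats
	for s := range ch {
		last = s
	}
	return last
}

func main() {
	ch := make(chan *Stats)
	close(ch) // assumed scenario: the stream ends before sending a value

	acc := &Accumulator{}
	if err := gatherContainerStats(latestStats(ch), acc); err != nil {
		// Skip this container for the current interval rather than crash.
		fmt.Println("skipping container:", err)
	}
}
```

Returning an error from the gather function lets the caller log the problem and carry on with the other containers, so one missing stats sample does not take down the whole agent.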
