Random crash [inputs.ethtool] SIGSEGV: segmentation violation #11285

Closed
Derek-K opened this issue Jun 13, 2022 · 10 comments
Labels
bug (unexpected problem or unintended behavior)

Comments

@Derek-K

Derek-K commented Jun 13, 2022

Relevant telegraf.conf

# Returns ethtool statistics for given interfaces
[[inputs.ethtool]]
  ## List of interfaces to pull metrics for
  interface_include = ["eth0"]

Logs from Telegraf

2022-06-12T17:00:35Z W! [agent] ["outputs.influxdb"] did not complete within its flush interval
2022-06-12T17:00:35Z W! [inputs.system] Collection took longer than expected; not complete after interval of 10s
2022-06-12T17:00:35Z W! [inputs.netstat] Collection took longer than expected; not complete after interval of 10s
2022-06-12T17:00:35Z W! [inputs.ethtool] Collection took longer than expected; not complete after interval of 10s
2022-06-12T17:00:35Z W! [inputs.kernel] Collection took longer than expected; not complete after interval of 10s
2022-06-12T17:00:35Z W! [inputs.net] Collection took longer than expected; not complete after interval of 10s
2022-06-12T21:40:50Z E! [inputs.ethtool] Error in plugin: eth0 driver: operation not permitted
SIGSEGV: segmentation violation
PC=0x85ecc m=9 sigcode=1

goroutine 56649 [running]:
runtime: unexpected return pc for runtime.exitsyscall called from 0x0
stack: frame={sp:0x9c9d418, fp:0x9c9d430} stack=[0x9c9d000,0x9c9d800)
0x09c9d398:  0x5f787833  0x00687465 <github.com/jhump/protoreflect/desc/protoparse.setOptionField+0x00000051>  0x00000000  0x00000000
0x09c9d3a8:  0x00000000  0x00000000  0x00000000  0x00342e31 <net/http.(*Transport).removeIdleConnLocked+0x00000159>
0x09c9d3b8:  0x6b2d362e  0x776b7269  0x2d646f6f  0x2d646c74
0x09c9d3c8:  0x00000031  0x00000000  0x00000000  0x00412f4e <github.com/tidwall/gjson.parseArray.func1+0x000003ee>
0x09c9d3d8:  0x00000000  0x00000000  0x00000000  0x00000000
0x09c9d3e8:  0x00000000  0x00000000  0x00000000  0x74616c70
0x09c9d3f8:  0x6d726f66  0x00000000  0x00000000  0x00000000
0x09c9d408:  0x00000000  0x00000000  0x00000000  0x00000000
0x09c9d418: <0x00000000  0x00000000  0x00000000  0x00000000
0x09c9d428:  0x00000000  0x00000000 >0x00000000  0x00000000
0x09c9d438:  0x00000000  0x00000000  0x00000000  0x00000028
0x09c9d448:  0x00000000  0x00000000  0x00000000  0x00000000
0x09c9d458:  0x0a1ea000  0x00113d58 <os.(*File).Read+0x00000070>  0x00000009  0x0a1ea000
0x09c9d468:  0x00001000  0x00001000  0x00000000  0x000265c4 <runtime.heapBits.initSpan+0x00000064>
0x09c9d478:  0x66de0b90  0x01000000 <github.com/miekg/dns.(*NID).unpack+0x00000034>  0x00000009  0x00001000
0x09c9d488:  0x099aea14  0x0010942c <internal/poll.(*FD).Read.func1+0x00000000>  0x099aea00  0x09c9d48c
0x09c9d498:  0x001ae330 <bufio.(*Reader).fill+0x0000010c>  0x099aea00  0x0a1ea000  0x00001000
0x09c9d4a8:  0x00001000  0x00000000
runtime.exitsyscall()
        /usr/local/go/src/runtime/proc.go:3813 +0x1a8 fp=0x9c9d430 sp=0x9c9d418 pc=0x85ecc
created by github.com/influxdata/telegraf/agent.(*Agent).gatherOnce
        /go/src/github.com/influxdata/telegraf/agent/agent.go:486 +0xe4

goroutine 1 [semacquire, 402 minutes]:
sync.runtime_Semacquire(0x9bc12a8)
        /usr/local/go/src/runtime/sema.go:56 +0x34
sync.(*WaitGroup).Wait(0x9bc12a0)
        /usr/local/go/src/sync/waitgroup.go:136 +0x94
github.com/influxdata/telegraf/agent.(*Agent).Run(0x9f29be8, {0x52fad18, 0x9f4fd70})
        /go/src/github.com/influxdata/telegraf/agent/agent.go:181 +0x8a8
main.runAgent({0x52fad18, 0x9f4fd70}, {0x7ac5804, 0x0, 0x0}, {0x7ac5804, 0x0, 0x0})
        /go/src/github.com/influxdata/telegraf/cmd/telegraf/telegraf.go:328 +0x13e0
main.reloadLoop({0x7ac5804, 0x0, 0x0}, {0x7ac5804, 0x0, 0x0})
        /go/src/github.com/influxdata/telegraf/cmd/telegraf/telegraf.go:158 +0x21c
main.run(...)
        /go/src/github.com/influxdata/telegraf/cmd/telegraf/telegraf_posix.go:8
main.main()
        /go/src/github.com/influxdata/telegraf/cmd/telegraf/telegraf.go:501 +0xcb0

goroutine 7 [select]:
go.opencensus.io/stats/view.(*worker).start(0x9bac360)
        /go/pkg/mod/go.opencensus.io@v0.23.0/stats/view/worker.go:276 +0xa0
created by go.opencensus.io/stats/view.init.0
        /go/pkg/mod/go.opencensus.io@v0.23.0/stats/view/worker.go:34 +0x98

goroutine 8 [chan receive]:
k8s.io/klog/v2.(*loggingT).flushDaemon(0x7aa9c40)
        /go/pkg/mod/k8s.io/klog/v2@v2.30.0/klog.go:1181 +0x70
created by k8s.io/klog/v2.init.0
        /go/pkg/mod/k8s.io/klog/v2@v2.30.0/klog.go:420 +0x128

goroutine 10 [chan receive]:
github.com/ClickHouse/clickhouse-go.init.0.func1()
        /go/pkg/mod/github.com/!click!house/clickhouse-go@v1.5.4/bootstrap.go:48 +0x40
created by github.com/ClickHouse/clickhouse-go.init.0
        /go/pkg/mod/github.com/!click!house/clickhouse-go@v1.5.4/bootstrap.go:45 +0x40

goroutine 19 [syscall, 402 minutes]:
os/signal.signal_recv()
        /usr/local/go/src/runtime/sigqueue.go:151 +0x34
os/signal.loop()
        /usr/local/go/src/os/signal/signal_unix.go:23 +0x14
created by os/signal.Notify.func1.1
        /usr/local/go/src/os/signal/signal.go:151 +0x28

goroutine 20 [select, 402 minutes]:
main.reloadLoop.func1()
        /go/src/github.com/influxdata/telegraf/cmd/telegraf/telegraf.go:145 +0x8c
created by main.reloadLoop
        /go/src/github.com/influxdata/telegraf/cmd/telegraf/telegraf.go:144 +0x1d8

goroutine 23 [runnable]:
github.com/influxdata/telegraf/agent.(*Agent).runOutputs(0x9f29be8, 0x99524c0)
        /go/src/github.com/influxdata/telegraf/agent/agent.go:773 +0x2b8
github.com/influxdata/telegraf/agent.(*Agent).Run.func1()
        /go/src/github.com/influxdata/telegraf/agent/agent.go:150 +0x64
created by github.com/influxdata/telegraf/agent.(*Agent).Run
        /go/src/github.com/influxdata/telegraf/agent/agent.go:148 +0x4f0

goroutine 24 [semacquire, 402 minutes]:
sync.runtime_Semacquire(0x9bc12b8)
        /usr/local/go/src/runtime/sema.go:56 +0x34
sync.(*WaitGroup).Wait(0x9bc12b0)
        /usr/local/go/src/sync/waitgroup.go:136 +0x94
github.com/influxdata/telegraf/agent.(*Agent).runInputs(0x9f29be8, {0x52fad18, 0x9f4fd70}, {0xc0a19fe14272aa50, 0x2b9fe094e, 0x7aa98b0}, 0x99524d0)
        /go/src/github.com/influxdata/telegraf/agent/agent.go:330 +0x5d0
github.com/influxdata/telegraf/agent.(*Agent).Run.func5()
        /go/src/github.com/influxdata/telegraf/agent/agent.go:178 +0x80
created by github.com/influxdata/telegraf/agent.(*Agent).Run
        /go/src/github.com/influxdata/telegraf/agent/agent.go:176 +0x89c

goroutine 25 [select]:
github.com/influxdata/telegraf/agent.(*Agent).flushLoop(0x9f29be8, {0x52fad18, 0x9fb4a50}, 0x9f4c0f0, {0x52e5a80, 0x9fb4ab0})
        /go/src/github.com/influxdata/telegraf/agent/agent.go:818 +0x190
github.com/influxdata/telegraf/agent.(*Agent).runOutputs.func1(0x9f4c0f0)
        /go/src/github.com/influxdata/telegraf/agent/agent.go:769 +0x138
created by github.com/influxdata/telegraf/agent.(*Agent).runOutputs
        /go/src/github.com/influxdata/telegraf/agent/agent.go:763 +0xa8

goroutine 26 [select]:
github.com/influxdata/telegraf/agent.(*RollingTicker).run(0x9fb4ab0, {0x52fad18, 0x9fb4ae0}, 0x9fb4b10)
        /go/src/github.com/influxdata/telegraf/agent/tick.go:261 +0xdc
github.com/influxdata/telegraf/agent.(*RollingTicker).start.func1()
        /go/src/github.com/influxdata/telegraf/agent/tick.go:249 +0x78
created by github.com/influxdata/telegraf/agent.(*RollingTicker).start
        /go/src/github.com/influxdata/telegraf/agent/tick.go:247 +0x1a8

goroutine 27 [select]:
github.com/influxdata/telegraf/agent.(*AlignedTicker).run(0x982eac0, {0x52fad18, 0x9fb4b40}, 0x9fb4b70)
        /go/src/github.com/influxdata/telegraf/agent/tick.go:86 +0xc4
github.com/influxdata/telegraf/agent.(*AlignedTicker).start.func1()
        /go/src/github.com/influxdata/telegraf/agent/tick.go:64 +0x78
created by github.com/influxdata/telegraf/agent.(*AlignedTicker).start
        /go/src/github.com/influxdata/telegraf/agent/tick.go:62 +0x18c

goroutine 28 [select]:
github.com/influxdata/telegraf/agent.(*Agent).gatherLoop(0x9f29be8, {0x52fad18, 0x9f4fd70}, {0x5307898, 0x9f54a20}, 0x9fb4810, {0x52e5a68, 0x982eac0}, 0x2540be400)
        /go/src/github.com/influxdata/telegraf/agent/agent.go:465 +0xd4
github.com/influxdata/telegraf/agent.(*Agent).runInputs.func1(0x9fb4810)
        /go/src/github.com/influxdata/telegraf/agent/agent.go:326 +0xa4
created by github.com/influxdata/telegraf/agent.(*Agent).runInputs
        /go/src/github.com/influxdata/telegraf/agent/agent.go:324 +0x4c

goroutine 29 [select]:
github.com/influxdata/telegraf/agent.(*AlignedTicker).run(0x982ec40, {0x52fad18, 0x9fb4c00}, 0x9fb4c30)
        /go/src/github.com/influxdata/telegraf/agent/tick.go:86 +0xc4
github.com/influxdata/telegraf/agent.(*AlignedTicker).start.func1()
        /go/src/github.com/influxdata/telegraf/agent/tick.go:64 +0x78
created by github.com/influxdata/telegraf/agent.(*AlignedTicker).start
        /go/src/github.com/influxdata/telegraf/agent/tick.go:62 +0x18c

goroutine 30 [select]:
github.com/influxdata/telegraf/agent.(*Agent).gatherOnce(0x9f29be8, {0x5307898, 0x9f54a80}, 0x9fb4840, {0x52e5a68, 0x982ec40}, 0x2540be400)
        /go/src/github.com/influxdata/telegraf/agent/agent.go:497 +0x1c4
github.com/influxdata/telegraf/agent.(*Agent).gatherLoop(0x9f29be8, {0x52fad18, 0x9f4fd70}, {0x5307898, 0x9f54a80}, 0x9fb4840, {0x52e5a68, 0x982ec40}, 0x2540be400)
        /go/src/github.com/influxdata/telegraf/agent/agent.go:467 +0x12c
github.com/influxdata/telegraf/agent.(*Agent).runInputs.func1(0x9fb4840)
        /go/src/github.com/influxdata/telegraf/agent/agent.go:326 +0xa4
created by github.com/influxdata/telegraf/agent.(*Agent).runInputs
        /go/src/github.com/influxdata/telegraf/agent/agent.go:324 +0x4c

goroutine 31 [select]:
github.com/influxdata/telegraf/agent.(*AlignedTicker).run(0x982ed80, {0x52fad18, 0x9fb4cc0}, 0x9fb4cf0)
        /go/src/github.com/influxdata/telegraf/agent/tick.go:86 +0xc4
github.com/influxdata/telegraf/agent.(*AlignedTicker).start.func1()
        /go/src/github.com/influxdata/telegraf/agent/tick.go:64 +0x78
created by github.com/influxdata/telegraf/agent.(*AlignedTicker).start
        /go/src/github.com/influxdata/telegraf/agent/tick.go:62 +0x18c

goroutine 32 [select]:
github.com/influxdata/telegraf/agent.(*Agent).gatherLoop(0x9f29be8, {0x52fad18, 0x9f4fd70}, {0x5307898, 0x9f54ae0}, 0x9fb4870, {0x52e5a68, 0x982ed80}, 0x2540be400)
        /go/src/github.com/influxdata/telegraf/agent/agent.go:465 +0xd4
github.com/influxdata/telegraf/agent.(*Agent).runInputs.func1(0x9fb4870)
        /go/src/github.com/influxdata/telegraf/agent/agent.go:326 +0xa4
created by github.com/influxdata/telegraf/agent.(*Agent).runInputs
        /go/src/github.com/influxdata/telegraf/agent/agent.go:324 +0x4c

goroutine 33 [select]:
github.com/influxdata/telegraf/agent.(*AlignedTicker).run(0x982ee80, {0x52fad18, 0x9fb4d80}, 0x9fb4db0)
        /go/src/github.com/influxdata/telegraf/agent/tick.go:86 +0xc4
github.com/influxdata/telegraf/agent.(*AlignedTicker).start.func1()
        /go/src/github.com/influxdata/telegraf/agent/tick.go:64 +0x78
created by github.com/influxdata/telegraf/agent.(*AlignedTicker).start
        /go/src/github.com/influxdata/telegraf/agent/tick.go:62 +0x18c

goroutine 34 [select]:
github.com/influxdata/telegraf/agent.(*Agent).gatherOnce(0x9f29be8, {0x5307898, 0x9f54b40}, 0x9fb48a0, {0x52e5a68, 0x982ee80}, 0x2540be400)
        /go/src/github.com/influxdata/telegraf/agent/agent.go:497 +0x1c4
github.com/influxdata/telegraf/agent.(*Agent).gatherLoop(0x9f29be8, {0x52fad18, 0x9f4fd70}, {0x5307898, 0x9f54b40}, 0x9fb48a0, {0x52e5a68, 0x982ee80}, 0x2540be400)
        /go/src/github.com/influxdata/telegraf/agent/agent.go:467 +0x12c
github.com/influxdata/telegraf/agent.(*Agent).runInputs.func1(0x9fb48a0)
        /go/src/github.com/influxdata/telegraf/agent/agent.go:326 +0xa4
created by github.com/influxdata/telegraf/agent.(*Agent).runInputs
        /go/src/github.com/influxdata/telegraf/agent/agent.go:324 +0x4c

goroutine 35 [select]:
github.com/influxdata/telegraf/agent.(*AlignedTicker).run(0x982ef80, {0x52fad18, 0x9fb4e40}, 0x9fb4e70)
        /go/src/github.com/influxdata/telegraf/agent/tick.go:86 +0xc4
github.com/influxdata/telegraf/agent.(*AlignedTicker).start.func1()
        /go/src/github.com/influxdata/telegraf/agent/tick.go:64 +0x78
created by github.com/influxdata/telegraf/agent.(*AlignedTicker).start
        /go/src/github.com/influxdata/telegraf/agent/tick.go:62 +0x18c

goroutine 36 [select]:
github.com/influxdata/telegraf/agent.(*Agent).gatherOnce(0x9f29be8, {0x5307898, 0x9f54ba0}, 0x9fb48d0, {0x52e5a68, 0x982ef80}, 0x2540be400)
        /go/src/github.com/influxdata/telegraf/agent/agent.go:497 +0x1c4
github.com/influxdata/telegraf/agent.(*Agent).gatherLoop(0x9f29be8, {0x52fad18, 0x9f4fd70}, {0x5307898, 0x9f54ba0}, 0x9fb48d0, {0x52e5a68, 0x982ef80}, 0x2540be400)
        /go/src/github.com/influxdata/telegraf/agent/agent.go:467 +0x12c
github.com/influxdata/telegraf/agent.(*Agent).runInputs.func1(0x9fb48d0)
        /go/src/github.com/influxdata/telegraf/agent/agent.go:326 +0xa4
created by github.com/influxdata/telegraf/agent.(*Agent).runInputs
        /go/src/github.com/influxdata/telegraf/agent/agent.go:324 +0x4c

goroutine 37 [select]:
github.com/influxdata/telegraf/agent.(*AlignedTicker).run(0x982f080, {0x52fad18, 0x9fb4f00}, 0x9fb4f30)
        /go/src/github.com/influxdata/telegraf/agent/tick.go:86 +0xc4
github.com/influxdata/telegraf/agent.(*AlignedTicker).start.func1()
        /go/src/github.com/influxdata/telegraf/agent/tick.go:64 +0x78
created by github.com/influxdata/telegraf/agent.(*AlignedTicker).start
        /go/src/github.com/influxdata/telegraf/agent/tick.go:62 +0x18c

goroutine 38 [select]:
github.com/influxdata/telegraf/agent.(*Agent).gatherLoop(0x9f29be8, {0x52fad18, 0x9f4fd70}, {0x5307898, 0x9f54c00}, 0x9fb4900, {0x52e5a68, 0x982f080}, 0x2540be400)
        /go/src/github.com/influxdata/telegraf/agent/agent.go:465 +0xd4
github.com/influxdata/telegraf/agent.(*Agent).runInputs.func1(0x9fb4900)
        /go/src/github.com/influxdata/telegraf/agent/agent.go:326 +0xa4
created by github.com/influxdata/telegraf/agent.(*Agent).runInputs
        /go/src/github.com/influxdata/telegraf/agent/agent.go:324 +0x4c

goroutine 39 [select]:
github.com/influxdata/telegraf/agent.(*AlignedTicker).run(0x982f1c0, {0x52fad18, 0x9fb4fc0}, 0x9fb4ff0)
        /go/src/github.com/influxdata/telegraf/agent/tick.go:86 +0xc4
github.com/influxdata/telegraf/agent.(*AlignedTicker).start.func1()
        /go/src/github.com/influxdata/telegraf/agent/tick.go:64 +0x78
created by github.com/influxdata/telegraf/agent.(*AlignedTicker).start
        /go/src/github.com/influxdata/telegraf/agent/tick.go:62 +0x18c

goroutine 40 [select]:
github.com/influxdata/telegraf/agent.(*Agent).gatherLoop(0x9f29be8, {0x52fad18, 0x9f4fd70}, {0x5307898, 0x9f54c60}, 0x9fb4930, {0x52e5a68, 0x982f1c0}, 0x2540be400)
        /go/src/github.com/influxdata/telegraf/agent/agent.go:465 +0xd4
github.com/influxdata/telegraf/agent.(*Agent).runInputs.func1(0x9fb4930)
        /go/src/github.com/influxdata/telegraf/agent/agent.go:326 +0xa4
created by github.com/influxdata/telegraf/agent.(*Agent).runInputs
        /go/src/github.com/influxdata/telegraf/agent/agent.go:324 +0x4c

goroutine 41 [select]:
github.com/influxdata/telegraf/agent.(*AlignedTicker).run(0x982f2c0, {0x52fad18, 0x9fb5080}, 0x9fb50b0)
        /go/src/github.com/influxdata/telegraf/agent/tick.go:86 +0xc4
github.com/influxdata/telegraf/agent.(*AlignedTicker).start.func1()
        /go/src/github.com/influxdata/telegraf/agent/tick.go:64 +0x78
created by github.com/influxdata/telegraf/agent.(*AlignedTicker).start
        /go/src/github.com/influxdata/telegraf/agent/tick.go:62 +0x18c

goroutine 42 [select]:
github.com/influxdata/telegraf/agent.(*Agent).gatherOnce(0x9f29be8, {0x5307898, 0x9f54cc0}, 0x9fb4960, {0x52e5a68, 0x982f2c0}, 0x2540be400)
        /go/src/github.com/influxdata/telegraf/agent/agent.go:497 +0x1c4
github.com/influxdata/telegraf/agent.(*Agent).gatherLoop(0x9f29be8, {0x52fad18, 0x9f4fd70}, {0x5307898, 0x9f54cc0}, 0x9fb4960, {0x52e5a68, 0x982f2c0}, 0x2540be400)
        /go/src/github.com/influxdata/telegraf/agent/agent.go:467 +0x12c
github.com/influxdata/telegraf/agent.(*Agent).runInputs.func1(0x9fb4960)
        /go/src/github.com/influxdata/telegraf/agent/agent.go:326 +0xa4
created by github.com/influxdata/telegraf/agent.(*Agent).runInputs
        /go/src/github.com/influxdata/telegraf/agent/agent.go:324 +0x4c

goroutine 43 [select]:
github.com/influxdata/telegraf/agent.(*AlignedTicker).run(0x982f3c0, {0x52fad18, 0x9fb5140}, 0x9fb5170)
        /go/src/github.com/influxdata/telegraf/agent/tick.go:86 +0xc4
github.com/influxdata/telegraf/agent.(*AlignedTicker).start.func1()
        /go/src/github.com/influxdata/telegraf/agent/tick.go:64 +0x78
created by github.com/influxdata/telegraf/agent.(*AlignedTicker).start
        /go/src/github.com/influxdata/telegraf/agent/tick.go:62 +0x18c

goroutine 44 [select]:
github.com/influxdata/telegraf/agent.(*Agent).gatherLoop(0x9f29be8, {0x52fad18, 0x9f4fd70}, {0x5307898, 0x9f54d20}, 0x9fb4990, {0x52e5a68, 0x982f3c0}, 0x2540be400)
        /go/src/github.com/influxdata/telegraf/agent/agent.go:465 +0xd4
github.com/influxdata/telegraf/agent.(*Agent).runInputs.func1(0x9fb4990)
        /go/src/github.com/influxdata/telegraf/agent/agent.go:326 +0xa4
created by github.com/influxdata/telegraf/agent.(*Agent).runInputs
        /go/src/github.com/influxdata/telegraf/agent/agent.go:324 +0x4c

goroutine 45 [select]:
github.com/influxdata/telegraf/agent.(*AlignedTicker).run(0x982f4c0, {0x52fad18, 0x9fb5200}, 0x9fb5230)
        /go/src/github.com/influxdata/telegraf/agent/tick.go:86 +0xc4
github.com/influxdata/telegraf/agent.(*AlignedTicker).start.func1()
        /go/src/github.com/influxdata/telegraf/agent/tick.go:64 +0x78
created by github.com/influxdata/telegraf/agent.(*AlignedTicker).start
        /go/src/github.com/influxdata/telegraf/agent/tick.go:62 +0x18c

goroutine 46 [select]:
github.com/influxdata/telegraf/agent.(*Agent).gatherOnce(0x9f29be8, {0x5307898, 0x9f54d80}, 0x9fb49c0, {0x52e5a68, 0x982f4c0}, 0x2540be400)
        /go/src/github.com/influxdata/telegraf/agent/agent.go:497 +0x1c4
github.com/influxdata/telegraf/agent.(*Agent).gatherLoop(0x9f29be8, {0x52fad18, 0x9f4fd70}, {0x5307898, 0x9f54d80}, 0x9fb49c0, {0x52e5a68, 0x982f4c0}, 0x2540be400)
        /go/src/github.com/influxdata/telegraf/agent/agent.go:467 +0x12c
github.com/influxdata/telegraf/agent.(*Agent).runInputs.func1(0x9fb49c0)
        /go/src/github.com/influxdata/telegraf/agent/agent.go:326 +0xa4
created by github.com/influxdata/telegraf/agent.(*Agent).runInputs
        /go/src/github.com/influxdata/telegraf/agent/agent.go:324 +0x4c

goroutine 47 [select]:
github.com/influxdata/telegraf/agent.(*AlignedTicker).run(0x982f640, {0x52fad18, 0x9fb52c0}, 0x9fb52f0)
        /go/src/github.com/influxdata/telegraf/agent/tick.go:86 +0xc4
github.com/influxdata/telegraf/agent.(*AlignedTicker).start.func1()
        /go/src/github.com/influxdata/telegraf/agent/tick.go:64 +0x78
created by github.com/influxdata/telegraf/agent.(*AlignedTicker).start
        /go/src/github.com/influxdata/telegraf/agent/tick.go:62 +0x18c

goroutine 48 [select]:
github.com/influxdata/telegraf/agent.(*Agent).gatherLoop(0x9f29be8, {0x52fad18, 0x9f4fd70}, {0x5307898, 0x9f54de0}, 0x9fb49f0, {0x52e5a68, 0x982f640}, 0x2540be400)
        /go/src/github.com/influxdata/telegraf/agent/agent.go:465 +0xd4
github.com/influxdata/telegraf/agent.(*Agent).runInputs.func1(0x9fb49f0)
        /go/src/github.com/influxdata/telegraf/agent/agent.go:326 +0xa4
created by github.com/influxdata/telegraf/agent.(*Agent).runInputs
        /go/src/github.com/influxdata/telegraf/agent/agent.go:324 +0x4c

goroutine 56653 [runnable]:
syscall.Syscall(0x3, 0xa, 0x9cb3000, 0x200)
        /usr/local/go/src/syscall/asm_linux_arm.s:14 +0x8
syscall.read(0xa, {0x9cb3000, 0x200, 0x200})
        /usr/local/go/src/syscall/zsyscall_linux_arm.go:696 +0x48
syscall.Read(...)
        /usr/local/go/src/syscall/syscall_unix.go:188
internal/poll.ignoringEINTRIO(...)
        /usr/local/go/src/internal/poll/fd_unix.go:794
internal/poll.(*FD).Read(0x989a3c0, {0x9cb3000, 0x200, 0x200})
        /usr/local/go/src/internal/poll/fd_unix.go:163 +0x24c
os.(*File).read(...)
        /usr/local/go/src/os/file_posix.go:31
os.(*File).Read(0xa158658, {0x9cb3000, 0x200, 0x200})
        /usr/local/go/src/os/file.go:119 +0x70
os.ReadFile({0x487ba21, 0xa})
        /usr/local/go/src/os/file.go:699 +0x210
github.com/influxdata/telegraf/plugins/inputs/kernel.(*Kernel).getProcStat(0x9a80ef0)
        /go/src/github.com/influxdata/telegraf/plugins/inputs/kernel/kernel.go:111 +0x94
github.com/influxdata/telegraf/plugins/inputs/kernel.(*Kernel).Gather(0x9a80ef0, {0x5307898, 0x9f54a80})
        /go/src/github.com/influxdata/telegraf/plugins/inputs/kernel/kernel.go:38 +0x1c
github.com/influxdata/telegraf/models.(*RunningInput).Gather(0x9fb4840, {0x5307898, 0x9f54a80})
        /go/src/github.com/influxdata/telegraf/models/running_input.go:118 +0x48
github.com/influxdata/telegraf/agent.(*Agent).gatherOnce.func1()
        /go/src/github.com/influxdata/telegraf/agent/agent.go:487 +0x34
created by github.com/influxdata/telegraf/agent.(*Agent).gatherOnce
        /go/src/github.com/influxdata/telegraf/agent/agent.go:486 +0xe4

goroutine 10867 [select]:
net/http.(*persistConn).writeLoop(0x9d17e00)
        /usr/local/go/src/net/http/transport.go:2392 +0xd0
created by net/http.(*Transport).dialConn
        /usr/local/go/src/net/http/transport.go:1751 +0x14bc

goroutine 56656 [runnable]:
syscall.Syscall(0x3, 0xc, 0x9923000, 0x1000)
        /usr/local/go/src/syscall/asm_linux_arm.s:14 +0x8
syscall.read(0xc, {0x9923000, 0x1000, 0x1000})
        /usr/local/go/src/syscall/zsyscall_linux_arm.go:696 +0x48
syscall.Read(...)
        /usr/local/go/src/syscall/syscall_unix.go:188
internal/poll.ignoringEINTRIO(...)
        /usr/local/go/src/internal/poll/fd_unix.go:794
internal/poll.(*FD).Read(0x989a840, {0x9923000, 0x1000, 0x1000})
        /usr/local/go/src/internal/poll/fd_unix.go:163 +0x24c
os.(*File).read(...)
        /usr/local/go/src/os/file_posix.go:31
os.(*File).Read(0xa158a20, {0x9923000, 0x1000, 0x1000})
        /usr/local/go/src/os/file.go:119 +0x70
bufio.(*Reader).fill(0xa188afc)
        /usr/local/go/src/bufio/bufio.go:106 +0x10c
bufio.(*Reader).ReadSlice(0xa188afc, 0xa)
        /usr/local/go/src/bufio/bufio.go:371 +0x2c
bufio.(*Reader).collectFragments(0xa188afc, 0xa)
        /usr/local/go/src/bufio/bufio.go:446 +0x58
bufio.(*Reader).ReadString(0xa188afc, 0xa)
        /usr/local/go/src/bufio/bufio.go:494 +0x28
github.com/shirou/gopsutil/v3/internal/common.ReadLinesOffsetN({0x9cb5788, 0x11}, 0x0, 0xffffffff)
        /go/pkg/mod/github.com/shirou/gopsutil/v3@v3.22.3/internal/common/common.go:130 +0x1dc
github.com/shirou/gopsutil/v3/internal/common.ReadLines(...)
        /go/pkg/mod/github.com/shirou/gopsutil/v3@v3.22.3/internal/common/common.go:111
github.com/shirou/gopsutil/v3/disk.readMountFile({0xa078006, 0x7})
        /go/pkg/mod/github.com/shirou/gopsutil/v3@v3.22.3/disk/disk_linux.go:242 +0x84
github.com/shirou/gopsutil/v3/disk.PartitionsWithContext({0x52fad38, 0x9856078}, 0x1)
        /go/pkg/mod/github.com/shirou/gopsutil/v3@v3.22.3/disk/disk_linux.go:270 +0x104
github.com/shirou/gopsutil/v3/disk.Partitions(...)
        /go/pkg/mod/github.com/shirou/gopsutil/v3@v3.22.3/disk/disk.go:77
github.com/influxdata/telegraf/plugins/inputs/system.(*SystemPSDisk).Partitions(0x7ac5804, 0x1)
        /go/src/github.com/influxdata/telegraf/plugins/inputs/system/ps.go:212 +0x30
github.com/influxdata/telegraf/plugins/inputs/system.(*SystemPS).DiskUsage(0x9952490, {0x0, 0x0, 0x0}, {0x982e440, 0x7, 0x7})
        /go/src/github.com/influxdata/telegraf/plugins/inputs/system/ps.go:96 +0x34
github.com/influxdata/telegraf/plugins/inputs/disk.(*DiskStats).Gather(0x982e400, {0x5307898, 0x9f54d80})
        /go/src/github.com/influxdata/telegraf/plugins/inputs/disk/disk.go:55 +0x5c
github.com/influxdata/telegraf/models.(*RunningInput).Gather(0x9fb49c0, {0x5307898, 0x9f54d80})
        /go/src/github.com/influxdata/telegraf/models/running_input.go:118 +0x48
github.com/influxdata/telegraf/agent.(*Agent).gatherOnce.func1()
        /go/src/github.com/influxdata/telegraf/agent/agent.go:487 +0x34
created by github.com/influxdata/telegraf/agent.(*Agent).gatherOnce
        /go/src/github.com/influxdata/telegraf/agent/agent.go:486 +0xe4

goroutine 56657 [runnable]:
syscall.Syscall(0xd9, 0xd, 0x9ed6000, 0x2000)
        /usr/local/go/src/syscall/asm_linux_arm.s:14 +0x8
syscall.Getdents(0xd, {0x9ed6000, 0x2000, 0x2000})
        /usr/local/go/src/syscall/zsyscall_linux_arm.go:439 +0x48
syscall.ReadDirent(...)
        /usr/local/go/src/syscall/syscall_linux.go:870
internal/poll.ignoringEINTRIO(...)
        /usr/local/go/src/internal/poll/fd_unix.go:794
internal/poll.(*FD).ReadDirent(0x989b180, {0x9ed6000, 0x2000, 0x2000})
        /usr/local/go/src/internal/poll/fd_unix.go:646 +0x180
os.(*File).readdir(0xa158b38, 0xffffffff, 0x0)
        /usr/local/go/src/os/dir_unix.go:70 +0x194
os.(*File).Readdirnames(0xa158b38, 0xffffffff)
        /usr/local/go/src/os/dir.go:70 +0x34
path/filepath.glob({0xa078410, 0xa}, {0x9cb57dd, 0x4}, {0xa164e00, 0xe, 0x10})
        /usr/local/go/src/path/filepath/match.go:337 +0x110
path/filepath.Glob({0x9cb57d0, 0x11})
        /usr/local/go/src/path/filepath/match.go:278 +0x3b0
github.com/influxdata/telegraf/plugins/inputs/processes.(*Processes).gatherFromProc(0x9f545e8, 0xa117060)
        /go/src/github.com/influxdata/telegraf/plugins/inputs/processes/processes_notwindows.go:131 +0x54
github.com/influxdata/telegraf/plugins/inputs/processes.(*Processes).Gather(0x9f545e8, {0x5307898, 0x9f54b40})
        /go/src/github.com/influxdata/telegraf/plugins/inputs/processes/processes_notwindows.go:52 +0x64
github.com/influxdata/telegraf/models.(*RunningInput).Gather(0x9fb48a0, {0x5307898, 0x9f54b40})
        /go/src/github.com/influxdata/telegraf/models/running_input.go:118 +0x48
github.com/influxdata/telegraf/agent.(*Agent).gatherOnce.func1()
        /go/src/github.com/influxdata/telegraf/agent/agent.go:487 +0x34
created by github.com/influxdata/telegraf/agent.(*Agent).gatherOnce
        /go/src/github.com/influxdata/telegraf/agent/agent.go:486 +0xe4

goroutine 56654 [runnable]:
syscall.Syscall6(0x14c, 0xffffff9c, 0x9cb57b8, 0x9fe8400, 0x80, 0x0, 0x0)
        /usr/local/go/src/syscall/asm_linux_arm.s:45 +0x8
syscall.readlinkat(0xffffff9c, {0xa078020, 0x10}, {0x9fe8400, 0x80, 0x80})
        /usr/local/go/src/syscall/zsyscall_linux_arm.go:100 +0x108
syscall.Readlink(...)
        /usr/local/go/src/syscall/syscall_linux.go:186
os.Readlink({0xa078020, 0x10})
        /usr/local/go/src/os/file_unix.go:376 +0x8c
github.com/shirou/gopsutil/v3/net.getProcInodes({0x486aed5, 0x5}, 0x3210, 0x0)
        /go/pkg/mod/github.com/shirou/gopsutil/v3@v3.22.3/net/net_linux.go:563 +0x2a0
github.com/shirou/gopsutil/v3/net.getProcInodesAllWithContext({0x52fad38, 0x9856078}, {0x486aed5, 0x5}, 0x0)
        /go/pkg/mod/github.com/shirou/gopsutil/v3@v3.22.3/net/net_linux.go:684 +0xb8
github.com/shirou/gopsutil/v3/net.connectionsPidMaxWithoutUidsWithContext({0x52fad38, 0x9856078}, {0x48690b1, 0x3}, 0x0, 0x0, 0x0)
        /go/pkg/mod/github.com/shirou/gopsutil/v3@v3.22.3/net/net_linux.go:466 +0xd8
github.com/shirou/gopsutil/v3/net.ConnectionsPidMaxWithContext(...)
        /go/pkg/mod/github.com/shirou/gopsutil/v3@v3.22.3/net/net_linux.go:450
github.com/shirou/gopsutil/v3/net.ConnectionsPidWithContext({0x52fad38, 0x9856078}, {0x48690b1, 0x3}, 0x0)
        /go/pkg/mod/github.com/shirou/gopsutil/v3@v3.22.3/net/net_linux.go:433 +0x4c
github.com/shirou/gopsutil/v3/net.ConnectionsWithContext(...)
        /go/pkg/mod/github.com/shirou/gopsutil/v3@v3.22.3/net/net_linux.go:395
github.com/shirou/gopsutil/v3/net.Connections({0x48690b1, 0x3})
        /go/pkg/mod/github.com/shirou/gopsutil/v3@v3.22.3/net/net_linux.go:391 +0x40
github.com/influxdata/telegraf/plugins/inputs/system.(*SystemPS).NetConnections(0x9a81670)
        /go/src/github.com/influxdata/telegraf/plugins/inputs/system/ps.go:180 +0x24
github.com/influxdata/telegraf/plugins/inputs/net.(*NetStats).Gather(0x9f29a28, {0x5307898, 0x9f54cc0})
        /go/src/github.com/influxdata/telegraf/plugins/inputs/net/netstat.go:27 +0x2c
github.com/influxdata/telegraf/models.(*RunningInput).Gather(0x9fb4960, {0x5307898, 0x9f54cc0})
        /go/src/github.com/influxdata/telegraf/models/running_input.go:118 +0x48
github.com/influxdata/telegraf/agent.(*Agent).gatherOnce.func1()
        /go/src/github.com/influxdata/telegraf/agent/agent.go:487 +0x34
created by github.com/influxdata/telegraf/agent.(*Agent).gatherOnce
        /go/src/github.com/influxdata/telegraf/agent/agent.go:486 +0xe4

goroutine 10866 [IO wait]:
internal/poll.runtime_pollWait(0x66c1c258, 0x72)
        /usr/local/go/src/runtime/netpoll.go:302 +0x54
internal/poll.(*pollDesc).wait(0x9f4d874, 0x72, 0x0)
        /usr/local/go/src/internal/poll/fd_poll_runtime.go:83 +0x30
internal/poll.(*pollDesc).waitRead(...)
        /usr/local/go/src/internal/poll/fd_poll_runtime.go:88
internal/poll.(*FD).Read(0x9f4d860, {0x98ca900, 0x894, 0x894})
        /usr/local/go/src/internal/poll/fd_unix.go:167 +0x210
net.(*netFD).Read(0x9f4d860, {0x98ca900, 0x894, 0x894})
        /usr/local/go/src/net/fd_posix.go:55 +0x38
net.(*conn).Read(0x9f29e68, {0x98ca900, 0x894, 0x894})
        /usr/local/go/src/net/net.go:183 +0x48
crypto/tls.(*atLeastReader).Read(0xa138000, {0x98ca900, 0x894, 0x894})
        /usr/local/go/src/crypto/tls/conn.go:784 +0x7c
bytes.(*Buffer).ReadFrom(0xa07476c, {0x52cc980, 0xa138000})
        /usr/local/go/src/bytes/buffer.go:204 +0xa4
crypto/tls.(*Conn).readFromUntil(0xa074600, {0x52d5328, 0x9f29e68}, 0x5)
        /usr/local/go/src/crypto/tls/conn.go:806 +0xd4
crypto/tls.(*Conn).readRecordOrCCS(0xa074600, 0x0)
        /usr/local/go/src/crypto/tls/conn.go:613 +0x11c
crypto/tls.(*Conn).readRecord(...)
        /usr/local/go/src/crypto/tls/conn.go:581
crypto/tls.(*Conn).Read(0xa074600, {0xa0b2000, 0x1000, 0x1000})
        /usr/local/go/src/crypto/tls/conn.go:1284 +0x168
net/http.(*persistConn).Read(0x9d17e00, {0xa0b2000, 0x1000, 0x1000})
        /usr/local/go/src/net/http/transport.go:1929 +0x16c
bufio.(*Reader).fill(0x9f4ea80)
        /usr/local/go/src/bufio/bufio.go:106 +0x10c
bufio.(*Reader).Peek(0x9f4ea80, 0x1)
        /usr/local/go/src/bufio/bufio.go:144 +0x68
net/http.(*persistConn).readLoop(0x9d17e00)
        /usr/local/go/src/net/http/transport.go:2093 +0x190
created by net/http.(*Transport).dialConn
        /usr/local/go/src/net/http/transport.go:1750 +0x146c

trap    0xe
error   0x817
oldmask 0x0
r0      0x0
r1      0x0
r2      0x9c9d000
r3      0x85383
r4      0x0
r5      0xa1841e0
r6      0xa184228
r7      0x0
r8      0x5
r9      0x2
r10     0xa1841e0
fp      0x85ec4
ip      0x98fbb2c
sp      0x9c9d418
lr      0x0
pc      0x85ecc
cpsr    0x60000010
fault   0x38

System info

Linux 5.13.6-kirkwood-tld-1 #1.0 PREEMPT Sat Jul 31 22:10:39 PDT 2021 armv5tel GNU/Linux // Telegraf 1.22.4 (git: HEAD acf6706)

Docker

No response

Steps to reproduce

  1. Use the sample config file from the installation
  2. Start telegraf and let it run
  3. Within a few hours it will crash

Additional info

/proc/cpuinfo
processor       : 0
model name      : Feroceon 88FR131 rev 1 (v5l)
BogoMIPS        : 333.33
Features        : swp half thumb fastmult edsp
CPU implementer : 0x56
CPU architecture: 5TE
CPU variant     : 0x2
CPU part        : 0x131
CPU revision    : 1

Hardware        : Marvell Kirkwood (Flattened Device Tree)
Revision        : 0000
Serial          : 0000000000000000

Expected behavior

It should not crash

Actual behavior

It crashes anywhere from within an hour to a few hours after starting.

Additional info

In my example log above, it crashes at 21:40. Telegraf was started at 15:36.

There were only occasional warnings about "Collection took longer than expected; not complete after interval of 10s". The last log entry before the crash was at 17:00, and there was nothing in between.

@Derek-K Derek-K added the bug unexpected problem or unintended behavior label Jun 13, 2022
@powersj
Contributor

powersj commented Jun 13, 2022

armv5tel
model name : Feroceon 88FR131 rev 1 (v5l)

Interesting, is this a NAS device?

0x09c9d398: 0x5f787833 0x00687465 <github.com/jhump/protoreflect/desc/protoparse.setOptionField+0x00000051> 0x00000000 0x00000000

While the last message is about eth0, I do not believe the seg fault is from the ethtool plugin. The protoreflect library is used by the XPath parser.
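
A side note on reading that dump line: as far as I understand it, the Go runtime annotates any stack word whose value happens to fall inside a function's address range, so a symbol name next to a word does not by itself prove that function was on the call path. Below is a minimal sketch that decodes the two words as bytes, assuming they are 32-bit little-endian values (which matches armv5tel). `decodeWords` is a hypothetical helper written for illustration, not part of Telegraf or the Go runtime.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// decodeWords interprets each 32-bit word as little-endian bytes and
// renders printable ASCII as-is, replacing every other byte with '.'.
// Hypothetical helper for illustration only.
func decodeWords(words []uint32) string {
	out := make([]byte, 0, len(words)*4)
	var buf [4]byte
	for _, w := range words {
		binary.LittleEndian.PutUint32(buf[:], w)
		for _, b := range buf {
			if b >= 0x20 && b <= 0x7e {
				out = append(out, b)
			} else {
				out = append(out, '.')
			}
		}
	}
	return string(out)
}

func main() {
	// The first two words from the dump line quoted above.
	fmt.Println(decodeWords([]uint32{0x5f787833, 0x00687465}))
}
```

For those two words this prints `3xx_eth.`, that is, printable text ending in `eth` followed by a NUL byte. So the protoreflect annotation on `0x00687465` may be a coincidence of the value rather than evidence that the XPath parser was involved; it does not settle which plugin is at fault either way.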

Could you please include the rest of your config? It would be good to identify which plugin is causing this to figure out what to do next.

@powersj powersj added the waiting for response waiting for response from contributor label Jun 13, 2022
@Derek-K
Author

Derek-K commented Jun 13, 2022

This is a Pogoplug Mobile

The config is just the generic telegraf.conf from install, nothing fancy. It just tracks regular CPU, memory usage, etc. and sends it to a remote InfluxDB, as I have done with many Linux servers before.

The only 'difference' is this time I am doing it on an ARM processor with only 128MB of RAM... :/

I tried running dpkg-reconfigure on it, but the result was the same.

For now I have commented out some network-related lines in /etc/sysctl.d/sysctl.conf; so far it has been running for about 10 hours without issues.

#net.core.rmem_max = 2801664
#net.core.wmem_max = 2097152

#net.ipv4.tcp_rmem = 4096        87380   2801664
#net.ipv4.tcp_wmem = 4096        16384   2097152

#net.ipv4.tcp_timestamps = 0

#net.core.optmem_max = 65535

#net.core.netdev_max_backlog = 5000

vm.swappiness = 10 # Default is 60
vm.vfs_cache_pressure = 1000 # Default is 100

@telegraf-tiger telegraf-tiger bot removed the waiting for response waiting for response from contributor label Jun 13, 2022
@powersj
Contributor

powersj commented Jun 13, 2022

Pogoplug Mobile
generic telegraf.conf from install

And your output is influxdb?

As of right now, I am not sure what we can do. As a next step can you please try to narrow down which plugin is causing the crash?

@powersj powersj added the waiting for response waiting for response from contributor label Jun 13, 2022
@Derek-K
Author

Derek-K commented Jun 13, 2022

Correct. My only output is InfluxDB.

[global_tags]

[agent]
  interval = "10s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"
  precision = "0s"
  hostname = ""
  omit_hostname = false
[[outputs.influxdb]]
  urls = ["https://xxx.xxx.xxx.xxx:xxxx"]
  database = "xxxxxxxxxx"

  skip_database_creation = true
  username = "xxxxxxxxxx"
  password = "xxxxxxxxxx"

And every time it crashes, the last entry always seems to be "[inputs.ethtool] Error in plugin: eth0 driver: operation not permitted". Hence I suspect it has something to do with the network, which is why I looked into sysctl.conf and commented out those network-related overrides.

@telegraf-tiger telegraf-tiger bot removed the waiting for response waiting for response from contributor label Jun 13, 2022
@powersj
Contributor

powersj commented Jun 13, 2022

And every time it crashes, the last entry always seems to be "[inputs.ethtool] Error in plugin: eth0 driver: operation not permitted". Hence I suspect it has something to do with the network, which is why I looked into sysctl.conf and commented out those network-related overrides.

ok

so far it has been running for about 10 hours without issues.

If this does ultimately fail, my request would be that you start reducing the number of input plugins: start with ethtool and see if the crash still reproduces. As I said above, I am not certain what our next step is from the project's side.

@powersj powersj added the waiting for response waiting for response from contributor label Jun 13, 2022
@Derek-K
Author

Derek-K commented Jun 14, 2022

Darn, still crashed last night with the same error message. But this time it didn't crash right away.

2022-06-13T15:09:50Z E! [inputs.ethtool] Error in plugin: eth0 driver: operation not permitted
2022-06-13T19:46:20Z E! [inputs.ethtool] Error in plugin: eth0 driver: operation not permitted
2022-06-13T20:40:10Z E! [inputs.ethtool] Error in plugin: eth0 driver: operation not permitted
2022-06-13T21:56:10Z E! [inputs.ethtool] Error in plugin: eth0 driver: operation not permitted
runtime: unexpected return pc for runtime.sigpanic called from 0x0
stack: frame={sp:0x91aec18, fp:0x91aec30} stack=[0x91ae800,0x91af000)
0x091aeb98:  0x0004f9fc <runtime.addOneOpenDeferFrame.func1+0x00000000>  0x0004ecc8 <runtime.panicmem+0x0000004c>  0x091aec0c  0x08c9c5a0
0x091aeba8:  0x00000000  0x0004ecc8 <runtime.panicmem+0x0000004c>  0x08c9c5a0  0x0004ecc8 <runtime.panicmem+0x0000004c>
0x091aebb8:  0x091aec0c  0x776b7201  0x2d646f6f  0x2d646c74
0x091aebc8:  0x00000031  0x00000000  0x00000000  0x08c9c5a0
0x091aebd8:  0x00000000  0x00000000  0x00000000  0x00000000
0x091aebe8:  0x00000000  0x08c9c5b0  0x00000000  0x04246750
0x091aebf8:  0x07a6bdb8  0x00000000  0x00000000  0x00000000
0x091aec08:  0x00000000  0x0006a598 <runtime.sigpanic+0x000001e8>  0x04246750  0x07a6bdb8
0x091aec18: <0x00000000  0x08c9c5a0  0x00000001  0x00000000
0x091aec28:  0x08c9c5a0  0x08c9c5a0 >0x00000000  0x00000000
0x091aec38:  0x00000000  0x00000000  0x00000000  0x00000028
0x091aec48:  0x000009b4  0x00000000  0x00000000  0x00000000
0x091aec58:  0x0911b000  0x00113d58 <os.(*File).Read+0x00000070>  0x0000000a  0x0911b000
0x091aec68:  0x00001000  0x00001000  0x00000000  0x00000000
0x091aec78:  0x00000000  0x01000000 <github.com/miekg/dns.(*NID).unpack+0x00000034>  0x0000000a  0x00001000
0x091aec88:  0x08b70cd4  0x0010942c <internal/poll.(*FD).Read.func1+0x00000000>  0x08b70cc0  0x091aec8c
0x091aec98:  0x001ae330 <bufio.(*Reader).fill+0x0000010c>  0x08b70cc0  0x0911b000  0x00001000
0x091aeca8:  0x00001000  0x00000000
fatal error: unknown caller pc

runtime stack:
runtime.throw({0x48a9ca9, 0x11})
        /usr/local/go/src/runtime/panic.go:992 +0x5c
runtime.gentraceback(0x4ecc8, 0x91aec0c, 0x0, 0x8c9c5a0, 0x0, 0x0, 0x7fffffff, 0x8f91fdc, 0x0, 0x0)
        /usr/local/go/src/runtime/traceback.go:254 +0x172c
runtime.addOneOpenDeferFrame.func1()
        /usr/local/go/src/runtime/panic.go:599 +0x8c
runtime.systemstack()
        /usr/local/go/src/runtime/asm_arm.s:317 +0x60

goroutine 200699 [running]:
runtime.systemstack_switch()
        /usr/local/go/src/runtime/asm_arm.s:274 +0x4 fp=0x91aeb90 sp=0x91aeb8c pc=0x88724
.... 

@telegraf-tiger telegraf-tiger bot removed the waiting for response waiting for response from contributor label Jun 14, 2022
@powersj
Contributor

powersj commented Jun 14, 2022

Can you please grab go env and uname -a from the system?

The new stack is a different stack from the first crash. It is still not clear that this is specific to code in Telegraf rather than a) system related and/or b) related to Go itself. A comment in golang/go#44096 notes that "The call to runtime.addOneOpenDeferFrame in the traceback suggests corruption of the defer stack".

Given you saw an improvement with vm.swappiness = 10, reducing how readily the system swaps may have improved the situation. That also points at something system related, and I would not be surprised if Telegraf is simply doing too much on a small system with limited memory.

I would suggest the following:

  1. reduce the number of plugins used
  2. reduce the interval
  3. collect and post any further traces you see

@powersj powersj added the waiting for response waiting for response from contributor label Jun 14, 2022
@Derek-K
Author

Derek-K commented Jun 15, 2022

I don't have 'go' installed (?)

-bash: go: command not found

and 'uname -a'

Linux mybox 5.13.6-kirkwood-tld-1 #1.0 PREEMPT Sat Jul 31 22:10:39 PDT 2021 armv5tel GNU/Linux

I agree with you, this time the stack is different. I have now turned off swap completely; let's see how long it lasts without crashing.

As for the 'improvement': vm.swappiness has always been 10. The only difference is that I am back to the default values for anything network-related.

@telegraf-tiger telegraf-tiger bot removed the waiting for response waiting for response from contributor label Jun 15, 2022
@powersj powersj added the waiting for response waiting for response from contributor label Jun 15, 2022
@telegraf-tiger
Contributor

Hello! I am closing this issue due to inactivity. I hope you were able to resolve your problem, if not please try posting this question in our Community Slack or Community Page. Thank you!

@jidongli1

I met the same problem. :(
I wonder if the problem was fixed in the latest version.

@telegraf-tiger telegraf-tiger bot removed the waiting for response waiting for response from contributor label May 12, 2023