Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

metricbeat: using all of a single core #2747

Closed
kevinkluge opened this issue Oct 11, 2016 · 4 comments · Fixed by #2753
Closed

metricbeat: using all of a single core #2747

kevinkluge opened this issue Oct 11, 2016 · 4 comments · Fixed by #2753
Labels

Comments

@kevinkluge
Copy link
Member

Version 5.0.0 RC1
MacOS 10.12

With local kibana 5.0 RC1 and Elasticsearch 5.0 RC1, I started Kibana, then ES, then imported metricbeat dashboard, then started metricbeat. After 30-60 minutes, metricbeat was taking all of a single core. I used the default configuration. Nothing unusual was in the logs.

I killed metricbeat with SIGQUIT and got the following.

goroutine 0 [idle]:
runtime.mach_semaphore_wait(0x1103, 0xe, 0x0, 0x7fff5fbff6f8, 0x4029d01, 0x494c6e0, 0x7fff5fbff720, 0x40555f3, 0xffffffffffffffff, 0xc42005e000, ...)
    /usr/local/go/src/runtime/sys_darwin_amd64.s:418 +0xb
runtime.semasleep1(0xffffffffffffffff, 0xc42005e000)
    /usr/local/go/src/runtime/os_darwin.go:435 +0x4b
runtime.semasleep.func1()
    /usr/local/go/src/runtime/os_darwin.go:451 +0x33
runtime.systemstack(0x7fff5fbff748)
    /usr/local/go/src/runtime/asm_amd64.s:314 +0xab
runtime.semasleep(0xffffffffffffffff, 0x0)
    /usr/local/go/src/runtime/os_darwin.go:452 +0x44
runtime.notesleep(0x494d030)
    /usr/local/go/src/runtime/lock_sema.go:166 +0x9f
runtime.stopm()
    /usr/local/go/src/runtime/proc.go:1594 +0xad
runtime.findrunnable(0xc42001b500, 0x0)
    /usr/local/go/src/runtime/proc.go:2021 +0x228
runtime.schedule()
    /usr/local/go/src/runtime/proc.go:2120 +0x14c
runtime.park_m(0xc42007c1a0)
    /usr/local/go/src/runtime/proc.go:2183 +0x123
runtime.mcall(0x7fff5fbff940)
    /usr/local/go/src/runtime/asm_amd64.s:240 +0x5b

goroutine 1 [semacquire, 138 minutes]:
sync.runtime_Semacquire(0xc420180e3c)
    /usr/local/go/src/runtime/sema.go:47 +0x30
sync.(*WaitGroup).Wait(0xc420180e30)
    /usr/local/go/src/sync/waitgroup.go:131 +0x97
github.com/elastic/beats/metricbeat/beater.(*Metricbeat).Run(0xc420017860, 0xc4201ec000, 0xc4201ec0a8, 0xb)
    /go/src/github.com/elastic/beats/metricbeat/beater/metricbeat.go:72 +0x246
github.com/elastic/beats/libbeat/beat.(*Beat).launch(0xc4201ec000, 0x46c1b78, 0x0, 0x0)
    /go/src/github.com/elastic/beats/libbeat/beat/beat.go:208 +0x687
github.com/elastic/beats/libbeat/beat.Run(0x466792b, 0xa, 0x0, 0x0, 0x46c1b78, 0xc42004bf90, 0xc4200001a0)
    /go/src/github.com/elastic/beats/libbeat/beat/beat.go:133 +0x65
main.main()
    /go/src/github.com/elastic/beats/metricbeat/main.go:23 +0x57

goroutine 17 [syscall, 138 minutes, locked to thread]:
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:2086 +0x1

goroutine 4 [syscall, 9 minutes]:
os/signal.signal_recv(0x4923080)
    /usr/local/go/src/runtime/sigqueue.go:116 +0x157
os/signal.loop()
    /usr/local/go/src/os/signal/signal_unix.go:22 +0x22
created by os/signal.init.1
    /usr/local/go/src/os/signal/signal_unix.go:28 +0x41

goroutine 29 [chan receive]:
github.com/elastic/beats/libbeat/logp.logExpvars(0xc4201ec0e8)
    /go/src/github.com/elastic/beats/libbeat/logp/logp.go:224 +0x145
created by github.com/elastic/beats/libbeat/logp.Init
    /go/src/github.com/elastic/beats/libbeat/logp/logp.go:137 +0x294

goroutine 31 [select, 9 minutes]:
github.com/elastic/beats/libbeat/publisher.(*messageWorker).run(0xc420302b90)
    /go/src/github.com/elastic/beats/libbeat/publisher/worker.go:63 +0x2a6
created by github.com/elastic/beats/libbeat/publisher.(*messageWorker).init
    /go/src/github.com/elastic/beats/libbeat/publisher/worker.go:57 +0x103

goroutine 50 [select]:
github.com/elastic/beats/libbeat/publisher.(*bulkWorker).run(0xc42049c000)
    /go/src/github.com/elastic/beats/libbeat/publisher/bulk.go:54 +0x34d
created by github.com/elastic/beats/libbeat/publisher.newBulkWorker
    /go/src/github.com/elastic/beats/libbeat/publisher/bulk.go:42 +0x225

goroutine 7 [select, 138 minutes, locked to thread]:
runtime.gopark(0x46c2830, 0x0, 0x466387d, 0x6, 0x18, 0x2)
    /usr/local/go/src/runtime/proc.go:259 +0x13a
runtime.selectgoImpl(0xc420088730, 0x0, 0x18)
    /usr/local/go/src/runtime/select.go:423 +0x11d9
runtime.selectgo(0xc420088730)
    /usr/local/go/src/runtime/select.go:238 +0x1c
runtime.ensureSigM.func1()
    /usr/local/go/src/runtime/signal1_unix.go:304 +0x2d1
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:2086 +0x1

goroutine 15 [runnable]:
bytes.(*Buffer).Truncate(0xc4201ea070, 0x0)
    /usr/local/go/src/bytes/buffer.go:67
bytes.(*Buffer).ReadByte(0xc4201ea070, 0xc4201ea000, 0x491d180, 0xc420074020)
    /usr/local/go/src/bytes/buffer.go:300 +0x8c
github.com/elastic/beats/vendor/github.com/elastic/gosigar.kern_procargs(0x552b, 0x0, 0xc4204b17b0, 0x0, 0xc4204b17d8, 0x4365f53)
    /go/src/github.com/elastic/beats/vendor/github.com/elastic/gosigar/sigar_darwin.go:370 +0x2c4
github.com/elastic/beats/vendor/github.com/elastic/gosigar.(*ProcArgs).Get(0xc4204b1868, 0x552b, 0x0, 0x0)
    /go/src/github.com/elastic/beats/vendor/github.com/elastic/gosigar/sigar_darwin.go:322 +0x88
github.com/elastic/beats/metricbeat/module/system/process.(*Process).getDetails(0xc4206781c0, 0x0, 0x0, 0x1, 0xc42021c960)
    /go/src/github.com/elastic/beats/metricbeat/module/system/process/helper.go:80 +0x442
github.com/elastic/beats/metricbeat/module/system/process.(*ProcStats).GetProcStats(0xc42006e140, 0xffffffffffffffff, 0x3f00000000000000, 0x8, 0x100000000000002, 0x0)
    /go/src/github.com/elastic/beats/metricbeat/module/system/process/helper.go:313 +0x26b
github.com/elastic/beats/metricbeat/module/system/process.(*MetricSet).Fetch(0xc420012180, 0x1ee5c11c, 0x494bfc0, 0x400dfef, 0x494ff00, 0x4d59af0)
    /go/src/github.com/elastic/beats/metricbeat/module/system/process/process.go:80 +0x44
github.com/elastic/beats/metricbeat/beater.(*metricSetWrapper).multiEventFetch(0xc420476b40, 0x4d59af0, 0xc420012180, 0xc4204b1c90, 0x1, 0xc42020c180, 0x40, 0xc42021beac)
    /go/src/github.com/elastic/beats/metricbeat/beater/module.go:246 +0x6c
github.com/elastic/beats/metricbeat/beater.(*metricSetWrapper).fetch(0xc420476b40, 0xc4204786c0, 0xc420478c00, 0x0, 0x0)
    /go/src/github.com/elastic/beats/metricbeat/beater/module.go:208 +0x331
github.com/elastic/beats/metricbeat/beater.(*metricSetWrapper).startFetching(0xc420476b40, 0xc4204786c0, 0xc420478c00)
    /go/src/github.com/elastic/beats/metricbeat/beater/module.go:182 +0x30b
github.com/elastic/beats/metricbeat/beater.(*ModuleWrapper).Start.func1(0xc420180e20, 0xc4204786c0, 0xc420478c00, 0xc420476b40)
    /go/src/github.com/elastic/beats/metricbeat/beater/module.go:137 +0x67
created by github.com/elastic/beats/metricbeat/beater.(*ModuleWrapper).Start
    /go/src/github.com/elastic/beats/metricbeat/beater/module.go:138 +0x1f9

goroutine 16 [semacquire, 138 minutes]:
sync.runtime_Semacquire(0xc420180e2c)
    /usr/local/go/src/runtime/sema.go:47 +0x30
sync.(*WaitGroup).Wait(0xc420180e20)
    /usr/local/go/src/sync/waitgroup.go:131 +0x97
github.com/elastic/beats/metricbeat/beater.(*ModuleWrapper).Start.func2(0xc420180e20, 0xc420478c00)
    /go/src/github.com/elastic/beats/metricbeat/beater/module.go:143 +0x2d
created by github.com/elastic/beats/metricbeat/beater.(*ModuleWrapper).Start
    /go/src/github.com/elastic/beats/metricbeat/beater/module.go:145 +0x253

goroutine 66 [semacquire, 138 minutes]:
sync.runtime_Semacquire(0xc420180e4c)
    /usr/local/go/src/runtime/sema.go:47 +0x30
sync.(*WaitGroup).Wait(0xc420180e40)
    /usr/local/go/src/sync/waitgroup.go:131 +0x97
github.com/elastic/beats/metricbeat/beater.PublishChannels(0x4923b80, 0xc420012bc0, 0xc420084050, 0x1, 0x1)
    /go/src/github.com/elastic/beats/metricbeat/beater/publish.go:35 +0x117
github.com/elastic/beats/metricbeat/beater.(*Metricbeat).Run.func1(0xc420180e30, 0xc420017860, 0xc420084050, 0x1, 0x1)
    /go/src/github.com/elastic/beats/metricbeat/beater/metricbeat.go:68 +0x82
created by github.com/elastic/beats/metricbeat/beater.(*Metricbeat).Run
    /go/src/github.com/elastic/beats/metricbeat/beater/metricbeat.go:69 +0x235

goroutine 67 [chan receive, 9 minutes]:
github.com/elastic/beats/metricbeat/beater.PublishChannels.func1(0xc420478c00)
    /go/src/github.com/elastic/beats/metricbeat/beater/publish.go:24 +0x8b
created by github.com/elastic/beats/metricbeat/beater.PublishChannels
    /go/src/github.com/elastic/beats/metricbeat/beater/publish.go:32 +0xf5

goroutine 85 [IO wait, 9 minutes]:
net.runtime_pollWait(0x4efb4a8, 0x72, 0x7)
    /usr/local/go/src/runtime/netpoll.go:160 +0x59
net.(*pollDesc).wait(0xc4200a4760, 0x72, 0xc420036970, 0xc420074080)
    /usr/local/go/src/net/fd_poll_runtime.go:73 +0x38
net.(*pollDesc).waitRead(0xc4200a4760, 0x4920bc0, 0xc420074080)
    /usr/local/go/src/net/fd_poll_runtime.go:78 +0x34
net.(*netFD).Read(0xc4200a4700, 0xc42027c000, 0x1000, 0x1000, 0x0, 0x4920bc0, 0xc420074080)
    /usr/local/go/src/net/fd_unix.go:243 +0x1a1
net.(*conn).Read(0xc42002a360, 0xc42027c000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
    /usr/local/go/src/net/net.go:173 +0x70
github.com/elastic/beats/libbeat/outputs/transport.(*statsConn).Read(0xc4201f5040, 0xc42027c000, 0x1000, 0x1000, 0xc4200a6000, 0xc420036bbd, 0xc420036af8)
    /go/src/github.com/elastic/beats/libbeat/outputs/transport/stats.go:24 +0x55
net/http.(*persistConn).Read(0xc4204a0500, 0xc42027c000, 0x1000, 0x1000, 0x42939d0, 0xc420036b58, 0x4005c4d)
    /usr/local/go/src/net/http/transport.go:1261 +0x154
bufio.(*Reader).fill(0xc42018b020)
    /usr/local/go/src/bufio/bufio.go:97 +0x10c
bufio.(*Reader).Peek(0xc42018b020, 0x1, 0xc420036bbd, 0x1, 0x1, 0xc4204ca0c0, 0x0)
    /usr/local/go/src/bufio/bufio.go:129 +0x62
net/http.(*persistConn).readLoop(0xc4204a0500)
    /usr/local/go/src/net/http/transport.go:1418 +0x1a1
created by net/http.(*Transport).dialConn
    /usr/local/go/src/net/http/transport.go:1062 +0x4e9

goroutine 86 [select, 9 minutes]:
net/http.(*persistConn).writeLoop(0xc4204a0500)
    /usr/local/go/src/net/http/transport.go:1646 +0x3bd
created by net/http.(*Transport).dialConn
    /usr/local/go/src/net/http/transport.go:1063 +0x50e


@tsg tsg added bug Metricbeat Metricbeat labels Oct 11, 2016
@tsg
Copy link
Contributor

tsg commented Oct 11, 2016

This is probably the thread taking the CPU:

goroutine 15 [runnable]:
bytes.(*Buffer).Truncate(0xc4201ea070, 0x0)
    /usr/local/go/src/bytes/buffer.go:67
bytes.(*Buffer).ReadByte(0xc4201ea070, 0xc4201ea000, 0x491d180, 0xc420074020)
    /usr/local/go/src/bytes/buffer.go:300 +0x8c
github.com/elastic/beats/vendor/github.com/elastic/gosigar.kern_procargs(0x552b, 0x0, 0xc4204b17b0, 0x0, 0xc4204b17d8, 0x4365f53)
    /go/src/github.com/elastic/beats/vendor/github.com/elastic/gosigar/sigar_darwin.go:370 +0x2c4
github.com/elastic/beats/vendor/github.com/elastic/gosigar.(*ProcArgs).Get(0xc4204b1868, 0x552b, 0x0, 0x0)
    /go/src/github.com/elastic/beats/vendor/github.com/elastic/gosigar/sigar_darwin.go:322 +0x88
github.com/elastic/beats/metricbeat/module/system/process.(*Process).getDetails(0xc4206781c0, 0x0, 0x0, 0x1, 0xc42021c960)
    /go/src/github.com/elastic/beats/metricbeat/module/system/process/helper.go:80 +0x442
github.com/elastic/beats/metricbeat/module/system/process.(*ProcStats).GetProcStats(0xc42006e140, 0xffffffffffffffff, 0x3f00000000000000, 0x8, 0x100000000000002, 0x0)
    /go/src/github.com/elastic/beats/metricbeat/module/system/process/helper.go:313 +0x26b
github.com/elastic/beats/metricbeat/module/system/process.(*MetricSet).Fetch(0xc420012180, 0x1ee5c11c, 0x494bfc0, 0x400dfef, 0x494ff00, 0x4d59af0)
    /go/src/github.com/elastic/beats/metricbeat/module/system/process/process.go:80 +0x44
github.com/elastic/beats/metricbeat/beater.(*metricSetWrapper).multiEventFetch(0xc420476b40, 0x4d59af0, 0xc420012180, 0xc4204b1c90, 0x1, 0xc42020c180, 0x40, 0xc42021beac)
    /go/src/github.com/elastic/beats/metricbeat/beater/module.go:246 +0x6c
github.com/elastic/beats/metricbeat/beater.(*metricSetWrapper).fetch(0xc420476b40, 0xc4204786c0, 0xc420478c00, 0x0, 0x0)
    /go/src/github.com/elastic/beats/metricbeat/beater/module.go:208 +0x331
github.com/elastic/beats/metricbeat/beater.(*metricSetWrapper).startFetching(0xc420476b40, 0xc4204786c0, 0xc420478c00)
    /go/src/github.com/elastic/beats/metricbeat/beater/module.go:182 +0x30b
github.com/elastic/beats/metricbeat/beater.(*ModuleWrapper).Start.func1(0xc420180e20, 0xc4204786c0, 0xc420478c00, 0xc420476b40)
    /go/src/github.com/elastic/beats/metricbeat/beater/module.go:137 +0x67
created by github.com/elastic/beats/metricbeat/beater.(*ModuleWrapper).Start
    /go/src/github.com/elastic/beats/metricbeat/beater/module.go:138 +0x1f9

@kevinkluge to guide us during reproducing, did you suspend to ram during that time?

@tsg
Copy link
Contributor

tsg commented Oct 11, 2016

Ok, that gosigar function is missing some error checks, which could explain an infinite loop. I'm going to do some refactoring on it.

I couldn't reproduce the issue yet on my computer, but i didn't upgrade to Sierra yet.

tsg pushed a commit to tsg/gosigar that referenced this issue Oct 11, 2016
Errors were ignored in fairly dangerous places, which could result
in infinite loops. Potential fix for elastic/beats#2747.
tsg pushed a commit to tsg/beats that referenced this issue Oct 11, 2016
In an attempt to fix elastic#2747.
monicasarbu pushed a commit that referenced this issue Oct 11, 2016
In an attempt to fix #2747.
tsg added a commit to tsg/beats that referenced this issue Oct 11, 2016
ruflin pushed a commit that referenced this issue Oct 12, 2016
@tsg
Copy link
Contributor

tsg commented Oct 17, 2016

@kevinkluge I never could reproduce this on my computer, even after upgrading to Sierra. Would it be possible to try the latest build and see if this seems fixed? I do hope it was caused by the bad error handling, which is fixed by the referenced PRs. Thanks!

@kevinkluge
Copy link
Member Author

@tsg I've been running the latest metricbeat for a few hours on my laptop without seeing 100% cpu. I'll leave metricbeat running and re-open this issue if I see the problem again. Otherwise, I assume it's fixed as well as it was easily reproducible for me with the earlier build.

leweafan pushed a commit to leweafan/beats that referenced this issue Apr 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants