Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Heartbeat] Crashing on concurrent map write #6616

Closed
andrewkroh opened this issue Mar 21, 2018 · 7 comments · Fixed by #6950
Closed

[Heartbeat] Crashing on concurrent map write #6616

andrewkroh opened this issue Mar 21, 2018 · 7 comments · Fixed by #6950
Assignees

Comments

@andrewkroh
Copy link
Member

Heartbeat is crashing after ~20 minutes with fatal map errors ("concurrent map writes" or "concurrent map iteration and map write").

For confirmed bugs, please report:

heartbeat.monitors:

- type: tcp
  hosts: ["s1234", "s1234", "s1234", "s1234", "s1234", "s1234"]
  ports: [5555,9000]
  schedule: '@every 180s'
  timeout: 30s

- type: tcp
  hosts: ["s1234", "s12345"]
  ports: [12503]
  schedule: '@every 180s'
  timeout: 30s

#================================ General =====================================
# The name of the shipper that publishes the network data. It can be used to group
hostname: smonitor

#----------------------------- Logstash output --------------------------------
output.logstash:
  # The Logstash hosts
  hosts: ["localhost:5045"]
@tsg
Copy link
Contributor

tsg commented Mar 29, 2018

@ruflin FYI, I added this one to the Heartbeat GA checklist.

@JoostV
Copy link

JoostV commented Apr 3, 2018

RHEL 7 64bit:

2018-04-03T15:03:03.282+0200    DEBUG   [publish]       pipeline/processor.go:275       Publish event: {
  "@timestamp": "2018-04-03T13:03:03.269Z",
  "@metadata": {
    "beat": "heartbeat",
    "type": "doc",
    "version": "6.2.3"
  },
  "tcp": {
    "port": 7224,
    "rtt": {
      "connect": {
        "us": 641
      }
    }
  },
  "monitor": {
    "ip": "127.0.0.1",
    "duration": {
      "us": 13544
    },
    "status": "up",
    "id": "tcp-tcp@localhost:[7224 7226 7228 7230]",
    "host": "localhost",
    "scheme": "tcp",
    "name": "tcp",
    "type": "tcp"
  },
  "resolve": {
    "host": "localhost",
    "ip": "127.0.0.1",
    "rtt": {
      "us": 12174
    }
  },
  "type": "monitor",
  "beat": {
    "name": "xx",
    "hostname": "xx",
    "version": "6.2.3"
  }
}
2018-04-03T15:03:03.282+0200    DEBUG   [scheduler]     scheduler/scheduler.go:208      Job 'tcp-tcp@localhost:[7224 7226 7228 7230]' returned at 2018-04-03 15:03:03.282935583 +0200 CEST m=+545.026809274 (cont=0).
2018-04-03T15:03:03.282+0200    DEBUG   [scheduler]     scheduler/scheduler.go:170      Next wakeup time: 2018-04-03 15:03:08.268937414 +0200 CEST
fatal error: concurrent map writes

goroutine 1232 [running]:
runtime.throw(0x168937c, 0x15)
        /usr/local/go/src/runtime/panic.go:605 +0x95 fp=0xc4202cba98 sp=0xc4202cba78 pc=0xac4945
runtime.mapassign_faststr(0x1619100, 0xc42012ad50, 0x167c0a4, 0x4, 0xc42030ce38)
        /usr/local/go/src/runtime/hashmap_fast.go:607 +0x4f5 fp=0xc4202cbb18 sp=0xc4202cba98 pc=0xaa5f75
github.com/elastic/beats/libbeat/common.MapStr.DeepUpdate(0xc42012ad50, 0xc420408270)
        /go/src/github.com/elastic/beats/libbeat/common/mapstr.go:58 +0x12c fp=0xc4202cbbf0 sp=0xc4202cbb18 pc=0xe1501c
github.com/elastic/beats/libbeat/common.deepUpdateValue(0x1619100, 0xc42012ad50, 0xc420408270, 0x7, 0xc4205233b8)
        /go/src/github.com/elastic/beats/libbeat/common/mapstr.go:70 +0xa3 fp=0xc4202cbc20 sp=0xc4202cbbf0 pc=0xe15313
github.com/elastic/beats/libbeat/common.MapStr.DeepUpdate(0xc42012b110, 0xc4204082d0)
        /go/src/github.com/elastic/beats/libbeat/common/mapstr.go:56 +0x1b6 fp=0xc4202cbcf8 sp=0xc4202cbc20 pc=0xe150a6
github.com/elastic/beats/heartbeat/monitors.annotated.func1(0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc420493e98, 0xc420146778, 0xaf06f0, ...)
        /go/src/github.com/elastic/beats/heartbeat/monitors/util.go:133 +0x727 fp=0xc4202cbe40 sp=0xc4202cbcf8 pc=0xe3e1a7
github.com/elastic/beats/heartbeat/beater.(*monitorTask).prepareSchedulerJob.func1(0x10, 0x16c1038, 0xc4202de870)
        /go/src/github.com/elastic/beats/heartbeat/beater/manager.go:287 +0x9d fp=0xc4202cbf88 sp=0xc4202cbe40 pc=0xe8914d
github.com/elastic/beats/heartbeat/scheduler.(*Scheduler).runTask.func1(0xc4202de870, 0xc4203c8f00, 0xc4202de870, 0xc42012b0b0)
        /go/src/github.com/elastic/beats/heartbeat/scheduler/scheduler.go:322 +0x5b fp=0xc4202cbfc0 sp=0xc4202cbf88 pc=0xe70b2b
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:2337 +0x1 fp=0xc4202cbfc8 sp=0xc4202cbfc0 pc=0xaf65c1
created by github.com/elastic/beats/heartbeat/scheduler.(*Scheduler).runTask
        /go/src/github.com/elastic/beats/heartbeat/scheduler/scheduler.go:312 +0x63

goroutine 1 [chan receive, 8 minutes]:
github.com/elastic/beats/heartbeat/beater.(*Heartbeat).Run(0xc4203cdb00, 0xc420162380, 0x0, 0x0)
        /go/src/github.com/elastic/beats/heartbeat/beater/heartbeat.go:63 +0xd4
github.com/elastic/beats/libbeat/cmd/instance.(*Beat).launch(0xc420162380, 0x16c0fa0, 0x0, 0x0)
        /go/src/github.com/elastic/beats/libbeat/cmd/instance/beat.go:308 +0x3f6
github.com/elastic/beats/libbeat/cmd/instance.Run.func1(0x16809ac, 0x9, 0x16809ac, 0x9, 0x0, 0x0, 0x16c0fa0, 0xc4200100d0, 0xc420215d40)
        /go/src/github.com/elastic/beats/libbeat/cmd/instance/beat.go:135 +0xa8
github.com/elastic/beats/libbeat/cmd/instance.Run(0x16809ac, 0x9, 0x16809ac, 0x9, 0x0, 0x0, 0x16c0fa0, 0x0, 0x0)
        /go/src/github.com/elastic/beats/libbeat/cmd/instance/beat.go:136 +0x71
github.com/elastic/beats/libbeat/cmd.genRunCmd.func1(0xc42033ac80, 0xc42024dd10, 0x0, 0x3)
        /go/src/github.com/elastic/beats/libbeat/cmd/run.go:19 +0x60
github.com/elastic/beats/vendor/github.com/spf13/cobra.(*Command).execute(0xc42033ac80, 0xc4200100d0, 0x3, 0x3, 0xc42033ac80, 0xc4200100d0)
        /go/src/github.com/elastic/beats/vendor/github.com/spf13/cobra/command.go:704 +0x2c6
github.com/elastic/beats/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0xc42033ac80, 0x16809ac, 0x9, 0x0)
        /go/src/github.com/elastic/beats/vendor/github.com/spf13/cobra/command.go:785 +0x30e
github.com/elastic/beats/vendor/github.com/spf13/cobra.(*Command).Execute(0xc42033ac80, 0xc420215f70, 0x142c4c9)
        /go/src/github.com/elastic/beats/vendor/github.com/spf13/cobra/command.go:738 +0x2b
main.main()
        /go/src/github.com/elastic/beats/heartbeat/main.go:10 +0x2f

goroutine 9 [syscall, 8 minutes]:
os/signal.signal_recv(0x16c2848)
        /usr/local/go/src/runtime/sigqueue.go:131 +0xa6
os/signal.loop()
        /usr/local/go/src/os/signal/signal_unix.go:22 +0x22
created by os/signal.init.0
        /usr/local/go/src/os/signal/signal_unix.go:28 +0x41

goroutine 11 [chan receive, 8 minutes]:
github.com/elastic/beats/libbeat/service.HandleSignals.func1(0xc4203c70e0, 0xc42034db30, 0xc4203cdb20)
        /go/src/github.com/elastic/beats/libbeat/service/service.go:32 +0x38
created by github.com/elastic/beats/libbeat/service.HandleSignals
        /go/src/github.com/elastic/beats/libbeat/service/service.go:31 +0x106

goroutine 27 [select]:
github.com/elastic/beats/libbeat/publisher/queue/memqueue.(*bufferingEventLoop).run(0xc420343f40)
        /go/src/github.com/elastic/beats/libbeat/publisher/queue/memqueue/eventloop.go:299 +0x2bf
github.com/elastic/beats/libbeat/publisher/queue/memqueue.NewBroker.func1(0xc420347490, 0x1ecb2a0, 0xc420343f40)
        /go/src/github.com/elastic/beats/libbeat/publisher/queue/memqueue/broker.go:139 +0x5d
created by github.com/elastic/beats/libbeat/publisher/queue/memqueue.NewBroker
        /go/src/github.com/elastic/beats/libbeat/publisher/queue/memqueue/broker.go:137 +0x3a1

goroutine 28 [select]:
github.com/elastic/beats/libbeat/publisher/queue/memqueue.(*ackLoop).run(0xc4202de730)
        /go/src/github.com/elastic/beats/libbeat/publisher/queue/memqueue/ackloop.go:43 +0x1b2
github.com/elastic/beats/libbeat/publisher/queue/memqueue.NewBroker.func2(0xc420347490, 0xc4202de730)
        /go/src/github.com/elastic/beats/libbeat/publisher/queue/memqueue/broker.go:143 +0x57
created by github.com/elastic/beats/libbeat/publisher/queue/memqueue.NewBroker
        /go/src/github.com/elastic/beats/libbeat/publisher/queue/memqueue/broker.go:141 +0x3d3

goroutine 29 [select]:
github.com/elastic/beats/libbeat/publisher/queue/memqueue.(*consumer).Get(0xc4203cd1e0, 0x32, 0xc420038e68, 0x0, 0xc420322800, 0x0)
        /go/src/github.com/elastic/beats/libbeat/publisher/queue/memqueue/consume.go:50 +0x129
github.com/elastic/beats/libbeat/publisher/pipeline.(*eventConsumer).loop(0xc4202de7d0, 0x1ecb2e0, 0xc4203cd1e0)
        /go/src/github.com/elastic/beats/libbeat/publisher/pipeline/consumer.go:150 +0x450
created by github.com/elastic/beats/libbeat/publisher/pipeline.newEventConsumer
        /go/src/github.com/elastic/beats/libbeat/publisher/pipeline/consumer.go:60 +0x198

goroutine 30 [select, 8 minutes]:
github.com/elastic/beats/libbeat/publisher/pipeline.(*retryer).loop(0xc420323980)
        /go/src/github.com/elastic/beats/libbeat/publisher/pipeline/retry.go:115 +0x225
created by github.com/elastic/beats/libbeat/publisher/pipeline.newRetryer
        /go/src/github.com/elastic/beats/libbeat/publisher/pipeline/retry.go:70 +0x141

goroutine 31 [chan receive]:
github.com/elastic/beats/libbeat/publisher/pipeline.(*netClientWorker).run(0xc4203239c0)
        /go/src/github.com/elastic/beats/libbeat/publisher/pipeline/output.go:82 +0x15c
created by github.com/elastic/beats/libbeat/publisher/pipeline.makeClientWorker
        /go/src/github.com/elastic/beats/libbeat/publisher/pipeline/output.go:31 +0x100

goroutine 32 [select]:
github.com/elastic/beats/libbeat/monitoring/report/log.(*reporter).snapshotLoop(0xc4204086c0)
        /go/src/github.com/elastic/beats/libbeat/monitoring/report/log/log.go:108 +0x380
github.com/elastic/beats/libbeat/monitoring/report/log.MakeReporter.func1(0xc4204086c0)
        /go/src/github.com/elastic/beats/libbeat/monitoring/report/log/log.go:86 +0x53
created by github.com/elastic/beats/libbeat/monitoring/report/log.MakeReporter
        /go/src/github.com/elastic/beats/libbeat/monitoring/report/log/log.go:84 +0x160

goroutine 33 [select, 8 minutes, locked to thread]:
runtime.gopark(0x16c29c0, 0x0, 0x167e011, 0x6, 0x18, 0x1)
        /usr/local/go/src/runtime/proc.go:287 +0x12c
runtime.selectgo(0xc42041ef50, 0xc420146c60)
        /usr/local/go/src/runtime/select.go:395 +0x1149
runtime.ensureSigM.func1()
        /usr/local/go/src/runtime/signal_unix.go:511 +0x220
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:2337 +0x1

goroutine 13 [select]:
github.com/elastic/beats/heartbeat/scheduler.(*Scheduler).run(0xc4203c8f00)
        /go/src/github.com/elastic/beats/heartbeat/scheduler/scheduler.go:183 +0x719
created by github.com/elastic/beats/heartbeat/scheduler.(*Scheduler).Start
        /go/src/github.com/elastic/beats/heartbeat/scheduler/scheduler.go:96 +0xbc

goroutine 56 [select]:
net/http.(*persistConn).writeLoop(0xc42043cfc0)
        /usr/local/go/src/net/http/transport.go:1759 +0x165
created by net/http.(*Transport).dialConn
        /usr/local/go/src/net/http/transport.go:1187 +0xa53

goroutine 55 [IO wait]:
internal/poll.runtime_pollWait(0x7f084461deb0, 0x72, 0x0)
        /usr/local/go/src/runtime/netpoll.go:173 +0x57
internal/poll.(*pollDesc).wait(0xc4203c9718, 0x72, 0xffffffffffffff00, 0x1ec9aa0, 0x1ec1d50)
        /usr/local/go/src/internal/poll/fd_poll_runtime.go:85 +0xae
internal/poll.(*pollDesc).waitRead(0xc4203c9718, 0xc420470000, 0x1000, 0x1000)
        /usr/local/go/src/internal/poll/fd_poll_runtime.go:90 +0x3d
internal/poll.(*FD).Read(0xc4203c9700, 0xc420470000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
        /usr/local/go/src/internal/poll/fd_unix.go:126 +0x18a
net.(*netFD).Read(0xc4203c9700, 0xc420470000, 0x1000, 0x1000, 0xc420442f00, 0xc420420ef8, 0xc420147058)
        /usr/local/go/src/net/fd_unix.go:202 +0x52
net.(*conn).Read(0xc42000e440, 0xc420470000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
        /usr/local/go/src/net/net.go:176 +0x6d
github.com/elastic/beats/libbeat/outputs/transport.(*statsConn).Read(0xc42046e000, 0xc420470000, 0x1000, 0x1000, 0xa9cfd8, 0xc420147200, 0xc420443320)
        /go/src/github.com/elastic/beats/libbeat/outputs/transport/stats.go:27 +0x55
net/http.(*persistConn).Read(0xc42043cfc0, 0xc420470000, 0x1000, 0x1000, 0xc420383820, 0xc420147258, 0xaf06f0)
        /usr/local/go/src/net/http/transport.go:1391 +0x140
bufio.(*Reader).fill(0xc420442300)
        /usr/local/go/src/bufio/bufio.go:97 +0x11a
bufio.(*Reader).Peek(0xc420442300, 0x1, 0x0, 0x0, 0x1, 0xc4200626c0, 0x0)
        /usr/local/go/src/bufio/bufio.go:129 +0x3a
net/http.(*persistConn).readLoop(0xc42043cfc0)
        /usr/local/go/src/net/http/transport.go:1539 +0x185
created by net/http.(*Transport).dialConn
        /usr/local/go/src/net/http/transport.go:1186 +0xa2e

goroutine 1230 [runnable]:
github.com/elastic/beats/libbeat/common.normalizeMap(0xc42012ad80, 0xc4202cfb48, 0x2, 0xa, 0x0, 0x0, 0x0, 0x0)
        /go/src/github.com/elastic/beats/libbeat/common/event.go:46 +0xdf
github.com/elastic/beats/libbeat/common.normalizeValue(0x1619100, 0xc42012ad80, 0xc4202cfb48, 0x2, 0xa, 0x14d9700, 0xc4203c2c80, 0x0, 0x0, 0x0)
        /go/src/github.com/elastic/beats/libbeat/common/event.go:162 +0x148c
github.com/elastic/beats/libbeat/common.normalizeMap(0xc42012adb0, 0xc4202cfb48, 0x1, 0xa, 0x0, 0x0, 0x0, 0x0)
        /go/src/github.com/elastic/beats/libbeat/common/event.go:47 +0x1c0
github.com/elastic/beats/libbeat/common.normalizeValue(0x1619100, 0xc42012adb0, 0xc4202cfb48, 0x1, 0xa, 0x1619100, 0xc42012bf20, 0x0, 0x0, 0x0)
        /go/src/github.com/elastic/beats/libbeat/common/event.go:162 +0x148c
github.com/elastic/beats/libbeat/common.normalizeMap(0xc4200ff4d0, 0xc4202cfb48, 0x0, 0xa, 0x1619100, 0xc4202cfb08, 0xc4203cb108, 0xc4202cfba8)
        /go/src/github.com/elastic/beats/libbeat/common/event.go:47 +0x1c0
github.com/elastic/beats/libbeat/common.ConvertToGenericEvent(0xc4200ff4d0, 0x0)
        /go/src/github.com/elastic/beats/libbeat/common/event.go:31 +0x84
github.com/elastic/beats/libbeat/publisher/pipeline.glob..func2(0xc420323ac0, 0x15c4fe0, 0xc4203cb9e0, 0xa00202cfd28)
        /go/src/github.com/elastic/beats/libbeat/publisher/pipeline/processor.go:179 +0x51
github.com/elastic/beats/libbeat/publisher/pipeline.(*processorFn).Run(0xc4200cf460, 0xc420323ac0, 0x0, 0xc42030ce78, 0xc42014dc00)
        /go/src/github.com/elastic/beats/libbeat/publisher/pipeline/processor.go:170 +0x34
github.com/elastic/beats/libbeat/publisher/pipeline.(*program).Run(0xc420408450, 0xc420323ac0, 0xfaa2a8ec7cd49841, 0x0, 0x1619100)
        /go/src/github.com/elastic/beats/libbeat/publisher/pipeline/processor.go:140 +0xad
github.com/elastic/beats/libbeat/publisher/pipeline.(*client).publish(0xc4203c6ea0, 0xbea8fcc1d00b73e8, 0x7ee54f50c9, 0x1f16d80, 0x0, 0xc4200ff4d0, 0x0, 0x0)
        /go/src/github.com/elastic/beats/libbeat/publisher/pipeline/client.go:67 +0x46f
github.com/elastic/beats/libbeat/publisher/pipeline.(*client).Publish(0xc4203c6ea0, 0xbea8fcc1d00b73e8, 0x7ee54f50c9, 0x1f16d80, 0x0, 0xc4200ff4d0, 0x0, 0x0)
        /go/src/github.com/elastic/beats/libbeat/publisher/pipeline/client.go:46 +0x95
github.com/elastic/beats/heartbeat/beater.(*monitorTask).prepareSchedulerJob.func1(0x10, 0x16c1038, 0xc4202de870)
        /go/src/github.com/elastic/beats/heartbeat/beater/manager.go:298 +0x3b7
github.com/elastic/beats/heartbeat/scheduler.(*Scheduler).runTask.func1(0xc4202de870, 0xc4203c8f00, 0xc4202de870, 0xc42012aff0)
        /go/src/github.com/elastic/beats/heartbeat/scheduler/scheduler.go:322 +0x5b
created by github.com/elastic/beats/heartbeat/scheduler.(*Scheduler).runTask
        /go/src/github.com/elastic/beats/heartbeat/scheduler/scheduler.go:312 +0x63

@JoostV
Copy link

JoostV commented Apr 3, 2018

Windows server 2016:

2018-04-03T15:37:15.124+0200    DEBUG   [scheduler]     scheduler/scheduler.go:170      Next wakeup time: 2018-04-03 15:37:20.1236676 +0200 CEST
2018-04-03T15:37:15.125+0200    DEBUG   [publish]       pipeline/processor.go:275       Publish event: {
  "@timestamp": "2018-04-03T13:37:15.124Z",
  "@metadata": {
    "beat": "heartbeat",
    "type": "doc",
    "version": "6.2.3"
  },
  "monitor": {
    "status": "up",
    "scheme": "tcp",
    "type": "tcp",
    "host": "localhost",
    "ip": "127.0.0.1",
    "duration": {
      "us": 980
    },
    "id": "tcp-tcp@localhost:[7222 7226]",
    "name": "tcp"
  },
  "resolve": {
    "host": "localhost",
    "ip": "127.0.0.1",
    "rtt": {
      "us": 0
    }
  },
  "type": "monitor",
  ,
  "beat": {
    "name": "xx",
    "hostname": "xx",
    "version": "6.2.3"
  }
}
2018-04-03T15:37:15.148+0200    DEBUG   [scheduler]     scheduler/scheduler.go:208      Job 'tcp-tcp@localhost:[7222 7226]' returned at 2018-04-03 15:37:15.1487125 +0200 CEST m=+925.061852701 (cont=2)
.
2018-04-03T15:37:15.150+0200    DEBUG   [scheduler]     scheduler/scheduler.go:262      start returned tasks
2018-04-03T15:37:15.151+0200    DEBUG   [scheduler]     scheduler/scheduler.go:170      Next wakeup time: 2018-04-03 15:37:20.1236676 +0200 CEST
fatal error: 2018-04-03T15:37:15.152+0200       DEBUG   [publish]       pipeline/processor.go:275       Publish event: {
  "@timestamp": "2018-04-03T13:37:15.124Z",
  "@metadata": {
    "beat": "heartbeat",
    "type": "doc",
    "version": "6.2.3"
  },
  "monitor": {
    "status": "up",
    "host": "localhost",
    "ip": "127.0.0.1",
    "duration": {
      "us": 27981
    },
    "type": "tcp",
    "scheme": "tcp",
    "id": "tcp-tcp@localhost:[7222 7226]",
    "name": "tcp"
  },
  "beat": {
    "name": "xx",
    "hostname": "xx",
    "version": "6.2.3"
  },
  "type": "monitor",
  "tcp": {
    "rtt": {
      "connect": {
        "us": 997
      }
    },
    "port": 7222
  },
  "resolve": {
    "rtt": {
      "us": 0
    },
    "host": "localhost",
    "ip": "127.0.0.1"
  }
}
concurrent map writes2018-04-03T15:37:15.181+0200       DEBUG   [scheduler]     scheduler/scheduler.go:208      Job 'tcp-tcp@localhost:[7222 7226]' returned at 2018-04-03 15:37:15.1817154 +0200 CEST m
=+925.094855501 (cont=0).

2018-04-03T15:37:15.183+0200    DEBUG   [scheduler]     scheduler/scheduler.go:170      Next wakeup time: 2018-04-03 15:37:20.1236676 +0200 CEST

goroutine 1518 [running]:
runtime.throw(0x1043c66, 0x15)
        /usr/local/go/src/runtime/panic.go:605 +0x9c fp=0xc04221bbe0 sp=0xc04221bbc0 pc=0x42e4bc
runtime.mapassign_faststr(0xfd14c0, 0xc0423ee300, 0x1035e0c, 0x4, 0xfd14c0)
        /usr/local/go/src/runtime/hashmap_fast.go:607 +0x4fc fp=0xc04221bc60 sp=0xc04221bbe0 pc=0x40d67c
github.com/elastic/beats/libbeat/common.MapStr.DeepUpdate(0xc0423ee300, 0xc042433620)
        /go/src/github.com/elastic/beats/libbeat/common/mapstr.go:58 +0x133 fp=0xc04221bd38 sp=0xc04221bc60 pc=0x79bf73
github.com/elastic/beats/libbeat/common.deepUpdateValue(0xfd14c0, 0xc0423ee300, 0xc042433620, 0x7, 0xc0423e62d8)
        /go/src/github.com/elastic/beats/libbeat/common/mapstr.go:70 +0xaa fp=0xc04221bd68 sp=0xc04221bd38 pc=0x79c26a
github.com/elastic/beats/libbeat/common.MapStr.DeepUpdate(0xc0423e80f0, 0xc042433650)
        /go/src/github.com/elastic/beats/libbeat/common/mapstr.go:56 +0x1bd fp=0xc04221be40 sp=0xc04221bd68 pc=0x79bffd
github.com/elastic/beats/heartbeat/beater.(*monitorTask).prepareSchedulerJob.func1(0xc000000010, 0x107cb48, 0xc042312780)
        /go/src/github.com/elastic/beats/heartbeat/beater/manager.go:293 +0x33c fp=0xc04221bf88 sp=0xc04221be40 pc=0x80f0bc
github.com/elastic/beats/heartbeat/scheduler.(*Scheduler).runTask.func1(0xc042312780, 0xc0423e3080, 0xc042312780, 0xc0423ee5a0)
        /go/src/github.com/elastic/beats/heartbeat/scheduler/scheduler.go:322 +0x62 fp=0xc04221bfc0 sp=0xc04221bf88 pc=0x7f62c2
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:2337 +0x1 fp=0xc04221bfc8 sp=0xc04221bfc0 pc=0x45da71
created by github.com/elastic/beats/heartbeat/scheduler.(*Scheduler).runTask
        /go/src/github.com/elastic/beats/heartbeat/scheduler/scheduler.go:312 +0x6a

goroutine 1 [chan receive, 15 minutes]:
github.com/elastic/beats/heartbeat/beater.(*Heartbeat).Run(0xc042430980, 0xc04216c380, 0x0, 0x0)
        /go/src/github.com/elastic/beats/heartbeat/beater/heartbeat.go:63 +0xdb
github.com/elastic/beats/libbeat/cmd/instance.(*Beat).launch(0xc04216c380, 0x107cab0, 0x0, 0x0)
        /go/src/github.com/elastic/beats/libbeat/cmd/instance/beat.go:308 +0x3fd
github.com/elastic/beats/libbeat/cmd/instance.Run.func1(0x103a5d5, 0x9, 0x103a5d5, 0x9, 0x0, 0x0, 0x107cab0, 0xc0420300d0, 0xc042253d40)
        /go/src/github.com/elastic/beats/libbeat/cmd/instance/beat.go:135 +0xaf
github.com/elastic/beats/libbeat/cmd/instance.Run(0x103a5d5, 0x9, 0x103a5d5, 0x9, 0x0, 0x0, 0x107cab0, 0x0, 0x0)
        /go/src/github.com/elastic/beats/libbeat/cmd/instance/beat.go:136 +0x78
github.com/elastic/beats/libbeat/cmd.genRunCmd.func1(0xc04236ec80, 0xc0423acc30, 0x0, 0x3)
        /go/src/github.com/elastic/beats/libbeat/cmd/run.go:19 +0x67
github.com/elastic/beats/vendor/github.com/spf13/cobra.(*Command).execute(0xc04236ec80, 0xc0420300d0, 0x3, 0x3, 0xc04236ec80, 0xc0420300d0)
        /go/src/github.com/elastic/beats/vendor/github.com/spf13/cobra/command.go:704 +0x2cd
github.com/elastic/beats/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0xc04236ec80, 0x103a5d5, 0x9, 0x0)
        /go/src/github.com/elastic/beats/vendor/github.com/spf13/cobra/command.go:785 +0x315
github.com/elastic/beats/vendor/github.com/spf13/cobra.(*Command).Execute(0xc04236ec80, 0xc042253f70, 0xde0140)
        /go/src/github.com/elastic/beats/vendor/github.com/spf13/cobra/command.go:738 +0x32
main.main()
        /go/src/github.com/elastic/beats/heartbeat/main.go:10 +0x36

goroutine 6 [syscall, 15 minutes]:
os/signal.signal_recv(0x0)
        /usr/local/go/src/runtime/sigqueue.go:131 +0xad
os/signal.loop()
        /usr/local/go/src/os/signal/signal_unix.go:22 +0x29
created by os/signal.init.0
        /usr/local/go/src/os/signal/signal_unix.go:28 +0x48

goroutine 18 [select, 15 minutes]:
github.com/elastic/beats/vendor/golang.org/x/sys/windows/svc/debug.Run.func1(0xc0424540c0, 0xc042454000, 0xc042454060)
        /go/src/github.com/elastic/beats/vendor/golang.org/x/sys/windows/svc/debug/service.go:32 +0x102
created by github.com/elastic/beats/vendor/golang.org/x/sys/windows/svc/debug.Run
        /go/src/github.com/elastic/beats/vendor/golang.org/x/sys/windows/svc/debug/service.go:29 +0xeb

goroutine 47 [select]:
github.com/elastic/beats/libbeat/publisher/queue/memqueue.(*bufferingEventLoop).run(0xc042377cc0)
        /go/src/github.com/elastic/beats/libbeat/publisher/queue/memqueue/eventloop.go:299 +0x2c6
github.com/elastic/beats/libbeat/publisher/queue/memqueue.NewBroker.func1(0xc04237b500, 0x1688c20, 0xc042377cc0)
        /go/src/github.com/elastic/beats/libbeat/publisher/queue/memqueue/broker.go:139 +0x64
created by github.com/elastic/beats/libbeat/publisher/queue/memqueue.NewBroker
        /go/src/github.com/elastic/beats/libbeat/publisher/queue/memqueue/broker.go:137 +0x3a8

goroutine 48 [select]:
github.com/elastic/beats/libbeat/publisher/queue/memqueue.(*ackLoop).run(0xc042312640)
        /go/src/github.com/elastic/beats/libbeat/publisher/queue/memqueue/ackloop.go:43 +0x1b9
github.com/elastic/beats/libbeat/publisher/queue/memqueue.NewBroker.func2(0xc04237b500, 0xc042312640)
        /go/src/github.com/elastic/beats/libbeat/publisher/queue/memqueue/broker.go:143 +0x5e
created by github.com/elastic/beats/libbeat/publisher/queue/memqueue.NewBroker
        /go/src/github.com/elastic/beats/libbeat/publisher/queue/memqueue/broker.go:141 +0x3da

goroutine 49 [select]:
github.com/elastic/beats/libbeat/publisher/queue/memqueue.(*consumer).Get(0xc0423d9fe0, 0x32, 0xc04242de68, 0x0, 0xc04214c4c0, 0x0)
        /go/src/github.com/elastic/beats/libbeat/publisher/queue/memqueue/consume.go:50 +0x130
github.com/elastic/beats/libbeat/publisher/pipeline.(*eventConsumer).loop(0xc0423126e0, 0x1688c60, 0xc0423d9fe0)
        /go/src/github.com/elastic/beats/libbeat/publisher/pipeline/consumer.go:150 +0x457
created by github.com/elastic/beats/libbeat/publisher/pipeline.newEventConsumer
        /go/src/github.com/elastic/beats/libbeat/publisher/pipeline/consumer.go:60 +0x19f

goroutine 50 [select, 15 minutes]:
github.com/elastic/beats/libbeat/publisher/pipeline.(*retryer).loop(0xc042337b00)
        /go/src/github.com/elastic/beats/libbeat/publisher/pipeline/retry.go:115 +0x22c
created by github.com/elastic/beats/libbeat/publisher/pipeline.newRetryer
        /go/src/github.com/elastic/beats/libbeat/publisher/pipeline/retry.go:70 +0x148

goroutine 51 [chan receive]:
github.com/elastic/beats/libbeat/publisher/pipeline.(*netClientWorker).run(0xc042337b40)
        /go/src/github.com/elastic/beats/libbeat/publisher/pipeline/output.go:82 +0x163
created by github.com/elastic/beats/libbeat/publisher/pipeline.makeClientWorker
        /go/src/github.com/elastic/beats/libbeat/publisher/pipeline/output.go:31 +0x107

goroutine 52 [select]:
github.com/elastic/beats/libbeat/monitoring/report/log.(*reporter).snapshotLoop(0xc0424336b0)
        /go/src/github.com/elastic/beats/libbeat/monitoring/report/log/log.go:108 +0x387
github.com/elastic/beats/libbeat/monitoring/report/log.MakeReporter.func1(0xc0424336b0)
        /go/src/github.com/elastic/beats/libbeat/monitoring/report/log/log.go:86 +0x5a
created by github.com/elastic/beats/libbeat/monitoring/report/log.MakeReporter
        /go/src/github.com/elastic/beats/libbeat/monitoring/report/log/log.go:84 +0x167

goroutine 53 [chan receive, 15 minutes]:
github.com/elastic/beats/libbeat/service.HandleSignals.func1(0xc0423e1aa0, 0xc0423f6440, 0xc0424309a0)
        /go/src/github.com/elastic/beats/libbeat/service/service.go:32 +0x3f
created by github.com/elastic/beats/libbeat/service.HandleSignals
        /go/src/github.com/elastic/beats/libbeat/service/service.go:31 +0x10d

goroutine 54 [chan receive, 15 minutes]:
github.com/elastic/beats/libbeat/service.(*beatService).Execute(0x1709058, 0xc0420fe120, 0x1, 0x1, 0xc042454000, 0xc042454060, 0xc042443f40)
        /go/src/github.com/elastic/beats/libbeat/service/service_windows.go:23 +0xb8
github.com/elastic/beats/vendor/golang.org/x/sys/windows/svc/debug.Run(0xc04200e090, 0x9, 0x1681f20, 0x1709058, 0xc042443f98, 0x1)
        /go/src/github.com/elastic/beats/vendor/golang.org/x/sys/windows/svc/debug/service.go:40 +0x166
github.com/elastic/beats/libbeat/service.ProcessWindowsControlEvents(0xc0424309c0)
        /go/src/github.com/elastic/beats/libbeat/service/service_windows.go:57 +0x19a
created by github.com/elastic/beats/libbeat/service.HandleSignals
        /go/src/github.com/elastic/beats/libbeat/service/service.go:38 +0x16f

goroutine 55 [select]:
github.com/elastic/beats/heartbeat/scheduler.(*Scheduler).run(0xc0423e3080)
        /go/src/github.com/elastic/beats/heartbeat/scheduler/scheduler.go:183 +0x720
created by github.com/elastic/beats/heartbeat/scheduler.(*Scheduler).Start
        /go/src/github.com/elastic/beats/heartbeat/scheduler/scheduler.go:96 +0xc3

goroutine 68 [IO wait]:
internal/poll.runtime_pollWait(0x2343a40, 0x72, 0x0)
        /usr/local/go/src/runtime/netpoll.go:173 +0x5e
internal/poll.(*pollDesc).wait(0xc0424625d8, 0x72, 0x167f500, 0x0, 0x0)
        /usr/local/go/src/internal/poll/fd_poll_runtime.go:85 +0xb5
internal/poll.(*ioSrv).ExecIO(0x16e1090, 0xc042462498, 0x107dc58, 0x107e638, 0xc04242b950, 0xc042066b40)
        /usr/local/go/src/internal/poll/fd_windows.go:205 +0x13a
internal/poll.(*FD).Read(0xc042462480, 0xc0421b8000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
        /usr/local/go/src/internal/poll/fd_windows.go:458 +0x266
net.(*netFD).Read(0xc042462480, 0xc0421b8000, 0x1000, 0x1000, 0xc04258e1e0, 0xc04242bef8, 0xc0421ac1b8)
        /usr/local/go/src/net/fd_windows.go:151 +0x59
net.(*conn).Read(0xc042456020, 0xc0421b8000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
        /usr/local/go/src/net/net.go:176 +0x74
github.com/elastic/beats/libbeat/outputs/transport.(*statsConn).Read(0xc042430140, 0xc0421b8000, 0x1000, 0x1000, 0x403e86, 0xc042152240, 0xc042034540)
        /go/src/github.com/elastic/beats/libbeat/outputs/transport/stats.go:27 +0x5c
net/http.(*persistConn).Read(0xc0423ea5a0, 0xc0421b8000, 0x1000, 0x1000, 0xc04225c100, 0xc042152298, 0x457a70)
        /usr/local/go/src/net/http/transport.go:1391 +0x147
bufio.(*Reader).fill(0xc0424ae480)
        /usr/local/go/src/bufio/bufio.go:97 +0x121
bufio.(*Reader).Peek(0xc0424ae480, 0x1, 0x0, 0x0, 0x1, 0xc0421ac240, 0x0)
        /usr/local/go/src/bufio/bufio.go:129 +0x41
net/http.(*persistConn).readLoop(0xc0423ea5a0)
        /usr/local/go/src/net/http/transport.go:1539 +0x18c
created by net/http.(*Transport).dialConn
        /usr/local/go/src/net/http/transport.go:1186 +0xa35

goroutine 69 [select]:
net/http.(*persistConn).writeLoop(0xc0423ea5a0)
        /usr/local/go/src/net/http/transport.go:1759 +0x16c
created by net/http.(*Transport).dialConn
        /usr/local/go/src/net/http/transport.go:1187 +0xa5a

@JoostV
Copy link

JoostV commented Apr 3, 2018

Windows yml file:

# Configure monitors
heartbeat.monitors:
- type: tcp
  schedule: '@every 5s' 
  hosts: ["localhost"]
  ports: [7222, 7226]
  mode: any
  
  
output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["127.0.0.1:1234"]

RHEL yml file:

# Configure monitors
heartbeat.monitors:
- type: tcp
  schedule: '@every 5s' 
  hosts: ["localhost"]
  ports: [7224, 7226, 7228, 7230]
  mode: any
  
  
output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["127.0.0.1:1234"]

@JoostV
Copy link

JoostV commented Apr 3, 2018

The work around for me is to add "max_procs: 1" in the general section of the heartbeat.yml

@ruflin
Copy link
Contributor

ruflin commented Apr 4, 2018

It seems two go routines write to the same map:

fields.DeepUpdate(user)

and

event.Fields.DeepUpdate(meta)

@urso Does that trigger anything on your end? We should probably have a copy of the fields or use a lock?

@urso urso self-assigned this Apr 9, 2018
@urso
Copy link

urso commented Apr 16, 2018

It's not that simple. Events are build in multiple steps. It could be a step in the past did a deep-update using a global shared map. That global shared map is now updated again using DeepUpdate. Could be anywhere. Unfortunately beats events don't have any copy-on-write semantics. (That's the reason we always Clone shared maps in the publisher pipelines processors).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants