Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Panic 'fatal error: concurrent map iteration and map write' autodiscover and processor. #8040

Closed
ph opened this issue Aug 21, 2018 · 6 comments · Fixed by #8223
Closed

Panic 'fatal error: concurrent map iteration and map write' autodiscover and processor. #8040

ph opened this issue Aug 21, 2018 · 6 comments · Fixed by #8223

Comments

@ph
Copy link
Contributor

ph commented Aug 21, 2018

Created from: https://discuss.elastic.co/t/filebeat-autodiscover-gives-fatal-error-concurrent-map-iteration-and-map-write/145296

This look like we are not copying a shared data structure and this make the code panic with a concurrent error.

filebeat.autodiscover:
  providers:
    - type: docker
      templates:
        - condition:
            regexp:
              docker.container.image: '.*'
          config:
            - type: docker
              containers:
                path: '/usr/share/filebeat/containers/'
                stream: 'all'
                ids:
                  - '${data.docker.container.id}'
              processors:
                - add_docker_metadata: ~
                - add_host_metadata:
                    netinfo.enabled: true
output.logstash:
  hosts: ['logstash:5044', 'logstash2:5044']
  loadbalance: true
fatal error: concurrent map iteration and map write
goroutine 32 [running]:
runtime.throw(0x1638ce3, 0x26)
    /usr/local/go/src/runtime/panic.go:605 +0x95 fp=0xc42259b5f0 sp=0xc42259b5d0 pc=0xaba0e5
runtime.mapiternext(0xc42259b6c0)
    /usr/local/go/src/runtime/hashmap.go:778 +0x6f1 fp=0xc42259b688 sp=0xc42259b5f0 pc=0xa98291
github.com/elastic/beats/vendor/github.com/elastic/go-structform/gotype.foldMapInterface(0xc4203f32c0, 0x14c5ac0, 0xc42074eb10, 0xc42074eb10, 0x16610d8)
    /go/src/github.com/elastic/beats/vendor/github.com/elastic/go-structform/gotype/fold_map.go:34 +0xea fp=0xc42259b730 sp=0xc42259b688 pc=0xeb3b5a
github.com/elastic/beats/vendor/github.com/elastic/go-structform/gotype.foldInterfaceValue(0xc4203f32c0, 0x15aa700, 0xc42074eb10, 0x0, 0x0)
    /go/src/github.com/elastic/beats/vendor/github.com/elastic/go-structform/gotype/fold.go:89 +0x15b fp=0xc42259b7a8 sp=0xc42259b730 pc=0xeb1b6b
github.com/elastic/beats/vendor/github.com/elastic/go-structform/gotype.foldMapInlineInterface(0xc4203f32c0, 0x15aa700, 0xc421b8e410, 0x95, 0x15aa700, 0xc421b8e410)
    /go/src/github.com/elastic/beats/vendor/github.com/elastic/go-structform/gotype/fold_map_inline.generated.go:44 +0x158 fp=0xc42259b868 sp=0xc42259b7a8 pc=0xeb46e8
github.com/elastic/beats/vendor/github.com/elastic/go-structform/gotype.makeFieldInlineFold.func1(0xc4203f32c0, 0x15397c0, 0xc421b8e3c0, 0x99, 0x0, 0x0)
    /go/src/github.com/elastic/beats/vendor/github.com/elastic/go-structform/gotype/fold_reflect.go:292 +0x86 fp=0xc42259b8b8 sp=0xc42259b868 pc=0xf17e06
github.com/elastic/beats/vendor/github.com/elastic/go-structform/gotype.makeFieldsFold.func1(0xc4203f32c0, 0x15397c0, 0xc421b8e3c0, 0x99, 0x0, 0x15397c0)
    /go/src/github.com/elastic/beats/vendor/github.com/elastic/go-structform/gotype/fold_reflect.go:177 +0x88 fp=0xc42259b910 sp=0xc42259b8b8 pc=0xf17c18
github.com/elastic/beats/vendor/github.com/elastic/go-structform/gotype.makeStructFold.func1(0xc4203f32c0, 0x15397c0, 0xc421b8e3c0, 0x99, 0x0, 0x0)
    /go/src/github.com/elastic/beats/vendor/github.com/elastic/go-structform/gotype/fold_reflect.go:167 +0x95 fp=0xc42259b958 sp=0xc42259b910 pc=0xf17b15
github.com/elastic/beats/vendor/github.com/elastic/go-structform/gotype.foldAnyReflect(0xc4203f32c0, 0x15397c0, 0xc421b8e3c0, 0x99, 0x99, 0x0)
    /go/src/github.com/elastic/beats/vendor/github.com/elastic/go-structform/gotype/fold_reflect.go:511 +0xb9 fp=0xc42259b998 sp=0xc42259b958 pc=0xeb9ce9
github.com/elastic/beats/vendor/github.com/elastic/go-structform/gotype.foldInterfaceValue(0xc4203f32c0, 0x15397c0, 0xc421b8e3c0, 0xc421b8e3c0, 0xc421b8e3c0)
    /go/src/github.com/elastic/beats/vendor/github.com/elastic/go-structform/gotype/fold.go:92 +0x1c9 fp=0xc42259ba10 sp=0xc42259b998 pc=0xeb1bd9
github.com/elastic/beats/vendor/github.com/elastic/go-structform/gotype.(*Iterator).Fold(0xc4203f32c0, 0x15397c0, 0xc421b8e3c0, 0xc421b8e3c0, 0xc4208d3bb0)
  ...
@ph
Copy link
Contributor Author

ph commented Aug 21, 2018

Note: add_docker_metadata should not be needed since the docker autodiscover already add the metadata for you. Maybe this reason of this problem.

@ph
Copy link
Contributor Author

ph commented Aug 27, 2018

I've run this weekend a stress test with Filebeat and only the add_host_metadata with the netinfo.enabled: true and aggressive cache invalidation but I did not run into panics.

The code changed recently with the addition of netinfo in #6878

@ph
Copy link
Contributor Author

ph commented Aug 27, 2018

I will keep it running with the race detector to see if I catch the offending goroutine. theory so far is the way filebeat is structured with the outlet and the cache data structure inside the processors is problematic.

@ddanielr
Copy link

ddanielr commented Aug 31, 2018

running 6.3.1 and getting the same error as well. pulled code and rebuild with -race, getting the following race condition on 6.3.1 and 6.4.0

==================
WARNING: DATA RACE
Read at 0x00c0005275f0 by goroutine 33:
  github.com/elastic/beats/vendor/github.com/elastic/go-structform/gotype.foldMapInterface()
      /root/go/src/github.com/elastic/beats/vendor/github.com/elastic/go-structform/gotype/fold_map.go:30 +0x74
  github.com/elastic/beats/vendor/github.com/elastic/go-structform/gotype.foldInterfaceValue()
      /root/go/src/github.com/elastic/beats/vendor/github.com/elastic/go-structform/gotype/fold.go:89 +0x56a
  github.com/elastic/beats/vendor/github.com/elastic/go-structform/gotype.foldMapInlineInterface()
      /root/go/src/github.com/elastic/beats/vendor/github.com/elastic/go-structform/gotype/fold_map_inline.generated.go:44 +0x1ad
  github.com/elastic/beats/vendor/github.com/elastic/go-structform/gotype.makeFieldInlineFold.func1()
      /root/go/src/github.com/elastic/beats/vendor/github.com/elastic/go-structform/gotype/fold_reflect.go:292 +0x9d
  github.com/elastic/beats/vendor/github.com/elastic/go-structform/gotype.makeFieldsFold.func1()
      /root/go/src/github.com/elastic/beats/vendor/github.com/elastic/go-structform/gotype/fold_reflect.go:177 +0xbd
  github.com/elastic/beats/vendor/github.com/elastic/go-structform/gotype.makeStructFold.func1()
      /root/go/src/github.com/elastic/beats/vendor/github.com/elastic/go-structform/gotype/fold_reflect.go:167 +0xd8
  github.com/elastic/beats/vendor/github.com/elastic/go-structform/gotype.foldAnyReflect()
      /root/go/src/github.com/elastic/beats/vendor/github.com/elastic/go-structform/gotype/fold_reflect.go:511 +0xe5
  github.com/elastic/beats/vendor/github.com/elastic/go-structform/gotype.foldInterfaceValue()
      /root/go/src/github.com/elastic/beats/vendor/github.com/elastic/go-structform/gotype/fold.go:92 +0x3c7
  github.com/elastic/beats/vendor/github.com/elastic/go-structform/gotype.(*Iterator).Fold()
      /root/go/src/github.com/elastic/beats/vendor/github.com/elastic/go-structform/gotype/fold.go:69 +0x4e
  github.com/elastic/beats/libbeat/outputs/codec/json.(*Encoder).Encode()
      /root/go/src/github.com/elastic/beats/libbeat/outputs/codec/json/json.go:96 +0x2e9
  github.com/elastic/beats/libbeat/outputs/logstash.makeLogstashEventEncoder.func1()
      /root/go/src/github.com/elastic/beats/libbeat/outputs/logstash/enc.go:28 +0x82
  github.com/elastic/beats/vendor/github.com/elastic/go-lumber/client/v2.(*Client).serialize()
      /root/go/src/github.com/elastic/beats/vendor/github.com/elastic/go-lumber/client/v2/client.go:207 +0xda
  github.com/elastic/beats/vendor/github.com/elastic/go-lumber/client/v2.(*Client).Send()
      /root/go/src/github.com/elastic/beats/vendor/github.com/elastic/go-lumber/client/v2/client.go:123 +0x435
  github.com/elastic/beats/vendor/github.com/elastic/go-lumber/client/v2.(*AsyncClient).Send()
      /root/go/src/github.com/elastic/beats/vendor/github.com/elastic/go-lumber/client/v2/async.go:103 +0x8d
  github.com/elastic/beats/libbeat/outputs/logstash.(*asyncClient).sendEvents()
      /root/go/src/github.com/elastic/beats/libbeat/outputs/logstash/async.go:201 +0x20f
  github.com/elastic/beats/libbeat/outputs/logstash.(*asyncClient).Publish()
      /root/go/src/github.com/elastic/beats/libbeat/outputs/logstash/async.go:154 +0x66d
  github.com/elastic/beats/libbeat/outputs.(*backoffClient).Publish()
      /root/go/src/github.com/elastic/beats/libbeat/outputs/backoff.go:60 +0x68
  github.com/elastic/beats/libbeat/publisher/pipeline.(*netClientWorker).run()
      /root/go/src/github.com/elastic/beats/libbeat/publisher/pipeline/output.go:107 +0x2d5

Previous write at 0x00c0005275f0 by goroutine 136:
  runtime.mapassign_faststr()
      /usr/local/go/src/runtime/map_faststr.go:190 +0x0
  github.com/elastic/beats/libbeat/common.MapStr.DeepUpdate()
      /root/go/src/github.com/elastic/beats/libbeat/common/mapstr.go:75 +0x175
  github.com/elastic/beats/libbeat/common.deepUpdateValue()
      /root/go/src/github.com/elastic/beats/libbeat/common/mapstr.go:87 +0xbc
  github.com/elastic/beats/libbeat/common.MapStr.DeepUpdate()
      /root/go/src/github.com/elastic/beats/libbeat/common/mapstr.go:73 +0x21d
  github.com/elastic/beats/libbeat/publisher/pipeline.makeAddFieldsProcessor.func2()
      /root/go/src/github.com/elastic/beats/libbeat/publisher/pipeline/processor.go:272 +0x7b
  github.com/elastic/beats/libbeat/publisher/pipeline.newAnnotateProcessor.func1()
      /root/go/src/github.com/elastic/beats/libbeat/publisher/pipeline/processor.go:191 +0x46
  github.com/elastic/beats/libbeat/publisher/pipeline.(*processorFn).Run()
      /root/go/src/github.com/elastic/beats/libbeat/publisher/pipeline/processor.go:197 +0x59
  github.com/elastic/beats/libbeat/publisher/pipeline.(*program).Run()
      /root/go/src/github.com/elastic/beats/libbeat/publisher/pipeline/processor.go:167 +0x127
  github.com/elastic/beats/libbeat/publisher/pipeline.(*client).publish()
      /root/go/src/github.com/elastic/beats/libbeat/publisher/pipeline/client.go:84 +0x8dc
  github.com/elastic/beats/libbeat/publisher/pipeline.(*client).Publish()
      /root/go/src/github.com/elastic/beats/libbeat/publisher/pipeline/client.go:63 +0xf2
  github.com/elastic/beats/filebeat/channel.(*outlet).OnEvent()
      /root/go/src/github.com/elastic/beats/filebeat/channel/outlet.go:63 +0x241
  github.com/elastic/beats/filebeat/channel.SubOutlet.func1()
      /root/go/src/github.com/elastic/beats/filebeat/channel/util.go:54 +0x82

Goroutine 33 (running) created at:
  github.com/elastic/beats/libbeat/publisher/pipeline.makeClientWorker()
      /root/go/src/github.com/elastic/beats/libbeat/publisher/pipeline/output.go:48 +0x16d
  github.com/elastic/beats/libbeat/publisher/pipeline.(*outputController).Set()
      /root/go/src/github.com/elastic/beats/libbeat/publisher/pipeline/controller.go:102 +0x178
  github.com/elastic/beats/libbeat/publisher/pipeline.New()
      /root/go/src/github.com/elastic/beats/libbeat/publisher/pipeline/pipeline.go:207 +0x8ef
  github.com/elastic/beats/libbeat/publisher/pipeline.Load()
      /root/go/src/github.com/elastic/beats/libbeat/publisher/pipeline/module.go:93 +0x896
  github.com/elastic/beats/libbeat/cmd/instance.(*Beat).createBeater()
      /root/go/src/github.com/elastic/beats/libbeat/cmd/instance/beat.go:291 +0x3c5
  github.com/elastic/beats/libbeat/cmd/instance.(*Beat).launch()
      /root/go/src/github.com/elastic/beats/libbeat/cmd/instance/beat.go:325 +0x1af
  github.com/elastic/beats/libbeat/cmd/instance.Run.func1()
      /root/go/src/github.com/elastic/beats/libbeat/cmd/instance/beat.go:188 +0x73a
  github.com/elastic/beats/libbeat/cmd/instance.Run()
      /root/go/src/github.com/elastic/beats/libbeat/cmd/instance/beat.go:189 +0x84
  github.com/elastic/beats/libbeat/cmd.genRunCmd.func1()
      /root/go/src/github.com/elastic/beats/libbeat/cmd/run.go:36 +0x7b
  github.com/elastic/beats/vendor/github.com/spf13/cobra.(*Command).execute()
      /root/go/src/github.com/elastic/beats/vendor/github.com/spf13/cobra/command.go:704 +0x898
  github.com/elastic/beats/vendor/github.com/spf13/cobra.(*Command).ExecuteC()
      /root/go/src/github.com/elastic/beats/vendor/github.com/spf13/cobra/command.go:785 +0x3bc
  github.com/elastic/beats/vendor/github.com/spf13/cobra.(*Command).Execute()
      /root/go/src/github.com/elastic/beats/vendor/github.com/spf13/cobra/command.go:738 +0x38
  main.main()
      /root/go/src/github.com/elastic/beats/filebeat/main.go:35 +0x4c

Goroutine 136 (running) created at:
  github.com/elastic/beats/filebeat/channel.SubOutlet()
      /root/go/src/github.com/elastic/beats/filebeat/channel/util.go:52 +0x18d
  github.com/elastic/beats/filebeat/input/log.subOutletWrap.func1()
      /root/go/src/github.com/elastic/beats/filebeat/input/log/input.go:627 +0xc1
  github.com/elastic/beats/filebeat/input/log.(*Harvester).Run()
      /root/go/src/github.com/elastic/beats/filebeat/input/log/harvester.go:193 +0x103
  github.com/elastic/beats/filebeat/harvester.(*Registry).Start.func1()
      /root/go/src/github.com/elastic/beats/filebeat/harvester/registry.go:96 +0x80

==================

contents of filebeat.yml

filebeat.inputs:

  - type: docker
    containers:
      path: "/data/docker/containers"
      ids:
        - '*'
    json.message_key: log
    json.keys_under_root: true
    multiline.pattern: '^(00|[0-9]|1[0-9]|2[0-3]):([0-9]|[0-5][0-9]):([0-9]|[0-5][0-9])\s.........[\t]'
    multiline.negate: false
    multiline.match: after
    processors:
      - add_host_metadata:
          netinfo.enabled: true
      - add_cloud_metadata: ~

output.logstash:
  hosts: ["<logstash_host>:5044"]
  timeout: 15

logging.level: debug
logging.selectors: '*'
logging.to_files: true
logging.files:
  path: /var/log/filebeat
  name: filebeat.log
  keepfiles: 7
  permissions: 0644

@urso
Copy link

urso commented Sep 1, 2018

Thank you for the trace.

Which version did you create the last trace with? The Lines between both versions will be different, but I think I found it.

Skimming the trace, the bug seems to be most likely due to add_host_metadata. add_host_metadata adds a dictionary to the host field. The dictionary which is shared (by reference) between a many events. The host.name field (added with beats 6.3) is some global meta-data which overwrites and updates the host document. Beats add global meta-data at the end of any processing. Due to the host document being a shared instance, we see a race condition when it is serialized in an older event, while a new event is actively processed.

Updating Run in the add_host_metadata processor should fix this race condition:

func (p *addHostMetadata) Run(event *beat.Event) (*beat.Event, error) {
	p.loadData()
	event.Fields.DeepUpdate(p.data.Clone())
	return event, nil
}

@ph
Copy link
Contributor Author

ph commented Sep 3, 2018 via email

ph added a commit to ph/beats that referenced this issue Sep 4, 2018
Fix a race between updating the host data structure and the
serialization of events to json.

closes elastic#8040
@ph ph closed this as completed in #8223 Sep 5, 2018
ph added a commit that referenced this issue Sep 5, 2018
Fix a race between updating the host data structure and the
serialization of events to json.

closes #8040
ph added a commit to ph/beats that referenced this issue Sep 5, 2018
Fix a race between updating the host data structure and the
serialization of events to json.

closes elastic#8040

(cherry picked from commit 415715d)
ruflin pushed a commit that referenced this issue Sep 5, 2018
)

Fix a race between updating the host data structure and the
serialization of events to json.

closes #8040

(cherry picked from commit 415715d)
ph added a commit to ph/beats that referenced this issue Sep 5, 2018
Fix a race between updating the host data structure and the
serialization of events to json.

closes elastic#8040

(cherry picked from commit 415715d)
ph added a commit to ph/beats that referenced this issue Sep 5, 2018
Fix a race between updating the host data structure and the
serialization of events to json.

closes elastic#8040

(cherry picked from commit 415715d)
ph added a commit that referenced this issue Sep 5, 2018
…data processor (#8237)

Fix a race between updating the host data structure and the
serialization of events to json.

closes #8040

(cherry picked from commit 415715d)
ph added a commit that referenced this issue Sep 5, 2018
…data processor (#8238)

Cherry-pick of PR #8223 to 6.x branch. Original message: 

Fix a race between updating the host data structure and the
serialization of events to json.

closes #8040
leweafan pushed a commit to leweafan/beats that referenced this issue Apr 28, 2023
…st_metadata processor (elastic#8237)

Fix a race between updating the host data structure and the
serialization of events to json.

closes elastic#8040

(cherry picked from commit b497720)
leweafan pushed a commit to leweafan/beats that referenced this issue Apr 28, 2023
…) (elastic#8225)

Fix a race between updating the host data structure and the
serialization of events to json.

closes elastic#8040

(cherry picked from commit b497720)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants