You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
CI on AppVeyor recently run into a race condition during the tests. The race condition seems to be cause by Stopping a harvester before it was fully started, means Wait() is called before all Add(1) are called on the waitgroup. See full log messages below.
Running test_shutdown.py:Test.test_shutdown_wait_ok can lead to the following log output / race condition:
2017/03/20 16:05:50.682951 beat.go:319: INFO Home path: [C:\gopath\src\github.com\elastic\beats\filebeat] Config path: [C:\gopath\src\github.com\elastic\beats\filebeat] Data path: [C:\gopath\src\github.com\elastic\beats\filebeat\data] Logs path: [C:\gopath\src\github.com\elastic\beats\filebeat\logs]
2017/03/20 16:05:50.682951 metrics.go:23: INFO Metrics logging every 30s
2017/03/20 16:05:50.682951 beat.go:344: INFO Beat metadata path: C:\gopath\src\github.com\elastic\beats\filebeat\data\meta.json
2017/03/20 16:05:50.683952 beat.go:326: INFO Beat UUID: a9a026d7-5da0-4904-aee5-dfb1bce68f32
2017/03/20 16:05:50.683952 beat.go:208: INFO Setup Beat: filebeat; Version: 6.0.0-alpha1
2017/03/20 16:05:50.683952 processor.go:44: DBG Processors:
2017/03/20 16:05:50.683952 beat.go:214: DBG Initializing output plugins
2017/03/20 16:05:50.683952 file.go:54: INFO File output path set to: C:\gopath\src\github.com\elastic\beats\filebeat\build\system-tests\run\test_shutdown.Test.test_shutdown_wait_ok/output
2017/03/20 16:05:50.683952 file.go:55: INFO File output base filename set to: filebeat
2017/03/20 16:05:50.683952 file.go:58: INFO Rotate every bytes set to: 1024000
2017/03/20 16:05:50.683952 file.go:62: INFO Number of files set to: 7
2017/03/20 16:05:50.683952 outputs.go:117: INFO Activated file as output plugin.
2017/03/20 16:05:50.683952 publish.go:236: DBG Create output worker
2017/03/20 16:05:50.684952 publish.go:278: DBG No output is defined to store the topology. The server fields might not be filled.
2017/03/20 16:05:50.684952 publish.go:293: INFO Publisher name: APPVYR-WIN
2017/03/20 16:05:50.686952 async.go:63: INFO Flush Interval set to: -1s
2017/03/20 16:05:50.686952 async.go:64: INFO Max Bulk Size set to: -1
2017/03/20 16:05:50.687952 beat.go:251: INFO filebeat start running.
2017/03/20 16:05:50.687952 service_windows.go:51: DBG Windows is interactive: true
2017/03/20 16:05:50.687952 registrar.go:72: INFO No registry file found under: C:\gopath\src\github.com\elastic\beats\filebeat\build\system-tests\run\test_shutdown.Test.test_shutdown_wait_ok/registry. Creating a new registry file.
2017/03/20 16:05:50.687952 registrar.go:346: DBG Write registry file: C:\gopath\src\github.com\elastic\beats\filebeat\build\system-tests\run\test_shutdown.Test.test_shutdown_wait_ok/registry
2017/03/20 16:05:50.741954 file_windows.go:51: DBG delete old: remove C:\gopath\src\github.com\elastic\beats\filebeat\build\system-tests\run\test_shutdown.Test.test_shutdown_wait_ok/registry.old: The system cannot find the file specified.
2017/03/20 16:05:50.742952 file_windows.go:57: DBG rotate to old: rename C:\gopath\src\github.com\elastic\beats\filebeat\build\system-tests\run\test_shutdown.Test.test_shutdown_wait_ok/registry C:\gopath\src\github.com\elastic\beats\filebeat\build\system-tests\run\test_shutdown.Test.test_shutdown_wait_ok/registry.old: The system cannot find the file specified.
2017/03/20 16:05:50.742952 registrar.go:373: DBG Registry file updated. 0 states written.
2017/03/20 16:05:50.742952 registrar.go:120: INFO Loading registrar data from C:\gopath\src\github.com\elastic\beats\filebeat\build\system-tests\run\test_shutdown.Test.test_shutdown_wait_ok/registry
2017/03/20 16:05:50.742952 registrar.go:140: INFO States Loaded from registrar: 0
2017/03/20 16:05:50.742952 registrar.go:271: INFO Starting Registrar
2017/03/20 16:05:50.742952 sync.go:46: INFO Start sending events to output
2017/03/20 16:05:50.742952 spooler.go:68: INFO Starting spooler: spool_size: 2048; idle_timeout: 100ms
2017/03/20 16:05:50.742952 crawler.go:42: INFO Loading Prospectors: 1
2017/03/20 16:05:50.743951 prospector.go:90: DBG File Configs: [C:\gopath\src\github.com\elastic\beats\filebeat\build\system-tests\run\test_shutdown.Test.test_shutdown_wait_ok/log/*]
2017/03/20 16:05:50.743951 prospector_log.go:46: DBG exclude_files: []
2017/03/20 16:05:50.743951 prospector_log.go:71: INFO Prospector with previous states loaded: 0
2017/03/20 16:05:50.743951 prospector.go:142: INFO Starting prospector of type: log; id: 3861952635306202170
2017/03/20 16:05:50.744952 crawler.go:67: INFO Loading and starting Prospectors completed. Enabled prospectors: 1
2017/03/20 16:05:50.744952 prospector_log.go:77: DBG Start next scan
2017/03/20 16:05:50.744952 prospector_log.go:267: DBG Check file for harvesting: C:\gopath\src\github.com\elastic\beats\filebeat\build\system-tests\run\test_shutdown.Test.test_shutdown_wait_ok\log\nasa-50k.log
2017/03/20 16:05:50.843080 spooler.go:103: DBG Flushing spooler because of timeout. Events flushed: 0
2017/03/20 16:05:50.943941 spooler.go:103: DBG Flushing spooler because of timeout. Events flushed: 0
2017/03/20 16:05:50.945950 service.go:33: DBG Received sigterm/sigint, stopping
2017/03/20 16:05:50.945950 service.go:39: DBG Received svc stop/shutdown request
2017/03/20 16:05:50.945950 filebeat.go:251: INFO Stopping filebeat
2017/03/20 16:05:50.945950 prospector.go:157: INFO Prospector channel stopped
2017/03/20 16:05:50.945950 crawler.go:109: INFO Stopping Crawler
2017/03/20 16:05:50.945950 crawler.go:122: INFO Stopping 1 prospectors
2017/03/20 16:05:50.956946 prospector_log.go:287: DBG Start harvester for new file: C:\gopath\src\github.com\elastic\beats\filebeat\build\system-tests\run\test_shutdown.Test.test_shutdown_wait_ok\log\nasa-50k.log
2017/03/20 16:05:50.957921 state.go:72: DBG New state added for C:\gopath\src\github.com\elastic\beats\filebeat\build\system-tests\run\test_shutdown.Test.test_shutdown_wait_ok\log\nasa-50k.log
2017/03/20 16:05:50.957921 log.go:341: DBG Setting offset for file based on seek: C:\gopath\src\github.com\elastic\beats\filebeat\build\system-tests\run\test_shutdown.Test.test_shutdown_wait_ok\log\nasa-50k.log
2017/03/20 16:05:50.957921 log.go:325: DBG Setting offset for file: C:\gopath\src\github.com\elastic\beats\filebeat\build\system-tests\run\test_shutdown.Test.test_shutdown_wait_ok\log\nasa-50k.log. Offset: 0
2017/03/20 16:05:50.957921 prospector_log.go:99: DBG Prospector states cleaned up. Before: 1, After: 1
2017/03/20 16:05:50.957921 prospector.go:208: INFO Prospector ticker stopped
2017/03/20 16:05:50.957921 prospector.go:276: INFO Stopping Prospector: 3861952635306202170
2017/03/20 16:05:50.957921 log.go:109: INFO Harvester started for file: C:\gopath\src\github.com\elastic\beats\filebeat\build\system-tests\run\test_shutdown.Test.test_shutdown_wait_ok\log\nasa-50k.log
==================
WARNING: DATA RACE
Write at 0x00c0421b57cc by goroutine 40:
internal/race.Write()
C:/Users/appveyor/.gvm/versions/go1.7.4.windows.amd64/src/internal/race/race.go:41 +0x3f
sync.(*WaitGroup).Wait()
C:/Users/appveyor/.gvm/versions/go1.7.4.windows.amd64/src/sync/waitgroup.go:129 +0x158
github.com/elastic/beats/filebeat/harvester.(*Harvester).Stop()
github.com/elastic/beats/filebeat/harvester/_obj/log.go:200 +0x81
github.com/elastic/beats/filebeat/prospector.(*harvesterRegistry).Stop.func1()
github.com/elastic/beats/filebeat/prospector/_obj/registry.go:49 +0x6b
Previous read at 0x00c0421b57cc by goroutine 38:
internal/race.Read()
C:/Users/appveyor/.gvm/versions/go1.7.4.windows.amd64/src/internal/race/race.go:37 +0x3f
sync.(*WaitGroup).Add()
C:/Users/appveyor/.gvm/versions/go1.7.4.windows.amd64/src/sync/waitgroup.go:71 +0x299
github.com/elastic/beats/filebeat/harvester.(*Harvester).Harvest()
github.com/elastic/beats/filebeat/harvester/_obj/log.go:68 +0x10a
github.com/elastic/beats/filebeat/prospector.(*harvesterRegistry).start.func1()
github.com/elastic/beats/filebeat/prospector/_obj/registry.go:76 +0xb7
Goroutine 40 (running) created at:
github.com/elastic/beats/filebeat/prospector.(*harvesterRegistry).Stop()
github.com/elastic/beats/filebeat/prospector/_obj/registry.go:50 +0x17c
github.com/elastic/beats/filebeat/prospector.(*Prospector).stop()
github.com/elastic/beats/filebeat/prospector/_obj/prospector.go:286 +0x1e3
github.com/elastic/beats/filebeat/prospector.(*Prospector).Start.func2.1()
github.com/elastic/beats/filebeat/prospector/_obj/prospector.go:182 +0x81
github.com/elastic/beats/filebeat/prospector.(*Prospector).Start.func2()
github.com/elastic/beats/filebeat/prospector/_obj/prospector.go:187 +0x9a
Goroutine 38 (running) created at:
2017/03/20 16:05:50.958921 log.go:366: DBG Stopping harvester for file: C:\gopath\src\github.com\elastic\beats\filebeat\build\system-tests\run\test_shutdown.Test.test_shutdown_wait_ok\log\nasa-50k.log
github.com/elastic/beats/filebeat/prospector.(*harvesterRegistry).start()
github.com/elastic/beats/filebeat/prospector/_obj/registry.go:77 +0xc2
github.com/elastic/beats/filebeat/prospector.(*Prospector).startHarvester()
github.com/elastic/beats/filebeat/prospector/_obj/prospector.go:376 +0x5d4
github.com/elastic/beats/filebeat/prospector.(*ProspectorLog).scan()
github.com/elastic/beats/filebeat/prospector/_obj/prospector_log.go:288 +0x8cd
github.com/elastic/beats/filebeat/prospector.(*ProspectorLog).Run()
github.com/elastic/beats/filebeat/prospector/_obj/prospector_log.go:93 +0x124
github.com/elastic/beats/filebeat/prospector.(*Prospector).Run()
github.com/elastic/beats/filebeat/prospector/_obj/prospector.go:195 +0x9e
github.com/elastic/beats/filebeat/prospector.(*Prospector).Start.func2()
github.com/elastic/beats/filebeat/prospector/_obj/prospector.go:186 +0x99
==================
2017/03/20 16:05:50.958921 log.go:373: DBG Closing file: C:\gopath\src\github.com\elastic\beats\filebeat\build\system-tests\run\test_shutdown.Test.test_shutdown_wait_ok\log\nasa-50k.log
2017/03/20 16:05:50.958921 log.go:217: DBG Update state: C:\gopath\src\github.com\elastic\beats\filebeat\build\system-tests\run\test_shutdown.Test.test_shutdown_wait_ok\log\nasa-50k.log, offset: 88
2017/03/20 16:05:50.958921 crawler.go:138: INFO Crawler stopped
2017/03/20 16:05:50.958921 filebeat.go:235: INFO Shutdown output timer started. Waiting for max 10s.
2017/03/20 16:05:51.044903 spooler.go:103: DBG Flushing spooler because of timeout. Events flushed: 1
2017/03/20 16:05:51.044903 client.go:158: DBG No events to publish
2017/03/20 16:05:51.044903 sync.go:82: DBG Events sent: 1
2017/03/20 16:05:51.044903 registrar.go:320: DBG Processing 1 events
2017/03/20 16:05:51.044903 state.go:72: DBG New state added for C:\gopath\src\github.com\elastic\beats\filebeat\build\system-tests\run\test_shutdown.Test.test_shutdown_wait_ok\log\nasa-50k.log
2017/03/20 16:05:51.044903 registrar.go:302: DBG Registrar states cleaned up. Before: 1, After: 1
2017/03/20 16:05:51.044903 registrar.go:346: DBG Write registry file: C:\gopath\src\github.com\elastic\beats\filebeat\build\system-tests\run\test_shutdown.Test.test_shutdown_wait_ok/registry
2017/03/20 16:05:51.066776 file_windows.go:51: DBG delete old: remove C:\gopath\src\github.com\elastic\beats\filebeat\build\system-tests\run\test_shutdown.Test.test_shutdown_wait_ok/registry.old: The system cannot find the file specified.
2017/03/20 16:05:51.067798 registrar.go:373: DBG Registry file updated. 1 states written.
2017/03/20 16:05:51.067798 signalwait.go:94: INFO Continue shutdown: All enqueued events being published.
2017/03/20 16:05:51.067798 spooler.go:116: INFO Stopping spooler
2017/03/20 16:05:51.067798 spooler.go:121: DBG Spooler has stopped
2017/03/20 16:05:51.067798 sync.go:54: DBG Shutting down sync publisher
2017/03/20 16:05:51.067798 registrar.go:338: INFO Stopping Registrar
2017/03/20 16:05:51.067798 registrar.go:287: INFO Ending Registrar
2017/03/20 16:05:51.067798 registrar.go:346: DBG Write registry file: C:\gopath\src\github.com\elastic\beats\filebeat\build\system-tests\run\test_shutdown.Test.test_shutdown_wait_ok/registry
2017/03/20 16:05:51.068774 registrar.go:373: DBG Registry file updated. 1 states written.
2017/03/20 16:05:51.068774 metrics.go:51: INFO Total non-zero values: beat.memstats.gc_next=4194304 beat.memstats.memory_alloc=1326664 beat.memstats.memory_total=1326664 filebeat.harvester.closed=1 filebeat.harvester.started=1 publish.events=1 registrar.states.current=1 registrar.states.update=1 registrar.writes=3
2017/03/20 16:05:51.068774 metrics.go:52: INFO Uptime: 397.8223ms
2017/03/20 16:05:51.068774 beat.go:255: INFO filebeat stopped.
PASS
coverage: 42.2% of statements in ./...
Found 1 data race(s)
The text was updated successfully, but these errors were encountered:
There is still a case where a race condition could happen (see TODO). Registry must be improved to also handle this case.
This should not be backported as the issue can only happen on shutdown in very rare cases and should not have any side effect.
Closeselastic#3779
It was possible that with reloading enabled that during the shutdown of filebeat, a new harvester was started. This is now prevent by having a lock on the starting of the harvester so no new harvesters can be started when shutdown started.
This is not backported to 5.x because it can only happen on shutdown and does not have any side affects.
Closeselastic#3779
It was possible that with reloading enabled that during the shutdown of filebeat, a new harvester was started. This is now prevent by having a lock on the starting of the harvester so no new harvesters can be started when shutdown started.
This is not backported to 5.x because it can only happen on shutdown and does not have any side affects.
Closes#3779
CI on AppVeyor recently run into a race condition during the tests. The race condition seems to be cause by Stopping a harvester before it was fully started, means
Wait()
is called before allAdd(1)
are called on the waitgroup. See full log messages below.Running
test_shutdown.py:Test.test_shutdown_wait_ok
can lead to the following log output / race condition:The text was updated successfully, but these errors were encountered: