Stopping / tracking harvester implementation #964

ruflin · 2016-02-11T12:18:37Z

The goal of this PR is to properly track and shut down harvesters.

ruflin · 2016-02-25T01:52:25Z

@urso This implementation currently has the problem that if the output is not responding, PublishEvent blocks and filebeat does not stop. Any recommendation on how to handle this case?

ruflin · 2016-03-14T15:36:43Z

This should also get #1148 to green.

andrewkroh · 2016-03-14T16:41:45Z

filebeat/crawler/prospector.go


 	logp.Info("Starting prospector of type: %v", p.ProspectorConfig.Harvester.InputType)

+	defer func() {


Just use defer wg.Done(). There's a TODO statement a few lines up that can be removed now that this is deferred.
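For illustration, a minimal sketch of the suggested pattern. The names `runProspector` and `runAll` are hypothetical stand-ins, not the actual filebeat code; the point is only that `defer wg.Done()` runs on every return path without a wrapping closure.

```go
package main

import (
	"fmt"
	"sync"
)

// runProspector is a hypothetical stand-in for a prospector's Run method.
func runProspector(id int, wg *sync.WaitGroup, results chan<- int) {
	// Deferring Done directly decrements the counter on every return path,
	// so no wrapping "defer func() { ... }()" is needed.
	defer wg.Done()
	results <- id
}

// runAll starts n prospectors, waits for all of them to return,
// and reports how many results were collected.
func runAll(n int) int {
	var wg sync.WaitGroup
	results := make(chan int, n) // buffered so no sender blocks
	for i := 0; i < n; i++ {
		wg.Add(1)
		go runProspector(i, &wg, results)
	}
	wg.Wait()
	close(results)

	count := 0
	for range results {
		count++
	}
	return count
}

func main() {
	fmt.Println("prospectors finished:", runAll(3))
}
```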

urso · 2016-03-18T17:32:28Z

filebeat/crawler/prospector.go

+		go h.Stop()
+	}
+	logp.Debug("prospector", "Waiting for %d harvesters to stop", len(p.harvesters))
+	p.harvestersWaitGroup.Wait()


how is p.done used? the prospector loop starting new harvesters must be stopped before shutting down harvesters

Good point. close(p.done) should be moved up before the stopping and it seems like each prospector needs its own waitgroup or channel to notify when all Run methods are stopped.
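A rough sketch of that ordering, under assumptions: the `prospector` and `harvester` types below are simplified, hypothetical versions, not the real filebeat structs. The scan loop gets its own waitgroup (`runWG`), `done` is closed first, and harvesters are stopped only after the loop has returned, so no new harvester can be started during shutdown.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// harvester is a hypothetical, simplified harvester that runs until stopped.
type harvester struct{ done chan struct{} }

func (h *harvester) run()  { <-h.done }
func (h *harvester) stop() { close(h.done) }

// prospector is a hypothetical, simplified prospector.
type prospector struct {
	done       chan struct{}
	runWG      sync.WaitGroup // tracks the scan loop
	harvWG     sync.WaitGroup // tracks running harvesters
	harvesters []*harvester   // only touched by run, then by stop after run exits
}

func newProspector() *prospector {
	return &prospector{done: make(chan struct{})}
}

// run starts a new harvester on every tick until done is closed.
func (p *prospector) run() {
	defer p.runWG.Done()
	ticker := time.NewTicker(time.Millisecond)
	defer ticker.Stop()
	for {
		select {
		case <-p.done:
			return
		case <-ticker.C:
			h := &harvester{done: make(chan struct{})}
			p.harvesters = append(p.harvesters, h)
			p.harvWG.Add(1)
			go func() {
				defer p.harvWG.Done()
				h.run()
			}()
		}
	}
}

func (p *prospector) start() {
	p.runWG.Add(1)
	go p.run()
}

// stop shuts down in order: scan loop first, then harvesters.
// It returns the number of harvesters that were stopped.
func (p *prospector) stop() int {
	close(p.done)  // 1. tell the scan loop to exit
	p.runWG.Wait() // 2. after this, no new harvesters can appear
	for _, h := range p.harvesters {
		h.stop() // 3. stop every harvester
	}
	p.harvWG.Wait() // 4. wait until all harvesters have returned
	return len(p.harvesters)
}

// demoProspector runs a prospector briefly and shuts it down cleanly.
func demoProspector() int {
	p := newProspector()
	p.start()
	time.Sleep(20 * time.Millisecond)
	return p.stop()
}

func main() {
	fmt.Println("harvesters stopped:", demoProspector())
}
```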

urso · 2016-03-18T18:11:01Z

shutdown still seems to be somewhat incomplete (subject to hanging).

once a harvester is running, the event flows like this (each step running in its own goroutine, communicating via channels):

harvester -> spooler -> filebeat publisher -> libbeat publisher
                                           -> registrar

on shutdown ensure the prospector loop looking for new files to be harvested is shutdown before stopping the harvesters.

after stopping the harvesters, stop the spooler and next the filebeat publisher. the registrar must be stopped last. Not sure if filebeat is stopping its own publisher worker yet.

when stopping the spooler make the publisher break the queue at line 148.

The publisher in filebeat has a 'Stop' method, but not sure if it's called. Stop on publisher must be called in between spooler and registrar shutdown.
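The shutdown order described above could be sketched roughly as follows. The `stage` type and `startStage` helper are hypothetical and only model "a goroutine that runs until stopped"; the real components have different APIs.

```go
package main

import (
	"fmt"
	"sync"
)

// stage is a hypothetical pipeline component running in its own goroutine.
type stage struct {
	name string
	done chan struct{}
	wg   sync.WaitGroup
}

// startStage launches a goroutine that runs until Stop is called.
func startStage(name string) *stage {
	s := &stage{name: name, done: make(chan struct{})}
	s.wg.Add(1)
	go func() {
		defer s.wg.Done()
		<-s.done // a real stage would process events here until told to stop
	}()
	return s
}

// Stop signals the stage and blocks until its goroutine has returned.
func (s *stage) Stop() {
	close(s.done)
	s.wg.Wait()
}

// shutdown stops the stages strictly in the given order and
// returns the names in the order they were stopped.
func shutdown(stages []*stage) []string {
	var order []string
	for _, s := range stages {
		s.Stop()
		order = append(order, s.name)
	}
	return order
}

// demoShutdown builds the pipeline from the comment above and stops it
// upstream-first, with the registrar last.
func demoShutdown() []string {
	names := []string{"prospector", "harvesters", "spooler", "publisher", "registrar"}
	stages := make([]*stage, 0, len(names))
	for _, n := range names {
		stages = append(stages, startStage(n))
	}
	return shutdown(stages)
}

func main() {
	fmt.Println(demoShutdown())
}
```

Because each `Stop` waits for its goroutine before the next stage is touched, a downstream stage is never stopped while an upstream one is still running.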

Unfortunately the publisher itself might hang on shutdown in PublishEvents to libbeat if logstash/elasticsearch is unresponsive. To solve this, I'm planning to add a Close method to libbeat publisher clients, which will break PublishEvent/PublishEvents returning an error. This method will be called by the publisher's Stop method in order to break out of unresponsive PublishEvents. Close being required by all beats will be done in another PR though.
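A sketch of how such a Close method might behave. This is not the libbeat API: the `client` type, its fields, and `errClientClosed` are assumptions made for illustration. A blocked `PublishEvent` returns an error as soon as `Close` is called.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errClientClosed is a hypothetical error returned once the client is closed.
var errClientClosed = errors.New("client closed")

// client is a hypothetical publisher client, not the libbeat type.
type client struct {
	out       chan string   // unbuffered; models an output nobody is reading from
	done      chan struct{} // closed by Close
	closeOnce sync.Once
}

func newClient() *client {
	return &client{out: make(chan string), done: make(chan struct{})}
}

// PublishEvent blocks until the event is accepted or the client is closed.
func (c *client) PublishEvent(ev string) error {
	select {
	case c.out <- ev:
		return nil
	case <-c.done:
		return errClientClosed
	}
}

// Close unblocks all pending and future PublishEvent calls.
// sync.Once makes repeated Close calls safe.
func (c *client) Close() {
	c.closeOnce.Do(func() { close(c.done) })
}

// demoClose publishes to an unresponsive output, then closes the client
// and returns the error that PublishEvent reported.
func demoClose() error {
	c := newClient()
	result := make(chan error, 1)
	go func() { result <- c.PublishEvent("log line") }()
	c.Close()
	return <-result
}

func main() {
	fmt.Println("PublishEvent returned:", demoClose())
}
```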

urso · 2016-03-18T18:17:00Z

The publisher worker is initialized and started at line 96, but never killed.

ruflin · 2016-04-13T11:39:52Z

@urso Most comments applied, except for

when stopping the spooler make the publisher break the queue at line 148.

I'm not sure how to break the queue here, as in case it already hangs on line 148, the only thing I know of to stop it is closing the channel which leads to a panic? I'm sure you had a better way in mind :-)
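One possible way to avoid the panic, sketched under assumptions (the actual code at line 148 is not shown here, and `publishLoop` is a hypothetical name): leave the data channel open and select on a separate done channel, so the loop can be broken without anyone closing the queue.

```go
package main

import "fmt"

// publishLoop is a hypothetical publisher loop. It reads batches from queue
// until done is closed and returns the number of events it handled.
// The queue channel itself is never closed, so senders cannot panic.
func publishLoop(queue <-chan []string, done <-chan struct{}) int {
	handled := 0
	for {
		select {
		case <-done:
			return handled
		case batch := <-queue:
			handled += len(batch)
		}
	}
}

// demoPublishLoop sends one batch, then stops the loop via done.
func demoPublishLoop() int {
	queue := make(chan []string) // unbuffered: send returns once the loop has the batch
	done := make(chan struct{})
	result := make(chan int)

	go func() { result <- publishLoop(queue, done) }()

	queue <- []string{"event-1", "event-2"}
	close(done) // break the loop without touching queue
	return <-result
}

func main() {
	fmt.Println("events handled before stop:", demoPublishLoop())
}
```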

tsg · 2016-04-18T12:39:20Z

Waiting for #1402.

ruflin · 2016-05-12T12:44:10Z

Closing as this one got replaced by #1604

ruflin mentioned this pull request Feb 12, 2016

Filebeats not closing "old" file handles on Windows #922

Closed

ruflin force-pushed the harvester-next-step branch 3 times, most recently from ca6e3ed to cd68248 Compare February 25, 2016 01:41

ruflin mentioned this pull request Feb 28, 2016

Filebeat collect binary file with custom codec. #1055

Closed

monicasarbu added the "in progress" (Pull request is currently in progress.) and "Filebeat" labels Mar 1, 2016

ruflin mentioned this pull request Mar 14, 2016

Prospector and Harvester Cleanup #1144

Merged

andrewkroh reviewed Mar 14, 2016

ruflin force-pushed the harvester-next-step branch from 14ce2a9 to 2b746c5 Compare March 15, 2016 08:26

ruflin added a commit to ruflin/beats that referenced this pull request Mar 18, 2016

Prospector and Harvester Cleanup

7e39032

This PR simplifies elastic#964 by already applying the non related parts. * Remove unnecessary for loop in readline

urso reviewed Mar 18, 2016

ruflin force-pushed the harvester-next-step branch from 2b746c5 to e181f96 Compare April 13, 2016 11:29

ruflin changed the title ~~First draft of stopping / tracking harvester implementation~~ Stopping / tracking harvester implementation Apr 13, 2016

ruflin mentioned this pull request Apr 18, 2016

Add test case to reproduce panic on Filebeat shutdown #1148

Closed

ruflin force-pushed the harvester-next-step branch from 69ff414 to e89777a Compare April 19, 2016 12:02

andrewkroh added a commit to andrewkroh/beats that referenced this pull request Apr 19, 2016

Add test case to reproduce panic on Filebeat shutdown

811bf08

The test is skipped until elastic#964 is addressed.

andrewkroh added a commit to andrewkroh/beats that referenced this pull request Apr 22, 2016

Add test case to reproduce panic on Filebeat shutdown

f2d6e29

The test is skipped until elastic#964 is addressed.

andrewkroh mentioned this pull request Apr 22, 2016

Add test case to reproduce panic on Filebeat shutdown #1454

Merged

ruflin pushed a commit that referenced this pull request Apr 22, 2016

Add test case to reproduce panic on Filebeat shutdown (#1454)

dcdd94c

The test is skipped until #964 is addressed.

ruflin mentioned this pull request Apr 22, 2016

Filebeat refactoring and cleanup #1423

Merged

ruflin added 3 commits April 25, 2016 13:28

First draft of stopping / tracking harvester implementation

78c4ee0

* Add clean prospector shutdown * Add harvester waitgroup to prospectors * Refactor harvester run function * Stop publisher

proper shutdown working, but race conditions ...

f15f3f0

Try getting race conditions under control

fe8b9dc

ruflin force-pushed the harvester-next-step branch from e89777a to fe8b9dc Compare April 25, 2016 12:24

ruflin mentioned this pull request Apr 27, 2016

filebeat: Flaky file rotation test #1083

Closed

ruflin closed this May 12, 2016

ruflin deleted the harvester-next-step branch May 12, 2016 12:44
