Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stopping / tracking harvester implementation #964

Closed
wants to merge 3 commits into from

Conversation

ruflin
Copy link
Contributor

@ruflin ruflin commented Feb 11, 2016

The goal of this PR is properly track and shut down harvesters

@ruflin ruflin force-pushed the harvester-next-step branch 3 times, most recently from ca6e3ed to cd68248 Compare February 25, 2016 01:41
@ruflin
Copy link
Contributor Author

ruflin commented Feb 25, 2016

@urso This implementation has currently the problem that if the output is not responding, PublishEvent is blocking and filebeat does not stop. Any recommendation on how to handle this case?

@monicasarbu monicasarbu added in progress Pull request is currently in progress. Filebeat Filebeat labels Mar 1, 2016
@ruflin
Copy link
Contributor Author

ruflin commented Mar 14, 2016

This should also get #1148 to green.


logp.Info("Starting prospector of type: %v", p.ProspectorConfig.Harvester.InputType)

defer func() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just use defer wg.Done(). There's a TODO statement a few lines up that can be removed now that this is deferred.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@ruflin ruflin force-pushed the harvester-next-step branch from 14ce2a9 to 2b746c5 Compare March 15, 2016 08:26
ruflin added a commit to ruflin/beats that referenced this pull request Mar 18, 2016
This PR simplifies elastic#964 by already applying the non related parts.

* Remove unnecessary for loop in readline
go h.Stop()
}
logp.Debug("prospector", "Waiting for %d harvesters to stop", len(p.harvesters))
p.harvestersWaitGroup.Wait()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how is p.done used? the prospector loop starting new harvesters must be stopped before shutting down harvesters

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. close(p.done) should be moved up before the stopping and it seems like each prospector needs its own waitgroup or channel to notify when all Run methods are stopped.

@urso
Copy link

urso commented Mar 18, 2016

shutdown still seems to be somewhat incomplete (subject to hanging).

once harvester is running, the event flows like this (each step running in it's own go-routine communicating via channels):

harvester -> spooler -> filebeat publisher -> libbeat publisher
                                           -> registrar

on shutdown ensure the prospector loop looking for new files to be harvested is shutdown before stopping the harvesters.

after stopping harvester, stop the spooler and next the filebeat publisher. the registrar must be stopped last. Not sure if filebeat is stopping it's own publisher worker yet.

when stopping the spooler make the publisher break the queue at line 148.

The publisher in filebeat has a 'Stop' method, but not sure if it's called. Stop on publisher must be called in between spooler and registrar shutdown.

Unfortunately the publisher itself might hang on shutdown in PublishEvents to libbeat if logstash/elasticsearch is unresponsive. To solve this, I'm planning to add a Close method to libbeat publisher clients, which will break PublishEvent/PublishEvents returning an error. This method will be called by the publisher's Stop method in order to break out of unresponsive PublishEvents. Close being required by all beats will be done in another PR though.

@urso
Copy link

urso commented Mar 18, 2016

The publisher worker is initialized and started at line 96, but never killed.

@ruflin ruflin force-pushed the harvester-next-step branch from 2b746c5 to e181f96 Compare April 13, 2016 11:29
@ruflin
Copy link
Contributor Author

ruflin commented Apr 13, 2016

@urso Most comments applied, except for

when stopping the spooler make the publisher break the queue at line 148.

I'm not sure how to break the queue here, as in case it already hangs on line 148, the only thing I know of to stop it is closing the channel which leads to a panic? I'm sure you had a better way in mind :-)

@ruflin ruflin changed the title First draft of stopping / tracking harvester implementation Stopping / tracking harvester implementation Apr 13, 2016
@tsg
Copy link
Contributor

tsg commented Apr 18, 2016

Waiting for #1402.

andrewkroh added a commit to andrewkroh/beats that referenced this pull request Apr 19, 2016
andrewkroh added a commit to andrewkroh/beats that referenced this pull request Apr 22, 2016
ruflin pushed a commit that referenced this pull request Apr 22, 2016
ruflin added 3 commits April 25, 2016 13:28
* Add clean prospector shutdown
* Add harvester waitgroup to prospectors
* Refactor harvester run function
* Stop publisher
@ruflin
Copy link
Contributor Author

ruflin commented May 12, 2016

Closing as this one got replaced by #1604

@ruflin ruflin closed this May 12, 2016
@ruflin ruflin deleted the harvester-next-step branch May 12, 2016 12:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Filebeat Filebeat in progress Pull request is currently in progress.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants