Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libbeat: Stop autodiscover runners in a goroutine #7170

Merged
merged 1 commit into from
May 28, 2018

Conversation

jsoriano
Copy link
Member

Stopping runners can be a blocking operation, what can block the
autodiscover event loop. Run it in a goroutine so events can continue
being handled.

@jsoriano jsoriano added discuss Issue needs further discussion. libbeat labels May 24, 2018
@jsoriano jsoriano requested review from urso and exekias May 24, 2018 11:18
Copy link
Contributor

@exekias exekias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, it will need a changelog, as I think this is a bugfix

@exekias
Copy link
Contributor

exekias commented May 24, 2018

@urso @ruflin I find this interesting, do you have any insight about why module stopping may block for a while?

@exekias
Copy link
Contributor

exekias commented May 24, 2018

@jsoriano we should backport this to, at least, 6.3

@jsoriano jsoriano added needs_backport PR is waiting to be backported to other branches. bug labels May 24, 2018
@jsoriano
Copy link
Member Author

jsoriano commented May 24, 2018

In case it gives any clue, runner.Start() doesn't seem to block, and when it is blocked on Stop() it also blocks graceful stop of the process with Ctrl-C.

@@ -178,7 +178,7 @@ func (a *Autodiscover) handleStop(event bus.Event) {

if runner := a.runners.Get(hash); runner != nil {
logp.Info("Autodiscover stopping runner: %s", runner)
runner.Stop()
go runner.Stop()
a.runners.Remove(hash)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the hash should only be removed if stopping is finished I would think

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We actually want to remove it here, if a new event arrives to start the same module we want that to happen, even if the previous instance is still stopping

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, so adding and removing is synchronous. To make it more clear in the code we should add a comment here and move the stopping command after the removal of the hash.

Any concerns about leaking go routines here if Stop() gets stuck?

@ruflin
Copy link
Contributor

ruflin commented May 24, 2018

Could this cause some race conditions?

@exekias exekias added the v6.3.0 label May 24, 2018
@jsoriano
Copy link
Member Author

@ruflin it is possible, yes, not sure what happens if a runner is stopped twice.

@exekias
Copy link
Contributor

exekias commented May 25, 2018

I don't think it's possible, events are processed by autodiscover one bye one, it acts as a state machine. This should ensure that the event is removed from the list by the moment you process the next event. For instance 2 stop events (this can happen) would go like this:

  • First event calls runner.Stop() as a goroutine, stop beggins. The runner is removed from the list.
  • Second event goes to the list and see no runner for the given config, so it does nothing; even if the runner is still stopping.

@jsoriano jsoriano force-pushed the autodiscover-async-stop branch from 10a77c1 to 26ef20f Compare May 25, 2018 10:57
@jsoriano
Copy link
Member Author

Added changelog.

@ruflin
Copy link
Contributor

ruflin commented May 25, 2018

Change LGTM but I think we should document what we discussed here in the code for the next person touching this code.

Stopping runners can be a blocking operation, what can block the
autodiscover event loop. Run it in a goroutine so events can continue
being handled.
@jsoriano jsoriano force-pushed the autodiscover-async-stop branch from 26ef20f to e5050e7 Compare May 25, 2018 16:04
@jsoriano
Copy link
Member Author

@ruflin ok, I have added a comment explaining the decissions there.

@ruflin
Copy link
Contributor

ruflin commented May 28, 2018

jenkins, test this

@ruflin ruflin merged commit 500ab7c into elastic:master May 28, 2018
@jsoriano jsoriano added v6.3.0 and removed needs_backport PR is waiting to be backported to other branches. labels May 28, 2018
jsoriano added a commit to jsoriano/beats that referenced this pull request May 28, 2018
Stopping runners can be a blocking operation, what can block the
autodiscover event loop. Run it in a goroutine so events can continue
being handled.

(cherry picked from commit 500ab7c)
exekias pushed a commit that referenced this pull request May 29, 2018
Stopping runners can be a blocking operation, what can block the
autodiscover event loop. Run it in a goroutine so events can continue
being handled.

(cherry picked from commit 500ab7c)
leweafan pushed a commit to leweafan/beats that referenced this pull request Apr 28, 2023
…stic#7186)

Stopping runners can be a blocking operation, what can block the
autodiscover event loop. Run it in a goroutine so events can continue
being handled.

(cherry picked from commit eb58c3a)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug discuss Issue needs further discussion. libbeat review v6.3.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants