Reduce dependencies in Crawler #17653

urso · 2020-04-09T21:00:43Z

Refactoring

What does this PR do?

The crawler creates active inputs for static configuration, starts
config file reloading, and starts the module loader.
With this change the crawler no longer depends directly on input.Input
(or at least depends on it much less). Instead it uses the Runner
interface, even for statically configured inputs.
This also reduces dependencies, as most plumbing is already done by the
inputs.RunnerFactory and no longer needs to be duplicated by the crawler.
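
As a rough illustration of the pattern described above, here is a minimal, self-contained sketch. The `Runner` and `RunnerFactory` types below are simplified local stand-ins and not the real libbeat interfaces (the real `Create` takes more arguments); `logRunner`, `logFactory` and `startStatic` are hypothetical names invented for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// Runner is a simplified stand-in for the Runner interface (assumption).
type Runner interface {
	Start()
	Stop()
}

// RunnerFactory is a simplified stand-in for the factory interface (assumption).
type RunnerFactory interface {
	Create(config map[string]string) (Runner, error)
}

// logRunner is a hypothetical runner that only tracks whether it is running.
type logRunner struct {
	path    string
	running bool
}

func (r *logRunner) Start() { r.running = true }
func (r *logRunner) Stop()  { r.running = false }

// logFactory is a hypothetical factory that validates the config and builds a runner.
type logFactory struct{}

func (logFactory) Create(cfg map[string]string) (Runner, error) {
	path, ok := cfg["path"]
	if !ok {
		return nil, errors.New("missing path")
	}
	return &logRunner{path: path}, nil
}

// startStatic builds one runner per static config through the factory and
// starts them only after every config has been accepted. The caller never
// sees the concrete input type.
func startStatic(f RunnerFactory, configs []map[string]string) ([]Runner, error) {
	runners := make([]Runner, 0, len(configs))
	for _, cfg := range configs {
		r, err := f.Create(cfg)
		if err != nil {
			return nil, fmt.Errorf("error while initializing input: %w", err)
		}
		runners = append(runners, r)
	}
	for _, r := range runners {
		r.Start()
	}
	return runners, nil
}

func main() {
	configs := []map[string]string{{"path": "/var/log/a.log"}}
	runners, err := startStatic(logFactory{}, configs)
	fmt.Println(len(runners), err)
}
```

Because the static and the dynamic path would both go through the same factory in this sketch, swapping in another factory implementation only requires a different value for `f`.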

The input.Runner used to compute an 'ID' by hashing the input's
configuration. The ID was public, but only the crawler used it.
Instead of having the input compute the ID, the crawler now
computes the input ID itself.
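
To make the idea of a configuration-derived ID concrete, here is a small sketch. It is an assumption-laden illustration, not the hashing used in this PR: it hashes a flat `map[string]string` with FNV-1a from the Go standard library, and `configID` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// configID derives a deterministic 64-bit ID from a flat config map.
// Keys are sorted first so that map iteration order does not affect the ID.
func configID(cfg map[string]string) uint64 {
	keys := make([]string, 0, len(cfg))
	for k := range cfg {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	h := fnv.New64a()
	for _, k := range keys {
		// Length prefixes keep ("ab", "c") distinct from ("a", "bc").
		fmt.Fprintf(h, "%d:%s=%d:%s;", len(k), k, len(cfg[k]), cfg[k])
	}
	return h.Sum64()
}

func main() {
	a := map[string]string{"type": "log", "path": "/var/log/a.log"}
	b := map[string]string{"path": "/var/log/a.log", "type": "log"}
	fmt.Println(configID(a) == configID(b))
}
```

Two configurations with the same keys and values get the same ID, which is what allows the caller to treat the ID as a stand-in for "the same input".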

While reducing dependencies, I replaced logp.Info calls with an internal logp.Logger instance for logging purposes.

Note: the cfgfile RunnerList maintains its own set of IDs. The crawler
and RunnerList each used to use the ID to check for 'duplicate'
configurations, but because the IDs are not shared, duplicate
detection does not work across the whole Beat.
ID detection is actively used only by input config file reloading and
autodiscovery, in order to check whether an input still needs to be
running or should be shut down.
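
The reload check described in the note can be sketched as a set difference over IDs. This is a simplified illustration under assumptions: the function name `diffIDs` and the plain `uint64` IDs are invented here, and the real RunnerList logic may differ.

```go
package main

import (
	"fmt"
	"sort"
)

// diffIDs compares the IDs of currently running inputs with the IDs found in
// a freshly loaded configuration. It returns the IDs that need to be started
// and the IDs that should be shut down. Inputs present in both keep running.
func diffIDs(running map[uint64]bool, desired []uint64) (start, stop []uint64) {
	want := make(map[uint64]bool, len(desired))
	for _, id := range desired {
		if want[id] {
			continue // duplicate configuration within the same reload
		}
		want[id] = true
		if !running[id] {
			start = append(start, id)
		}
	}
	for id := range running {
		if !want[id] {
			stop = append(stop, id)
		}
	}
	// Map iteration order is random, so sort for a stable result.
	sort.Slice(stop, func(i, j int) bool { return stop[i] < stop[j] })
	return start, stop
}

func main() {
	running := map[uint64]bool{1: true, 2: true}
	start, stop := diffIDs(running, []uint64{2, 3, 3})
	fmt.Println(start, stop)
}
```

Note that the duplicate check only sees the IDs handed to this one function, which mirrors the limitation mentioned above: two components that each keep their own ID set cannot detect duplicates between them.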

Why is it important?

Reduce dependencies and responsibilities in the crawler, while centering input configuration and execution around the RunnerFactory and Runner interfaces only. This change makes it easier to integrate with alternative RunnerFactory implementations in the future.

Checklist

- [x] My code follows the style guidelines of this project
~~- [ ] I have commented my code, particularly in hard-to-understand areas~~
~~- [ ] I have made corresponding changes to the documentation~~
~~- [ ] I have made corresponding change to the default configuration files~~
~~- [ ] I have added tests that prove my fix is effective or that my feature works~~
~~- [ ] I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.~~

elasticmachine · 2020-04-09T21:00:45Z

Pinging @elastic/integrations-services (Team:Services)

urso · 2020-04-09T21:03:51Z

filebeat/beater/crawler.go

+		return fmt.Errorf("Error while initializing input: %+v", err)
+	}
+	if inputRunner, ok := runner.(*input.Runner); ok {
+		inputRunner.Once = c.once


input.New returns a partially initialized type. The once flag must only be set for static inputs, but the factory is used for both static and dynamic inputs. As I expect the crawler to go away in the future, I wanted to keep the scope small and limited to what I need for the integration of the new input API, so I am keeping this small piece of tech debt here for now.
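
For readers unfamiliar with the pattern in the diff above, the following is a minimal sketch of setting a field through a type assertion after a generic factory has returned an interface value. All types here are local stand-ins: `inputRunner` only imitates the role of `*input.Runner`, and `markOnce` is a hypothetical helper.

```go
package main

import "fmt"

// Runner is a simplified stand-in for the Runner interface (assumption).
type Runner interface {
	Start()
}

// inputRunner imitates a concrete runner that carries a Once flag.
type inputRunner struct {
	Once bool
}

func (r *inputRunner) Start() {}

// otherRunner is a runner type without a Once flag.
type otherRunner struct{}

func (otherRunner) Start() {}

// markOnce sets the Once flag only when the runner has the expected concrete
// type. It reports whether the flag could be applied.
func markOnce(r Runner, once bool) bool {
	ir, ok := r.(*inputRunner)
	if ok {
		ir.Once = once
	}
	return ok
}

func main() {
	r := &inputRunner{}
	fmt.Println(markOnce(r, true), r.Once)
	fmt.Println(markOnce(otherRunner{}, true))
}
```

The assertion keeps the factory itself unaware of the static-only flag, at the cost of the caller knowing about one concrete type, which is the piece of tech debt the comment refers to.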

* Reduce dependencies in Crawler
* fix import formatting in crawler
* Update log message test looks for

(cherry picked from commit 52fa265)

urso added the review, refactoring, Filebeat, and Team:Services (Deprecated) labels Apr 9, 2020

urso requested a review from kvch April 9, 2020 21:00

urso commented Apr 9, 2020

urso mentioned this pull request Apr 9, 2020

Start to split filebeat/channel up #17655

Merged

3 tasks

urso added 2 commits April 10, 2020 15:57

fix import formatting in crawler

eb22b81

Update log message test looks for

72061ab

andresrc added [zube]: Inbox [zube]: In Review and removed [zube]: Inbox labels Apr 13, 2020

kvch approved these changes Apr 14, 2020

urso added the needs_backport label Apr 15, 2020

urso merged commit 52fa265 into elastic:master Apr 15, 2020

urso deleted the crawler-use-runner-factory branch April 15, 2020 13:26

zube bot added [zube]: Done and removed [zube]: In Review labels Apr 15, 2020

urso mentioned this pull request Apr 15, 2020

Cherry-pick #17653 to 7.x: Reduce dependencies in Crawler #17727

Merged

1 task

urso added v7.8.0 and removed needs_backport labels Apr 15, 2020

urso pushed a commit that referenced this pull request Apr 23, 2020

Cherry-pick #17653 to 7.x: Reduce dependencies in Crawler (#17727)

7731ffa

andresrc removed the [zube]: Done label Apr 27, 2020

urso mentioned this pull request Jul 3, 2020

Cherry-pick #17655 to 7.x: Start to split filebeat/channel up #19654

Merged

3 tasks
