
Update libbeat publisher pipeline #4492

Merged: 3 commits merged into elastic:master on Jun 25, 2017

Conversation

@urso urso commented Jun 12, 2017

This PR marks the beginning of the libbeat event publisher pipeline refactoring.

  • central to the publisher pipeline is the broker:
    • broker implementation can be configured when constructing the pipeline
    • common broker implementation tests in brokertest package
  • broker features:
    • Fully in control of all published events. In comparison to the old publisher pipeline with many batches in flight, the broker now configures/controls the total number of events stored in the publisher pipeline. Only after ACKs from outputs will new space become available.
    • broker returns ACKs in correct order to the publisher
    • broker batches up multiple ACKs
    • producer can only send one event at a time to the broker (push)
    • consumer can only receive batches of events from broker (pull)
    • producer can cancel(remove) active events not yet pulled by a consumer
  • broker/output related interfaces defined in publisher package
  • pipeline/client interfaces for use by beats currently defined in publisher/beat package
    • event structure has been changed to be more compatible with Logstash (see beat.Event): Beats can send metadata to libbeat outputs (e.g. pipeline settings) and Logstash by using the Event.Meta field. Event fields are stored on Event.Fields and are normalized (for use with processors) before being serialized.
  • The old publisher's Publish API is moved to libbeat/publisher/bc/publisher for now:
    • moved to a new sub-package to fend off circular imports
    • package implements old pipeline API on top of new pipeline
  • Filters/Processors are still executed before pushing events to the new pipeline
  • New API:
    • beats client requirements are configured via beat.ClientConfig:
      • register async ACK callbacks (currently callbacks will not be triggered after Client.Close)
      • configurable sending guarantees (must match ACK support)
      • "wait on close", for beats clients to wait for pending events to be ACKed (only if ACK is configured)
    • pipeline also supports "wait on close", waiting for pending events (independent of ACK configuration). Can be used by any beat to wait on shutdown for published events to actually be sent

Event Structure:

The event structure has been changed a little:

type Event struct {
	Timestamp time.Time
	Meta      common.MapStr
	Fields    common.MapStr
}
  • A timestamp is always required.
  • Meta contains additional metadata (hints) a beat can forward to the outputs, for example pipeline or index settings for the Elasticsearch output.
  • If the output is not Elasticsearch, a @metadata field will always be written to the JSON document. This way Logstash can take advantage of @metadata even if the event has been sent via Kafka or Redis.

Output changes

The new output plugin factory is defined as:

type Factory func(beat common.BeatInfo, cfg *common.Config) (Group, error)

The package libbeat/output/mode is being removed and all its functionality is moved into a single implementation in the publisher pipeline, supporting sync/async clients with failover and load-balancing. In the future, dynamic output discovery might be added as well. This change requires output.Group to return some common settings for an active output, to configure the pipeline:

// Group configures and combines multiple clients into load-balanced group of clients
// being managed by the publisher pipeline.
type Group struct {
	Clients   []Client
	BatchSize int
	Retry     int
}

Moving functionality from the outputs to the publisher pipeline restricts beats to having only one output type configured.

All client instances configured will participate in load-balancing being driven by the publisher pipeline. This removes some intermediate workers used for forwarding batches. Future changes to groups include:

  • load-balance to N of M client nodes only + use left-over nodes for fail-over
  • overlapping fail-over: send the event to another client (after a very short timeout) while waiting for the ACK from the first client -> ACK/cancel the batch once one output did ACK the event (might lead to duplicates, but improves throughput if one output is stalled).
  • provide functionality in groups to add/remove clients
  • dynamic output update/reloading by instantiating a new group and dynamically switching over to new output group.
  • move encoding to earlier phase in the pipeline, to limit events/memory usage and batch sizes by number of events and/or byte size

Outputs always operate on batches and implement only the Publish method:

	// Publish sends events to the clients sink. A client must synchronously or
	// asynchronously ACK the given batch, once all events have been processed.
	// Using Retry/Cancelled a client can return a batch of unprocessed events to
	// the publisher pipeline. The publisher pipeline (if configured by the output
	// factory) will take care of retrying/dropping events.
	Publish(publisher.Batch) error

With:

// Batch is used to pass a batch of events to the outputs and to asynchronously
// listen for signals from these outputs. After a batch is processed (completed
// or errored), one of the signal methods must be called.
type Batch interface {
	Events() []Event

	// signals
	ACK()
	Drop()
	Retry()
	RetryEvents(events []Event)
	Cancelled()
	CancelledEvents(events []Event)
}

The batch interface combines events + signaling into one common interface.
The main difference between sync and async clients is when batch.ACK is called. Batches/events can be processed out of order. The publisher pipeline, which does the batching and load-balancing, guarantees that ACKs are returned to the beat in order and implements an upper bound on in-flight events. Once the publisher pipeline is 'full', it blocks, waiting for ACKs from outputs.

The logic for dropping events on retry and guaranteed sending is moved to the publisher pipeline as well. Outputs are concerned with publishing and signaling ACK or Retry only.

Upcoming

  • improve unit tests in new pipeline
  • integrate pipeline stress test tool with tests
  • update beats configuration (remove queue_size and bulk_queue_size in favor of configurable broker)
  • update processors to operate on beat.Event instead of common.MapStr
  • make @metadata accessible from processors?
  • update all beats to new publisher pipeline:
    1. make connect-to-new-pipeline available from bc/publisher package
    2. update beats one by one to new pipeline
    3. remove bc/publisher package
  • move the outputs codec as a setting to outputs.Group, so codecs can be applied earlier in pipeline (=> limit events per batch/queue by memory usage)

@urso urso added the in progress Pull request is currently in progress. label Jun 12, 2017
@urso urso force-pushed the enh/publisher-pipeline branch from aa3995b to a6868b0 on June 12, 2017 19:51
@urso urso mentioned this pull request Jun 15, 2017
"github.com/elastic/beats/libbeat/publisher/beat"
)

// Event describes the event strucutre for events
Member:

s/strucutre/structure/

@urso urso force-pushed the enh/publisher-pipeline branch 3 times, most recently from 88789e6 to cfe657d on June 19, 2017 20:18
@@ -63,9 +63,10 @@ def test_shutdown_wait_ok(self):
assert len(registry) == 1
assert registry[0]["offset"] == output["offset"]

@unittest.skip("Skipping unreliable test")
Author:

I had to skip the test, as it has been very unreliable and I haven't managed to reproduce the failure for investigation yet. Also, this feature as-is will be replaced by the wait-on-close shutdown support provided directly by the new publisher pipeline.

Contributor:

can you perhaps open a follow up github issue to track these things?

Author:

It's the only test I had to skip. But I will create a meta-ticket for further pipeline work.

@urso urso added refactoring review and removed in progress Pull request is currently in progress. labels Jun 21, 2017
@urso urso changed the title [WIP] update libbeat publisher pipeline Update libbeat publisher pipeline Jun 21, 2017
@@ -0,0 +1 @@
package beat
Contributor:

Empty file, is it needed?

Author:

Some left-over. Will remove it.

@urso urso commented Jun 21, 2017

For Reviewer notes:

  • the libbeat/publisher package has been reviewed in the past. Only the logging has been adapted to use logp
  • new publisher pipeline is not exposed to beats yet. libbeat/publisher/bc/publisher implements a compatibility layer using the old API on top of new pipeline
  • have a look for recent ES/LS changes potentially missing. I had many merge conflicts, and some auto-merges potentially broke things (I had to re-implement multiple onConnect callbacks in the ES output)
  • libbeat/outputs/codec has been changed
  • new event type libbeat/publisher/event.go
  • updated /libbeat/common/fmtstr package to support new event type
  • small adjustments to logp package
  • system tests have been adapted to ignore the @metadata field in events (these fields are exposed by all outputs except Elasticsearch, but not for 'public' use)

@tsg tsg commented Jun 21, 2017

@urso I hit an error when using an output config section like this:

output.elasticsearch:
  enabled: false
  hosts: ["localhost:9200"]
output.logstash:
  hosts: ["localhost:5044"]

The error is:

metricbeat2017/06/21 22:35:31.167563 beat.go:632: CRIT Exiting: error unpacking config data: more then one namespace configured accessing 'output' (source:'metricbeat.dev.yml')

Expected?

@urso urso commented Jun 22, 2017

@tsg this is to be expected. From the notes:

Moving functionality from the outputs to the publisher pipeline restricts beats to having only one output type configured.

With this PR one can have only one output configured. The error message is generated by the beats config using Output common.ConfigNamespace, which allows for at most one setting.

The fix for using enabled: false is in PR #4339

@ruflin ruflin left a comment:

I skimmed through and left some remarks. In general LGTM and we can move forward on this.

@@ -63,9 +63,10 @@ def test_shutdown_wait_ok(self):
assert len(registry) == 1
assert registry[0]["offset"] == output["offset"]

@unittest.skip("Skipping unreliable test")
Contributor:

can you perhaps open a follow up github issue to track these things?

@@ -0,0 +1,31 @@
package logp
Contributor:

why did you touch the logger in this PR?

Author:

Some parts of the pipeline allow a logger to be passed by interface/type. This allows the more 'expensive' tests to capture the log output in the test context, correctly grouping log messages with test output.

Author:

The old internal Logger type is renamed to logger. We now export a *Logger type with configurable selector.


// backoff parameters
Contributor:

I assume I will find all the defaults below somewhere else :-)

Author:

This is the monitoring reporter, not the actual Elasticsearch event output.
The backoff parameters have become configurable, as one has to (or will have to) configure the backoff strategy when creating the outputs.Group. This is due to this PR shifting some responsibilities from the outputs to the publisher pipeline itself (e.g. error handling, backoff, retry). Some of this shifting will be necessary when introducing dynamic output reloading, as the pipeline must transfer active events from the old to the new outputs.

if found != "" {
err := fmt.Errorf("multiple potential monitoring reporters found (for example %v and %v)", found, name)
return "", nil, err
if outputs.IsSet() {
Contributor:

Nice. I see the possibility coming of not having an output initially ;-)

Author:

No promises ;)

}

// WithBackoff wraps a NetworkClient, adding exponential backoff support to a network client if connection/publishing failed.
func WithBackoff(client NetworkClient, init, max time.Duration) NetworkClient {
Contributor:

Reminds me of the filebeat readers implementation ;-)

Author:

Yeah, it's kind of a wrapper for NetworkClient. This will be moved closer to the libbeat pipeline itself. The backoff strategy and its parameters will become config parameters in outputs.Group.


// BulkPublish implements the BulkOutputer interface pushing a bulk of events
// via lumberjack.
func (lj *logstash) BulkPublish(
Contributor:

Did all this stuff get replaced by new publisher?

Author:

Yes. I modified the pipeline and output interfaces to be of one kind only: outputs always get batches, and on the pipeline's input side it's always one event being pushed at a time. This simplifies/removes some extra logic in the outputs and the pipeline itself for dealing with bulk vs. non-bulk event handling.

func makeEvent(fields common.MapStr, meta common.MapStr) beat.Event {
var ts time.Time
switch value := fields["@timestamp"].(type) {
case time.Time:
Contributor:

seems we have this code in different places

Author:

Different places? The bc client adapts the event before pushing it to the new pipeline. Maybe you mean in some unit tests?

return len(p), nil
}

func withLogOutput(fn func(*testing.T)) func(*testing.T) {
Contributor:

own logger?

Author:

Originally I had my own logger interface, to correctly capture log output within a test context (so log output and test names make more sense if something goes wrong). But when adapting the pipeline to libbeat I opted for using logp. This forced the introduction of withLogOutput, which captures stderr into t.Log, so I can still make sense of error logs mixed with pipeline processing logs.

package includes

import (
// load supported output plugins
Contributor:

we could generate this in the future ... but as we are not going to add outputs ...

@@ -239,7 +239,9 @@ def read_output_json(self, output_file=None):
# hit EOF
break

jsons.append(json.loads(line))
event = json.loads(line)
del event['@metadata']
Contributor:

is @metadata in the output event?

Author:

?
I'm removing @metadata from events. All outputs except Elasticsearch print the events in JSON format, as expected by Logstash, including @metadata. So our beats->LS->ES configs are valid even if Kafka or Redis is used. With @metadata being somewhat private, we never documented it. That is, I have to remove the fields, as a number of system tests check that all fields in an event are documented.

@urso urso force-pushed the enh/publisher-pipeline branch from 3da60a6 to 7e6cbc8 on June 22, 2017 16:27
@tsg tsg commented Jun 22, 2017

@urso test_invalid_config_cli_param is failing on both travis and jenkins.

@urso urso commented Jun 23, 2017

@tsg looking into it. Interestingly, the test is completely unrelated to any changes in this PR and passes for me. I restarted the travis job.

@urso urso force-pushed the enh/publisher-pipeline branch from 7e6cbc8 to 80f5369 on June 23, 2017 10:11
@urso urso commented Jun 23, 2017

@tsg fixed the test, let's wait for travis.

@urso urso force-pushed the enh/publisher-pipeline branch from fa87690 to 76176d7 on June 23, 2017 17:47
urso added 3 commits June 24, 2017 00:25
Write the packetbeat log output to the test output in case the packetbeat exit code does not match the expected exit code
@urso urso force-pushed the enh/publisher-pipeline branch from 79d35f0 to f9482d7 on June 23, 2017 22:26
@tsg tsg merged commit eefdbd6 into elastic:master Jun 25, 2017
@urso urso mentioned this pull request Jul 3, 2017
@urso urso deleted the enh/publisher-pipeline branch February 19, 2019 18:27
faec added a commit that referenced this pull request May 31, 2024
"Producer cancel" is a feature that allows closing queue producers to also cancel any pending events created by that producer that have not yet been sent to a queue reader. It was introduced as a small part of a [very large refactor](#4492) in 2017, but current code doesn't depend on it for anything. Since this feature adds considerable complexity to the queue API and implementation, this PR removes the feature and associated helpers.

This PR should cause no user-visible behavior change.