
Filebeat: multiline: introduce merge by using max-lines as condition instead of pattern #18038

Closed
williamd67 opened this issue Apr 28, 2020 · 13 comments · Fixed by #18352
Assignees
Labels
enhancement Team:Services (Deprecated) Label for the former Integrations-Services team

Comments

@williamd67 (Contributor)

Describe the enhancement:
Occasionally people want to merge messages into a single event based not on a pattern but on the number of lines to merge. This can happen when there is no clear usable pattern, or when one simply wants to reduce the number of events by combining several lines. In some situations it is also handy to combine the lines into a JSON array that other applications can consume.

I propose to introduce an extra multiline parameter, kind, that distinguishes this behavior. All the other parameters remain valid, so in theory you could combine the pattern and max_lines parameters, although in practice I do not expect that.

The values of the kind parameter would be <<empty>> (the default, i.e. the current implementation), merge, and merge-json, where merge-json combines the messages into a JSON array.

Describe a specific use case for the enhancement or feature:
This is useful when you know the number of lines of an event but there is no clear pattern.

For example, someone has dumped a database table with one field per line. In that case you know the number of lines per row (= the number of columns), but creating a pattern for it may be hard. In this situation the configuration could be as follows:

```yaml
multiline.kind: "merge"
multiline.pattern: ".*"
multiline.match: "before"
multiline.negate: false
multiline.max_lines: 13
```

where 13 is the number of columns in a row. This creates a single event per row. If you chose merge-json instead, the lines would be combined into one JSON array.
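A minimal sketch of the proposed behavior (the function name and shape are illustrative, not part of Filebeat):

```python
import json

def merge_lines(lines, max_lines, as_json=False):
    """Group the input into buckets of max_lines lines and merge each
    bucket into a single event: newline-joined text for "merge", or a
    JSON array for "merge-json"."""
    events = []
    for i in range(0, len(lines), max_lines):
        bucket = lines[i:i + max_lines]
        events.append(json.dumps(bucket) if as_json else "\n".join(bucket))
    return events

# A two-column table dump, one field per line:
dump = ["id=1", "name=foo", "id=2", "name=bar"]
merge_lines(dump, 2)                # ['id=1\nname=foo', 'id=2\nname=bar']
merge_lines(dump, 2, as_json=True)  # ['["id=1", "name=foo"]', '["id=2", "name=bar"]']
```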

Another use case is grouping a set of similar events. For example, an application generates a lot of events and you want to put them in buckets of 300 each so that each bucket can be handled as a single event. In that case the configuration could be as follows:

```yaml
multiline.kind: "merge"
multiline.pattern: ".*"
multiline.match: "before"
multiline.negate: false
multiline.max_lines: 300
```

A side effect of the merge and merge-json options is that no lines are discarded.

@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Apr 28, 2020
@andresrc andresrc added [zube]: Inbox Team:Services (Deprecated) Label for the former Integrations-Services team labels Apr 28, 2020
@elasticmachine (Collaborator)

Pinging @elastic/integrations-services (Team:Services)

@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Apr 28, 2020
@andresrc andresrc added enhancement needs_team Indicates that the issue/PR needs a Team:* label labels Apr 28, 2020
@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Apr 28, 2020
@andresrc andresrc changed the title Filebeat: multiline: introduce merge by using max-lines as condition in stead of pattern Filebeat: multiline: introduce merge by using max-lines as condition instead of pattern May 4, 2020
@kvch kvch self-assigned this May 5, 2020
@kvch (Contributor) commented May 5, 2020

I am not sure I completely understand your request. Is merge-json a kind of multiline? If you would like to parse JSON, why not use the decode_json_fields processor in this case?

If you configure merge, do you still need the pattern-based multiline aggregation as well? Or do you just want to read every N lines from a file into a single event?

@williamd67 (Contributor, Author)

I am not sure I completely understand your request. Is merge-json a kind of multiline? If you would like to parse JSON, why not use the decode_json_fields processor in this case?

Thanks for investigating this topic. The merge-json kind is about the output: it combines the found number of lines into a single JSON-array event instead of a single concatenated event. This can be handy when the lines represent individual fields, such as a database table dump. So it does not refer to the format of the input lines.

@williamd67 (Contributor, Author)

If you configure merge, do you still need the pattern-based multiline aggregation as well? Or do you just want to read every N lines from a file into a single event?

In theory you could use the pattern as well, but in practice I would expect it to just read every N lines from a file into a single event, so we could remove or hard-code the other parameters, which would make the usage clearer and simpler.

@kvch (Contributor) commented May 7, 2020

I have opened this PR to add a new mode to multiline reader to aggregate N lines: #18352

With the following configuration you can aggregate 5 lines and parse the JSON:

```yaml
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/*.log
  multiline.type: count
  multiline.lines_count: 5

processors:
  - decode_json_fields:
      fields: ["message"]
      target: ""
      overwrite_keys: true
```

Does this solve your problem?
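For reference, decode_json_fields roughly behaves like the following sketch. This is a simplified approximation, not the Libbeat implementation; the real processor has additional options this sketch ignores.

```python
import json

def decode_json_fields(event, fields, target="", overwrite_keys=False):
    """Parse JSON stored in the given string fields and merge the decoded
    object into the event (at the root when target is the empty string)."""
    for field in fields:
        try:
            decoded = json.loads(event[field])
        except (KeyError, ValueError):
            continue  # field missing or not valid JSON: leave the event untouched
        if not isinstance(decoded, dict):
            continue  # only JSON objects can be merged into the event
        dest = event if target == "" else event.setdefault(target, {})
        for key, value in decoded.items():
            if overwrite_keys or key not in dest:
                dest[key] = value
    return event

event = {"message": '{"user": "alice", "status": 200}'}
decode_json_fields(event, ["message"], overwrite_keys=True)
# event now also carries "user" and "status" at the top level
```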

@williamd67 (Contributor, Author) commented May 11, 2020

Thanks for your PR. I like your improvement of count over merge. I would keep max_lines as it is familiar to people, although count_lines describes the purpose better, so I would be happy with either. The json part is not about the reader but about the writer: it should concatenate the different lines into a single JSON array. I expect that your configuration will not work, as the concatenated lines will not be valid JSON. I will test it as well.
I think the change can be much smaller, and I will create a PR (based on yours) as soon as I have some time.

@kvch (Contributor) commented May 11, 2020

I would keep max_lines as it is familiar to people, although count_lines describes the purpose better, so I would be happy with either.

I introduced a new option because max_lines does not describe the feature exactly: it implies that the number of lines might be smaller than the configured value. count_lines expresses that the number of lines must always be the same.

The json part is not about the reader but about the writer: it should concatenate the different lines into a single JSON array. I expect that your configuration will not work, as the concatenated lines will not be valid JSON.

I am not sure why it does not fit your use case. Could you please share a few example logs so I can understand it?

@williamd67 (Contributor, Author) commented May 11, 2020

I am not sure why it does not fit your use case. Could you please share a few example logs so I can understand it?

I will test it first and in case it fails I will give you some examples.

I created a changeset in my own fork that contains an implementation with fewer changes. My changeset is based on yours, so comparing the two should be straightforward. It reuses the current multiline implementation; if that is not preferred, the implementation in your PR can be used instead. I also fixed the Go and Python tests.

@kvch (Contributor) commented May 12, 2020

Your approach leads to a smaller changeset. However, I do not want to add more complexity to the already pretty complicated pattern-based matcher of the multiline reader. So I would rather go with my own solution. I hope that is fine with you. :)

I am looking forward to seeing the results of your tests.

@williamd67 (Contributor, Author)

I hope that is fine with you. :)

Absolutely. I will test and let you know the results when I have some time.

@williamd67 (Contributor, Author) commented May 13, 2020

Hi, I tested your change and the concatenation of the lines works fine. Thanks.

As expected, the JSON decoder fails. This is because the lines are concatenated with a newline character, and even if you replaced that with a comma I would expect it to fail, as the surrounding brackets {} or [] are missing. Find below some sample input files:

foo.1.log
foo.2.log
foo.3.log

and the filebeat.yml:

filebeat.yml.txt

As the json extension would make the feature harder to understand, and we can do without it by post-processing the events using the newline as a separator, I am fine with only this improvement.
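The failure described above can be reproduced with a short sketch (illustrative only; the attached sample files are not available here):

```python
import json

# Two fields of one "row", emitted as separate log lines:
lines = ['"id": 1', '"name": "foo"']
merged = "\n".join(lines)  # what the count mode produces for one event

# The concatenation is not valid JSON (no surrounding brackets, no comma):
try:
    json.loads(merged)
    valid = True
except ValueError:
    valid = False
# valid is False

# Post-processing downstream: split the event back into fields on the newline.
fields = merged.split("\n")  # ['"id": 1', '"name": "foo"']
```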

@kvch (Contributor) commented May 21, 2020

I added a new option, skip_newline. If you set it to true, the newline character is not added between the concatenated lines.
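The effect of that option, sketched (not Filebeat's code; the option name is taken from the comment above):

```python
def concat(lines, skip_newline=False):
    """Join the aggregated lines; with skip_newline=True the lines are
    concatenated back-to-back instead of newline-separated."""
    sep = "" if skip_newline else "\n"
    return sep.join(lines)

parts = ['{"a": 1,', ' "b": 2}']
concat(parts)                     # '{"a": 1,\n "b": 2}'
concat(parts, skip_newline=True)  # '{"a": 1, "b": 2}' -- parseable JSON again
```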

kvch added a commit that referenced this issue Jun 17, 2020
Add new mode to multiline reader to aggregate constant number of lines (#18352)

## What does this PR do?

This PR adds a new mode to the multiline reader of Libbeat (exposed in Filebeat). The new mode lets users aggregate a configured number of lines into a single event.

Example configuration to aggregate 5 lines:
```yaml
multiline.type: count
multiline.count_lines: 5
```

This PR also adds a new configuration option `skip_newline`. If set, Filebeat does not add a newline when two events are concatenated.

Closes #18038
kvch added a commit to kvch/beats that referenced this issue Jun 17, 2020
Add new mode to multiline reader to aggregate constant number of lines (elastic#18352) (cherry picked from commit e3f51ab)
kvch added a commit that referenced this issue Jun 17, 2020
Add new mode to multiline reader to aggregate constant number of lines (#19243), a backport of #18352 (cherry picked from commit e3f51ab)
@williamd67 (Contributor, Author) commented Sep 30, 2020

Thanks all. I just integrated the Filebeat 7.9.x version (which contains this change) into our system and it works like a charm. Thanks again.

melchiormoulin pushed a commit to melchiormoulin/beats that referenced this issue Oct 14, 2020
Add new mode to multiline reader to aggregate constant number of lines (elastic#18352)