
Filebeat: multiline: introduce merge by using max-lines as condition instead of pattern #18038

Closed
williamd67 opened this issue Apr 28, 2020 · 13 comments · Fixed by #18352
Assignees
Labels
enhancement Team:Services (Deprecated) Label for the former Integrations-Services team

Comments

@williamd67 (Contributor)

Describe the enhancement:
Occasionally people want to merge messages into a single event based not on a pattern but on the number of lines to merge. This can happen when there is no clear usable pattern, or when one simply wants to reduce the number of events by combining several lines. In some situations it is also handy to combine the lines into a JSON array that other applications can consume.

I propose to introduce an extra multiline parameter, kind, that distinguishes this behavior. All the other parameters remain valid, so in theory you could combine the pattern and max_lines parameters, although in practice I do not expect that.

The values of the kind parameter would be <<empty>> (the default, i.e. the current implementation), merge, and merge-json, where merge-json combines the messages into a JSON array.

Describe a specific use case for the enhancement or feature:
This is useful when you know the number of lines of an event but there is no clear pattern.

For example, someone has dumped a database table with one field per line. In that case you know the number of lines per row (= the number of columns), but creating a pattern for it may be hard. In this situation the configuration could be as follows:

```yaml
multiline.kind: "merge"
multiline.pattern: ".*"
multiline.match: "before"
multiline.negate: false
multiline.max_lines: 13
```

where 13 is the number of columns in a row. This creates a single event per row. If you chose merge-json instead, the lines would be combined into one JSON array.
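A minimal sketch of the proposed behavior (the function name and shape are illustrative, not part of Filebeat):

```python
import json

def merge_lines(lines, max_lines, as_json=False):
    """Group the input into buckets of max_lines lines and merge each
    bucket into a single event: newline-joined text for "merge", or a
    JSON array for "merge-json"."""
    events = []
    for i in range(0, len(lines), max_lines):
        bucket = lines[i:i + max_lines]
        events.append(json.dumps(bucket) if as_json else "\n".join(bucket))
    return events

# A two-column table dump, one field per line:
dump = ["id=1", "name=foo", "id=2", "name=bar"]
merge_lines(dump, 2)                # ['id=1\nname=foo', 'id=2\nname=bar']
merge_lines(dump, 2, as_json=True)  # ['["id=1", "name=foo"]', '["id=2", "name=bar"]']
```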

Another use case is grouping a set of similar events. For example, an application generates a lot of events and you want to put them in buckets of 300 each so that each bucket can be handled as a single event. In that case the configuration could be as follows:

```yaml
multiline.kind: "merge"
multiline.pattern: ".*"
multiline.match: "before"
multiline.negate: false
multiline.max_lines: 300
```

A side effect of the merge and merge-json options is that no lines are discarded.

@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Apr 28, 2020
@andresrc andresrc added [zube]: Inbox Team:Services (Deprecated) Label for the former Integrations-Services team labels Apr 28, 2020
@elasticmachine (Collaborator)

Pinging @elastic/integrations-services (Team:Services)

@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Apr 28, 2020
@andresrc andresrc added enhancement needs_team Indicates that the issue/PR needs a Team:* label labels Apr 28, 2020
@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Apr 28, 2020
@andresrc andresrc changed the title Filebeat: multiline: introduce merge by using max-lines as condition in stead of pattern Filebeat: multiline: introduce merge by using max-lines as condition instead of pattern May 4, 2020
@kvch kvch self-assigned this May 5, 2020
@kvch (Contributor) commented May 5, 2020

I am not sure I completely understand your request. Is merge-json a kind of multiline? If you would like to parse JSON, why not use the decode_json_fields processor in this case?

If you configure merge, do you still need the pattern-based multiline aggregation as well? Or do you just want to read every N lines from a file into a single event?

@williamd67 (Contributor, Author)

I am not sure I completely understand your request. Is merge-json a kind of multiline? If you would like to parse JSON, why not use the decode_json_fields processor in this case?

Thanks for investigating this topic. The merge-json kind is about the output: it combines the found number of lines into a single JSON-array event instead of a single concatenated event. This can be handy when the lines represent individual fields, such as a database table dump. So it does not refer to the format of the input lines.

@williamd67 (Contributor, Author)

If you configure merge, do you still need the pattern-based multiline aggregation as well? Or do you just want to read every N lines from a file into a single event?

In theory you could use the pattern as well, but in practice I would expect it to just read every N lines from a file into a single event, so we could remove or hard-code the other parameters, which would make the usage clearer and simpler.

@kvch (Contributor) commented May 7, 2020

I have opened this PR to add a new mode to multiline reader to aggregate N lines: #18352

With the following configuration you can aggregate 5 lines and parse the JSON:

```yaml
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/*.log
  multiline.type: count
  multiline.lines_count: 5

processors:
  - decode_json_fields:
      fields: ["message"]
      target: ""
      overwrite_keys: true
```

Does this solve your problem?
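For reference, decode_json_fields roughly behaves like the following sketch. This is a simplified approximation, not the Libbeat implementation; the real processor has additional options this sketch ignores.

```python
import json

def decode_json_fields(event, fields, target="", overwrite_keys=False):
    """Parse JSON stored in the given string fields and merge the decoded
    object into the event (at the root when target is the empty string)."""
    for field in fields:
        try:
            decoded = json.loads(event[field])
        except (KeyError, ValueError):
            continue  # field missing or not valid JSON: leave the event untouched
        if not isinstance(decoded, dict):
            continue  # only JSON objects can be merged into the event
        dest = event if target == "" else event.setdefault(target, {})
        for key, value in decoded.items():
            if overwrite_keys or key not in dest:
                dest[key] = value
    return event

event = {"message": '{"user": "alice", "status": 200}'}
decode_json_fields(event, ["message"], overwrite_keys=True)
# event now also carries "user" and "status" at the top level
```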

@williamd67 (Contributor, Author) commented May 11, 2020

Thanks for your PR. I like your improvement of count over merge. I would keep max_lines as it is familiar to people, although count_lines describes the purpose better, so I would be happy with either. The json part is not about the reader but about the writer: it should concatenate the different lines into a single JSON array. I expect that your configuration will not work, as the concatenated lines will not be valid JSON. I will test it as well.
I think the change can be much smaller, and I will create a PR (based on yours) as soon as I have some time.

@kvch (Contributor) commented May 11, 2020

I would keep max_lines as it is familiar to people, although count_lines describes the purpose better, so I would be happy with either.

I introduced a new option because max_lines does not describe the feature exactly: it implies that the number of lines might be smaller than the configured value. count_lines expresses that the number of lines must always be the same.

The json part is not about the reader but about the writer: it should concatenate the different lines into a single JSON array. I expect that your configuration will not work, as the concatenated lines will not be valid JSON.

I am not sure why it does not fit your use case. Could you please share a few example logs so I can understand it?

@williamd67 (Contributor, Author) commented May 11, 2020

I am not sure why it does not fit your use case. Could you please share a few example logs so I can understand it?

I will test it first and in case it fails I will give you some examples.

I created a changeset in my own fork that contains an implementation with fewer changes. My changeset is based on yours, so comparing the two should be straightforward. It reuses the current multiline implementation; if that is not preferred, the implementation in your PR can be used instead. I also fixed the Go and Python tests.

@kvch (Contributor) commented May 12, 2020

Your approach leads to a smaller changeset. However, I do not want to add more complexity to the already pretty complicated pattern-based matcher of the multiline reader. So I would rather go with my own solution. I hope that is fine with you. :)

I am looking forward to seeing the results of your tests.

@williamd67 (Contributor, Author)

I hope that is fine with you. :)

Absolutely. I will test and let you know the results when I have some time.

@williamd67 (Contributor, Author) commented May 13, 2020

Hi, I tested your change and the concatenation of the lines works fine. Thanks.

As expected, the JSON decoder fails. This is because the lines are concatenated with a newline character, and even if you replaced that with a comma I would expect it to fail, as the surrounding brackets {} or [] are missing. Find below some sample input files:

foo.1.log
foo.2.log
foo.3.log

and the filebeat.yml:

filebeat.yml.txt

As the json extension would make the feature harder to understand, and we can do without it by post-processing the events using the newline as a separator, I am fine with only this improvement.
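The failure described above can be reproduced with a short sketch (illustrative only; the attached sample files are not available here):

```python
import json

# Two fields of one "row", emitted as separate log lines:
lines = ['"id": 1', '"name": "foo"']
merged = "\n".join(lines)  # what the count mode produces for one event

# The concatenation is not valid JSON (no surrounding brackets, no comma):
try:
    json.loads(merged)
    valid = True
except ValueError:
    valid = False
# valid is False

# Post-processing downstream: split the event back into fields on the newline.
fields = merged.split("\n")  # ['"id": 1', '"name": "foo"']
```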

@kvch (Contributor) commented May 21, 2020

I added a new option, skip_newline. If you set it to true, the newline character is not added between the concatenated lines.
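The effect of that option, sketched (not Filebeat's code; the option name is taken from the comment above):

```python
def concat(lines, skip_newline=False):
    """Join the aggregated lines; with skip_newline=True the lines are
    concatenated back-to-back instead of newline-separated."""
    sep = "" if skip_newline else "\n"
    return sep.join(lines)

parts = ['{"a": 1,', ' "b": 2}']
concat(parts)                     # '{"a": 1,\n "b": 2}'
concat(parts, skip_newline=True)  # '{"a": 1, "b": 2}' -- parseable JSON again
```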

kvch added a commit that referenced this issue Jun 17, 2020
Add new mode to multiline reader to aggregate constant number of lines (#18352)

## What does this PR do?

This PR adds a new mode to the multiline reader of Libbeat (exposed in Filebeat). The new mode lets users aggregate a configured number of lines into a single event.

Example configuration to aggregate 5 lines:
```yaml
multiline.type: count
multiline.count_lines: 5
```

This PR also adds a new configuration option `skip_newline`. If set, Filebeat does not add a newline when two events are concatenated.

Closes #18038
kvch added a commit to kvch/beats that referenced this issue Jun 17, 2020
Add new mode to multiline reader to aggregate constant number of lines (elastic#18352) (cherry picked from commit e3f51ab)
kvch added a commit that referenced this issue Jun 17, 2020
Add new mode to multiline reader to aggregate constant number of lines (#19243), a backport of #18352 (cherry picked from commit e3f51ab)
@williamd67 (Contributor, Author) commented Sep 30, 2020

Thanks all. I just integrated the Filebeat 7.9.x version (which contains this change) into our system and it works like a charm. Thanks again.

melchiormoulin pushed a commit to melchiormoulin/beats that referenced this issue Oct 14, 2020
Add new mode to multiline reader to aggregate constant number of lines (elastic#18352)