[Filebeat] Support multiline options in aws s3 input #25249

Closed
andrewkroh opened this issue Apr 22, 2021 · 8 comments · Fixed by #25710 or #25873

Comments

@andrewkroh
Member

Describe the enhancement:

When reading log files from S3 users should be able to specify the same multiline options that are available with the log input.

Describe a specific use case for the enhancement or feature:

Reading XML based Windows event logs from S3 that are newline delimited, but the XML itself contains strings with newlines. So in order to get one full XML object we need the multiline reader options.

Ideally config like this would work:

- type: aws-s3
  queue_url: https://sqs.us-east-1.amazonaws.com/foo/queue
  credential_profile_name: beats
  multiline.pattern: ^\<Event\>
  multiline.negate: true
  multiline.match: after

@botelastic botelastic bot added the needs_team label (indicates that the issue/PR needs a Team:* label) Apr 22, 2021
@exekias
Contributor

exekias commented Apr 26, 2021

@kvch @urso I saw #24763 🚀. I wonder if this should be the way to go for all inputs that want to support parsing? Your guidance here would be good.

@urso

urso commented Apr 26, 2021

> @kvch @urso I saw #24763 🚀 , I wonder if this should be the way to go for all inputs wanting to support parsing? Your guidance here would be good

Yes, we are still improving this on the filestream input. The filestream input is meant to supersede the current log input, as it addresses many of the log input's issues, especially in conjunction with k8s autodiscovery (integrations will need to be updated).

The parsing in the log input was not as 'clean', so we still have to jump through some hoops to make it work properly in a generic fashion. Unfortunately we can't use these parsers via processors, for different reasons (e.g. processors can't be stateful), but we're thinking of moving the syslog parsing from the syslog input into a parser as well, and providing the parser settings to multiple inputs. This will allow users to mix different levels of multiline, JSON, and syslog at will. You have multiline logs embedded in syslog, embedded in a JSON log file... no problem :P
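
For illustration, the layering described above could look roughly like this in a filestream configuration. This is only a sketch, assuming the `parsers` list with `ndjson` and `multiline` entries as documented for the filestream input. The path, the `msg` key, and the pattern are placeholders, and a syslog parser is only being considered at this point in the thread, so it is not shown.

```yaml
filebeat.inputs:
- type: filestream
  paths:
    - /var/log/app/*.json      # hypothetical path
  parsers:
    # Parsers are applied in the order listed: decode each JSON line first ...
    - ndjson:
        keys_under_root: true
        message_key: msg       # assumed name of the field holding the log text
    # ... then join the decoded text into multiline events.
    - multiline:
        type: pattern
        pattern: '^\['         # placeholder: a new event starts with "["
        negate: true
        match: after
```

With `negate: true` and `match: after`, every line that does not match the pattern is appended to the preceding line that does match.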

@exekias
Contributor

exekias commented Apr 26, 2021

That's great to hear. For this case, this input is not based on a file so we would need to update the code to use these parsers. Does that make sense?

@kvch
Contributor

kvch commented Apr 28, 2021

I have updated this issue with the current state: #16137

@elasticmachine
Collaborator

Pinging @elastic/security-external-integrations (Team:Security-External Integrations)

@botelastic botelastic bot removed the needs_team label (indicates that the issue/PR needs a Team:* label) Apr 29, 2021
leehinman added a commit to leehinman/beats that referenced this issue May 14, 2021
- only applies to non JSON logs

Closes elastic#25249
leehinman added a commit that referenced this issue May 17, 2021
* Add multiline support to awss3 input

- only applies to non JSON logs

Closes #25249

Co-authored-by: Andrew Kroh <andrew.kroh@elastic.co>
leehinman added a commit that referenced this issue May 17, 2021
* Add multiline support to awss3 input

- only applies to non JSON logs

Closes #25249

Co-authored-by: Andrew Kroh <andrew.kroh@elastic.co>
(cherry picked from commit 5f242e3)
@leehinman leehinman reopened this May 18, 2021
@leehinman
Contributor

This needs to be reworked to make use of parsers. See https://github.com/elastic/beats/tree/master/filebeat/input/filestream for an example. This applies only to the non-JSON part of the awss3 input.

cachedout pushed a commit that referenced this issue May 18, 2021
* Add multiline support to awss3 input

- only applies to non JSON logs

Closes #25249

Co-authored-by: Andrew Kroh <andrew.kroh@elastic.co>
(cherry picked from commit 5f242e3)
leehinman added a commit that referenced this issue May 18, 2021
* Add multiline support to awss3 input

- only applies to non JSON logs

Closes #25249

Co-authored-by: Andrew Kroh <andrew.kroh@elastic.co>
(cherry picked from commit 5f242e3)

Co-authored-by: Lee Hinman <57081003+leehinman@users.noreply.github.com>
@dnz-wseabrook

Just noting this is something we're looking for too

@andrewkroh
Member Author

andrewkroh commented Jun 23, 2021

This feature has been implemented for 7.14.0. There's a refactoring PR open that could change the config format.

leehinman added a commit to leehinman/beats that referenced this issue Jun 29, 2021
- switches multiline configuration to parsers
- JSON parsing is independent

Closes elastic#25249
leehinman added a commit that referenced this issue Jun 29, 2021
…25873)

* change multiline configuration in awss3 input to parsers

- switches multiline configuration to parsers
- JSON parsing is independent

Closes #25249
mergify bot pushed a commit that referenced this issue Jun 29, 2021
…25873)

* change multiline configuration in awss3 input to parsers

- switches multiline configuration to parsers
- JSON parsing is independent

Closes #25249

(cherry picked from commit beaa972)
leehinman added a commit that referenced this issue Jun 29, 2021
…25873) (#26586)

* change multiline configuration in awss3 input to parsers

- switches multiline configuration to parsers
- JSON parsing is independent

Closes #25249

(cherry picked from commit beaa972)

Co-authored-by: Lee Hinman <57081003+leehinman@users.noreply.github.com>
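
As a sketch of where this ended up, the configuration requested at the top of the issue would translate to the `parsers` form roughly as follows. This assumes a `parsers` list with a `multiline` entry, as the commit notes above describe. The queue URL and profile name are the placeholders from the original request, and the pattern is written without the backslash escapes, which should match the same text.

```yaml
- type: aws-s3
  queue_url: https://sqs.us-east-1.amazonaws.com/foo/queue
  credential_profile_name: beats
  parsers:
    # Applies to non-JSON logs; per the commit notes, JSON parsing is independent.
    - multiline:
        pattern: '^<Event>'    # a new event starts at an <Event> tag
        negate: true
        match: after
```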
8 participants