filebeat binary and sparse file handling #5179

Closed
jpcarey opened this issue Sep 14, 2017 · 4 comments
Labels
enhancement, Filebeat, Stalled, Team:Integrations

Comments


jpcarey commented Sep 14, 2017

Filebeat needs to gracefully handle binary and sparse files. A sparse file such as /var/log/lastlog can eat up all of a system's memory and swap in a matter of ~30 seconds if filebeat is monitoring it.

$ du -h /var/log/lastlog
76K     /var/log/lastlog

$ du -h --apparent-size lastlog
368G    lastlog

Files from sar, journalctl, and lastlog could easily cause issues if filebeat is configured to monitor files or folders under /var/log/*.
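
The gap between the two `du` figures above is what marks the file as sparse, and the same comparison can be made programmatically. The following is a minimal sketch, not Filebeat code: the function names, the 1 MiB minimum size, and the 50% fill threshold are all assumptions chosen for illustration. It relies on the POSIX `st_blocks` field, so it would not work on Windows, and a compressing filesystem could make a dense file look sparse.

```python
import os


def looks_sparse(apparent_size: int, allocated: int,
                 min_size: int = 1 << 20, max_fill: float = 0.5) -> bool:
    """Return True when a large file has far fewer bytes allocated on disk
    than its apparent size. min_size and max_fill are illustrative defaults."""
    if apparent_size < min_size:
        # Small files are cheap to read and may legitimately report few blocks.
        return False
    return allocated < apparent_size * max_fill


def is_sparse_file(path: str) -> bool:
    """Hypothetical helper: compare apparent size with allocated size."""
    st = os.stat(path)
    # st_blocks counts 512-byte units on POSIX systems.
    return looks_sparse(st.st_size, st.st_blocks * 512)
```

With the numbers from this report, `looks_sparse(368 * 1024**3, 76 * 1024)` is true, so a prospector using such a check could skip the file before ever opening it.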

@jpcarey jpcarey added the Filebeat Filebeat label Sep 14, 2017
@tsg tsg added the bug label Sep 15, 2017
ruflin added a commit to ruflin/beats that referenced this issue Sep 18, 2017
Right now, when the size of a file changes and this triggers reading the file again, the new size is not visible in the debug log. This should make detecting issues like elastic#5179 easier.
tsg pushed a commit that referenced this issue Sep 25, 2017
Right now, when the size of a file changes and this triggers reading the file again, the new size is not visible in the debug log. This should make detecting issues like #5179 easier.

urso commented Dec 20, 2019

Binary files can have different formats and normally require special parsing support in order to be handled correctly. This rather calls for a special filebeat input type. For example, we have introduced journalbeat since this issue was opened.

The log input should not be used to ship binary files at all.

I'm closing this issue due to lack of updates. If you think binaries are still a problem, please comment and we can reopen. It would also be interesting to learn how you would expect filebeat to handle those files.

@urso urso closed this as completed Dec 20, 2019

jpcarey commented Dec 20, 2019

@urso this wasn't about reading those files, but rather about avoiding reading them. If you configure a pattern that accidentally includes a large sparse file (for example, the pattern /var/log/*), filebeat would open it and try to read lines, which in this case means reading the entire file into memory. In my example, the sparse file was /var/log/lastlog, which had an apparent size of 368G. Filebeat ended up consuming all of the system's resources trying to read this file into memory.
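
The holes in a sparse file read back as zero bytes, so a line reader never finds a newline and keeps buffering. One way to bound that, sketched below, is to give up on a file once the unterminated data exceeds a limit. This is an illustration only, not how the Filebeat harvester is implemented; `read_lines_bounded`, `LineTooLong`, and both parameters are hypothetical names.

```python
from typing import BinaryIO, Iterator


class LineTooLong(Exception):
    """Raised when no newline is found within the allowed number of bytes."""


def read_lines_bounded(stream: BinaryIO,
                       max_line_bytes: int = 10 * 1024 * 1024,
                       chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield newline-terminated lines while keeping the buffer bounded to
    roughly max_line_bytes + chunk_size bytes."""
    pending = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        # Emit every complete line currently held in the buffer.
        while True:
            newline = pending.find(b"\n")
            if newline < 0:
                break
            yield bytes(pending[:newline])
            del pending[:newline + 1]
        # Whatever remains has no newline yet; stop before it grows unbounded.
        if len(pending) > max_line_bytes:
            raise LineTooLong(f"no newline within {max_line_bytes} bytes")
    if pending:
        yield bytes(pending)
```

A caller could catch `LineTooLong`, log the path, and stop harvesting that file instead of exhausting memory.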


urso commented Dec 23, 2019

I see. This check is not free, as one needs to read some of the contents. Detecting text-only files is not fully reliable because of different file encodings. Some might look like binary (e.g. a simple check looking for invalid bytes might misclassify UTF-16 or UTF-32). The harvester would need to do a decode+classify precheck. Plus, we might want to cache the inode+path of files that look like binary, so we don't reopen and scan those files over and over again.
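
To make the decode+classify idea concrete, here is a rough sketch under several assumptions: the encoding is guessed only from a byte order mark (falling back to UTF-8), a sample counts as binary when it fails to decode or is mostly control characters, and the cache is a plain in-memory set. All names and thresholds are hypothetical, and UTF-16 or UTF-32 files without a BOM would still be misclassified by this version.

```python
import codecs
import os
from typing import Set, Tuple

# Longest BOMs first: the UTF-32 LE BOM starts with the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def looks_like_text(sample: bytes, max_control_ratio: float = 0.05) -> bool:
    """Decode a leading sample, then classify it by its share of control characters."""
    encoding = "utf-8"
    for bom, name in _BOMS:
        if sample.startswith(bom):
            encoding, sample = name, sample[len(bom):]
            break
    # final=False tolerates a multi-byte character cut off at the sample boundary.
    decoder = codecs.getincrementaldecoder(encoding)(errors="strict")
    try:
        text = decoder.decode(sample, final=False)
    except UnicodeDecodeError:
        return False
    if not text:
        return True  # nothing to object to in an empty sample
    control = sum(1 for ch in text if ord(ch) < 32 and ch not in "\t\n\r\f")
    return control / len(text) <= max_control_ratio


def should_harvest(path: str, known_binary: Set[Tuple[int, int, str]],
                   sample_size: int = 4096) -> bool:
    """Skip files already classified as binary, keyed by device, inode, and path."""
    st = os.stat(path)
    key = (st.st_dev, st.st_ino, path)
    if key in known_binary:
        return False  # cached verdict: do not reopen the file
    with open(path, "rb") as handle:
        sample = handle.read(sample_size)
    if looks_like_text(sample):
        return True
    known_binary.add(key)
    return False
```

A run of zero bytes, which is what the start of a sparse file looks like, decodes as valid UTF-8 but consists entirely of control characters, so it is rejected by the ratio check rather than by the decoder.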

A simple-to-use library for checking file types: github.com/gabriel-vasile/mimetype

If we introduce a check, we should make it optional. One can also apply a similar check if JSON encoding is enabled. The log input only supports ndjson. By validating upfront we can fail more gracefully.
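
An upfront ndjson check could look roughly like the sketch below, which parses only the first few non-empty lines. It assumes that every line must be a JSON object, which is stricter than ndjson itself; the function name and the `max_lines` default are hypothetical.

```python
import json
from typing import Iterable


def first_lines_are_ndjson(lines: Iterable[str], max_lines: int = 5) -> bool:
    """Return True when the first few non-empty lines each parse as a JSON object."""
    checked = 0
    for line in lines:
        if not line.strip():
            continue  # blank lines carry no information either way
        try:
            value = json.loads(line)
        except json.JSONDecodeError:
            return False
        if not isinstance(value, dict):
            return False  # assumption: one JSON object per line
        checked += 1
        if checked >= max_lines:
            break
    # An input with no usable lines is not treated as valid ndjson.
    return checked > 0
```

Because the check stops after `max_lines`, its cost stays small even for very large files, at the price of missing malformed lines further in.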

@urso urso reopened this Dec 23, 2019
@andresrc andresrc added Team:Integrations Label for the Integrations team and removed Team:Beats labels Mar 6, 2020

botelastic bot commented Feb 4, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@botelastic botelastic bot added the Stalled label Feb 4, 2021
@botelastic botelastic bot closed this as completed Mar 6, 2021