filebeat binary and sparse file handling #5179
Comments
Right now, when a file's size changes and this triggers reading the file again, the debug log does not show the new size. Logging it should make detecting issues like #5179 easier.
Binary files can have different formats and normally require special parsing support in order to be handled correctly. This rather calls for a special Filebeat input type. For example, we have introduced journalbeat since this issue was opened. The log input should not be used to ship binary files at all. I'm closing this issue due to lack of updates. If you think binaries are still a problem, please comment and we can reopen. It would also be interesting to learn how you would expect Filebeat to handle those files.
@urso this wasn't about reading those files, but rather about avoiding reading them. If you configure a pattern that accidentally includes a large sparse file (pattern |
I see. This operation is not free, as one needs to read some of the file's contents. Detecting text only is not fully reliable, due to different file encodings. Some might look like binary (e.g. a simple check looking for invalid bytes might misclassify UTF-16 or UTF-32). The harvester would need to do a decode+classify precheck. Plus we might want to cache the inode+path of files that look like binary, so we don't reopen and scan those files over and over again. A simple-to-use library to check for file types: github.com/gabriel-vasile/mimetype. If we introduce a check, we should make it optional. One can also apply a similar check if JSON encoding is enabled. The logs input only supports ndjson. By validating upfront we can fail more gracefully.
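To make the decode+classify idea above a bit more concrete, here is a rough sketch in Python (used purely for illustration; this is not Filebeat code and it does not use the mimetype library). The names `looks_like_text`, `should_harvest`, the 4096-byte sample size, and the 5% control-character threshold are all assumptions made up for this example. It decodes a small sample using the configured encoding (or a BOM, falling back to UTF-8), rejects samples that fail to decode or consist mostly of control characters, and remembers rejected files by device, inode, and path.

```python
import codecs
import os
from typing import Optional, Set, Tuple

# UTF-32 BOMs are checked first because the UTF-32-LE BOM begins with the UTF-16-LE BOM.
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)
_ALLOWED_CONTROLS = "\t\n\r"


def looks_like_text(sample: bytes, encoding: Optional[str] = None,
                    max_control_ratio: float = 0.05) -> bool:
    """Heuristic (hypothetical helper): decode first, then classify the decoded characters."""
    if not sample:
        return True  # nothing to judge yet
    if encoding is None:
        encoding = "utf-8"
        for bom, name in _BOMS:
            if sample.startswith(bom):
                encoding = name
                break
    # final=False tolerates a multi-byte sequence cut off at the end of the sample.
    decoder = codecs.getincrementaldecoder(encoding)()
    try:
        text = decoder.decode(sample, final=False)
    except UnicodeDecodeError:
        return False
    if not text:
        return True
    controls = sum(
        1 for ch in text
        if (ch < " " and ch not in _ALLOWED_CONTROLS) or ch == "\x7f"
    )
    return controls / len(text) <= max_control_ratio


def should_harvest(path: str, skip_cache: Set[Tuple[int, int, str]],
                   encoding: Optional[str] = None, sample_size: int = 4096) -> bool:
    """Read a small sample once; remember files that look binary so they are not rescanned."""
    st = os.stat(path)
    key = (st.st_dev, st.st_ino, path)
    if key in skip_cache:
        return False
    with open(path, "rb") as fh:
        sample = fh.read(sample_size)
    if looks_like_text(sample, encoding):
        return True
    skip_cache.add(key)
    return False
```

Note that UTF-16 text without a BOM is rejected unless the encoding is passed explicitly, which is exactly the misclassification risk mentioned above; that is one reason such a check would have to be optional and tied to the configured encoding.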
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Filebeat needs to gracefully handle binary and sparse files. A sparse file like /var/log/lastlog can eat up all of a system's memory and swap in a matter of ~30 seconds if Filebeat is monitoring such a file. With sar, journalctl, and lastlog, such files could easily cause issues if Filebeat is configured to monitor files / folders under /var/log/*.
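For illustration, one cheap way to spot a sparse file before reading it is to compare its apparent size with the space actually allocated for it. The sketch below is in Python and is only an assumption about how such a check could look, not something Filebeat does; the helper names, the 0.5 ratio, and the 1 MiB minimum size are made up for this example. It relies on `st_blocks`, which Linux reports in 512-byte units and which is not available on every platform.

```python
import os

BLOCK_UNIT = 512  # st_blocks is counted in 512-byte units on Linux


def looks_sparse(apparent_size: int, allocated_blocks: int,
                 ratio: float = 0.5, min_size: int = 1024 * 1024) -> bool:
    """Return True when far fewer bytes are allocated than the file claims to hold.

    Files smaller than min_size are never flagged, since tiny files can
    legitimately report zero allocated blocks on some filesystems.
    """
    if apparent_size < min_size:
        return False
    return allocated_blocks * BLOCK_UNIT < apparent_size * ratio


def is_sparse_file(path: str) -> bool:
    """stat() the file and apply the heuristic; no file contents are read."""
    st = os.stat(path)
    blocks = getattr(st, "st_blocks", None)
    if blocks is None:
        return False  # platform does not report allocation
    return looks_sparse(st.st_size, blocks)
```

Calling `is_sparse_file("/var/log/lastlog")` on an affected host would be the obvious usage. Filesystems with transparent compression can also report fewer allocated blocks than the apparent size, so a check like this would probably need to be opt-in.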