[pkg/stanza] support gzip compressed log files for file log receiver #33406
Signed-off-by: Florian Bacher <florian.bacher@dynatrace.com>
…ed reader Signed-off-by: Florian Bacher <florian.bacher@dynatrace.com>
Thanks for putting this PR together. It's very helpful towards understanding how this might look.
The main thing I'm wondering about is whether it's really necessary to have such a separate implementation. I see a lot of functionality that was written in a different way, but may not actually be necessary. For example, we currently track changes in non-compressed files a certain way. It's not yet clear to me why exactly we would need to change this to support gzip (or other appendable compression formats).
If I can ask for one major direction change - is it possible that we could just reuse our current Reader struct by changing its internal file field to an io.Reader? Then, when we build a Reader for a non-compressed file, we assign an os.File. And when we build a Reader for a gzip file, we assign a gzip.Reader.
It's possible this doesn't work for a reason I'm overlooking (maybe tokenization or offset updating) but I'd really like to start from this idea and clearly identify why it doesn't work so we can address those needs specifically without adding a lot of other changes.
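To make the suggestion above concrete, here is a minimal sketch of a reader whose source is just an io.Reader. It is not the actual fileconsumer Reader: sketchReader, readLines, newPlainReader and newGzipReader are hypothetical names, and the content is kept in memory (a strings.Reader stands in for the os.File) so the example runs on its own.

```go
package main

import (
	"bufio"
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"strings"
)

// sketchReader is a hypothetical, stripped-down stand-in for the fileconsumer
// Reader: all it knows is that it reads from some io.Reader.
type sketchReader struct {
	file io.Reader
}

// readLines tokenizes whatever the underlying source yields, line by line.
// It does not care whether the bytes came from a plain or a gzip source.
func (r *sketchReader) readLines() ([]string, error) {
	var lines []string
	sc := bufio.NewScanner(r.file)
	for sc.Scan() {
		lines = append(lines, sc.Text())
	}
	return lines, sc.Err()
}

// newPlainReader wraps uncompressed content. In the receiver this would be
// an *os.File; a strings.Reader is used here only to avoid touching disk.
func newPlainReader(content string) *sketchReader {
	return &sketchReader{file: strings.NewReader(content)}
}

// newGzipReader compresses content in memory, then assigns a *gzip.Reader
// to the same field that the plain variant uses.
func newGzipReader(content string) (*sketchReader, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write([]byte(content)); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	zr, err := gzip.NewReader(&buf)
	if err != nil {
		return nil, err
	}
	return &sketchReader{file: zr}, nil
}

func main() {
	gz, err := newGzipReader("first\nsecond\n")
	if err != nil {
		panic(err)
	}
	for _, r := range []*sketchReader{newPlainReader("first\nsecond\n"), gz} {
		lines, err := r.readLines()
		fmt.Println(lines, err)
	}
}
```

Tokenization is identical for both sources in this sketch. What it leaves out is offset handling: an os.File can seek, while a gzip.Reader cannot, which is presumably where the two cases would start to differ.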
}

initalRead := true
if lastReadFileSize, ok := r.FileAttributes["lastReadFileSize"]; ok {
I'm open to adding such an attribute but it should be proposed in another issue.
In another comment I question whether we need to track this for the sake of this implementation. If we don't, let's remove it. If we do, let's use an unexported field on the reader struct.
Thanks a lot for the detailed review @djaglowski! I agree that the number of modifications needed to make this work with gzip files should be as small as possible, so I'll look into how this can best be achieved today. The main challenge here is to figure out how to track the current offset within a gzip compressed file, as it does not seem to be possible to start reading from an offset within such a file. However, I might be missing something here, so I appreciate any ideas regarding how this could be done.
One question, just to clarify: for this use case, i.e. ingesting logs from gzip compressed files, does it actually make sense to keep watching for incoming changes in the compressed file, or would it make sense to emit all the logs of a compressed file as soon as we get a new one that matches the file name pattern? The reason I'm asking is that I would assume that in a usual log file rotation workflow the compressed file is created at the end of a rotation period, contains the delta to the previous interval, and is not updated in between.
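One way to illustrate the offset problem: a gzip stream cannot be entered at an arbitrary uncompressed position, but a reader can decompress from the start and discard everything before a remembered offset. The sketch below shows only that idea and is not necessarily what this PR ends up doing; readFromOffset and gzipBytes are hypothetical helpers, and the offset is assumed to count uncompressed bytes.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// gzipBytes is a helper for the example: it returns content gzip-compressed.
func gzipBytes(content string) []byte {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	// Writes to a bytes.Buffer do not fail, so errors are treated as fatal.
	if _, err := zw.Write([]byte(content)); err != nil {
		panic(err)
	}
	if err := zw.Close(); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

// readFromOffset decompresses from the beginning, throws away the first
// `offset` uncompressed bytes, and returns the remainder together with the
// new uncompressed offset. The skipped bytes are still decompressed, so the
// cost grows with the size of the file, not with the amount of new data.
func readFromOffset(compressed []byte, offset int64) (string, int64, error) {
	zr, err := gzip.NewReader(bytes.NewReader(compressed))
	if err != nil {
		return "", offset, err
	}
	defer zr.Close()

	// Discard what was already consumed in an earlier pass.
	if _, err := io.CopyN(io.Discard, zr, offset); err != nil {
		return "", offset, err
	}
	rest, err := io.ReadAll(zr)
	if err != nil {
		return "", offset, err
	}
	return string(rest), offset + int64(len(rest)), nil
}

func main() {
	data := gzipBytes("line1\nline2\n")
	rest, next, err := readFromOffset(data, int64(len("line1\n")))
	fmt.Printf("%q %d %v\n", rest, next, err)
}
```

If compressed files are only ever written once at rotation time, as the question above assumes, the simpler option of reading each new file from start to end would avoid this bookkeeping entirely.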
…eiver-compressed-files # Conflicts: # pkg/stanza/fileconsumer/file_test.go # pkg/stanza/go.mod
@djaglowski this should be ready for review now :)
Great progress on this. It's looking much cleaner.
I was trying to clarify my understanding of the changes so ended up pulling your branch and tinkering a bit. I came up with what seems like a simpler way to handle gzip readers and made a PR in bacherfl#1. Let me know what you think.
Leave internal reader nil if gzip EOF
Thanks a lot for the review and the suggestions @djaglowski! The changes you proposed with your PR all made sense to me, so I merged them into this branch now.
Move gzip internal reader creation into ReadToEnd
Description: This PR adds support for reading gzip compressed log files for the file log receiver. If enabled via the gzip_file_suffix parameter, this is done by creating a gzip.Reader on top of the file handle of a compressed file.
Link to tracking Issue: #2328
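As a rough illustration of the description above, the sketch below opens a file and layers a gzip.Reader on top of the handle only when a suffix is configured and the path matches it. It is not the code from this PR: readLog, writeGzip and demo are hypothetical helpers, and treating an empty suffix as "feature disabled" is an assumption made for the example.

```go
package main

import (
	"compress/gzip"
	"fmt"
	"io"
	"os"
	"path/filepath"
	"strings"
)

// readLog reads path fully. If gzipSuffix is non-empty and path ends with
// it, a gzip.Reader is created on top of the open file handle; otherwise
// the file is read as-is.
func readLog(path, gzipSuffix string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()

	var src io.Reader = f
	if gzipSuffix != "" && strings.HasSuffix(path, gzipSuffix) {
		zr, err := gzip.NewReader(f)
		if err != nil {
			return "", err
		}
		defer zr.Close()
		src = zr
	}
	data, err := io.ReadAll(src)
	return string(data), err
}

// writeGzip writes content gzip-compressed to dir/name and returns the path.
func writeGzip(dir, name, content string) (string, error) {
	path := filepath.Join(dir, name)
	f, err := os.Create(path)
	if err != nil {
		return "", err
	}
	zw := gzip.NewWriter(f)
	if _, err := zw.Write([]byte(content)); err != nil {
		f.Close()
		return "", err
	}
	if err := zw.Close(); err != nil {
		f.Close()
		return "", err
	}
	return path, f.Close()
}

// demo writes a small compressed log into a temporary directory and reads
// it back with the given suffix setting.
func demo(gzipSuffix string) (string, error) {
	dir, err := os.MkdirTemp("", "filelog-gzip")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	path, err := writeGzip(dir, "app.log.gz", "hello\n")
	if err != nil {
		return "", err
	}
	return readLog(path, gzipSuffix)
}

func main() {
	out, err := demo(".gz")
	fmt.Printf("%q %v\n", out, err)
}
```

With the suffix left empty, the same file is returned as raw compressed bytes, which is the behavior one would expect when the feature is not enabled.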
Testing: Added unit tests for the new functionality. Manually tested using the following configuration for the filelog receiver:
Documentation: Added documentation in the readme of the file log receiver