-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Re-use buffers to optimise memory allocation in fingerprint #36736
Conversation
This dramatically drops the memory usage, particularly on large amount of files.
6acd3c1
to
1562bfc
Compare
This dramatically drops the memory usage, particularly on large amount of files. (cherry picked from commit 429b38f)
This dramatically drops the memory usage, particularly on large amount of files. (cherry picked from commit 429b38f)
@rdner can you give me a rough idea as to the order of magnitude improvement in memory usage of this patch? Were similar data inputs used? An improvement of 5-6 times is a huge improvement. Will this PR improve some of the issues described here: |
@rodrigc I think it's more correct to compare numbers per operation, since the Go benchmarks here adjust the iteration count to run for 10 seconds. So, it's The issue you linked (same twice?) is not using the filestream fingerprint mode, this optimisation affects only the filestream input and only when the new fingerprint file identity is used. |
written, err := io.Copy(h, r) | ||
s.hasher.Reset() | ||
lr := io.LimitReader(file, s.cfg.Fingerprint.Length) | ||
written, err := io.CopyBuffer(s.hasher, lr, s.readBuffer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need to reset the length of s.readBuffer
before calling CopyBuffer
? If s.cfg.Fingerprint.Length
were to be made smaller there would still be data left in s.readBuffer
from the previous read that is never cleared.
The CopyBuffer implementation does not clear the buffer before it copies https://cs.opensource.google/go/go/+/refs/tags/go1.21.3:src/io/io.go;l=399
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need to reset the length of s.readBuffer before calling CopyBuffer?
No, because the buffer is created per file watcher (per prospector, eventually per filestream input).
beats/filebeat/input/filestream/input.go
Line 90 in e322104
prospector, err := newProspector(config) |
beats/filebeat/input/filestream/prospector_creator.go
Lines 40 to 46 in e322104
func newProspector(config config) (loginp.Prospector, error) { | |
err := checkConfigCompatibility(config.FileWatcher, config.FileIdentity) | |
if err != nil { | |
return nil, err | |
} | |
filewatcher, err := newFileWatcher(config.Paths, config.FileWatcher) |
If the input configuration changes (e.g. fingerprint size), the file watcher gets re-created with a new buffer size.
The CopyBuffer implementation does not clear the buffer before it copies
It's true but it does not matter since this is just a buffer and once Read
returns some data it also returns amount of bytes written into the buffer and only this amount of bytes is used for Write
in the destination Writer
https://cs.opensource.google/go/go/+/refs/tags/go1.21.3:src/io/io.go;l=432
I have tests that I have not changed in this PR and that would fail if the previous buffer value was re-used or buffer got corrupted in general:
beats/filebeat/input/filestream/fswatch_test.go
Lines 699 to 744 in e322104
{ | |
name: "returns all files except too small to fingerprint", | |
cfgStr: ` | |
scanner: | |
symlinks: true | |
recursive_glob: true | |
fingerprint: | |
enabled: true | |
offset: 0 | |
length: 1024 | |
`, | |
expDesc: map[string]loginp.FileDescriptor{ | |
normalFilename: { | |
Filename: normalFilename, | |
Fingerprint: "2edc986847e209b4016e141a6dc8716d3207350f416969382d431539bf292e4a", | |
Info: testFileInfo{ | |
size: sizes[normalFilename], | |
name: normalBasename, | |
}, | |
}, | |
excludedFilename: { | |
Filename: excludedFilename, | |
Fingerprint: "bd151321c3bbdb44185414a1b56b5649a00206dd4792e7230db8904e43987336", | |
Info: testFileInfo{ | |
size: sizes[excludedFilename], | |
name: excludedBasename, | |
}, | |
}, | |
excludedIncludedFilename: { | |
Filename: excludedIncludedFilename, | |
Fingerprint: "bfdb99a65297062658c26dfcea816d76065df2a2da2594bfd9b96e9e405da1c2", | |
Info: testFileInfo{ | |
size: sizes[excludedIncludedFilename], | |
name: excludedIncludedBasename, | |
}, | |
}, | |
travelerSymlinkFilename: { | |
Filename: travelerSymlinkFilename, | |
Fingerprint: "c4058942bffcea08810a072d5966dfa5c06eb79b902bf0011890dd8d22e1a5f8", | |
Info: testFileInfo{ | |
size: sizes[travelerFilename], | |
name: travelerSymlinkBasename, | |
}, | |
}, | |
}, | |
}, |
…36736) This dramatically drops the memory usage, particularly on large amount of files.
This dramatically drops the memory usage, particularly on large amount of files.
Benchmark results
CPU Profiles
Before
After
Memory Profiles
Before
After
Checklist
- [ ] I have commented my code, particularly in hard-to-understand areas- [ ] I have made corresponding changes to the documentation- [ ] I have made corresponding change to the default configuration files- [ ] I have added tests that prove my fix is effective or that my feature worksCHANGELOG.next.asciidoc
orCHANGELOG-developer.next.asciidoc
.How to test this PR locally
Related issues
fingerprint
file identity #35734