Re-use buffers to optimise memory allocation in fingerprint #36736

rdner · 2023-10-04T08:21:04Z

This dramatically reduces memory usage, particularly when scanning a large number of files.

Benchmark results

Before
BenchmarkToFileDescriptor-10   764442     15849 ns/op   2688 B/op    12 allocs/op

After
BenchmarkToFileDescriptor-10   758116     15171 ns/op   416 B/op      8 allocs/op

CPU Profiles

[Figures: CPU profile, before and after]

Memory Profiles

[Figures: memory profile, before and after]

Checklist

My code follows the style guidelines of this project
~~- [ ] I have commented my code, particularly in hard-to-understand areas~~
~~- [ ] I have made corresponding changes to the documentation~~
~~- [ ] I have made corresponding change to the default configuration files~~
~~- [ ] I have added tests that prove my fix is effective or that my feature works~~
I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

go test -run=none -bench=".*ToFileDescriptor.*" -benchmem -benchtime=10s -memprofile profile.bin
go tool pprof -http localhost:9999 profile.bin

Related issues

Relates to: Add new fingerprint file identity #35734

elasticmachine · 2023-10-04T09:32:57Z

💚 Build Succeeded

Build stats

Duration: 71 min 32 sec

❕ Flaky test report

No test was executed to be analysed.

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

/test : Re-trigger the build.
/package : Generate the packages and run the E2E tests.
/beats-tester : Run the installation tests with beats-tester.
run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

rodrigc · 2023-10-11T04:02:58Z

@rdner can you give me a rough idea as to the order of magnitude improvement in memory usage of this patch?
In the images you posted, I see before: 1.97GB, and after 301MB.

Were similar data inputs used? An improvement of 5-6 times is a huge improvement.

Will this PR improve some of the issues described here:

rdner · 2023-10-11T14:46:32Z

@rodrigc I think it's more correct to compare numbers per operation, since the Go benchmarks here adjust the iteration count to run for 10 seconds.

So it's 416 B/op versus 2688 B/op, which is roughly 6.5 times less memory per operation (2688 / 416 ≈ 6.46).

The issue you linked (the same one twice?) does not use the filestream fingerprint mode. This optimisation affects only the filestream input, and only when the new fingerprint file identity is used.

cmacknz · 2023-10-12T14:31:32Z

filebeat/input/filestream/fswatch.go

-		written, err := io.Copy(h, r)
+		s.hasher.Reset()
+		lr := io.LimitReader(file, s.cfg.Fingerprint.Length)
+		written, err := io.CopyBuffer(s.hasher, lr, s.readBuffer)


Do you need to reset the length of s.readBuffer before calling CopyBuffer? If s.cfg.Fingerprint.Length were to be made smaller there would still be data left in s.readBuffer from the previous read that is never cleared.

The CopyBuffer implementation does not clear the buffer before it copies https://cs.opensource.google/go/go/+/refs/tags/go1.21.3:src/io/io.go;l=399

rdner · 2023-10-12

> Do you need to reset the length of s.readBuffer before calling CopyBuffer?

No, because the buffer is created per file watcher (per prospector, eventually per filestream input).

beats/filebeat/input/filestream/input.go, line 90 in e322104:

	prospector, err := newProspector(config)

beats/filebeat/input/filestream/prospector_creator.go, lines 40 to 46 in e322104:

	func newProspector(config config) (loginp.Prospector, error) {
		err := checkConfigCompatibility(config.FileWatcher, config.FileIdentity)
		if err != nil {
			return nil, err
		}

		filewatcher, err := newFileWatcher(config.Paths, config.FileWatcher)

If the input configuration changes (e.g. fingerprint size), the file watcher gets re-created with a new buffer size.

> The CopyBuffer implementation does not clear the buffer before it copies

That's true, but it does not matter, because this is only a scratch buffer. When Read returns data, it also returns the number of bytes written into the buffer, and only that many bytes are passed to Write on the destination Writer: https://cs.opensource.google/go/go/+/refs/tags/go1.21.3:src/io/io.go;l=432

The existing tests, which I did not change in this PR, would fail if a previous buffer value were re-used or if the buffer were corrupted in any way:

beats/filebeat/input/filestream/fswatch_test.go, lines 699 to 744 in e322104:

	{
		name: "returns all files except too small to fingerprint",
		cfgStr: `
	scanner:
	  symlinks: true
	  recursive_glob: true
	  fingerprint:
	    enabled: true
	    offset: 0
	    length: 1024
	`,
		expDesc: map[string]loginp.FileDescriptor{
			normalFilename: {
				Filename:    normalFilename,
				Fingerprint: "2edc986847e209b4016e141a6dc8716d3207350f416969382d431539bf292e4a",
				Info: testFileInfo{
					size: sizes[normalFilename],
					name: normalBasename,
				},
			},
			excludedFilename: {
				Filename:    excludedFilename,
				Fingerprint: "bd151321c3bbdb44185414a1b56b5649a00206dd4792e7230db8904e43987336",
				Info: testFileInfo{
					size: sizes[excludedFilename],
					name: excludedBasename,
				},
			},
			excludedIncludedFilename: {
				Filename:    excludedIncludedFilename,
				Fingerprint: "bfdb99a65297062658c26dfcea816d76065df2a2da2594bfd9b96e9e405da1c2",
				Info: testFileInfo{
					size: sizes[excludedIncludedFilename],
					name: excludedIncludedBasename,
				},
			},
			travelerSymlinkFilename: {
				Filename:    travelerSymlinkFilename,
				Fingerprint: "c4058942bffcea08810a072d5966dfa5c06eb79b902bf0011890dd8d22e1a5f8",
				Info: testFileInfo{
					size: sizes[travelerFilename],
					name: travelerSymlinkBasename,
				},
			},
		},
	},

rdner added Filebeat Filebeat Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team backport-v8.10.0 Automated backport with mergify labels Oct 4, 2023

rdner self-assigned this Oct 4, 2023

botelastic bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels Oct 4, 2023

Re-use buffers to optimise memory allocation in fingerprint

1562bfc

This dramatically drops the memory usage, particularly on large amount of files.

rdner force-pushed the optimise-fingerprint-memory branch from 6acd3c1 to 1562bfc Compare October 4, 2023 09:26

rdner changed the title ~~Re-use the SHA256 block to optimise memory allocation in fingerprint~~ Re-use buffers to optimise memory allocation in fingerprint Oct 4, 2023

Add changelog entry

9d446b1

rdner added the backport-7.17 Automated backport to the 7.17 branch with mergify label Oct 4, 2023

rdner requested review from belimawr and leehinman October 4, 2023 09:37

rdner marked this pull request as ready for review October 4, 2023 09:37

rdner requested a review from a team as a code owner October 4, 2023 09:37

belimawr approved these changes Oct 4, 2023

rdner merged commit 429b38f into elastic:main Oct 4, 2023
26 checks passed

rdner deleted the optimise-fingerprint-memory branch October 4, 2023 11:40

mergify bot pushed a commit that referenced this pull request Oct 4, 2023

Re-use buffers to optimise memory allocation in fingerprint (#36736)

03615d3

This dramatically drops the memory usage, particularly on large amount of files. (cherry picked from commit 429b38f)

mergify bot mentioned this pull request Oct 4, 2023

[7.17](backport #36736) Re-use buffers to optimise memory allocation in fingerprint #36738

Merged

mergify bot pushed a commit that referenced this pull request Oct 4, 2023

Re-use buffers to optimise memory allocation in fingerprint (#36736)

a02e814

This dramatically drops the memory usage, particularly on large amount of files. (cherry picked from commit 429b38f)

mergify bot mentioned this pull request Oct 4, 2023

[8.10](backport #36736) Re-use buffers to optimise memory allocation in fingerprint #36739

Merged

gizas mentioned this pull request Oct 4, 2023

Potential memory leak issue with filebeat and metricbeat #35796

Closed

cmacknz reviewed Oct 12, 2023

Scholar-Li pushed a commit to Scholar-Li/beats that referenced this pull request Feb 5, 2024

Re-use buffers to optimise memory allocation in fingerprint (elastic#…

f0f6571

…36736) This dramatically drops the memory usage, particularly on large amount of files.
