
[dbnode] Optimize filesetFiles function #3900

Merged
merged 6 commits into master from linasn/find-files-improve-perf on Nov 9, 2021

Conversation

soundvibe
Copy link
Collaborator

What this PR does / why we need it:

For namespaces with long retention periods, large CPU spikes can occur during node bootstrap. Looking at the CPU profile, we can see that the filesetFiles function is not very efficient.
(CPU profile screenshot)
This PR optimizes the filesetFiles function by reducing the amount of work it needs to do. It now collects the blockStart and volumeIndex fields up front and reuses them during sorting and subsequent iterations, without needing to parse them again.
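A minimal sketch of the parse-once idea: the struct fields (fileName, blockStart, volumeIndex) follow the PR's diff, but the sort shown here is illustrative — the filename parser and actual comparison logic in the PR may differ.

```go
package main

import (
	"fmt"
	"sort"
)

// filesetFile caches fields parsed from a fileset filename so that
// sorting and later iterations do not need to re-parse the name.
type filesetFile struct {
	fileName    string
	blockStart  int64
	volumeIndex int
}

// sortFilesetFiles orders files by cached blockStart, then volumeIndex,
// instead of re-parsing each filename inside the comparison function.
func sortFilesetFiles(files []filesetFile) {
	sort.Slice(files, func(i, j int) bool {
		if files[i].blockStart != files[j].blockStart {
			return files[i].blockStart < files[j].blockStart
		}
		return files[i].volumeIndex < files[j].volumeIndex
	})
}

func main() {
	files := []filesetFile{
		{"fileset-200-1.db", 200, 1},
		{"fileset-100-0.db", 100, 0},
		{"fileset-100-2.db", 100, 2},
	}
	sortFilesetFiles(files)
	for _, f := range files {
		fmt.Println(f.fileName)
	}
}
```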

I've written a small benchmark to measure the new implementation (the new implementation is ~30% faster):

  • 230140 ns/op - new implementation
  • 324493 ns/op - old implementation

Special notes for your reviewer:

Does this PR introduce a user-facing and/or backwards incompatible change?:


Does this PR require updating code package or user-facing documentation?:


* master:
  Fix race when checking for dirty aggregations (#3886)
  [aggregator] Add test coverage to expireValues (#3898)
  [aggregator] Propagate cancellation through tick (#3895)
@codecov
codecov bot commented Nov 4, 2021

Codecov Report

Merging #3900 (ad56b53) into master (513748e) will decrease coverage by 0.4%.
The diff coverage is 90.2%.


@@           Coverage Diff            @@
##           master   #3900     +/-   ##
========================================
- Coverage    57.0%   56.6%   -0.5%     
========================================
  Files         553     553             
  Lines       63639   63275    -364     
========================================
- Hits        36297   35827    -470     
- Misses      24136   24238    +102     
- Partials     3206    3210      +4     
Flag Coverage Δ
aggregator 62.3% <ø> (-0.1%) ⬇️
cluster ∅ <ø> (∅)
collector 58.4% <ø> (ø)
dbnode 60.3% <90.2%> (-0.6%) ⬇️
m3em 46.4% <ø> (ø)
metrics 19.7% <ø> (ø)
msg 74.2% <ø> (-0.3%) ⬇️

Flags with carried forward coverage won't be shown.


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 513748e...ad56b53.

Collaborator
@vpranckaitis left a comment

LGTM 👍 Though it would be good to get another review from someone else before merging.

src/dbnode/persist/fs/files.go
Comment on lines +1273 to +1285
result := make([]filesetFile, len(matched))
for i, file := range matched {
blockStart, volume, err := fn(file)
if err != nil {
return nil, err
}

result[i] = filesetFile{
fileName: file,
blockStart: blockStart,
volumeIndex: volume,
}
}
Collaborator

nit:

Suggested change
result := make([]filesetFile, len(matched))
for i, file := range matched {
blockStart, volume, err := fn(file)
if err != nil {
return nil, err
}
result[i] = filesetFile{
fileName: file,
blockStart: blockStart,
volumeIndex: volume,
}
}
result := make([]filesetFile, 0, len(matched))
for _, file := range matched {
blockStart, volume, err := fn(file)
if err != nil {
return nil, err
}
result = append(result, filesetFile{
fileName: file,
blockStart: blockStart,
volumeIndex: volume,
})
}
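The suggestion above is a style nit: both forms avoid reallocation, since the slice's full capacity is allocated once up front. A small self-contained sketch of the two equivalent initialization styles (the data here is illustrative, not from the PR):

```go
package main

import "fmt"

func main() {
	src := []string{"a", "b", "c"}

	// Style 1: full-length slice with indexed assignment.
	byIndex := make([]string, len(src))
	for i, s := range src {
		byIndex[i] = s
	}

	// Style 2 (the suggested change): zero-length slice with
	// preallocated capacity, filled via append. No reallocation occurs.
	byAppend := make([]string, 0, len(src))
	for _, s := range src {
		byAppend = append(byAppend, s)
	}

	fmt.Println(len(byIndex) == len(byAppend), byIndex[2] == byAppend[2])
}
```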

Collaborator
@linasm left a comment

Would you care to share the benchmark code (it would be enough to put it in the PR description)?
I am surprised that the improvement is only 30%. What was the sample size for benchmarking?

@soundvibe
Collaborator Author

Would you care to share the benchmark code (would be enough to put it in PR description)? I am surprised that the improvement is only 30%. What was the sample size for benchmarking?

So initially I was benchmarking with only 100 files, so the performance gain was quite low. Tested with more files, and most of the time it is a ~2x improvement.

func BenchmarkFilesetFiles(b *testing.B) {
	shard := uint32(0)
	numIters := 100
	dir := createDataFilesWithVolumeIndex(nil, dataDirName, testNs1ID, shard, numIters, true, dataFileSuffix, 0)
	defer os.RemoveAll(dir)

	for i := 0; i < b.N; i++ {
		files, err := filesetFiles(filesetFilesSelector{
			fileSetType:    persist.FileSetFlushType,
			contentType:    persist.FileSetDataContentType,
			filePathPrefix: dir,
			namespace:      testNs1ID,
			shard:          shard,
			pattern:        filesetFilePrefix + "*",
		})
		if err != nil {
			b.Fatal(err)
		}

		if len(files) == 0 {
			b.Fatal("no files found")
		}
	}
}

-test.benchtime 3s
// new
numIters := 1000
BenchmarkFilesetFiles-12 2124 1611317 ns/op
BenchmarkFilesetFiles-12 2108 1619529 ns/op
numIters := 10000
BenchmarkFilesetFiles-12 121 27584583 ns/op
BenchmarkFilesetFiles-12 116 26826172 ns/op
numIters := 20000
BenchmarkFilesetFiles-12 21 146918829 ns/op
BenchmarkFilesetFiles-12 18 188116477 ns/op

//old
numIters := 1000
BenchmarkFilesetFiles-12 1189 3033102 ns/op
BenchmarkFilesetFiles-12 1174 2936289 ns/op
numIters := 10000
BenchmarkFilesetFiles-12 55 56059259 ns/op
BenchmarkFilesetFiles-12 58 54876307 ns/op
numIters := 20000
BenchmarkFilesetFiles-12 14 233025020 ns/op
BenchmarkFilesetFiles-12 10 308521944 ns/op

@linasm
Collaborator

linasm commented Nov 8, 2021

func BenchmarkFilesetFiles(b *testing.B) {
shard := uint32(0)
numIters := 100
dir := createDataFilesWithVolumeIndex(nil, dataDirName, testNs1ID, shard, numIters, true, dataFileSuffix, 0)
defer os.RemoveAll(dir)

This is benchmarking creation of the files on disk. You need to call b.ResetTimer() between benchmark setup and the actual benchmark loop.
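A minimal sketch of the b.ResetTimer() pattern being suggested: expensive setup runs first, then the timer is reset so only the loop body is measured. Here createFixtures is a hypothetical stand-in for the file-creation setup in the PR's benchmark, and testing.Benchmark lets the sketch run outside of go test.

```go
package main

import (
	"fmt"
	"testing"
)

// createFixtures stands in for expensive one-time setup
// (in the PR's case, creating fileset files on disk).
func createFixtures() []int {
	out := make([]int, 1000)
	for i := range out {
		out[i] = i
	}
	return out
}

func benchmarkSum(b *testing.B) {
	data := createFixtures() // setup: should not be timed
	b.ResetTimer()           // discard time spent in setup above
	for i := 0; i < b.N; i++ {
		sum := 0
		for _, v := range data {
			sum += v
		}
		_ = sum
	}
}

func main() {
	// testing.Benchmark runs a benchmark function programmatically.
	res := testing.Benchmark(benchmarkSum)
	fmt.Println(res.N > 0)
}
```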

@soundvibe
Collaborator Author

func BenchmarkFilesetFiles(b *testing.B) {
shard := uint32(0)
numIters := 100
dir := createDataFilesWithVolumeIndex(nil, dataDirName, testNs1ID, shard, numIters, true, dataFileSuffix, 0)
defer os.RemoveAll(dir)

This is benchmarking creation of the files on disk. You need to call b.ResetTimer() between benchmark setup and the actual benchmark loop.

🤦🏻 Thanks for spotting it. Updated results:

numIters := 1000
--new
BenchmarkFilesetFiles-12    	    2260	   1600852 ns/op
BenchmarkFilesetFiles-12    	    2282	   1595209 ns/op
--old
BenchmarkFilesetFiles-12    	    1023	   2945657 ns/op
BenchmarkFilesetFiles-12    	    1142	   3255573 ns/op

numIters := 10000
--new
BenchmarkFilesetFiles-12    	     157	  22408915 ns/op
BenchmarkFilesetFiles-12    	     152	  22152958 ns/op
--old
BenchmarkFilesetFiles-12    	      76	  42065111 ns/op
BenchmarkFilesetFiles-12    	      76	  40777160 ns/op

numIters := 20000
--new
BenchmarkFilesetFiles-12    	      62	  51564310 ns/op
BenchmarkFilesetFiles-12    	      69	  48784302 ns/op
--old
BenchmarkFilesetFiles-12    	      30	 103220878 ns/op
BenchmarkFilesetFiles-12    	      30	 104825039 ns/op

Still ~2x improvement.

Collaborator
@linasm left a comment

LGTM with some comments.

src/dbnode/persist/fs/files.go
Comment on lines +1238 to +1240
if len(matched) == 0 {
return nil, nil
}
Collaborator

I think this if is redundant.

Collaborator Author

I didn't want to change the semantics of what is returned from this function. Previously, nil would be returned if filepath.Glob returned nil. Without this if check, we would be returning an empty slice, which is not the same thing as before.

Collaborator

findSortedFilesetFiles is a new function, so I guess there can be no existing semantics for it.
And for filesetFiles, there is still an if for this purpose:
https://github.com/m3db/m3/pull/3900/files#diff-78d9cf687193bca4cdc4ae73e54059f6a3b3ab4360a627c4bf26c070b8a0f909R1340

Collaborator Author

findSortedFilesetFiles replaced findFiles (which is still present), so its semantics were based on it, so that both functions could be used interchangeably if needed. In Go, returning nil slices is quite common and I don't see problems with this approach here.

Collaborator

But findFiles does not have such a check; it returns whatever is returned by filepath.Glob...

Collaborator Author
@soundvibe Nov 9, 2021

Exactly, and filepath.Glob returns nil if it finds nothing. So findSortedFilesetFiles should also return nil when it finds nothing. I could have checked if matched == nil, but since filepath.Glob never returns an empty slice, there is no difference here.

src/dbnode/persist/fs/files.go
@soundvibe soundvibe merged commit 59ea90c into master Nov 9, 2021
@soundvibe soundvibe deleted the linasn/find-files-improve-perf branch November 9, 2021 08:10