-
Notifications
You must be signed in to change notification settings - Fork 453
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[dbnode] Optimize filesetFiles
function
#3900
Conversation
Codecov Report
@@ Coverage Diff @@
## master #3900 +/- ##
========================================
- Coverage 57.0% 56.6% -0.5%
========================================
Files 553 553
Lines 63639 63275 -364
========================================
- Hits 36297 35827 -470
- Misses 24136 24238 +102
- Partials 3206 3210 +4
Flags with carried forward coverage won't be shown. Click here to find out more. Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍 Though would be good to get another review by someone else before merging
result := make([]filesetFile, len(matched)) | ||
for i, file := range matched { | ||
blockStart, volume, err := fn(file) | ||
if err != nil { | ||
return nil, err | ||
} | ||
|
||
result[i] = filesetFile{ | ||
fileName: file, | ||
blockStart: blockStart, | ||
volumeIndex: volume, | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
result := make([]filesetFile, len(matched)) | |
for i, file := range matched { | |
blockStart, volume, err := fn(file) | |
if err != nil { | |
return nil, err | |
} | |
result[i] = filesetFile{ | |
fileName: file, | |
blockStart: blockStart, | |
volumeIndex: volume, | |
} | |
} | |
result := make([]filesetFile, 0, len(matched)) | |
for _, file := range matched { | |
blockStart, volume, err := fn(file) | |
if err != nil { | |
return nil, err | |
} | |
result = append(result, filesetFile{ | |
fileName: file, | |
blockStart: blockStart, | |
volumeIndex: volume, | |
}) | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you care to share the benchmark code (would be enough to put it in PR description)?
I am surprised that the improvement is only 30%. What was the sample size for benchmarking?
So initially I was benchmarking with only 100 files so the performance gain was quite low. Tested with more files and most of the time it is ~2x improvement.
//old |
This is benchmarking creation of the files on disk. You need to call |
🤦🏻 Thanks for spotting it. Updated results:
Still ~2x improvement. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with some comments.
if len(matched) == 0 { | ||
return nil, nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this if
is redundant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't want to change semantics on what is returned from this function. Previously nil
would be returned if filepath.Glob
returns nil
. Without this if
check we would be returning empty slice which is not the same thing as before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
findSortedFilesetFiles
is a new function so I guess there can be no existing semantics for it.
And for the filesetFiles
, there is still an if
for this purpose:
https://github.com/m3db/m3/pull/3900/files#diff-78d9cf687193bca4cdc4ae73e54059f6a3b3ab4360a627c4bf26c070b8a0f909R1340
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
findSortedFilesetFiles
replaced findFiles
(which is still present) so its semantics were based on it so that these both functions could be used interchangeably if needed. In Go returning nil slices is quite common and I don't see problems with this approach here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But findFiles
does not have such check, it returns whatever is returned by filepath.Glob
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly, and filepath.Glob
returns nil
if it founds nothing. So findSortedFilesetFiles
should also return nil
when it founds nothing. I could have probably checked if matched == nil
but since filepath.Glob
never returns empty slice, there is no difference here.
What this PR does / why we need it:
For namespaces with long retentions huge CPU spikes could emerge during node bootstrap. Looking at the CPU profile, we can see that
filesetFiles
function is not very efficient.This PR optimizes
filesetFiles
function by reducing the amount of work it needs to do. It now collectsblockStart
andvolumeIndex
fields and later reuse them during sorting and further iterations (without a need to parse them again).I've written a small benchmark to measure the new implementation (new implementation is ~ 30% faster):
Special notes for your reviewer:
Does this PR introduce a user-facing and/or backwards incompatible change?:
Does this PR require updating code package or user-facing documentation?: