-
Notifications
You must be signed in to change notification settings - Fork 453
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement commitlog rotation as well as writing out, cleaning up, and bootstrapping using snapshot metadata files #1170
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1170 +/- ##
========================================
+ Coverage 70.7% 70.7% +<.1%
========================================
Files 822 822
Lines 70248 70220 -28
========================================
- Hits 49707 49689 -18
+ Misses 17313 17301 -12
- Partials 3228 3230 +2
Continue to review full report at Codecov.
|
d7edc16
to
5327dca
Compare
a3d618a
to
b7560da
Compare
5052c4e
to
a501f3d
Compare
a501f3d
to
d719816
Compare
return nil, err | ||
// Assert that the snapshot metadata files are indeed sorted. | ||
lastMetadataIndex := int64(-1) | ||
for _, snapshotMetadata := range sortedSnapshotMetadatas { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this necessary? Seems like we should just trust the interface that we got sorted snapshots.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not necessary, but I've been trying to add more assertions like this throughout the codebase in situations where they are cheap to make our lives easier
src/dbnode/storage/cleanup.go
Outdated
"github.com/uber-go/tally" | ||
) | ||
|
||
type commitLogFilesFn func(commitlog.Options) ([]commitlog.File, []commitlog.ErrorWithPath, error) | ||
type commitLogFilesFn func(commitlog.Options) (persist.CommitlogFiles, []commitlog.ErrorWithPath, error) | ||
type sortedSnapshotMetadataFilesFn func(fs.Options) ([]fs.SnapshotMetadata, []fs.SnapshotMetadataErrorWithPaths, error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need to get back the snapshot metadata files sorted? In the logic below, it seems necessary to just have the list of files and know which one is the most recent one, which seems more efficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm....thats fair, let me see if I can rework it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually I like the logic where its sorted, but maybe instead of checking if its sorted and assuming that in the interface I can just re-sort cause it should be cheap
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok updated, let me know what you think
} | ||
|
||
// mark data flush finished | ||
multiErr = multiErr.Add(flush.DoneData()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why change this logic? Seems like multiErr.Add
handles the nil case for you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it just felt easier to follow to me if the error path is clearly delimited
return err | ||
} | ||
|
||
m.setState(flushManagerSnapshotInProgress) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you set back the state after the snapshot is done somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, it will just remain in that state until the index flush begins which will cause a state transition
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we verify that state is flushManagerSnapshotInProgress
when the index flush starts? Or does that happen in the method that calls this one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We set it to flushManagerIndexFLushInProgress
right before we do the index flush:
m.setState(flushManagerIndexFlushInProgress)
for _, ns := range namespaces {
var (
indexOpts = ns.Options().IndexOptions()
indexEnabled = indexOpts.Enabled()
)
if !indexEnabled {
continue
}
multiErr = multiErr.Add(ns.FlushIndex(indexFlush))
}
multiErr = multiErr.Add(indexFlush.DoneIndex())
d719816
to
65ff94c
Compare
79a84b7
to
694ac35
Compare
// The commit log block size. | ||
BlockSize time.Duration `yaml:"blockSize" validate:"nonzero"` | ||
// Deprecated. Left in struct to keep old YAMLs parseable. | ||
DeprecatedBlockSize *time.Duration `yaml:"blockSize"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, good call - just wondering how we should track some of this stuff so we remove at v1.
Should we get some labels and/or test cases that can be discovered by scanning the repo easily?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about just: TODO(v1): remove
// improve and simplify the commitlog bootstrapping logic. This is fine | ||
// because this integration test protects against performance regressions | ||
// not correctness. | ||
t.SkipNow() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok cool, can we open an issue for this and link to it from here? Just so we're tracking it as an issue too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
created a master task with all commitlog rotation / snapshotting remaining work: #1383
for ; ; newIndex++ { | ||
var ( | ||
prefix = opts.FilesystemOptions().FilePathPrefix() | ||
filePath = fs.CommitlogFilePath(prefix, time.Unix(0, 0), newIndex) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you make time.Unix(0,0)
a var at the top of this file and put a comment perhaps next to where you use it about why it's no longer required to be any value and can be zero?
var timeNone = time.Unix(0,0)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, done. In the future we can just remove it entirely which I'm pretty sure is backwards compatible (because we just list all the files in the directory then read their heads and ignore their filenames now) but I didn't want to tackle it in this P.R
logInfo.Start = dec.decodeVarint() | ||
logInfo.Duration = dec.decodeVarint() | ||
|
||
// Deprecated, have to decode anyways for backwards compatibility, but we ignore the values. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, we can remove the fields and just call dec.decodeVarint()
twice here possibly. Thoughts on that approach instead? Not too bullish on either, but seems a little cleaner to remove the fields from the LogInfo struct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to keep them in because the structs actually make for good ad-hoc documentation of the file format on disk (we keep all the fields in order)
src/dbnode/storage/cleanup.go
Outdated
} | ||
|
||
return false | ||
return nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm should this not be return finalErr
or just return
?
Can we add a test that makes sure errors truly come back if the multiErr
is not nil?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually handled by the defer
statement above which will reset the finalErr
var to any errors in the multiErr
. I can change it to return finalErr
to make it less confusing and I'll make sure we have a test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I just verified existing logic was correct but I added a test case for it too
694ac35
to
9f03967
Compare
Moved to: #1384