Implement commitlog rotation as well as writing out, cleaning up, and bootstrapping using snapshot metadata files #1170

richardartoul · 2018-11-14T16:55:45Z

Moved to: #1384

codecov · 2018-11-14T17:04:58Z

Codecov Report

Merging #1170 into master will increase coverage by <.1%.
The diff coverage is 83.3%.

@@           Coverage Diff            @@
##           master   #1170     +/-   ##
========================================
+ Coverage    70.7%   70.7%   +<.1%     
========================================
  Files         822     822             
  Lines       70248   70220     -28     
========================================
- Hits        49707   49689     -18     
+ Misses      17313   17301     -12     
- Partials     3228    3230      +2

| Flag | Coverage Δ | |
| --- | --- | --- |
| #aggregator | `82.3% <ø> (-0.1%)` | ⬇️ |
| #cluster | `85.6% <ø> (ø)` | ⬆️ |
| #collector | `78.4% <ø> (ø)` | ⬆️ |
| #dbnode | `80.9% <83.3%> (ø)` | ⬆️ |
| #m3em | `73.2% <ø> (ø)` | ⬆️ |
| #m3ninx | `74.3% <ø> (ø)` | ⬆️ |
| #m3nsch | `51.1% <ø> (ø)` | ⬆️ |
| #metrics | `17.8% <ø> (ø)` | ⬆️ |
| #msg | `74.9% <ø> (ø)` | ⬆️ |
| #query | `64.2% <ø> (-0.1%)` | ⬇️ |
| #x | `76.2% <ø> (ø)` | ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 024eb7c...694ac35.

justinjc · 2019-01-15T20:21:11Z

src/dbnode/storage/cleanup.go

-		return nil, err
+	// Assert that the snapshot metadata files are indeed sorted.
+	lastMetadataIndex := int64(-1)
+	for _, snapshotMetadata := range sortedSnapshotMetadatas {


Is this necessary? Seems like we should just trust the interface that we got sorted snapshots.

Not necessary, but I've been trying to add more assertions like this throughout the codebase wherever they are cheap, to make our lives easier.

justinjc · 2019-01-15T20:28:57Z

src/dbnode/storage/cleanup.go

 	"github.com/uber-go/tally"
 )

-type commitLogFilesFn func(commitlog.Options) ([]commitlog.File, []commitlog.ErrorWithPath, error)
+type commitLogFilesFn func(commitlog.Options) (persist.CommitlogFiles, []commitlog.ErrorWithPath, error)
+type sortedSnapshotMetadataFilesFn func(fs.Options) ([]fs.SnapshotMetadata, []fs.SnapshotMetadataErrorWithPaths, error)


Do we really need to get the snapshot metadata files back sorted? In the logic below, it seems we only need the list of files and to know which one is the most recent, which seems more efficient.

Hmm, that's fair, let me see if I can rework it.

Actually, I like the logic where it's sorted, but instead of checking that it's sorted and assuming that in the interface, I can just re-sort, because that should be cheap.

ok updated, let me know what you think
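
For readers following the thread, here is a minimal sketch of the "just re-sort" approach agreed on above. The `snapshotMetadata` type and both helper names are hypothetical stand-ins (the real `fs.SnapshotMetadata` carries more fields), so treat this as an illustration under those assumptions rather than the code that was merged.

```go
package main

import (
	"fmt"
	"sort"
)

// snapshotMetadata is a hypothetical stand-in for fs.SnapshotMetadata;
// only the index matters for this sketch.
type snapshotMetadata struct {
	Index int64
}

// sortByIndex returns a sorted copy instead of trusting (or asserting)
// that the caller already provided the files in order.
func sortByIndex(in []snapshotMetadata) []snapshotMetadata {
	out := make([]snapshotMetadata, len(in))
	copy(out, in)
	sort.Slice(out, func(i, j int) bool { return out[i].Index < out[j].Index })
	return out
}

// mostRecent returns the metadata with the highest index, if there is one.
func mostRecent(in []snapshotMetadata) (snapshotMetadata, bool) {
	if len(in) == 0 {
		return snapshotMetadata{}, false
	}
	sorted := sortByIndex(in)
	return sorted[len(sorted)-1], true
}

func main() {
	files := []snapshotMetadata{{Index: 2}, {Index: 0}, {Index: 1}}
	latest, ok := mostRecent(files)
	fmt.Println(latest.Index, ok)
}
```

Sorting a copy keeps the helper free of side effects on the caller's slice; for the handful of metadata files typically on disk the extra allocation should be negligible.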

justinjc · 2019-01-15T20:44:36Z

src/dbnode/storage/flush.go

 	}

-	// mark data flush finished
-	multiErr = multiErr.Add(flush.DoneData())


Why change this logic? Seems like multiErr.Add handles the nil case for you.

it just felt easier to follow to me if the error path is clearly delimited

justinjc · 2019-01-15T21:04:47Z

src/dbnode/storage/flush.go

+		return err
+	}
+
+	m.setState(flushManagerSnapshotInProgress)


Do you set back the state after the snapshot is done somewhere?

No, it just remains in that state until the index flush begins, which causes a state transition.

Do we verify that state is flushManagerSnapshotInProgress when the index flush starts? Or does that happen in the method that calls this one?

We set it to flushManagerIndexFlushInProgress right before we do the index flush:

	m.setState(flushManagerIndexFlushInProgress)
	for _, ns := range namespaces {
		var (
			indexOpts    = ns.Options().IndexOptions()
			indexEnabled = indexOpts.Enabled()
		)
		if !indexEnabled {
			continue
		}
		multiErr = multiErr.Add(ns.FlushIndex(indexFlush))
	}
	multiErr = multiErr.Add(indexFlush.DoneIndex())
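
The state transitions discussed in this thread can be sketched as a small state machine. Everything here is a simplified assumption: the `flushManagerIdle` state, the deferred reset, and the callback-based `flush` signature are illustrative and are not taken from the PR.

```go
package main

import (
	"fmt"
	"sync"
)

type flushManagerState int

const (
	flushManagerIdle flushManagerState = iota // assumed resting state
	flushManagerSnapshotInProgress
	flushManagerIndexFlushInProgress
)

func (s flushManagerState) String() string {
	switch s {
	case flushManagerSnapshotInProgress:
		return "snapshot-in-progress"
	case flushManagerIndexFlushInProgress:
		return "index-flush-in-progress"
	default:
		return "idle"
	}
}

type flushManager struct {
	sync.RWMutex
	state flushManagerState
}

func (m *flushManager) setState(s flushManagerState) {
	m.Lock()
	m.state = s
	m.Unlock()
}

func (m *flushManager) getState() flushManagerState {
	m.RLock()
	defer m.RUnlock()
	return m.state
}

// flush runs the snapshot and then the index flush. The state is not
// reset between the two phases; the next phase simply overwrites it.
func (m *flushManager) flush(snapshot, indexFlush func() error) error {
	defer m.setState(flushManagerIdle) // assumption: reset once everything is done

	m.setState(flushManagerSnapshotInProgress)
	if err := snapshot(); err != nil {
		return err
	}

	m.setState(flushManagerIndexFlushInProgress)
	return indexFlush()
}

func main() {
	m := &flushManager{}
	report := func() error {
		fmt.Println(m.getState())
		return nil
	}
	if err := m.flush(report, report); err != nil {
		fmt.Println("flush failed:", err)
	}
	fmt.Println(m.getState())
}
```

Because each phase overwrites the previous state, there is no window in which the manager reports idle between the snapshot and the index flush.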

src/dbnode/generated/proto/snapshot/snapshot_metadata.proto

robskillington · 2019-02-15T14:03:43Z

src/cmd/services/m3dbnode/config/config.go

-	// The commit log block size.
-	BlockSize time.Duration `yaml:"blockSize" validate:"nonzero"`
+	// Deprecated. Left in struct to keep old YAMLs parseable.
+	DeprecatedBlockSize *time.Duration `yaml:"blockSize"`


Interesting, good call - just wondering how we should track some of this stuff so we remove at v1.

Should we get some labels and/or test cases that can be discovered by scanning the repo easily?

how about just: TODO(v1): remove

robskillington · 2019-02-15T14:04:33Z

src/dbnode/integration/commitlog_bootstrap_only_reads_required_files_test.go

+	// improve and simplify the commitlog bootstrapping logic. This is fine
+	// because this integration test protects against performance regressions
+	// not correctness.
+	t.SkipNow()


Ok cool, can we open an issue for this and link to it from here? Just so we're tracking it as an issue too?

created a master task with all commitlog rotation / snapshotting remaining work: #1383

src/dbnode/persist/fs/commitlog/commit_log.go

robskillington · 2019-02-15T14:10:08Z

src/dbnode/persist/fs/commitlog/files.go

+	for ; ; newIndex++ {
+		var (
+			prefix   = opts.FilesystemOptions().FilePathPrefix()
+			filePath = fs.CommitlogFilePath(prefix, time.Unix(0, 0), newIndex)


Can you make time.Unix(0,0) a var at the top of this file and put a comment perhaps next to where you use it about why it's no longer required to be any value and can be zero?

var timeNone = time.Unix(0,0)

Sure, done. In the future we can just remove it entirely, which I'm pretty sure is backwards compatible (because we now just list all the files in the directory, then read their heads and ignore their filenames), but I didn't want to tackle it in this PR.

src/dbnode/persist/fs/files.go

robskillington · 2019-02-15T14:14:44Z

src/dbnode/persist/fs/msgpack/decoder.go

-	logInfo.Start = dec.decodeVarint()
-	logInfo.Duration = dec.decodeVarint()
+
+	// Deprecated, have to decode anyways for backwards compatibility, but we ignore the values.


Hm, we can remove the fields and just call dec.decodeVarint() twice here possibly. Thoughts on that approach instead? Not too bullish on either, but seems a little cleaner to remove the fields from the LogInfo struct.

I'd like to keep them in because the structs actually make for good ad-hoc documentation of the file format on disk (we keep all the fields in order)
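
A rough sketch of the "keep the fields as documentation" position follows. It uses `encoding/binary` varints purely as a stand-in for the msgpack encoding in the real decoder, and the struct and field names are hypothetical; the point is only that deprecated values must still be read so that later fields line up.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// logInfo mirrors the on-disk field order, so the struct doubles as
// documentation of the format. Field names are hypothetical.
type logInfo struct {
	DeprecatedStart    int64 // still on disk; callers should ignore it
	DeprecatedDuration int64 // still on disk; callers should ignore it
	Index              int64
}

// encodeLogInfo writes the three fields as consecutive varints.
func encodeLogInfo(start, duration, index int64) []byte {
	buf := make([]byte, 0, 3*binary.MaxVarintLen64)
	tmp := make([]byte, binary.MaxVarintLen64)
	for _, v := range []int64{start, duration, index} {
		n := binary.PutVarint(tmp, v)
		buf = append(buf, tmp[:n]...)
	}
	return buf
}

// decodeLogInfo must consume the deprecated values even though nothing
// uses them, otherwise Index would be read from the wrong offset.
func decodeLogInfo(data []byte) (logInfo, error) {
	var (
		r    = bytes.NewReader(data)
		info logInfo
		err  error
	)
	if info.DeprecatedStart, err = binary.ReadVarint(r); err != nil {
		return logInfo{}, err
	}
	if info.DeprecatedDuration, err = binary.ReadVarint(r); err != nil {
		return logInfo{}, err
	}
	if info.Index, err = binary.ReadVarint(r); err != nil {
		return logInfo{}, err
	}
	return info, nil
}

func main() {
	data := encodeLogInfo(1542214545, 900, 7)
	info, err := decodeLogInfo(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(info.Index)
}
```

The alternative raised in the review (dropping the fields and calling the decode function twice while discarding the results) would behave identically on disk; the difference is whether the struct keeps recording the format.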

src/dbnode/persist/types.go

robskillington · 2019-02-15T14:35:28Z

src/dbnode/storage/cleanup.go

 	}

-	return false
+	return nil


Hm should this not be return finalErr or just return?

Can we add a test that makes sure errors truly come back if the multiErr is not nil?

This is actually handled by the defer statement above, which resets the finalErr var to any errors in the multiErr. I can change it to return finalErr to make it less confusing, and I'll make sure we have a test.

Ok I just verified existing logic was correct but I added a test case for it too

richardartoul force-pushed the ra/write-clean-snapshot-metadata branch from d7edc16 to 5327dca Compare November 20, 2018 22:06

richardartoul force-pushed the ra/write-clean-snapshot-metadata branch 3 times, most recently from a3d618a to b7560da Compare December 18, 2018 01:36

richardartoul changed the title ~~[WIP - Dont Review] - Write and clean up snapshot metadata files~~ Implement commitlog rotation as well as writing out, cleaning up, and bootstrapping using snapshot metadata files Dec 22, 2018

richardartoul requested review from robskillington and justinjc December 22, 2018 00:55

richardartoul added PR: Awaiting Review [db] PR: Review Priority Medium labels Dec 22, 2018

richardartoul force-pushed the ra/write-clean-snapshot-metadata branch from 5052c4e to a501f3d Compare December 23, 2018 20:59

richardartoul added PR: Review Priority High and removed PR: Review Priority Medium labels Jan 12, 2019

richardartoul force-pushed the ra/write-clean-snapshot-metadata branch from a501f3d to d719816 Compare January 12, 2019 20:24

justinjc reviewed Jan 15, 2019

richardartoul force-pushed the ra/write-clean-snapshot-metadata branch from d719816 to 65ff94c Compare January 17, 2019 20:24

richardartoul force-pushed the ra/write-clean-snapshot-metadata branch 2 times, most recently from 79a84b7 to 694ac35 Compare February 6, 2019 14:41