Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement commitlog rotation as well as writing out, cleaning up, and bootstrapping using snapshot metadata files #1170

Closed
wants to merge 10 commits into from

Conversation

richardartoul
Copy link
Contributor

@richardartoul richardartoul commented Nov 14, 2018

Moved to: #1384

@codecov
Copy link

codecov bot commented Nov 14, 2018

Codecov Report

Merging #1170 into master will increase coverage by <.1%.
The diff coverage is 83.3%.

Impacted file tree graph

@@           Coverage Diff            @@
##           master   #1170     +/-   ##
========================================
+ Coverage    70.7%   70.7%   +<.1%     
========================================
  Files         822     822             
  Lines       70248   70220     -28     
========================================
- Hits        49707   49689     -18     
+ Misses      17313   17301     -12     
- Partials     3228    3230      +2
Flag Coverage Δ
#aggregator 82.3% <ø> (-0.1%) ⬇️
#cluster 85.6% <ø> (ø) ⬆️
#collector 78.4% <ø> (ø) ⬆️
#dbnode 80.9% <83.3%> (ø) ⬆️
#m3em 73.2% <ø> (ø) ⬆️
#m3ninx 74.3% <ø> (ø) ⬆️
#m3nsch 51.1% <ø> (ø) ⬆️
#metrics 17.8% <ø> (ø) ⬆️
#msg 74.9% <ø> (ø) ⬆️
#query 64.2% <ø> (-0.1%) ⬇️
#x 76.2% <ø> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 024eb7c...694ac35. Read the comment docs.

@richardartoul richardartoul force-pushed the ra/write-clean-snapshot-metadata branch 3 times, most recently from a3d618a to b7560da Compare December 18, 2018 01:36
@richardartoul richardartoul changed the title [WIP - Dont Review] - Write and clean up snapshot metadata files Implement commitlog rotation as well as writing out, cleaning up, and bootstrapping using snapshot metadata files Dec 22, 2018
@richardartoul richardartoul force-pushed the ra/write-clean-snapshot-metadata branch from a501f3d to d719816 Compare January 12, 2019 20:24
src/dbnode/storage/cleanup.go Outdated Show resolved Hide resolved
return nil, err
// Assert that the snapshot metadata files are indeed sorted.
lastMetadataIndex := int64(-1)
for _, snapshotMetadata := range sortedSnapshotMetadatas {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this necessary? Seems like we should just trust the interface that we got sorted snapshots.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not necessary, but I've been trying to add more assertions like this throughout the codebase in situations where they are cheap to make our lives easier

"github.com/uber-go/tally"
)

type commitLogFilesFn func(commitlog.Options) ([]commitlog.File, []commitlog.ErrorWithPath, error)
type commitLogFilesFn func(commitlog.Options) (persist.CommitlogFiles, []commitlog.ErrorWithPath, error)
type sortedSnapshotMetadataFilesFn func(fs.Options) ([]fs.SnapshotMetadata, []fs.SnapshotMetadataErrorWithPaths, error)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need to get back the snapshot metadata files sorted? In the logic below, it seems necessary to just have the list of files and know which one is the most recent one, which seems more efficient.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm....thats fair, let me see if I can rework it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually I like the logic where its sorted, but maybe instead of checking if its sorted and assuming that in the interface I can just re-sort cause it should be cheap

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok updated, let me know what you think

src/dbnode/storage/flush.go Outdated Show resolved Hide resolved
}

// mark data flush finished
multiErr = multiErr.Add(flush.DoneData())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why change this logic? Seems like multiErr.Add handles the nil case for you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it just felt easier to follow to me if the error path is clearly delimited

src/dbnode/storage/flush.go Outdated Show resolved Hide resolved
return err
}

m.setState(flushManagerSnapshotInProgress)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you set back the state after the snapshot is done somewhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, it will just remain in that state until the index flush begins which will cause a state transition

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we verify that state is flushManagerSnapshotInProgress when the index flush starts? Or does that happen in the method that calls this one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We set it to flushManagerIndexFLushInProgress right before we do the index flush:

	m.setState(flushManagerIndexFlushInProgress)
	for _, ns := range namespaces {
		var (
			indexOpts    = ns.Options().IndexOptions()
			indexEnabled = indexOpts.Enabled()
		)
		if !indexEnabled {
			continue
		}
		multiErr = multiErr.Add(ns.FlushIndex(indexFlush))
	}
	multiErr = multiErr.Add(indexFlush.DoneIndex())

src/dbnode/storage/flush.go Outdated Show resolved Hide resolved
src/dbnode/storage/flush.go Outdated Show resolved Hide resolved
src/dbnode/storage/flush.go Outdated Show resolved Hide resolved
@richardartoul richardartoul force-pushed the ra/write-clean-snapshot-metadata branch from d719816 to 65ff94c Compare January 17, 2019 20:24
@richardartoul richardartoul force-pushed the ra/write-clean-snapshot-metadata branch 2 times, most recently from 79a84b7 to 694ac35 Compare February 6, 2019 14:41
// The commit log block size.
BlockSize time.Duration `yaml:"blockSize" validate:"nonzero"`
// Deprecated. Left in struct to keep old YAMLs parseable.
DeprecatedBlockSize *time.Duration `yaml:"blockSize"`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, good call - just wondering how we should track some of this stuff so we remove at v1.

Should we get some labels and/or test cases that can be discovered by scanning the repo easily?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about just: TODO(v1): remove

// improve and simplify the commitlog bootstrapping logic. This is fine
// because this integration test protects against performance regressions
// not correctness.
t.SkipNow()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok cool, can we open an issue for this and link to it from here? Just so we're tracking it as an issue too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

created a master task with all commitlog rotation / snapshotting remaining work: #1383

for ; ; newIndex++ {
var (
prefix = opts.FilesystemOptions().FilePathPrefix()
filePath = fs.CommitlogFilePath(prefix, time.Unix(0, 0), newIndex)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make time.Unix(0,0) a var at the top of this file and put a comment perhaps next to where you use it about why it's no longer required to be any value and can be zero?

var timeNone = time.Unix(0,0)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, done. In the future we can just remove it entirely which I'm pretty sure is backwards compatible (because we just list all the files in the directory then read their heads and ignore their filenames now) but I didn't want to tackle it in this P.R

logInfo.Start = dec.decodeVarint()
logInfo.Duration = dec.decodeVarint()

// Deprecated, have to decode anyways for backwards compatibility, but we ignore the values.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, we can remove the fields and just call dec.decodeVarint() twice here possibly. Thoughts on that approach instead? Not too bullish on either, but seems a little cleaner to remove the fields from the LogInfo struct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to keep them in because the structs actually make for good ad-hoc documentation of the file format on disk (we keep all the fields in order)

}

return false
return nil
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm should this not be return finalErr or just return?

Can we add a test that makes sure errors truly come back if the multiErr is not nil?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually handled by the defer statement above which will reset the finalErr var to any errors in the multiErr. I can change it to return finalErr to make it less confusing and I'll make sure we have a test

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I just verified existing logic was correct but I added a test case for it too

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants