Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement commitlog rotation as well as writing out, cleaning up, and bootstrapping using snapshot metadata files #1384

Merged
merged 13 commits into from
Mar 2, 2019

Conversation

richardartoul
Copy link
Contributor

Summary

This P.R implements the following functionality:

  • Rotate commitlogs, generate snapshot ID, and write out snapshot metadata/checkpoint files for each snapshot.
  • Rewrite snapshot/commitlog cleanup logic to take new snapshotting process into account.
  • Most minimal possible changes to commitlog bootstrapping logic to make it work with the new snapshot metadata files.
  • Support a fully transparent upgrade for users (even those using the commitlog bootstrapper) by keeping the bootstrapping logic backwards-compatible with previous version of M3DB.
  • Removal of the concept of commitlog blocksize.

Review guide

Unfortunately its very difficult to make this P.R any smaller because as soon as writing out the snapshot metadata files is implemented, the cleanup logic needs to be implemented, and if the cleanup logic is implemented then the bootstrapping logic needs to be implemented as well.

Luckily, while this is a large P.R, there are much more deletions than additions, and the vast majority of the new code is a simplification of previously much more complicated code. I,E a lot of the code that was rewritten went from very complicated logic to very simple logic.

Places to pay particular attention to in the review:

  1. Cleanup logic in cleanup.go
  2. Commitlog rotation logic in commitlog.go (removed the concept of rotating files by time entirely, and commitlog files will only be rotated when an explicit call to RotateLogs() is made.
  3. Removal of the concept of a commitlog block size, and also while we retained the existing commitlog filename structure, all commitlog files will have a name that defaults to a blockStart of time.Unix(0, 0). Because the commitlog files have info headers and we don't need to rely on their names at all, I believe we can remove the blockStart from their name entirely, but I'd like to leave it out of this P.R as its already too big.
  4. Changes to the PersistManager to support writing out the snapshot ID and commitlog ID when the entire snapshot process completes in persist_manager.go
  5. Changes to the commitlog bootstrapping logic (mostly code deletion) that make it work for both the previous implementation and the current implementation. This is accomplished by simplifying the logic to: Read the most recent snapshot file for every namespace/shard/block combination and ALL commitlog files that exist on disk. This will be slow, but correct, and a subsequent P.R will make this significantly more efficient will still remaining backwards compatible with files generated from previous versions of M3DB (trying to make minimal possible changes to bootstrapping in this P.R to keep it from growing obscenely large). Also confidence that this is backwards compatible is maintained because I left the existing prop test unmodified which generates combinations of snapshots/commitlogs that look like they came from an older version of M3DB.
  6. The logic for rotating the commitlog, generating the snapshot ID, and performing the snapshot in flush.go

// change the commitlog filename structure (because we don't rely on the name
// for any information, we just list all the files in a directory and then
// read their encoded heads to obtain information about them), so in the future
// we can just get rid of this.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a TODO(v1): remove this when ok to be backwards incompatible

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is just a regular TODO because it is backwards compatible (existing code doesn't rely on filename). I just didn't want to make the change in this P.R

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing.

enc.encodeVarintFn(info.Start)
enc.encodeVarintFn(info.Duration)

// Deprecated, have to encode anyways for backwards compatibility, but we ignore the values.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a TODO(v1): remove when we can make backwards incompatible changes with an upgrade to v1

Copy link
Collaborator

@robskillington robskillington left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@richardartoul richardartoul force-pushed the ra/write-clean-snapshot-metadata branch from abd8d2c to d076698 Compare March 2, 2019 04:31
@codecov
Copy link

codecov bot commented Mar 2, 2019

Codecov Report

Merging #1384 into master will decrease coverage by 0.1%.
The diff coverage is 82.2%.

Impacted file tree graph

@@           Coverage Diff            @@
##           master   #1384     +/-   ##
========================================
- Coverage    70.8%   70.7%   -0.2%     
========================================
  Files         834     827      -7     
  Lines       71509   71194    -315     
========================================
- Hits        50684   50365    -319     
- Misses      17516   17525      +9     
+ Partials     3309    3304      -5
Flag Coverage Δ
#aggregator 82.3% <ø> (ø) ⬆️
#cluster 85.9% <ø> (ø) ⬆️
#collector 63.7% <ø> (ø) ⬆️
#dbnode 80.7% <82.2%> (-0.1%) ⬇️
#m3em 73.2% <ø> (ø) ⬆️
#m3ninx 74.2% <ø> (ø) ⬆️
#m3nsch 51.1% <ø> (ø) ⬆️
#metrics 17.6% <ø> (ø) ⬆️
#msg 74.9% <ø> (-0.2%) ⬇️
#query 65.2% <ø> (-0.6%) ⬇️
#x 76% <ø> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 56c0fdd...d076698. Read the comment docs.

@codecov
Copy link

codecov bot commented Mar 2, 2019

Codecov Report

Merging #1384 into master will decrease coverage by <.1%.
The diff coverage is 81.6%.

Impacted file tree graph

@@           Coverage Diff            @@
##           master   #1384     +/-   ##
========================================
- Coverage    70.8%   70.8%   -0.1%     
========================================
  Files         834     834             
  Lines       71509   71482     -27     
========================================
- Hits        50684   50649     -35     
- Misses      17516   17526     +10     
+ Partials     3309    3307      -2
Flag Coverage Δ
#aggregator 82.3% <ø> (ø) ⬆️
#cluster 85.8% <ø> (ø) ⬆️
#collector 63.7% <ø> (ø) ⬆️
#dbnode 80.7% <81.6%> (-0.1%) ⬇️
#m3em 73.2% <ø> (ø) ⬆️
#m3ninx 74.2% <ø> (ø) ⬆️
#m3nsch 51.1% <ø> (ø) ⬆️
#metrics 17.6% <ø> (ø) ⬆️
#msg 74.9% <ø> (-0.2%) ⬇️
#query 65.7% <ø> (-0.1%) ⬇️
#x 76% <ø> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 56c0fdd...d076698. Read the comment docs.

@richardartoul richardartoul merged commit 54e241a into master Mar 2, 2019
@justinjc justinjc deleted the ra/write-clean-snapshot-metadata branch June 17, 2019 21:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants