Add support for putting commit log bootstrapper before peers #894

Merged: richardartoul merged 28 commits into master from ra/commitlog-peers-v2 on Sep 17, 2018

Contributor

@richardartoul richardartoul commented Sep 10, 2018

  • Add a new "uninitialized" bootstrapper which can be added to the end of the bootstrapping process and will only succeed bootstraps for uninitialized topologies
  • Add logic to the commit log bootstrapper to only announce that it can succeed a bootstrap if the requested shard is marked as "Available" for itself in the topology
  • General refactoring for code sharing
  • Update example config files to use bootstrappers: filesystem,commitlog,peers,uninitialized

I also tested the following flows on the test cluster (with bootstrapping configuration filesystem,commitlog,peers,uninitialized):

  • Turn off all nodes, create new topology, create new namespace, turn on all nodes and make sure they're able to start
  • Do a node remove and make sure bootstrap succeeds
  • Do a node add and make sure bootstrap succeeds
  • Roll the entire cluster and make sure each node is able to bootstrap

We will need to address #900 in a separate PR.

codecov bot commented Sep 10, 2018

Codecov Report

Merging #894 into master will increase coverage by <.01%.
The diff coverage is 67.53%.

@@            Coverage Diff             @@
##           master     #894      +/-   ##
==========================================
+ Coverage   78.67%   78.67%   +<.01%     
==========================================
  Files         396      399       +3     
  Lines       33533    33678     +145     
==========================================
+ Hits        26382    26497     +115     
- Misses       5354     5372      +18     
- Partials     1797     1809      +12
Flag Coverage Δ
#dbnode 81.41% <67.53%> (-0.03%) ⬇️
#m3ninx 71.93% <ø> (ø) ⬆️
#query 69.55% <ø> (+0.06%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 2883443...4ac6997.

@richardartoul richardartoul changed the title [WIP] - Add support for putting commit log bootstrapper before peers Add support for putting commit log bootstrapper before peers Sep 10, 2018
@richardartoul richardartoul requested review from prateek and robskillington and removed request for prateek September 10, 2018 21:59
kube/bundle.yaml Outdated
@@ -176,6 +176,7 @@ data:
bootstrappers:
- filesystem
- commitlog
- uninitialized
Collaborator

Ah much better name 👍

Collaborator

Do we want to be putting peers after commitlog in all these configurations?

Contributor Author

haha I thought you wouldn't like the name but I couldn't think of anything better, and yeah sure why not. It should all work now

@@ -195,6 +204,12 @@ func ValidateBootstrappersOrder(names []string) error {
bfs.FileSystemBootstrapperName,
peers.PeersBootstrapperName,
},
uninitialized.UninitializedBootstrapperName: []string{
Collaborator

Looks good. For peers bootstrapper, shouldn't the commit log bootstrapper name also appear in there (so it can appear after the commit log bootstrapper as well)?

Contributor Author

good call, will fix
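
For readers following the ordering discussion, here is a minimal, self-contained sketch of how a predecessor table like the one in ValidateBootstrappersOrder could be checked. The constants, the contents of the table, and the validateOrder helper are assumptions made for illustration; they are not the code from this PR.

```go
package main

import "fmt"

// Hypothetical name constants. In the real code these come from the
// individual bootstrapper packages (e.g. peers.PeersBootstrapperName).
const (
	fileSystem    = "filesystem"
	commitLog     = "commitlog"
	peers         = "peers"
	uninitialized = "uninitialized_topology"
)

// allowedPredecessors maps each bootstrapper to the bootstrappers that may
// appear before it. The exact contents are an assumption for this sketch.
var allowedPredecessors = map[string][]string{
	fileSystem:    {},
	commitLog:     {fileSystem, peers},
	peers:         {fileSystem, commitLog},
	uninitialized: {fileSystem, commitLog, peers},
}

// contains reports whether s is present in list.
func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

// validateOrder returns an error if any bootstrapper is unknown or appears
// after a bootstrapper that is not one of its allowed predecessors.
func validateOrder(names []string) error {
	seen := make([]string, 0, len(names))
	for _, name := range names {
		allowed, ok := allowedPredecessors[name]
		if !ok {
			return fmt.Errorf("unknown bootstrapper: %q", name)
		}
		for _, prev := range seen {
			if !contains(allowed, prev) {
				return fmt.Errorf("bootstrapper %q cannot appear after %q", name, prev)
			}
		}
		seen = append(seen, name)
	}
	return nil
}

func main() {
	fmt.Println(validateOrder([]string{fileSystem, commitLog, peers, uninitialized}))
	fmt.Println(validateOrder([]string{uninitialized, peers}))
}
```

With a table like this, listing commitlog in the predecessors of peers (as suggested above) is what allows the filesystem,commitlog,peers ordering to validate.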

@@ -63,7 +63,7 @@ db:
bootstrappers:
- filesystem
- commitlog
- noop-none
- uninitialized
Collaborator

Hm as before, do we want to be putting peers after commitlog in all these configurations?

Also very surprised that we don't have peers actually listed here by default.

Contributor Author

yeah sure thing. Probably because it was too confusing to explain it to people haha

// In the Initializing and Unknown states we have to assume that the commit log
// is missing data and can't satisfy the bootstrap request.
case shard.Initializing:
case shard.Unknown:
Collaborator

Should shard.Unknown be removed here perhaps and let this case drop through to the default case?

Contributor Author

yeah thats fair

Contributor Author

done
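
As a rough illustration of the outcome of this thread, the sketch below decides per shard state whether the commit log can claim a bootstrap request, with Unknown dropping through to the default case. The shardState type and the canCommitLogFulfill function are hypothetical stand-ins, and treating Leaving the same as Available is an assumption that the thread does not confirm.

```go
package main

import "fmt"

// shardState is a stand-in for the topology shard state type.
type shardState int

const (
	unknown shardState = iota
	initializing
	available
	leaving
)

// canCommitLogFulfill reports whether the local commit log can be assumed to
// hold all the data for a shard, based on this node's own state for it.
func canCommitLogFulfill(state shardState) (bool, error) {
	switch state {
	case available:
		// The node fully owns the shard, so its commit log is complete.
		return true, nil
	case leaving:
		// Assumption: a leaving shard was fully owned until now, so it is
		// treated like an available one.
		return true, nil
	case initializing:
		// The node is still receiving the shard, so the commit log may be
		// missing data and cannot satisfy the request.
		return false, nil
	default:
		// Unknown (or any future state) should never be part of a topology.
		return false, fmt.Errorf("unexpected shard state: %d", state)
	}
}

func main() {
	for _, s := range []shardState{available, leaving, initializing, unknown} {
		ok, err := canCommitLogFulfill(s)
		fmt.Println(s, ok, err)
	}
}
```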

iOpts := s.opts.CommitLogOptions().InstrumentOptions()
invariantLogger := instrument.EmitInvariantViolationAndGetLogger(iOpts)
invariantLogger.Errorf(
"Initial topology state does not contain shard state for origin node and shard: %d", shardIDUint)
Collaborator

nit: Lowercase the first letter of the log message for consistency?

Contributor Author

done

switch shardState {
case shard.Initializing:
numInitializing++
case shard.Unknown:
Collaborator

I don't think we should track Unknown either, it should never be a part of a topology.

Contributor Author

I'll do the same fallthrough thing

Contributor Author

done

// unfufilled time ranges from the set of shard time ranges.
-func (r ShardTimeRanges) ToUnfulfilledResult() DataBootstrapResult {
+func (r ShardTimeRanges) ToUnfulfilledDataResult() DataBootstrapResult {
Collaborator

+1

@@ -101,3 +106,41 @@ func (v TopologyView) Map() (topology.Map, error) {

return topology.NewStaticMap(opts), nil
}

// SourceAvailableHost is a human-friendly way of constructing
Collaborator

suggestion(take/leave): i constructed topologies using something like --

func TestFetchTaggedResultsAccumulatorAnyResponseShouldTerminateConsistencyLevelOneSimpleTopo(t *testing.T) {
	// rf=3, 30 shards total; three identical hosts
	topoMap := tu.MustNewTopologyMap(3, map[string][]shard.Shard{
		"testhost0": tu.ShardsRange(0, 29, shard.Available),
		"testhost1": tu.ShardsRange(0, 29, shard.Available),
		"testhost2": tu.ShardsRange(0, 29, shard.Available),
	})

I think something equivalent, e.g. tu.MustNewStateSnapshot(...), might read better at the call site than this method.

Contributor Author

Yeah this looks good, at this point I'd be re-writing a bunch of tests for a fairly superficial change (tests other than the ones I'm modifying in this P.R) so I'm gonna leave for now

Collaborator

It's more than superficial; it's about keeping the tests simpler to read and maintain. If you want to do it in another PR, mind opening an issue?
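
To make the readability argument concrete, here is a small self-contained sketch of the declarative style being suggested: a range helper plus a host-to-shards map. The testShard type and the shardsRange helper are hypothetical stand-ins written for this example and only mimic the shape of the tu.ShardsRange call quoted above.

```go
package main

import "fmt"

// shardState is a stand-in for the topology shard state type.
type shardState string

const (
	stateAvailable    shardState = "available"
	stateInitializing shardState = "initializing"
)

// testShard is a minimal stand-in for a shard in a test topology.
type testShard struct {
	id    int
	state shardState
}

// shardsRange returns shards with IDs from..to (inclusive), all in the given
// state. It returns nil when from is greater than to.
func shardsRange(from, to int, state shardState) []testShard {
	if from > to {
		return nil
	}
	shards := make([]testShard, 0, to-from+1)
	for id := from; id <= to; id++ {
		shards = append(shards, testShard{id: id, state: state})
	}
	return shards
}

func main() {
	// Two hosts fully own 30 shards; a third is still initializing them.
	topo := map[string][]testShard{
		"testhost0": shardsRange(0, 29, stateAvailable),
		"testhost1": shardsRange(0, 29, stateAvailable),
		"testhost2": shardsRange(0, 29, stateInitializing),
	}
	for host, shards := range topo {
		fmt.Println(host, len(shards), shards[0].state)
	}
}
```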

SetInstrumentOptions(value instrument.Options) Options

// Validate validates the options are correct.
Validate() error
Collaborator

nit: convention to have this as the first method in the Options() type.

Contributor Author

sure thing

InstrumentOptions() instrument.Options

// Set the instrument options.
SetInstrumentOptions(value instrument.Options) Options
Collaborator

nit: follow ordering of setter before getter for each type.

Contributor Author

will fix

"github.com/m3db/m3/src/dbnode/storage/bootstrap"
"github.com/m3db/m3/src/dbnode/storage/bootstrap/result"
"github.com/m3db/m3/src/dbnode/storage/namespace"
topotestutils "github.com/m3db/m3/src/dbnode/topology/testutil"
Collaborator

suggestion: tu instead of topotestutils

Contributor Author

sure


// The purpose of the unitializedSource is to succeed bootstraps for any
// shard/time-ranges if the given shard/namespace combination has never
// been completely initialized (is a new namespace). This is required for
Collaborator

nit: it's not per namespace, it's per cluster

Contributor Author

you're right, good call

@@ -40,7 +40,9 @@ import (
"github.com/m3db/m3/src/dbnode/persist/fs/commitlog"
"github.com/m3db/m3/src/dbnode/storage/bootstrap/result"
"github.com/m3db/m3/src/dbnode/storage/namespace"
topotestutils "github.com/m3db/m3/src/dbnode/topology/testutil"
Collaborator

same as other spot, tu would probably be more concise

Contributor Author

done

@@ -1425,6 +1421,72 @@ func (s commitLogSource) maybeAddToIndex(
return err
}

func (s *commitLogSource) availability(
Collaborator

nit: mind adding 1-2 lines explaining intent behind the method -- i.e. that anything not initialized is by definition not available

Contributor Author

done

default:
panic(
fmt.Sprintf("encountered unknown shard state: %s", shardState.String()))
// TODO(rartoul): Make this a hard error once we refactor the interface to support
Collaborator

+1


// NewOptions creates a new Options.
func NewOptions() Options {
return &options{}
Collaborator

hm why not initialise these to result.NewOptions() and instrument.NewOptions()?

asking because almost always NewOptions() returns something that's valid by default

Contributor Author

To avoid issues like we've had in the past where we can't find metrics because somewhere deep in the configuration pipeline/setup we forgot to set InstrumentOptions().

This seems like a reasonable compromise to make sure that we don't run into that type of stuff anymore.

Collaborator

Don't necessarily disagree but doing it in one place like this breaks from the convention everywhere else in the code.
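
The trade-off in this thread can be sketched as follows: a constructor that deliberately leaves required fields unset, paired with a Validate method that fails loudly if a caller forgot to set them. The types and names below are hypothetical stand-ins (for example, instrumentOptions stands in for instrument.Options) and are not the code from this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// instrumentOptions is a stand-in for instrument.Options.
type instrumentOptions struct {
	metricsPrefix string
}

// options holds the configuration for a hypothetical bootstrapper.
type options struct {
	iOpts *instrumentOptions
}

var errNoInstrumentOptions = errors.New("instrument options not set")

// newOptions intentionally returns options without defaults so that a
// forgotten setInstrumentOptions call is caught by validate.
func newOptions() *options {
	return &options{}
}

// setInstrumentOptions returns a copy of the options with iOpts set.
func (o *options) setInstrumentOptions(v *instrumentOptions) *options {
	opts := *o
	opts.iOpts = v
	return &opts
}

// validate returns an error if any required option is missing.
func (o *options) validate() error {
	if o.iOpts == nil {
		return errNoInstrumentOptions
	}
	return nil
}

func main() {
	opts := newOptions()
	fmt.Println(opts.validate())

	opts = opts.setInstrumentOptions(&instrumentOptions{metricsPrefix: "bootstrap"})
	fmt.Println(opts.validate())
}
```

The alternative raised in review, defaulting every field inside the constructor, makes validate unnecessary but also lets a missing setter call go unnoticed.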

}

// The basic idea for the algorithm is that on a shard-by-shard basis we
// need to determine if the namespace is "new" in the sense that it has
Collaborator

nit: instead of namespace is "new" --> cluster is "new"

Contributor Author

Thanks, not sure why I kept making that mistake. I guess it's easy to think of the topology as being per-namespace, but shards actually belong to the whole cluster, not to a namespace.

// in the topology are "available").
// In order to determine this, we simply count the number of hosts in the
// "initializing" state. If this number is larger than zero, than the
// namespace is "new".
Collaborator

nit: cluster

Contributor Author

fixed

}
}

shardHasNeverBeenCompletelyInitialized := numInitializing-numLeaving > 0
Collaborator

hm won't this break for a replace?

Contributor Author

discussed offline, should be fine for everything except a replication factor change. Will add a comment
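
To show why a replace does not trip this check while a replication factor change does, here is a self-contained sketch of the counting rule quoted above. The shardState type and the neverCompletelyInitialized function are hypothetical; only the expression numInitializing-numLeaving > 0 is taken from the diff.

```go
package main

import "fmt"

// shardState is a stand-in for the topology shard state type.
type shardState int

const (
	unknown shardState = iota
	initializing
	available
	leaving
)

// neverCompletelyInitialized takes the state of a single shard on every host
// that owns it and applies the numInitializing-numLeaving > 0 rule.
func neverCompletelyInitialized(states []shardState) (bool, error) {
	var numInitializing, numLeaving int
	for _, s := range states {
		switch s {
		case initializing:
			numInitializing++
		case leaving:
			numLeaving++
		case available:
			// Available replicas do not affect the count.
		default:
			return false, fmt.Errorf("unexpected shard state: %d", s)
		}
	}
	return numInitializing-numLeaving > 0, nil
}

func main() {
	cases := map[string][]shardState{
		"new cluster":  {initializing, initializing, initializing},
		"steady state": {available, available, available},
		// A replace pairs one leaving replica with one initializing replica.
		"node replace": {available, available, leaving, initializing},
		// Raising the replication factor adds an unpaired initializing
		// replica, so the rule misreports the shard as uninitialized.
		"RF increase": {available, available, available, initializing},
	}
	for name, states := range cases {
		result, err := neverCompletelyInitialized(states)
		fmt.Println(name, result, err)
	}
}
```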

@@ -68,6 +68,8 @@ data:
bootstrappers:
- filesystem
- commitlog
- peers
- uninitialized_topology
Collaborator

i hate it, but i hate all others more =P

}

// NewOptions creates a new Options.
func NewOptions() Options {
Collaborator

@prateek prateek Sep 17, 2018

Was tracing through uses of this type, don't think you need the resultOptions - it's unused afaict, and the instrumentOptions are only required because of the logger, which you can avoid once the interface change is done to allow error propagation back up. Maybe leave a note to delete the type in the future once it's unused?

Collaborator

Seems like the base bootstrapper uses it for some things in the provider.

prateek previously approved these changes Sep 17, 2018
Collaborator

@prateek prateek left a comment

LGTM

Collaborator

@robskillington robskillington left a comment

LGTM

Collaborator

@prateek prateek left a comment

LGTM

@richardartoul richardartoul merged commit 93e770d into master Sep 17, 2018
@prateek prateek deleted the ra/commitlog-peers-v2 branch September 29, 2018 18:08