-
Notifications
You must be signed in to change notification settings - Fork 453
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for putting commit log bootstrapper before peers #894
Conversation
Codecov Report
@@ Coverage Diff @@
## master #894 +/- ##
==========================================
+ Coverage 78.67% 78.67% +<.01%
==========================================
Files 396 399 +3
Lines 33533 33678 +145
==========================================
+ Hits 26382 26497 +115
- Misses 5354 5372 +18
- Partials 1797 1809 +12
Continue to review full report at Codecov.
|
3a38f69
to
bfb9413
Compare
kube/bundle.yaml
Outdated
@@ -176,6 +176,7 @@ data: | |||
bootstrappers: | |||
- filesystem | |||
- commitlog | |||
- uninitialized |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah much better name 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to be putting peers after commitlog
in all these configurations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
haha I thought you wouldn't like the name but I couldn't think of anything better, and yeah sure why not. It should all work now
@@ -195,6 +204,12 @@ func ValidateBootstrappersOrder(names []string) error { | |||
bfs.FileSystemBootstrapperName, | |||
peers.PeersBootstrapperName, | |||
}, | |||
uninitialized.UninitializedBootstrapperName: []string{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. For peers bootstrapper, shouldn't the commit log bootstrapper name also appear in there (so it can appear after the commit log bootstrapper as well)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good call, will fix
@@ -63,7 +63,7 @@ db: | |||
bootstrappers: | |||
- filesystem | |||
- commitlog | |||
- noop-none | |||
- uninitialized |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm as before, do we want to be putting peers after commitlog
in all these configurations?
Also very surprised that we don't have peers actually listed here by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah sure thing. Probably because it was too confusing to explain it to people haha
2883443
to
2cf04be
Compare
b680f94
to
93b0eef
Compare
…hing if node is Available or Leaving for that shard and add uninitialized source
93b0eef
to
4ac6997
Compare
// In the Initializing and Unknown states we have to assume that the commit log | ||
// is missing data and can't satisfy the bootstrap request. | ||
case shard.Initializing: | ||
case shard.Unknown: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should shard.Unknown
be removed here perhaps and let this case drop through to the default
case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah thats fair
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
iOpts := s.opts.CommitLogOptions().InstrumentOptions() | ||
invariantLogger := instrument.EmitInvariantViolationAndGetLogger(iOpts) | ||
invariantLogger.Errorf( | ||
"Initial topology state does not contain shard state for origin node and shard: %d", shardIDUint) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Lowercase the first letter of the log message for consistency?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
switch shardState { | ||
case shard.Initializing: | ||
numInitializing++ | ||
case shard.Unknown: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should track Unknown either, it should never be a part of a topology.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll do the same fallthrough thing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
// unfufilled time ranges from the set of shard time ranges. | ||
func (r ShardTimeRanges) ToUnfulfilledResult() DataBootstrapResult { | ||
func (r ShardTimeRanges) ToUnfulfilledDataResult() DataBootstrapResult { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
@@ -101,3 +106,41 @@ func (v TopologyView) Map() (topology.Map, error) { | |||
|
|||
return topology.NewStaticMap(opts), nil | |||
} | |||
|
|||
// SourceAvailableHost is a human-friendly way of constructing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion(take/leave): i constructed topologies using something like --
func TestFetchTaggedResultsAccumulatorAnyResponseShouldTerminateConsistencyLevelOneSimpleTopo(t *testing.T) {
// rf=3, 30 shards total; three identical hosts
topoMap := tu.MustNewTopologyMap(3, map[string][]shard.Shard{
"testhost0": tu.ShardsRange(0, 29, shard.Available),
"testhost1": tu.ShardsRange(0, 29, shard.Available),
"testhost2": tu.ShardsRange(0, 29, shard.Available),
})
I think something equivalent tu.MustNewStateSnapshot(...)
might read better when using than this method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this looks good, at this point I'd be re-writing a bunch of tests for a fairly superficial change (tests other than the ones I'm modifying in this P.R) so I'm gonna leave for now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More than the superficial, it's about keeping the tests simpler to read/maintain. If you wanna do it in another PR, mind opening an issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SetInstrumentOptions(value instrument.Options) Options | ||
|
||
// Validate validates the options are correct. | ||
Validate() error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: convention to have this as the first method in the Options() type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure thing
InstrumentOptions() instrument.Options | ||
|
||
// Set the instrument options. | ||
SetInstrumentOptions(value instrument.Options) Options |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: follow ordering of setter before getter for each type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will fix
"github.com/m3db/m3/src/dbnode/storage/bootstrap" | ||
"github.com/m3db/m3/src/dbnode/storage/bootstrap/result" | ||
"github.com/m3db/m3/src/dbnode/storage/namespace" | ||
topotestutils "github.com/m3db/m3/src/dbnode/topology/testutil" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: tu
instead of topotestutils
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
|
||
// The purpose of the unitializedSource is to succeed bootstraps for any | ||
// shard/time-ranges if the given shard/namespace combination has never | ||
// been completely initialized (is a new namespace). This is required for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: it's not per namespace, it's per cluster
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you're right, good call
@@ -40,7 +40,9 @@ import ( | |||
"github.com/m3db/m3/src/dbnode/persist/fs/commitlog" | |||
"github.com/m3db/m3/src/dbnode/storage/bootstrap/result" | |||
"github.com/m3db/m3/src/dbnode/storage/namespace" | |||
topotestutils "github.com/m3db/m3/src/dbnode/topology/testutil" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as other spot, tu
would probably be more concise
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -1425,6 +1421,72 @@ func (s commitLogSource) maybeAddToIndex( | |||
return err | |||
} | |||
|
|||
func (s *commitLogSource) availability( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: mind adding 1-2 lines explaining intent behind the method -- i.e. that anything not initialized is by definition not available
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
default: | ||
panic( | ||
fmt.Sprintf("encountered unknown shard state: %s", shardState.String())) | ||
// TODO(rartoul): Make this a hard error once we refactor the interface to support |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
|
||
// NewOptions creates a new Options. | ||
func NewOptions() Options { | ||
return &options{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm why not initialise these to result.NewOptions()
and instrument.NewOptions()
?
asking because almost always NewOptions()
returns something that's valid by default
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To avoid issues like we've had in the past where we cant find metrics because (somewhere) in the deep configuration pipeline / setup we forgot to set InstrumentOptions()
This seems like a reasonable compromise to make sure that we don't run into that type of stuff anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't necessarily disagree but doing it in one place like this breaks from the convention everywhere else in the code.
} | ||
|
||
// The basic idea for the algorithm is that on a shard-by-shard basis we | ||
// need to determine if the namespace is "new" in the sense that it has |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: instead of namespace is "new"
--> cluster is "new"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, not sure why I kept making that mistake. I guess its easy to think of the topology as being per-namespace but actually shards are for the whole cluster not a namespace
// in the topology are "available"). | ||
// In order to determine this, we simply count the number of hosts in the | ||
// "initializing" state. If this number is larger than zero, than the | ||
// namespace is "new". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: cluster
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
} | ||
} | ||
|
||
shardHasNeverBeenCompletelyInitialized := numInitializing-numLeaving > 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm won't this break for a replace?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
discussed offline, should be fine for everything except a replication factor change. Will add a comment
@@ -68,6 +68,8 @@ data: | |||
bootstrappers: | |||
- filesystem | |||
- commitlog | |||
- peers | |||
- uninitialized_topology |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i hate it, but i hate all others more =P
} | ||
|
||
// NewOptions creates a new Options. | ||
func NewOptions() Options { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was tracing through uses of this type, don't think you need the resultOptions - it's unused afaict, and the instrumentOptions are only required because of the logger, which you can avoid once the interface change is done to allow error propagation back up. Maybe leave a note to delete the type in the future once it's unused?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like the base bootstrapper uses it for some things in the provider.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I also tested the following flows on the test cluster (with bootstrapping configuration filesystem,commitlog,peers,uninitialized):
We will need to address #900 in a separate P.R