Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

raft: use RawNode for node's event loop; clean up bootstrap #10892

Merged
merged 4 commits into from
Jul 19, 2019

Conversation

tbg
Copy link
Contributor

@tbg tbg commented Jul 15, 2019

raft: use RawNode for node's event loop

It has always bugged me that any new feature essentially needed to be
tested twice due to the two ways in which apps can use raft (`*node` and
`*RawNode`). Due to upcoming testing work for joint consensus, now is a
good time to rectify this somewhat.

This commit removes most logic from `(*node).run` and uses `*RawNode`
internally. This simplifies the logic and also lead (via debugging) to some
insight on how the semantics of the approaches differ, which is now
documented in the comments.

raft: clean up bootstrap

This is the first (maybe not last) step in cleaning up the bootstrap code
around StartNode.

Initializing a Raft group for the first time is awkward, since a
configuration has to be pulled from thin air. The way this is solved today
is unclean: The app is supposed to pass peers to StartNode(), we add
configuration changes for them to the log, immediately pretend that they
are applied, but actually leave them unapplied (to give the app a chance to
observe them, though if the app did decide to not apply them things would
really go off the rails), and then return control to the app. The app will
then process the initial Readys and as a result the configuration will be
persisted to disk; restarts of the node then use RestartNode which doesn't
take any peers.

The code that did this lived awkwardly in two places fairly deep down the
callstack, though it was really only necessary in StartNode(). This commit
refactors things to make this more obvious: only StartNode does this dance
now. In particular, RawNode does not support this at all any more; it
expects the app to set up its Storage correctly.

Future work may provide helpers to make this "preseeding" of the Storage
more user-friendly. It isn't entirely straightforward to do so since the
Storage interface doesn't provide the right accessors for this purpose.
Briefly speaking, we want to make sure that a non-bootstrapped node can
never catch up via the log so that we can implicitly use one of the
"skipped" log entries to represent the configuration change into the
bootstrap configuration. This is an invasive change that affects all
consumers of raft, and it is of lower urgency since the code (post this
commit) already encapsulates the complexity sufficiently.

@tbg tbg force-pushed the rawnode-everywhere-attempt3 branch from 4d598d8 to ecf60c2 Compare July 16, 2019 09:08
@tbg tbg requested a review from bdarnell July 18, 2019 08:30
@tbg tbg force-pushed the rawnode-everywhere-attempt3 branch from ecf60c2 to 069125b Compare July 18, 2019 11:42
@bdarnell
Copy link
Contributor

The first commit LGTM. I agree in spirit with the second commit, but worry about backwards compatibility. For better or worse, RawNode has supported the same auto-initialization as Node, so someone other than CockroachDB could be depending on it (at least the signature change on NewRawNode will break their build so the failure can't pass silently).

I think that since the first commit reduces the duplication so the bootstrapping only lives in one place, it's probably best if that one place continues to be in RawNode instead of StartNode, at least until the anticipated helpers are available.

tbg added 3 commits July 19, 2019 09:59
It has always bugged me that any new feature essentially needed to be
tested twice due to the two ways in which apps can use raft (`*node` and
`*RawNode`). Due to upcoming testing work for joint consensus, now is a
good time to rectify this somewhat.

This commit removes most logic from `(*node).run` and uses `*RawNode`
internally. This simplifies the logic and also lead (via debugging) to
some insight on how the semantics of the approaches differ, which is now
documented in the comments.
This is the first (maybe not last) step in cleaning up the bootstrap
code around StartNode.

Initializing a Raft group for the first time is awkward, since a
configuration has to be pulled from thin air. The way this is solved
today is unclean: The app is supposed to pass peers to StartNode(),
we add configuration changes for them to the log, immediately pretend
that they are applied, but actually leave them unapplied (to give the
app a chance to observe them, though if the app did decide to not apply
them things would really go off the rails), and then return control to
the app. The app will then process the initial Readys and as a result
the configuration will be persisted to disk; restarts of the node then
use RestartNode which doesn't take any peers.

The code that did this lived awkwardly in two places fairly deep down
the callstack, though it was really only necessary in StartNode(). This
commit refactors things to make this more obvious: only StartNode does
this dance now. In particular, RawNode does not support this at all any
more; it expects the app to set up its Storage correctly.

Future work may provide helpers to make this "preseeding" of the Storage
more user-friendly. It isn't entirely straightforward to do so since
the Storage interface doesn't provide the right accessors for this
purpose. Briefly speaking, we want to make sure that a non-bootstrapped
node can never catch up via the log so that we can implicitly use one
of the "skipped" log entries to represent the configuration change into
the bootstrap configuration. This is an invasive change that affects
all consumers of raft, and it is of lower urgency since the code (post
this commit) already encapsulates the complexity sufficiently.
We are worried about breaking backwards compatibility for any
application out there that may have relied on the old behavior. Their
RawNode invocation would have been broken by the removal of the peers
argument so it would not have changed silently; an associated comment
tells callers how to fix it.
@tbg tbg force-pushed the rawnode-everywhere-attempt3 branch from 0aadfa8 to 500af91 Compare July 19, 2019 08:02
@tbg
Copy link
Contributor Author

tbg commented Jul 19, 2019

I think that since the first commit reduces the duplication so the bootstrapping only lives in one place, it's probably best if that one place continues to be in RawNode instead of StartNode, at least until the anticipated helpers are available.

Makes sense. I didn't want to bring this mostly-useless extra arg back though, so I added RawNode.Bootstrap which is now called from StartNode. A comment on NewRawNode indicates to callers that calling Bootstrap(peers) replaces the previous peers parameter.

It has a data race between the test's call to `reduceUncommittedSize`
and a corresponding call during Ready handling in `(*node).run()`.
The corresponding RawNode test still verifies the functionality, so
instead of fixing the test we can remove it.
@tbg tbg merged commit 3c5e2f5 into etcd-io:master Jul 19, 2019
@tbg tbg deleted the rawnode-everywhere-attempt3 branch July 19, 2019 12:30
tbg added a commit to tbg/etcd that referenced this pull request Jul 23, 2019
I changed `(*RawNode).Ready`'s behavior in etcd-io#10892 in a problematic way.
Previously, `Ready()` would create and immediately "accept" a Ready
(i.e. commit the app to actually handling it). In etcd-io#10892, Ready() became
a pure read-only operation and the "accepting" was moved to
`Advance(rd)`.  As a result it was illegal to use the RawNode in certain
ways while the Ready was being handled. Failure to do so would result in
dropped messages (and perhaps worse). For example, with the following
operations

1. `rd := rawNode.Ready()`
2. `rawNode.Step(someMsg)`
3. `rawNode.Advance(rd)`

`someMsg` would be dropped, because `Advance()` would clear out the
outgoing messages thinking that they had all been handled by the client.
I mistakenly assumed that this restriction had existed prior, but this
is incorrect.

I noticed this while trying to pick up the above PR in CockroachDB,
where it caused unit test failures, precisely due to the above example.

This PR reestablishes the previous behavior (result of `Ready()` must
be handled by the app) and adds a regression test.

While I was there, I carried out a few small clarifying refactors.
absolute8511 added a commit to youzan/ZanRedisDB that referenced this pull request Jul 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants