Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce follower replication #249

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

Fullstop000
Copy link
Member

@Fullstop000 Fullstop000 commented Jul 1, 2019

#136

New features

  • Add some new fields in Message
  • New concepts Group and Delegate in the cluster topology
  • A message now could be sent to a proxy and the proxy redirects the message to the destination

TODOs

  • The leader can specify the Delegate of a Group
  • The Delegate can bcast_append or send_append to the other members in the Group
  • Stable group delegate
  • Scenarios based testing

Problems

The ConfChange protocol for raft groups might need to be discussed. Solved as GroupConfig is now volatile.

The implementation has been ported to etcd: etcd-io/etcd#11455.

@Fullstop000 Fullstop000 changed the title [WIP] Introduce follower replication Introduce follower replication Aug 27, 2019
@Fullstop000
Copy link
Member Author

I've finished the main part of this PR. But I'm not sure the best way to implement the ConfChange for altering raft groups dynamically so I want a discussion with your guys.

PTAL @siddontang @hicqu @BusyJay @Hoverbear

@Hoverbear
Copy link
Contributor

@Fullstop000 Can you please include links to any design docs you worked on?

@Fullstop000
Copy link
Member Author

@Hoverbear Can I make a WIP PR in tikv/rfc ?

@Hoverbear
Copy link
Contributor

@Fullstop000 I think it's OK. :) I'd still really appreciate seeing an RFC (like https://docs.google.com/document/d/1Sp9Tnc_nk_i0feTOgLGPT3jZYVFjfP7C6pwA-wuVHko/edit?ts=5c8c5fe3 which you worked on)

@Fullstop000 Fullstop000 force-pushed the follower_replication branch 4 times, most recently from ae47fee to 3502119 Compare October 29, 2019 09:45
@Hoverbear Hoverbear added the Feature Related to a major feature. label Nov 7, 2019
@Hoverbear
Copy link
Contributor

@hicqu PTAL

src/raft_log.rs Show resolved Hide resolved
src/raft.rs Outdated Show resolved Hide resolved
src/raft.rs Outdated Show resolved Hide resolved
src/raft.rs Outdated Show resolved Hide resolved
src/group/mod.rs Outdated Show resolved Hide resolved
src/group/mod.rs Outdated

/// The max number of members of a group of which contains leader and the leader never pick a delegate.
/// If the group size is larger than this, a delegate will be picked even the leader belongs to this group.
pub max_leader_group_no_delegate: usize,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I prefer to remove this, because without explicit information about group_id, follower replication may be not helpful. For example 7 peers in one datacenter, follower replication can't save any network traffics.

src/group/mod.rs Outdated Show resolved Hide resolved
src/raft.rs Outdated
// Also, A peer with smaller 'match' is able to receive more un-compacted entries from leader
// and then send them to others in same group.
//
fn pick_delegate(&self, group: &[u64], prs: &ProgressSet) -> u64 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems in the new implementation, choose the peer with most raft logs is better?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems more reasonable. Actually maybe we can choose any unpaused peer?

src/raft.rs Outdated
@@ -130,6 +130,9 @@ pub struct Raft<T: Storage> {
/// The list of messages.
pub msgs: Vec<Message>,

/// The especial map for messages to be sent to group delegates.
pub delegated_msgs: HashMap<u64, Message>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to put it with groups

src/raft.rs Outdated
@@ -545,41 +546,159 @@ impl<T: Storage> Raft<T> {
is_batched
}

// Whether the given peer could use Follower Replication
fn use_delegate(&self, to: u64) -> bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer

fn use_delegate(&self, to: u64) -> bool {
    self.is_leader() &&
        self.groups.get_members(to).map_or(0, |members| members.len()) > 1 &&
        !self.groups.in_same_group(self.id, to)
}

And we need to make sure that peers with invalid group_id shouldn't exist in groups.

src/raft.rs Outdated
/// of produced message should be set to the leader id.
pub fn send_append(&mut self, to: u64, prs: &mut ProgressSet) {
if self.use_delegate(to) {
if let Some(gid) = self.groups.get_group_id(to) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Too many indents. Maybe we can define use_delegate(to) -> Option<group_id>.

src/raft.rs Outdated
None => self.pick_delegate(&members, prs),
};
if delegate != INVALID_ID {
if let Some(pr) = prs.get_mut(delegate) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think calling unwrap is better, because the progress must exist.

src/raft.rs Outdated
}
None => {
let members = self.groups.get_members(to).cloned().unwrap(); // this is safe because of the checking in `self.use_delegate`
let delegate = match self.groups.get_delegate(gid) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer let delegate = self.pick_delegate(prs), which already contains get_delegate logic. And, its valid only if the pr is not paused.

src/raft.rs Outdated
}
*maybe_commit = true;
}

fn handle_append_response_in_delegate(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems only difference between this and handle_append_response is the latter needs to handle leader transferring. That logic is compatible for delegates. So I think this function is unnecessary.

src/raft.rs Outdated
self.set_prs(prs);
} else {
let members = self.groups.get_members(m.from_delegate).cloned();
self.bcast_append(members.as_ref());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bcast_append calls send_append, which checks members' delegate again. I believe it can be improved, maybe we can take a try?

src/raft.rs Outdated
@@ -1811,14 +1993,63 @@ impl<T: Storage> Raft<T> {
};
self.read_states.push(rs);
}
MessageType::MsgAppendResponse => {
if self.prs().get(m.from).is_none() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a common case for receiving responses from removed peers. So I think the log is unnecessary. Without it the code could be cleaner.

src/raft.rs Outdated
//
// The delegate must satisfy conditions below:
// 1. Must be 'recent_active'
// 2. The progress state should be 'Replicate' but not 'paused'
Copy link
Contributor

@hicqu hicqu Dec 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems Probe needs also to be allowed? I think calling is_paused here is fine.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probe might not be suitable for a delegate in the leader's view because the Probe member can probably just recover from a crash or is suffering the network partition.

src/raft.rs Outdated
}

#[inline]
fn is_follower_replication_enabled(&self) -> bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's not accurate. Every peer can't know follower replication is enabled or not. It just choose a strategy after receives followers' messages.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can add some comments to make this clear.

harness/src/interface.rs Outdated Show resolved Hide resolved
src/raft.rs Outdated Show resolved Hide resolved
src/raft.rs Outdated Show resolved Hide resolved
src/raft.rs Outdated Show resolved Hide resolved
src/raft.rs Outdated Show resolved Hide resolved
src/raft.rs Outdated Show resolved Hide resolved
src/raft.rs Outdated Show resolved Hide resolved
src/raft.rs Outdated Show resolved Hide resolved
src/raft.rs Outdated Show resolved Hide resolved
src/progress/progress_set.rs Outdated Show resolved Hide resolved
hicqu
hicqu previously approved these changes Dec 17, 2019
Signed-off-by: Fullstop000 <fullstop1005@gmail.com>
@Fullstop000 Fullstop000 force-pushed the follower_replication branch 2 times, most recently from 88f583e to 1da637c Compare February 25, 2020 08:31
Signed-off-by: Fullstop000 <fullstop1005@gmail.com>
@TennyZhuang
Copy link

Any updates?

@Xuanwo
Copy link
Member

Xuanwo commented Aug 24, 2021

This PR is discontinued for so long, any updates here? Do we have a plan to get it merged?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature Related to a major feature.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants