WIP: Add ReplicatedLogletController cluster scheduler component #2045
Conversation
Thanks for creating this PR @pcholakov. Some quick comments from a first pass.
// todo: can we replace both sets of servers above with a simple?
// nodes: NodesConfiguration,
I think this should be possible.
// Extra info - is it necessary for scheduling decisions?
node_roles: BTreeMap<PlainNodeId, EnumSet<Role>>,
log_server_states: BTreeMap<PlainNodeId, StorageState>,
We should get this information from the NodesConfiguration, I believe.
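Roughly the shape this suggests, with simplified stand-in types (the real NodesConfiguration/NodeConfig API will differ, so field and method names here are illustrative only): both maps can be derived on demand from a single nodes-configuration snapshot instead of being carried separately by the scheduler.

```rust
use std::collections::BTreeMap;

// Simplified stand-ins; not the real NodesConfiguration/NodeConfig API.
type PlainNodeId = u32;

#[derive(Clone, Copy, PartialEq)]
enum Role {
    Worker,
    LogServer,
}

#[derive(Clone, Copy)]
enum StorageState {
    ReadWrite,
    Disabled,
}

struct NodeConfig {
    roles: Vec<Role>,
    storage_state: StorageState,
}

struct NodesConfiguration {
    nodes: BTreeMap<PlainNodeId, NodeConfig>,
}

impl NodesConfiguration {
    /// Per-node roles, derived on demand instead of being cached by the scheduler.
    fn node_roles(&self) -> BTreeMap<PlainNodeId, Vec<Role>> {
        self.nodes
            .iter()
            .map(|(id, cfg)| (*id, cfg.roles.clone()))
            .collect()
    }

    /// Storage state of every node carrying the log-server role.
    fn log_server_states(&self) -> BTreeMap<PlainNodeId, StorageState> {
        self.nodes
            .iter()
            .filter(|(_, cfg)| cfg.roles.contains(&Role::LogServer))
            .map(|(id, cfg)| (*id, cfg.storage_state))
            .collect()
    }
}
```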
/// The loglet controller is responsible for safely configuring loglet segments based on overall
/// policy and the available log servers, and for transitioning segments to new node sets as cluster
/// members come and go. It repeatedly decides on a loglet scheduling plan which it writes to the
I guess with "loglet scheduling plan" you are referring to a LogletConfig with the respective LogletParams? Or do you have something different in mind?
Exactly right. You can interpret it as referring to the output (Option<SchedulingPlan>, Vec<LogletEffect>), and within SchedulingPlan it only updates the logs: BTreeMap<LogId, TargetLogletState> field. So it may be easier to just return the latter directly and let the outer shell decide what to do with it.
This is an outdated comment from when I still thought about this component as having its own memory; I think it should be possible to make it purely functional so it won't need any references to the metadata store.
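As a sketch of that purely functional shape (the type names mirror the PR, but the signature and placeholder bodies are illustrative only):

```rust
use std::collections::BTreeMap;

// Placeholder types mirroring the names used in the PR.
type LogId = u64;

#[derive(Clone)]
struct TargetLogletState; // stands in for the real per-log target state

struct SchedulingPlan {
    logs: BTreeMap<LogId, TargetLogletState>,
}

struct ObservedClusterState; // stands in for the observed cluster state

/// Purely functional core: no internal memory, no metadata-store handle.
/// The caller decides whether and how to persist the returned map
/// (e.g. as an updated SchedulingPlan).
fn decide_loglet_targets(
    current: &SchedulingPlan,
    observed: &ObservedClusterState,
) -> BTreeMap<LogId, TargetLogletState> {
    let _ = observed;
    // A real implementation would compute new target states here; the
    // sketch just carries the current ones forward unchanged.
    current.logs.clone()
}
```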
scheduling_plan: &SchedulingPlan, // latest schedule from metadata store
cluster_state: &ObservedClusterState, // observed cluster state pertinent to replicated loglets
Do we need the fields in self if we pass in this information?
This can be vastly simplified; first order of business today.
// todo: what if we previously proposed this action? how do we avoid doing this unnecessarily?
effects.push(LogletEffect::AddLogletSegment);
I guess we could filter it out when trying to write the updated Logs configuration.
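One possible shape of that filtering, applied at the point where the updated Logs configuration is written; the types below are placeholders, not the PR's actual definitions:

```rust
// Placeholder types; `LogletEffect` and the chain view here are not the
// PR's actual definitions.
enum LogletEffect {
    AddLogletSegment,
    SealSegment,
}

struct LogChainView {
    /// True if the tail segment already reflects a previously proposed
    /// (or applied) AddLogletSegment for this loglet.
    tail_already_reconfigured: bool,
}

/// Drop effects that would be no-ops against the Logs configuration we
/// are about to write, so a re-run of the scheduler does not append a
/// duplicate segment.
fn filter_redundant(effects: Vec<LogletEffect>, chain: &LogChainView) -> Vec<LogletEffect> {
    effects
        .into_iter()
        .filter(|effect| {
            !(matches!(effect, LogletEffect::AddLogletSegment) && chain.tail_already_reconfigured)
        })
        .collect()
}
```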
cluster_state
    .healthy_workers
    .iter()
    .to_owned()
    .for_each(|(node_id, _)| {
        nodes_config.upsert_node(NodeConfig::new(
            format!("node-{}", node_id),
            cluster_state.healthy_workers[node_id],
            format!("unix:/tmp/my_socket-{}", node_id).parse().unwrap(),
            Role::LogServer.into(),
            LogServerConfig {
                storage_state: cluster_state.log_server_states[node_id],
            },
        ));
    });
The cluster controller won't decide which node runs as a log server. That's something a node is either started with or not.
This is a giant hack, I should have left a comment! This will be passed in 🙈
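For illustration, once the real configuration is passed in, the controller would only read which nodes were started with the log-server role rather than upserting synthetic NodeConfig entries (stand-in types; names are illustrative only):

```rust
use std::collections::BTreeMap;

// Stand-in types; the real NodesConfiguration API will differ.
type PlainNodeId = u32;

#[derive(Clone, Copy, PartialEq)]
enum Role {
    Worker,
    LogServer,
}

struct NodeConfig {
    roles: Vec<Role>,
}

/// Candidate log servers are whatever the operator started with the
/// log-server role; the controller only reads the role, it never
/// assigns it.
fn candidate_log_servers(nodes: &BTreeMap<PlainNodeId, NodeConfig>) -> Vec<PlainNodeId> {
    nodes
        .iter()
        .filter(|(_, cfg)| cfg.roles.contains(&Role::LogServer))
        .map(|(id, _)| *id)
        .collect()
}
```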
Some(target_state) => {
    // Check if loglet configuration requires any remediating actions
    let mut nodes_config = NodesConfiguration::default(); // todo: this should come from observed state
This could come from metadata().
Yup!
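A minimal sketch of that change, with a stand-in Metadata handle; the accessor name and return type are assumptions rather than the confirmed restate_core API:

```rust
use std::sync::Arc;

// Stand-in for the real NodesConfiguration.
#[derive(Default)]
struct NodesConfiguration;

// Stand-in for the metadata handle the comment refers to; the accessor
// name and return type are assumptions, not the confirmed API.
struct Metadata {
    nodes_config: Arc<NodesConfiguration>,
}

impl Metadata {
    fn nodes_config(&self) -> Arc<NodesConfiguration> {
        Arc::clone(&self.nodes_config)
    }
}

fn check_loglet(metadata: &Metadata) {
    // Instead of `NodesConfiguration::default()`, read the current
    // configuration from the metadata handle.
    let nodes_config = metadata.nodes_config();
    let _ = nodes_config;
}
```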
#[serde_as(as = "serde_with::Seq<(_, _)>")]
pub logs: BTreeMap<LogId, TargetLogletState>,
Is it important that the scheduling decisions for PPs and logs are stored together? Would it be enough if the target state of the logs were stored in Logs?
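To make the two options concrete (placeholder types, not the PR's actual definitions):

```rust
use std::collections::BTreeMap;

type LogId = u64;

struct TargetLogletState; // placeholder

// Option A (as in the PR): PP placement and per-log targets stored
// together in the scheduling plan.
struct SchedulingPlan {
    // partitions: BTreeMap<PartitionId, TargetPartitionState>, ...
    logs: BTreeMap<LogId, TargetLogletState>,
}

// Option B (the question above): per-log targets live with the Logs
// metadata, and the scheduling plan only covers partition processors.
struct Logs {
    targets: BTreeMap<LogId, TargetLogletState>,
}
```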
.cloned()
.choose_multiple(
    &mut rand::thread_rng(),
    self.config.replication.num_copies() as usize,
I think we need to pick the configured nodeset size and not the replication property. Otherwise the loss of a single node will make sealing impossible. If the replication property is 2, then the nodeset size would be 3.
Yeah! I understood this at the time I wrote it, but I misused the ReplicationProperty type here. In the config, I mean this to be a "preferred maximum replication goal" rather than a hard requirement, but then I muddled up the two different meanings in use 😅 I think what I ultimately want is to implement a new kind of SelectorStrategy that isn't Flood but more like Optimum(PreferredPlacement), though I didn't want to go there just yet. I've updated this to a simple usize desired-replication-goal config property, which will be in the next iteration.
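A minimal sketch of the distinction, reusing the same rand choose_multiple helper the PR already uses; the +1 sizing rule simply mirrors the replication-2/nodeset-3 example above and is illustrative only:

```rust
use rand::seq::IteratorRandom;

type PlainNodeId = u32;

/// Illustrative only: e.g. a replication goal of 2 maps to a nodeset of 3.
fn nodeset_size(replication_copies: usize) -> usize {
    replication_copies + 1
}

fn pick_nodeset(candidates: &[PlainNodeId], replication_copies: usize) -> Vec<PlainNodeId> {
    candidates
        .iter()
        .copied()
        // Select the (larger) nodeset size, not the replication factor,
        // so losing a single node does not make sealing impossible.
        .choose_multiple(&mut rand::thread_rng(), nodeset_size(replication_copies))
}
```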