WIP: Add ReplicatedLogletController cluster scheduler component #2045

Closed

Conversation

pcholakov
Contributor

No description provided.

@pcholakov pcholakov force-pushed the feat/replicated-loglet-controller branch 3 times, most recently from a547de7 to 9b4de7a on October 8, 2024 at 19:18
Contributor

@tillrohrmann tillrohrmann left a comment

Thanks for creating this PR @pcholakov. Some quick comments from a first pass.

Comment on lines 67 to 77
// todo: can we replace both sets of servers above with a simple?
// nodes: NodesConfiguration,
Contributor

I think this should be possible.


// Extra info - is it necessary for scheduling decisions?
node_roles: BTreeMap<PlainNodeId, EnumSet<Role>>,
log_server_states: BTreeMap<PlainNodeId, StorageState>,
Contributor

We should get this information from the NodesConfiguration, I believe.
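
For illustration, a rough sketch of deriving that map from NodesConfiguration; the iter(), roles, and log_server_config accessors are assumptions about the shape of these types, not the verified API:

use std::collections::BTreeMap;

// Hedged sketch, not the actual API: derive log-server storage states (and,
// implicitly, node roles) from NodesConfiguration instead of carrying
// separate BTreeMaps alongside it.
fn log_server_states(
    nodes_config: &NodesConfiguration,
) -> BTreeMap<PlainNodeId, StorageState> {
    nodes_config
        .iter()
        .filter(|(_, node)| node.roles.contains(Role::LogServer))
        .map(|(node_id, node)| (node_id, node.log_server_config.storage_state))
        .collect()
}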


/// The loglet controller is responsible for safely configuring loglet segments based on overall
/// policy and the available log servers, and for transitioning segments to new node sets as cluster
/// members come and go. It repeatedly decides on a loglet scheduling plan which it writes to the
Contributor

I guess with loglet scheduling plan you are referring to a LogletConfig with the respective LogletParams? Or do you have something different in mind?

Contributor Author

Exactly right - you can interpret it as referring to the output (Option<SchedulingPlan>, Vec<LogletEffect>), and of the SchedulingPlan it only updates logs: BTreeMap<LogId, TargetLogletState>. So it may be easier to just return the latter directly, and let the outer shell decide what to do with that.

This is an outdated comment from when I still thought about this as having its own memory; I think it should be possible to make it purely functional so it won't need any references to the metadata store.
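
As a sketch of that purely functional shape (names illustrative, not the actual patch), the controller would map its two inputs straight to the updated logs map plus effects, with no internal state and no metadata store handle:

use std::collections::BTreeMap;

// Illustrative only: a side-effect-free scheduling step.
fn schedule_loglets(
    scheduling_plan: &SchedulingPlan,
    cluster_state: &ObservedClusterState,
) -> (BTreeMap<LogId, TargetLogletState>, Vec<LogletEffect>) {
    let mut target_logs = scheduling_plan.logs.clone();
    let mut effects = Vec::new();
    // ... derive per-log target states purely from the two inputs ...
    (target_logs, effects)
}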

Comment on lines +99 to +93
scheduling_plan: &SchedulingPlan, // latest schedule from metadata store
cluster_state: &ObservedClusterState, // observed cluster state pertinent to replicated loglets
Contributor

Do we need the fields in self if we pass in this information?

Contributor Author

This can be vastly simplified, first order of business today.

Comment on lines 168 to 169
// todo: what if we previously proposed this action? how do we avoid doing this unnecessarily?
effects.push(LogletEffect::AddLogletSegment);
Contributor

I guess we could filter it out when trying to write the updated Logs configuration.
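
Something along these lines, as a hedged sketch; has_open_replicated_segment is a hypothetical helper standing in for whatever check the Logs configuration ends up exposing:

// Illustrative only: before writing the updated Logs configuration, drop
// AddLogletSegment effects for logs that already carry the proposed segment,
// so re-running the controller doesn't propose the same action twice.
fn drop_duplicate_segment_effects(
    logs: &Logs, // current Logs configuration read back from metadata
    log_id: LogId,
    effects: &mut Vec<LogletEffect>,
) {
    effects.retain(|effect| match effect {
        // hypothetical helper, not an existing API
        LogletEffect::AddLogletSegment => !has_open_replicated_segment(logs, log_id),
        _ => true,
    });
}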

Comment on lines 175 to 184
cluster_state
.healthy_workers
.iter()
.to_owned()
.for_each(|(node_id, _)| {
nodes_config.upsert_node(NodeConfig::new(
format!("node-{}", node_id),
cluster_state.healthy_workers[node_id],
format!("unix:/tmp/my_socket-{}", node_id).parse().unwrap(),
Role::LogServer.into(),
LogServerConfig {
storage_state: cluster_state.log_server_states[node_id],
},
));
});
Contributor

The cluster controller won't decide which node runs as a log server. That's a role a node is either started with or not.

Contributor Author

This is a giant hack, I should have left a comment! This will be passed in 🙈
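
A minimal sketch of what "passed in" could look like, purely for illustration; the signature is a guess, not the planned one:

// Hedged sketch: the remediation check receives the observed NodesConfiguration
// as a parameter instead of building one with upsert_node and placeholder
// node names and socket addresses.
fn check_loglet(
    target_state: &TargetLogletState,
    nodes_config: &NodesConfiguration, // from observed cluster state / metadata
    effects: &mut Vec<LogletEffect>,
) {
    // role and storage-state lookups all go through the passed-in config;
    // nothing about log-server placement is decided or synthesized here
}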


Some(target_state) => {
// Check if loglet configuration requires any remediating actions
let mut nodes_config = NodesConfiguration::default(); // todo: this should come from observed state
Contributor

This could come from metadata().

Contributor Author

Yup!

Comment on lines +48 to +49
#[serde_as(as = "serde_with::Seq<(_, _)>")]
pub logs: BTreeMap<LogId, TargetLogletState>,
Contributor

Is it important that the scheduling decisions for PPs and logs are stored together? Would it be enough if the target state of the logs were stored in Logs?

.cloned()
.choose_multiple(
&mut rand::thread_rng(),
self.config.replication.num_copies() as usize,
Contributor

I think we need to pick the configured nodeset size and not the replication property. Otherwise the loss of a single node will make sealing impossible. If the replication property is 2, then the nodeset size would be 3.
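
A small sketch of that suggestion, assuming a separate nodeset_size setting (hypothetical name) distinct from the replication property; with replication 2 and nodeset size 3, one node can be lost and a seal quorum still exists:

use rand::seq::IteratorRandom;

// Hedged sketch: sample the nodeset from the configured nodeset size rather
// than from the replication property's num_copies().
fn pick_nodeset(
    candidates: impl Iterator<Item = PlainNodeId>,
    nodeset_size: usize, // e.g. replication 2 -> nodeset size 3
) -> Vec<PlainNodeId> {
    candidates.choose_multiple(&mut rand::thread_rng(), nodeset_size)
}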

Contributor Author

Yeah! I understood this at the time I wrote it, but I misused the ReplicationProperty type here. In config, I mean this to be a "preferred maximum replication goal" rather than a hard requirement, but then I muddled up the two meanings in use 😅 Ultimately I think I want to implement a new type of SelectorStrategy that isn't Flood but more like Optimum(PreferredPlacement), but I didn't want to go there just yet. I've updated this to a simple usize desired-replication-goal config property, which will be in the next iteration.
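
For illustration only, that config property might look roughly like this; the struct and field names are guesses rather than the actual change:

// Hypothetical sketch of a soft replication goal: a plain usize the controller
// treats as "replicate up to this many copies if enough log servers are
// available", not as a hard ReplicationProperty requirement.
#[derive(Debug, Clone, serde::Serialize, serde::Deserialize)]
pub struct LogletSchedulingOptions {
    /// Preferred replication goal; effective replication is capped by the
    /// number of available log servers.
    pub desired_replication: usize,
}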

@pcholakov pcholakov force-pushed the feat/replicated-loglet-controller branch 2 times, most recently from 05aa2b1 to 0a516e2 on October 9, 2024 at 21:06
@pcholakov pcholakov force-pushed the feat/replicated-loglet-controller branch from 0a516e2 to 568c381 on October 9, 2024 at 21:25
@pcholakov pcholakov closed this Oct 15, 2024