WIP: Add ReplicatedLogletController cluster scheduler component #2045
Conversation
Thanks for creating this PR @pcholakov. Some quick comments from a first pass.
// todo: can we replace both sets of servers above with a simple?
// nodes: NodesConfiguration,
I think this should be possible.
// Extra info - is it necessary for scheduling decisions?
node_roles: BTreeMap<PlainNodeId, EnumSet<Role>>,
log_server_states: BTreeMap<PlainNodeId, StorageState>,
We should get this information from the NodesConfiguration, I believe.
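Roughly the shape this suggests, with simplified stand-in types (the real NodesConfiguration/NodeConfig API will differ, so field and method names here are illustrative only): both maps can be derived on demand from a single nodes-configuration snapshot instead of being carried separately by the scheduler.

```rust
use std::collections::BTreeMap;

// Simplified stand-ins; not the real NodesConfiguration/NodeConfig API.
type PlainNodeId = u32;

#[derive(Clone, Copy, PartialEq)]
enum Role {
    Worker,
    LogServer,
}

#[derive(Clone, Copy)]
enum StorageState {
    ReadWrite,
    Disabled,
}

struct NodeConfig {
    roles: Vec<Role>,
    storage_state: StorageState,
}

struct NodesConfiguration {
    nodes: BTreeMap<PlainNodeId, NodeConfig>,
}

impl NodesConfiguration {
    /// Per-node roles, derived on demand instead of being cached by the scheduler.
    fn node_roles(&self) -> BTreeMap<PlainNodeId, Vec<Role>> {
        self.nodes
            .iter()
            .map(|(id, cfg)| (*id, cfg.roles.clone()))
            .collect()
    }

    /// Storage state of every node carrying the log-server role.
    fn log_server_states(&self) -> BTreeMap<PlainNodeId, StorageState> {
        self.nodes
            .iter()
            .filter(|(_, cfg)| cfg.roles.contains(&Role::LogServer))
            .map(|(id, cfg)| (*id, cfg.storage_state))
            .collect()
    }
}
```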
/// The loglet controller is responsible for safely configuring loglet segments based on overall
/// policy and the available log servers, and for transitioning segments to new node sets as cluster
/// members come and go. It repeatedly decides on a loglet scheduling plan which it writes to the
I guess with "loglet scheduling plan" you are referring to a LogletConfig with the respective LogletParams? Or do you have something different in mind?
Exactly right. You can interpret it as referring to the output (Option<SchedulingPlan>, Vec<LogletEffect>), and within SchedulingPlan it only updates the logs: BTreeMap<LogId, TargetLogletState> field. So it may be easier to just return the latter directly and let the outer shell decide what to do with it.
This is an outdated comment from when I still thought about this component as having its own memory; I think it should be possible to make it purely functional so it won't need any references to the metadata store.
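As a sketch of that purely functional shape (the type names mirror the PR, but the signature and placeholder bodies are illustrative only):

```rust
use std::collections::BTreeMap;

// Placeholder types mirroring the names used in the PR.
type LogId = u64;

#[derive(Clone)]
struct TargetLogletState; // stands in for the real per-log target state

struct SchedulingPlan {
    logs: BTreeMap<LogId, TargetLogletState>,
}

struct ObservedClusterState; // stands in for the observed cluster state

/// Purely functional core: no internal memory, no metadata-store handle.
/// The caller decides whether and how to persist the returned map
/// (e.g. as an updated SchedulingPlan).
fn decide_loglet_targets(
    current: &SchedulingPlan,
    observed: &ObservedClusterState,
) -> BTreeMap<LogId, TargetLogletState> {
    let _ = observed;
    // A real implementation would compute new target states here; the
    // sketch just carries the current ones forward unchanged.
    current.logs.clone()
}
```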
scheduling_plan: &SchedulingPlan, // latest schedule from metadata store
cluster_state: &ObservedClusterState, // observed cluster state pertinent to replicated loglets
Do we need the fields in self if we pass in this information?
This can be vastly simplified; first order of business today.
// todo: what if we previously proposed this action? how do we avoid doing this unnecessarily?
effects.push(LogletEffect::AddLogletSegment);
I guess we could filter it out when trying to write the updated Logs configuration.
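One possible shape of that filtering, applied at the point where the updated Logs configuration is written; the types below are placeholders, not the PR's actual definitions:

```rust
// Placeholder types; `LogletEffect` and the chain view here are not the
// PR's actual definitions.
enum LogletEffect {
    AddLogletSegment,
    SealSegment,
}

struct LogChainView {
    /// True if the tail segment already reflects a previously proposed
    /// (or applied) AddLogletSegment for this loglet.
    tail_already_reconfigured: bool,
}

/// Drop effects that would be no-ops against the Logs configuration we
/// are about to write, so a re-run of the scheduler does not append a
/// duplicate segment.
fn filter_redundant(effects: Vec<LogletEffect>, chain: &LogChainView) -> Vec<LogletEffect> {
    effects
        .into_iter()
        .filter(|effect| {
            !(matches!(effect, LogletEffect::AddLogletSegment) && chain.tail_already_reconfigured)
        })
        .collect()
}
```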
cluster_state
    .healthy_workers
    .iter()
    .to_owned()
    .for_each(|(node_id, _)| {
        nodes_config.upsert_node(NodeConfig::new(
            format!("node-{}", node_id),
            cluster_state.healthy_workers[node_id],
            format!("unix:/tmp/my_socket-{}", node_id).parse().unwrap(),
            Role::LogServer.into(),
            LogServerConfig {
                storage_state: cluster_state.log_server_states[node_id],
            },
        ));
    });
The cluster controller won't decide which node runs as a log server. That's something a node is either started with or not.
This is a giant hack, I should have left a comment! This will be passed in 🙈
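For illustration, once the real configuration is passed in, the controller would only read which nodes were started with the log-server role rather than upserting synthetic NodeConfig entries (stand-in types; names are illustrative only):

```rust
use std::collections::BTreeMap;

// Stand-in types; the real NodesConfiguration API will differ.
type PlainNodeId = u32;

#[derive(Clone, Copy, PartialEq)]
enum Role {
    Worker,
    LogServer,
}

struct NodeConfig {
    roles: Vec<Role>,
}

/// Candidate log servers are whatever the operator started with the
/// log-server role; the controller only reads the role, it never
/// assigns it.
fn candidate_log_servers(nodes: &BTreeMap<PlainNodeId, NodeConfig>) -> Vec<PlainNodeId> {
    nodes
        .iter()
        .filter(|(_, cfg)| cfg.roles.contains(&Role::LogServer))
        .map(|(id, _)| *id)
        .collect()
}
```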
Some(target_state) => {
    // Check if loglet configuration requires any remediating actions
    let mut nodes_config = NodesConfiguration::default(); // todo: this should come from observed state
This could come from metadata().
Yup!
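A minimal sketch of that change, with a stand-in Metadata handle; the accessor name and return type are assumptions rather than the confirmed restate_core API:

```rust
use std::sync::Arc;

// Stand-in for the real NodesConfiguration.
#[derive(Default)]
struct NodesConfiguration;

// Stand-in for the metadata handle the comment refers to; the accessor
// name and return type are assumptions, not the confirmed API.
struct Metadata {
    nodes_config: Arc<NodesConfiguration>,
}

impl Metadata {
    fn nodes_config(&self) -> Arc<NodesConfiguration> {
        Arc::clone(&self.nodes_config)
    }
}

fn check_loglet(metadata: &Metadata) {
    // Instead of `NodesConfiguration::default()`, read the current
    // configuration from the metadata handle.
    let nodes_config = metadata.nodes_config();
    let _ = nodes_config;
}
```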
#[serde_as(as = "serde_with::Seq<(_, _)>")]
pub logs: BTreeMap<LogId, TargetLogletState>,
Is it important that the scheduling decisions for PPs and logs are stored together? Would it be enough if the target state of the logs were stored in Logs?
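To make the two options concrete (placeholder types, not the PR's actual definitions):

```rust
use std::collections::BTreeMap;

type LogId = u64;

struct TargetLogletState; // placeholder

// Option A (as in the PR): PP placement and per-log targets stored
// together in the scheduling plan.
struct SchedulingPlan {
    // partitions: BTreeMap<PartitionId, TargetPartitionState>, ...
    logs: BTreeMap<LogId, TargetLogletState>,
}

// Option B (the question above): per-log targets live with the Logs
// metadata, and the scheduling plan only covers partition processors.
struct Logs {
    targets: BTreeMap<LogId, TargetLogletState>,
}
```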
.cloned()
.choose_multiple(
    &mut rand::thread_rng(),
    self.config.replication.num_copies() as usize,
I think we need to pick the configured nodeset size and not the replication property. Otherwise the loss of a single node will make sealing impossible. If the replication property is 2, then the nodeset size would be 3.
Yeah! I understood this at the time I wrote it, but I misused the ReplicationProperty type here. In the config, I mean this to be a "preferred maximum replication goal" rather than a hard requirement, but then I muddled up the two different meanings in use 😅 I think what I ultimately want is to implement a new kind of SelectorStrategy that isn't Flood but more like Optimum(PreferredPlacement), though I didn't want to go there just yet. I've updated this to a simple usize desired-replication-goal config property, which will be in the next iteration.
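A minimal sketch of the distinction, reusing the same rand choose_multiple helper the PR already uses; the +1 sizing rule simply mirrors the replication-2/nodeset-3 example above and is illustrative only:

```rust
use rand::seq::IteratorRandom;

type PlainNodeId = u32;

/// Illustrative only: e.g. a replication goal of 2 maps to a nodeset of 3.
fn nodeset_size(replication_copies: usize) -> usize {
    replication_copies + 1
}

fn pick_nodeset(candidates: &[PlainNodeId], replication_copies: usize) -> Vec<PlainNodeId> {
    candidates
        .iter()
        .copied()
        // Select the (larger) nodeset size, not the replication factor,
        // so losing a single node does not make sealing impossible.
        .choose_multiple(&mut rand::thread_rng(), nodeset_size(replication_copies))
}
```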