In operator-k8s, we need to develop a controller that reconciles XlineCluster resources. The current implementation coordinates the xline pods through StatefulSet, a built-in k8s controller. StatefulSet manages the creation, deletion, and rebuilding of pods, allocates stable network identifiers to pods, and binds persistent volumes to pods.
Given the upcoming version updates, the design of the controller may need to change.
Issue
In most cases, StatefulSet works well as the controller for stateful services.
However, for stateful services like xline that rely on relationships between nodes, certain behaviors of StatefulSet are not optimal.
For instance, when scaling down a cluster, StatefulSet deletes pods starting from the one with the highest identifier number, without taking into account whether the leader is among the nodes being removed. In this scenario, removing only non-leader nodes is better for the cluster's availability. (This issue can be alleviated through leadership transfer extension (Section 3.10), but not completely resolved.)
Solution
etcd-operator implemented a dedicated controller that keeps some state in memory. That approach does not look sufficient, because the state is lost once the operator crashes.
Creating a controller similar to StatefulSet is quite challenging. Fortunately, AdvancedStatefulSet is a well-developed solution that addresses the issue described above.
Details
We offer two implementations of the StatefulSet, similar to the risingwave-operator: the built-in StatefulSet and the AdvancedStatefulSet by OpenKruise.
Because AdvancedStatefulSet is compatible with the fields of StatefulSet, some code can be reused (the construction of components in StatefulSet).
Scale Up
The operator sends a scale request to the StatefulSet. Before each new node starts, its sidecar adds the node to the existing cluster through a membership change. (The first initialized node needs no membership change.)
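The join decision made by the sidecar can be sketched as follows. This is only an illustration under stated assumptions: `FakeCluster` and `join_before_start` are hypothetical names, the cluster is modelled as an in-memory set rather than a real xline client, and the "already a member" branch for restarted pods is an assumption that the text above does not spell out.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class FakeCluster:
    """In-memory stand-in for the cluster membership API (hypothetical)."""
    members: Set[str] = field(default_factory=set)

    def add_member(self, name: str) -> None:
        # In a real sidecar this would be a membership change request.
        self.members.add(name)


def join_before_start(cluster: FakeCluster, node: str) -> bool:
    """Register `node` if needed; return True when a membership change was issued."""
    if not cluster.members:
        # First initialized node: it bootstraps the cluster, so no membership change.
        cluster.members.add(node)
        return False
    if node in cluster.members:
        # Assumption: a restarted pod that is already a member does not rejoin.
        return False
    cluster.add_member(node)
    return True
```

Only after this call returns would the sidecar start the node itself.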
Scale Down
The sidecar's heartbeat needs to mark the leader. If AdvancedStatefulSet is used, the operator retains the leader node and deletes the required number of follower nodes. If StatefulSet is used, deletion starts from the highest-numbered pod.
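The difference between the two modes can be illustrated with a small sketch. The function `pods_to_delete` and its parameters are hypothetical; it only computes which pod ordinals would be removed and does not call any Kubernetes or OpenKruise API. Preferring the highest-numbered followers in leader-aware mode is an assumption made for the example.

```python
from typing import List, Optional


def pods_to_delete(current: int, target: int,
                   leader: Optional[int], leader_aware: bool) -> List[int]:
    """Return the ordinals removed when scaling from `current` to `target` replicas."""
    count = current - target
    if count <= 0:
        return []
    # Highest ordinals first, which is the order the built-in StatefulSet uses.
    candidates = list(range(current - 1, -1, -1))
    if leader_aware and leader is not None and target >= 1:
        # Leader-aware mode: skip the pod currently marked as leader.
        candidates = [o for o in candidates if o != leader]
    return candidates[:count]
```

For a five-node cluster whose leader is pod 4, scaling to three replicas removes pods 4 and 3 in the plain mode, but pods 3 and 2 in the leader-aware mode. The surviving ordinals are then no longer contiguous, which the built-in StatefulSet cannot represent.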
When a pod is deleted, a termination signal is sent to the sidecar inside the pod. The sidecar needs to capture this signal and perform cleanup: it sends a membership change request to the cluster to remove its own node, after which the controller deletes the pod.
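A minimal sketch of this shutdown path, assuming the termination signal is SIGTERM. The `Sidecar` class is hypothetical and the cluster membership is again an in-memory set. The signal handler only sets a flag, and the main loop performs the actual removal, so no network call runs inside the handler.

```python
import signal
import threading
from typing import Set


class Sidecar:
    """Hypothetical sidecar that removes its own node on termination."""

    def __init__(self, node: str, members: Set[str]) -> None:
        self.node = node
        self.members = members          # stand-in for the cluster's member list
        self.stop = threading.Event()

    def install_handler(self) -> None:
        # Must be called from the main thread of the process.
        signal.signal(signal.SIGTERM, lambda signum, frame: self.stop.set())

    def run(self) -> None:
        # Block until termination is requested, then leave the cluster.
        self.stop.wait()
        # In a real sidecar this would be a membership change request.
        self.members.discard(self.node)
```

The cleanup has to finish within the pod's termination grace period (`terminationGracePeriodSeconds`); after that Kubernetes kills the container regardless.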
Membership Clean Task
If a sidecar crashes during termination, or cannot complete the removal because of network issues, the cluster should also proactively remove this node.
Therefore, we should introduce a membership clean task in each sidecar for this purpose. We will discuss the finer details of this design in the upcoming PR.
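Since the design is deferred to a later PR, the following is only one possible shape of such a task and not the planned implementation. It assumes each sidecar can see the time of the last heartbeat received from every member. The `stale_members` function and the timeout value are hypothetical.

```python
from typing import Dict, List


def stale_members(last_heartbeat: Dict[str, float],
                  now: float, timeout: float) -> List[str]:
    """Return members whose last heartbeat is older than `timeout` seconds."""
    return sorted(
        name for name, seen in last_heartbeat.items()
        if now - seen > timeout
    )
```

A periodic task could call this and issue a membership change for each returned name.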