Action Details
Want to learn more about how KubeTurbo executes actions? Read on, but first please visit the Use Cases Overview page for context on how our unique analysis drives value.
- Resizing or Vertical Scaling of Containerized Workloads
- Node Provision and Suspend (Cluster Scaling)
- SLO Horizontal Scaling
- Turbonomic Pod Moves (continuous rescheduling) and technical details
Resizing actions can be executed in any one of the following ways:
- In the running environment by KubeTurbo
- The user retrieves the action (via the UI or API) and then manually makes the change
- Integrate Turbonomic resize actions into your CD pipeline / GitOps process so the pipeline executes the change
When KubeTurbo resizes a running workload, it is essentially the equivalent of running kubectl edit workloadType foo and modifying the container specs. KubeTurbo applies all container spec changes at the same time, minimizing the number of restarts to only one. The workload will follow your specified deployment strategy (rolling update, etc.). Once the action succeeds, you will be able to see the history of executed actions on the Workload Controller.
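As a rough illustration, a resize has the same net effect as patching the container resources on the parent controller. The sketch below assumes a Deployment named foo with a single container; KubeTurbo makes this change through the Kubernetes API rather than kubectl, and the names and values shown are hypothetical:

```bash
# Illustrative only: the net effect of a KubeTurbo resize on a hypothetical Deployment "foo".
# KubeTurbo patches the parent controller via the API; the resource values here are examples.
kubectl patch deployment foo --type='json' -p='[
  {"op": "replace", "path": "/spec/template/spec/containers/0/resources/requests/cpu", "value": "250m"},
  {"op": "replace", "path": "/spec/template/spec/containers/0/resources/limits/memory", "value": "512Mi"}
]'
```

Because all resource changes land in a single patch, the controller performs one rollout rather than one restart per changed value.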
Is your workload managed by an Operator? You can still leverage Turbonomic to execute the resizing actions in the running environment by leveraging an Operator Resource Mapping. Samples provided here.
The Turbonomic Action Execution Framework also supports:
- Scheduling actions
- Integration to workflow orchestration such as ServiceNow
- Action modes of Automated, Manual, and Recommend Only
- Policies that can be applied to any scope: a single workload, or groups of workloads by namespace and/or cluster
- A declarative k8s native approach to integrate into your GitOps process
StatefulSet and DaemonSet resize execution in a running environment is supported as of version 8.3.5, but the user needs to determine the level of disruption this action can cause, and whether to leverage a pipeline schedule instead.
The key differentiator of Turbonomic resizing actions is that ALL replicas that have ever run for a workload are included in rightsizing, not just the currently running pod. This history is used in our unique analysis to generate actionable decisions that manage the tradeoffs in limits and requests between performance and efficiency.
Turbonomic will analyze the demand of all workloads and decide when clusters need more or fewer nodes by looking at the tradeoffs of efficiency (can workloads safely consolidate?), performance (avoiding node congestion), and compliance (understanding node selection policies). Relying on pending pods is too late, and wastes resources. Turbonomic allows you to save money by not over-provisioning, while at the same time assuring that pods can run on nodes that are not congested.
You can also execute Node Provision (create a new node from a specific node pool or machineset) or Node Suspend (delete a node from a specific node pool or machineset) actions directly from Turbonomic if you have one of the following capabilities:
- Cluster API
- OpenShift 4.x Machine Operator
- AKS node pool (with the Azure subscription as a target)
Turbonomic uses the APIs available from these solutions to invoke node creation or deletion.
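For example, in an OpenShift 4.x cluster the Machine API exposes MachineSets with a scale subresource, so a node provision action has roughly the same effect as scaling a MachineSet up by one. The MachineSet name and replica count below are hypothetical; Turbonomic calls the Machine API directly rather than running oc:

```bash
# Illustrative only: the approximate effect of a node provision action in OpenShift 4.x.
# "worker-us-east-1a" and the replica count are hypothetical examples.
oc scale machineset worker-us-east-1a -n openshift-machine-api --replicas=3
```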
Node provision and suspend actions can also be seen at a Node Pool level to understand the cost impact of managing cluster capacity.
Turbonomic also allows a user to execute any action through our action framework by making calls to third-party workflow orchestrators such as ServiceNow, Ansible, and Terraform. Consult the Turbonomic User Guide for more information.
Coming soon: Turbonomic is working on support in future releases for scaling node pools in the Google and AWS hosted Kubernetes services, GKE and EKS.
KubeTurbo can also execute actions to horizontally scale a deployment that has an SLO policy defined. When there is an action to provision or suspend (delete) a pod, KubeTurbo executes the equivalent of kubectl scale deployment foo --replicas=x.
Requirements:
- SLO metrics of response time or transaction throughput/rate are captured either with Instana as a target, or via custom metrics taken from Prometheus through Prometurbo.
- Define an SLO policy to set your targeted SLO and min/max replicas.
- Set your policy either through the Turbonomic UI, or leverage the Kubeturbo SLO Custom Resource Definitions: define your SLO policy in one CR, and the services you want to bind it to in a separate policy binding CR (see the sketch below).
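Below is a minimal sketch of the two CRs. The API group, kind names, and field names are assumptions based on the Kubeturbo samples; verify them against the samples shipped with your Kubeturbo version before applying:

```yaml
# Sketch only: an SLO policy plus a binding to a Service.
# API group, kinds, and fields are assumptions; check the Kubeturbo SLO CRD samples.
apiVersion: policy.turbonomic.io/v1alpha1
kind: SLOHorizontalScale
metadata:
  name: slo-response-time-policy
spec:
  minReplicas: 1
  maxReplicas: 10
  objectives:
    - name: ResponseTime   # target response time in milliseconds
      value: 300
  behavior:
    scaleUp: Automatic
    scaleDown: Manual
---
apiVersion: policy.turbonomic.io/v1alpha1
kind: PolicyBinding
metadata:
  name: slo-response-time-binding
spec:
  policyRef:
    kind: SLOHorizontalScale
    name: slo-response-time-policy
  targets:
    - name: my-frontend-service   # hypothetical Service name
```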
A key differentiator: Turbonomic will not only recommend and automate the number of replicas needed to maintain the SLO, but the analysis will also tell you how many nodes are required to run these replicas alongside the other workloads in the cluster. Smart scaling with intelligent pod placement.
Read the blog here
- Manage horizontal scaling of services without thresholds
- Manage the trade-offs of performance, availability of resources, and compliance
- Leverage your SLO data, adding response time and throughput from the telemetry you already collect (Istio, Prometheus, etc.)
Mitigate Congestion while Defragmenting Cluster Resources
This capability is unique to Turbonomic and warrants some detail.
The Kubernetes scheduler does the work of scheduling a pod based on the workload specification (requests, tolerations, etc) and the assessment of the cluster’s available resources to find the best possible, compliant node. This decision is made every time a workload is placed in the scheduler’s queue.
But after the pod is scheduled and workload demand fluctuates, nothing asks the question "Is this node still the best place to run this workload?" The only recourse for node congestion is to wait for pods to get evicted, thereby placing them back into the queue. Eviction is not a good strategy to ensure application performance. Stateful workloads especially may suffer reduced availability as pods are killed, and if the pressure is high enough, not only do all workloads suffer, but the node itself can become unusable, forcing all of its pods to be rescheduled elsewhere, if there is capacity.
Analysis with prescriptive actions is needed to manage workloads before node pressure causes eviction, while understanding fluctuating demand to determine where and when additional resources are needed. Turbonomic uniquely solves this problem by continuously analyzing workload demand to drive placement decisions that assure performance and drive efficiency while remaining compliant with placement rules. Turbonomic uses five data dimensions (Memory usage, Memory requests, CPU usage, CPU requests, and Pod density), along with discovered compliance policies (node selection strategies such as node labels, taints/tolerations, and explicit affinity or anti-affinity rules), to assess which pod to move, when, and where (which node).
Turbonomic generates Pod Move actions that show the user the main performance, compliance, or efficiency risk being mitigated, along with the impact of this and other actions on the affected nodes.
Figure 1: Turbonomic Pod Move Action to Address Node VMEM Congestion
The user can also see the benefits across all the compute nodes in the cluster through a before-and-after simulation of the executed actions, providing further proof of the value of taking the actions.
Figure 2: Turbonomic Projection of Node Utilization Improvement Achieved Through Taking Actions
In the event that no compliant node capacity remains, Turbonomic will generate a preventative and pre-emptive Node Provision action that, when executed, allows Turbonomic to move pods to the new node, assuring its usage without waiting for pods to get evicted.
Turbonomic’s Pod Move actions can be executed through Turbonomic, and are designed to coordinate with the Kubernetes Controller that is managing the workload desired state and number of replicas. Turbonomic uses a mediation pod called KubeTurbo that is running within the Kubernetes Cluster to discover, gather data, and execute this action.
KubeTurbo performs pod moves through a series of steps that validate that the workload is running and ready. When Turbonomic executes a move, KubeTurbo first launches a copy of the pod on the determined destination node. KubeTurbo waits for this copy to be running and ready, meeting liveness and readiness probe goals. If the copy does not reach the Ready state, the action fails gracefully and the reason is logged. When the copy does reach the Ready state, the workload briefly has N+1 replicas running, assuring availability. KubeTurbo then orchestrates deleting the original pod and introducing the copy to the controller as the one satisfying the desired replica count of N, through the use of labels, resulting in the original controller "adopting" the pod spun up by KubeTurbo.
This sequence of events assures availability of the workload, and smoothly handles action failure because the original pod is not deleted until the copy is ready for business!
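Conceptually, the sequence maps to steps like the following. This is an illustration only; KubeTurbo drives the whole flow through the Kubernetes API rather than kubectl, and the pod names are hypothetical:

```bash
# Illustration of the move sequence only; KubeTurbo performs this via the API, not kubectl.

# 1. KubeTurbo creates a copy of the original pod spec (new name, spec.nodeName set to the
#    destination node) so the copy lands exactly where the analysis decided.
kubectl apply -f foo-copy.yaml

# 2. It waits for the copy to become Ready; if the copy never reaches Ready, the action
#    fails gracefully and the original pod is left untouched.
kubectl wait --for=condition=Ready pod/foo-copy --timeout=120s

# 3. Only after the copy is Ready is the original deleted; labels on the copy let the
#    owning controller adopt it as the pod satisfying the desired replica count.
kubectl delete pod foo-7d9c5b-abcde
```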
For workloads that cannot tolerate an extra replica being spun up, because of RWO data access on a PV or because a backend service will not allow another replica to connect, Turbonomic supports an alternative delete-then-copy move action. Every action executed by Turbonomic is logged, with an audit trail of who executed the action, whether manually invoked or done under automation.
Want to know how it works? Review the GitHub project here for details on how the action is executed.
Analysis and Execution: Turbonomic will recommend workload move actions taking the following into account:
- DaemonSet and Mirror (Node controller) pods will not move. The analysis will never recommend moving these pods.
- StatefulSet moves will fail by default because the attached PV will likely not allow two consumers to read/write at the same time, and today Kubeturbo cannot determine whether a StatefulSet can function with two copies running simultaneously. The user can either set up a Turbonomic Automation Policy on Container Pods, select the default group of all StatefulSets (per cluster), and set Move actions to "Do Not Generate", or enable the alternate Move action execution order of operations. See Pods with PVs below.
- Custom controllers may fail execution if Kubeturbo cannot spin up a copy, and may require support for that specific controller; you can open a bug with Turbonomic.
- Other workload types with PVs: these moves will fail for the same reason as StatefulSets. Consider enabling the alternate Move action execution order of operations (see Pods with PVs below).
Turbonomic waits for the workload to reach its own readiness state before deleting the original pod; only then is the action considered successful. If for any other reason the moved (copy) pod does not reach readiness, the move fails and the original pod is left in place.
In OpenShift environments, you will need to supply an SCC context argument to Kubeturbo, or all moves will fail:
- Straight yaml: modify the Kubeturbo deployment, adding the argument under the container args:

      spec:
        template:
          spec:
            containers:
            - args:
              - --sccsupport=*

- Helm chart: provide this parameter:

      --set args.sccsupport=*

- Operator: add to the CR:

      spec:
        args:
          sccsupport: '*'
DeploymentConfigs in OpenShift can be configured with triggers, and in some cases the specified trigger requires rollouts to be performed manually (read the warning). Additionally, the imageChange trigger type does not roll out the DeploymentConfig on changes to other spec fields. Because Turbonomic container resize actions update only the parent controller and expect the in-cluster resource controllers to keep the running pods in sync with the pod template specified in that parent, DeploymentConfigs with the triggers listed above will not update their pods after a Turbonomic action.
Although this is consistent with Turbonomic's action execution mechanism, users have the option to configure Kubeturbo to additionally roll out the updated DeploymentConfig when it finds triggers of the types listed above.
To enable this behaviour, the feature gate below needs to be enabled in the Kubeturbo ConfigMap (disabled by default):
"featureGates": {
"ForceDeploymentConfigRollout": true
}
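For orientation, this is roughly where the setting sits if your Kubeturbo ConfigMap follows the common turbo-config layout. The ConfigMap name, data key, and server address below are assumptions; merge the featureGates block into your existing turbo.config JSON:

```yaml
# Sketch only: featureGates merged into an existing turbo.config JSON.
# ConfigMap/key names and the server URL are assumptions based on a typical Kubeturbo install.
apiVersion: v1
kind: ConfigMap
metadata:
  name: turbo-config
data:
  turbo.config: |-
    {
      "communicationConfig": {
        "serverMeta": {
          "turboServer": "https://<Turbonomic-server-address>"
        }
      },
      "featureGates": {
        "ForceDeploymentConfigRollout": true
      }
    }
```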
Pods with PVs that are RWO need an alternative mechanism to relocate the pod onto another compliant node, since two copies of the pod cannot attach the same PV at the same time. For those workloads, KubeTurbo can first delete the original pod, which allows the copy to bind to the PV and reach the Ready state.
- Straight yaml: modify the Kubeturbo deployment, adding the argument under the container args:

      spec:
        template:
          spec:
            containers:
            - args:
              - --fail-volume-pod-moves=false

- Helm chart: provide this parameter:

      --set args.failVolumePodMoves=false

- Operator: add to the CR:

      spec:
        args:
          failVolumePodMoves: 'false'