Action Details

Want to learn more about how KubeTurbo works to execute actions? Read on, but first visit the Use Cases Overview page for context on how our unique analysis drives value.

Resizing (Vertical Scaling) of Containerized Workloads

Resize actions can be executed in any of the following ways:

  1. In the running environment by KubeTurbo
  2. The user retrieves the action (via the UI or API) and manually makes the change
  3. Integrate Turbonomic resize actions into your CD pipeline / GitOps process to have your pipeline execute the change

When KubeTurbo resizes a running workload, it is essentially the equivalent of running kubectl edit workloadType foo and modifying the container specs. KubeTurbo applies all container spec changes at the same time, minimizing the number of restarts to one. The workload will follow its specified deployment strategy (rolling update, etc.). Once the action succeeds, you can see the history of executed actions on the Workload Controller.
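
For illustration, a resize action that adjusts a container's CPU and memory amounts to a change like the following in the workload's container spec. This is a minimal sketch with a hypothetical container name and illustrative values:

spec:
  template:
    spec:
      containers:
        - name: foo                 # hypothetical container name
          resources:
            requests:
              cpu: 100m             # e.g. raised from 50m
              memory: 256Mi         # e.g. raised from 128Mi
            limits:
              cpu: 500m             # limits and requests change in one update,
              memory: 512Mi         # so the workload restarts only once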

Is your workload managed by an Operator? You can still leverage Turbonomic to execute resize actions in the running environment by using an Operator Resource Mapping. Samples are provided here.

The Turbonomic Action Execution Framework also supports:

  • Scheduling actions
  • Integration with workflow orchestration tools such as ServiceNow
  • Action modes of Automated, Manual, and Recommend only
  • Policies that can be applied at any scope: a single workload, or groups of workloads by namespace and/or cluster
  • Coming soon: a declarative, Kubernetes-native approach to integrate into your GitOps process

Resize execution for StatefulSets and DaemonSets in a running environment is supported as of version 8.3.5, but you should assess the level of disruption this action can cause and decide whether to leverage a pipeline schedule instead.

The key differentiator of Turbonomic resize actions is that ALL replicas that ever ran for a workload are included in rightsizing, not just the currently running pod. This history is used in our unique analysis to generate actionable decisions that manage the tradeoffs between limits and requests for performance and efficiency.

Node Provision and Suspend (Cluster Scaling)

Turbonomic will execute Node Provision (create a new node from a specific node pool or MachineSet) or Node Suspend (delete a node from a specific node pool or MachineSet) actions for you if you have one of the following capabilities:

  1. Cluster API
  2. OpenShift 4.x Machine Operator

KubeTurbo uses the APIs available from these solutions to invoke node creation or deletion.
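
For example, with the OpenShift 4.x Machine API, a node provision action is conceptually equivalent to incrementing the replica count of the target MachineSet. A sketch with a hypothetical MachineSet name:

apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  name: worker-us-east-1a           # hypothetical MachineSet name
  namespace: openshift-machine-api
spec:
  replicas: 3                       # e.g. incremented from 2 to provision one node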

Future releases of Turbonomic will also support scaling node pools in public-cloud-hosted Kubernetes services that require interaction with the cloud provider, such as AKS, EKS, and GKE.

Node provision and suspend actions can also be viewed at the Node Pool level to understand the cost impact of managing cluster capacity.

SLO Horizontal Scaling (Private Preview)

KubeTurbo can also execute actions to horizontally scale a deployment that has an SLO policy defined. When there is an action to provision or suspend (delete) a pod, KubeTurbo executes the equivalent of kubectl scale deployment foo --replicas=x.
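
The net effect on the workload spec is simply a change to its replica count. A sketch with a hypothetical deployment name and illustrative values:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: foo           # hypothetical deployment name
spec:
  replicas: 4         # e.g. scaled from 3 by an SLO provision action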

Turbonomic Pod Moves (continuous rescheduling)

Mitigate Congestion while Defragmenting Cluster Resources

This capability is unique to Turbonomic and warrants some detail.

The Problem:

The Kubernetes scheduler does the work of scheduling a pod based on the workload specification (requests, tolerations, etc.) and an assessment of the cluster's available resources, to find the best possible compliant node. This decision is made every time a workload is placed in the scheduler's queue.

But after the pod is scheduled and workload demand fluctuates, nothing asks the question "Is this node still the best place to run this workload?" The only recourse for node congestion is to wait for pods to be evicted, placing them back into the queue. Eviction is not a good strategy for ensuring application performance. Stateful workloads may suffer reduced availability as pods are killed, and if the pressure is high enough, not only do all workloads suffer, but the node itself can become unusable, forcing all pods to be rescheduled elsewhere, if there is capacity.

Turbonomic Continuous Placement Solution

Analysis with prescriptive actions is needed to manage workloads before node pressure causes eviction, while understanding fluctuating demand to determine where and when additional resources are needed. Turbonomic uniquely solves this problem by continuously analyzing workload demand to drive placement decisions that assure performance and drive efficiency while remaining compliant with placement rules. Turbonomic uses five data dimensions, Memory usage, Memory requests, CPU usage, CPU requests, and Pod density, along with discovered compliance policies (node selector strategies, whether node labels, taints/tolerations, or explicit affinity/anti-affinity rules) to assess which pod to move, when, and where (which node).
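
For example, KubeTurbo discovers standard Kubernetes placement constraints, so a pod spec like the following restricts move candidates to nodes that match the selector and are tolerated. This is a sketch with hypothetical labels and taints:

spec:
  nodeSelector:
    disktype: ssd             # hypothetical label; only matching nodes are move candidates
  tolerations:
    - key: dedicated          # hypothetical taint; tainted nodes stay candidates only if tolerated
      operator: Equal
      value: analytics
      effect: NoSchedule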

Turbonomic generates Pod Move actions that show the user the main performance, compliance, or efficiency risk being mitigated, along with the combined impact of this and other actions, showing the improvements on the affected nodes.

Figure 1: Turbonomic Pod Move Action to Address Node VMEM Congestion

The user can also see the benefits across all compute nodes in the cluster through a before-and-after simulation of the executed actions, providing further proof of the value of taking them.

Figure 2: Turbonomic Projection of Node Utilization Improvement Achieved Through Taking Actions

In the event that no compliant node capacity is left, Turbonomic will generate a preventative, pre-emptive Node Provision action that, when executed, allows Turbonomic to move pods to the new node, assuring the node's utilization without waiting for pods to be evicted.

Pod Move Actions that Assure Availability

Turbonomic's Pod Move actions can be executed through Turbonomic and are designed to coordinate with the Kubernetes controller that manages the workload's desired state and number of replicas. Turbonomic uses a mediation pod called KubeTurbo, running within the Kubernetes cluster, to discover, gather data, and execute this action.

KubeTurbo performs pod moves through a series of steps that validate the workload is running and ready. When Turbonomic executes a move, KubeTurbo first launches a copy of the pod on the determined destination node. KubeTurbo waits for this copy to be running and ready, meeting its liveness and readiness probes. If the copy does not reach the Ready state, the action is gracefully failed and the reason is logged. When the copy does reach the Ready state, the workload has N+1 replicas running, assuring availability. KubeTurbo then orchestrates deleting the original pod and introducing the copy to the controller, through the use of labels, as the pod that satisfies the desired replica count of N, resulting in the original controller "adopting" the pod spun up by KubeTurbo.

This sequence of events assures availability of the workload, and smoothly handles action failure because the original pod is not deleted until the copy is ready for business!

For workloads that cannot tolerate an extra replica being spun up, because of RWO data access on a PV or a backend service that will not allow another replica to connect, Turbonomic supports an alternative delete-then-copy move action. Every action executed by Turbonomic is logged, with an audit trail of who executed the action, whether it was manually invoked or done under automation.

Pod Move Actions Technical Details

Review the GitHub project here for details on how the action is executed.

OpenShift Environments

In OpenShift environments, you will need to supply an argument to KubeTurbo for SCC support.

  • Straight YAML - modify the KubeTurbo deployment:

spec:
  template:
    spec:
      containers:
        - args:
            - --sccsupport=*

  • Helm chart - provide this parameter: --set args.sccsupport=*

  • Operator - add to the CR:

spec:
  args:
    sccsupport: '*' 
Pods with PVs

Pods with RWO PVs need an alternative mechanism to relocate the pod onto another compliant node, since two copies of the pod cannot attach the same PV at the same time. For those workloads, KubeTurbo can first delete the original pod, which allows the copy to bind to the PV and reach a ready state.

  • Straight YAML - modify the KubeTurbo deployment:

spec:
  template:
    spec:
      containers:
        - args:
            - --fail-volume-pod-moves=false

  • Helm chart - provide this parameter: --set args.failVolumePodMoves=false

  • Operator - add to the CR:

spec:
  args:
    failVolumePodMoves: 'false' 
