Skip to content

Latest commit

 

History

History

eks-ondemand

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Demonstration Cluster for CloudPilot AI

See the AWS documentation for additional details on Amazon EKS managed node groups.

Prerequisites

Run the following command to configure the aws cli with your aws credentials:

aws configure

Usage

Download the repository:

git clone https://github.com/cloudpilot-ai/examples.git
cd examples/clusters/eks-ondemand

Execute the following command to create the EKS cluster:

terraform init
terraform apply --auto-approve

After finish the process, run the following command to get the kubeconfig:

export KUBECONFIG=~/.kube/demo
aws eks update-kubeconfig --name cluster-demonstration

Then testing the cluster:

kubectl get nodes

When you finish the testing, please run the following command to destroy the cluster:

Please note, if you have workloads deployed in your cluster, you need to remove these workloads first. Especially, services associated with load balancers (LB) may cause dependencies that are cumbersome to resolve manually if not addressed before cluster termination.

terraform destroy --auto-approve

More

If you want to test GPU nodes rebalance, please do as follows before installing cloudpilot agent:

  • Uncommand the code as follows in main.tf:
  # Please run the following command after the cluster ready
  # kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.15.0/deployments/static/nvidia-device-plugin.yml
  gpu_node_group = {
      name = "cloudpilot-gpu"
      ami_type = "AL2_x86_64_GPU"

      instance_types = ["g3s.xlarge"]

      min_size     = 0
      max_size     = 4
      desired_size = 1

      capacity_type = "ON_DEMAND"
      spot_max_price = "10"
  }
  • Init the cluster.
  • Install nvidia device plugin:
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.15.0/deployments/static/nvidia-device-plugin.yml
  • Create the GPU workload.
kubectl apply -f manifest/gpu-workload.yaml