Commit 9c28b7d: [Benchmark] KubeRay memory / scalability benchmark (#1324)
1 parent: 2b8947c
Showing 6 changed files with 276 additions and 0 deletions.
# KubeRay memory benchmark

# Running benchmark experiments on a Google GKE Cluster

## Architecture

![benchmark architecture](images/benchmark_architecture.png)

This architecture is not a best practice, but it fulfills the current requirements.

## Step 1: Create a new Kubernetes cluster

We will create a GKE cluster with autoscaling enabled.
The following command creates a Kubernetes cluster named `kuberay-benchmark-cluster` on Google GKE.
The cluster can scale up to 16 nodes, and each node of type `e2-highcpu-16` has 16 CPUs and 16 GB of memory.
The following experiments may create up to ~150 Pods in the Kubernetes cluster, and each Ray Pod requires 1 CPU and 1 GB of memory.

```sh
gcloud container clusters create kuberay-benchmark-cluster \
    --num-nodes=1 --min-nodes 0 --max-nodes 16 --enable-autoscaling \
    --zone=us-west1-b --machine-type e2-highcpu-16
```
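
Once the cluster is up, point `kubectl` at it before moving on. A minimal sketch, assuming the cluster name and zone from the command above:

```sh
# Fetch credentials so kubectl can reach the new cluster, then confirm the node is ready.
gcloud container clusters get-credentials kuberay-benchmark-cluster --zone us-west1-b
kubectl get nodes
```
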
## Step 2: Install Prometheus and Grafana

```sh
# Path: kuberay/
./install/prometheus/install.sh
```

Follow "Step 2: Install Kubernetes Prometheus Stack via Helm chart" in [prometheus-grafana.md](https://github.com/ray-project/kuberay/blob/master/docs/guidance/prometheus-grafana.md#step-2-install-kubernetes-prometheus-stack-via-helm-chart) to install the [kube-prometheus-stack v48.2.1](https://github.com/prometheus-community/helm-charts/tree/kube-prometheus-stack-48.2.1/charts/kube-prometheus-stack) chart and related custom resources.
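
To confirm the monitoring stack is up before running experiments, you can list its Pods. This sketch assumes the install script places everything in the `prometheus-system` namespace, as in the KubeRay Prometheus guide:

```sh
# All kube-prometheus-stack Pods should reach the Running state.
kubectl get pods -n prometheus-system
```
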
## Step 3: Install a KubeRay operator

Follow [this document](https://github.com/ray-project/kuberay/blob/master/helm-chart/kuberay-operator/README.md) to install the latest stable KubeRay operator via Helm repository.
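
For reference, a typical Helm-based installation looks roughly like the sketch below; the pinned version is an assumption (the results in this document were collected with KubeRay v0.6.0):

```sh
helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm repo update
# Pinning the version is optional; v0.6.0 matches the results reported below.
helm install kuberay-operator kuberay/kuberay-operator --version 0.6.0
```
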
## Step 4: Run experiments

* Step 4.1: Make sure the `kubectl` CLI can connect to your GKE cluster. If it cannot, run `gcloud auth login`.
* Step 4.2: Run an experiment.
  ```sh
  # Modify `memory_benchmark_utils.py` to select the experiment you want to run.
  # (path: benchmark/memory_benchmark/scripts)
  python3 memory_benchmark_utils.py | tee benchmark_log
  ```
* Step 4.3: Follow [prometheus-grafana.md](https://github.com/ray-project/kuberay/blob/master/docs/guidance/prometheus-grafana.md#step-2-install-kubernetes-prometheus-stack-via-helm-chart) to access the Grafana dashboard. A command-line spot check is sketched after this list.
  * Sign in to the Grafana dashboard.
  * Click "Dashboards".
  * Select "Kubernetes / Compute Resources / Pod".
  * You will see the "Memory Usage" panel for the KubeRay operator Pod.
  * Select the time range, then click "Inspect" followed by "Data" to download the memory usage data of the KubeRay operator Pod.
* Step 4.4: Delete all RayCluster custom resources.
  ```sh
  kubectl delete --all rayclusters.ray.io --namespace=default
  ```
* Step 4.5: Repeat Step 4.2 to Step 4.4 for the other experiments.
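
As a quick command-line spot check between Grafana readings, you can also query the operator Pod's memory directly. This sketch assumes the operator runs in the `default` namespace and that metrics-server is available (it is enabled by default on GKE):

```sh
# Show the current CPU / memory usage of the KubeRay operator Pod.
kubectl top pod -n default | grep kuberay-operator
```
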
# Experiments

We designed three benchmark experiments:

* Experiment 1: Launch a RayCluster with 1 head and no workers. A new cluster is initiated every 20 seconds until there are a total of 150 RayCluster custom resources.
* Experiment 2: In the Kubernetes cluster, there is only 1 RayCluster. Add 5 new worker Pods to this RayCluster every 60 seconds until the total reaches 150 Pods.
* Experiment 3: Create a 5-node (1 head + 4 workers) RayCluster every 60 seconds until there are 30 RayCluster custom resources.

Based on [the survey](https://forms.gle/KtMLzjXcKoeSTj359) of KubeRay users, we set the benchmark target at 150 Ray Pods, which covers most use cases.
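
While an experiment runs, it can help to watch the Pod count grow toward the 150-Pod target. A simple sketch, assuming the RayClusters are created in the `default` namespace:

```sh
# Re-count the Pods in the default namespace every 30 seconds.
watch -n 30 'kubectl get pods -n default --no-headers | wc -l'
```
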
## Experiment results (KubeRay v0.6.0)

![benchmark result](images/benchmark_result.png)

* You can generate the figure by running:
  ```sh
  # (path: benchmark/memory_benchmark/scripts)
  python3 experiment_figures.py
  # The output image `benchmark_result.png` will be stored in `scripts/`.
  ```

* As shown in the figure, the memory usage of the KubeRay operator Pod is highly positively correlated with the number of Pods in the Kubernetes cluster.
  In addition, the number of custom resources in the Kubernetes cluster does not have a big impact on the memory usage.
* Note that the x-axis "Number of Pods" is the number of Pods that have been created, not the number that are running.
  If the Kubernetes cluster does not have enough computing resources, the GKE cluster autoscaler adds a new Kubernetes node to the cluster.
  This process may take a few minutes, so some Pods may be pending while it happens.
  This may be the reason why the memory usage growth appears somewhat throttled.
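
If you want to see how many Pods are still waiting for the autoscaler to provision capacity during a run, a quick check is:

```sh
# Count Pods that have been created but are not yet scheduled.
kubectl get pods -n default --field-selector=status.phase=Pending --no-headers | wc -l
```
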
75 changes: 75 additions & 0 deletions
benchmark/memory_benchmark/scripts/experiment_figures.py
import matplotlib.pyplot as plt

# Each `experimentN` list below holds memory usage samples (MB) of the KubeRay
# operator Pod, one sample per Pod count in the matching `num_pods_*` list.

# [Experiment 1]:
# Launch a RayCluster with 1 head and no workers. A new cluster is initiated every 20 seconds until
# there are a total of 150 RayCluster custom resources.
num_pods_diff20 = [0, 20, 40, 60, 80, 100, 120, 140, 150]
experiment1 = [
    20.71875,
    23.2421875,
    26.6015625,
    29.453125,
    31.25390625,
    35.21484375,
    34.52734375,
    35.73046875,
    36.19921875,
]

# [Experiment 2]
# In the Kubernetes cluster, there is only 1 RayCluster. Add 5 new worker Pods to this
# RayCluster every 60 seconds until the total reaches 150 Pods.
num_pods_diff10 = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150]
experiment2 = [
    24.49609375,
    24.6875,
    25.9609375,
    28.59375,
    27.8984375,
    29.15625,
    29.7734375,
    32.015625,
    32.74609375,
    33.3203125,
    34.140625,
    34.8515625,
    36.75,
    37.28125,
    38.34375,
    40.4453125,
]

# [Experiment 3]
# Create a 5-node (1 head + 4 workers) RayCluster every 60 seconds until there are 30 RayCluster custom resources.
experiment3 = [
    19.7578125,
    20.8515625,
    22.99609375,
    23.19921875,
    26.0234375,
    25.8984375,
    26.1640625,
    29.43359375,
    29.0859375,
    33.3359375,
    32.89453125,
    34.78125,
    37.890625,
    39.125,
    39.078125,
    41.6328125,
]

# Plotting
plt.figure(figsize=(12, 7))
plt.plot(num_pods_diff20, experiment1, label="Exp 1", marker="o")
plt.plot(num_pods_diff10, experiment2, label="Exp 2", marker="o")
plt.plot(num_pods_diff10, experiment3, label="Exp 3", marker="o")
plt.xlabel("Number of Pods")
plt.ylabel("Memory (MB)")
plt.title("Memory usage vs. Number of Pods")
plt.ylim(0, max(max(experiment1), max(experiment2), max(experiment3)) + 5)
plt.legend()
plt.grid(True, which="both", linestyle="--", linewidth=0.5)
plt.tight_layout()
plt.savefig("benchmark_result.png")
plt.show()
60 changes: 60 additions & 0 deletions
benchmark/memory_benchmark/scripts/memory_benchmark_utils.py
"""Create RayCluster CR periodically""" | ||
from string import Template | ||
from datetime import datetime | ||
|
||
import subprocess | ||
import tempfile | ||
import time | ||
|
||
RAYCLUSTER_TEMPLATE = "ray-cluster.benchmark.yaml.template" | ||
|
||
|
||
def create_ray_cluster(template, cr_name, num_pods): | ||
"""Replace the template with the name of the RayCluster CR and create the CR""" | ||
now = datetime.now() | ||
current_time = now.strftime("%H:%M:%S") | ||
print( | ||
f"Current Time = {current_time}, RayCluster CR: {cr_name} ({num_pods} Pods) is created" | ||
) | ||
|
||
with open(template, encoding="utf-8") as ray_cluster_template: | ||
template = Template(ray_cluster_template.read()) | ||
yamlfile = template.substitute( | ||
{ | ||
"raycluster_name": cr_name, | ||
"num_worker_pods": num_pods - 1, | ||
} | ||
) | ||
with tempfile.NamedTemporaryFile( | ||
"w", suffix="_ray_cluster_yaml" | ||
) as ray_cluster_yaml: | ||
ray_cluster_yaml.write(yamlfile) | ||
ray_cluster_yaml.flush() | ||
# Execute a command "kubectl apply -f $ray_cluster_yaml.name" | ||
command = f"kubectl apply -f {ray_cluster_yaml.name}" | ||
subprocess.run(command, shell=True, check=False) | ||
|
||
|
||
def period_create_cr(num_cr, period, num_pods): | ||
"""Create RayCluster CR periodically""" | ||
for i in range(num_cr): | ||
create_ray_cluster(RAYCLUSTER_TEMPLATE, f"raycluster-{i}", num_pods) | ||
subprocess.run("kubectl get raycluster", shell=True, check=False) | ||
time.sleep(period) | ||
|
||
|
||
def period_update_cr(cr_name, period, diff_pods, num_iter): | ||
for i in range(num_iter): | ||
create_ray_cluster(RAYCLUSTER_TEMPLATE, cr_name, (i + 1) * diff_pods) | ||
subprocess.run("kubectl get raycluster", shell=True, check=False) | ||
time.sleep(period) | ||
|
||
|
||
# [Experiment 1]: Create a 1-node (1 head + 0 worker) RayCluster every 20 seconds until there are 150 RayCluster custom resources. | ||
period_create_cr(150, 30, 1) | ||
|
||
# [Experiment 2]: In the Kubernetes cluster, there is only 1 RayCluster. Add 5 new worker Pods to this RayCluster every 60 seconds until the total reaches 150 Pods. | ||
# period_update_cr("raycluster-0", 60, 5, 30) | ||
|
||
# [Experiment 3]: Create a 5-node (1 head + 4 workers) RayCluster every 60 seconds until there are 30 RayCluster custom resources. | ||
# period_create_cr(30, 60, 5) |
57 changes: 57 additions & 0 deletions
benchmark/memory_benchmark/scripts/ray-cluster.benchmark.yaml.template
apiVersion: ray.io/v1alpha1
kind: RayCluster
metadata:
  name: '$raycluster_name'
spec:
  rayVersion: '2.5.0'
  # Ray head pod template
  headGroupSpec:
    rayStartParams:
      dashboard-host: '0.0.0.0'
    template:
      spec:
        containers:
          - name: ray-head
            image: rayproject/ray:2.5.0
            resources:
              limits:
                cpu: 1
                memory: 1Gi
              requests:
                cpu: 1
                memory: 1Gi
            ports:
              - containerPort: 6379
                name: gcs-server
              - containerPort: 8265
                name: dashboard
              - containerPort: 10001
                name: client
  workerGroupSpecs:
    - replicas: $num_worker_pods
      minReplicas: 0
      maxReplicas: 200
      groupName: small-group
      rayStartParams: {}
      template:
        spec:
          containers:
            - name: ray-worker
              image: rayproject/ray:2.5.0
              lifecycle:
                preStop:
                  exec:
                    command: ["/bin/sh","-c","ray stop"]
              volumeMounts:
                - mountPath: /tmp/ray
                  name: ray-logs
              resources:
                limits:
                  cpu: "1"
                  memory: "1G"
                requests:
                  cpu: "1"
                  memory: "1G"
          volumes:
            - name: ray-logs
              emptyDir: {}
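
For a one-off manual test of this template outside the benchmark script, you could substitute the placeholders yourself. The sketch below is an assumption: it uses `envsubst` (from gettext) and a hypothetical CR name, whereas the benchmark script performs the same substitution with Python's `string.Template`:

```sh
# Render the template with a test name and 4 worker Pods, then apply it.
raycluster_name=raycluster-test num_worker_pods=4 \
  envsubst < ray-cluster.benchmark.yaml.template | kubectl apply -f -
```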