Kube-burner refactor + HyperShift multi-endpoint (#545)
* HyperShift multi-endpoint

Wrapper to run kube-burner using the new --metrics-endpoint flag from kube-burner, which allows grabbing metrics and evaluating alerts from different Prometheus endpoints.

For the HyperShift scenario we need:

- OBO stack: The metrics scraped from this endpoint are metrics from hosted CPs, such as etcd and API latencies. As this endpoint is not publicly exposed by default, the script sets up a route to allow kube-burner to reach it. No authentication is currently required.
- Management cluster Prometheus: The metrics we use from this endpoint are management cluster container metrics (from the hosted control-plane namespace) and worker node metrics (required to measure usage on the worker nodes hosting the HCP).
- Hosted cluster Prometheus: From here we scrape data-plane container metrics, and also metrics from kube-state-metrics, which are mostly used to count and get resources from the cluster.

Signed-off-by: Raul Sevilla <rsevilla@redhat.com>

* Add docs

Signed-off-by: Raul Sevilla <rsevilla@redhat.com>

* Add QPS, BURST and GC variables

Signed-off-by: Raul Sevilla <rsevilla@redhat.com>

* Update kube-apiserver metric expressions

Signed-off-by: Raul Sevilla <rsevilla@redhat.com>

* Bump kube-burner version

Signed-off-by: Raul Sevilla <rsevilla@redhat.com>

* Improve EXTRA_FLAGS docs

Signed-off-by: Raul Sevilla <rsevilla@redhat.com>

* Use regex for HCP_NAMESPACE

Signed-off-by: Raul Sevilla <rsevilla@redhat.com>

---------

Signed-off-by: Raul Sevilla <rsevilla@redhat.com>
1 parent f0c980d, commit 01a2258
Showing 9 changed files with 356 additions and 0 deletions.
workloads/kube-burner-ocp-wrapper/README.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,65 @@ | ||
# Kube-burner | ||
|
||
The `./run.sh` script is a small wrapper around kube-burner that exposes some of its flags through environment variables. The supported workloads are described in the [OpenShift OCP wrapper section](https://kube-burner.readthedocs.io/en/latest/ocp/) of the kube-burner docs. | ||
|
||
To run a workload, set the `WORKLOAD` environment variable to one of the workloads supported by kube-burner. For example: | ||
|
||
```shell | ||
$ ITERATIONS=5 WORKLOAD=cluster-density-v2 ./run.sh | ||
/tmp/kube-burner ocp cluster-density-v2 --log-level=info --iterations=5 --churn=true --es-server=https://search-perfscale-dev-chmf5l4sh66lvxbnadi4bznl3a.us-west-2.es.amazonaws.com --es-index=ripsaw-kube-burner --qps=20 --burst=20 | ||
INFO[2023-03-13 16:39:57] 📁 Creating indexer: elastic | ||
INFO[2023-03-13 16:39:59] 👽 Initializing prometheus client with URL: <truncated> | ||
INFO[2023-03-13 16:40:00] 🔔 Initializing alert manager for prometheus: <truncated> | ||
INFO[2023-03-13 16:40:00] 🔥 Starting kube-burner (1.4.3@a575df584a6b520a45e2fe7903e608a34e722e5f) with UUID 69022407-7c55-4b8a-add2-5e40e6b4c593 | ||
INFO[2023-03-13 16:40:00] 📈 Creating measurement factory | ||
INFO[2023-03-13 16:40:00] Registered measurement: podLatency | ||
INFO[2023-03-13 16:40:00] Job cluster-density-v2: 5 iterations with 1 ImageStream replicas | ||
INFO[2023-03-13 16:40:00] Job cluster-density-v2: 5 iterations with 1 Build replicas | ||
INFO[2023-03-13 16:40:00] Job cluster-density-v2: 5 iterations with 3 Deployment replicas | ||
INFO[2023-03-13 16:40:00] Job cluster-density-v2: 5 iterations with 2 Deployment replicas | ||
<truncated> | ||
``` | ||
|
||
## Environment variables | ||
|
||
The wrapper supports the following environment variables to tune basic workload parameters: | ||
|
||
- **ES_SERVER**: Defines the ElasticSearch/OpenSearch endpoint. By default it points to the development instance. | ||
- **ES_INDEX**: Defines the ElasticSearch/OpenSearch index name. `ripsaw-kube-burner` by default. | ||
- **QPS** and **BURST**: Define the client-go QPS and burst parameters for kube-burner. Both default to 20. | ||
- **GC**: Garbage collects the created namespaces. `true` by default. | ||
- **EXTRA_FLAGS**: Extra flags appended to the underlying `kube-burner ocp` command. Empty by default. | ||
|
||
### Using the EXTRA_FLAGS variable | ||
|
||
All the flags that can be appended through the `EXTRA_FLAGS` variable can be found in the [kube-burner docs](https://kube-burner.readthedocs.io/en/latest/ocp/). | ||
For example, we can tweak the churning behaviour of the cluster-density workload with: | ||
|
||
```shell | ||
$ export EXTRA_FLAGS="--churn-duration=1d --churn-percent=5 --churn-delay=5m" | ||
$ ITERATIONS=500 WORKLOAD=cluster-density-v2 ./run.sh | ||
``` | ||
|
||
Or disable the namespace garbage collection: | ||
|
||
```shell | ||
$ EXTRA_FLAGS="--gc=false" ITERATIONS=500 WORKLOAD=cluster-density-v2 ./run.sh | ||
``` | ||
|
||
|
||
### Cluster-density and cluster-density-v2 | ||
|
||
- **ITERATIONS**: Defines the number of iterations of the workload to run. No default value. | ||
- **CHURN**: Enables workload churning. Workload churning is enabled by default with `churn-duration=1h`, `churn-delay=2m` and `churn-percent=10`. These parameters can be tuned through the `EXTRA_FLAGS` variable as noted previously. | ||
|
||
## HyperShift | ||
|
||
This script can also be used with HyperShift hosted clusters. In that case, kube-burner grabs metrics from several Prometheus endpoints: | ||
|
||
- Hosted control-plane stack or OBO: Hosted control-plane application metrics such as etcd, API latencies, etc. | ||
- Management cluster stack: Hardware utilization metrics from its worker nodes and hosted control-plane pods. | ||
- Hosted cluster stack: From this endpoint kube-burner collects data-plane metrics. | ||
|
||
To use it, the hosted cluster kubeconfig must be set up beforehand. These environment variables are also required: | ||
|
||
- **MC_KUBECONFIG**: Points to a valid management cluster kubeconfig file. |
7 changes: 7 additions & 0 deletions
workloads/kube-burner-ocp-wrapper/alerts-profiles/hosted-cluster-alerts.yml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
- expr: up{job=~"crio|kubelet"} == 0 | ||
description: "{{$labels.node}}/{{$labels.job}} down" | ||
severity: warning | ||
|
||
- expr: up{job="ovnkube-node"} == 0 | ||
description: "{{$labels.instance}}/{{$labels.pod}} {{$labels.job}} down" | ||
severity: warning |
36 changes: 36 additions & 0 deletions
workloads/kube-burner-ocp-wrapper/alerts-profiles/hosted-cp-alerts.yml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,36 @@ | ||
# etcd | ||
|
||
- expr: avg_over_time(histogram_quantile(0.99, rate(etcd_disk_wal_fsync_duration_seconds_bucket{namespace=~".+{{.HCP_NAMESPACE}}"}[2m]))[10m:]) > 0.01 | ||
description: 10 minutes avg. 99th etcd fsync latency on {{$labels.pod}} higher than 10ms. {{$value}}s | ||
severity: warning | ||
|
||
- expr: avg_over_time(histogram_quantile(0.99, rate(etcd_disk_backend_commit_duration_seconds_bucket{namespace=~".+{{.HCP_NAMESPACE}}"}[2m]))[10m:]) > 0.03 | ||
description: 10 minutes avg. 99th etcd commit latency on {{$labels.pod}} higher than 30ms. {{$value}}s | ||
severity: warning | ||
|
||
- expr: rate(etcd_server_leader_changes_seen_total{namespace=~".+{{.HCP_NAMESPACE}}"}[2m]) > 0 | ||
description: etcd leader changes observed | ||
severity: warning | ||
|
||
# API server | ||
- expr: avg_over_time(histogram_quantile(0.99, sum(irate(apiserver_request_duration_seconds_bucket{namespace=~".+{{.HCP_NAMESPACE}}", apiserver="kube-apiserver", verb=~"POST|PUT|DELETE|PATCH", subresource!~"log|exec|portforward|attach|proxy"}[2m])) by (le, resource, verb))[10m:]) > 1 | ||
description: 10 minutes avg. 99th mutating API call latency for {{$labels.verb}}/{{$labels.resource}} higher than 1 second. {{$value}}s | ||
severity: warning | ||
|
||
- expr: avg_over_time(histogram_quantile(0.99, sum(irate(apiserver_request_duration_seconds_bucket{namespace=~".+{{.HCP_NAMESPACE}}", apiserver="kube-apiserver", verb=~"LIST|GET", subresource!~"log|exec|portforward|attach|proxy", scope="resource"}[2m])) by (le, resource, verb, scope))[5m:]) > 1 | ||
description: 5 minutes avg. 99th read-only API call latency for {{$labels.verb}}/{{$labels.resource}} in scope {{$labels.scope}} higher than 1 second. {{$value}}s | ||
severity: warning | ||
|
||
- expr: avg_over_time(histogram_quantile(0.99, sum(irate(apiserver_request_duration_seconds_bucket{namespace=~".+{{.HCP_NAMESPACE}}", apiserver="kube-apiserver", verb=~"LIST|GET", subresource!~"log|exec|portforward|attach|proxy", scope="namespace"}[2m])) by (le, resource, verb, scope))[5m:]) > 5 | ||
description: 5 minutes avg. 99th read-only API call latency for {{$labels.verb}}/{{$labels.resource}} in scope {{$labels.scope}} higher than 5 seconds. {{$value}}s | ||
severity: warning | ||
|
||
- expr: avg_over_time(histogram_quantile(0.99, sum(irate(apiserver_request_duration_seconds_bucket{namespace=~".+{{.HCP_NAMESPACE}}", apiserver="kube-apiserver", verb=~"LIST|GET", subresource!~"log|exec|portforward|attach|proxy", scope="cluster"}[2m])) by (le, resource, verb, scope))[5m:]) > 30 | ||
description: 5 minutes avg. 99th read-only API call latency for {{$labels.verb}}/{{$labels.resource}} in scope {{$labels.scope}} higher than 30 seconds. {{$value}}s | ||
severity: warning | ||
|
||
# Control plane pods | ||
|
||
- expr: up{namespace=~".+{{.HCP_NAMESPACE}}", job=~"kube-controller-manager|openshift-controller-manager|ovnkube-master|etcd-client|openshift-apiserver"} == 0 | ||
description: "{{$labels.namespace}}/{{$labels.pod}} down" | ||
severity: error |
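
Note on the `namespace=~".+{{.HCP_NAMESPACE}}"` matcher used throughout this profile: Prometheus regex matchers are fully anchored, so the expression only matches namespaces that end with the `HCP_NAMESPACE` value and have at least one character before it. The sketch below emulates that behaviour with `grep -E` so the filter can be checked against a namespace name by hand. `matches_hcp_namespace` is a hypothetical helper and the namespace values in the comments are made-up examples, not taken from a real cluster.

```shell
# Sketch: emulate Prometheus' fully anchored matching for
# namespace=~".+<HCP_NAMESPACE>". Hypothetical helper, not part of the commit.
matches_hcp_namespace() {
  # $1: candidate namespace
  # $2: value of HCP_NAMESPACE (treated as a regex, as in the profile)
  printf '%s\n' "$1" | grep -Eq "^.+$2\$"
}

# Example values (assumed, for illustration only):
#   matches_hcp_namespace "ocm-staging-abc123-mycluster" "abc123-mycluster"  -> match
#   matches_hcp_namespace "abc123-mycluster" "abc123-mycluster"              -> no match, ".+" needs a prefix
```

Because of the anchoring, a job or namespace value that is only a prefix of the real label never matches, which is worth keeping in mind when editing the `job=~"..."` alternations in this file.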
workloads/kube-burner-ocp-wrapper/metrics-endpoint.yml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
- endpoint: {{.MC_OBO}} | ||
  profile: metrics-profiles/hosted-cp-metrics.yml | ||
  alertProfile: alerts-profiles/hosted-cp-alerts.yml | ||
- endpoint: {{.MC_PROMETHEUS}} | ||
  token: {{.MC_PROMETHEUS_TOKEN}} | ||
  profile: metrics-profiles/mc-metrics.yml | ||
- endpoint: {{.HOSTED_PROMETHEUS}} | ||
  token: {{.HOSTED_PROMETHEUS_TOKEN}} | ||
  profile: metrics-profiles/hosted-cluster-metrics.yml | ||
  alertProfile: alerts-profiles/hosted-cluster-alerts.yml |
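
The endpoint template above references five variables, which the `hypershift` function in the wrapper script exports before kube-burner runs. When running kube-burner by hand against this file, a quick check that all of them are set can save a failed run. This is a minimal sketch under that assumption; `missing_endpoint_vars` is a hypothetical helper, not part of the commit.

```shell
# Sketch: report which variables referenced by the endpoint template are
# unset or empty. Hypothetical helper, not part of the commit.
missing_endpoint_vars() {
  missing=""
  for name in MC_OBO MC_PROMETHEUS MC_PROMETHEUS_TOKEN \
              HOSTED_PROMETHEUS HOSTED_PROMETHEUS_TOKEN; do
    # Indirect lookup; safe here because the names are fixed literals.
    eval "value=\${$name:-}"
    [ -n "$value" ] || missing="$missing $name"
  done
  # Print the missing names (empty line when none are missing).
  printf '%s\n' "${missing# }"
  # Succeed only when nothing is missing.
  [ -z "$missing" ]
}
```

The function prints the missing names on one line and returns non-zero when any are missing, so it can guard a manual invocation with `missing_endpoint_vars && ...`.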
76 changes: 76 additions & 0 deletions
workloads/kube-burner-ocp-wrapper/metrics-profiles/hosted-cluster-metrics.yml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,76 @@ | ||
# Hosted cluster metrics | ||
# Collected metrics about CPU/memory usage in worker and infra nodes | ||
# Average container CPU and memory | ||
# kube_state_metrics | ||
# Containers & pod metrics | ||
|
||
- query: (avg(irate(container_cpu_usage_seconds_total{name!="",container!="POD",namespace=~"openshift-(sdn|ovn-kubernetes|ingress)"}[2m]) * 100 and on (node) kube_node_role{role="worker"}) by (namespace, pod, container)) > 0 | ||
metricName: containerCPU-Workers | ||
|
||
- query: (sum(irate(container_cpu_usage_seconds_total{name!="",container!="POD",namespace=~"openshift-(monitoring|sdn|ovn-kubernetes|ingress|image-registry)"}[2m]) * 100) by (container, pod, namespace, node) and on (node) kube_node_role{role="infra"}) > 0 | ||
metricName: containerCPU-Infra | ||
|
||
- query: avg(container_memory_rss{name!="",container!="POD",namespace=~"openshift-(sdn|ovn-kubernetes|ingress)"} and on (node) kube_node_role{role="worker"}) by (pod, container, namespace) | ||
metricName: containerMemory-Workers | ||
|
||
- query: (sum(container_memory_rss{name!="",container!="POD",namespace=~"openshift-(sdn|ovn-kubernetes|ingress|monitoring|image-registry)"}) by (container, pod, namespace, node) and on (node) kube_node_role{role="infra"}) > 0 | ||
metricName: containerMemory-Infra | ||
|
||
# Node metrics: CPU & Memory | ||
|
||
- query: sum(irate(node_cpu_seconds_total{}[2m])) by (mode,instance) and on (instance) bottomk(5,avg_over_time((sum(irate(node_cpu_seconds_total{mode="idle"}[2m])) by (mode,instance) and on (instance) label_replace(kube_node_role{role="worker"}, "instance", "$1", "node", "(.+)"))[{{ .elapsed }}:])) | ||
metricName: nodeCPU-Workers | ||
|
||
# Management Node metrics: CPU & Memory | ||
- query: (sum(irate(node_cpu_seconds_total[2m])) by (mode,instance) and on (instance) label_replace(kube_node_role{role="infra"}, "instance", "$1", "node", "(.+)")) > 0 | ||
metricName: nodeCPU-Infra | ||
|
||
- query: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) and on (instance) bottomk(5,min_over_time((irate(node_memory_MemAvailable_bytes[2m]) and on (instance) label_replace(kube_node_role{role="worker"}, "instance", "$1", "node", "(.+)"))[{{ .elapsed }}:])) | ||
metricName: nodeMemoryUtilization-Workers | ||
|
||
- query: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) and on (instance) label_replace(kube_node_role{role="infra"}, "instance", "$1", "node", "(.+)") | ||
metricName: nodeMemoryUtilization-Infra | ||
|
||
# Cluster metrics | ||
|
||
- query: sum(kube_namespace_status_phase) by (phase) > 0 | ||
metricName: namespaceCount | ||
|
||
- query: sum(kube_pod_status_phase{}) by (phase) | ||
metricName: podStatusCount | ||
|
||
- query: count(kube_secret_info{}) | ||
metricName: secretCount | ||
instant: true | ||
|
||
- query: count(kube_deployment_labels{}) | ||
metricName: deploymentCount | ||
instant: true | ||
|
||
- query: count(kube_con***REMOVED***gmap_info{}) | ||
metricName: con***REMOVED***gmapCount | ||
instant: true | ||
|
||
- query: count(kube_service_info{}) | ||
metricName: serviceCount | ||
instant: true | ||
|
||
- query: kube_node_role | ||
metricName: nodeRoles | ||
|
||
- query: sum(kube_node_status_condition{status="true"}) by (condition) | ||
metricName: nodeStatus | ||
|
||
# Kubelet & CRI-O runtime metrics | ||
|
||
- query: irate(process_cpu_seconds_total{service="kubelet",job="kubelet"}[2m]) * 100 and on (node) topk(5,avg_over_time(irate(process_cpu_seconds_total{service="kubelet",job="kubelet"}[2m])[{{ .elapsed }}:]) and on (node) kube_node_role{role="worker"}) | ||
metricName: kubeletCPU | ||
|
||
- query: process_resident_memory_bytes{service="kubelet",job="kubelet"} and on (node) topk(5,max_over_time(irate(process_resident_memory_bytes{service="kubelet",job="kubelet"}[2m])[{{ .elapsed }}:]) and on (node) kube_node_role{role="worker"}) | ||
metricName: kubeletMemory | ||
|
||
- query: irate(process_cpu_seconds_total{service="kubelet",job="crio"}[2m]) * 100 and on (node) topk(5,avg_over_time(irate(process_cpu_seconds_total{service="kubelet",job="crio"}[2m])[{{ .elapsed }}:]) and on (node) kube_node_role{role="worker"}) | ||
metricName: crioCPU | ||
|
||
- query: process_resident_memory_bytes{service="kubelet",job="crio"} and on (node) topk(5,max_over_time(irate(process_resident_memory_bytes{service="kubelet",job="crio"}[2m])[{{ .elapsed }}:]) and on (node) kube_node_role{role="worker"}) | ||
metricName: crioMemory |
46 changes: 46 additions & 0 deletions
workloads/kube-burner-ocp-wrapper/metrics-profiles/hosted-cp-metrics.yml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,46 @@ | ||
# Hosted control-plane metrics | ||
# All these metrics should use the namespace=~".+{{.HCP_NAMESPACE}}" filter | ||
# Collected metrics about API, OVN, etcd and cluster_version provided by the CVO | ||
|
||
# API server | ||
|
||
- query: irate(apiserver_request_total{namespace=~".+{{.HCP_NAMESPACE}}", verb="POST", resource="pods", subresource="binding",code="201"}[2m]) > 0 | ||
metricName: schedulingThroughput | ||
|
||
- query: histogram_quantile(0.99, sum(irate(apiserver_request_duration_seconds_bucket{namespace=~".+{{.HCP_NAMESPACE}}", job="kube-apiserver", verb=~"LIST|GET", subresource!~"log|exec|portforward|attach|proxy"}[2m])) by (le, resource, verb, scope)) > 0 | ||
metricName: readOnlyAPICallsLatency | ||
|
||
- query: histogram_quantile(0.99, sum(irate(apiserver_request_duration_seconds_bucket{namespace=~".+{{.HCP_NAMESPACE}}", job="kube-apiserver", verb=~"POST|PUT|DELETE|PATCH", subresource!~"log|exec|portforward|attach|proxy"}[2m])) by (le, resource, verb, scope)) > 0 | ||
metricName: mutatingAPICallsLatency | ||
|
||
- query: sum(irate(apiserver_request_total{namespace=~".+{{.HCP_NAMESPACE}}", job="kube-apiserver",verb!="WATCH"}[2m])) by (verb,resource,instance) > 0 | ||
metricName: APIRequestRate | ||
|
||
# OVN service sync latency | ||
|
||
- query: histogram_quantile(0.99, sum(rate(ovnkube_master_network_programming_duration_seconds_bucket{namespace=~".+{{.HCP_NAMESPACE}}", kind="service"}[2m])) by (le)) | ||
metricName: serviceSyncLatency | ||
|
||
# Etcd metrics | ||
|
||
- query: sum(rate(etcd_server_leader_changes_seen_total{namespace=~".+{{.HCP_NAMESPACE}}"}[2m])) | ||
metricName: etcdLeaderChangesRate | ||
|
||
- query: histogram_quantile(0.99, rate(etcd_disk_backend_commit_duration_seconds_bucket{namespace=~".+{{.HCP_NAMESPACE}}"}[2m])) | ||
metricName: 99thEtcdDiskBackendCommitDurationSeconds | ||
|
||
- query: histogram_quantile(0.99, rate(etcd_disk_wal_fsync_duration_seconds_bucket{namespace=~".+{{.HCP_NAMESPACE}}"}[2m])) | ||
metricName: 99thEtcdDiskWalFsyncDurationSeconds | ||
|
||
- query: histogram_quantile(0.99, rate(etcd_network_peer_round_trip_time_seconds_bucket{namespace=~".+{{.HCP_NAMESPACE}}"}[5m])) | ||
metricName: 99thEtcdRoundTripTimeSeconds | ||
|
||
- query: sum by (cluster_version)(etcd_cluster_version) | ||
metricName: etcdVersion | ||
instant: true | ||
|
||
# Cluster version | ||
|
||
- query: cluster_version{type="completed", namespace=~".+{{.HCP_NAMESPACE}}"} | ||
metricName: clusterVersion | ||
instant: true |
18 changes: 18 additions & 0 deletions
workloads/kube-burner-ocp-wrapper/metrics-profiles/mc-metrics.yml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
# Management cluster metrics | ||
# They should be prefixed with mgmt- to distinguish them from the hosted cluster ones | ||
# Only collecting container and worker nodes CPU/Memory metrics | ||
|
||
# ControlPlane Containers & pod metrics | ||
|
||
- query: (sum(irate(container_cpu_usage_seconds_total{name!="",container!="POD",namespace=~".+{{.HCP_NAMESPACE}}"}[2m]) * 100) by (container, pod, namespace, node)) > 0 | ||
metricName: mgmt-containerCPU | ||
|
||
- query: sum(container_memory_rss{name!="",container!="POD",namespace=~".+{{.HCP_NAMESPACE}}"}) by (container, pod, namespace, node) | ||
metricName: mgmt-containerMemory | ||
|
||
# Management Node metrics: CPU & Memory | ||
- query: (sum(irate(node_cpu_seconds_total[2m])) by (mode,instance) and on (instance) label_replace(kube_node_role{role="worker"}, "instance", "$1", "node", "(.+)")) > 0 | ||
metricName: mgmt-nodeCPU-Workers | ||
|
||
- query: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) and on (instance) label_replace(kube_node_role{role="worker"}, "instance", "$1", "node", "(.+)") | ||
metricName: mgmt-nodeMemoryUtilization-Workers |
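
The header comment of this profile asks for every metric name to carry the `mgmt-` prefix. A minimal sketch of how that convention could be checked from the command line, assuming the profile keeps one `metricName:` key per entry as shown above; `unprefixed_metrics` is a hypothetical helper, not part of the commit.

```shell
# Sketch: list metricName values in a metrics profile that lack a prefix.
# Hypothetical helper, not part of the commit.
unprefixed_metrics() {
  # $1: path to the metrics profile
  # $2: required prefix, e.g. "mgmt-"
  sed -n 's/^[[:space:]]*metricName:[[:space:]]*//p' "$1" \
    | grep -v "^$2" || true   # grep exits 1 when every name is prefixed
}

# Example:
#   unprefixed_metrics metrics-profiles/mc-metrics.yml mgmt-
```

An empty result means every metric in the profile follows the convention.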
workloads/kube-burner-ocp-wrapper/obo-route.yml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
--- | ||
apiVersion: v1 | ||
kind: Service | ||
metadata: | ||
  labels: | ||
    app.kubernetes.io/instance: hypershift-monitoring-stack | ||
  name: prometheus-hypershift-monitoring-stack | ||
  namespace: openshift-observability-operator | ||
spec: | ||
  ports: | ||
  - port: 9090 | ||
    protocol: TCP | ||
    targetPort: web | ||
  selector: | ||
    app.kubernetes.io/instance: hypershift-monitoring-stack | ||
--- | ||
apiVersion: route.openshift.io/v1 | ||
kind: Route | ||
metadata: | ||
  name: prometheus-hypershift | ||
  namespace: openshift-observability-operator | ||
spec: | ||
  port: | ||
    targetPort: 9090 | ||
  to: | ||
    name: prometheus-hypershift-monitoring-stack |
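
Once the Service and Route above exist, the OBO Prometheus should answer on the route host over plain HTTP without a token, per the commit description. A minimal sketch for checking that by hand through the standard Prometheus HTTP API (`/api/v1/query`); `prom_query_url` and `prom_query` are hypothetical helpers, and the base URL is assumed to be the value the wrapper stores in `MC_OBO`.

```shell
# Sketch: build the instant-query URL for a Prometheus base endpoint.
# Hypothetical helpers, not part of the commit.
prom_query_url() {
  # $1: base URL; a trailing slash is tolerated
  printf '%s/api/v1/query\n' "${1%/}"
}

# Run an instant query. curl -G sends the data as a query string and
# --data-urlencode takes care of escaping the PromQL expression.
prom_query() {
  # $1: base URL, $2: PromQL expression
  curl -sS -G --data-urlencode "query=$2" "$(prom_query_url "$1")"
}

# Example (requires network access to the management cluster route):
#   prom_query "${MC_OBO}" 'up'
```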
workloads/kube-burner-ocp-wrapper/run.sh
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,72 @@ | ||
#!/bin/bash -e | ||
|
||
set -e | ||
|
||
ES_SERVER=${ES_SERVER:-https://search-perfscale-dev-chmf5l4sh66lvxbnadi4bznl3a.us-west-2.es.amazonaws.com} | ||
LOG_LEVEL=${LOG_LEVEL:-info} | ||
KUBE_BURNER_VERSION=${KUBE_BURNER_VERSION:-1.5} | ||
CHURN=${CHURN:-true} | ||
WORKLOAD=${WORKLOAD:?} | ||
QPS=${QPS:-20} | ||
BURST=${BURST:-20} | ||
GC=${GC:-true} | ||
EXTRA_FLAGS=${EXTRA_FLAGS:-} | ||
|
||
download_binary(){ | ||
KUBE_BURNER_URL=https://github.com/cloud-bulldozer/kube-burner/releases/download/v${KUBE_BURNER_VERSION}/kube-burner-${KUBE_BURNER_VERSION}-Linux-x86_64.tar.gz | ||
curl -sS -L ${KUBE_BURNER_URL} | tar -xzC /tmp/ kube-burner | ||
} | ||
|
||
hypershift(){ | ||
echo "HyperShift detected" | ||
# Get hosted cluster ID and name | ||
HC_ID=$(oc get infrastructure cluster -o go-template --template='{{.status.infrastructureName}}') | ||
HC_NAME=$(oc get infrastructure cluster -o go-template --template='{{range .status.platformStatus.aws.resourceTags}}{{if eq .key "api.openshift.com/name" }}{{.value}}{{end}}{{end}}') | ||
|
||
if [[ -z ${HC_ID} ]] || [[ -z ${HC_NAME} ]]; then | ||
echo "Couldn't obtain hosted cluster id and/or hosted cluster name" | ||
echo -e "HC_ID: ${HC_ID}\nHC_NAME: ${HC_NAME}" | ||
exit 1 | ||
fi | ||
|
||
# The hosted control-plane namespace is composed of the cluster ID plus the cluster name | ||
HCP_NAMESPACE=${HC_ID}-${HC_NAME} | ||
|
||
echo "Creating OBO route" | ||
oc --kubeconfig=${MC_KUBECONFIG} apply -f obo-route.yml | ||
echo "Fetching OBO endpoint" | ||
MC_OBO=http://$(oc --kubeconfig=${MC_KUBECONFIG} get route -n openshift-observability-operator prometheus-hypershift -o jsonpath="{.spec.host}") | ||
MC_PROMETHEUS=https://$(oc --kubeconfig=${MC_KUBECONFIG} get route -n openshift-monitoring prometheus-k8s -o jsonpath="{.spec.host}") | ||
MC_PROMETHEUS_TOKEN=$(oc --kubeconfig=${MC_KUBECONFIG} sa new-token -n openshift-monitoring prometheus-k8s) | ||
HOSTED_PROMETHEUS=https://$(oc get route -n openshift-monitoring prometheus-k8s -o jsonpath="{.spec.host}") | ||
HOSTED_PROMETHEUS_TOKEN=$(oc sa new-token -n openshift-monitoring prometheus-k8s) | ||
echo "Exporting required vars" | ||
cat << EOF | ||
MC_OBO: ${MC_OBO} | ||
MC_PROMETHEUS: ${MC_PROMETHEUS} | ||
MC_PROMETHEUS_TOKEN: <truncated> | ||
HOSTED_PROMETHEUS: ${HOSTED_PROMETHEUS} | ||
HOSTED_PROMETHEUS_TOKEN: <truncated> | ||
HCP_NAMESPACE: ${HCP_NAMESPACE} | ||
EOF | ||
export MC_OBO MC_PROMETHEUS MC_PROMETHEUS_TOKEN HOSTED_PROMETHEUS HOSTED_PROMETHEUS_TOKEN HCP_NAMESPACE | ||
} | ||
|
||
download_binary | ||
cmd="/tmp/kube-burner ocp ${WORKLOAD} --log-level=${LOG_LEVEL} --qps=${QPS} --burst=${BURST} --gc=${GC}" | ||
if [[ ${WORKLOAD} =~ "cluster-density" ]]; then | ||
ITERATIONS=${ITERATIONS:?} | ||
cmd+=" --iterations=${ITERATIONS} --churn=${CHURN}" | ||
fi | ||
if [[ -n ${MC_KUBECONFIG} ]]; then | ||
cmd+=" --metrics-endpoint=metrics-endpoint.yml" | ||
hypershift | ||
fi | ||
# If ES_SERVER is specified | ||
if [[ -n ${ES_SERVER} ]]; then | ||
cmd+=" --es-server=${ES_SERVER} --es-index=${ES_INDEX:-ripsaw-kube-burner}" | ||
fi | ||
cmd+=" ${EXTRA_FLAGS}" | ||
echo $cmd | ||
exec $cmd |
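
The script assembles the command as a single string and relies on word splitting at `exec $cmd`, which works as long as no value contains whitespace. A minimal sketch of the same flag logic that keeps each argument as a separate word, using the defaults shown above. `build_args` is a hypothetical helper, not part of the commit, and it deliberately leaves out the HyperShift, ES and `EXTRA_FLAGS` handling.

```shell
# Sketch: emit the kube-burner arguments one per line.
# Hypothetical helper; the body runs in a subshell so it neither leaks
# variables nor terminates the caller when a required variable is unset.
build_args() (
  workload=${WORKLOAD:?WORKLOAD must be set}
  set -- ocp "$workload" \
    "--log-level=${LOG_LEVEL:-info}" \
    "--qps=${QPS:-20}" \
    "--burst=${BURST:-20}" \
    "--gc=${GC:-true}"
  # Same substring test as the script: any workload containing "cluster-density".
  case "$workload" in
    *cluster-density*)
      set -- "$@" "--iterations=${ITERATIONS:?ITERATIONS must be set}" "--churn=${CHURN:-true}"
      ;;
  esac
  printf '%s\n' "$@"
)

# Example:
#   WORKLOAD=cluster-density-v2 ITERATIONS=5 build_args
```

Printing one argument per line makes it easy to review what would be passed to kube-burner before running it.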