Kube-burner refactor + HyperShift multi-endpoint #545

Merged
7 commits merged into cloud-bulldozer:master from the hypershift-multi-endpoint branch on Apr 18, 2023

Conversation

@rsevilla87 (Member) commented Mar 9, 2023

Description

Wrapper to run kube-burner through its OCP wrapper, adding support for the new --metrics-endpoints flag, which allows grabbing metrics and evaluating alerts from several Prometheus endpoints, as needed for HyperShift clusters:

For the HyperShift scenario to work, we need:

  • OBO stack: the metrics scraped from this endpoint come from the hosted control planes (etcd and API latencies, and so on). As this endpoint is not publicly exposed by default, the script sets up a route so kube-burner can reach it (see the sketch after this list); no authentication is currently required.
  • Management cluster Prometheus: from this endpoint we use management-cluster container metrics (from the hosted control-plane namespace) and worker node metrics, which are required to measure usage on the worker nodes hosting the HCP.
  • Hosted cluster Prometheus: from here we scrape data-plane container and node metrics, as well as kube-state-metrics series, which are mostly used to count and list resources in the cluster.
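
As a rough sketch of how these pieces fit together (the oc command, the service and namespace names, and the metrics-endpoint.yml field names below are assumptions for illustration, not copied from run.sh; the environment variables and profile file names match the example output further down):

# Expose the OBO Prometheus on the management cluster so kube-burner can reach it
$ oc --kubeconfig="$MC_KUBECONFIG" expose service prometheus-operated -n openshift-observability-operator

# One entry per Prometheus endpoint; the shell expands the exported variables
$ cat > metrics-endpoint.yml <<EOF
- endpoint: ${MC_OBO}
  profile: metrics-profiles/hosted-cp-metrics.yml
- endpoint: ${MC_PROMETHEUS}
  token: ${MC_PROMETHEUS_TOKEN}
  profile: metrics-profiles/mc-metrics.yml
- endpoint: ${HOSTED_PROMETHEUS}
  token: ${HOSTED_PROMETHEUS_TOKEN}
  profile: metrics-profiles/hosted-cluster-metrics.yml
EOF

# The wrapper then passes that file to kube-burner, as in the run below
$ kube-burner ocp "$WORKLOAD" --iterations=5 --metrics-endpoint=metrics-endpoint.yml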

Example output

$ export MC_KUBECONFIG=~/kubeconfigs/kubeconfig_21obelta8npo531iqeg4lcd28lb7u1pp WORKLOAD=cluster-density
$ ./run.sh                                                                                                                                                            
Creating OBO route
route.route.openshift.io/prometheus-operated configured
Fetching OBO endpoint
Exporting required vars
MC_OBO: <truncated>
MC_PROMETHEUS: <truncated>
MC_PROMETHEUS_TOKEN: <truncated>
HOSTED_PROMETHEUS: <truncated>
HOSTED_PROMETHEUS_TOKEN: <truncated>
Running: kube-burner ocp cluster-density --iterations=5 --es-server=https://search-perfscale-dev-chmf5l4sh66lvxbnadi4bznl3a.us-west-2.es.amazonaws.com --es-index=ripsaw-kube-burner --metrics-endpoint=metrics-endpoint.yml --churn=false
INFO[2023-03-09 21:18:34] 📁 Creating indexer: elastic                  
INFO[2023-03-09 21:18:36] 👽 Initializing prometheus client with URL: http://prometheus-operated-openshift-observability-operator.apps.hs-mc-0vfs0e6gg.iau9.s1.devshift.org 
INFO[2023-03-09 21:18:37] 🔔 Initializing alert manager for prometheus: http://prometheus-operated-openshift-observability-operator.apps.hs-mc-0vfs0e6gg.iau9.s1.devshift.org 
INFO[2023-03-09 21:18:37] 👽 Initializing prometheus client with URL: https://prometheus-k8s-openshift-monitoring.apps.hs-mc-0vfs0e6gg.iau9.s1.devshift.org 
INFO[2023-03-09 21:18:38] 👽 Initializing prometheus client with URL: https://prometheus-k8s-openshift-monitoring.apps.rsibm.rosa.rsibm.plw3.s3.devshift.org 
INFO[2023-03-09 21:18:38] 🔔 Initializing alert manager for prometheus: https://prometheus-k8s-openshift-monitoring.apps.rsibm.rosa.rsibm.plw3.s3.devshift.org 
INFO[2023-03-09 21:18:38] 🔥 Starting kube-burner (render-metrics-endpoint@40d2b646065ffb7f239c5eec6e8f103534af4075) with UUID 34454e6b-9183-4576-9885-67aa5d9831e7 
INFO[2023-03-09 21:18:38] 📈 Creating measurement factory               
INFO[2023-03-09 21:18:38] Registered measurement: podLatency           
INFO[2023-03-09 21:18:38] Job cluster-density: 5 iterations with 1 ImageStream replicas 
INFO[2023-03-09 21:18:38] Job cluster-density: 5 iterations with 1 Build replicas 
INFO[2023-03-09 21:18:38] Job cluster-density: 5 iterations with 5 Deployment replicas 
INFO[2023-03-09 21:18:38] Job cluster-density: 5 iterations with 5 Service replicas 
INFO[2023-03-09 21:18:38] Job cluster-density: 5 iterations with 1 Route replicas 
INFO[2023-03-09 21:18:38] Job cluster-density: 5 iterations with 10 Secret replicas 
INFO[2023-03-09 21:18:38] Job cluster-density: 5 iterations with 10 ConfigMap replicas 
INFO[2023-03-09 21:18:38] QPS: 20                                      
INFO[2023-03-09 21:18:38] Burst: 20                                    
INFO[2023-03-09 21:18:38] Pre-load: images from job cluster-density    
INFO[2023-03-09 21:18:38] Namespace preload-kube-burner already exists 
INFO[2023-03-09 21:18:39] Pre-load: Creating DaemonSet using image registry.k8s.io/pause:3.1 in namespace preload-kube-burner 
INFO[2023-03-09 21:18:39] Pre-load: Sleeping for 30s                   
INFO[2023-03-09 21:19:09] Pre-load: Deleting namespace preload-kube-burner 
INFO[2023-03-09 21:19:09] Deleting namespaces with label kube-burner-preload=true 
INFO[2023-03-09 21:19:09] Waiting for namespaces to be definitely deleted 
INFO[2023-03-09 21:19:22] Triggering job: cluster-density              
INFO[2023-03-09 21:19:23] Creating Pod latency watcher for cluster-density 
INFO[2023-03-09 21:19:24] Running job cluster-density                  
INFO[2023-03-09 21:19:31] Waiting up to 3h0m0s for actions to be completed 
INFO[2023-03-09 21:19:40] Actions in namespace cluster-density-1 completed 
INFO[2023-03-09 21:19:41] Actions in namespace cluster-density-2 completed 
INFO[2023-03-09 21:19:41] Actions in namespace cluster-density-5 completed 
INFO[2023-03-09 21:19:42] Actions in namespace cluster-density-4 completed 
INFO[2023-03-09 21:19:42] Actions in namespace cluster-density-3 completed 
INFO[2023-03-09 21:19:42] Finished the create job in 18s               
INFO[2023-03-09 21:19:42] Verifying created objects                    
INFO[2023-03-09 21:19:42] imagestreams found: 5 Expected: 5            
INFO[2023-03-09 21:19:42] builds found: 5 Expected: 5                  
INFO[2023-03-09 21:19:42] deployments found: 25 Expected: 25           
INFO[2023-03-09 21:19:42] services found: 25 Expected: 25              
INFO[2023-03-09 21:19:42] routes found: 5 Expected: 5                  
INFO[2023-03-09 21:19:43] secrets found: 50 Expected: 50               
INFO[2023-03-09 21:19:43] configmaps found: 50 Expected: 50            
INFO[2023-03-09 21:19:43] Stopping measurement: podLatency             
INFO[2023-03-09 21:19:43] Indexing pod latency data for job: cluster-density 
INFO[2023-03-09 21:19:43] Indexing metric podLatencyMeasurement        
INFO[2023-03-09 21:19:45] Indexing metric podLatencyQuantilesMeasurement 
INFO[2023-03-09 21:19:46] cluster-density: PodScheduled 50th: 0 99th: 0 max: 0 avg: 0 
INFO[2023-03-09 21:19:46] cluster-density: ContainersReady 50th: 2632 99th: 6221 max: 6221 avg: 2983 
INFO[2023-03-09 21:19:46] cluster-density: Initialized 50th: 0 99th: 1924 max: 1924 avg: 114 
INFO[2023-03-09 21:19:46] cluster-density: Ready 50th: 2632 99th: 6221 max: 6221 avg: 2983 
INFO[2023-03-09 21:19:46] Job cluster-density took 23.94 seconds       
INFO[2023-03-09 21:19:46] Indexing metric jobSummary                   
INFO[2023-03-09 21:19:47] Waiting 30s extra before scraping prometheus endpoint
INFO[2023-03-09 21:20:17] Evaluating alerts for prometheus: http://prometheus-operated-openshift-observability-operator.apps.hs-mc-0vfs0e6gg.iau9.s1.devshift.org 
INFO[2023-03-09 21:20:17] Evaluating expression: 'avg_over_time(histogram_quantile(0.99, rate(etcd_disk_wal_fsync_duration_seconds_bucket{namespace="22bldc7qapudm85d19ssk4p431kmrb54-rsibm"}[2m]))[10m:]) > 0.01' 
INFO[2023-03-09 21:20:17] Evaluating expression: 'avg_over_time(histogram_quantile(0.99, rate(etcd_disk_backend_commit_duration_seconds_bucket{namespace="22bldc7qapudm85d19ssk4p431kmrb54-rsibm"}[2m]))[10m:]) > 0.03' 
INFO[2023-03-09 21:20:17] Evaluating expression: 'rate(etcd_server_leader_changes_seen_total{namespace="22bldc7qapudm85d19ssk4p431kmrb54-rsibm"}[2m]) > 0' 
INFO[2023-03-09 21:20:17] Evaluating expression: 'avg_over_time(histogram_quantile(0.99, sum(irate(apiserver_request_duration_seconds_bucket{namespace="22bldc7qapudm85d19ssk4p431kmrb54-rsibm", apiserver="kube-apiserver", verb=~"POST|PUT|DELETE|PATCH", subresource!~"log|exec|portforward|attach|proxy"}[2m])) by (le, resource, verb))[10m:]) > 1' 
INFO[2023-03-09 21:20:18] Evaluating expression: 'avg_over_time(histogram_quantile(0.99, sum(irate(apiserver_request_duration_seconds_bucket{namespace="22bldc7qapudm85d19ssk4p431kmrb54-rsibm", apiserver="kube-apiserver", verb=~"LIST|GET", subresource!~"log|exec|portforward|attach|proxy", scope="resource"}[2m])) by (le, resource, verb, scope))[5m:]) > 1' 
INFO[2023-03-09 21:20:18] Evaluating expression: 'avg_over_time(histogram_quantile(0.99, sum(irate(apiserver_request_duration_seconds_bucket{namespace="22bldc7qapudm85d19ssk4p431kmrb54-rsibm", apiserver="kube-apiserver", verb=~"LIST|GET", subresource!~"log|exec|portforward|attach|proxy", scope="namespace"}[2m])) by (le, resource, verb, scope))[5m:]) > 5' 
INFO[2023-03-09 21:20:18] Evaluating expression: 'avg_over_time(histogram_quantile(0.99, sum(irate(apiserver_request_duration_seconds_bucket{namespace="22bldc7qapudm85d19ssk4p431kmrb54-rsibm", apiserver="kube-apiserver", verb=~"LIST|GET", subresource!~"log|exec|portforward|attach|proxy", scope="cluster"}[2m])) by (le, resource, verb, scope))[5m:]) > 30' 
INFO[2023-03-09 21:20:18] Evaluating expression: 'up{namespace="22bldc7qapudm85d19ssk4p431kmrb54-rsibm", job=~"kube-controller-manager|openshift-controller-manager|ovnkube-master|etcd-clien|openshift-apiserver"} == 0' 
INFO[2023-03-09 21:20:18] Scraping for the prometheus entry with params - Endpoint: http://prometheus-operated-openshift-observability-operator.apps.hs-mc-0vfs0e6gg.iau9.s1.devshift.org, Profile: metrics-profiles/hosted-cp-metrics.yml, Start: 2023-03-09T20:19:22Z, End: 2023-03-09T20:19:46Z 
INFO[2023-03-09 21:20:18] Indexing metrics with UUID 34454e6b-9183-4576-9885-67aa5d9831e7 
INFO[2023-03-09 21:20:18] 🔍 Scraping prometheus metrics for benchmark from 2023-03-09T20:19:22Z to 2023-03-09T20:19:46Z 
INFO[2023-03-09 21:20:18] Indexing metric schedulingThroughput         
INFO[2023-03-09 21:20:19] Indexing metric readOnlyAPICallsLatency      
INFO[2023-03-09 21:20:19] Indexing metric mutatingAPICallsLatency      
INFO[2023-03-09 21:20:19] Indexing metric APIRequestRate               
INFO[2023-03-09 21:20:19] Indexing metric serviceSyncLatency           
INFO[2023-03-09 21:20:19] Indexing metric etcdLeaderChangesRate        
INFO[2023-03-09 21:20:19] Indexing metric 99thEtcdDiskBackendCommitDurationSeconds 
INFO[2023-03-09 21:20:20] Indexing metric 99thEtcdDiskWalFsyncDurationSeconds 
INFO[2023-03-09 21:20:20] Indexing metric 99thEtcdRoundTripTimeSeconds 
INFO[2023-03-09 21:20:20] Indexing metric etcdVersion                  
INFO[2023-03-09 21:20:21] Indexing metric clusterVersion               
INFO[2023-03-09 21:20:21] Scraping for the prometheus entry with params - Endpoint: https://prometheus-k8s-openshift-monitoring.apps.hs-mc-0vfs0e6gg.iau9.s1.devshift.org, Profile: metrics-profiles/mc-metrics.yml, Start: 2023-03-09T20:19:22Z, End: 2023-03-09T20:19:46Z 
INFO[2023-03-09 21:20:21] Indexing metrics with UUID 34454e6b-9183-4576-9885-67aa5d9831e7 
INFO[2023-03-09 21:20:21] 🔍 Scraping prometheus metrics for benchmark from 2023-03-09T20:19:22Z to 2023-03-09T20:19:46Z 
INFO[2023-03-09 21:20:21] Indexing metric mgmt-containerCPU            
INFO[2023-03-09 21:20:21] Indexing metric mgmt-containerMemory         
INFO[2023-03-09 21:20:21] Indexing metric mgmt-nodeCPU-Workers         
INFO[2023-03-09 21:20:24] Indexing metric mgmt-nodeMemoryAvailable-Workers 
INFO[2023-03-09 21:20:26] Indexing metric mgmt-nodeMemoryUtilization-Workers 
INFO[2023-03-09 21:20:32] Evaluating alerts for prometheus: https://prometheus-k8s-openshift-monitoring.apps.rsibm.rosa.rsibm.plw3.s3.devshift.org 
INFO[2023-03-09 21:20:32] Evaluating expression: 'up{job=~"crio|kubelet"} == 0' 
INFO[2023-03-09 21:20:32] Evaluating expression: 'up{job="ovnkube-node"} == 0' 
INFO[2023-03-09 21:20:32] Scraping for the prometheus entry with params - Endpoint: https://prometheus-k8s-openshift-monitoring.apps.rsibm.rosa.rsibm.plw3.s3.devshift.org, Profile: metrics-profiles/hosted-cluster-metrics.yml, Start: 2023-03-09T20:19:22Z, End: 2023-03-09T20:19:46Z 
INFO[2023-03-09 21:20:32] Indexing metrics with UUID 34454e6b-9183-4576-9885-67aa5d9831e7 
INFO[2023-03-09 21:20:32] 🔍 Scraping prometheus metrics for benchmark from 2023-03-09T20:19:22Z to 2023-03-09T20:19:46Z 
INFO[2023-03-09 21:20:33] Indexing metric containerCPU-AggregatedWorkers 
INFO[2023-03-09 21:20:35] Indexing metric containerCPU-Infra           
INFO[2023-03-09 21:20:35] Indexing metric containerMemory-AggregatedWorkers 
INFO[2023-03-09 21:20:38] Indexing metric containerMemory-Infra        
INFO[2023-03-09 21:20:38] Indexing metric nodeCPU-AggregatedWorkers    
INFO[2023-03-09 21:20:40] Indexing metric nodeMemoryAvailable-AggregatedWorkers 
INFO[2023-03-09 21:20:41] Indexing metric nodeMemoryAvailable-Infra    
INFO[2023-03-09 21:20:41] Indexing metric namespaceCount               
INFO[2023-03-09 21:20:42] Indexing metric podStatusCount               
INFO[2023-03-09 21:20:43] Indexing metric secretCount                  
INFO[2023-03-09 21:20:43] Indexing metric deploymentCount              
INFO[2023-03-09 21:20:44] Indexing metric configmapCount               
INFO[2023-03-09 21:20:44] Indexing metric serviceCount                 
INFO[2023-03-09 21:20:45] Indexing metric nodeRoles                    
INFO[2023-03-09 21:20:46] Indexing metric nodeStatus                   
INFO[2023-03-09 21:20:47] Finished execution with UUID: 34454e6b-9183-4576-9885-67aa5d9831e7 
INFO[2023-03-09 21:20:47] Garbage collecting created namespaces        
INFO[2023-03-09 21:20:47] Deleting namespaces with label kube-burner-uuid=34454e6b-9183-4576-9885-67aa5d9831e7 
INFO[2023-03-09 21:20:48] Waiting for namespaces to be definitely deleted 
INFO[2023-03-09 21:21:06] Cluster metadata indexed correctly           
INFO[2023-03-09 21:21:06] 👋 Exiting kube-burner 34454e6b-9183-4576-9885-67aa5d9831e7 
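
For reference, the "Fetching OBO endpoint" and "Exporting required vars" steps above could be approximated with something like the following; the route names, namespaces, and token-retrieval commands are assumptions, and the authoritative logic lives in run.sh:

# Management cluster: OBO route created earlier plus the in-cluster Prometheus
$ MC_OBO=http://$(oc --kubeconfig="$MC_KUBECONFIG" get route prometheus-operated -n openshift-observability-operator -o jsonpath='{.spec.host}')
$ MC_PROMETHEUS=https://$(oc --kubeconfig="$MC_KUBECONFIG" get route prometheus-k8s -n openshift-monitoring -o jsonpath='{.spec.host}')
$ MC_PROMETHEUS_TOKEN=$(oc --kubeconfig="$MC_KUBECONFIG" create token prometheus-k8s -n openshift-monitoring)

# Hosted (data-plane) cluster, using the default kubeconfig
$ HOSTED_PROMETHEUS=https://$(oc get route prometheus-k8s -n openshift-monitoring -o jsonpath='{.spec.host}')
$ HOSTED_PROMETHEUS_TOKEN=$(oc create token prometheus-k8s -n openshift-monitoring)

$ export MC_OBO MC_PROMETHEUS MC_PROMETHEUS_TOKEN HOSTED_PROMETHEUS HOSTED_PROMETHEUS_TOKEN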

@rsevilla87 changed the title from "HyperShift multi-endpoint" to "[RFE] HyperShift multi-endpoint" on Mar 9, 2023
@rsevilla87 changed the title from "[RFE] HyperShift multi-endpoint" to "[WIP] HyperShift multi-endpoint" on Mar 9, 2023
@rsevilla87 added the "WIP (Work in Progress)" label on Mar 9, 2023
@rsevilla87 marked this pull request as ready for review on March 9, 2023 at 20:22
@rsevilla87 force-pushed the hypershift-multi-endpoint branch 3 times, most recently from b5cf530 to 4d00695, on March 10, 2023 at 12:07
@vishnuchalla self-requested a review on March 10, 2023 at 13:41
@smalleni (Collaborator) commented:

Should we remove https://github.com/cloud-bulldozer/e2e-benchmarking/blob/master/workloads/kube-burner/grafana-agent.yaml as it will no longer be required with prom multi-endpoint support?

Also, we have a cluster-density-ms workload in e2e; is it worth creating a wrapper for that in kube-burner?

@smalleni requested a review from mukrishn on March 14, 2023 at 19:01
@rsevilla87 changed the title from "[WIP] HyperShift multi-endpoint" to "Kube-burner refactor + HyperShift multi-endpoint" on Mar 21, 2023
@rsevilla87 (Member, Author) commented Mar 21, 2023

> Should we remove https://github.com/cloud-bulldozer/e2e-benchmarking/blob/master/workloads/kube-burner/grafana-agent.yaml as it will no longer be required with prom multi-endpoint support?
>
> Also, we have a cluster-density-ms workload in e2e; is it worth creating a wrapper for that in kube-burner?

cluster-density-ms is similar to cluster-density, with the difference that its metrics profile is rendered on the fly to grab metrics from Thanos. With this new implementation Thanos won't be required anymore.

@rsevilla87 force-pushed the hypershift-multi-endpoint branch from 329d75c to 6f8de5a on March 21, 2023 at 13:55
@rsevilla87 removed the "WIP (Work in Progress)" label on Mar 21, 2023
@mukrishn (Collaborator) commented:

> cluster-density-ms is similar to cluster-density, with the difference that its metrics profile is rendered on the fly to grab metrics from Thanos. With this new implementation Thanos won't be required anymore.

The cluster-density-ms workload is much lighter than our regular cluster-density, to match a production-like managed-service cluster load. We reduced the object and resource counts based on the available telemetry data, because the HyperShift team doesn't want to load all 80 hosted clusters with maximum resources while running the concurrent workload.

@rsevilla87 force-pushed the hypershift-multi-endpoint branch 3 times, most recently from 065ed2b to 17a07c2, on March 31, 2023 at 11:26
@rsevilla87 force-pushed the hypershift-multi-endpoint branch from 17a07c2 to ed0c7cf on March 31, 2023 at 13:30
@rsevilla87 (Member, Author) commented:

Gotcha, this workload is still not integrated within the kube-burner OCP wrapper.
On the flip side, rather than adding (and maintaining) yet another workload, have you considered running fewer iterations of the original cluster-density (or v2) workload?

@dry923 (Member) commented Apr 4, 2023

> Gotcha, this workload is still not integrated within the kube-burner OCP wrapper. On the flip side, rather than adding (and maintaining) yet another workload, have you considered running fewer iterations of the original cluster-density (or v2) workload?

Unfortunately, the numbers don't line up in a way that makes that fit exactly. I don't remember the complete delta but it was large enough for us to make this a separate entity in the first place.

@mukrishn (Collaborator) commented Apr 4, 2023

Here are the actual resource figures from the prod managed-service telemetry data; we run the P75 load.

@rsevilla87 (Member, Author) commented:

@dry923 IIRC, you mentioned something about the expression used to obtain the HCP namespace. Mind refreshing my memory?

@rsevilla87 force-pushed the hypershift-multi-endpoint branch from b484cff to ac42d2d on April 11, 2023 at 09:28
@rsevilla87 force-pushed the hypershift-multi-endpoint branch from ac42d2d to 6eb305e on April 11, 2023 at 10:10
### Cluster-density and cluster-density-v2

- **ITERATIONS**: Defines the number of iterations of the workload to run. No default value
- **CHURN**: Enables workload churning. Enabled by default (true)
@krishvoor (Member) commented Apr 14, 2023

Since CHURN is enabled by default, can you kindly add the default CHURN_DURATION as well?

@rsevilla87 (Member, Author) commented Apr 17, 2023

Hey! It's already possible to customize the churning options using the EXTRA_FLAGS variable, e.g.:

$ export EXTRA_FLAGS="--churn-duration=1d --churn-percent=5 --churn-delay=5m"
$ ITERATIONS=500 WORKLOAD=cluster-density-v2 ./run.sh

The reason I didn't add this variable (and others) was to keep this implementation as simple as possible and to avoid adding more and more variables without control, as happened with the previous kube-burner e2e-benchmarking implementation.
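
For example, a combined invocation could look like this (QPS, BURST and GC are the wrapper variables added in one of the commits of this PR; the values shown are purely illustrative):

# Illustrative only: wrapper variables plus pass-through flags
$ ITERATIONS=5 WORKLOAD=cluster-density QPS=20 BURST=20 GC=true EXTRA_FLAGS="--churn=false" ./run.sh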

Member:

I think having the default CHURN values documented here would make life easier.

@rsevilla87 (Member, Author):

Ok, I'll add some examples to the docs

Member:

Thanks, Raul!

@jtaleric (Member) left a review comment:

lgtm, only a small nit on the docs: it would be good to have the default churn values documented here. This change needs to stop sitting in a holding pattern.

@rsevilla87 force-pushed the hypershift-multi-endpoint branch from 668fe28 to 2249947 on April 18, 2023 at 08:48
@krishvoor (Member) left a review comment:

lgtm

@mukrishn (Collaborator) left a review comment:

LGTM as well. I am already forking it and adding a few more metrics to enable a multiple-HC dashboard.

@mukrishn merged commit 5c2a3a5 into cloud-bulldozer:master on Apr 18, 2023
vishnuchalla pushed a commit that referenced this pull request Sep 6, 2023
* HyperShift multi-endpoint

Wrapper to run kube-burner using the new --metrics-endpoints flag from kube-burner, which allows grabbing metrics and evaluating alerts from different Prometheus endpoints:

For the HyperShift scenario to work, we need:
- OBO stack: the metrics scraped from this endpoint come from the hosted control planes (etcd and API latencies, and so on). As this endpoint is not publicly exposed by default, the script sets up a route so kube-burner can reach it; no authentication is currently required.
- Management cluster Prometheus: from this endpoint we use management-cluster container metrics (from the hosted control-plane namespace) and worker node metrics, which are required to measure usage on the worker nodes hosting the HCP.
- Hosted cluster Prometheus: from here we scrape data-plane container metrics, as well as kube-state-metrics series, which are mostly used to count and list resources in the cluster.

Signed-off-by: Raul Sevilla <rsevilla@redhat.com>

* Add docs

Signed-off-by: Raul Sevilla <rsevilla@redhat.com>

* Add QPS, BURST and GC variables

Signed-off-by: Raul Sevilla <rsevilla@redhat.com>

* Update kube-apiserver metric expressions

Signed-off-by: Raul Sevilla <rsevilla@redhat.com>

* Bump kube-burner version

Signed-off-by: Raul Sevilla <rsevilla@redhat.com>

* Improve EXTRA_FLAGS docs

Signed-off-by: Raul Sevilla <rsevilla@redhat.com>

* Use regex for HCP_NAMESPACE

Signed-off-by: Raul Sevilla <rsevilla@redhat.com>

---------

Signed-off-by: Raul Sevilla <rsevilla@redhat.com>