Scale tests with CO-RE BPF #1321

Open
erthalion opened this issue Sep 11, 2023 · 3 comments

Comments

@erthalion
Contributor

erthalion commented Sep 11, 2023

The ideal result is:

  • Incorporate the relevant workload generator into kube-burner

  • Perform ACS scale tests using the configuration above against a cluster with
    the core_bpf collection method

  • Collect resource usage metrics from the test for further analysis

  • Verify memory consumption

@JoukoVirtanen
Contributor

To test releases, two long-running clusters are currently created. One of them has load generated by kube-burner, which runs berserker containers that generate process and listening-endpoint load. Collector runs in the same cluster with the CORE_BPF collection method. The config files used by kube-burner can be found at https://github.com/stackrox/stackrox/tree/master/scripts/release-tools/kube-burner-configs

The long-running cluster for 4.3.0-rc1 is currently running. It is monitored in a loop with kubectl -n stackrox top pod and by collecting metrics from the collector and sensor pods.
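
The monitoring loop mentioned above could look roughly like the sketch below. Only the `kubectl -n stackrox top pod` command comes from this thread; the function name `sample_top`, the `KUBECTL` override, the sample count, the interval, and the log format are assumptions made for illustration.

```shell
# Sketch only: periodically record `kubectl -n stackrox top pod` output.
# Assumed (not from the thread): sample_top, KUBECTL, the defaults, the log format.
NAMESPACE="${NAMESPACE:-stackrox}"
KUBECTL="${KUBECTL:-kubectl}"   # override to point at a wrapper or a stub

# sample_top OUTFILE [SAMPLES] [INTERVAL_SECONDS]
# Appends one timestamped line per pod per sample to OUTFILE.
# The body runs in a subshell so its variables do not leak into the caller.
sample_top() (
  outfile="$1"
  samples="${2:-1}"
  interval="${3:-60}"
  i=0
  while [ "$i" -lt "$samples" ]; do
    ts="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
    # Prefix each pod line with the sample time so it can be plotted later.
    "$KUBECTL" -n "$NAMESPACE" top pod --no-headers |
      while IFS= read -r line; do
        printf '%s %s\n' "$ts" "$line"
      done >> "$outfile"
    i=$((i + 1))
    # Sleep only between samples, not after the last one.
    if [ "$i" -lt "$samples" ]; then sleep "$interval"; fi
  done
)
```

For example, `sample_top /tmp/top.log 1440 60` would take one sample per minute for roughly a day.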

Here are some of the relevant PRs that have contributed to this work:

ROX-19857: long running collector should have listening endpoints load
stackrox/stackrox#7929

Jv rox 17741 long running cluster should include collector
stackrox/actions#20

Jv rox 19896 long running collector should use core bpf
stackrox/actions#34

Jv rox 17741 long running cluster should include collector kube burner configs
https://github.com/stackrox/test-gh-actions/pull/116

I will add here the results from the long running cluster with real load.

Let me know if anything else is needed.

@JoukoVirtanen
Contributor

output_plot

output_plot_cpu

The above are the plots of memory and CPU usage for the 4.3 long running cluster.

@JoukoVirtanen
Contributor

JoukoVirtanen commented Feb 13, 2024

I did the following to create a long-running cluster for master:

cdrox
git checkout master
smart-branch jv-test-long-running-with-tag-2
git commit -m "Empty commit to trigger ci" --allow-empty
git tag -a 0.0.8 -m "Test tag for long running cluster"
git push origin 0.0.8
git push origin HEAD

The master commit was ca0b6ba29d4ab50f34b5f022b64078a18e3482de

I then created a PR and waited for the images to be built and pushed.

I then went to https://github.com/stackrox/test-gh-actions/actions/workflows/create-clusters.yml, clicked on "Run workflow", changed the version to 0.0.8, and selected "Create a long-running cluster on RC1". I waited for the GitHub action to finish.

To get the Grafana plots I did the following:

infractl artifacts long-real-load-0-0-8 --download-dir /tmp/artifacts-long-real-load-0-0-8
export KUBECONFIG=/tmp/artifacts-long-real-load-0-0-8/kubeconfig
kubectl -n stackrox port-forward service/monitoring 48443:8443 > /dev/null 2>&1 &

Go to https://localhost:48443/?orgId=1 in your browser. Enter admin for the username and stackrox for the password. In the toolbar on the left, select Dashboard->Manage, then click on Core Dashboard. After about 7 days, the Core Dashboard showed the following:

Screenshot from 2024-02-08 15-41-22
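
In the commands above, version `0.0.8` corresponds to the cluster name `long-real-load-0-0-8`. The sketch below derives the cluster name and download directory from a version string so the same steps can be reused for another tag. The dots-to-dashes rule is inferred from that single example and is an assumption, as are the function names.

```shell
# Sketch only: derive the infra cluster name and artifact directory from a tag.
# Assumption: the name is "long-real-load-" plus the version with dots replaced
# by dashes, as in the one example from this thread (0.0.8 -> long-real-load-0-0-8).
cluster_name_for_version() {
  printf 'long-real-load-%s\n' "$(printf '%s' "$1" | tr '.' '-')"
}

artifact_dir_for_version() {
  printf '/tmp/artifacts-%s\n' "$(cluster_name_for_version "$1")"
}

# fetch_kubeconfig VERSION
# Downloads the cluster artifacts and exports KUBECONFIG, mirroring the
# infractl and export commands used above.
fetch_kubeconfig() {
  name="$(cluster_name_for_version "$1")"
  dir="$(artifact_dir_for_version "$1")"
  infractl artifacts "$name" --download-dir "$dir" || return 1
  export KUBECONFIG="$dir/kubeconfig"
}
```

With these helpers, `fetch_kubeconfig 0.0.8` would reproduce the first two commands above before starting the port-forward.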

Note that profiling is disabled in release versions, so it is not possible there. With this version I was able to do profiling, though the results don't seem right. I checked out the collector commit referenced in COLLECTOR_VERSION and built it locally. I then did the following to get the profiles and visualize one of them:

cdrox
./scripts/secured-cluster-diagnostics.sh
cd /tmp/k8s-service-logs/stackrox/metrics/
pprof /home/jvirtane/projects/collector/cmake-build/collector/collector collector-zhl6m-heap.prof -web

Screenshot from 2024-02-08 15-52-52

Screenshot from 2024-02-08 15-53-09

Screenshot from 2024-02-08 15-53-28
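
To inspect every collected heap profile rather than a single one, the pprof step above can be wrapped in a loop. The sketch below only prints the commands instead of running them; the `*-heap.prof` pattern is inferred from the one file name shown above, and the function name and its arguments are assumptions.

```shell
# Sketch only: print one pprof command per heap profile found in a directory.
# Assumptions: profiles match *-heap.prof (inferred from collector-zhl6m-heap.prof)
# and the path to the locally built collector binary is supplied by the caller.
# print_pprof_commands BINARY PROFILE_DIR
print_pprof_commands() {
  for prof in "$2"/*-heap.prof; do
    # An unmatched glob stays literal; skip it so nothing is printed.
    [ -e "$prof" ] || continue
    printf 'pprof %s %s -web\n' "$1" "$prof"
  done
}
```

For example, `print_pprof_commands /path/to/collector /tmp/k8s-service-logs/stackrox/metrics` lists the commands to review before running any of them.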
