chore: Less flaky results from scenario validation #166

Merged: 3 commits merged on May 1, 2024

@dav1do (Contributor) commented Apr 30, 2024

Longer simulations keep failing because the manager and workers disconnect at some point, so Goose reports no metrics, even though the Prometheus metrics show that the run would still have passed. Now we rely on Goose only for the elapsed time, keep our own backup timer (its duration will be slightly longer, so it won't bias us toward a better result), and take the metric values exclusively from Prometheus.
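A minimal sketch of the timing fallback described above, in Python for illustration only. It assumes the rate is computed as a Prometheus counter delta divided by the run duration, with Goose's elapsed time preferred and the backup timer used when Goose reports nothing. The names `RunTiming`, `run_duration`, and `requests_per_second` are hypothetical and are not taken from the actual change.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunTiming:
    # Hypothetical container. Elapsed seconds as reported by Goose; None if the
    # Goose metrics were lost (e.g. the manager and workers disconnected).
    goose_secs: Optional[float]
    # Elapsed seconds from our own backup timer. Assumed to start before and
    # stop after the Goose run, so it is slightly longer than the real run.
    backup_secs: float


def run_duration(timing: RunTiming) -> float:
    """Prefer Goose's elapsed time; fall back to the backup timer otherwise."""
    if timing.goose_secs is not None and timing.goose_secs > 0:
        return timing.goose_secs
    return timing.backup_secs


def requests_per_second(counter_start: float, counter_end: float, timing: RunTiming) -> float:
    """Rate from a Prometheus counter sampled before and after the run.

    A longer (backup) duration can only lower the computed rate, so the
    fallback never makes a run look better than it was.
    """
    duration = run_duration(timing)
    if duration <= 0:
        raise ValueError("run duration must be positive")
    return (counter_end - counter_start) / duration
```

For example, `requests_per_second(1000, 19000, RunTiming(goose_secs=None, backup_secs=180.0))` falls back to the 180-second backup timer and returns `100.0`. Dividing by the longer of the two plausible durations is what keeps the fallback conservative.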

I also deleted the old recon-keys-only scenario, as it no longer applies and is just noise.

Results from a 3-minute local test:
[image: test results screenshot]

Since we have the Grafana metrics and the elapsed time, we can try to calculate the values even if the Goose metrics failed. This happens when the workers disconnect temporarily. Some job settings plus manager/worker changes would likely solve this properly, but this approach is easy for now.

@3benbox left a comment

SGTM

@dav1do added this pull request to the merge queue on May 1, 2024.
Merged via the queue into main with commit 44c796e on May 1, 2024.
6 checks passed