chore: Less flaky results from scenario validation #166

dav1do · 2024-04-30T21:41:44Z

We keep getting failures during longer simulations because the manager and workers disconnect at some point and Goose reports no metrics, even though the Prometheus metrics show that we would still have passed. We now rely on Goose for the run time but keep a backup measurement (it will be slightly longer and therefore won't bias us toward a better result), and we use the Prometheus metrics exclusively.
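A minimal sketch of the fallback described above, not the code in this PR: the function names `effective_duration` and `requests_per_second` are hypothetical, and it assumes the backup is a wall-clock duration measured around the whole run and that the Prometheus value is a plain counter total.

```rust
use std::time::Duration;

/// Pick the duration used to turn counter totals into rates.
/// `goose_duration` is `None` when Goose reported no metrics
/// (for example after a manager/worker disconnect).
/// `backup_duration` is assumed to be wall-clock time measured around
/// the whole run, so it is at least as long as the real load phase.
fn effective_duration(goose_duration: Option<Duration>, backup_duration: Duration) -> Duration {
    goose_duration.unwrap_or(backup_duration)
}

/// Average requests per second from a Prometheus-style counter total.
/// Returns `None` for a zero-length run to avoid dividing by zero.
fn requests_per_second(total_requests: u64, run_time: Duration) -> Option<f64> {
    let secs = run_time.as_secs_f64();
    if secs > 0.0 {
        Some(total_requests as f64 / secs)
    } else {
        None
    }
}
```

Because the backup duration is the longer of the two, dividing the same counter total by it can only lower the computed rate, which is why falling back to it does not make a run look better than it was.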

I also deleted the old recon keys only scenario as it no longer applies and is just noise.

Results from a 3-minute local test:

We have Grafana and time, so we can try to calculate values even if the Goose metrics failed. This happens if the workers disconnect temporarily. There are likely some job settings and manager/worker changes that would solve this, but this is easy for now.

3benbox

SGTM

dav1do added 3 commits April 30, 2024 15:20

chore: delete old recon sync scenario

ff9a941

chore: remove deleted simulation from guide

901ffcd

dav1do requested review from 3benbox, samika98 and smrz2001 April 30, 2024 21:41

3benbox approved these changes May 1, 2024

dav1do added this pull request to the merge queue May 1, 2024

Merged via the queue into main with commit 44c796e May 1, 2024
6 checks passed
