Feature Proposal: Where possible, make auto-rerun tests run on a different machine #5764

Open
adamfarley opened this issue Nov 20, 2024 · 4 comments

@adamfarley
Contributor

adamfarley commented Nov 20, 2024

Summary
Proposal for the auto-rerun test feature to mandate (where possible) that the reruns happen on a different host.

Details
If a unit test failure is caused by something specific to a particular host, rerunning the test on a different host avoids that problem.

Statistics Summary
Tests which failed and reran on the same host: 84
...of which this many failed: 78
Tests which failed and reran on a different host: 173
...of which this many failed: 124

Statistics Source

RerunStatsByHost.groovy.txt

@adamfarley adamfarley changed the title Feature Proposal: Where possible, make auto-rerun tests run on a different machine WIP: Feature Proposal: Where possible, make auto-rerun tests run on a different machine Nov 20, 2024
@smlambert
Contributor

For context, this issue came out of a discussion in the retrospective, where I suggested that we could grab the hostname of the machine where the initial run occurred and set ADDITIONAL_LABEL=!hostname for the rerun. Before making such a change, though, we could look at some of the metrics around auto_reruns in TRSS (related: #5121) to see whether it is actually needed: whether many of the auto_reruns already land on different machines naturally, and whether the intermittent failures we have at the project are unlikely to have machine-related causes.
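As a rough illustration of the ADDITIONAL_LABEL=!hostname idea, here is a minimal Python sketch of how an exclusion label could be assembled. It assumes Jenkins-style label expressions (`!` to negate, `&&` to combine) and that each agent can be matched by its own node name. The function name `rerun_label`, the base label, and the hostname are hypothetical and are not taken from the project's pipeline code.

```python
def rerun_label(base_label: str, initial_host: str) -> str:
    """Build a label expression that keeps base_label but excludes initial_host.

    Assumes Jenkins-style syntax: '!' negates a label, '&&' combines labels.
    """
    host = initial_host.strip()
    if not host:
        # Initial host unknown: leave the label unchanged rather than guess.
        return base_label
    exclusion = f"!{host}"
    if not base_label:
        return exclusion
    # Parenthesise the base so any '||' inside it keeps its meaning.
    return f"({base_label})&&{exclusion}"


# Hypothetical label and hostname, for illustration only.
print(rerun_label("role.test&&arch.x64", "testhost01"))
# prints (role.test&&arch.x64)&&!testhost01
```

If only one machine matches the base label, an expression like this would leave the rerun with nowhere to go, so the "where possible" part of the proposal would still need a fallback to the unmodified label.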

@adamfarley
Contributor Author

adamfarley commented Nov 21, 2024

I've written a program to provide some numbers for/against this proposal.

It looks at the last 10 pipelines per LTS version, identifies all rerun tests per build, and compares the host names (and also logs the pass/fail of the rerun).
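The comparison described above can be sketched in Python as follows. This is not the attached Groovy program; it only illustrates the same-host versus different-host tally on in-memory records. The `Rerun` record, its field names, and the sample data are all hypothetical, and fetching the real data from the pipelines is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rerun:
    # Hypothetical record shape: one entry per test that failed and was rerun.
    test_name: str
    initial_host: str
    rerun_host: str
    rerun_passed: bool


def tally(reruns):
    """Count reruns and rerun failures, split by same vs different host."""
    stats = {
        "same": {"total": 0, "failed": 0},
        "different": {"total": 0, "failed": 0},
    }
    for r in reruns:
        bucket = "same" if r.initial_host == r.rerun_host else "different"
        stats[bucket]["total"] += 1
        if not r.rerun_passed:
            stats[bucket]["failed"] += 1
    return stats


# Made-up sample records, for illustration only.
sample = [
    Rerun("testA", "host1", "host1", False),
    Rerun("testB", "host1", "host2", True),
    Rerun("testC", "host3", "host4", False),
]
print(tally(sample))
```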

The program is taking a while to run, but I can see the progress it's making and will update this issue with the results in a minute.

Here is the source:
RerunStatsByHost.groovy.txt

And here is the output:

Tests which failed and reran on the same host: 84
...of which this many failed: 78
Tests which failed and reran on a different host: 173
...of which this many failed: 124

So the percentages (about 93% of same-host reruns failed again, versus about 72% of different-host reruns) seem to indicate that a different host is best.
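For reference, the rerun failure rates can be worked out from the four counts above with a few lines of Python. The helper name `failure_rate` is just for illustration. This is plain arithmetic on the reported numbers, not a significance test, and the sample sizes are fairly small.

```python
def failure_rate(failed: int, total: int) -> float:
    """Fraction of reruns that failed again."""
    if total <= 0:
        raise ValueError("total must be positive")
    return failed / total


# Counts taken from the output above.
same_host_rate = failure_rate(78, 84)
different_host_rate = failure_rate(124, 173)

print(f"Same host:      {same_host_rate:.1%}")       # prints Same host:      92.9%
print(f"Different host: {different_host_rate:.1%}")  # prints Different host: 71.7%
```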

@adamfarley adamfarley changed the title WIP: Feature Proposal: Where possible, make auto-rerun tests run on a different machine Feature Proposal: Where possible, make auto-rerun tests run on a different machine Nov 21, 2024
@adamfarley adamfarley moved this from Todo to In Progress in Adoptium Backlog Nov 21, 2024
@adamfarley
Contributor Author

Please add me to this task as an assignee, and change the project to the Q4 one.

Also, I'll be in training tomorrow, so others can feel free to add their names too for further discussion and/or PR creation.

Ta very much. :)

@smlambert
Contributor

Great :)

One other dimension to this is that we have now enabled taking 'problem machines' offline if a certain type of failure occurs, so such a machine would not be available to send the rerun job to. It'd be good to look at the reruns that were sent to the same machine and failed again, to see the nature of those failures (that is, whether they would now trigger taking those machines offline).
