Remember succeeded builds for better defense against spurious bors failures #39005

Closed
est31 opened this issue Jan 12, 2017 · 3 comments
Labels
C-enhancement Category: An issue proposing an enhancement or a PR with one. T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue.

Comments

@est31
Member

est31 commented Jan 12, 2017

Currently, a couple of spurious errors cause problems in the bors queue. The list linked here is only a subset of them, since some, like network errors, don't have a GitHub issue: https://github.com/rust-lang/rust/issues?q=is%3Aissue+is%3Aopen+label%3AA-spurious

It would be great to have "defense in depth" against them.

The idea would be to remember which builds succeeded already for a given commit hash, and then mark them as succeeded automatically in later runs (e.g. those with a retry). This will do two things:

  • If those builds have some of their own spurious issues, they don't pose a risk anymore for the retry run
  • It frees up workers which may be used to test the previously failed platforms in parallel, forestalling a subsequent retry.
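As a rough illustration of the idea above, here is a minimal Python sketch of remembering green builds per commit hash and splitting a retry into builders to skip and builders to run. The `BuildMemory` class, its method names, and the builder names are hypothetical; nothing here is homu's actual API.

```python
from dataclasses import dataclass, field


@dataclass
class BuildMemory:
    """Remembers which builders already passed for a given commit hash.

    Hypothetical helper for illustration only; not part of homu or bors.
    """

    # Maps commit hash -> names of builders that succeeded on it.
    _passed: dict[str, set[str]] = field(default_factory=dict)

    def record_success(self, commit: str, builder: str) -> None:
        """Note that `builder` went green for `commit`."""
        self._passed.setdefault(commit, set()).add(builder)

    def plan_retry(self, commit: str, builders: list[str]) -> tuple[list[str], list[str]]:
        """Split builders into (already green, still to run) for a retry."""
        done = self._passed.get(commit, set())
        skipped = [b for b in builders if b in done]
        to_run = [b for b in builders if b not in done]
        return skipped, to_run
```

On a retry, the builders in the first list could be marked as succeeded right away, leaving their workers free for the ones in the second list.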

cc @alexcrichton

@alexcrichton
Member

Seems reasonable to me, and I think this would specifically be a feature of homu. I don't know how to implement it on AppVeyor/Travis, though.

I personally like running lots of tests and seeing lots of spurious failures as it adds a lot of pressure to deal with them, but I also don't mind trying to reduce it by default to help PRs land. It's definitely frustrating dealing with spurious tests.

@est31
Member Author

est31 commented Jan 12, 2017

As for implementation, one could upload to a server whether a build for a given commit hash (or maybe the two parent hashes, if homu generates a new merge commit when it retries) was successful, and then check that server at startup time. Maybe this can even be combined with #38748?
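A minimal sketch of that scheme, assuming results are keyed by the merge commit's parent hashes so that they survive a freshly generated merge commit. A local JSON file stands in for the server, and the function names (`result_key`, `upload_success`, `already_succeeded`) are made up for this example.

```python
import json
from pathlib import Path


def result_key(parents: list[str], builder: str) -> str:
    """Key a result by the merge commit's parent hashes, not the merge hash."""
    return "+".join(parents) + ":" + builder


def upload_success(store: Path, parents: list[str], builder: str) -> None:
    """Record a green build. A real setup would send this to a server instead."""
    seen = set(json.loads(store.read_text())) if store.exists() else set()
    seen.add(result_key(parents, builder))
    store.write_text(json.dumps(sorted(seen)))


def already_succeeded(store: Path, parents: list[str], builder: str) -> bool:
    """Startup check: has this builder already passed for these parents?"""
    if not store.exists():
        return False
    return result_key(parents, builder) in json.loads(store.read_text())
```

A builder would call `already_succeeded` first and exit early with success when it returns `True`, and call `upload_success` once its own run finishes green.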

It's a bit of a hack, but I guess homu is a hack on top of Travis already :/.

For the parallel test runners, one could have a branch auto2 and only push there on a retry and when some of the builds finished early. This is even hackier :)

> I personally like running lots of tests and seeing lots of spurious failures as it adds a lot of pressure to deal with them

Yeah, that's a good point, but I think causing fewer issues for PRs would be better overall.

@nox
Contributor

nox commented Mar 2, 2017

All the spurious network failures can be avoided by just implementing retry, and that's mandatory anyway when doing stuff with THE CLOUD and its Chaos Monkey on S3 and whatnot.
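For reference, a retry wrapper of the kind meant here could look roughly like this in Python. It is a sketch under assumptions: `with_retries` is a hypothetical helper, network failures are assumed to surface as `OSError` (which covers `ConnectionError`, `TimeoutError`, and `urllib.error.URLError`), and the attempt count and delays are arbitrary.

```python
import time


def with_retries(operation, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run operation(), retrying on OSError with exponential backoff.

    The final failure is re-raised so a persistent outage is still visible.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except OSError:
            if attempt == attempts - 1:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
```

Wrapping each download step, for example `with_retries(lambda: urllib.request.urlopen(url).read())`, would keep one dropped connection from failing the whole build.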

@Mark-Simulacrum Mark-Simulacrum added the T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue. label May 24, 2017
@Mark-Simulacrum Mark-Simulacrum added the C-enhancement Category: An issue proposing an enhancement or a PR with one. label Jul 26, 2017
@est31 est31 closed this as completed Oct 19, 2018
@rust-lang rust-lang locked and limited conversation to collaborators Oct 19, 2018

4 participants