Remember succeeded builds for better defense against spurious bors failures #39005

est31 · 2017-01-12T01:06:24Z

Currently, there are a couple of spurious errors that cause problems in the bors queue. The following query shows only a subset of those issues, as some, like network errors, don't have a GitHub issue: https://github.com/rust-lang/rust/issues?q=is%3Aissue+is%3Aopen+label%3AA-spurious

It would be great to have "defense in depth" against them.

The idea would be to remember which builds succeeded already for a given commit hash, and then mark them as succeeded automatically in later runs (e.g. those with a retry). This will do two things:

If those builds have spurious issues of their own, they no longer pose a risk to the retry run.
It frees up workers which may be used to test the previously failed platforms in parallel, forestalling a subsequent retry.
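
To make the idea concrete, here is a minimal sketch of the bookkeeping, assuming an in-memory store. `SuccessCache`, `record_success` and `plan_retry` are hypothetical names made up for illustration and are not part of homu or bors.

```python
from dataclasses import dataclass, field


@dataclass
class SuccessCache:
    """Remembers which builders already passed for a given commit hash."""

    # Hypothetical layout: commit hash -> set of builder names that succeeded.
    _passed: dict[str, set[str]] = field(default_factory=dict)

    def record_success(self, commit: str, builder: str) -> None:
        """Note that `builder` finished successfully for `commit`."""
        self._passed.setdefault(commit, set()).add(builder)

    def plan_retry(self, commit: str, builders: list[str]) -> tuple[list[str], list[str]]:
        """Split builders into (already passed, still need to run)."""
        passed = self._passed.get(commit, set())
        skip = [b for b in builders if b in passed]
        run = [b for b in builders if b not in passed]
        return skip, run
```

On a retry, the builders in the first list would be marked as succeeded right away, and only the second list would occupy workers.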

cc @alexcrichton

alexcrichton · 2017-01-12T07:31:04Z

Seems reasonable to me, and I think this would probably be a feature of homu specifically. I don't know how to implement this on AppVeyor/Travis, though.

I personally like running lots of tests and seeing lots of spurious failures as it adds a lot of pressure to deal with them, but I also don't mind trying to reduce it by default to help PRs land. It's definitely frustrating dealing with spurious tests.

est31 · 2017-01-12T13:03:05Z

As for implementation, one could upload to a server whether a build for a given commit hash (or maybe the two parent hashes, if homu generates a new merge commit when it retries) was successful, and then check that server at startup time. Maybe this can even be combined with #38748?
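
A rough sketch of the parent-hash variant, assuming the retry produces a new merge commit with the same two parents. `build_key` and `should_skip` are hypothetical helpers, and the plain `set` stands in for whatever the server would actually store.

```python
import hashlib


def build_key(parents: tuple[str, str], builder: str) -> str:
    """Derive a lookup key from the two parent hashes and the builder name."""
    base, head = parents
    raw = f"{base}:{head}:{builder}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


def should_skip(key: str, known_successes: set[str]) -> bool:
    """Startup check: True if this exact build was already recorded as passing."""
    return key in known_successes
```

If the base branch has moved since the failed run, the first parent differs, so the key changes and nothing is skipped. That is the safe outcome, because the merged tree is no longer the one that was tested.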

It's a bit of a hack, but I guess homu is a hack on top of travis already :/.

For the parallel test runners, one could have a branch auto2 and only push there on a retry and when some of the builds finished early. This is even hackier :)

> I personally like running lots of tests and seeing lots of spurious failures as it adds a lot of pressure to deal with them

Yeah, that's a good point, but I think causing fewer issues for PRs would be better overall.

nox · 2017-03-02T17:35:03Z

All the spurious network failures can be avoided by just implementing retry, and that's mandatory anyway when doing stuff with THE CLOUD and its Chaos Monkey on S3 and whatnot.
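
As an illustration of what "implementing retry" could look like, here is a minimal sketch with exponential backoff. The `retry` helper and its parameters are assumptions for this example and do not describe how the Rust build scripts actually handle network errors.

```python
import time


def retry(operation, attempts=3, base_delay=1.0, retry_on=(OSError,), sleep=time.sleep):
    """Call operation(), retrying on the given exceptions with exponential backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the real error
            # Wait base_delay, then 2x, then 4x, ... before the next try.
            sleep(base_delay * 2 ** attempt)
```

A download step would then be wrapped as `retry(lambda: fetch(url))`, where `fetch` is whatever function performs the network call. Only the listed exception types are retried, so genuine build failures still fail immediately.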

Mark-Simulacrum added the T-infra label May 24, 2017

Mark-Simulacrum added the C-enhancement label Jul 26, 2017

mark-i-m mentioned this issue Feb 15, 2018

change opt-level 2 to 3 in bootstrap rustflags #48204

Closed

est31 closed this as completed Oct 19, 2018

rust-lang locked and limited conversation to collaborators Oct 19, 2018
