azure: Investigate seemingly spurious failures on CI #61301
Comments
This doesn't seem to be a spurious failure: these jobs started after bors marked the PR as failed due to other failures on Travis/AppVeyor, so the |
For the submodules weirdness I opened #61322 to investigate. For the git checkout failures (logs) it seems the error is happening on the Azure side, and there is not much we can do on our end. Any idea on what could have caused it, or what steps we can take to investigate it @rylev @johnterickson? |
…e-cloning, r=alexcrichton

ci: display more debug information in the init_repo script

I'm *really* confused about the error message [while cloning submodules on Windows on Azure](https://dev.azure.com/rust-lang/e71b0ddf-dd27-435a-873c-e30f86eea377/_apis/build/builds/295/logs/506):

```
/usr/bin/tar: You must specify one of the '-Acdtrux', '--delete' or '--test-label' options
Try '/usr/bin/tar --help' or '/usr/bin/tar --usage' for more information.
```

It doesn't make sense for it to execute a command without any of those flags since they're clearly added: https://github.com/rust-lang/rust/blob/81970852e172c04322cbf8ba23effabeb491c83c/src/ci/init_repo.sh#L45

So this adds `set -x` to the script to hopefully catch what command it's executing.

r? @alexcrichton

cc rust-lang#61301
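For readers unfamiliar with `set -x`: it makes the shell print each command after expansion, so the log shows exactly which arguments reached the program. A minimal sketch follows. It is not the real `src/ci/init_repo.sh`; the `run_traced` helper, the `extract_flags` variable, and the `archive.tar` name are all hypothetical, and `echo` stands in for the real `tar` call.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; not the real src/ci/init_repo.sh.
set -eu

# Hypothetical stand-in for the flags passed to the extraction step.
extract_flags="-xf"

# Hypothetical helper: run one command with xtrace enabled.
run_traced() {
    # The subshell keeps `set -x` scoped to this single command.
    # The trace goes to stderr, so merge it into stdout to keep log order.
    ( set -x; "$@" ) 2>&1
}

# `echo` replaces the real `tar` invocation so the sketch has no side effects.
trace="$(run_traced echo tar "$extract_flags" archive.tar)"
printf '%s\n' "$trace"
```

The trace line (prefixed with one or more `+` characters) shows the command as the shell actually executed it, which is the information the error message alone does not give.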
I see two trends - let me know if I'm missing another: This looks like the auth to GitHub is not working and so git is falling back to a command prompt for creds. Hmm...
This sounds vaguely like a race we were seeing where we are getting a notification from GitHub before the commit has been replicated everywhere.
|
@johnterickson I'm pretty sure that's our fault and "expected behavior". Every time a new build needs to start, bors force-pushes the merge commit to the auto branch, thus deleting the previous build's merge commit. If the new build is started shortly after the previous one (for example if someone cancels the build on the bors side to let a higher-priority one start), the checkout step of the previous one might not have started yet and it will try to get the old HEAD, causing that error. I've also seen the error only on builds that ran for ~5 minutes before being killed by our cancelbot because a newer build started, which is consistent with the hypothesis. |
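The race described above can be made visible with a pre-checkout guard. The sketch below is an assumption for illustration, not something bors or the CI scripts are known to do: `branch_still_at` is a hypothetical helper that uses `git ls-remote` to ask whether a branch still points at the commit the build was scheduled for.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; not part of bors or the rust-lang CI scripts.
set -eu

# Hypothetical helper: succeeds only if <branch> on <remote> still points
# at <expected_sha>. After a force-push the SHAs differ and it fails.
branch_still_at() {
    remote="$1"
    branch="$2"
    expected_sha="$3"
    # `git ls-remote` prints "<sha><TAB><ref>" for each matching ref.
    current_sha="$(git ls-remote "$remote" "refs/heads/$branch" | cut -f1)"
    [ "$current_sha" = "$expected_sha" ]
}

# Example usage (remote, branch, and variable names are placeholders):
# if ! branch_still_at origin auto "$BUILD_SHA"; then
#     echo "auto moved since this build was scheduled; skipping checkout" >&2
# fi
```

A guard like this would not remove the race, but it would turn a confusing checkout error into an explicit "superseded build" message.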
There is another spurious failure which is pretty bad: this build failed to "prepare" but was marked as successful on the GitHub Checks. |
@pietroalbini I raised that "succeeded but actually failed" build with the right team - definitely concerning and I appreciate you reporting it. |
Another two that have come up:
@pietroalbini or @kennytm do y'all know what might be causing that?
|
Maybe the HEAD commit did not have a parent commit? Is the commit 322afaf associated to any branch during |
For IPv6 error |
Ah ok @kennytm that's probably it. It sounds like a specific git history is expected and/or fetch depth, but I'm just pushing up raw commits which probably breaks the script's assumption. For the failing builds I'm just making manual commits and pushing them to |
For what it's worth, I had "fixed" this in my original branch in that it would swallow this error. |
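The parent-commit assumption discussed above can be checked explicitly instead of having the error swallowed. This is a minimal sketch, assuming a script step that refers to `HEAD^`; the `head_has_parent` helper is hypothetical and does not come from the CI scripts.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; the helper name is hypothetical.
set -eu

# Succeeds if HEAD in <repo_dir> has a first parent available locally.
# Fails for a root commit and for the tip of a depth-1 shallow clone.
head_has_parent() {
    repo_dir="$1"
    git -C "$repo_dir" rev-parse --verify --quiet "HEAD^" >/dev/null
}

# Example usage (the path is a placeholder):
# if head_has_parent .; then
#     git diff --name-only HEAD^ HEAD
# else
#     echo "HEAD has no parent here; skipping steps that need HEAD^" >&2
# fi
```

Compared with swallowing the error, an explicit check keeps other failures in the same step visible.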
@ethomson said on Discord the checkout issues are now fixed 🎉
|
Thanks for linking me to this issue, @pietroalbini. Looking at the topic, I'm looking at this section:
Naive question: what is the weirdness that you're describing? It's not obvious from the logs, because I don't really know what I'm looking for. |
Ah yeah sorry I should have been more descriptive there! On build 287 the x86_64-msvc-2 job fails due to an odd error message (presumably a submodule missing), and the logs for the "Check out submodules (windows)" step, while successful, are sort of funny. I'm not sure if this is really an Azure issue though; it may be that we got a bad tarball from GitHub and swallowed the error by accident (this is what #61322 was hoping to help diagnose). The 295 build is similar, just on a different Windows builder (dist-x86_64-msvc). I haven't seen these happen again myself, though, so it may have been just a transient issue. AFAIK most issues we've listed here have been addressed one way or another, so it may actually be time to close this! |
By the way, since I couldn't reproduce that failure locally I also added some extra debug info to see what's actually happening there. Didn't get a chance to see the output when the spurious failure happens yet. |
@pietroalbini :( Thanks. Let me see where the fix for that is in the deployment queue and if we expect that your account has it ... |
@pietroalbini Update on 403's ("could not read Username") - that fix was still making its way through progressive deployment. We explicitly enabled it on your account. Please let me know if you see it again, as that means our fix wasn't actually a fix for the issue. |
Triage: given that we're no longer using Azure Pipelines, I imagine this issue can be closed. Can anyone from @rust-lang/infra confirm? |
Indeed! Closing. We're still using macOS on Azure, but I don't think we've seen these failures there recently at least. |
Git checkout failures
- arm-android git checkout failure
- armhf-gnu git checkout failure
- dist-various-2 git checkout failure
- even more git checkout failures

Submodule weirdness on Windows
Other