Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Windows test flakiness #995

Closed
brson opened this issue Mar 17, 2017 · 12 comments
Closed

Fix Windows test flakiness #995

brson opened this issue Mar 17, 2017 · 12 comments
Labels

Comments

@brson
Copy link
Contributor

brson commented Mar 17, 2017

The flakiness of the Windows test suite is out of hand. Most PR's fail multiple times. It feels like it's getting worse. Lots of weird I/O failures.

I don't know if I am comfortable making another release in this state.

@brson brson added the bug label Mar 17, 2017
@brson brson mentioned this issue Mar 17, 2017
@Diggsey
Copy link
Contributor

Diggsey commented Mar 19, 2017

It's probably rust-lang/rust#29497 in various forms. 😞

@Diggsey
Copy link
Contributor

Diggsey commented Mar 19, 2017

Let's collect examples of spurious failures here. I'm assuming this is one:
https://ci.appveyor.com/project/brson/rustup-rs/build/1.0.846/job/d39vkpcr22q2j0ke

I've been trying to reproduce this locally, but the test suite is stubbornly passing every time 😛 I've tried 32-bit and 64-bit, maybe it's specifically -gnu builds? If we have some more examples it may be possible to spot a pattern.

@Diggsey
Copy link
Contributor

Diggsey commented Mar 19, 2017

Hm, I left the test suite running on loop for several hours in 3 different configurations, and didn't get a single failure...

@brson
Copy link
Contributor Author

brson commented Mar 20, 2017

FWIW, someone is working on a remove_dir_all impl here: rust-lang-deprecated/tempdir#15 (comment)

@brson
Copy link
Contributor Author

brson commented Mar 20, 2017

cc @Aaronepower once you've got that remove_dir_all crate you might plug it into rustup too.

@Diggsey
Copy link
Contributor

Diggsey commented Mar 21, 2017

Here's a table of recent failures:

Configuration Failing Test Error
64-bit GNU uninstall_deletes_cargo_home Access is denied during uninstall.
64-bit GNU uninstall_doesnt_leave_gc_file Panic: tests\cli-self-upd.rs:294
64-bit MSVC uninstall_doesnt_leave_gc_file Access is denied during uninstall.
64-bit GNU uninstall_doesnt_leave_gc_file Panic: tests\cli-self-upd.rs:294
32-bit MSVC uninstall_deletes_bins Access is denied during uninstall.
64-bit MSVC uninstall_deletes_bins Access is denied during uninstall.
32-bit GNU remove_default_toolchain_err_handling Access is denied during uninstall.
64-bit MSVC windows_handle_empty_path_registry_key Access is denied during uninstall.
64-bit GNU multirust_upgrade_works_with_proxy unable to hard link fallback exe (already exists)

It's clearly not related to the configuration. All of them seem related to file or directory removal. However, the last one would not be fixed by an improved remove_dir_all. It looks like it might be caused by tests running in parallel?

@Diggsey
Copy link
Contributor

Diggsey commented Mar 25, 2017

Update:

  • It's not caused by tests running in parallel
  • It's not caused by any virus scanners, the indexing service, or shadow copy
  • Running handle.exe to find handles to files being deleted shows no results over many, many test runs

@Diggsey
Copy link
Contributor

Diggsey commented Mar 25, 2017

  • It doesn't seem to be caused by any kind of network file system weirdness

@Diggsey
Copy link
Contributor

Diggsey commented Mar 30, 2017

@brson Updating to the newer appveyor image seems to have reduced the failure rate quite a bit. Over 40 builds there was only 1 failure, which would be fixed by the remove_dir_all changes, so in combination, I think we should be back down to an acceptable failure rate...

I'll work on getting that PR into a state we can merge (atm it installs more than is strictly required, and increases the build time quite a bit. I might see if it can cache the msys install across builds.) and then I think we can close this.

@XAMPPRocky
Copy link
Member

@brson The remove_dir_all crate is now available, sorry I didn't finish it before you wrote that module. 😅

@mati865
Copy link
Contributor

mati865 commented May 22, 2017

Another victim.
Toolchain: 32-bit MSVC
Test: updater_is_deleted_after_running_rustup

error: could not remove 'setup' file: 'C:\Users\appveyor\AppData\Local\Temp\1\rustup-cargo.2PgHSjcklvu2\ch\bin/rustup-init.exe'
info: caused by: Access is denied. (os error 5)

markschl added a commit to markschl/seqtool that referenced this issue Jan 14, 2018
Problems persist, may be related to
rust-lang/rustup#995
markschl added a commit to markschl/seqtool that referenced this issue Jan 14, 2018
Problems persist, may be related to
rust-lang/rustup#995
markschl added a commit to markschl/seqtool that referenced this issue Jan 15, 2018
Problems persist, may be related to
rust-lang/rustup#995
markschl added a commit to markschl/seqtool that referenced this issue Jan 15, 2018
Problems persist, may be related to
rust-lang/rustup#995
markschl added a commit to markschl/seqtool that referenced this issue Jan 31, 2018
Problems persist, may be related to
rust-lang/rustup#995
@rbtcollins
Copy link
Contributor

Just noting that we do still see the occasional unexplained windows failure, but rare vs common. I'm going to close this as long in the tooth and suggest we open specific tickets when we feel we have a pattern emerging again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants