Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rustup is still slow under WSL #1866

Open
rbtcollins opened this issue May 23, 2019 · 15 comments
Open

rustup is still slow under WSL #1866

rbtcollins opened this issue May 23, 2019 · 15 comments
Labels
bug O-linux Linux related O-windows Windows related

Comments

@rbtcollins
Copy link
Contributor

rbtcollins commented May 23, 2019

Problem
We forgot about WSL when making the threading code OS specific.

Steps

  1. Run rustup in WSL on Windows 10 (a WSL backed to an NTFS volume)

Possible Solution(s)
Perhaps thread always? Or perhaps try to detect WSL?

Notes

Output of rustup --version:
Output of rustup show:

@rbtcollins rbtcollins added the bug label May 23, 2019
@kinnison
Copy link
Contributor

So one option is we could include the uname crate if we don't already, and then use uname::uname()?.release.find("Microsoft") to detect WSL.

Alternatively we just enable threading for closes always -- I would like us to have a more robust limit on open handles at that point, (perhaps limited queue length would be sufficient?)

@rbtcollins
Copy link
Contributor Author

rbtcollins commented May 23, 2019 via email

@kinnison
Copy link
Contributor

Sounds good. I don't have a WSL setup (yet) but I can try and get one on my gaming platform (rapidly I'm going to have to think of rustup as a game to justify this) and try. getrlimit() or somesuch seems like a very sensible approach.

@rbtcollins
Copy link
Contributor Author

First experiment, enabling threads under WSL was entertaining. Made no impact. Something in LXCORE.SYS appears to be serialising all IO. Hah. Hah. Hah. Probably this is one of the drivers for WSL2...

@rbtcollins
Copy link
Contributor Author

First experiment, enabling threads under WSL was entertaining. Made no impact. Something in LXCORE.SYS appears to be serialising all IO. Hah. Hah. Hah. Probably this is one of the drivers for WSL2...

Running https://openbenchmarking.org/innhold/802aab108da0cc656d7560a98447d3556af818da under WSL 1 shows all 64 cores used, so its not some sort of green threading madness. microsoft/WSL#873 is likely the upstream bug report to track down. I'll do a little more measurement because puzzles and #1867 will hopefully benefit too, though I haven't read the new strace yet.

@rbtcollins
Copy link
Contributor Author

I've modified that to print out '{} is prime', number) for numbers within the last 1000 and run it with large amounts under strace to make sure that strace doesn't impose single thread execution or anything, it doesn't... moving back onto rustup.

@rbtcollins
Copy link
Contributor Author

And here is a strace.
I wasn't expecting the futex's but they appear to be only between iops within a thread, so I'm going to put them down to the threadpool implementation.

tl;dr: openat is extraordinarily slow in WSL1, with typical times in the 17ms during docs unpack. Similar to close. I can't say if we're facing actual serialisation in the linux emulator or not at this stage.

15291 <... openat resumed> )            = 7 <0.017482>
15549 <... close resumed> )             = 0 <0.017134>
strace -f -T rustup under WSL with threads
15292 clock_gettime(CLOCK_MONOTONIC,  <unfinished ...>
15291 write(8, "                     <button id="..., 7163 <unfinished ...>
15292 <... clock_gettime resumed> {tv_sec=336928, tv_nsec=54329700}) = 0 <0.000083>
15291 <... write resumed> )             = 7163 <0.000083>
15292 clock_gettime(CLOCK_MONOTONIC,  <unfinished ...>
15291 fchmod(8, 0644 <unfinished ...>
15292 <... clock_gettime resumed> {tv_sec=336928, tv_nsec=54594000}) = 0 <0.000073>
15291 <... fchmod resumed> )            = 0 <0.000081>
15292 epoll_wait(4,  <unfinished ...>
15291 futex(0x7ffff70a9b7c, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
15292 <... epoll_wait resumed> [], 1024, 0) = 0 <0.000068>
15291 <... futex resumed> )             = 1 <0.000069>
15292 clock_gettime(CLOCK_MONOTONIC,  <unfinished ...>
15513 <... futex resumed> )             = 0 <0.022419>
15292 <... clock_gettime resumed> {tv_sec=336928, tv_nsec=55036200}) = 0 <0.000087>
15291 openat(AT_FDCWD, "/home/robertc/.rustup/tmp/a1py9fg87dyydo7o_dir/rust-docs/share/doc/rust/html/std/alloc/struct.System.html", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666 <unfinished ...>
15292 clock_gettime(CLOCK_MONOTONIC,  <unfinished ...>
15513 futex(0x7ffff70acf00, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
15292 <... clock_gettime resumed> {tv_sec=336928, tv_nsec=55309200}) = 0 <0.000080>
15513 <... futex resumed> )             = 0 <0.000086>
15292 clock_gettime(CLOCK_MONOTONIC,  <unfinished ...>
15513 futex(0x7ffff705a370, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
15292 <... clock_gettime resumed> {tv_sec=336928, tv_nsec=55574700}) = 0 <0.000076>
15513 <... futex resumed> )             = 1 <0.000089>
15292 epoll_wait(4,  <unfinished ...>
15291 <... openat resumed> )            = 7 <0.000583>
15513 close(8 <unfinished ...>
15514 <... futex resumed> )             = 0 <0.522373>
15291 write(7, "<!DOCTYPE html><html lang=\"en\"><"..., 8192 <unfinished ...>
15514 futex(0x7ffff70aa00c, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
15513 <... close resumed> )             = 0 <0.018550>
15291 <... write resumed> )             = 8192 <0.018733>
15513 futex(0x7ffff705a370, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
15291 write(7, "an>, <span class=\"ident\">layout<"..., 8192) = 8192 <0.000104>
15291 write(7, "inter.html\">*mut </a><a class=\"p"..., 8192) = 8192 <0.000051>
15291 write(7, "/trait.Alloc.html#method.usable_"..., 8192) = 8192 <0.000091>
15291 write(7, "ef='../../std/alloc/trait.Alloc."..., 8192) = 8192 <0.000048>
15291 write(7, "ocblock hidden'><p>Performs the "..., 8192) = 8192 <0.000052>
15291 write(7, "/a> for <a class=\"primitive\" hre"..., 7836) = 7836 <0.000051>
15291 fchmod(7, 0644)                   = 0 <0.000048>
15291 futex(0x7ffff70aa00c, FUTEX_WAKE_PRIVATE, 1) = 1 <0.000037>
15514 <... futex resumed> )             = 0 <0.020706>
15514 futex(0x7ffff706b8a0, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
15291 openat(AT_FDCWD, "/home/robertc/.rustup/tmp/a1py9fg87dyydo7o_dir/rust-docs/share/doc/rust/html/reference/type-system.html", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666 <unfinished ...>
15514 <... futex resumed> )             = 0 <0.000093>
15514 futex(0x7ffff705a370, FUTEX_WAKE_PRIVATE, 1) = 1 <0.000046>
15515 <... futex resumed> )             = 0 <0.535622>
15514 close(7 <unfinished ...>
15515 futex(0x7ffff70aa08c, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
15514 <... close resumed> )             = 0 <0.013684>

If we pull out a larger region and strip the futexes entirely we get:

trimmed trace

Blank lines between different thread activity.

15540 close(7 <unfinished ...>

15291 <... openat resumed> )            = 7 <0.007514>

15540 <... close resumed> )             = 0 <0.007217>

15291 write(7, "<!DOCTYPE html><html lang=\"en\"><"..., 8192) = 8192 <0.000275>
15291 write(7, "pl-Sum%3Ci16%3E' class='anchor'>"..., 8192) = 8192 <0.000070>                                                                                                                              15291 write(7, "sp;I: <a class=\"trait\" href=\"../"..., 8192) = 8192 <0.000043>
15291 write(7, "s=\"struct\" href=\"../../std/num/s"..., 8192) = 8192 <0.000084>
15291 write(7, ">&nbsp;&nbsp;&nbsp;&nbsp;I: <a c"..., 8192) = 8192 <0.000044>
15291 write(7, "f=\"../primitive.i16.html\">i16</a"..., 4837) = 4837 <0.000041>
15291 ioctl(1, TCGETS, {B38400 opost isig icanon echo ...}) = 0 <0.000036>
15291 clock_gettime(CLOCK_MONOTONIC, {tv_sec=336928, tv_nsec=581992800}) = 0 <0.000020>
15291 write(7, "ter/traits/accum.rs.html#59-63' "..., 8192) = 8192 <0.000058>
15291 write(7, "m::Wrapping\">Wrapping</a>&lt;<a "..., 8192) = 8192 <0.000043>
15291 write(7, "='in-band'>impl&lt;'a&gt; Sum&lt"..., 8192) = 8192 <0.000067>
15291 write(7, "tion\">Duration</a> <span class=\""..., 5930) = 5930 <0.000044>
15291 fchmod(7, 0644)                   = 0 <0.000050>
15291 openat(AT_FDCWD, "/home/robertc/.rustup/tmp/a1py9fg87dyydo7o_dir/rust-docs/share/doc/rust/html/core/iter/trait.Sum.html", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666 <unfinished ...>

15541 close(7 <unfinished ...>

15291 <... openat resumed> )            = 7 <0.013446>                                                                                                                                                     

15541 <... close resumed> )             = 0 <0.013199>

15291 write(7, "<!DOCTYPE html><html lang=\"en\"><"..., 8192 <unfinished ...>
15291 <... write resumed> )             = 8192 <0.000315>
15291 write(7, "\"struct core::num::Wrapping\">Wra"..., 8192) = 8192 <0.000080>
15291 write(7, "tle=\"struct core::num::Wrapping\""..., 8192) = 8192 <0.000255>
15291 write(7, "-20'>fn <a href='#method.sum-19'"..., 8192) = 8192 <0.000082>
15291 write(7, " id='sum.v-30'>fn <a href='#meth"..., 8192) = 8192 <0.000044>
15291 write(7, "/struct.Wrapping.html\" title=\"st"..., 8192) = 8192 <0.000063>
15291 write(7, "64%3E' class='impl'><code class="..., 8192) = 8192 <0.000043>
15291 write(7, ">\n<h1 id=\"examples\" class=\"secti"..., 3809) = 3809 <0.000046>
15291 fchmod(7, 0644)                   = 0 <0.000042>
15291 openat(AT_FDCWD, "/home/robertc/.rustup/tmp/a1py9fg87dyydo7o_dir/rust-docs/share/doc/rust/html/edition-guide/rust-2018/data-types/union-for-an-unsafe-form-of-enum.html", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666 <unfinished ...>

15542 close(7 <unfinished ...>
15542 <... close resumed> )             = 0 <0.015125>

15291 <... openat resumed> )            = 7 <0.015604>
15291 write(7, "<!DOCTYPE HTML>\n<html lang=\"en\" "..., 8192) = 8192 <0.000279>
15291 write(7, "tml\"><strong aria-hidden=\"true\">"..., 8192) = 8192 <0.000080>
15291 write(7, "aria-label=\"Themes\" role=\"menu\">"..., 7595) = 7595 <0.000044>
15291 fchmod(7, 0644)                   = 0 <0.000047>
15291 openat(AT_FDCWD, "/home/robertc/.rustup/tmp/a1py9fg87dyydo7o_dir/rust-docs/share/doc/rust/html/book/ch06-01-defining-an-enum.html", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666 <unfinished ...>         

15543 close(7 <unfinished ...>
15543 <... close resumed> )             = 0 <0.019588>

15291 <... openat resumed> )            = 7 <0.020056>
15291 write(7, "<!DOCTYPE HTML>\n<html lang=\"en\" "..., 8192) = 8192 <0.000282>
15291 write(7, "h09-01-unrecoverable-errors-with"..., 8192) = 8192 <0.000091>
15291 write(7, "tion\"><li><a href=\"appendix-01-k"..., 8192) = 8192 <0.000046>
15291 write(7, "};\n#}</code></pre></pre>\n<p><spa"..., 8192) = 8192 <0.000071>
15291 write(7, "lion-dollar mistake. At that tim"..., 8192) = 8192 <0.000061>
15291 write(7, "uts=\"Right\">\n                   "..., 855) = 855 <0.000043>
15291 fchmod(7, 0644)                   = 0 <0.000043>
15291 openat(AT_FDCWD, "/home/robertc/.rustup/tmp/a1py9fg87dyydo7o_dir/rust-docs/share/doc/rust/html/book/2018-edition/ch06-01-defining-an-enum.html", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666 <unfinished ...>

15544 close(7 <unfinished ...>

15291 <... openat resumed> )            = 7 <0.020119>

15544 <... close resumed> )             = 0 <0.019882>

15291 write(7, "<!DOCTYPE HTML>\n<html lang=\"en\" "..., 4706 <unfinished ...>
15291 <... write resumed> )             = 4706 <0.000238>
15291 ioctl(1, TCGETS, {B38400 opost isig icanon echo ...}) = 0 <0.000024>
15291 clock_gettime(CLOCK_MONOTONIC, {tv_sec=336928, tv_nsec=661097200}) = 0 <0.000035>
15291 write(7, "How Functions Work</a></li><li><"..., 8192) = 8192 <0.000094>
15291 write(7, "State</a></li><li><a href=\"ch16-"..., 8192) = 8192 <0.000047>
15291 write(7, "navigation\">\n                   "..., 2215) = 2215 <0.000046>
15291 fchmod(7, 0644)                   = 0 <0.000045>
15291 openat(AT_FDCWD, "/home/robertc/.rustup/tmp/a1py9fg87dyydo7o_dir/rust-docs/share/doc/rust/html/book/second-edition/ch06-01-defining-an-enum.html", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666 <unfinished ...>

15545 close(7 <unfinished ...>

15291 <... openat resumed> )            = 7 <0.020344>

15545 <... close resumed> )             = 0 <0.020058>

15291 write(7, "<!DOCTYPE HTML>\n<html lang=\"en\" "..., 8192 <unfinished ...>
15291 <... write resumed> )             = 8192 <0.000240>
15291 write(7, "e-syntax.html\"><strong aria-hidd"..., 8192) = 8192 <0.000088>
15291 write(7, "class=\"page-wrapper\">\n\n         "..., 6609) = 6609 <0.000057>
15291 fchmod(7, 0644)                   = 0 <0.000055>                                                                                                                                                     
15291 openat(AT_FDCWD, "/home/robertc/.rustup/tmp/a1py9fg87dyydo7o_dir/rust-docs/share/doc/rust/html/unstable-book/language-features/repr-align-enum.html", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666 <unfinished ...>

15546 close(7 <unfinished ...>
15546 <... close resumed> )             = 0 <0.019362>

15291 <... openat resumed> )            = 7 <0.019870>
15291 write(7, "<!DOCTYPE HTML>\n<html lang=\"en\" "..., 8192) = 8192 <0.000497>
15291 write(7, "e</a></li><li><a href=\"language-"..., 8192) = 8192 <0.000115>
15291 write(7, "ria-hidden=\"true\">2.106.</strong"..., 8192) = 8192 <0.000058>
15291 write(7, "ore_private_bignum</a></li><li><"..., 8192) = 8192 <0.000078>
15291 write(7, "me_lib</a></li><li><a href=\"libr"..., 8192) = 8192 <0.000047>
15291 write(7, "                 </button>\n     "..., 6528) = 6528 <0.000045>                                                                                                                             15291 fchmod(7, 0644)                   = 0 <0.000044>
15291 openat(AT_FDCWD, "/home/robertc/.rustup/tmp/a1py9fg87dyydo7o_dir/rust-docs/share/doc/rust/html/std/keyword.enum.html", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666 <unfinished ...>

15547 close(7 <unfinished ...>

15291 <... openat resumed> )            = 7 <0.017319>

15547 <... close resumed> )             = 0 <0.017024>

15291 write(7, "<!DOCTYPE html><html lang=\"en\"><"..., 7459) = 7459 <0.000240>
15291 fchmod(7, 0644)                   = 0 <0.000049>
15291 openat(AT_FDCWD, "/home/robertc/.rustup/tmp/a1py9fg87dyydo7o_dir/rust-docs/share/doc/rust/html/reference/types/enum.html", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666 <unfinished ...>

15548 close(7 <unfinished ...>

15291 <... openat resumed> )            = 7 <0.011032>

15548 <... close resumed> )             = 0 <0.010816>

15291 write(7, "<!DOCTYPE HTML>\n<html lang=\"en\" "..., 8192 <unfinished ...>
15291 <... write resumed> )             = 8192 <0.000272>
15291 write(7, "rong aria-hidden=\"true\">8.2.9.</"..., 8192) = 8192 <0.000088>
15291 write(7, "                  <button id=\"se"..., 5880) = 5880 <0.000068>
15291 fchmod(7, 0644)                   = 0 <0.000043>
15291 openat(AT_FDCWD, "/home/robertc/.rustup/tmp/a1py9fg87dyydo7o_dir/rust-docs/share/doc/rust/html/rust-by-example/custom_types/enum.html", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666 <unfinished ...>

15549 close(7 <unfinished ...>

15291 <... openat resumed> )            = 7 <0.017482>

15549 <... close resumed> )             = 0 <0.017134>

15291 write(7, "<!DOCTYPE HTML>\n<html lang=\"en\" "..., 8192 <unfinished ...>
15291 <... write resumed> )             = 8192 <0.000239>
15291 write(7, ".3.</strong> Binding</a></li></o"..., 8192) = 8192 <0.000070>
15291 write(7, "></li><li><a href=\"macros/overlo"..., 8192) = 8192 <0.000058>
15291 write(7, "Ayu</button></li>\n              "..., 7254) = 7254 <0.000065>
15291 fchmod(7, 0644)                   = 0 <0.000041>
15291 openat(AT_FDCWD, "/home/robertc/.rustup/tmp/a1py9fg87dyydo7o_dir/rust-docs/share/doc/rust/html/rust-by-example/flow_control/match/destructuring/destructure_enum.html", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666 <unfinished ...>

15550 close(7 <unfinished ...>

15291 <... openat resumed> )            = 7 <0.017117>

15550 <... close resumed> )             = 0 <0.016832>

15291 write(7, "<!DOCTYPE HTML>\n<html lang=\"en\" "..., 8192 <unfinished ...>
15291 <... write resumed> )             = 8192 <0.000243>
15291 write(7, "e\">8.5.3.</strong> Binding</a></"..., 8192) = 8192 <0.000078>
15291 write(7, "ors</a></li><li><a href=\"macros/"..., 8192) = 8192 <0.000058>
15291 write(7, "\"ayu\">Ayu</button></li>\n        "..., 7323) = 7323 <0.000063>
15291 fchmod(7, 0644)                   = 0 <0.000041>
15291 openat(AT_FDCWD, "/home/robertc/.rustup/tmp/a1py9fg87dyydo7o_dir/rust-docs/share/doc/rust/html/unstable-book/library-features/core-private-bignum.html", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666 <unfinished ...>

15551 close(7 <unfinished ...>
15551 <... close resumed> )             = 0 <0.016410>

15291 <... openat resumed> )            = 7 <0.016851>

openat is not always slow:
15291 openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3 <0.000123>
The first really slow openat is

15291 openat(AT_FDCWD, "/home/robertc/.rustup/tmp/8oqf2gtb510uyjcg_dir/rust-std-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib/libbitflags-86b7389666f99c01.rlib", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666 <unfinished ...>
15291 <... openat resumed> )            = 8 <0.021601>

which comes after writing

15291 openat(AT_FDCWD, "/home/robertc/.rustup/tmp/8oqf2gtb510uyjcg_dir/rust-std-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib/liblock_api-e5173794b3d0aaf0.rlib", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666 <unfinished ...>
15291 <... openat resumed> )            = 8 <0.005984>

Though those times are already substantially slower than the first creation openat:

15291 openat(AT_FDCWD, "/home/robertc/.rustup/tmp/2kl7ird7h90wy9qo_dir/rustc/share/man/man1/rustdoc.1", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666 <unfinished ...>
15291 <... openat resumed> )            = 8 <0.000446>

With openat being slow, we can't avoid that latency; so unless we can find a way to convince WSL openat to be fast, or do the work to let us do all the IO in threads in tar-rs, I think we can' t do much more latency hiding in WSLv1.

@rbtcollins
Copy link
Contributor Author

I ran a procmon log at the same time as strace.

so this 10ms open:

15873 20:11:46.501767 openat(AT_FDCWD, "/home/robertc/.rustup/tmp/p70e6lguovhuruo4_dir/rust-docs/manifest.in", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666 <unfinished ...>
15873 20:11:46.513212 <... openat resumed> ) = 8 <0.010782>

can be tracked to this region

procmon trace
8:11:46.5019440 PM	0.0002992	rustup	21796	28868	CreateFile	C:\Users\robertc\AppData\Local\Packages\CanonicalGroupLimited.UbuntuonWindows_79rhkp1fndgsc\LocalState\rootfs\home\robertc\.rustup\tmp\p70e6lguovhuruo4_dir\rust-docs\manifest.in	SUCCESS	Desired Access: Read EA, Write EA, Read Attributes, Write Attributes, Read Control, Disposition: Create, Options: Synchronous IO Non-Alert, Non-Directory File, Open Reparse Point, Attributes: n/a, ShareMode: Read, Write, Delete, AllocationSize: 0, OpenResult: Created
8:11:46.5022629 PM	0.0000139	rustup	21796	28868	SetEAFile	C:\Users\robertc\AppData\Local\Packages\CanonicalGroupLimited.UbuntuonWindows_79rhkp1fndgsc\LocalState\rootfs\home\robertc\.rustup\tmp\p70e6lguovhuruo4_dir\rust-docs	SUCCESS	
8:11:46.5022885 PM	0.0000047	rustup	21796	28868	QueryAllInformationFile	C:\Users\robertc\AppData\Local\Packages\CanonicalGroupLimited.UbuntuonWindows_79rhkp1fndgsc\LocalState\rootfs\home\robertc\.rustup\tmp\p70e6lguovhuruo4_dir\rust-docs\manifest.in	BUFFER OVERFLOW	CreationTime: 26/05/2019 8:11:46 PM, LastAccessTime: 26/05/2019 8:11:46 PM, LastWriteTime: 26/05/2019 8:11:46 PM, ChangeTime: 26/05/2019 8:11:46 PM, FileAttributes: A, AllocationSize: 0, EndOfFile: 0, NumberOfLinks: 1, DeletePending: False, Directory: False, IndexNumber: 0x30000001381ab, EaSize: 72, Access: Read EA, Write EA, Read Attributes, Write Attributes, Read Control, Position: 0, Mode: Synchronous IO Non-Alert, AlignmentRequirement: Long
8:11:46.5024861 PM	0.0000370	rustup	21796	28868	CreateFile	C:\Users\robertc\AppData\Local\Packages\CanonicalGroupLimited.UbuntuonWindows_79rhkp1fndgsc\LocalState\rootfs\home\robertc\.rustup\tmp\p70e6lguovhuruo4_dir\rust-docs\manifest.in	SUCCESS	Desired Access: Write Data/Add File, Read EA, Write EA, Read Attributes, Write Attributes, Read Control, Disposition: Open, Options: Synchronous IO Non-Alert, Open Reparse Point, Attributes: n/a, ShareMode: Read, Write, Delete, AllocationSize: n/a, OpenResult: Opened
8:11:46.5118975 PM	0.0119380	rustup	21796	32172	CloseFile	C:\Users\robertc\AppData\Local\Packages\CanonicalGroupLimited.UbuntuonWindows_79rhkp1fndgsc\LocalState\rootfs\home\robertc\.rustup\tmp\p70e6lguovhuruo4_dir\rust-docs\share\doc\rust\html\core\intrinsics\fn.cttz.html	SUCCESS	
8:11:46.5126944 PM	0.0000065	MsMpEng.exe	6212	9004	CreateFileMapping	C:\Users\robertc\AppData\Local\Packages\CanonicalGroupLimited.UbuntuonWindows_79rhkp1fndgsc\LocalState\rootfs\home\robertc\.rustup\tmp\p70e6lguovhuruo4_dir\rust-docs\share\doc\rust\html\core\intrinsics\fn.cttz.html	FILE LOCKED WITH WRITERS	SyncType: SyncTypeCreateSection, PageProtection: PAGE_EXECUTE|PAGE_NOCACHE
8:11:46.5127219 PM	0.0000055	MsMpEng.exe	6212	9004	QueryStandardInformationFile	C:\Users\robertc\AppData\Local\Packages\CanonicalGroupLimited.UbuntuonWindows_79rhkp1fndgsc\LocalState\rootfs\home\robertc\.rustup\tmp\p70e6lguovhuruo4_dir\rust-docs\share\doc\rust\html\core\intrinsics\fn.cttz.html	SUCCESS	AllocationSize: 8,192, EndOfFile: 7,410, NumberOfLinks: 1, DeletePending: False, Directory: False
8:11:46.5239729 PM	0.0000297	rustup	21796	28868	SetEAFile	C:\Users\robertc\AppData\Local\Packages\CanonicalGroupLimited.UbuntuonWindows_79rhkp1fndgsc\LocalState\rootfs\home\robertc\.rustup\tmp\p70e6lguovhuruo4_dir\rust-docs\manifest.in	SUCCESS	
8:11:46.5240188 PM	0.0001228	rustup	21796	28868	WriteFile	C:\Users\robertc\AppData\Local\Packages\CanonicalGroupLimited.UbuntuonWindows_79rhkp1fndgsc\LocalState\rootfs\home\robertc\.rustup\tmp\p70e6lguovhuruo4_dir\rust-docs\manifest.in	SUCCESS	Offset: 0, Length: 24, Priority: Normal
8:11:46.5263485 PM	0.0000169	rustup	21796	28868	SetEAFile	C:\Users\robertc\AppData\Local\Packages\CanonicalGroupLimited.UbuntuonWindows_79rhkp1fndgsc\LocalState\rootfs\home\robertc\.rustup\tmp\p70e6lguovhuruo4_dir\rust-docs\manifest.in	SUCCESS	
8:11:46.5284040 PM	0.0004734	rustup	21796	28868	CreateFile	C:\Users\robertc\AppData\Local\Packages\CanonicalGroupLimited.UbuntuonWindows_79rhkp1fndgsc\LocalState\rootfs\home\robertc\.rustup\tmp\p70e6lguovhuruo4_dir\rust-installer-version	SUCCESS	Desired Access: Read EA, Write EA, Read Attributes, Write Attributes, Read Control, Disposition: Create, Options: Synchronous IO Non-Alert, Non-Directory File, Open Reparse Point, Attributes: n/a, ShareMode: Read, Write, Delete, AllocationSize: 0, OpenResult: Created
8:11:46.5289006 PM	0.0000139	rustup	21796	28868	SetEAFile	C:\Users\robertc\AppData\Local\Packages\CanonicalGroupLimited.UbuntuonWindows_79rhkp1fndgsc\LocalState\rootfs\home\robertc\.rustup\tmp\p70e6lguovhuruo4_dir	SUCCESS	
8:11:46.5289319 PM	0.0000058	rustup	21796	28868	QueryAllInformationFile	C:\Users\robertc\AppData\Local\Packages\CanonicalGroupLimited.UbuntuonWindows_79rhkp1fndgsc\LocalState\rootfs\home\robertc\.rustup\tmp\p70e6lguovhuruo4_dir\rust-installer-version	BUFFER OVERFLOW	CreationTime: 26/05/2019 8:11:46 PM, LastAccessTime: 26/05/2019 8:11:46 PM, LastWriteTime: 26/05/2019 8:11:46 PM, ChangeTime: 26/05/2019 8:11:46 PM, FileAttributes: A, AllocationSize: 0, EndOfFile: 0, NumberOfLinks: 1, DeletePending: False, Directory: False, IndexNumber: 0x30000001381ac, EaSize: 72, Access: Read EA, Write EA, Read Attributes, Write Attributes, Read Control, Position: 0, Mode: Synchronous IO Non-Alert, AlignmentRequirement: Long
8:11:46.5291378 PM	0.0000381	rustup	21796	28868	CreateFile	C:\Users\robertc\AppData\Local\Packages\CanonicalGroupLimited.UbuntuonWindows_79rhkp1fndgsc\LocalState\rootfs\home\robertc\.rustup\tmp\p70e6lguovhuruo4_dir\rust-installer-version	SUCCESS	Desired Access: Write Data/Add File, Read EA, Write EA, Read Attributes, Write Attributes, Read Control, Disposition: Open, Options: Synchronous IO Non-Alert, Open Reparse Point, Attributes: n/a, ShareMode: Read, Write, Delete, AllocationSize: n/a, OpenResult: Opened
8:11:46.5308153 PM	0.0000160	rustup	21796	28868	SetEAFile	C:\Users\robertc\AppData\Local\Packages\CanonicalGroupLimited.UbuntuonWindows_79rhkp1fndgsc\LocalState\rootfs\home\robertc\.rustup\tmp\p70e6lguovhuruo4_dir\rust-installer-version	SUCCESS	
8:11:46.5308413 PM	0.0000702	rustup	21796	28868	WriteFile	C:\Users\robertc\AppData\Local\Packages\CanonicalGroupLimited.UbuntuonWindows_79rhkp1fndgsc\LocalState\rootfs\home\robertc\.rustup\tmp\p70e6lguovhuruo4_dir\rust-installer-version	SUCCESS	Offset: 0, Length: 2, Priority: Normal
8:11:46.5324145 PM	0.0043939	rustup	21796	16896	CloseFile	C:\Users\robertc\AppData\Local\Packages\CanonicalGroupLimited.UbuntuonWindows_79rhkp1fndgsc\LocalState\rootfs\home\robertc\.rustup\tmp\p70e6lguovhuruo4_dir\rust-docs\manifest.in	SUCCESS	
8:11:46.5333828 PM	0.0000073	MsMpEng.exe	6212	9004	CreateFileMapping	C:\Users\robertc\AppData\Local\Packages\CanonicalGroupLimited.UbuntuonWindows_79rhkp1fndgsc\LocalState\rootfs\home\robertc\.rustup\tmp\p70e6lguovhuruo4_dir\rust-docs\manifest.in	FILE LOCKED WITH WRITERS	SyncType: SyncTypeCreateSection, PageProtection: PAGE_EXECUTE|PAGE_NOCACHE
8:11:46.5334101 PM	0.0000052	MsMpEng.exe	6212	9004	QueryStandardInformationFile	C:\Users\robertc\AppData\Local\Packages\CanonicalGroupLimited.UbuntuonWindows_79rhkp1fndgsc\LocalState\rootfs\home\robertc\.rustup\tmp\p70e6lguovhuruo4_dir\rust-docs\manifest.in	SUCCESS	AllocationSize: 24, EndOfFile: 24, NumberOfLinks: 1, DeletePending: False, Directory: False

The key bits we need are this:
In linux, open takes 0.010782 starting at 20:11:46.501767
In Windows, CreateFile takes 0.0002992 at 8:11:46.5019440 - a delay of 0.0001770 , SetEAFile on the containing directory shouldn't be in the same-thread but it is, taking 0.0000139 QueryAllInformationFile takes 0.0000047 and then we open it again for 0.0000370 (what? I dunno).

  0.0001770
  0.0002992
+0.0000139
+0.0000047
+0.0000370
=0.0005318

The last windows syscall is started at 8:11:46.5024861 or 0.0005421s after the first one starts, which implies some but not a lot of inter-call processing.

So we have a 0.0001770 gap, then some syscalls for 0.0005421+0.0000370 - we can explain an openat call that takes 0.0007561s - but Linux saw 0.0107820, a difference of 0.0100259, or 10ms.

The only explanation that makes sense of this for me is a coarse lock around the fd table, and close() holding that lock for the duration of CloseHandle, rather than invalidating the fd (but not releasing it); closing it, then releasing it. Or some similar thing. Does this stack up? Lets have a look at the surrounding strace:

16107 20:11:46.499554 close(8 <unfinished ...>

15873 20:11:46.501767 openat(AT_FDCWD, "/home/robertc/.rustup/tmp/p70e6lguovhuruo4_dir/rust-docs/manifest.in", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666 <unfinished ...>

16103 20:11:46.505732 close(7 <unfinished ...>

16107 20:11:46.512515 <... close resumed> ) = 0 <0.012437>

15873 20:11:46.513212 <... openat resumed> ) = 8 <0.010782>

Here three threads are involved:
at .499554 16107 starts a close();
at .501767 15873 starts and has - on the NTFS side - completed an open by .5025231
at .505732 16103 starts a close();
at .512515 16107 completes it's close().
at .513212 15873 completes it's openat and gets fd 8 - the one just closed.

Sure looks like a coarse lock on the fd<->Handle table to me, but of course, I don't have the WSL source code.

Lets check what that close(8) was for and look at its procmon close data.

15873 20:11:46.474059 openat(AT_FDCWD, "/home/robertc/.rustup/tmp/p70e6lguovhuruo4_dir/rust-docs/share/doc/rust/html/std/intrinsics/fn.cttz.html", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666 <unfinished ...>16104 20:11:46.474695 futex(0x7ffff66878b0, FUTEX_WAKE_PRIVATE, 1) = 0 <0.000032>
15873 20:11:46.476027 <... openat resumed> ) = 8 <0.001350>
and in procmon
8:11:46.4744071 PM	0.0002928	rustup	21796	28868	CreateFile	C:\Users\robertc\AppData\Local\Packages\CanonicalGroupLimited.UbuntuonWindows_79rhkp1fndgsc\LocalState\rootfs\home\robertc\.rustup\tmp\p70e6lguovhuruo4_dir\rust-docs\share\doc\rust\html\std\intrinsics\fn.cttz.html	SUCCESS	Desired Access: Read EA, Write EA, Read Attributes, Write Attributes, Read Control, Disposition: Create, Options: Synchronous IO Non-Alert, Non-Directory File, Open Reparse Point, Attributes: n/a, ShareMode: Read, Write, Delete, AllocationSize: 0, OpenResult: Created
8:11:46.4747171 PM	0.0000141	rustup	21796	28868	SetEAFile	C:\Users\robertc\AppData\Local\Packages\CanonicalGroupLimited.UbuntuonWindows_79rhkp1fndgsc\LocalState\rootfs\home\robertc\.rustup\tmp\p70e6lguovhuruo4_dir\rust-docs\share\doc\rust\html\std\intrinsics	SUCCESS	
8:11:46.4747451 PM	0.0000049	rustup	21796	28868	QueryAllInformationFile	C:\Users\robertc\AppData\Local\Packages\CanonicalGroupLimited.UbuntuonWindows_79rhkp1fndgsc\LocalState\rootfs\home\robertc\.rustup\tmp\p70e6lguovhuruo4_dir\rust-docs\share\doc\rust\html\std\intrinsics\fn.cttz.html	BUFFER OVERFLOW	CreationTime: 26/05/2019 8:11:46 PM, LastAccessTime: 26/05/2019 8:11:46 PM, LastWriteTime: 26/05/2019 8:11:46 PM, ChangeTime: 26/05/2019 8:11:46 PM, FileAttributes: A, AllocationSize: 0, EndOfFile: 0, NumberOfLinks: 1, DeletePending: False, Directory: False, IndexNumber: 0x30000001381a9, EaSize: 72, Access: Read EA, Write EA, Read Attributes, Write Attributes, Read Control, Position: 0, Mode: Synchronous IO Non-Alert, AlignmentRequirement: Long
8:11:46.4749776 PM	0.0000376	rustup	21796	28868	CreateFile	C:\Users\robertc\AppData\Local\Packages\CanonicalGroupLimited.UbuntuonWindows_79rhkp1fndgsc\LocalState\rootfs\home\robertc\.rustup\tmp\p70e6lguovhuruo4_dir\rust-docs\share\doc\rust\html\std\intrinsics\fn.cttz.html	SUCCESS	Desired Access: Write Data/Add File, Read EA, Write EA, Read Attributes, Write Attributes, Read Control, Disposition: Open, Options: Synchronous IO Non-Alert, Open Reparse Point, Attributes: n/a, ShareMode: Read, Write, Delete, AllocationSize: n/a, OpenResult: Opened
8:11:46.4770175 PM	0.0000146	rustup	21796	28868	SetEAFile	C:\Users\robertc\AppData\Local\Packages\CanonicalGroupLimited.UbuntuonWindows_79rhkp1fndgsc\LocalState\rootfs\home\robertc\.rustup\tmp\p70e6lguovhuruo4_dir\rust-docs\share\doc\rust\html\std\intrinsics\fn.cttz.html	SUCCESS	
8:11:46.4770421 PM	0.0001134	rustup	21796	28868	WriteFile	C:\Users\robertc\AppData\Local\Packages\CanonicalGroupLimited.UbuntuonWindows_79rhkp1fndgsc\LocalState\rootfs\home\robertc\.rustup\tmp\p70e6lguovhuruo4_dir\rust-docs\share\doc\rust\html\std\intrinsics\fn.cttz.html	SUCCESS	Offset: 0, Length: 7,343, Priority: Normal
8:11:46.4900031 PM	0.0000248	rustup	21796	28868	SetEAFile	C:\Users\robertc\AppData\Local\Packages\CanonicalGroupLimited.UbuntuonWindows_79rhkp1fndgsc\LocalState\rootfs\home\robertc\.rustup\tmp\p70e6lguovhuruo4_dir\rust-docs\share\doc\rust\html\std\intrinsics\fn.cttz.html	SUCCESS	
8:11:46.4994827 PM	0.0122879	rustup	21796	30740	CloseFile	C:\Users\robertc\AppData\Local\Packages\CanonicalGroupLimited.UbuntuonWindows_79rhkp1fndgsc\LocalState\rootfs\home\robertc\.rustup\tmp\p70e6lguovhuruo4_dir\rust-docs\share\doc\rust\html\std\intrinsics\fn.cttz.html	SUCCESS	
8:11:46.5005271 PM	0.0000078	MsMpEng.exe	6212	9004	CreateFileMapping	C:\Users\robertc\AppData\Local\Packages\CanonicalGroupLimited.UbuntuonWindows_79rhkp1fndgsc\LocalState\rootfs\home\robertc\.rustup\tmp\p70e6lguovhuruo4_dir\rust-docs\share\doc\rust\html\std\intrinsics\fn.cttz.html	FILE LOCKED WITH WRITERS	SyncType: SyncTypeCreateSection, PageProtection: PAGE_EXECUTE|PAGE_NOCACHE
8:11:46.5005548 PM	0.0000043	MsMpEng.exe	6212	9004	QueryStandardInformationFile	C:\Users\robertc\AppData\Local\Packages\CanonicalGroupLimited.UbuntuonWindows_79rhkp1fndgsc\LocalState\rootfs\home\robertc\.rustup\tmp\p70e6lguovhuruo4_dir\rust-docs\share\doc\rust\html\std\intrinsics\fn.cttz.html	SUCCESS	AllocationSize: 8,192, EndOfFile: 7,343, NumberOfLinks: 1, DeletePending: False, Directory: False

At .4994827 close is started and takes 0122879 finishing at .5117706. At 5005271 Defender starts operating on the file. So this lines up very well: defender mmaps the file when close is called, blocks the close, and WSL has a coarse lock on the FD table.

@rbtcollins
Copy link
Contributor Author

So the model I now have is:

  • operations that change the fd<->Handle mapping will contend with each other hugely
  • those operations appear to do their external work:
    • in the lock (close)
    • before the lock (open)
  • defender will make closing files slow - don't know yet if flushing a file will cause it to scan it or only close

Things we haven't modelled:

  • what sort of lock is it? do multiple closes content with each other, or just close with opens?

Possible workarounds to try:

  • opening many files in parallel: even if close is serialised we can get up to hardware-thread opens per close() because close does the OS work outside the lock
  • syncing data before closing (try making close faster)
  • don't close any files until we have hardware-thread-files-ready, then close one per hardware thread at the same time (if close doesn't mutually exclude other closes, these can proceed in parallel)

rbtcollins added a commit to rbtcollins/rustup.rs that referenced this issue May 26, 2019
Set RUSTUP_CLOSE_THREADS=disabled to force single threaded IO, or
to a specific number if desired for testing/tuning.

This may improve rust-lang#1867, but has no impact on rust-lang#1866 due to the coarse
lock around the fd-handle table inside WSL.
rbtcollins added a commit to rbtcollins/rustup.rs that referenced this issue May 30, 2019
Set RUSTUP_CLOSE_THREADS=disabled to force single threaded IO, or
to a specific number if desired for testing/tuning.

This may improve rust-lang#1867, but has no impact on rust-lang#1866 due to the coarse
lock around the fd-handle table inside WSL.
rbtcollins added a commit to rbtcollins/rustup.rs that referenced this issue Jun 4, 2019
Set RUSTUP_CLOSE_THREADS=disabled to force single threaded IO, or
to a specific number if desired for testing/tuning.

This may improve rust-lang#1867, but has no impact on rust-lang#1866 due to the coarse
lock around the fd-handle table inside WSL.
rbtcollins added a commit to rbtcollins/rustup.rs that referenced this issue Jun 5, 2019
Set RUSTUP_CLOSE_THREADS=disabled to force single threaded IO, or
to a specific number if desired for testing/tuning.

This may improve rust-lang#1867, but has no impact on rust-lang#1866 due to the coarse
lock around the fd-handle table inside WSL.
rbtcollins added a commit to rbtcollins/rustup.rs that referenced this issue Jun 6, 2019
Set RUSTUP_CLOSE_THREADS=disabled to force single threaded IO, or
to a specific number if desired for testing/tuning.

This may improve rust-lang#1867, but has no impact on rust-lang#1866 due to the coarse
lock around the fd-handle table inside WSL.
@workingjubilee
Copy link
Member

@rustbot label: +O-windows +O-linux

@rustbot rustbot added O-linux Linux related O-windows Windows related labels Apr 29, 2021
@kinnison
Copy link
Contributor

kinnison commented Jun 8, 2021

Does this remain an issue with wsl2 ? If not, I suggest we close this.

@rbtcollins
Copy link
Contributor Author

I haven't tested yet; WSL1 is still the only WSL some users can use due to enterprise rollout channels. If WSL2 is slow it will be for different reasons.

@rbtcollins
Copy link
Contributor Author

just for kicks I ran under WSL2 with .rustup symlinked onto /mnt/c/.

info: profile set to 'default'
info: default host triple is x86_64-unknown-linux-gnu
info: syncing channel updates for 'stable-x86_64-unknown-linux-gnu'
673.4 KiB / 673.4 KiB (100 %) 552.9 KiB/s in  1s ETA:  0s
info: latest update on 2021-06-17, rust version 1.53.0 (53cb7b09b 2021-06-17)
info: downloading component 'cargo'
  6.0 MiB /   6.0 MiB (100 %)   3.6 MiB/s in  1s ETA:  0s
info: downloading component 'clippy'
info: downloading component 'rust-docs'
 16.1 MiB /  16.1 MiB (100 %)   7.0 MiB/s in  2s ETA:  0s
info: downloading component 'rust-std'
 25.3 MiB /  25.3 MiB (100 %) 747.5 KiB/s in 32s ETA:  0s
info: downloading component 'rustc'
 48.4 MiB /  48.4 MiB (100 %) 778.4 KiB/s in  1m  7s ETA:  0s
info: downloading component 'rustfmt'
  3.7 MiB /   3.7 MiB (100 %) 759.7 KiB/s in  4s ETA:  0s
info: installing component 'cargo'
info: installing component 'clippy'
info: installing component 'rust-docs'
 16.1 MiB /  16.1 MiB (100 %)   1.2 MiB/s in 18s ETA:  0s
info: installing component 'rust-std'
 25.3 MiB /  25.3 MiB (100 %)  10.6 MiB/s in  2s ETA:  0s
info: installing component 'rustc'
 48.4 MiB /  48.4 MiB (100 %)  12.3 MiB/s in  4s ETA:  0s
info: installing component 'rustfmt'
info: default toolchain set to 'stable-x86_64-unknown-linux-gnu'

  stable-x86_64-unknown-linux-gnu installed - rustc 1.53.0 (53cb7b09b 2021-06-17)

Quite slow.

@shenshouer
Copy link

rustc 1.60.0 (7737e0b5c 2022-04-04) very very very slow building under WSL2 on win11

@rbtcollins
Copy link
Contributor Author

Rustc is not rustup, please file your ticket on the rust repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug O-linux Linux related O-windows Windows related
Projects
None yet
Development

No branches or pull requests

5 participants