Download crates in parallel with HTTP/2 #6005

alexcrichton · 2018-09-11T00:28:31Z

This PR revives some of the work of #5161 by refactoring Cargo to make it much easier to add parallel downloads, and then it does so with the curl crate's new http2 feature to compile nghttp2 has a backend.

The primary refactoring done here is to remove the concept of "download this one package" deep within a Source. Instead a Source still has a download method but it's considered to be largely non-blocking. If a crate needs to be downloaded it immediately returns information as to such. The PackageSet abstraction is now a central location for all parallel downloads, and all users of it have been refactored to be amenable to parallel downloads, when added.

Many more details are in the commits...

rust-highfive · 2018-09-11T00:28:36Z

r? @matklad

(rust_highfive has picked a reviewer for you, use r? to override)

alexcrichton · 2018-09-11T00:28:44Z

r? @ehuss

cc @ishitatsuyuki

Mark-Simulacrum · 2018-09-11T00:33:43Z

Hm, so I've wondered about this in the past -- I sort of intuitively always thought that parallel downloads from the "same" server wouldn't really help much because in most cases the bandwidth limitation is somewhere in the middle (likely closer to the client than the server, in fact, I'd imagine).

Not that I don't think this is a good idea to refactor in general if only to have downloads and builds potentially running in parallel or parallel downloads from different registries. Mostly just interested if there's perhaps some resource anyone knows about that would help clarify this for me :)

alexcrichton · 2018-09-11T01:25:57Z

The claims in #5161 (comment) and #467 provide what I think are some pretty strong data for pursuing something like this. I think it's fine to hold off on landing this until there's at least a proof-of-concept to test out as well.

ishitatsuyuki · 2018-09-11T01:45:04Z

Great work. It was what I was stuck (or maybe just upset) with and this makes implementing the actual part much easier. Thanks!

@Mark-Simulacrum, the whole point of this is to parallelize the initial round trips. The crates.io server in US takes non trivial time to reach from Japan, and thus the time is wasted mostly waiting for a trivial redirect. If we someday absorb the redirect at CDN layer, that could be an improvement as well.

alexcrichton · 2018-09-11T04:12:05Z

So to add to this dataset, I've got a proof-of-concept working that uses http/2 with libcurl to do downloads in Cargo itself. On my machine in the Mozilla office (connected to a presumably very fast network) I removed my ~/.cargo/registry/{src,cache} folders and then executed cargo fetch in Cargo itself. On nightly this takes about 18 seconds. With this PR it takes about 3. That's... wow!

ishitatsuyuki

Some nitpicks.

Overall, this PR introduce a rather complicated abstraction which is probably caused by the way we resolve dependencies currently.

The approach is more like uncontrolled concurrency (like goroutines, or whatever you call), which IMO should be replaced by explicit logic boundaries. Do you know what information is missing from index and requires crate fetch instead? I would like to be aware of them if possible. If you don't know, no worry; this approach will work for the near future, and we can tackle with this when we change the way index works.

ishitatsuyuki · 2018-09-11T12:54:42Z

src/cargo/core/compiler/context/unit_dependencies.rs

+    /// Completes at least one downloading, maybe waiting for more to complete.
+    ///
+    /// This function will block the current thread waiting for at least one
+    /// create to finish downloading. The function may continue to download more


ishitatsuyuki · 2018-09-11T12:54:55Z

src/cargo/core/compiler/context/unit_dependencies.rs

+    /// crates if it looks like there's a long enough queue of crates to keep
+    /// downloading. When only a handful of packages remain this function
+    /// returns, and it's hoped that by returning we'll be able to push more
+    /// packages to download into the queu.


alexcrichton · 2018-09-11T17:35:07Z

@ishitatsuyuki yeah it's not a great abstraction because it's not as simple as "download everything in this list". I initially didn't want to disturb the existing code too much because unit_dependencies has historically been very complicated and attempting to emulate it anywhere else inevitably led to bugs (so I wanted to use the existing code as-is).

While the unit dependencies may be complicated though it may be the case that package dependencies are much simpler, so if desired we can try to do that instead and always use an interface that looks like "download this list of packages".

I'm not sure what you mean about missing index information though? The index is all Cargo looks at before it starts downloading crates

ehuss · 2018-09-11T23:33:30Z

I had to take the new code for a test drive. On my (slow) network, downloading all the crates for a fresh cargo build goes from 48s to 22s!!

ehuss · 2018-09-11T23:43:25Z

If you want some early feedback: The progress indicator only works at the resolution of entire packages. If you have only 1 package to download, and it takes a long time to download, cargo prints nothing until after the download is complete. It would be nice if it could display some progress based on the package size.

alexcrichton · 2018-09-11T23:54:22Z

@ehuss good to hear! That's not quite as large of an improvement as I would have hoped for but still is nice :)

For progress indicators I'm not really sure what to do myself. Previously we'd download one crate at a time which meant it was relatively obvious how to indicate progress, but now we don't have a great way to indicate progress I think because it's a whole soup of crates, and we don't know how big they all are. I initially tried to make a progress bar of "total bytes downloaded" vs "total bytes needed to download", but it was very jumpy and non-continuous, so I wasn't really sure what was going on there.

I don't think though we'll be able to land this with a package resolution, as you've seen with larger crates it's just not enough information compared to what we have today.

ishitatsuyuki · 2018-09-12T00:46:38Z

I'm not sure what you mean about missing index information though? The index is all Cargo looks at before it starts downloading crates

It looks like we have to download some tarballs in order to determine unit dependencies currently? (Is this correct?) If so, what particular information that is not in the index does it need?

It seems having the list of unit dependencies ahead of time is also good for displaying better progress percentages. We don't know the file size anyway, but we can address it some day by changing the index format.

alexcrichton · 2018-09-12T16:39:29Z

Oh so the index gives us the "maximal" crate graph sort of in terms of independent of activated features and which target platform. The unit_dependencies section of code, however, is specifically constructing a set of units that use a the minimial crate graph which doesn't include unnecessary platform-specific dependencies or unactivated feature dependencies.

In that sense what we're doing, creating a minimal resolution crate graph, can require more information than what's in the index. The index doesn't contain information about things like the crate's library targets, for example. I don't think we need to change the index for this, though, we can probably just get smarter in Cargo about constructing the minimal crate graph conservatively.

alexcrichton · 2018-09-13T02:01:36Z

Looking into things a bit, Heroku doesn't support HTTP/2 (ref) which means that we won't get multiplexing when talking to crates.io. Our CDN with static.crates.io, cloudfront, does support HTTP/2 though.

ishitatsuyuki · 2018-09-13T04:26:04Z

@alexcrichton In my former PR I used pipelining which reliably works for Heroku's reverse proxy. It will achieve similar efficiency improvements without creating too much TCP connections.

alexcrichton · 2018-09-13T23:52:53Z

Ok! I've done some various tinkering locally and I think this is at a good spot. This now works on all platforms, I've tested on all platforms, and I'm relatively happy with the new output progress. The output looks like this:

Fast CPU with fast internet

Not as fast computer with throttled LTE-like internet

Some notable points of interest

The "jerkiest" part of the UI is the fact that we extract tarballs synchronously, and we don't have much opportunity to update the UI during this time. This is a bit of a bummer, but something we can fix later! (and would make this even faster from the looks of it).
When we're doing a cargo build we don't actually know how many crates we're downloading. This can cause crate numbers to jump up and down.

Definitely some room for improvement, but I think we're not necessarily hit by any showstoppers!

With the latest commit this should also be good to go in terms of landing, I don't know of any further blockers.

ehuss · 2018-09-14T14:37:42Z

Does this need testing when behind a proxy?

The status is still a little jumpy, but better than before. And I definitely don't think it should hold things up, but maybe we can tweak it in the future. I like the new final summary.

alexcrichton · 2018-09-14T16:45:21Z

Ah right yeah, I should test behind some non-http-2 proxies! I'll do that today.

I also plan on filing a few follow-up issues after this lands to cover the remaining work items of inflating in parallel as well as computing the set of crates to download up-front.

ehuss

I think the typos pointed out by ishitatsuyuki still need to be fixed as well.

ehuss · 2018-09-14T16:30:05Z

src/cargo/core/package.rs

+        // We can continue to iterate on this, but this is hopefully good enough
+        // for now.
+        if !self.progress.borrow().as_ref().unwrap().is_enabled() {
+            self.set.config.shell().status("Downloading", &dl.descriptor)?;


What do you think about saying "Downloaded" instead?

Also, just as a general issue, in this mode nothing is displayed until the first crate is finished, which might be a long time. I'm wondering how common this scenario would be. At least for me, my editor integration does not use ttys, so I only get this mode. Would it be possible to print some generic "Downloading crates..." message at the start when progress is disabled?

ehuss · 2018-09-14T16:59:47Z

src/cargo/ops/cargo_doc.rs

-            packages.get(pkgid)
-        })
-        .collect::<CargoResult<Vec<_>>>()?;
+    let mut pkgs = Vec::new();


Can these 10 lines be shared with the duplicate in compile_ws?

It seems to fix some issues perhaps!

alexcrichton · 2018-09-18T18:38:55Z

Also at the recommentation of curl folks I've disabled HTTP1 pipelining, which may also fix the bug you were seeing @ehuss?

ehuss · 2018-09-18T21:12:34Z

I can confirm that the pipelining change fixes the problems I was seeing.

With the pipelining on, I tried a newer version of squid but got the same problems. It was version 4.2, and a very basic conf file:

http_port 8081
http_access allow localhost manager
http_access deny manager
http_access allow localhost

alexcrichton · 2018-09-18T21:17:50Z

Ok! In that case I think we'll leave that turned off but leave HTTP/2 enabled even when proxies are enabled.

Mind taking a final look at the code for an r+?

ehuss · 2018-09-18T21:30:15Z

@bors r+

What do you think about asking the broader community to help test this after it has been on nightly for a little bit? I'm a little concerned about people with weird networks/proxies possibly having issues.

bors · 2018-09-18T21:30:15Z

📌 Commit 9da2e70 has been approved by ehuss

alexcrichton · 2018-09-18T21:31:41Z

Certainly!

Do you think that we should perhaps move the multiplexing (which I think enables HTTP/2?) behind a config option? That way this would be off by default and the call to test could have instructions about how to enable.

Hm... actually I'm gonna do that.

@bors: r-

Let's hopefully weed out any configuration issues before turning it on by default!

alexcrichton · 2018-09-18T21:57:48Z

@bors: r=ehuss

bors · 2018-09-18T21:57:49Z

📌 Commit f4dcb5c has been approved by ehuss

bors · 2018-09-18T22:21:37Z

⌛ Testing commit f4dcb5c with merge 57ac392...

Download crates in parallel with HTTP/2 This PR revives some of the work of #5161 by refactoring Cargo to make it much easier to add parallel downloads, and then it does so with the `curl` crate's new `http2` feature to compile `nghttp2` has a backend. The primary refactoring done here is to remove the concept of "download this one package" deep within a `Source`. Instead a `Source` still has a `download` method but it's considered to be largely non-blocking. If a crate needs to be downloaded it immediately returns information as to such. The `PackageSet` abstraction is now a central location for all parallel downloads, and all users of it have been refactored to be amenable to parallel downloads, when added. Many more details are in the commits...

bors · 2018-09-19T00:05:59Z

☀️ Test successful - status-appveyor, status-travis
Approved by: ehuss
Pushing 57ac392 to master...

This commit brings back the old one-line-per-crate output that Cargo's had since the beginning of time but was removed in rust-lang#6005. This was requested on our [users forum][1] [1]: https://internals.rust-lang.org/t/testing-cargos-parallel-downloads/8466/2

Regressed in rust-lang#6005 it looks like the build plan requires all packages to be downloaded rather than just those coming out of `unit_dependenices`, so let's make sure to download everything! Closes rust-lang#6082

Print a line per downloaded crate This commit brings back the old one-line-per-crate output that Cargo's had since the beginning of time but was removed in #6005. This was requested on our [users forum][1] [1]: https://internals.rust-lang.org/t/testing-cargos-parallel-downloads/8466/2

Fix `--build-plan` with dev-dependencies Regressed in #6005 it looks like the build plan requires all packages to be downloaded rather than just those coming out of `unit_dependenices`, so let's make sure to download everything! Closes #6082

rust-highfive assigned matklad Sep 11, 2018

rust-highfive assigned ehuss and unassigned matklad Sep 11, 2018

ishitatsuyuki reviewed Sep 11, 2018

View reviewed changes

alexcrichton force-pushed the download-parallel branch from c9421a9 to eed35ed Compare September 11, 2018 21:58

alexcrichton force-pushed the download-parallel branch from eed35ed to 14faed2 Compare September 12, 2018 21:13

alexcrichton force-pushed the download-parallel branch 3 times, most recently from c7cb406 to 23b7a29 Compare September 13, 2018 23:33

alexcrichton changed the title ~~Lay some groundwork for downloading crates in parallel~~ Download crates in parallel with HTTP/2 Sep 13, 2018

alexcrichton force-pushed the download-parallel branch 2 times, most recently from 26374a9 to 955ae8d Compare September 13, 2018 23:52

ehuss reviewed Sep 14, 2018

View reviewed changes

Don't enable HTTP1 pipelining as apparently it's flaky

9da2e70

It seems to fix some issues perhaps!

Disable multiplexing by default for now

f4dcb5c

Let's hopefully weed out any configuration issues before turning it on by default!

bors merged commit f4dcb5c into rust-lang:master Sep 19, 2018

alexcrichton mentioned this pull request Sep 22, 2018

Update Cargo in the RLS rust-lang/rls#1063

Merged

Xanewok mentioned this pull request Sep 22, 2018

Download Cargo crates in parallel in project_model rust-lang/rls#1064

Open

alexcrichton deleted the download-parallel branch September 23, 2018 19:41

ehuss mentioned this pull request Sep 24, 2018

--build-plan fails when dev-dependencies are present #6082

Closed

alexcrichton mentioned this pull request Sep 24, 2018

Print a line per downloaded crate #6085

Merged

alexcrichton mentioned this pull request Sep 24, 2018

Fix --build-plan with dev-dependencies #6086

Merged

dwijnand mentioned this pull request Sep 26, 2018

Network operations for packages should be done in parallel #467

Closed

SimonSapin mentioned this pull request Oct 3, 2018

Timeout while fetching Servo’s dependencies on Windows #6125

Closed

alexcrichton mentioned this pull request Oct 17, 2018

Fix timeouts firing while tarballs are extracted #6130

Merged

Eh2406 mentioned this pull request Oct 23, 2018

Network based dependency resolution #5597

Closed

ehuss added this to the 1.31.0 milestone Feb 6, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Download crates in parallel with HTTP/2 #6005

Download crates in parallel with HTTP/2 #6005

alexcrichton commented Sep 11, 2018 •

edited

Loading

rust-highfive commented Sep 11, 2018

alexcrichton commented Sep 11, 2018

Mark-Simulacrum commented Sep 11, 2018

alexcrichton commented Sep 11, 2018

ishitatsuyuki commented Sep 11, 2018 •

edited

Loading

alexcrichton commented Sep 11, 2018

ishitatsuyuki left a comment

ishitatsuyuki Sep 11, 2018

ishitatsuyuki Sep 11, 2018

alexcrichton commented Sep 11, 2018

ehuss commented Sep 11, 2018

ehuss commented Sep 11, 2018

alexcrichton commented Sep 11, 2018

ishitatsuyuki commented Sep 12, 2018

alexcrichton commented Sep 12, 2018

alexcrichton commented Sep 13, 2018

ishitatsuyuki commented Sep 13, 2018

alexcrichton commented Sep 13, 2018

ehuss commented Sep 14, 2018

alexcrichton commented Sep 14, 2018

ehuss left a comment

ehuss Sep 14, 2018

ehuss Sep 14, 2018

alexcrichton commented Sep 18, 2018

ehuss commented Sep 18, 2018

alexcrichton commented Sep 18, 2018

ehuss commented Sep 18, 2018

bors commented Sep 18, 2018

alexcrichton commented Sep 18, 2018

alexcrichton commented Sep 18, 2018

bors commented Sep 18, 2018

bors commented Sep 18, 2018

bors commented Sep 19, 2018

Download crates in parallel with HTTP/2 #6005

Download crates in parallel with HTTP/2 #6005

Conversation

alexcrichton commented Sep 11, 2018 • edited Loading

rust-highfive commented Sep 11, 2018

alexcrichton commented Sep 11, 2018

Mark-Simulacrum commented Sep 11, 2018

alexcrichton commented Sep 11, 2018

ishitatsuyuki commented Sep 11, 2018 • edited Loading

alexcrichton commented Sep 11, 2018

ishitatsuyuki left a comment

Choose a reason for hiding this comment

ishitatsuyuki Sep 11, 2018

Choose a reason for hiding this comment

ishitatsuyuki Sep 11, 2018

Choose a reason for hiding this comment

alexcrichton commented Sep 11, 2018

ehuss commented Sep 11, 2018

ehuss commented Sep 11, 2018

alexcrichton commented Sep 11, 2018

ishitatsuyuki commented Sep 12, 2018

alexcrichton commented Sep 12, 2018

alexcrichton commented Sep 13, 2018

ishitatsuyuki commented Sep 13, 2018

alexcrichton commented Sep 13, 2018

ehuss commented Sep 14, 2018

alexcrichton commented Sep 14, 2018

ehuss left a comment

Choose a reason for hiding this comment

ehuss Sep 14, 2018

Choose a reason for hiding this comment

ehuss Sep 14, 2018

Choose a reason for hiding this comment

alexcrichton commented Sep 18, 2018

ehuss commented Sep 18, 2018

alexcrichton commented Sep 18, 2018

ehuss commented Sep 18, 2018

bors commented Sep 18, 2018

alexcrichton commented Sep 18, 2018

alexcrichton commented Sep 18, 2018

bors commented Sep 18, 2018

bors commented Sep 18, 2018

bors commented Sep 19, 2018

alexcrichton commented Sep 11, 2018 •

edited

Loading

ishitatsuyuki commented Sep 11, 2018 •

edited

Loading