-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Scalable jobserver for rustc #7731
Conversation
aaaf59a
to
747a7f6
Compare
CI here looks to be mostly failing because the current branch expects rustc to have -Zjobserver-token-requests. |
☔ The latest upstream changes (presumably #7737) made this pull request unmergeable. Please resolve the merge conflicts. |
Can this get a doc-comment in |
Hm, do you mean job_queue.rs? job_server.rs doesn't appear to exist in this repository. I would be happy to provide such a doc comment -- it's worth noting that I'm guessing we'll be landing this behind some sort of -Z flag as well initially (as we test it out and such) just because it's a non-trivial overhaul of the protocol we use with rustc. |
Oops, yea, typo. Should be |
b6e9714
to
b180ea8
Compare
I've added some (somewhat) high level docs to job_queue and rebased. |
Ok I've read over this and it looks pretty good to me, certainly for the purposes of testing this looks like it'll cover all our bases. @Mark-Simulacrum do you want to update your PR in rust-lang/rust to have a Cargo submodule with this commit, and then make a try build so we can test that out? I think longer-term I'd want to refactor the jobserver management in Cargo here, it feels a bit spaghetti-like and I'd like to make it a bit more concise (if we can) to make it a bit more robust. Nothing concrete I can think of, just a general feeling of "this feels bolted on" which is not at all your fault, just something for me to stew on :) Also, for the vecs, I think they can be relatively trivially converted to hash maps? I'd expect something like: let rustcs_with_a_token: HashMap<u32, Vec<Token>>;
let rustcs_wanting_a_token: HashMap<u32, (usize, Client)>; // `usize` == how many tokens requested |
Yeah, I think the HashMaps would work. I moved between a few different schemes during development and had started out with Vecs. I'll go ahead and get the rustc branch to point at this branch (and probably rebase both, while I'm at it) and kickoff a try build. |
b180ea8
to
e2d8d98
Compare
Okay I've force pushed this and the rust-lang/rust branch and kicked off a try build over there, hopefully that all works out. I didn't look at implementing the HashMap stuff as I'm also a bit strapped for time :) |
☔ The latest upstream changes (presumably #7776) made this pull request unmergeable. Please resolve the merge conflicts. |
e2d8d98
to
c38d324
Compare
CI failure is.. scary "E: Failed to fetch https://packages.microsoft.com/ubuntu/16.04/prod/dists/xenial/main/binary-amd64/Packages Writing more data than expected (622228 > 606754)" But seems unrelated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This overall looks very good to me now, thanks!
On closer reading I think the only "major" thing I'd like to see is a refactoring of the job queue code to have all of its assorted local variables wrapped up in a struct
with a name, methods, etc. I think that'll make edits a bit easier to track and also easier to document things.
src/cargo/core/compiler/mod.rs
Outdated
@@ -1081,8 +1101,9 @@ fn on_stderr_line( | |||
package_id: PackageId, | |||
target: &Target, | |||
options: &mut OutputOptions, | |||
rustc_client: Option<&Client>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it may perhaps be a bit cleaner to instead of passing rustc_client
around and sending it through messages to instead store it elsewhere. The processes spawned by the job queue I think should all have an optional Client
associated with them (stored internally in ProcessBuilder
), and job_queue
could keep track of a map from id to Client
in that case.
@@ -1210,6 +1232,36 @@ fn on_stderr_line_inner( | |||
} | |||
} | |||
|
|||
#[derive(serde::Deserialize)] | |||
struct JobserverNotification { | |||
jobserver_event: Event, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we'll probably want to bikeshed this a bit, but the json messages coming out of rustc aren't exactly the most structured things in terms of precedent anyway.
I'd ideally prefer to not ignore errors if Cargo is sure that event has to do with jobserver events, whereas right now if there's a slight mismatch Cargo will ignore everything coming from rustc. In any case this is probably a concern best left for stabilization.
babeba6
to
5cb8b08
Compare
Okay I've updated the PR, I think I addressed most of your comments. I opted not to implement the rustc_client out-of-band storage -- that looked pretty complicated; it seems like there's a semi-opaque wall around "Work" / "Job" units and it was unclear to me how to thread state through that... maybe I missed something obvious though. Furthermore, we eventually do need to keep track of the number of requests for a given JobId, so while we could e.g. do a usize for that, it seems simpler to just send around clients. Of course if you disagree then happy to try and dig in more :) I've also -- in the process of refactoring the main loop and extracting things out into methods -- tried to add documentation here and there but since I've been fairly embedded into this code for a few hours now I'm not sure where all confusion lies anymore :) I think someone with a semi-fresh set of eyes (you) would be more helpful in basically saying "this could use a comment" -- happy to write them up then. The latest set also shifts around the algorithm for jobserver token assignment under the per-rustc model, where before we were eagerly giving tokens to the first rustc that requested them, which plausibly lead to pretty poor "multi-rustc" parallelism in practice (since the first one would likely gobble up tokens, then the second one would do the same, and so forth, presuming highly parallel and quick to request tokens rustcs). I'm not sure we were observing that behavior but it seemed possibly suboptimal. The new strategy is to dedicate assignment to the main loop, which now proceeds in a strict "least recently requested" fashion among both types of requests. Specifically, we will first allocate tokens to all pending process token requests (i.e., high-level units of work), and then (in per-rustc jobserver mode) follow up by assigning any remaining tokens to rustcs, again in the order that requests came into cargo (first request served first). It's not quite clear that that model is good -- at least for the latter bit, maybe randomly spreading tokens around would be better, or some other heuristic, but on the other hand this means that it is likely that long-running crates will get more tokens than short-running crates, which means that crates like As a fairly orthogonal note I'm also somewhat increasingly unhappy with the jobserver crate's API -- I keep hitting the problem that the "request a token" interface does not provide for cancellation -- and we've also wanted e.g. Acquired::forget in rustc, so I'll probably put some resources there before moving on to fixing up the rustc PR. Generally my plan is to get this landed or in a state ready for landing and then move on to fixing up the rustc PR :) |
Tests here are all going to fail as I missed a minor detail where previously cargo would always immediately spawn up a rustc process ("sharing" the cargo implicit jobserver token with that rustc) but the new implementation would make that pretty painful -- ideally we'd synthesize the token out of nowhere, but the jobserver crate makes that hard and in any case it's not quite what we want. I think to get the correct behavior we'd basically end up with a lot of code trying to shuffle around the implicit token -- loosely, it'd look something like what the current codegen stuff does in rustc I imagine (less complicated due to everything being threadsafe though). I'm going to try and think some more on this, maybe we can discuss synchronously tomorrow too. Don't have any great ideas here :/ |
Oh, also forgot to mention, I updated the PR description with a summary of the PR, though it loosely duplicates what I said already in a comment. |
src/cargo/core/compiler/job_queue.rs
Outdated
/// We should only have tokens here "in-between" loops of the | ||
/// `drain_the_queue` function, i.e., when we've acquired a bunch of tokens | ||
/// and are about to give them to rustcs. | ||
unused_tokens: Vec<Acquired>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh sorry, to clarify, can this be a separate struct which only lives for the one stack frame where we're running stuff?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, as to this field specifically, I don't think we're going to want it. There should never be any long-term storage for unused tokens.
src/cargo/core/compiler/job_queue.rs
Outdated
.len() | ||
.saturating_sub(self.pending_queue.len()); | ||
for _ in 0..tokens_needed { | ||
jobserver_helper.request_token(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This I'm a little worried about since it will request a huge amount more than before. I don't think we want to do these requests on each turn of the loop, we should still strive to match up a request to an acquire 1:1
5cb8b08
to
0d41076
Compare
Okay I dropped the commit which refactored into the unused_tokens and so forth and then restored it a bit. I also moved the state into a new struct which only exists while we're draining the job queue. |
Turns out the crossbeam scope was needed, unlike what the comment indicated... |
src/cargo/core/compiler/job_queue.rs
Outdated
} | ||
} | ||
} else { | ||
self.tick_progress(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the purpose of this loop is to actually not call this in the case that the events are not empty, so was this perhaps a stray refactoring error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Especially in the case where rustc is using this channel for the jobserver protocol, events may not be empty for a long time (i.e., we're constantly trying to wait for threads, releasing tokens, etc.). I can drop this from this PR though and file it separately since it's a drive by change if anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm ok, mind moving this to a separate commit? I think it'd be best to handle this with a sort of heartbeat message if it becomes problematic. Did you see extraneous stuttering you weren't expecting when building though?
f247393
to
eee14c0
Compare
I think that this latest push resolves your comments (dropped I also realize this has a ton of commits now -- do you want me to squash them down? I'll look at the CI failures later. |
This also moves and enhances the message logging the state before blocking.
eee14c0
to
7e3e1b1
Compare
CI is now passing. |
@bors: r+ Very nice, thanks so much! Commits are fine :) |
📌 Commit 7e3e1b1 has been approved by |
Scalable jobserver for rustc This refactors the job queue code for support of [per-rustc process jobservers](rust-lang/rust#67398). Furthermore, it also cleans up the code and refactors the main loop to be more amenable to understanding (splitting into methods and such). Assignment of tokens to either rustc "threads" or processes is dedicated to the main loop, which proceeds in a strict "least recently requested" fashion among both thread and process token requests. Specifically, we will first allocate tokens to all pending process token requests (i.e., high-level units of work), and then (in per-rustc jobserver mode) follow up by assigning any remaining tokens to rustcs, again in the order that requests came into cargo (first request served first). It's not quite clear that that model is good (no modeling or so has been done). On the other hand this strategy should mean that long-running crates will get more thread tokens once we bottom out in terms of rustc parallelism than short-running crates, which means that crates like syn which start early on but finish pretty late should hopefully get more parallelism nicely (without any more complex heuristics). One plausible change that may be worth exploring is making the assignment prefer earlier rustc's, globally, rather than first attempting to spawn new crates and only then increasing parallelism for old crates. syn for example frequently gets compiled in the early storm of dozens of crates so is somewhat unlikely to have parallelism, until fairly late in its compilation. We also currently conflate under this model the rayon threads and codegen threads. Eventually inside rustc those will probably(?) also be just one thing, and the rustc side of this implementation provides no information as to what the token request is for so we can't do better here yet.
☀️ Test successful - checks-azure |
This PR seems to be causing panics with debug builds of Cargo. @Mark-Simulacrum can you check? The repro I have is:
It panics at progress.rs:259 where in this case that function is called with bactraceCompiling bitflags v1.2.1 thread 'main' panicked at 'attempt to subtract with overflow', src/cargo/util/progress.rs:259:21 stack backtrace: 0: backtrace::backtrace::libunwind::trace at /Users/runner/.cargo/registry/src/gh.neting.cc-1ecc6299db9ec823/backtrace-0.3.40/src/backtrace/libunwind.rs:88 1: backtrace::backtrace::trace_unsynchronized at /Users/runner/.cargo/registry/src/gh.neting.cc-1ecc6299db9ec823/backtrace-0.3.40/src/backtrace/mod.rs:66 2: std::sys_common::backtrace::_print_fmt at src/libstd/sys_common/backtrace.rs:77 3: ::fmt at src/libstd/sys_common/backtrace.rs:59 4: core::fmt::write at src/libcore/fmt/mod.rs:1057 5: std::io::Write::write_fmt at src/libstd/io/mod.rs:1426 6: std::sys_common::backtrace::_print at src/libstd/sys_common/backtrace.rs:62 7: std::sys_common::backtrace::print at src/libstd/sys_common/backtrace.rs:49 8: std::panicking::default_hook::{{closure}} at src/libstd/panicking.rs:195 9: std::panicking::default_hook at src/libstd/panicking.rs:215 10: std::panicking::rust_panic_with_hook at src/libstd/panicking.rs:463 11: rust_begin_unwind at src/libstd/panicking.rs:371 12: std::panicking::begin_panic 13: std::panicking::begin_panic 14: cargo::util::progress::Format::progress at src/cargo/util/progress.rs:259 15: cargo::util::progress::State::tick at src/cargo/util/progress.rs:176 16: cargo::util::progress::Progress::tick_now at src/cargo/util/progress.rs:107 17: cargo::core::compiler::job_queue::DrainState::tick_progress at src/cargo/core/compiler/job_queue.rs:714 18: cargo::core::compiler::job_queue::DrainState::wait_for_events at src/cargo/core/compiler/job_queue.rs:600 19: cargo::core::compiler::job_queue::DrainState::drain_the_queue at src/cargo/core/compiler/job_queue.rs:648 20: cargo::core::compiler::job_queue::JobQueue::execute::{{closure}} at src/cargo/core/compiler/job_queue.rs:384 21: crossbeam_utils::thread::scope::{{closure}} at /Users/eric/.cargo/registry/src/gh.neting.cc-1ecc6299db9ec823/crossbeam-utils-0.7.0/src/thread.rs:161 22: as core::ops::function::FnOnce<()>>::call_once at /rustc/85976442558bf2d09cec3aa49c9c9ba86fb15c1f/src/libstd/panic.rs:318 23: std::panicking::try::do_call at /rustc/85976442558bf2d09cec3aa49c9c9ba86fb15c1f/src/libstd/panicking.rs:296 24: __rust_maybe_catch_panic at src/libpanic_unwind/lib.rs:79 25: std::panicking::try at /rustc/85976442558bf2d09cec3aa49c9c9ba86fb15c1f/src/libstd/panicking.rs:272 26: std::panic::catch_unwind at /rustc/85976442558bf2d09cec3aa49c9c9ba86fb15c1f/src/libstd/panic.rs:394 27: crossbeam_utils::thread::scope at /Users/eric/.cargo/registry/src/gh.neting.cc-1ecc6299db9ec823/crossbeam-utils-0.7.0/src/thread.rs:161 28: cargo::core::compiler::job_queue::JobQueue::execute at src/cargo/core/compiler/job_queue.rs:384 29: cargo::util::config::clippy_driver::{{closure}} 30: cargo::ops::cargo_compile::compile_ws at src/cargo/ops/cargo_compile.rs:469 31: cargo::ops::cargo_compile::compile_with_exec at src/cargo/ops/cargo_compile.rs:259 32: cargo::ops::cargo_compile::compile at src/cargo/ops/cargo_compile.rs:248 33: cargo::ops::cargo_test::compile_tests at src/cargo/ops/cargo_test.rs:69 34: cargo::ops::cargo_test::run_tests at src/cargo/ops/cargo_test.rs:21 35: cargo::commands::test::exec at src/bin/cargo/commands/test.rs:162 36: cargo::cli::execute_subcommand at src/bin/cargo/cli.rs:197 37: cargo::cli::main at src/bin/cargo/cli.rs:107 38: cargo::main at src/bin/cargo/main.rs:39 39: std::rt::lang_start::{{closure}} at /rustc/85976442558bf2d09cec3aa49c9c9ba86fb15c1f/src/libstd/rt.rs:67 40: std::rt::lang_start_internal::{{closure}} at src/libstd/rt.rs:52 41: std::panicking::try::do_call at src/libstd/panicking.rs:296 42: __rust_maybe_catch_panic at src/libpanic_unwind/lib.rs:79 43: std::panicking::try at src/libstd/panicking.rs:272 44: std::panic::catch_unwind at src/libstd/panic.rs:394 45: std::rt::lang_start_internal at src/libstd/rt.rs:51 46: std::rt::lang_start at /rustc/85976442558bf2d09cec3aa49c9c9ba86fb15c1f/src/libstd/rt.rs:67 47: cargo::init_git_transports |
Hm, not impossible, will take a look. I guess we're generating too many completed jobs? It'll probably be tomorrow at the earliest though, if necessary please do revert. |
Filed #7829 to resolve this bug. |
Store maximum queue length Previously, the queue length was constantly decreasing as we built crates, which meant that we were incorrectly displaying the progress bar. In debug builds, this even led to panics (due to underflow on subtraction). Not sure if we can add a test case for this. I have made the panic unconditional on release/debug though by explicitly checking that current is less than the maximum for the progress bar. Fixes #7731 (comment).
while let Err(e) = client.acquire_raw() { | ||
anyhow::bail!( | ||
"failed to fully drain {}/{} token from jobserver at startup: {:?}", | ||
i, | ||
tokens, | ||
e, | ||
); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the intent of this loop? Should it maybe be something like this?
client.acquire_raw().chain_err(|| {
format!(
"failed to fully drain {}/{} token from jobserver at startup",
i, tokens
)
})?;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I think so -- it doesn't really make any difference (bail! returns from the method anyway), but I'll send a PR cleaning it up. I suspect I had intended this as "loop until you error" but that's clearly not what this does :)
This refactors the job queue code for support of per-rustc process jobservers. Furthermore, it also cleans up the code and refactors the main loop to be more amenable to understanding (splitting into methods and such).
Assignment of tokens to either rustc "threads" or processes is dedicated to the main loop, which proceeds in a strict "least recently requested" fashion among both thread and process token requests. Specifically, we will first allocate tokens to all pending process token requests (i.e., high-level units of work), and then (in per-rustc jobserver mode) follow up by assigning any remaining tokens to rustcs, again in the order that requests came into cargo (first request served first).
It's not quite clear that that model is good (no modeling or so has been done). On the other hand this strategy should mean that long-running crates will get more thread tokens once we bottom out in terms of rustc parallelism than short-running crates, which means that crates like syn which start early on but finish pretty late should hopefully get more parallelism nicely (without any more complex heuristics).
One plausible change that may be worth exploring is making the assignment prefer earlier rustc's, globally, rather than first attempting to spawn new crates and only then increasing parallelism for old crates. syn for example frequently gets compiled in the early storm of dozens of crates so is somewhat unlikely to have parallelism, until fairly late in its compilation.
We also currently conflate under this model the rayon threads and codegen threads. Eventually inside rustc those will probably(?) also be just one thing, and the rustc side of this implementation provides no information as to what the token request is for so we can't do better here yet.