Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cargo should release Make jobserver token when it's waiting for file lock #6747

Open
glandium opened this issue Mar 14, 2019 · 9 comments
Open
Labels
A-jobserver Area: jobserver, concurrency, parallelism S-triage Status: This issue is waiting on initial triage.

Comments

@glandium
Copy link
Contributor

Consider the following Makefile:

default: first second third

.PHONY: first second
first second:
	+cargo build -p $@ --verbose

third:
	echo start $@
	sleep 1
	echo end $@

And the directory containing the Makefile contains the following Cargo.toml:

[workspace]
members = [
  "first",
  "second",
]

first and second are just created with cargo new, with an additional build.rs each that contains the following:

use std::{thread, time};

fn main() {
  thread::sleep(time::Duration::from_secs(2));
}

Now, run the build with make -j2.

The timeline of what happens looks like the following:

  • 0s: make executes the two cargo commands.
  • 0s: first cargo command invokes rustc for build.rs and runs build_script_build ; second cargo command prints "Blocking waiting for file lock on build directory"
  • 2s: first cargo finishes running build_script_build, builds the rest of the first crate
  • 2s: second cargo command invokes rustc for build.rs and runs build_script_build
  • 2s: make executes echo start third then sleep 2
  • 4s: second cargo finishes running build_script_build, builds the rest of the second crate
  • 4s: make executes echo end third.

Obviously, this is all synthetic, and cargo releasing the make jobserver token as proposed in the summary would not change the outcome in this example, being that the build would still take 4 seconds. But what it would allow is for make to start running the third target earlier, and that could make a hell of a difference.

The real world manifestation is https://bugzilla.mozilla.org/show_bug.cgi?id=1533988 which is caused by the Firefox build system invoking three cargo commands in the same workspace, two of which are "Blocking waiting for file lock on build directory", and the one that does work is actually likely to not be using all the cores available because of how a lot of the dependency chain of the crates built is linear. So the two cargos doing nothing are preventing more C++ compiler processes working while the first cargo works.

Cc: @alexcrichton

alexcrichton added a commit to rust-lang/jobserver-rs that referenced this issue Mar 14, 2019
@alexcrichton
Copy link
Member

Oh wow, a fascinating find!

@glandium would you be able to test out #6748 to see if it fixes the issue you've found here?

alexcrichton added a commit to rust-lang/jobserver-rs that referenced this issue Mar 14, 2019
@glandium
Copy link
Contributor Author

It seems to work, at least with the small testcase from this report and a few other small testcases I derived from it.

@upsuper
Copy link
Contributor

upsuper commented Mar 15, 2019

@glandium pinged me on the original bug, so I tried to test that PR on the exactly same environment (a GCP n1-highcpu-8 instance). In conclusion, that patch helps CPU utilization a lot, although it doesn't seem to help on total build time of Firefox at all (which is a separate issue related to Firefox's build system I guess).

This is the CPU usage graph building Firefox on this instance with an unpatched Cargo:
Screenshot_2019-03-15 Compute Engine - Building Firefox - Google Cloud Platform

This is the CPU usage graph with an patched Cargo:
Screenshot_2019-03-16 Compute Engine - Building Firefox - Google Cloud Platform

As can be seen that with the patched Cargo, CPU utilization is keep being 100% for most of the time, while previously much less.

@alexcrichton
Copy link
Member

Thanks for investigating @glandium and @upsuper, those are definitely some convincing graphs :)

bors added a commit that referenced this issue Mar 15, 2019
Release a jobserver token while locking a file

This is a possible solution to #6747, but we'll ideally get some testing
in before landing!
@bors bors closed this as completed in d19b41f Mar 15, 2019
@glandium
Copy link
Contributor Author

glandium commented Aug 6, 2019

#6748 seems to have made it to 0.37.0, which would correspond to rust 1.36.0, right? Unfortunately, it seems there might still be some related issue. https://taskcluster-artifacts.net/HqtjBjwlSFG7K6E-DQYasQ/0/public/build/build_resources.html

@alexcrichton
Copy link
Member

FWIW this ended up getting reverted in #7201 due to #7200

@glandium
Copy link
Contributor Author

glandium commented Aug 6, 2019

So this should be reopened, right?

@glandium
Copy link
Contributor Author

glandium commented Aug 6, 2019

That said, it's only been reverted this week, so 1.36.0 should have it?

@alexcrichton
Copy link
Member

Sure, can reopen. I don't know how this is actionable though with a solution.

@alexcrichton alexcrichton reopened this Aug 7, 2019
@ehuss ehuss added the A-jobserver Area: jobserver, concurrency, parallelism label Apr 6, 2020
@epage epage added the S-triage Status: This issue is waiting on initial triage. label Oct 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-jobserver Area: jobserver, concurrency, parallelism S-triage Status: This issue is waiting on initial triage.
Projects
None yet
Development

No branches or pull requests

5 participants