Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Experiment: Create timing report. #7311

Merged
merged 15 commits into from
Sep 17, 2019
Merged

Conversation

ehuss
Copy link
Contributor

@ehuss ehuss commented Aug 30, 2019

This is just an experiment, so I'm not sure if we'll want to merge it.

This adds an HTML report which gets saved to disk when the build is finished. It is primarily geared for identifying slow dependencies, and for visualizing how pipelining affects the build.

Here's an example: https://ehuss.github.io/cargo-timing.html
You can mouse over the blocks to highlight the reverse-dependencies that are released when a unit finishes. syn is a really good example.

It does a few other things, like displaying a message after each unit is finished. See the docs for more information.

@rust-highfive
Copy link

r? @nrc

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Aug 30, 2019
Copy link
Member

@alexcrichton alexcrichton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wanted to say first off that this is awesome to play with. I'm looking forward to iterating on this to a point where users can easily know what the bottleneck of their compilation graph is and easily diagnose performance issues.

Some miscellaneous thoughts reading your generated HTML report:

  • For Cargo, and I suspect other projects, this is a huge amount of information to take in in terms of dependencies. I wonder if we could perhaps hide most units by default? For example if things took less than 100ms they don't necessarily need to be shown (although having them optionally shown would still be good). I find the huge waterfall sort of hard to take in because there's lots of noise to sift through to find what you want, and basically either making it default or dynamically tunable to figure out what happened would be great.

  • Could the output of rustc --version be included in the summary?

  • How come only some of the dependency edges showed up when you hover over things? In your example graph, for example, regex-syntax doesn't appear to unlock anything. Ah I guess it's because something else was the last thing to unlock regex-syntax

  • Printing out a "critical path" would be pretty neat for easy diagnosis.

  • Have you checked around to see if there's any sort of open formats for this sort of data? We've had success taking perf output and throwing it into Firefox's flame graphs on perf-html.io for example, and I think timing in the compiler has tried to use Chrome's viewer for rendering this data. Basically I'm curious if we could also massage this into an output that's usable in a viewer maintained by others which may be a bit more powerful for exploring the data after you capture a run. I'm largely just curious if we can ping the compiler team to see if this sort of data can be imported into the Chrome viewer.

  • We should enable rmeta notifications for everything in -Z timings so we can track codegen time for all units, not just those pipelined perhaps? (or maybe document a disclaimer somewhere)

  • It might be good to have a blurb in the unstable documentation about how to read the generated HTML from Cargo

  • Wow libgit2-sys and cc take way too long to compile. I'm going to try to fix that.

cx.bcx
.config
.shell()
.verbose(|c| c.status("Running", &cmd))?;
self.timings.unit_start(id, self.active[&id]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically for the highest fidelity of timing information we should push this unit_start to just before the process is actually spawned on the worker thread itself (using internal synchronization in self.timings). It's probably not worth doing that just yet though since we're already getting what appears to be pretty accurate timing information.

src/doc/src/reference/unstable.md Show resolved Hide resolved
src/doc/src/reference/unstable.md Show resolved Hide resolved
src/cargo/core/compiler/timings.rs Show resolved Hide resolved
src/cargo/core/compiler/timings.rs Show resolved Hide resolved
src/cargo/core/compiler/timings.rs Outdated Show resolved Hide resolved
src/cargo/core/compiler/timings.rs Outdated Show resolved Hide resolved
src/cargo/core/compiler/timings.rs Outdated Show resolved Hide resolved
src/cargo/core/compiler/timings.rs Outdated Show resolved Hide resolved
src/cargo/core/compiler/timings.rs Outdated Show resolved Hide resolved
@bors
Copy link
Contributor

bors commented Sep 3, 2019

☔ The latest upstream changes (presumably #7216) made this pull request unmergeable. Please resolve the merge conflicts.

@alexcrichton
Copy link
Member

Have you checked around to see if there's any sort of open formats for this sort of data?

FWIW I was curious so I was poking around with this today. I was thinking of about:tracing and Chrome's tracing format, but I got some pretty disappointing graphs out of it and it didn't really give me (by default at least) any tools to work with. I wasn't able to really learn much from this output.

I also read over Firefox's format and didn't dive too deep into it but figured it wouldn't work.

All in all I don't think we'll get much use out of trying to use an open format.


One idea I had though was inspired by this post where it'd be ideal if we could flatten this whole graph to see a more linear sequence of events rather than a huge waterfall. We could ideally say what core each process started running on and associate that with rows, so everything would be a bit more compact and you might be able to more visually see idle parts of the build.

If you want to get crazy it'd be pretty sweet to overlay "what's currently building" along with CPU usage over time so we could get an idea of when the CPU is idle, what's building, and what it's bottlenecked on.

To reiterate though there's tons of directions we could take it, and landing anything is better than pontificating about how we could improve it, so I'd lean on the side of landing what we have vs wondering how to implement more fancy features.

@ehuss
Copy link
Contributor Author

ehuss commented Sep 9, 2019

How come only some of the dependency edges showed up when you hover over things?

Right, for the regex-syntax case, regex was still waiting on aho-corasick. I didn't want to show all edges because the number of lines would be overwhelming. It's already a bit of a mess.

Printing out a "critical path" would be pretty neat for easy diagnosis.

What does this mean? Can you give an example of what it would say? It seems like there are many, intertwined critical paths, so I'm not sure what it would mean. Hide the units without edges?

Also, some of the critical paths are just a happenstance of timing. If cargo randomly picks a different unit from the waiting set, the critical path may end up looking very different.

Have you checked around to see if there's any sort of open formats for this sort of data?

I looked around a little, and I didn't see anything that would be usable. If anyone has any ideas, I'd like to try them. I also considered using one of the javascript graphing libraries, but most of them didn't fit, and I wanted to avoid heavy dependencies. One thought I had was to add the JSON output and then allow other tools to process and display it. I could also move the actual HTML part of this PR to a separate repo, but I figured its utility would be severely reduced.

flatten this whole graph

My original version was exactly this (it had one row per cpu). It had some drawbacks. You can't visualize how the dependency edges work (particularly for pipelining). It also ends up just being a solid block of boxes until the end when there is 1 or 2 left. I had mouseover tooltips to help show more information, but it just didn't seem that useful.

The alternative I intended was the graph at the bottom tells you whether or not it is maximizing the concurrency (via the green line). But it is a bit difficult to visualize. I also had a mouseover tooltip on the graph line to show which units were currently active, but I didn't finish it and took it out to get earlier feedback on whether this is generally useful.

I'm happy to continue discussing ideas for what to display and more interactive features. I just wanted to see if this is something we'd like to add at all. I'd like to focus on actionable information — that is what decisions and changes would a user make after reading this information. Seeing the graphs can be "nifty", but if it doesn't help make changes, there isn't much value. And the user does not have very many options (removing dependencies, splitting libraries into smaller pieces, removing features, etc.).

@alexcrichton
Copy link
Member

I just wanted to see if this is something we'd like to add at all. I'd like to focus on actionable information — that is what decisions and changes would a user make after reading this information.

I personally very much agree with this, and I should probably frame my comments moreso in how I think that this sort of information will be interpreted. When I personally analyze graphs like this I'm looking for a few things:

  • Where is there idle CPU parallelism to be taken advantage of? For example is a dependency compiled very late in the dependency graph when it actually should be built much earlier? For example im-rc is built super later in Cargo's graph, but that's only because its build script depends on rustc_version which depends on serde which has its own big dependency tree. By removing the dependency on rustc_version we could shift most of this bottlenecking build time from the end of the build to idle CPU in the middle.

    This is what I was getting at with a more flat view where you might be able to more visually see what's happening for parallelism during the build. We ideally want "entirely full until the end" but most projects don't end up with that sort of graph right now.

  • What dependencies take a particularly long time in rustc? These are good candidates for splitting up, trying to move more to monomorphization, etc. In general long-compiling crates should have as few dependencies as possible and scheduled as early in the build as possible. Or in other words, a build should just minimize the number of "big crates" it has.

  • Why did it take so long to get my final crate to start compiling? What was the blocker to getting it running, aka what dependency has the chain that took the longest to compile. This is sort of what I was getting at with the critical path. I definitely agree that the critical path isn't the same between builds and is pretty nondeterministic, but I suspect many crate graphs have a pretty deterministic critical path, and raising awareness about that seems pretty useful. For example Cargo's critical path largely included cc-rs -> libgit2-sys -> git2 -> git2-curl -> cargo, and that's because libgit2-sys took so darned long to compile since it wasn't parallel.

I think that the current implementation you've got here is good enough for most of these metrics above. The reason I was curious about open source tools is that exploring the current graph is pretty difficult. For example my browser (Firefox, maybe it's buggy?) gets super janky when I'm scrolling around the giant SVG. Additionally it's difficult to see what unlocked a dependency (the line drawn into it) as well as what it unlocked because the units it connects to may be either above or below the fold. These aren't really things I expect us to fix, but it'd be cool if there was an open viewer for this that already solved these problems, but I wasn't quite able to find one!

FWIW I think it's also worth pointing out that the green line is an approximation of parallelism but it doesn't take into account internal parallelism in rustc (because it can't, really). That may not matter a huge amount for most projects though.

Overall I'd be fine merging this basically as-is with a few small improvements in the inline comments above. It's pretty low-risk in terms of maintenance and if we ever want to remove it it's not that hard to remove.

@bors
Copy link
Contributor

bors commented Sep 11, 2019

☔ The latest upstream changes (presumably #7351) made this pull request unmergeable. Please resolve the merge conflicts.

@ehuss
Copy link
Contributor Author

ehuss commented Sep 17, 2019

I switched to using canvas which should be faster. Let me know if it has any problems. It also allows interactive controls, and I added a few as examples.

Copy link
Member

@alexcrichton alexcrichton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few stylistic nits, but otherwise I think this is basically ready to merge.

One thing I'd be curious to do is to overlay the "# Units" graph with CPU usage of the host system during that time (graphs like this) so we could get an idea of what the CPU usage is as well, knowing that if the number of active units doesn't shoot up you might still actually be using all the parallelism you've got locally (or not, depends). I wouldn't mind trying to whip that up after this lands though

src/cargo/core/compiler/job_queue.rs Outdated Show resolved Hide resolved
src/cargo/core/compiler/timings.rs Outdated Show resolved Hide resolved
src/cargo/core/compiler/timings.rs Outdated Show resolved Hide resolved
src/cargo/core/compiler/mod.rs Outdated Show resolved Hide resolved
src/doc/src/reference/unstable.md Show resolved Hide resolved
@alexcrichton
Copy link
Member

Also I can confirm the canvas is indeed speedy, and the touch to color build script executions a different colors was nice!

@ehuss
Copy link
Contributor Author

ehuss commented Sep 17, 2019

Updated with review comments. Thanks!

CPU usage of the host system

I'd love to see that. I was reluctant to add a dependency for this, since I suspect getting it to compile and work on all host platforms will be tricky.

I also experimented with changing jobserver to report what it thinks the current concurrency is, which could be more useful with parallel-rustc. That's something we could also consider for later.

@alexcrichton
Copy link
Member

@bors: r+

@bors
Copy link
Contributor

bors commented Sep 17, 2019

📌 Commit 8be10f7 has been approved by alexcrichton

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 17, 2019
bors added a commit that referenced this pull request Sep 17, 2019
Experiment: Create timing report.

This is just an experiment, so I'm not sure if we'll want to merge it.

This adds an HTML report which gets saved to disk when the build is finished.  It is primarily geared for identifying slow dependencies, and for visualizing how pipelining affects the build.

Here's an example: https://ehuss.github.io/cargo-timing.html
You can mouse over the blocks to highlight the reverse-dependencies that are released when a unit finishes.  `syn` is a really good example.

It does a few other things, like displaying a message after each unit is finished.  See the docs for more information.
@bors
Copy link
Contributor

bors commented Sep 17, 2019

⌛ Testing commit 8be10f7 with merge d764fff...

@bors
Copy link
Contributor

bors commented Sep 17, 2019

☀️ Test successful - checks-azure
Approved by: alexcrichton
Pushing d764fff to master...

@bors bors merged commit 8be10f7 into rust-lang:master Sep 17, 2019
@csmoe
Copy link
Member

csmoe commented Sep 21, 2019

@ehuss the js render seems not happy with a huge project(~550 deps)
image

@est31
Copy link
Member

est31 commented Sep 21, 2019

@csmoe this sounds similar to #7388 and #7399 . Can you check whether it works with #7397 applied?

@csmoe
Copy link
Member

csmoe commented Sep 21, 2019

@est31 thank you, rebuilding :)

@luser
Copy link
Contributor

luser commented Oct 23, 2019

FYI, I know I had mentioned this before but I have an old branch laying around where I hacked cargo to output profiling info in Chrome's tracing format:
luser@a241a1f

There are a zillion formats for profiling/tracing info like this unfortunately. The nice thing about supporting someone else's format is that you can use other tools to explore it like Chrome's chrome://tracing, Firefox's profiler.firefox.com, or SpeedScope. Speedscope's docs mention a whole pile of formats it supports. That list includes flamescope which is a Rust crate that can output Speedscope's preferred format, so it ought to be possible to refactor that crate such that you could use it in cargo to generate that output.

@ehuss ehuss added this to the 1.39.0 milestone Feb 6, 2022
@ehuss ehuss mentioned this pull request Jul 13, 2022
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants