cmd/go: critical path scheduling #8893
@dvyukov are you planning to migrate that CL to Gerrit and mail it? Does it speed up make.bash as well? |
Is anybody planning to review it to +2 state? |
I promise to review it thoroughly (although I'll be in and out until late March). And rsc asked for it to be mailed. If you don't want to bother, do you mind if I take it over? The existing CL looks possibly incomplete:

```go
if a.priority < a1.priority+1 {
}
```
|
Take it over. That `if` should be:

```go
if a.priority < a1.priority+a.weight {
```

Also, weight must be more sensible, that is, represent the real cost of the action. Something along the lines of: ideally, 'run test' weights represent actual execution time, so that the runtime test starts first rather than starting late and delaying all.bash. This may be possible in a 'go test' continuous mode that monitors filesystem changes. But that's of course not a part of this change. |
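To make the fix concrete, here is a minimal sketch of how critical-path priorities could be propagated over the action graph. Only the `if` condition comes from the discussion above; the types, field names, and traversal are assumptions, not the actual CL:

```go
// action is a node in the build graph (illustrative, not cmd/go's type).
type action struct {
	deps     []*action // prerequisites: actions that must run before this one
	priority int       // critical-path weight from this action up to the root
	weight   int       // estimated cost of executing this action itself
}

// computePriorities walks from the root(s) down through prerequisites,
// giving each action the total weight of the heaviest chain that
// depends on it. Ready actions with the highest priority run first,
// which keeps the critical path moving.
func computePriorities(roots []*action) {
	var walk func(a1 *action)
	walk = func(a1 *action) {
		for _, a := range a1.deps {
			// The corrected condition: add the dependency's own
			// weight, not a constant 1.
			if a.priority < a1.priority+a.weight {
				a.priority = a1.priority + a.weight
				walk(a)
			}
		}
	}
	for _, root := range roots {
		root.priority = root.weight
		walk(root)
	}
}
```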
And compiling anything involving cgo should be weighted even more heavily than compiling any other package.
If we reorder tests, do we need to preserve the order in which the results are shown? I think we tried hard to show sorted results for "go test -short std".
|
Good idea.
Absolutely. |
I still plan to take over this CL. I am just waiting for the compiler performance to stabilize a bit, so that I can get more realistic measurements of the relative speeds of different kinds of actions. |
DO NOT REVIEW

This is a straight port of Dmitry's CL 13235046.

TODO:
* Understand better how it works.
* Assign better weights.
* Convince myself that it is an improvement. Early measurements are not yet compelling, although that could be because the weights are not well-chosen.

Notes on weights from the issue:
* link > compile
* run test > link
* fake instant actions << everything else
* cgo increases weights

Fixes golang#8893.

Change-Id: I8fba875a9a56aed132bd6b37b607e374f336728d
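The mechanical heart of the change is replacing the FIFO queue of ready actions with a priority queue ordered by the priorities computed above. A minimal sketch using container/heap and the `action` type from the earlier sketch; the names are illustrative, not the actual CL 8918 code:

```go
// readyQueue implements heap.Interface: a max-heap keyed on priority,
// so workers always pick the ready action with the heaviest remaining
// dependent chain.
type readyQueue []*action

func (q readyQueue) Len() int           { return len(q) }
func (q readyQueue) Less(i, j int) bool { return q[i].priority > q[j].priority }
func (q readyQueue) Swap(i, j int)      { q[i], q[j] = q[j], q[i] }

func (q *readyQueue) Push(x interface{}) { *q = append(*q, x.(*action)) }
func (q *readyQueue) Pop() interface{} {
	old := *q
	n := len(old)
	a := old[n-1]
	*q = old[:n-1]
	return a
}

// Usage: heap.Push(&ready, a) once all of a's deps are done;
// heap.Pop(&ready).(*action) when a worker goroutine frees up.
```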
This is pretty high priority for our |
I'll put this back in my queue (next step is attaching some more realistic weights), but I'm unlikely to make much progress this week. In the meantime, if you pull in https://go-review.googlesource.com/#/c/8918 as-is, does it help? It didn't show much improvement for me locally. |
@bradfitz any numbers on that CL on a big machine?

@dvyukov @minux I spent a bit of time looking at tool exec times for 6g, 6l, and asm. asm is very fast and linking is very slow, but 6g has a huge amount of variability--two orders of magnitude. The variability goes down a tiny bit if you look at the number of files to compile, but not much. It is down to one order of magnitude if you look at the number of bytes to be compiled, but I don't think we want to stat every file before getting started. Any other ideas for things to look at?

As is, I think the best we can do for weights is very rough estimates, like fake=1, asm=10, gc=100, link=1000, cgo=1000. Any other ideas? |
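As a sketch, those rough estimates could be expressed as a simple weight table keyed on action kind. The constants are the guesses from the comment above, not measurements:

```go
// Order-of-magnitude action weights, per the estimates above.
const (
	weightFake = 1    // bookkeeping/no-op actions: effectively free
	weightAsm  = 10   // assembling is very fast
	weightGC   = 100  // compiling Go code; highly variable in practice
	weightLink = 1000 // linking is very slow
	weightCgo  = 1000 // cgo pulls in the C toolchain
)
```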
We already do stat every file before getting started (to compute staleness). So maybe that information can be saved somewhere? |
Number of files looks like a good first approximation. |
We probably also open the files, to read build tags. |
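If per-package file counts are already in hand, a hypothetical refinement would scale the compile weight by the number of source files. The function and constants below are illustrative guesses, not anything in cmd/go:

```go
// compileWeight estimates the cost of compiling a package from its
// file count, which cmd/go already knows (it stats files for
// staleness and opens them for build tags anyway).
func compileWeight(numGoFiles int) int {
	const base = 50    // fixed process startup and import-loading cost
	const perFile = 25 // rough incremental cost per source file
	return base + perFile*numGoFiles
}
```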
Sorry, I haven't had any free time to test this. |
No prob. I'm in the middle of running some tests now, actually, and it looks better than before. Expect a minimal CL and more comments here later today. |
My earlier optimism was the result of a bug.

I have 8 logical (4 real) CPUs on my laptop. It spent the afternoon running three build tests in a loop:

I then hacked in the best available predictor: store previous action times on disk. Even with almost perfect weights, I got pretty much the same results.

Furthermore, this re-ordering hurts the experience of running all.bash, as there are long stalls between test outputs, which then come in spurts.

Maybe this helps with lots more cores than I have--Brad, I guess I would like your help confirming that after all. As it is, though, I don't feel comfortable mailing the CL, since I have no way to demonstrate its value. |
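For reference, here is a sketch of the "best available predictor" experiment described above: persist each action's measured duration and use it as the weight next time. The package name, key, file format, and location are all hypothetical:

```go
package buildtimes

import (
	"encoding/json"
	"os"
	"time"
)

// timingCache maps a stable action key (e.g. kind + import path) to
// the duration the action took on the previous build.
type timingCache map[string]time.Duration

func loadTimings(path string) timingCache {
	c := timingCache{}
	if data, err := os.ReadFile(path); err == nil {
		// Best effort: a missing or corrupt cache just means
		// falling back to default weights.
		json.Unmarshal(data, &c)
	}
	return c
}

func (c timingCache) save(path string) error {
	data, err := json.Marshal(c)
	if err != nil {
		return err
	}
	return os.WriteFile(path, data, 0o644)
}

// record stores an action's measured duration after it completes;
// weight reads it back to seed the scheduler on the next run.
func (c timingCache) record(key string, d time.Duration) { c[key] = d }
func (c timingCache) weight(key string) (time.Duration, bool) {
	d, ok := c[key]
	return d, ok
}
```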
What has changed? Dmitry's initial post reported up to a 2x improvement on go test -a std. Is it because the compiler is now written in Go and is slower?
|
Dmitry's initial post said 2x speed-up on a 32-core machine. Maybe my machine (which is probably typical) doesn't have enough cores to benefit. |
Where is the CL? I can test on a 16-way machine.
|
CL 8918. It is a straight port of Dmitry's original CL. |
My initial observations:

The time taken for `go test -a std` is not a good performance measure for this change, because some packages (e.g. math/big, runtime) take a very long time to test, so the serial portion of the runtime is significantly skewed by those packages.

Comparing the time of go install -a std cmd on the 16-way machine showed no significant impact on run time (~28s, CPU usage ~700%).

Comparing the time of go install -a std cmd golang.org/x/tools/cmd/godoc on the same machine still doesn't show much impact (~34s, CPU usage ~600%).

Critical path scheduling will only show a benefit when there are many floating tasks with positive slack. Perhaps the dependency graph has changed a lot after the Go rewrite of the toolchain? Or perhaps there are unneeded dependency edges in the graph?
|
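To make the "floating tasks with positive slack" point concrete, here is a toy two-worker schedule (hypothetical numbers) where ordering alone costs 1.5x:

```
Two workers; one chain A -> B -> C (10s each); six independent
tasks t1..t6 (5s each). Total work = 60s, so 30s is the floor.

Critical-path order (start the chain immediately):
  worker 1: A(10) B(10) C(10)      -> done at 30s
  worker 2: t1..t6 (6 x 5s)        -> done at 30s

Naive order (chain scheduled last):
  both workers: t1..t6 first (15s each), then A, B, C serially
  -> chain finishes at 15 + 30 = 45s, a 1.5x slowdown
```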
I've done some testing using
After:
There is a clear 10% win.
After:
There is a clear difference towards the end of the build -- the old code leads to lower parallelism because earlier actions were scheduled badly. Also, the win depends highly on the structure of the work graph. It is possible to come up with a dependency structure that will be handled arbitrarily inefficiently by the old code, so some users can see a higher speedup than we see on the std lib. Moreover, scheduling currently depends on package names (or how does it order them now?), which is just weird: you can increase or reduce build speed by renaming packages. Also, better scheduling will have more impact when we make actions finer-grained -- in particular, splitting cgo compilation and maybe asm compilation. So I am sure that this is a positive change and we should do it. The additional complexity is minimal and very localized. |
I don't know why I can't replicate that speed-up. Maybe you should take the CL back over, with my apologies. It doesn't make sense for me to continue to work on it when I cannot observe its impact (and thus can't tell whether my changes hurt or help).
Agreed. It is very coarse right now. And having thought about it more, simply saving the execution time of past identical actions might not be a bad long-term strategy for weight estimation.
asm compilation is very fast. At least in the stdlib, there's very little cgo, so I don't think this will impact all.bash much, but it could make a big difference for heavy cgo users. |
@josharian have you tried running exactly "time go build -a -p 4 std"? What does the following debug output show on your machine?
|
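The debug output referred to here comes from a small patch that prints the number of running jobs every 500ms. A standalone sketch of that kind of instrumentation, with hypothetical names (not dvyukov's actual patch):

```go
package buildtrace

import (
	"fmt"
	"sync/atomic"
	"time"
)

var running int32 // number of actions currently executing

// runAction wraps an action's execution so the sampler can see it.
func runAction(do func()) {
	atomic.AddInt32(&running, 1)
	defer atomic.AddInt32(&running, -1)
	do()
}

// startSampler prints the current parallelism every 500ms until done
// is closed, giving a rough utilization trace of the build.
func startSampler(done <-chan struct{}) {
	go func() {
		ticker := time.NewTicker(500 * time.Millisecond)
		defer ticker.Stop()
		for {
			select {
			case <-done:
				return
			case t := <-ticker.C:
				fmt.Printf("%s: %d jobs running\n",
					t.Format("15:04:05.000"), atomic.LoadInt32(&running))
			}
		}
	}()
}
```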
This is a port of part of CL 13235046, by dvyukov.

Use the critical path method to schedule work.

On my laptop (8 logical cores), this CL reduces the wall time for some but not all large commands:

go build -a std      old=19.0s   new=13.2s   delta=-30.7%  n=50  p=2e-119
go test -short std   old=65.3s   new=54.4s   delta=-16.8%  n=25  p=1e-45
all.bash             old=318.4s  new=317.9s  delta=-0.16%  n=10  p=0.73

This initial implementation assigns equal weight to all actions. This is a significant oversimplification. Making actions more granular and assigning more accurate weights is future work. However, this should be enough to improve the CPU utilization of the compile-all builder.

Note that this causes noticeable stalls in output when building or testing many packages, since the scheduling usually does not correspond to the planned output order. This should improve as scheduling improves.

Updates golang#8893.

Change-Id: I8fba875a9a56aed132bd6b37b607e374f336728d
Too late for Go 1.5. |
I'm spending what time I have on SSA. Hopefully someone else wants to pick this back up. |
My WIP CL is 8918. |
I fooled around with josharian's CL during a slow evening. I got nowhere, but I'll report my findings, just to add another datapoint. This is on a 4-core Intel machine. First of all, I applied dvyukov's patch (the one that prints the number of currently executing jobs every 500ms), and ran it.
Before:
After:
The parallelism of the last part of the job is indeed higher. Unfortunately, like minux and josharian, I wasn't able to observe any significant speedup. For |
Thanks for the null results. They're helpful. Deprioritizing. |