
cmd/go: build dependent packages as soon as export data is ready #15734

Closed
josharian opened this issue May 18, 2016 · 9 comments

Comments

@josharian
Contributor

This is a trace of the activity on an 8 core machine running 'go build -a std':

[image: trace_build_std]

For those who want to explore more, here is an HTML version. (Hint: use the a, d, w, and s keys to navigate.)

There are a few early bottlenecks (runtime, reflect, fmt) and a long near linear section at the end (net, crypto/x509, crypto/tls, net/http). Critical path scheduling (#8893) could help some with this, as could scheduling cgo invocations earlier (#15681). This issue is to discuss another proposal that complements those.

We currently wait until a package is finished building before building packages that depend on it. However, dependent packages only need export data, not machine code, to start building. I believe that could be available once we're done with escape analysis and closure transformation, and before we run walk.

For the bottlenecks listed above:

| package | time until export data available | total compilation time |
| --- | --- | --- |
| runtime | 226ms | 1300ms |
| reflect | 174ms | 960ms |
| fmt | 33ms | 229ms |
| net | 114ms | 846ms |
| crypto/x509 | 66ms | 253ms |
| crypto/tls | 82ms | 461ms |
| net/http | 168ms | 1310ms |

Though these numbers are slightly optimistic (writing export data isn't instantaneous), they suggest that building dependents as soon as export data is available would, in general, significantly reduce time spent waiting for dependencies to compile.

This pattern of large, slow, linear dependency chains also shows up in bigger projects, like juju.

@rsc implemented one enabling piece by adding a flag to emit export data separately from machine code.

Remaining work to implement this, and open questions:

  • Emitting export data before walk means that inlined functions would get walked and expanded at the point of use rather than at initial package compilation. Does this matter? If so, an alternative is to restructure the compiler to walk all functions and then compile all functions. Would this increase the high-water memory mark?
  • How would the compiler signal to cmd/go that it is done emitting export data? I don't know of a clean, simple, portable cross-process semaphore.
  • This would be a pretty major upheaval in how cmd/go schedules builds. Making this work more fine-grained would be useful anyway, but it'd be a lot of high-risk change.

Given the scope of the change, I'm marking this as a proposal. I'd love feedback.

@bradfitz
Contributor

Love it. Also glad that traceview ended up working out.

> How would the compiler signal to cmd/go that it is done emitting export data? I don't know of a clean, simple, portable cross-process semaphore.

Localhost TCP? Later: multi-machine TCP, and making cmd/compile and cmd/link use a VFS (and network implementations) rather than the os package for file access. Imagine running a large build on a cloud machine, with cmd/go spinning up some helper Kubernetes (or whatever) containers to speed the build, then tearing them down when the build is done, paying only for the seconds they were running.

@mdempsky
Contributor

> How would the compiler signal to cmd/go that it is done emitting export data?

A relatively simple (but local-only) way would be:

  1. Each time cmd/go execs cmd/compile, it creates a new os.Pipe and passes the write end as one of cmd/compile's ExtraFiles, along with a command-line flag like -signalexportdata.
  2. In cmd/compile, if we see -signalexportdata, we close the pipe FD after writing out export data to disk.
  3. Back in cmd/go, when we see the pipe has been closed, we can assume export data is written out.

I also suspect you don't actually need separate export data file functionality for this. Since the export data is at the beginning of the .a file, we can just write a partial .a file, signal cmd/go, and then finish writing later.

@griesemer
Contributor

Interesting. I suspect also that the time from start of compilation until the time the export data becomes available will become smaller over time (faster frontend), while the backend may become slower (relatively), due to more powerful optimizations. Seems like a good idea to me.

@mwhudson
Contributor

Only vaguely connected random idea: If the export data hasn't changed, can you skip the compilation of dependent packages?


@mdempsky
Contributor

@mwhudson I think so, but that could be done even without this proposal.

@josharian
Contributor Author

This proposal also got discussed at #15736. Based on that discussion, I will re-evaluate this proposal once cmd/compile itself is more concurrent.

@mwhudson moved your suggestion to #15752.

@josharian josharian self-assigned this May 19, 2016
@bradfitz bradfitz changed the title proposal: build dependent packages as soon as export data is ready cmd/go: build dependent packages as soon as export data is ready Aug 22, 2016
@bradfitz bradfitz modified the milestones: Unplanned, Proposal Aug 22, 2016
@josharian
Contributor Author

@rsc is it safe to assume that this will be more or less independent of your 1.10 cmd/go work?

@rsc
Contributor

rsc commented Oct 25, 2017

I'm very skeptical this is worth the complexity. It would require support for "half-completed" actions in the go command where the compile step half-completes early and then fully-completes later. I don't believe the payoff here would be worth the significant increase in complexity. I guess you could run two compiles, so that you generate the export data first and then the whole object second, but that just does more work overall. Even if it improves latency in certain cases, more work overall is a net loss.

Critical path scheduling or just working on making the compiler faster seems like a better use of time.

@rsc
Contributor

rsc commented Apr 30, 2019

Especially with good caching I think this is less and less important, and no less complex to implement. Closing, to better reflect our intention not to do this.

@rsc rsc closed this as completed Apr 30, 2019
@golang golang locked and limited conversation to collaborators Apr 29, 2020