This repository has been archived by the owner on Feb 21, 2019. It is now read-only.
Paul Bowen-Huggett edited this page Jun 12, 2018 · 8 revisions

Program Repository: What’s the Idea?

Put simply, the project is about storing a build’s intermediate data and metadata in a database. This change offers a number of benefits:

  • The traditional object-file model doesn’t provide a means to discover data that was emitted for other parts of the same overall build. This forces the tools to pessimistically generate data that may also be present in other object-files. For a large system, the quantity of duplicated data can be very significant: up to 99% and tens of megabytes for some data types.
  • Since the compiler can discover what has already been compiled, it can avoid compiling it again. If a user repeatedly re-compiles a source file with small changes on each iteration (during debugging, for example), then the work of re-compiling the functions that weren’t changed can be avoided entirely.
  • The static linker is frequently on the critical path for builds. Build systems are good at parallelizing compilations, but the linker forces a “join” because every compilation must finish before the linker can start. The link time can be a very significant part of the overall build time. It therefore makes sense to move as much work as possible from the linker into the compiler (such as merging string constants) and to design the intermediate format such that the linker’s workload is reduced wherever possible.
  • The database can be used as a common data source for the build. The compiler uses it to record the code and data for each translation unit; the linker’s input comes from it; the debugger can read the debugging metadata directly from it.
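
The deduplication and skip-recompilation ideas in the list above can be illustrated with a toy content-addressed store. This is only a sketch under stated assumptions, not the project's actual design: `ToyRepository`, `compile_function`, and `compile_translation_unit` are hypothetical names, and hashing the function's source text stands in for whatever key the real compiler derives.

```python
import hashlib


class ToyRepository:
    """Hypothetical in-memory stand-in for the program repository."""

    def __init__(self):
        self.fragments = {}    # digest -> "compiled" fragment
        self.compilations = 0  # how many times real work was done

    def compile_function(self, source):
        # Assumption: the key is a hash of the function's source text.
        digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
        if digest not in self.fragments:
            # Only do the expensive work when the store has no entry yet.
            self.compilations += 1
            self.fragments[digest] = "<object code for " + digest[:8] + ">"
        return digest


def compile_translation_unit(repo, functions):
    # A translation unit is recorded as a map from name to fragment digest.
    return {name: repo.compile_function(body) for name, body in functions.items()}


repo = ToyRepository()

# First build: both functions are new, so both are compiled.
first = compile_translation_unit(
    repo, {"f": "int f() { return 1; }", "g": "int g() { return 2; }"}
)
after_first = repo.compilations

# Rebuild after editing only g: f is found in the store and skipped.
second = compile_translation_unit(
    repo, {"f": "int f() { return 1; }", "g": "int g() { return 3; }"}
)
after_second = repo.compilations
```

In this sketch the second build performs one compilation instead of two, and the unchanged function is stored once no matter how many translation units refer to it. A linker reading from such a store would receive fragments that are already deduplicated.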

You can read an early overview of the work.