Take command line arguments into account when doing incremental compilation #33727

michaelwoerister · 2016-05-18T22:15:16Z

Since command line arguments to the compiler influence the code generated, changing them should also purge affected artifacts from the cache. It might be a good idea to treat them just like HIR or Metadata nodes and thus make them part of the regular dep-graph.

cfg settings are a special case. Since they will affect the HIR being generated, they need not be considered for incr. comp. However, it might make sense to have separate caches for different sets of cfg flags, iff we want to support these being changed frequently without having to recompile a lot. Although that might be a negligible use case.

A first step to implementing this is probably taking a survey of all command line options and seeing what they might affect.

The text was updated successfully, but these errors were encountered:

michaelwoerister · 2016-05-18T22:15:30Z

cc @nikomatsakis

jonas-schievink · 2016-06-02T21:17:33Z

I've taken an initial survey of all command line arguments for rustc. Feel free to correct any mistakes, as I'm sure there will be a few.

Options marked with ✔️ affect incr. comp.: When those options are added/removed between runs, everything should be recompiled.
Options marked with ✖️ have no effect on the existing incr. comp. infrastructure (eg. they only change HIR, or only affect linking).
Options marked as 💥 produce output depending on the whole crate (similar to lints). Their output can either be cached and replayed between runs (when their output is known to only change depending on a subset of the whole crate -- like an item (for lints) or the resulting codegen unit), or they require disabling incr. comp. as soon as they're used.

Option	Description	Affects incr. comp.
`--cfg SPEC`	Configure the compilation environment	✖️ (see above)
`-L [KIND=]PATH`	Add a directory to the library search path	❓ (affects which crates we'll link against, not sure what else)
`-l [KIND=]NAME`	Link the generated crate(s) to the specified native library NAME.	✖️ Only affects linking
`--crate-type [bin	lib	...]`
`--crate-name NAME`	Specify the name of the crate being built	❓ Technically it shouldn't, but the crate name is part of symbol names, so everything will be rebuilt IIUC
`-o FILENAME`	Write output to	✖️ Should just be a matter of executing the linker again
`--out-dir DIR`	Write output to compiler-chosen filename in	✖️ Same as above
`--explain OPT`	Provide a detailed explanation of an error message	✖️
`--test`	Build a test harness	✖️ Should only affect HIR
`--target TARGET`	Target triple for which the code is compiled	✔️ Every target should get its own incr. comp. directory, really.
Any lint options		💥 The resulting warnings can change, and with `--deny <lint>` compilation might even start to fail
`-V --version`	Print version info and exit	✖️
`-v --verbose`	Use verbose output	✖️ Doesn't use anything interesting
`--extern NAME=PATH`	Specify where an external rust library is located	Should behave like `-L`
`--sysroot PATH`	Override the system root	Should behave like `-L`
`--color auto	always	never`
`-Car=val`	tool to assemble archives with	✖️ Only used to build the final artifact
`-Clinker=val`	system linker to link outputs with	✖️ Like `-Car`
`-Clink-args=val`	extra arguments to pass to the linker	✖️
`-Clink-dead-code`	don't let linker strip dead code	✖️
`-Clto`	perform LLVM link-time optimizations	✖️
`-Ctarget-cpu=val`	select target processor	✔️ Affects all codegen, similar to `--target`
`-Ctarget-feature=val`	target specific attributes	✔️ Ditto
`-Cpasses=val`	a list of extra LLVM passes to run	✔️ Since we're not saving unoptimized LLVM IR, we need to rebuild everything
`-Cllvm-args=val`	a list of arguments to pass to llvm	✔️ Superset of `-Cpasses` (I think)
`-Csave-temps`	save all temporary output files during compilation	✖️ I think when we're saving temps, we'll do so for each CGU, which incr. comp. can handle
`-Crpath`	set rpath values in libs/exes	✖️ I think this is just a linker arg
`-Cno-prepopulate-passes`	don't pre-populate the pass manager with a list of passes	✔️ Can change all LLVM output
`-Cno-vectorize-loops`	don't run the loop vectorization optimization passes	✔️ Can change all LLVM output
`-Cno-vectorize-slp`	don't run LLVM's SLP vectorization pass	✔️ Ditto
`-Csoft-float`	generate software floating point library calls	✔️ Ditto
`-Cprefer-dynamic`	prefer dynamic linking to static linking	✖️ Should just be a different linker invocation
`-Cno-integrated-as`	use an external assembler rather than LLVM's integrated one	✔️ Requires rebuilding everything with the external assembler
`-Cno-redzone=val`	disable the use of the redzone	✔️ Global codegen change
`-Crelocation-model=val`	choose the relocation model to use	✔️ Probably also a global codegen change
`-Ccode-model=val`	choose the code model to use	✔️ Not too sure, but sounds like a global codegen change
`-Cmetadata=val`	metadata to mangle symbol names with	✖️ Used for SVH so we'll recompile automatically
`-Cextra-filename=val`	extra data to put in each output filename	✖️ Only relevant for the final artifact
`-Ccodegen-units=val`	divide crate into N units to optimize in parallel	✖️ Incr. comp. will divide into many units anyways, whether they're compiled in parallel shouldn't affect the output
`-Cremark=val`	print remarks for these optimization passes	✔️ We should probably recompile everything when this is passed
`-Cno-stack-check`	disable checks for stack exhaustion	✔️ Global codegen change (but does this even do anything today?)
`-Cdebuginfo=val`	debug info emission level	✔️ Global codegen change
`-Copt-level=val`	optimize with possible levels 0-3	✔️ Global codegen change
`-Cdebug-assertions=val`	explicitly enable the cfg(debug_assertions) directive	✖️ Changes the HIR we work with, so this is already handled
`-Cinline-threshold=val`	set the inlining threshold for	✔️ Global codegen/optimization change
`-Cpanic=val`	panic strategy to compile crate with	✔️ Changes whether we emit landing pads
`-Zverbose`	in general, enable more debug printouts	✖️ Like `-v`
`-Ztime-passes`	measure time of each rustc pass	✖️ Benchmark option, meant to change with incr. comp.
`-Zcount-llvm-insns`	count where LLVM instrs originate	✖️ Ditto
`-Ztime-llvm-passes`	measure time of each LLVM pass	✖️ Ditto
`-Zinput-stats`	gather statistics about the input	✖️ Runs before and shortly after expansion
`-Ztrans-stats`	gather trans statistics	✖️
`-Zasm-comments`	generate comments into the assembly	✔️ Changes the whole output
`-Zno-verify`	skip LLVM verification	✔️
`-Zborrowck-stats`	gather borrowck statistics	💥
`-Zno-landing-pads`	omit landing pads for unwinding	✔️ Superseded by and similar to `-Cpanic`
`-Zdebug-llvm`	enable debug output from LLVM	💥
`-Zcount-type-sizes`	count the sizes of aggregate types	💥
`-Zmeta-stats`	gather metadata statistics	💥 (?)
`-Zprint-link-args`	print the arguments passed to the linker	✖️
`-Zgc`	garbage collect shared data (experimental)	❓ no clue what this does
`-Zprint-llvm-passes`	prints the llvm optimization passes being run	❓
`-Zast-json`	print the AST as JSON and halt	✖️ Exits before incr. comp.
`-Zast-json-noexpand`	print the pre-expansion AST as JSON and halt	✖️ Ditto
`-Zls`	list the symbols defined by a library crate	❓
`-Zsave-analysis`	write syntax and type analysis (in JSON format) information in addition to normal output	❓ Not handled by current incr. comp. design
`-Zsave-analysis-csv`	write syntax and type analysis (in CSV format) information in addition to normal output	Like above
`-Zprint-move-fragments`	print out move-fragment data for every fn	❓ Are move fragments calculated incrementally?
`-Zflowgraph-*`	...	❓ Is the flowgraph constructed incrementally?
`-Zprint-region-graph`	prints region inference graph.	❓ Like above
`-Zparse-only`	parse only; do not compile, assemble, or link	✖️
`-Zno-trans`	run all passes except translation; no output	✖️
`-Ztreat-err-as-bug`	treat all errors that occur as bugs	✖️ Errors stop compilation one way or the other
`-Zcontinue-parse-after-error`	attempt to recover from parse errors	✖️
`-Zincremental=val`	enable incremental compilation	uuuh...
`-Zdump-dep-graph`	dump the dependency graph	✖️ We always build the dep graph for incr. comp.
`-Zquery-dep-graph`	enable queries of the dependency graph for regression testing	✖️ This causes rustc to build the dep graph, but we're doing that anyway for incr. comp.
`-Zno-analysis`	parse and expand the source, but run no analysis	✖️ Stops before incr. comp. can do its magic
`-Zextra-plugins=val`	load extra plugins	❓ Loading extra lints should integrate them with rustc lints
`-Zprint-enum-sizes`	print the size of enums and their variants	❓ (spoiler: I'm working on a PR that will get rid of this)
`-Zforce-overflow-checks=val`	force overflow checks on or off	✔️
`-Zforce-dropflag-checks=val`	force drop flag checks on or off	✔️ Codegen chance
`-Ztrace-macros`	for every macro invocation, print its name and arguments	✖️
`-Zenable-nonzeroing-move-hints`	force nonzeroing move optimization on	✔️ Codegen change
`-Zkeep-mtwt-tables`	don't clear the resolution tables after analysis	✖️
`-Zkeep-ast`	keep the AST after lowering it to HIR	✖️
`-Zshow-span=val`	show spans for compiler debugging	✖️
`-Zprint-trans-items=val`	print the result of the translation item collection pass	❓ Can currently change the trans item collection mode, which seems problematic with incr. comp.
`-Zmir-opt-level=val`	set the MIR optimization level	✔️
`-Zdump-mir=val`	dump MIR state at various points in translation	💥 When compiling incrementally, MIR might not go through all these "points"
`-Zorbit`	get MIR where it belongs	✔️ Changes the translation backend (codegen change)

Sorry for this wall of text, rustc has more options than I remember 😅

michaelwoerister · 2016-07-20T12:49:44Z

@jonas-schievink Wow, thanks a lot for this listing! That's very impressive :)

michaelwoerister · 2016-07-21T10:08:47Z

OK, I see two basic approaches for implementing this:

Conservative - If there is a change in a commandline argument that might affect any cached artifact, delete the whole cache.
DepGraph-based - Commandline arguments are represented as nodes in the dependency graph. The regular dependency system takes care of deleting invalidated cache entries.

Advantages of the Conservative Approach

It is easy to implement
It is pretty fool-proof, in part due to its simplicity
As long as we just cache object files, it is not more wasteful than a more fine-grained approach

Disadvantages of the Conservative Approach

Once we start caching things like MIR, this approach will delete more things from the cache than necessary

Advantages of the DepGraph-based Approach

It is more accurate than the conservative approach, allowing, for example, to keep cached MIR around if only things affecting codegen have changed
It is building on the existing infrastructure

Disadvantages of the DepGraph-based Approach

It is more complicated than the conservative approach, needing more changes to existing code
It might lead to breaking changes for plugins and tools using rustc as a library
It is less fool-proof because information can leak out of the tracking system

Initially I thought that the DepGraph-based approach would be the natural way to do it, but now I'm thinking that it might be better to go the conservative route, at least in the beginning. It reduces the risk for bugs and, at least initially, should have no negative effect on performance. I'd be interested in what others think here.

Conservative Approach: Implementation
My proposal for implementing the conservative approach would be to divide all command-line options into the following categories:

RELEVANT potential effect on output (~ ✔️ in @jonas-schievink's listing)
IRRELEVANT no influence, completely ignored (~ ✖️)
INCOMPATIBLE incompatible (~ 💥)

With these categories defined, run the following algorithm:

If the set of arguments contains any that match INCOMPATIBLE, warn about the negative effect or abort if necessary
Take the subset of arguments that match RELEVANT and compare it to the RELEVANT set that was used to build the cache. If there is a difference, delete all cached artifacts.
Store the current RELEVANT arguments for later comparison.

I was also thinking about sub-dividing the incr. comp. directory, depending on given set of RELEVANT arguments like target triple, optimization level, and debuginfo. But now I'm thinking that it's better left to cargo and build systems to choose a different incr. comp. directory for different configurations. What does the @rust-lang/tools think about that?

alexcrichton · 2016-07-21T17:42:26Z

Makes sense to me to take the conservative approach to start out at least, and the proposed strategy here seems reasonable to me. I agree that Cargo should probably deal with the separate incremental compilation directories, it'll just have one for debug/release mode. Additionally in practice the arguments shouldn't change much unless the cache actually needs to be invalidated because Cargo'll pass the same arguments, so the conservative approach I think will pan out well enough.

One question I'd have is how we protect against new flags being added to the compiler? Would we ensure that all options are tagged with relevant/irrelevant to incremental compilation, or would we just have whitelist of "irrelevant" options you'd have to add to if it's relevant?

michaelwoerister · 2016-07-21T18:03:14Z

I prototyped the DepGraph approach the other day and there I extended the options! macro in sess::config to require that one specifies the DepNode that the option should be mapped to. Something similar (but simpler) can also be done here, I guess. A whitelist seems good too.

On July 21, 2016 7:42:41 PM GMT+02:00, Alex Crichton notifications@github.com wrote:

Makes sense to me to take the conservative approach to start out at
least, and the proposed strategy here seems reasonable to me. I agree
that Cargo should probably deal with the separate incremental
compilation directories, it'll just have one for debug/release mode.
Additionally in practice the arguments shouldn't change much unless the
cache actually needs to be invalidated because Cargo'll pass the same
arguments, so the conservative approach I think will pan out well
enough.

One question I'd have is how we protect against new flags being added
to the compiler? Would we ensure that all options are tagged with
relevant/irrelevant to incremental compilation, or would we just have
whitelist of "irrelevant" options you'd have to add to if it's
relevant?

You are receiving this because you were assigned.
Reply to this email directly or view it on GitHub:
#33727 (comment)

nikomatsakis · 2016-07-26T14:09:10Z

@michaelwoerister so the simplest thing I can imagine is just to hash the options and encode that in the serialized state. When we re-load the state, we could compare against our current options and just choose to ignore everything if the options changed. But I guess if you've worked through some of the dep-graph approach that'd be that much better.

michaelwoerister · 2016-07-26T19:05:08Z

@nikomatsakis Prototyping the DepGraph approach was mostly to get a feel for what it would look like -- and subsequently lead me to prefer the simpler approach. What you describe is pretty much how I intend to implement it, just with filtering out those options before hashing that should not have any influence.

…sakis Take commandline arguments into account for incr. comp. Implements the conservative strategy described in #33727. From now one, every time a new commandline option is added, one has to specify if it influences the incremental compilation cache. I've tried to implement this as automatic as possible: One just has to added either the `[TRACKED]` or the `[UNTRACKED]` marker next to the field. The `Options`, `CodegenOptions`, and `DebuggingOptions` definitions in `session::config` show plenty of examples. The PR removes some cruft from `session::config::Options`, mostly unnecessary copies of flags also present in `DebuggingOptions` or `CodeGenOptions` in the same struct. One notable removal is the `cfg` field that contained the values passed via `--cfg` commandline arguments. I chose to remove it because (1) its content is only a subset of what later is stored in `hir::Crate::config` and it's pretty likely that reading the cfgs from `Options` would not be what you wanted, and (2) we could not incorporate it into the dep-tracking hash of the `Options` struct because of how the test framework works, leaving us with a piece of untracked but vital data. It is now recommended (just as before) to access the crate config via the `krate()` method in the HIR map. Because the `cfg` field is not present in the `Options` struct any more, some methods in the `CompilerCalls` trait now take the crate config as an explicit parameter -- which might constitute a breaking change for plugin authors.

michaelwoerister · 2016-08-16T15:42:21Z

#35340 has landed, closing :D

…sakis Take commandline arguments into account for incr. comp. Implements the conservative strategy described in rust-lang/rust#33727. From now one, every time a new commandline option is added, one has to specify if it influences the incremental compilation cache. I've tried to implement this as automatic as possible: One just has to added either the `[TRACKED]` or the `[UNTRACKED]` marker next to the field. The `Options`, `CodegenOptions`, and `DebuggingOptions` definitions in `session::config` show plenty of examples. The PR removes some cruft from `session::config::Options`, mostly unnecessary copies of flags also present in `DebuggingOptions` or `CodeGenOptions` in the same struct. One notable removal is the `cfg` field that contained the values passed via `--cfg` commandline arguments. I chose to remove it because (1) its content is only a subset of what later is stored in `hir::Crate::config` and it's pretty likely that reading the cfgs from `Options` would not be what you wanted, and (2) we could not incorporate it into the dep-tracking hash of the `Options` struct because of how the test framework works, leaving us with a piece of untracked but vital data. It is now recommended (just as before) to access the crate config via the `krate()` method in the HIR map. Because the `cfg` field is not present in the `Options` struct any more, some methods in the `CompilerCalls` trait now take the crate config as an explicit parameter -- which might constitute a breaking change for plugin authors.

michaelwoerister added the A-incr-comp Area: Incremental compilation label May 18, 2016

michaelwoerister self-assigned this Jul 20, 2016

nikomatsakis added this to the Incremental compilation alpha milestone Jul 23, 2016

michaelwoerister mentioned this issue Aug 4, 2016

Take commandline arguments into account for incr. comp. #35340

Merged

michaelwoerister closed this as completed Aug 16, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Take command line arguments into account when doing incremental compilation #33727

Take command line arguments into account when doing incremental compilation #33727

michaelwoerister commented May 18, 2016 •

edited

Loading

michaelwoerister commented May 18, 2016

jonas-schievink commented Jun 2, 2016 •

edited

Loading

michaelwoerister commented Jul 20, 2016

michaelwoerister commented Jul 21, 2016

alexcrichton commented Jul 21, 2016

michaelwoerister commented Jul 21, 2016

nikomatsakis commented Jul 26, 2016

michaelwoerister commented Jul 26, 2016

michaelwoerister commented Aug 16, 2016

Take command line arguments into account when doing incremental compilation #33727

Take command line arguments into account when doing incremental compilation #33727

Comments

michaelwoerister commented May 18, 2016 • edited Loading

michaelwoerister commented May 18, 2016

jonas-schievink commented Jun 2, 2016 • edited Loading

michaelwoerister commented Jul 20, 2016

michaelwoerister commented Jul 21, 2016

alexcrichton commented Jul 21, 2016

michaelwoerister commented Jul 21, 2016

nikomatsakis commented Jul 26, 2016

michaelwoerister commented Jul 26, 2016

michaelwoerister commented Aug 16, 2016

michaelwoerister commented May 18, 2016 •

edited

Loading

jonas-schievink commented Jun 2, 2016 •

edited

Loading