diff --git a/src/SUMMARY.md b/src/SUMMARY.md index 43df9bdb5e..9a695c94d6 100644 --- a/src/SUMMARY.md +++ b/src/SUMMARY.md @@ -147,6 +147,7 @@ - [Backend Agnostic Codegen](./backend/backend-agnostic.md) - [Implicit Caller Location](./backend/implicit-caller-location.md) - [Profile-guided Optimization](./profile-guided-optimization.md) +- [LLVM Source-Based Code Coverage](./llvm-coverage-instrumentation.md) - [Sanitizers Support](./sanitizers.md) - [Debugging Support in the Rust Compiler](./debugging-support-in-rustc.md) diff --git a/src/img/coverage-branch-counting-01.png b/src/img/coverage-branch-counting-01.png new file mode 100644 index 0000000000..c445f3552a Binary files /dev/null and b/src/img/coverage-branch-counting-01.png differ diff --git a/src/img/coverage-graphviz-01.png b/src/img/coverage-graphviz-01.png new file mode 100644 index 0000000000..cf7e1e76b9 Binary files /dev/null and b/src/img/coverage-graphviz-01.png differ diff --git a/src/img/coverage-spanview-01.png b/src/img/coverage-spanview-01.png new file mode 100644 index 0000000000..f3e4b35355 Binary files /dev/null and b/src/img/coverage-spanview-01.png differ diff --git a/src/img/llvm-cov-show-01.png b/src/img/llvm-cov-show-01.png new file mode 100644 index 0000000000..35f0459434 Binary files /dev/null and b/src/img/llvm-cov-show-01.png differ diff --git a/src/llvm-coverage-instrumentation.md b/src/llvm-coverage-instrumentation.md new file mode 100644 index 0000000000..882991b779 --- /dev/null +++ b/src/llvm-coverage-instrumentation.md @@ -0,0 +1,631 @@ +# LLVM Source-Based Code Coverage + +`rustc` supports detailed source-based code and test coverage analysis +with a command line option (`-Z instrument-coverage`) that instruments Rust +libraries and binaries with additional instructions and data, at compile time. + +The coverage instrumentation injects calls to the LLVM intrinsic instruction +[`llvm.instrprof.increment`][llvm-instrprof-increment] at code branches +(based on a MIR-based control flow analysis), and LLVM converts these to +instructions that increment static counters, when executed. The LLVM coverage +instrumentation also requires a [Coverage Map] that encodes source metadata, +mapping counter IDs--directly and indirectly--to the file locations (with +start and end line and column). + +Rust libraries, with or without coverage instrumentation, can be linked into +instrumented binaries. When the program is executed and cleanly terminates, +LLVM libraries write the final counter values to a file (`default.profraw` or +a custom file set through environment variable `LLVM_PROFILE_FILE`). + +Developers use existing LLVM coverage analysis tools to decode `.profraw` +files, with corresponding Coverage Maps (from matching binaries that produced +them), and generate various reports for analysis, for example: + +Screenshot of sample `llvm-cov show` result, for function add_quoted_string +
+
+ +Detailed instructions and examples are documented in the +[Rust Unstable Book (under _source-based-code-coverage_)][unstable-book-sbcc]. + +[llvm-instrprof-increment]: https://llvm.org/docs/LangRef.html#llvm-instrprof-increment-intrinsic +[Coverage Map]: https://llvm.org/docs/CoverageMappingFormat.html +[unstable-book-sbcc]: https://doc.rust-lang.org/nightly/unstable-book/compiler-flags/source-based-code-coverage.html + +### Rust symbol mangling + +`-Z instrument-coverage` automatically enables Rust symbol mangling `v0` (as +if the user specified `-Z symbol-mangling-version=v0` option when invoking +`rustc`) to ensure consistent and reversible name mangling. This has two +important benefits: + +1. LLVM coverage tools can analyze coverage over multiple runs, including some + changes to source code; so mangled names must be consistent across compilations. +2. LLVM coverage reports can report coverage by function, and even separates + out the coverage counts of each unique instantiation of a generic function, + if invoked with multiple type substitution variations. + +## Components of LLVM Coverage Instrumentation in `rustc` + +### LLVM Runtime Dependency + +Coverage data is only generated by running the executable Rust program. `rustc` +statically links coverage-instrumented binaries with LLVM runtime code +([compiler-rt][compiler-rt-profile]) that implements program hooks +(such as an `exit` hook) to write the counter values to the `.profraw` file. + +In the `rustc` source tree, `library/profiler_builtins` bundles the LLVM +`compiler-rt` code into a Rust library crate. (When building `rustc`, the +`profiler_builtins` library is only included when `profiler = true` is set +in `rustc`'s `config.toml`.) + +When compiling with `-Z instrument-coverage`, +[`CrateLoader::postprocess()`][crate-loader-postprocess] dynamically loads the +`profiler_builtins` library by calling `inject_profiler_runtime()`. + +[compiler-rt-profile]: https://github.com/llvm/llvm-project/tree/master/compiler-rt/lib/profile +[crate-loader-postprocess]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_metadata/creader/struct.CrateLoader.html#method.postprocess + +### MIR Pass: `InstrumentCoverage` + +Coverage instrumentation is performed on the MIR with a [MIR pass][mir-passes] +called [`InstrumentCoverage`][mir-instrument-coverage]. This MIR pass analyzes +the control flow graph (CFG)--represented by MIR `BasicBlock`s--to identify +code branches, and injects additional [`Coverage`][coverage-statement] +statements into the `BasicBlock`s. + +A MIR `Coverage` statement is a virtual instruction that indicates a counter +should be incremented when its adjacent statemeents are executed, to count +a span of code ([`CodeRegion`][code-region]). It counts the number of times a +branch is executed, and also specifies the exact location of that code span in +the Rust source code. + +Note that many of these `Coverage` statements will *not* be converted into +physical counters (or any other executable instructions) in the final binary. +Some of them will be (see `CoverageKind::`[`Counter`][counter-coverage-kind]), +but other counters can be computed on the fly, when generating a coverage +report, by mapping a `CodeRegion` to a +`CoverageKind`::[`Expression`][expression-coverage-kind]. + +As an example: + +```rust +fn some_func(flag: bool) { + // increment Counter(1) + ... + if flag { + // increment Counter(2) + ... + } else { + // count = Expression(1) = Counter(1) - Counter(2) + ... + } + // count = Expression(2) = Counter(1) + Zero + // or, alternatively, Expression(2) = Counter(2) + Expression(1) + ... +} +``` + +In this example, four contiguous code regions are counted while only +incrementing two counters. + +CFG analysis is used to not only determine *where* the branches are, for +conditional expressions like `if`, `else`, `match`, and `loop`, but also to +determine where expressions can be used in place of physical counters. + +The advantages of optimizing coverage through expressions are more pronounced +with loops. Loops generally include at least one conditional branch that +determines when to break out of a loop (a `while` condition, or an `if` or +`match` with a `break`). In MIR, this is typically lowered to a `SwitchInt`, +with one branch to stay in the loop, and another branch to break out of the +loop. The branch that breaks out will almost always execute less often, +so `InstrumentCoverage` chooses to add a `Counter` to that branch, and an +`Expression(continue) = Counter(loop) - Counter(break)` to the branch that +continues. + +The `InstrumentCoverage` MIR pass is documented in +[more detail below][instrument-coverage-pass-details]. + +[mir-passes]: mir/passes.md +[mir-instrument-coverage]: https://github.com/rust-lang/rust/tree/master/compiler/rustc_mir/src/transform/coverage +[code-region]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/coverage/struct.CodeRegion.html +[counter-coverage-kind]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/coverage/enum.CoverageKind.html#variant.Counter +[expression-coverage-kind]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/coverage/enum.CoverageKind.html#variant.Expression +[coverage-statement]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/enum.StatementKind.html#variant.Coverage +[instrument-coverage-pass-details]: #implementation-details-of-the-instrumentcoverage-mir-pass + +### Counter Injection and Coverage Map Pre-staging + +When the compiler enters [the Codegen phase][backend-lowering-mir], with a +coverage-enabled MIR, [`codegen_statement()`][codegen-statement] converts each +MIR `Statement` into some backend-specific action or instruction. +`codegen_statement()` forwards `Coverage` statements to +[`codegen_coverage()`][codegen-coverage]: + +```rust + pub fn codegen_statement(&mut self, mut bx: Bx, statement: &mir::Statement<'tcx>) -> Bx { + ... + match statement.kind { + ... + mir::StatementKind::Coverage(box ref coverage) => { + self.codegen_coverage(&mut bx, coverage.clone()); + bx + } +``` + + +`codegen_coverage()` handles each `CoverageKind` as follows: + +* For all `CoverageKind`s, Coverage data (counter ID, expression equation + and ID, and code regions) are passed to the backend's `Builder`, to + populate data structures that will be used to generate the crate's + "Coverage Map". (See the [`FunctionCoverage`][function-coverage] `struct`.) +* For `CoverageKind::Counter`s, an instruction is injected in the backend + IR to increment the physical counter, by calling the `BuilderMethod` + [`instrprof_increment()`][instrprof-increment]. + +```rust + pub fn codegen_coverage(&self, bx: &mut Bx, coverage: Coverage) { + let Coverage { kind, code_region } = coverage; + match kind { + CoverageKind::Counter { function_source_hash, id } => { + if let Some(code_region) = code_region { + bx.add_coverage_counter(self.instance, id, code_region); + } + ... + bx.instrprof_increment(fn_name, hash, num_counters, index); + } + CoverageKind::Expression { id, lhs, op, rhs } => { + bx.add_coverage_counter_expression(self.instance, id, lhs, op, rhs, code_region); + } + CoverageKind::Unreachable => { + ... +``` +_[code snippet trimmed for brevity]_ + +> The function name `instrprof_increment()` is taken from the LLVM intrinsic +call of the same name ([`llvm.instrprof.increment`][llvm-instrprof-increment]), +and uses the same arguments and types; but note that, up to and through this +stage (even though modeled after LLVM's implementation for code coverage +instrumentation), the data and instructions are not strictly LLVM-specific. +> +> But since LLVM is the only Rust-supported backend with the tooling to +process this form of coverage instrumentation, the backend for `Coverage` +statements is only implemented for LLVM, at this time. + +[backend-lowering-mir]: backend/lowering-mir.md +[codegen-statement]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/mir/struct.FunctionCx.html#method.codegen_statement +[codegen-coverage]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/mir/struct.FunctionCx.html#method.codegen_coverage +[function-coverage]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/coverageinfo/map/struct.FunctionCoverage.html +[instrprof-increment]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/traits/trait.BuilderMethods.html#tymethod.instrprof_increment + +### Coverage Map Generation + +With the instructions to increment counters now implemented in LLVM IR, +the last remaining step is to inject the LLVM IR variables that hold the +static data for the coverage map. + +`rustc_codegen_llvm`'s [`compile_codegen_unit()`][compile-codegen-unit] calls +[`coverageinfo_finalze()`][coverageinfo-finalize], +which delegates its implementation to the +[`rustc_codegen_llvm::coverageinfo::mapgen`][mapgen-finalize] module. + +For each function `Instance` (code-generated from MIR, including multiple +instances of the same MIR for generic functions that have different type +substitution combinations), `mapgen`'s `finalize()` method queries the +`Instance`-associated `FunctionCoverage` for its `Counter`s, `Expression`s, +and `CodeRegion`s; and calls LLVM codegen APIs to generate +properly-configured variables in LLVM IR, according to very specific +details of the [_LLVM Coverage Mapping Format_][coverage-mapping-format] +(Version 4).[^llvm-and-covmap-versions] + +[^llvm-and-covmap-versions]: The Rust compiler currently supports +_LLVM Coverage Mapping Format_ Version 4 (the most up-to-date version +of the format, at the time of this writing) for improved compatibility +with other LLVM-based compilers (like _Clang_), and to take advantage +of some format optimizations. Version 4 was introduced in _LLVM 11_, +which is currently the default LLVM version for Rust. Note that the +Rust compiler optionally supports some earlier LLVM versions, +prior to _LLVM 11_. If `rustc` is configured to use an incompatible +version of LLVM, compiling with `-Z instrument-coverage` will +generate an error message. + +```rust +pub fn finalize<'ll, 'tcx>(cx: &CodegenCx<'ll, 'tcx>) { + let mut function_coverage_map = match cx.coverage_context() { + Some(ctx) => ctx.take_function_coverage_map(), + None => return, + }; + ... + add_unreachable_coverage(tcx, &mut function_coverage_map); + + let mut mapgen = CoverageMapGenerator::new(); + + for (instance, function_coverage) in function_coverage_map { + ... + let coverage_mapping_buffer = llvm::build_byte_buffer(|coverage_mapping_buffer| { + mapgen.write_coverage_mapping(expressions, counter_regions, coverage_mapping_buffer); + }); +``` +_[code snippet trimmed for brevity]_ + +One noteable step, performed by `mapgen::finalize()` before processing the +`Instance`s and their `FunctionCoverage`s, is the call to +[`add_unreachable_functions()`][add-unreachable-coverage]. + +When finalizing the coverage map, `FunctionCoverage` only has the `CodeRegion`s and counters for +the functions that went through codegen; such as public functions and "used" functions +(functions referenced by other "used" or public items). Any other functions (considered unused +or "Unreachable") were still parsed and processed through the MIR stage. + +The set of unreachable functions is computed via the set difference of all MIR +`DefId`s (`tcx` query `mir_keys`) minus the codegenned `DefId`s +(`tcx` query `collect_and_partition_mono_items`). `add_unreachable_functions()` +computes the set of unreachable functions, queries the `tcx` for the +previously-computed `CodeRegions`, for each unreachabe MIR, and adds those code +regions to one of the non-generic codegenned functions (non-generic avoids +potentially injecting the unreachable coverage multiple times for multiple +instantiations). + +[compile-codegen-unit]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_llvm/base/fn.compile_codegen_unit.html +[coverageinfo-finalize]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_llvm/context/struct.CodegenCx.html#method.coverageinfo_finalize +[mapgen-finalize]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_llvm/coverageinfo/mapgen/fn.finalize.html +[coverage-mapping-format]: https://llvm.org/docs/CoverageMappingFormat.html +[add-unreachable-coverage]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_llvm/coverageinfo/mapgen/fn.add_unreachable_coverage.html + +## Testing LLVM Coverage + +Coverage instrumentation in the MIR is validated by a `mir-opt` test: +[`instrument-coverage`][mir-opt-test]. + +More complete testing of end-to-end coverage instrumentation and reports are done +in the `run-make-fulldeps` tests, with sample Rust programs (to be instrumented) +in the [`coverage`][coverage-test-samples] directory, and the actual tests and expected +results in [`coverage-reports`]. + +In addition to testing the final result, two intermediate results are also validated +to catch potential regression errors early: Minimum `CoverageSpan`s computed during +the `InstrumentCoverage` MIR pass are saved in `mir_dump` [Spanview][spanview-debugging] +files and compared to expected results in [`coverage-spanview`]. + +Finally, the [`coverage-llvmir`] test compares compiles a simple Rust program with +`-Z instrument-coverage` and compares the compiled program's LLVM IR to expected +LLVM IR instructions and structured data for a coverage-enabled program, including +various checks for Coverage Map-related metadata and the LLVM intrinsic calls to +increment the runtime counters. + +Expected results for both the `mir-opt` tests and the `coverage*` tests under +`run-make-fulldeps` can be refreshed by running: + +```shell +$ ./x.py test src/test/ --blessed +``` + +[mir-opt-test]: https://github.com/rust-lang/rust/blob/master/src/test/mir-opt/instrument_coverage.rs +[coverage-test-samples]: https://github.com/rust-lang/rust/tree/master/src/test/run-make-fulldeps/coverage +[`coverage-reports`]: https://github.com/rust-lang/rust/tree/master/src/test/run-make-fulldeps/coverage-reports +[`coverage-spanview`]: https://github.com/rust-lang/rust/tree/master/src/test/run-make-fulldeps/coverage-spanview +[spanview-debugging]: compiler-debugging.md#viewing-spanview-output +[`coverage-llvmir`]: https://github.com/rust-lang/rust/tree/master/src/test/run-make-fulldeps/coverage-llvmir + +## Implementation Details of the `InstrumentCoverage` MIR Pass + +The bulk of the implementation of the `InstrumentCoverage` MIR pass is performed +by the [`Instrumentor`][instrumentor]. For each MIR (each non-const, non-inlined +function, generic, or closure), the `Instrumentor`'s constructor prepares a +[`CoverageGraph`][coverage-graph] and then executes +[`inject_counters()`][inject-counters]. + +```rust + Instrumentor::new(&self.name(), tcx, mir_body).inject_counters(); +``` + +The `CoverageGraph` is a coverage-specific simplification of the MIR control +flow graph (CFG). Its nodes are [`BasicCoverageBlock`s][bcb], which +encompass one or more sequentially-executed MIR `BasicBlock`s +(with no internal branching), plus a `CoverageKind` counter (to +be added, via coverage analysis), and an optional set of additional counters +to count incoming edges (if there are more than one). + +The `Instrumentor`'s `inject_counters()` uses the `CoverageGraph` to +compute the best places to inject coverage counters, as MIR `Statement`s, +with the following steps: + +1. Depending on the debugging configurations in `rustc`'s, `config.toml`, + and `rustc` command line flags, various debugging features may be enabled + to enhance `debug!()` messages in logs, and to generate various "dump" files, + to help developers understand the MIR transformation process for coverage. + Most of the debugging features are implemented in the [`debug`][debug] + sub-module. +2. [`generate_coverage_spans()`][generate-coverage-spans] computes the minimum set of distinct, + non-branching code regions, from the MIR. These `CoverageSpan`s + represent a span of code that must be counted. +3. [`make_bcb_counters()`][make-bcb-counters] generates `CoverageKind::Counter`s and + `CoverageKind::Expression`s for each `CoverageSpan`, plus additional + `intermediate_expressions`[^intermediate-expressions], not associated with any `CodeRegion`, but + are required to compute a final `Expression` value for a `CodeRegion`. +4. Inject the new counters into the MIR, as new `StatementKind::Coverage` + statements. This is done by three distinct functions: + - `inject_coverage_span_counters()` + - `inject_indirect_counters()` + - `inject_intermediate_expression()`, called for each intermediate expression + returned from `make_bcb_counters()` + +[^intermediate-expressions]: Intermediate expressions are sometimes required +because `Expression`s are limited to binary additions or subtractions. For +example, `A + (B - C)` might represent an `Expression` count computed from three +other counters, `A`, `B`, and `C`, but computing that value requires an +intermediate expression for `B - C`. + +[instrumentor]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/transform/coverage/struct.Instrumentor.html +[coverage-graph]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/transform/coverage/graph/struct.CoverageGraph.html +[inject-counters]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/transform/coverage/struct.Instrumentor.html#method.inject_counters +[bcb]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/transform/coverage/graph/struct.BasicCoverageBlock.html +[debug]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/transform/coverage/debug +[generate-coverage-spans]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/transform/coverage/spans/struct.CoverageSpans.html#method.generate_coverage_spans +[make-bcb-counters]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/transform/coverage/counters/struct.BcbCounters.html#method.make_bcb_counters + +### The `CoverageGraph` + +The [`CoverageGraph`][coverage-graph] is derived from the MIR (`mir::Body`). + +```rust + let basic_coverage_blocks = CoverageGraph::from_mir(mir_body); +``` + +Like `mir::Body`, the `CoverageGraph` is also a +[`DirectedGraph`][directed-graph]. Both graphs represent the function's +fundamental control flow, with many of the same +[`graph trait`][graph-traits]s, supporting `start_node()`, `num_nodes()`, +`successors()`, `predecessors()`, and `is_dominated_by()`. + +For anyone that knows how to work with the [MIR, as a CFG][mir-dev-guide], the +`CoverageGraph` will be familiar, and can be used in much the same way. +The nodes of the `CoverageGraph` are `BasicCoverageBlock`s (BCBs), which +index into an `IndexVec` of `BasicCoverageBlockData`. This is analagous +to the MIR CFG of `BasicBlock`s that index `BasicBlockData`. + +Each `BasicCoverageBlockData` captures one or more MIR `BasicBlock`s, +exclusively, and represents the maximal-length sequence of `BasicBlocks` +without conditional branches. + +[`compute_basic_coverage_blocks()`][compute-basic-coverage-blocks] builds the +`CoverageGraph` as a coverage-specific simplification of the MIR CFG. In +contrast with the [`SimplifyCfg`][simplify-cfg] MIR pass, this step does +not alter the MIR itself, because the `CoverageGraph` aggressively simplifies +the CFG, and ignores nodes that are not relevant to coverage. For example: + + * The BCB CFG ignores (excludes) branches considered not relevant + to the current coverage solution. It excludes unwind-related code[^78544] + that is injected by the Rust compiler but has no physical source + code to count, which allows a `Call`-terminated BasicBlock + to be merged with its successor, within a single BCB. + * A `Goto`-terminated `BasicBlock` can be merged with its successor + ***as long as*** it has the only incoming edge to the successor `BasicBlock`. + * Some BasicBlock terminators support Rust-specific concerns--like borrow-checking--that are + not relevant to coverage analysis. `FalseUnwind`, for example, can be treated the same as + a `Goto` (potentially merged with its successor into the same BCB). + +[^78544]: (Note, however, that Issue [#78544][rust-lang/rust#78544] considers +providing future support for coverage of programs that intentionally +`panic`, as an option, with some non-trivial cost.) + +The BCB CFG is critical to simplifying the coverage analysis by ensuring graph path-based +queries (`is_dominated_by()`, `predecessors`, `successors`, etc.) have branch (control flow) +significance. + +To visualize the `CoverageGraph`, you can generate a _graphviz_ `*.dot` +file with the following `rustc` flags:[^graphviz-dark-mode] + +[^graphviz-dark-mode]: This image also applies `-Z graphviz-dark-mode`, to +produce a Graphviz document with "dark mode" styling. If you use a dark mode or +theme in your development environment, you will probably want to use this +option so you can review the graphviz output without straining your vision. + +```shell +$ rustc -Z instrument-coverage -Z dump-mir=InstrumentCoverage \ + -Z dump-mir-graphviz some_rust_source.rs +``` + +The `-Z dump-mir` flag requests [MIR debugging +output][mir-debugging] (generating `*.mir` files, by default). +`-Z dump-mir-graphviz` additionally generates `*.dot` files. +`-Z dump-mir=InstrumentCoverage` restricts these files to the +`InstrumentCoverage` pass. All files are written to the `./mir_dump/` +directory, by default. + +Files with names ending in `.-------.InstrumentCoverage.0.dot` contain the +_graphviz_ representations of a `CoverageGraph` (one for each MIR, that is, +for each function and closure): + +cropped image of a sample CoverageGraph in graphviz format +
+
+ +This image shows each `BasicCoverageBlock` as a rectangular _node_, with +directional edges (the arrows) leading from each node to its `successors()`. +The nodes contain information in sections: + +1. The gray header has a label showing the BCB ID (or _index_ for looking up +its `BasicCoverageBlockData`). +2. The first content section shows the assigned `Counter` or `Expression` for +each contiguous section of code. (There may be more than one `Expression` +incremented by the same `Counter` for discontigous sections of code representing +the same sequential actions.) Note the code is represented by the line and +column ranges (for example: `52:28-52:33`, representing the original source +line 52, for columns 28-33). These are followed by the MIR `Statement` or +`Terminator` represented by that source range. (How these coverage regions +are determined is discussed in the following section.) +3. The final section(s) show the MIR `BasicBlock`s (by ID/index and its +`TerminatorKind`) contained in this BCB. The last BCB is separated out because +its `successors()` determine the edges leading out of the BCB, and into +the `leading_bb()` (first `BasicBlock`) of each successor BCB. + +Note, to find the `BasicCoverageBlock` from a final BCB `Terminator`'s +successor `BasicBlock`, there is an index and helper +function--[`bcb_from_bb()`][bcb-from-bb]--to look up a `BasicCoverageBlock` from _any_ +contained `BasicBlock`. + +[directed-graph]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_data_structures/graph/trait.DirectedGraph.html +[graph-traits]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_data_structures/graph/index.html#traits +[mir-dev-guide]: mir/index.md +[compute-basic-coverage-blocks]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/transform/coverage/graph/struct.CoverageGraph.html#method.compute_basic_coverage_blocks +[simplify-cfg]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/transform/simplify/struct.SimplifyCfg.html +[rust-lang/rust#78544]: https://github.com/rust-lang/rust/issues/78544 +[mir-debugging]: mir/debugging.md +[bcb-from-bb]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/transform/coverage/graph/struct.CoverageGraph.html#method.bcb_from_bb + +### `CoverageSpans` + +The `struct` [`CoverageSpans`][coverage-spans] builds and refines a final set of +[`CoverageSpan`][coverage-span]s, each representing the largest contiguous `Span` +of source within a single BCB. By definition--since each `Span` falls within a +BCB, the `Span` is also non-branching; so if any code in that `Span` has executed, +all code in the `Span` will have executed, the same number of times. + +[`CoverageSpans::generate_coverage_spans()`][generate-coverage-spans] constructs +an initial set of `CoverageSpan`s from the `Span`s associated with each MIR +`Statement` and `Terminator`. + +The final stage of `generate_coverage_spans()` is handled by +[`to_refined_spans()`][to-refined-spans], which iterates through the `CoverageSpan`s, +merges and de-duplicates them, and returns an optimial, minimal set of `CoverageSpan`s +that can be used to assign coverage `Counter`s or `Expression`s, one-for-one. + +An visual, interactive representation of the final `CoverageSpan`s can be +generated with the following `rustc` flags: + +```shell +$ rustc -Zinstrument-coverage -Zdump-mir=InstrumentCoverage \ + -Zdump-mir-spanview some_rust_source.rs +``` + +These flags request Spanview output for the `InstrumentCoverage` pass, and the +resulting files (one for each MIR, that is, for each function or closure) can be +found in the `mir_dump` directory (by default), with the extension: +`.-------.InstrumentCoverage.0.html`. + +cropped image of a sample Spanview in a browser +
+
+ +The image above shows one such example. The orange and blue backgrounds +highlight alternating `CoverageSpan`s from the refined set. Hovering over a +line expands the output on that line to show the MIR `BasicBlock` IDs covered +by each `CoverageSpan`. While hovering, the `CoverageSpan` under the pointer +also has a _tooltip_ block of text, showing even more detail, including the +MIR `Statement`s and `Terminator`s contributing to the `CoverageSpan`, and +their individual `Span`s (which should be encapsulated within the code region +of the refined `CoverageSpan`) + +[coverage-spans]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/transform/coverage/spans/struct.CoverageSpans.html +[coverage-span]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/transform/coverage/spans/struct.CoverageSpan.html +[to-refined-spans]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/transform/coverage/spans/struct.CoverageSpans.html#method.to_refined_spans + +### `make_bcb_counters()` + +[`make_bcb_counters()`][make-bcb-counters] traverses the `CoverageGraph` and adds a +`Counter` or `Expression` to every BCB. It uses _Control Flow Analysis_ +to determine where an `Expression` can be used in place of a `Counter`. +`Expressions` have no runtime overhead, so if a viable expression (adding or +subtracting two other counters or expressions) can compute the same result as +an embedded counter, an `Expression` is preferred. + +[`TraverseCoverageGraphWithLoops`][traverse-coverage-graph-with-loops] +provides a traversal order that ensures all `BasicCoverageBlock` nodes in a +loop are visited before visiting any node outside that loop. The traversal +state includes a `context_stack`, with the current loop's context information +(if in a loop), as well as context for nested loops. + +Within loops, nodes with multiple outgoing edges (generally speaking, these +are BCBs terminated in a `SwitchInt`) can be optimized when at least one +branch exits the loop and at least one branch stays within the loop. (For an +`if` or `while`, there are only two branches, but a `match` may have more.) + +A branch that does not exit the loop should be counted by `Expression`, if +possible. Note that some situations require assigning counters to BCBs before +they are visited by traversal, so the `counter_kind` (`CoverageKind` for +a `Counter` or `Expression`) may have already been assigned, in which case +one of the other branches should get the `Expression`. + +For a node with more than two branches (such as for more than two +`match` patterns), only one branch can be optimized by `Expression`. All +others require a `Counter` (unless its BCB `counter_kind` was previously +assigned). + +A branch expression is derived from the equation: + +```text +Counter(branching_node) = SUM(Counter(branches)) +``` + +It's important to +be aware that the `branches` in this equation are the outgoing _edges_ +from the `branching_node`, but a `branch`'s target node may have other +incoming edges. Given the following graph, for example, the count for +`B` is the sum of its two incoming edges: + +Example graph with multiple incoming edges to a branch node +
+
+ +In this situation, BCB node `B` may require an edge counter for its +"edge from A", and that edge might be computed from an `Expression`, +`Counter(A) - Counter(C)`. But an expression for the BCB _node_ `B` +would be the sum of all incoming edges: + +```text +Expression((Counter(A) - Counter(C)) + SUM(Counter(remaining_edges))) +``` + +Note that this is only one possible configuration. The actual choice +of `Counter` vs. `Expression` also depends on the order of counter +assignments, and whether a BCB or incoming edge counter already has +its `Counter` or `Expression`. + +[bcb-counters]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/transform/coverage/counters/struct.BcbCounters.html +[traverse-coverage-graph-with-loops]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/transform/coverage/graph/struct.TraverseCoverageGraphWithLoops.html + +### Injecting counters into a MIR `BasicBlock` + +With the refined `CoverageSpan`s, and after all `Counter`s and `Expression`s are +created, the final step is to inject the `StatementKind::Coverage` statements +into the MIR. There are three distinct sources, handled by the following +functions: + +- [`inject_coverage_span_counters()`][inject-coverage-span-counters] injects the + counter from each `CoverageSpan`'s BCB. +- [`inject_indirect_counters()`][inject-indirect-counters] injects counters + for any BCB not assigned to a `CoverageSpan`, and for all edge counters. + These counters don't have `CoverageSpan`s. +- [`inject_intermediate_expression()`][inject-intermediate-expression] injects + the intermediate expressions returned from `make_bcb_counters()`. These + counters aren't associated with any BCB, edge, or `CoverageSpan`. + +These three functions inject the `Coverage` statements into the MIR. +`Counter`s and `Expression`s with `CoverageSpan`s add `Coverage` statements +to a corresponding `BasicBlock`, with a `CodeRegion` computed from the +refined `Span` and current `SourceMap`. + +All other `Coverage` statements have a `CodeRegion` of `None`, but they +still must be injected because they contribute to other `Expression`s. + +Finally, edge's with a `CoverageKind::Counter` require a new `BasicBlock`, +so the counter is only incremented when traversing the branch edge. + +[inject-coverage-span-counters]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/transform/coverage/struct.Instrumentor.html#method.inject_coverage_span_counters +[inject-indirect-counters]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/transform/coverage/struct.Instrumentor.html#method.inject_indirect_counters +[inject-intermediate-expression]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/transform/coverage/fn.inject_intermediate_expression.html + +### Additional Debugging Support + +See the +[crate documentation for `rustc_mir::transform::coverage::debug`][coverage-debugging] +for a detailed description of the debug output, logging, and configuration options +available to developers working on the `InstrumentCoverage` pass. + +[coverage-debugging]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/transform/coverage/debug/index.html diff --git a/src/profile-guided-optimization.md b/src/profile-guided-optimization.md index 03198fb1aa..3597ee2689 100644 --- a/src/profile-guided-optimization.md +++ b/src/profile-guided-optimization.md @@ -29,7 +29,8 @@ LLVM actually [supports multiple forms][clang-pgo] of PGO: - GCOV-based profiling, where code coverage infrastructure is used to collect profiling information. - Front-end based instrumentation, where the compiler front-end (e.g. Clang) - inserts instrumentation intrinsics into the LLVM IR it generates. + inserts instrumentation intrinsics into the LLVM IR it generates (but see the + [^note-instrument-coverage]"Note"). - IR-level instrumentation, where LLVM inserts the instrumentation intrinsics itself during optimization passes. @@ -44,6 +45,11 @@ optimized. Instrumentation-based PGO has two components: a compile-time component and run-time component, and one needs to understand the overall workflow to see how they interact. +[^note-instrument-coverage]: Note: `rustc` now supports front-end-based coverage +instrumentation, via the experimental option +[`-Z instrument-coverage`](../llvm-coverage-instrumentation), but using these +coverage results for PGO has not been attempted at this time. + ### Overall Workflow Generating a PGO-optimized program involves the following four steps: