From a2e9374048f1c04f025469dde88e13a5037d8db7 Mon Sep 17 00:00:00 2001 From: Camelid Date: Sat, 20 Mar 2021 14:23:38 -0700 Subject: [PATCH 1/2] Move debuginfo docs from `doc.rs` module to `doc.md` file And use `#[doc = include_str!("doc.md")]` in `mod.rs` so the docs are rendered as if they were inline in the root module. --- .../rustc_codegen_llvm/src/debuginfo/doc.md | 179 ++++++++++++++++++ .../rustc_codegen_llvm/src/debuginfo/doc.rs | 179 ------------------ .../rustc_codegen_llvm/src/debuginfo/mod.rs | 3 +- compiler/rustc_codegen_llvm/src/lib.rs | 1 + 4 files changed, 181 insertions(+), 181 deletions(-) create mode 100644 compiler/rustc_codegen_llvm/src/debuginfo/doc.md delete mode 100644 compiler/rustc_codegen_llvm/src/debuginfo/doc.rs diff --git a/compiler/rustc_codegen_llvm/src/debuginfo/doc.md b/compiler/rustc_codegen_llvm/src/debuginfo/doc.md new file mode 100644 index 0000000000000..485b124ab06ba --- /dev/null +++ b/compiler/rustc_codegen_llvm/src/debuginfo/doc.md @@ -0,0 +1,179 @@ +# Debug Info Module + +This module serves the purpose of generating debug symbols. We use LLVM's +[source level debugging](https://llvm.org/docs/SourceLevelDebugging.html) +features for generating the debug information. The general principle is +this: + +Given the right metadata in the LLVM IR, the LLVM code generator is able to +create DWARF debug symbols for the given code. The +[metadata](https://llvm.org/docs/LangRef.html#metadata-type) is structured +much like DWARF *debugging information entries* (DIE), representing type +information such as datatype layout, function signatures, block layout, +variable location and scope information, etc. It is the purpose of this +module to generate correct metadata and insert it into the LLVM IR. + +As the exact format of metadata trees may change between different LLVM +versions, we now use LLVM +[DIBuilder](https://llvm.org/docs/doxygen/html/classllvm_1_1DIBuilder.html) +to create metadata where possible. This will hopefully ease the adaption of +this module to future LLVM versions. + +The public API of the module is a set of functions that will insert the +correct metadata into the LLVM IR when called with the right parameters. +The module is thus driven from an outside client with functions like +`debuginfo::create_local_var_metadata(bx: block, local: &ast::local)`. + +Internally the module will try to reuse already created metadata by +utilizing a cache. The way to get a shared metadata node when needed is +thus to just call the corresponding function in this module: + + let file_metadata = file_metadata(cx, file); + +The function will take care of probing the cache for an existing node for +that exact file path. + +All private state used by the module is stored within either the +CrateDebugContext struct (owned by the CodegenCx) or the +FunctionDebugContext (owned by the FunctionCx). + +This file consists of three conceptual sections: +1. The public interface of the module +2. Module-internal metadata creation functions +3. Minor utility functions + + +## Recursive Types + +Some kinds of types, such as structs and enums can be recursive. That means +that the type definition of some type X refers to some other type which in +turn (transitively) refers to X. This introduces cycles into the type +referral graph. A naive algorithm doing an on-demand, depth-first traversal +of this graph when describing types, can get trapped in an endless loop +when it reaches such a cycle. + +For example, the following simple type for a singly-linked list... + +``` +struct List { + value: i32, + tail: Option>, +} +``` + +will generate the following callstack with a naive DFS algorithm: + +``` +describe(t = List) + describe(t = i32) + describe(t = Option>) + describe(t = Box) + describe(t = List) // at the beginning again... + ... +``` + +To break cycles like these, we use "forward declarations". That is, when +the algorithm encounters a possibly recursive type (any struct or enum), it +immediately creates a type description node and inserts it into the cache +*before* describing the members of the type. This type description is just +a stub (as type members are not described and added to it yet) but it +allows the algorithm to already refer to the type. After the stub is +inserted into the cache, the algorithm continues as before. If it now +encounters a recursive reference, it will hit the cache and does not try to +describe the type anew. + +This behavior is encapsulated in the 'RecursiveTypeDescription' enum, +which represents a kind of continuation, storing all state needed to +continue traversal at the type members after the type has been registered +with the cache. (This implementation approach might be a tad over- +engineered and may change in the future) + + +## Source Locations and Line Information + +In addition to data type descriptions the debugging information must also +allow to map machine code locations back to source code locations in order +to be useful. This functionality is also handled in this module. The +following functions allow to control source mappings: + ++ set_source_location() ++ clear_source_location() ++ start_emitting_source_locations() + +`set_source_location()` allows to set the current source location. All IR +instructions created after a call to this function will be linked to the +given source location, until another location is specified with +`set_source_location()` or the source location is cleared with +`clear_source_location()`. In the later case, subsequent IR instruction +will not be linked to any source location. As you can see, this is a +stateful API (mimicking the one in LLVM), so be careful with source +locations set by previous calls. It's probably best to not rely on any +specific state being present at a given point in code. + +One topic that deserves some extra attention is *function prologues*. At +the beginning of a function's machine code there are typically a few +instructions for loading argument values into allocas and checking if +there's enough stack space for the function to execute. This *prologue* is +not visible in the source code and LLVM puts a special PROLOGUE END marker +into the line table at the first non-prologue instruction of the function. +In order to find out where the prologue ends, LLVM looks for the first +instruction in the function body that is linked to a source location. So, +when generating prologue instructions we have to make sure that we don't +emit source location information until the 'real' function body begins. For +this reason, source location emission is disabled by default for any new +function being codegened and is only activated after a call to the third +function from the list above, `start_emitting_source_locations()`. This +function should be called right before regularly starting to codegen the +top-level block of the given function. + +There is one exception to the above rule: `llvm.dbg.declare` instruction +must be linked to the source location of the variable being declared. For +function parameters these `llvm.dbg.declare` instructions typically occur +in the middle of the prologue, however, they are ignored by LLVM's prologue +detection. The `create_argument_metadata()` and related functions take care +of linking the `llvm.dbg.declare` instructions to the correct source +locations even while source location emission is still disabled, so there +is no need to do anything special with source location handling here. + +## Unique Type Identification + +In order for link-time optimization to work properly, LLVM needs a unique +type identifier that tells it across compilation units which types are the +same as others. This type identifier is created by +`TypeMap::get_unique_type_id_of_type()` using the following algorithm: + +(1) Primitive types have their name as ID +(2) Structs, enums and traits have a multipart identifier + + (1) The first part is the SVH (strict version hash) of the crate they + were originally defined in + + (2) The second part is the ast::NodeId of the definition in their + original crate + + (3) The final part is a concatenation of the type IDs of their concrete + type arguments if they are generic types. + +(3) Tuple-, pointer and function types are structurally identified, which + means that they are equivalent if their component types are equivalent + (i.e., (i32, i32) is the same regardless in which crate it is used). + +This algorithm also provides a stable ID for types that are defined in one +crate but instantiated from metadata within another crate. We just have to +take care to always map crate and `NodeId`s back to the original crate +context. + +As a side-effect these unique type IDs also help to solve a problem arising +from lifetime parameters. Since lifetime parameters are completely omitted +in debuginfo, more than one `Ty` instance may map to the same debuginfo +type metadata, that is, some struct `Struct<'a>` may have N instantiations +with different concrete substitutions for `'a`, and thus there will be N +`Ty` instances for the type `Struct<'a>` even though it is not generic +otherwise. Unfortunately this means that we cannot use `ty::type_id()` as +cheap identifier for type metadata -- we have done this in the past, but it +led to unnecessary metadata duplication in the best case and LLVM +assertions in the worst. However, the unique type ID as described above +*can* be used as identifier. Since it is comparatively expensive to +construct, though, `ty::type_id()` is still used additionally as an +optimization for cases where the exact same type has been seen before +(which is most of the time). diff --git a/compiler/rustc_codegen_llvm/src/debuginfo/doc.rs b/compiler/rustc_codegen_llvm/src/debuginfo/doc.rs deleted file mode 100644 index 10dd590652949..0000000000000 --- a/compiler/rustc_codegen_llvm/src/debuginfo/doc.rs +++ /dev/null @@ -1,179 +0,0 @@ -//! # Debug Info Module -//! -//! This module serves the purpose of generating debug symbols. We use LLVM's -//! [source level debugging](https://llvm.org/docs/SourceLevelDebugging.html) -//! features for generating the debug information. The general principle is -//! this: -//! -//! Given the right metadata in the LLVM IR, the LLVM code generator is able to -//! create DWARF debug symbols for the given code. The -//! [metadata](https://llvm.org/docs/LangRef.html#metadata-type) is structured -//! much like DWARF *debugging information entries* (DIE), representing type -//! information such as datatype layout, function signatures, block layout, -//! variable location and scope information, etc. It is the purpose of this -//! module to generate correct metadata and insert it into the LLVM IR. -//! -//! As the exact format of metadata trees may change between different LLVM -//! versions, we now use LLVM -//! [DIBuilder](https://llvm.org/docs/doxygen/html/classllvm_1_1DIBuilder.html) -//! to create metadata where possible. This will hopefully ease the adaption of -//! this module to future LLVM versions. -//! -//! The public API of the module is a set of functions that will insert the -//! correct metadata into the LLVM IR when called with the right parameters. -//! The module is thus driven from an outside client with functions like -//! `debuginfo::create_local_var_metadata(bx: block, local: &ast::local)`. -//! -//! Internally the module will try to reuse already created metadata by -//! utilizing a cache. The way to get a shared metadata node when needed is -//! thus to just call the corresponding function in this module: -//! -//! let file_metadata = file_metadata(cx, file); -//! -//! The function will take care of probing the cache for an existing node for -//! that exact file path. -//! -//! All private state used by the module is stored within either the -//! CrateDebugContext struct (owned by the CodegenCx) or the -//! FunctionDebugContext (owned by the FunctionCx). -//! -//! This file consists of three conceptual sections: -//! 1. The public interface of the module -//! 2. Module-internal metadata creation functions -//! 3. Minor utility functions -//! -//! -//! ## Recursive Types -//! -//! Some kinds of types, such as structs and enums can be recursive. That means -//! that the type definition of some type X refers to some other type which in -//! turn (transitively) refers to X. This introduces cycles into the type -//! referral graph. A naive algorithm doing an on-demand, depth-first traversal -//! of this graph when describing types, can get trapped in an endless loop -//! when it reaches such a cycle. -//! -//! For example, the following simple type for a singly-linked list... -//! -//! ``` -//! struct List { -//! value: i32, -//! tail: Option>, -//! } -//! ``` -//! -//! will generate the following callstack with a naive DFS algorithm: -//! -//! ``` -//! describe(t = List) -//! describe(t = i32) -//! describe(t = Option>) -//! describe(t = Box) -//! describe(t = List) // at the beginning again... -//! ... -//! ``` -//! -//! To break cycles like these, we use "forward declarations". That is, when -//! the algorithm encounters a possibly recursive type (any struct or enum), it -//! immediately creates a type description node and inserts it into the cache -//! *before* describing the members of the type. This type description is just -//! a stub (as type members are not described and added to it yet) but it -//! allows the algorithm to already refer to the type. After the stub is -//! inserted into the cache, the algorithm continues as before. If it now -//! encounters a recursive reference, it will hit the cache and does not try to -//! describe the type anew. -//! -//! This behavior is encapsulated in the 'RecursiveTypeDescription' enum, -//! which represents a kind of continuation, storing all state needed to -//! continue traversal at the type members after the type has been registered -//! with the cache. (This implementation approach might be a tad over- -//! engineered and may change in the future) -//! -//! -//! ## Source Locations and Line Information -//! -//! In addition to data type descriptions the debugging information must also -//! allow to map machine code locations back to source code locations in order -//! to be useful. This functionality is also handled in this module. The -//! following functions allow to control source mappings: -//! -//! + set_source_location() -//! + clear_source_location() -//! + start_emitting_source_locations() -//! -//! `set_source_location()` allows to set the current source location. All IR -//! instructions created after a call to this function will be linked to the -//! given source location, until another location is specified with -//! `set_source_location()` or the source location is cleared with -//! `clear_source_location()`. In the later case, subsequent IR instruction -//! will not be linked to any source location. As you can see, this is a -//! stateful API (mimicking the one in LLVM), so be careful with source -//! locations set by previous calls. It's probably best to not rely on any -//! specific state being present at a given point in code. -//! -//! One topic that deserves some extra attention is *function prologues*. At -//! the beginning of a function's machine code there are typically a few -//! instructions for loading argument values into allocas and checking if -//! there's enough stack space for the function to execute. This *prologue* is -//! not visible in the source code and LLVM puts a special PROLOGUE END marker -//! into the line table at the first non-prologue instruction of the function. -//! In order to find out where the prologue ends, LLVM looks for the first -//! instruction in the function body that is linked to a source location. So, -//! when generating prologue instructions we have to make sure that we don't -//! emit source location information until the 'real' function body begins. For -//! this reason, source location emission is disabled by default for any new -//! function being codegened and is only activated after a call to the third -//! function from the list above, `start_emitting_source_locations()`. This -//! function should be called right before regularly starting to codegen the -//! top-level block of the given function. -//! -//! There is one exception to the above rule: `llvm.dbg.declare` instruction -//! must be linked to the source location of the variable being declared. For -//! function parameters these `llvm.dbg.declare` instructions typically occur -//! in the middle of the prologue, however, they are ignored by LLVM's prologue -//! detection. The `create_argument_metadata()` and related functions take care -//! of linking the `llvm.dbg.declare` instructions to the correct source -//! locations even while source location emission is still disabled, so there -//! is no need to do anything special with source location handling here. -//! -//! ## Unique Type Identification -//! -//! In order for link-time optimization to work properly, LLVM needs a unique -//! type identifier that tells it across compilation units which types are the -//! same as others. This type identifier is created by -//! `TypeMap::get_unique_type_id_of_type()` using the following algorithm: -//! -//! (1) Primitive types have their name as ID -//! (2) Structs, enums and traits have a multipart identifier -//! -//! (1) The first part is the SVH (strict version hash) of the crate they -//! were originally defined in -//! -//! (2) The second part is the ast::NodeId of the definition in their -//! original crate -//! -//! (3) The final part is a concatenation of the type IDs of their concrete -//! type arguments if they are generic types. -//! -//! (3) Tuple-, pointer and function types are structurally identified, which -//! means that they are equivalent if their component types are equivalent -//! (i.e., (i32, i32) is the same regardless in which crate it is used). -//! -//! This algorithm also provides a stable ID for types that are defined in one -//! crate but instantiated from metadata within another crate. We just have to -//! take care to always map crate and `NodeId`s back to the original crate -//! context. -//! -//! As a side-effect these unique type IDs also help to solve a problem arising -//! from lifetime parameters. Since lifetime parameters are completely omitted -//! in debuginfo, more than one `Ty` instance may map to the same debuginfo -//! type metadata, that is, some struct `Struct<'a>` may have N instantiations -//! with different concrete substitutions for `'a`, and thus there will be N -//! `Ty` instances for the type `Struct<'a>` even though it is not generic -//! otherwise. Unfortunately this means that we cannot use `ty::type_id()` as -//! cheap identifier for type metadata -- we have done this in the past, but it -//! led to unnecessary metadata duplication in the best case and LLVM -//! assertions in the worst. However, the unique type ID as described above -//! *can* be used as identifier. Since it is comparatively expensive to -//! construct, though, `ty::type_id()` is still used additionally as an -//! optimization for cases where the exact same type has been seen before -//! (which is most of the time). diff --git a/compiler/rustc_codegen_llvm/src/debuginfo/mod.rs b/compiler/rustc_codegen_llvm/src/debuginfo/mod.rs index 440e4d505fc92..abb87cb36568e 100644 --- a/compiler/rustc_codegen_llvm/src/debuginfo/mod.rs +++ b/compiler/rustc_codegen_llvm/src/debuginfo/mod.rs @@ -1,5 +1,4 @@ -// See doc.rs for documentation. -mod doc; +#![doc = include_str!("doc.md")] use rustc_codegen_ssa::mir::debuginfo::VariableKind::*; diff --git a/compiler/rustc_codegen_llvm/src/lib.rs b/compiler/rustc_codegen_llvm/src/lib.rs index d11c1592f99d1..146c08337fc77 100644 --- a/compiler/rustc_codegen_llvm/src/lib.rs +++ b/compiler/rustc_codegen_llvm/src/lib.rs @@ -8,6 +8,7 @@ #![feature(bool_to_option)] #![feature(const_cstr_unchecked)] #![feature(crate_visibility_modifier)] +#![feature(extended_key_value_attributes)] #![feature(extern_types)] #![feature(in_band_lifetimes)] #![feature(nll)] From dc240faed5ec55968b2468002d2139e1e44d873b Mon Sep 17 00:00:00 2001 From: Camelid Date: Sat, 20 Mar 2021 14:21:02 -0700 Subject: [PATCH 2/2] Cleanup LLVM debuginfo module docs * Use Markdown list syntax and unindent a bit to prevent Markdown interpreting the nested lists as code blocks * A few more small typographical cleanups --- .../rustc_codegen_llvm/src/debuginfo/doc.md | 29 ++++++++++--------- 1 file changed, 15 insertions(+), 14 deletions(-) diff --git a/compiler/rustc_codegen_llvm/src/debuginfo/doc.md b/compiler/rustc_codegen_llvm/src/debuginfo/doc.md index 485b124ab06ba..f983d09203904 100644 --- a/compiler/rustc_codegen_llvm/src/debuginfo/doc.md +++ b/compiler/rustc_codegen_llvm/src/debuginfo/doc.md @@ -96,9 +96,9 @@ allow to map machine code locations back to source code locations in order to be useful. This functionality is also handled in this module. The following functions allow to control source mappings: -+ set_source_location() -+ clear_source_location() -+ start_emitting_source_locations() ++ `set_source_location()` ++ `clear_source_location()` ++ `start_emitting_source_locations()` `set_source_location()` allows to set the current source location. All IR instructions created after a call to this function will be linked to the @@ -142,21 +142,22 @@ type identifier that tells it across compilation units which types are the same as others. This type identifier is created by `TypeMap::get_unique_type_id_of_type()` using the following algorithm: -(1) Primitive types have their name as ID -(2) Structs, enums and traits have a multipart identifier +1. Primitive types have their name as ID - (1) The first part is the SVH (strict version hash) of the crate they - were originally defined in +2. Structs, enums and traits have a multipart identifier - (2) The second part is the ast::NodeId of the definition in their - original crate + 1. The first part is the SVH (strict version hash) of the crate they + were originally defined in - (3) The final part is a concatenation of the type IDs of their concrete - type arguments if they are generic types. + 2. The second part is the ast::NodeId of the definition in their + original crate -(3) Tuple-, pointer and function types are structurally identified, which - means that they are equivalent if their component types are equivalent - (i.e., (i32, i32) is the same regardless in which crate it is used). + 3. The final part is a concatenation of the type IDs of their concrete + type arguments if they are generic types. + +3. Tuple-, pointer-, and function types are structurally identified, which + means that they are equivalent if their component types are equivalent + (i.e., `(i32, i32)` is the same regardless in which crate it is used). This algorithm also provides a stable ID for types that are defined in one crate but instantiated from metadata within another crate. We just have to