From 127447a708b7a0ed876b2558f0ec104fe13a1f49 Mon Sep 17 00:00:00 2001
From: Amanjeev Sethi
Date: Sun, 5 May 2019 15:15:09 -0400
Subject: [PATCH] Added Rustc Debugger Support Chapter

---
 src/SUMMARY.md                    |   1 +
 src/debugging-support-in-rustc.md | 321 ++++++++++++++++++++++++++++++
 2 files changed, 322 insertions(+)
 create mode 100644 src/debugging-support-in-rustc.md

diff --git a/src/SUMMARY.md b/src/SUMMARY.md
index 273b409ec..663357f00 100644
--- a/src/SUMMARY.md
+++ b/src/SUMMARY.md
@@ -84,6 +84,7 @@
 - [Updating LLVM](./codegen/updating-llvm.md)
 - [Debugging LLVM](./codegen/debugging.md)
 - [Profile-guided Optimization](./profile-guided-optimization.md)
+ - [Debugging Support in Rust Compiler](./debugging-support-in-rustc.md)
 
 ---
 
diff --git a/src/debugging-support-in-rustc.md b/src/debugging-support-in-rustc.md
new file mode 100644
index 000000000..1775b07af
--- /dev/null
+++ b/src/debugging-support-in-rustc.md
@@ -0,0 +1,321 @@
+# Debugging support in the Rust compiler
+
+This document explains the state of debugging tool support in the Rust compiler (rustc).
+It gives an overview of debugging tools such as GDB and LLDB, and of the infrastructure
+around the Rust compiler for debugging Rust code. If you want to learn how to debug the Rust
+compiler itself, see the [Debugging the Compiler] page instead.
+
+The material is gathered from the YouTube video [Tom Tromey discusses debugging support in rustc].
+
+## Preliminaries
+
+### Debuggers
+
+According to Wikipedia:
+
+> A [debugger or debugging tool] is a computer program that is used to test and debug
+> other programs (the "target" program).
+
+Writing a debugger from scratch for a language requires a lot of work, especially if
+debuggers have to be supported on various platforms. GDB and LLDB, however, can be
+extended to support debugging a language. This is the path that Rust has chosen.
+This document's main goal is to describe the support for these debuggers in the Rust compiler.
+
+### DWARF
+
+According to the [DWARF] standard website:
+
+> DWARF is a debugging file format used by many compilers and debuggers to support source level
+> debugging. It addresses the requirements of a number of procedural languages,
+> such as C, C++, and Fortran, and is designed to be extensible to other languages.
+> DWARF is architecture independent and applicable to any processor or operating system.
+> It is widely used on Unix, Linux and other operating systems,
+> as well as in stand-alone environments.
+
+A DWARF reader is a program that consumes the DWARF format and creates debugger-compatible output.
+This program may live in the compiler itself. DWARF uses a data structure called the
+Debugging Information Entry (DIE), which stores information as "tags" that denote functions,
+variables, and so on, e.g. `DW_TAG_variable`, `DW_TAG_pointer_type`, `DW_TAG_subprogram`.
+You can also invent your own tags and attributes.
+
+## Supported debuggers
+
+### GDB
+
+We have our own fork of GDB: [https://github.com/rust-dev-tools/gdb]
+
+#### Rust expression parser
+
+To be able to show debug output we need an expression parser.
+The GDB expression parser is written in [Bison] and accepts only a subset of Rust expressions,
+so some expressions that are valid in Rust source cannot be parsed in GDB.
+The GDB parser was written from scratch and has no relation to any other parser.
+For example, it is not related to rustc's parser.
+
+GDB has Rust-like value and type output: it prints values and types in a way
+that looks like Rust syntax. When you print a type with [ptype] in GDB,
+it also looks like Rust source code. Check out the documentation in the [manual for GDB/Rust].
+
+#### Parser extensions
+
+The expression parser has a couple of extensions to support features that cannot be expressed
+in Rust. Some limitations are listed in the [manual for GDB/Rust]. There is some special
+code in the DWARF reader in GDB to support the extensions.
+
+Two examples of the DWARF reader support needed are:
+
+1. Enum: needed to support enum types. rustc writes information about the enum into
+DWARF, and GDB reads the DWARF to determine where the tag field is, whether there is a tag
+field at all, whether the tag slot is shared via the non-zero optimization, and so on.
+
+2. Dissect trait objects: a DWARF extension where the trait object's description in the DWARF
+also points to a stub description of the corresponding vtable, which in turn points to the
+concrete type for which this trait object exists. This means that you can do a `print *object`
+for that trait object, and GDB will understand how to find the correct type of the payload in
+the trait object.
+
+**TODO**: Figure out if the following should be mentioned in the GDB-Rust document rather than
+this guide page so there is no duplication. This is regarding the following comments:
+
+[This comment by Tom](https://github.com/rust-lang/rustc-guide/pull/316#discussion_r284027340)
+> gdb's Rust extensions and limitations are documented in the gdb manual:
+https://sourceware.org/gdb/onlinedocs/gdb/Rust.html -- however, this neglects to mention that
+gdb convenience variables and registers follow the gdb $ convention, and that the Rust parser
+implements the gdb @ extension.
+
+[This question by Aman](https://github.com/rust-lang/rustc-guide/pull/316#discussion_r285401353)
+> @tromey do you think we should mention this part in the GDB-Rust document rather than this
+document so there is no duplication etc.?
+
+#### Developer notes
+
+* This work is now upstream. Bugs can be reported in [GDB Bugzilla].
+
+### LLDB
+
+We have our own fork of LLDB: [https://github.com/rust-lang/lldb]
+
+Fork of the LLVM project: [https://github.com/rust-lang/llvm-project]
+
+LLDB currently only works on macOS because of a dependency issue. This issue was easier to
+solve for macOS than for Linux. However, Tom has a possible solution that could enable
+us to ship LLDB everywhere.
+
+#### Rust expression parser
+
+This expression parser is written in C++. It is a type of [Recursive Descent parser].
+It implements slightly less of the Rust language than GDB. LLDB has Rust-like value and type output.
+
+#### Parser extensions
+
+There is some special code in the DWARF reader in LLDB to support the extensions.
+One example of the DWARF reader support needed is:
+
+1. Enum: needed to support enum types. rustc writes information about the
+enum into DWARF, and LLDB reads the DWARF to determine where the tag field is, whether
+there is a tag field at all, whether the tag slot is shared via the non-zero optimization, etc.
+In other words, it has enum support as well.
+
+#### Developer notes
+
+* None of the LLDB work is upstream. This [rust-lang/lldb wiki page] explains a few details.
+* The reason for forking LLDB is that LLDB recently removed all the other language plugins
+due to lack of maintenance.
+* LLDB has a plugin architecture, but it does not work for language support.
+* LLDB is available via the Rust build (`rustup`).
+* GDB generally works better on Linux.
+
+## DWARF and Rustc
+
+[DWARF] is the standard way compilers generate debugging information that debuggers read.
+It is _the_ debugging format on macOS and Linux. It is a multi-language, extensible format
+and is mostly good enough for Rust's purposes. Hence, the current implementation reuses DWARF's
+concepts. This is true even where some of the concepts in DWARF do not align with Rust
+semantically, because there can generally be some kind of mapping between the two.
+
+The Rust compiler emits, and the debuggers understand, some DWARF extensions that
+are _not_ in the DWARF standard.
+
+* The Rust compiler emits DWARF for a virtual table, and this `vtable` object has a
+  `DW_AT_containing_type` that points to the real type. This lets debuggers dissect a trait object
+  pointer to correctly find the payload.
+  E.g., here's such a DIE, from a test case in the gdb repository:
+
+  ```asm
+  <1><1a9>: Abbrev Number: 3 (DW_TAG_structure_type)
+      <1aa>   DW_AT_containing_type: <0x1b4>
+      <1ae>   DW_AT_name        : (indirect string, offset: 0x23d): vtable
+      <1b2>   DW_AT_byte_size   : 0
+      <1b3>   DW_AT_alignment   : 8
+  ```
+
+* The other extension is that the Rust compiler can emit a tagless discriminated union.
+  See [DWARF feature request] for this item.
+
+### Current limitations of DWARF
+
+* Traits: representing traits in DWARF requires a bigger change to DWARF than usual.
+* DWARF provides no way to differentiate between structs and tuples. The Rust compiler emits
+fields with names like `__0`, and debuggers look for a sequence of such names to overcome this.
+Without special handling, the debugger would access such a field as `x.__0` instead of `x.0`.
+The Rust parser in the debugger resolves this, so you can now write `x.0`.
+
+DWARF relies on debuggers to know some information about the platform ABI.
+Rust does not do that all the time.
+
+## Developer notes
+
+This section is from the talk about certain aspects of development.
+
+## What is missing
+
+### Shipping GDB in Rustup
+
+Tracking issue: [https://github.com/rust-lang/rust/issues/34457]
+
+Shipping GDB requires a change to the Rustup delivery system. To manage Rustup build size and
+times, we need to build GDB separately and somehow provide the resulting artifacts
+for inclusion in the final build. However, if we can ship GDB with rustup, it will simplify
+the development process: the compiler can emit new debug info that can be readily consumed.
+
+The main issue in achieving this is setting up dependencies. One such dependency is Python. That
+is why we have our own fork of GDB: one of the drivers is patched on Rust's side to
+check for the correct version of Python (Python 2.7 in this case. *Note: Python3 is not chosen
+for this purpose because Python's stable ABI is limited and is not sufficient for GDB's needs.
+See [https://docs.python.org/3/c-api/stable.html]*).
+
+Having our own fork keeps debugger updates as fast as our changes to the debugging symbols.
+In essence, we can ship the debugger as soon as new debugging info is added. GDB only releases
+every six months or so. However, changes that are
+not related to Rust itself should ideally be merged upstream first.
+
+### Code signing for LLDB debug server on macOS
+
+According to Wikipedia, [System Integrity Protection] is
+
+> System Integrity Protection (SIP, sometimes referred to as rootless) is a security feature
+> of Apple's macOS operating system introduced in OS X El Capitan. It comprises a number of
+> mechanisms that are enforced by the kernel. A centerpiece is the protection of system-owned
+> files and directories against modifications by processes without a specific "entitlement",
+> even when executed by the root user or a user with root privileges (sudo).
+
+SIP prevents processes from using the `ptrace` syscall. A process that wants to use `ptrace` has
+to be code signed, and the certificate that signs it has to be trusted on your machine.
+
+See [Apple developer documentation for System Integrity Protection].
+
+We may need to sign up with Apple and get the keys to do this signing. Tom has looked into this,
+and Mozilla cannot do it because it is already at the maximum number of
+keys it is allowed to sign. Tom does not know whether Mozilla could get more keys.
+
+Alternatively, Tom suggests that maybe a Rust legal entity is needed to get the keys via Apple.
+This problem is not technical in nature. If we had such a key we could sign GDB as well and
+ship that.
+
+### DWARF and Traits
+
+Rust traits are not emitted into DWARF at all. As a result, calling a method as `x.method()`
+does not work as is when the method is implemented by a trait, as opposed
+to a type. That information is not present, so trait methods cannot be found.
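To make the distinction concrete, here is a minimal sketch in Rust. The names `Shape`, `Circle`, `radius_doubled`, and `area` are hypothetical and chosen only for illustration. `radius_doubled` is an inherent method that belongs to the type `Circle` itself, while `area` is reachable only through the trait `Shape`. Going by the description above, it is the second kind of call whose resolution needs trait information that is not present in DWARF.

```rust
// Hypothetical trait and type, used only to illustrate the distinction.
trait Shape {
    fn area(&self) -> f64;
}

struct Circle {
    radius: f64,
}

impl Circle {
    // Inherent method: associated directly with the type `Circle`.
    fn radius_doubled(&self) -> f64 {
        self.radius * 2.0
    }
}

impl Shape for Circle {
    // Trait method: `c.area()` resolves in source code only because the
    // compiler knows that `Circle` implements `Shape`.
    fn area(&self) -> f64 {
        std::f64::consts::PI * self.radius * self.radius
    }
}

fn main() {
    let c = Circle { radius: 1.0 };
    println!("{}", c.radius_doubled());
    println!("{}", c.area());
}
```

In the compiler, `c.area()` is resolved by looking up the `Shape` impl for `Circle`; a debugger working only from the emitted debug info has no record of that impl to consult.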
+
+DWARF has a notion of interface types (possibly added for Java). Tom's idea was to use this
+interface type to represent traits.
+
+DWARF only deals with concrete names, not the reference types. So, a given implementation of a
+trait for a type would be one of these interfaces (`DW_TAG_interface_type`). Also, the type for
+which it is implemented would describe all the interfaces this type implements. This requires a
+DWARF extension.
+
+Issue on GitHub: [https://github.com/rust-lang/rust/issues/33014]
+
+## Typical process for a Debug Info change (LLVM)
+
+LLVM has Debug Info (DI) builders. This is the primary thing that Rust calls into.
+rustc does not emit DWARF directly; it emits LLVM debug info first, so LLVM has to change first.
+Debug info is a kind of metadata that you construct and hand off to LLVM. For the rustc/LLVM
+hand-off, some LLVM DI builder methods are called to construct the representation of a type.
+
+The steps of this process are as follows:
+
+1. LLVM needs changing.
+
+    LLVM does not emit interface types at all, so this needs to be implemented in LLVM first.
+
+    Get sign-off from the LLVM maintainers that this is a good idea.
+
+2. Change the DWARF extension.
+
+3. Update the debuggers.
+
+    Update DWARF readers, expression evaluators.
+
+4. Update Rust compiler.
+
+    Change it to emit this new information.
+
+### Procedural macro stepping
+
+A deep question is: how do you actually debug a procedural macro?
+What is the location you emit for a macro expansion? Consider some of the following cases:
+
+* You can emit the location of the invocation of the macro.
+* You can emit the location of the definition of the macro.
+* You can emit locations of the content of the macro.
+
+RFC: [https://github.com/rust-lang/rfcs/pull/2117]
+
+The focus is to let macros decide what to do. This can be achieved with some kind of attribute
+that lets the macro tell the compiler where the line marker should be.
+This affects where you set breakpoints and what happens when you step.
+
+## Future work
+
+### Name mangling changes
+
+* New demangler in `libiberty` (gcc source tree).
+* New demangler in LLVM or LLDB.
+
+**TODO**: Check the location of the demangler source.
+[Question on GitHub](https://github.com/rust-lang/rustc-guide/pull/316#discussion_r283062536).
+
+### Reuse Rust compiler for expressions
+
+This is an important idea because debuggers by and large do not try to implement type
+inference. You need to be much more explicit when you type into the debugger than in your
+actual source code. So you cannot just copy and paste an expression from your source
+code into the debugger and expect the same answer, though that would be nice. Using the
+compiler to evaluate expressions can help with this.
+
+It is certainly doable, but it is a large project. You need a bridge to the
+debugger because the debugger alone has access to the memory. Both GDB (GCC) and LLDB (Clang)
+have this feature: LLDB uses Clang to compile code for a JIT, and GDB can do the same with GCC.
+
+The expression evaluators in both debuggers implement both a superset and a subset of Rust.
+They implement just the expression language, but they also add some extensions; for example, GDB
+has convenience variables. Therefore, if you take this route, you not only need to build this
+bridge but may also have to add a mode that lets the compiler understand those extensions.
+
+### Windows debugging (PDB) is missing
+
+This is a complete unknown.
+
+[Tom Tromey discusses debugging support in rustc]: https://www.youtube.com/watch?v=elBxMRSNYr4
+[Debugging the Compiler]: compiler-debugging.md
+[debugger or debugging tool]: https://en.wikipedia.org/wiki/Debugger
+[Bison]: https://www.gnu.org/software/bison/
+[ptype]: https://ftp.gnu.org/old-gnu/Manuals/gdb/html_node/gdb_109.html
+[rust-lang/lldb wiki page]: https://github.com/rust-lang/lldb/wiki
+[DWARF]: http://dwarfstd.org
+[manual for GDB/Rust]: https://sourceware.org/gdb/onlinedocs/gdb/Rust.html
+[GDB Bugzilla]: https://sourceware.org/bugzilla/
+[Recursive Descent parser]: https://en.wikipedia.org/wiki/Recursive_descent_parser
+[System Integrity Protection]: https://en.wikipedia.org/wiki/System_Integrity_Protection
+[https://github.com/rust-dev-tools/gdb]: https://github.com/rust-dev-tools/gdb
+[DWARF feature request]: http://dwarfstd.org/ShowIssue.php?issue=180517.2
+[https://docs.python.org/3/c-api/stable.html]: https://docs.python.org/3/c-api/stable.html
+[https://github.com/rust-lang/rfcs/pull/2117]: https://github.com/rust-lang/rfcs/pull/2117
+[https://github.com/rust-lang/rust/issues/33014]: https://github.com/rust-lang/rust/issues/33014
+[https://github.com/rust-lang/rust/issues/34457]: https://github.com/rust-lang/rust/issues/34457
+[Apple developer documentation for System Integrity Protection]: https://developer.apple.com/library/archive/releasenotes/MacOSX/WhatsNewInOSX/Articles/MacOSX10_11.html#//apple_ref/doc/uid/TP40016227-SW11
+[https://github.com/rust-lang/lldb]: https://github.com/rust-lang/lldb
+[https://github.com/rust-lang/llvm-project]: https://github.com/rust-lang/llvm-project