rustc_codegen_llvm: traitification of LLVM-specific CodegenCx and Builder methods #54012
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @matthewjasper (or someone else) soon. If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes. Please see the contribution instructions for more information.
r? @eddyb cc @rust-lang/compiler @sunfishcode
fn type_i32(&self) -> Self::Type;
fn type_i64(&self) -> Self::Type;
fn type_i128(&self) -> Self::Type;
fn type_ix(&self, num_bites: u64) -> Self::Type;
Typo: s/bites/bits/
@bors try
I agree that this shouldn't affect performance, but let's check.
rustc_codegen_llvm: traitification of LLVM-specific CodegenCx and Builder methods This PR is the continuation of #52461 in the grand plan of #45274 to allow for multiple codegen backends. A first attempt at this was #52987 but since @irinagpopa is no longer working on it I'm taking ownership of the PR. The changes are refactoring only and do not affect the logic of the code. Performance should not be impacted since all parametrization is done with generics (no trait objects). The `librustc_codegen_llvm` crate now contains a new folder `interfaces` that uses traits to describe part of how the compiler interfaces with LLVM during codegen. `CodegenCx` and `Builder` implement those traits. Many things are still missing. Not all the calls to LLVM are under a trait yet, and later LLVM-agnostic code should be parametrized.
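To illustrate why parametrizing with generics rather than trait objects should leave performance unchanged, here is a minimal sketch. It is not code from this PR: `TypeMethods` is a cut-down, hypothetical stand-in for the trait reviewed above, and `WidthCx` and `int_types` are made-up names. The helper is monomorphized for each context type, so the method calls are resolved at compile time.

```rust
/// Hypothetical, cut-down stand-in for the type-construction trait
/// reviewed in this PR (the real trait has many more methods).
pub trait TypeMethods {
    type Type;
    fn type_i32(&self) -> Self::Type;
    fn type_ix(&self, num_bits: u64) -> Self::Type;
}

/// Toy backend context (hypothetical): a "type" is just its bit width.
pub struct WidthCx;

impl TypeMethods for WidthCx {
    type Type = u64;

    fn type_i32(&self) -> u64 {
        32
    }

    fn type_ix(&self, num_bits: u64) -> u64 {
        num_bits
    }
}

/// Backend-agnostic helper. Because `Cx` is a generic parameter and not
/// a `dyn` trait object, the compiler emits one copy per backend and
/// dispatches the two calls statically.
pub fn int_types<Cx: TypeMethods>(cx: &Cx) -> (Cx::Type, Cx::Type) {
    (cx.type_i32(), cx.type_ix(7))
}
```

Calling `int_types(&WidthCx)` yields the pair of widths for `i32` and a 7-bit integer; a second backend would only need its own `impl TypeMethods`.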
Let's try this again: @bors try
⌛ Trying commit cf510ad919ecc89f176781932794a94e5d989ebd with merge 63cfaffef84632fc355bfdc48ffc8a3d2f3c46fe...
💥 Test timed out
I don't know why bors says that the tests timed out while the Travis log reports everything green.
I included in this PR the work I've done over the past month, which completes the refactoring. A new weird LLVM ThinLTO bug causes one test to fail (see description of the PR).
I've now fixed the thin-lto test failure here, rebased on master, and opened #55627 to carry this forward.
Closing in favor of #55627.
rustc_target: pass contexts by reference, not value. `LayoutOf` now takes `&self` instead of `self`, and so does every method generic over a context that implements `LayoutOf` and/or other traits, like `HasDataLayout`, `HasTyCtxt`, etc. Originally using by-value `Copy` types was relevant because `TyCtxt` was one of those types, but now `TyCtxt::layout_of` is separate from `LayoutOf`, and `TyCtxt` is not an often-used layout context. Passing these contexts by reference is a lot nicer for miri, which has `self: &mut EvalContext`, and needed `f(&self)` (that is, creating `&&mut EvalContext` references) for layout purposes. Now, the `&mut EvalContext` can be passed to a function expecting `&C`, directly. This should help with #54012 / #55627 (to not need `where &'a T::Cx: LayoutOf` bounds). r? @nikomatsakis or @oli-obk or @nagisa cc @sunfishcode
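The by-reference pattern described in that commit message can be sketched as follows. This is only an illustration under simplifying assumptions: the `LayoutOf` trait here is a toy with a made-up signature (it returns a size in bytes), and `ToyTy`, `size_of_pair`, `step`, and the fields of `EvalContext` are hypothetical names, not the real rustc or miri definitions.

```rust
/// Toy stand-in for a layout query trait (hypothetical signature).
/// The point of interest is that it takes `&self`, not `self`.
pub trait LayoutOf {
    type Ty;
    /// Returns a size in bytes in this sketch.
    fn layout_of(&self, ty: Self::Ty) -> u64;
}

/// Hypothetical toy types to query.
#[derive(Clone, Copy)]
pub enum ToyTy {
    U8,
    Ptr,
}

/// Toy interpreter context, playing the role of miri's `EvalContext`.
pub struct EvalContext {
    pub pointer_size: u64,
    pub queries: u32,
}

impl LayoutOf for EvalContext {
    type Ty = ToyTy;

    fn layout_of(&self, ty: ToyTy) -> u64 {
        match ty {
            ToyTy::U8 => 1,
            ToyTy::Ptr => self.pointer_size,
        }
    }
}

/// Generic over any layout context, taken by shared reference (`&C`).
pub fn size_of_pair<C: LayoutOf>(cx: &C, a: C::Ty, b: C::Ty) -> u64 {
    cx.layout_of(a) + cx.layout_of(b)
}

impl EvalContext {
    /// Here `self` is `&mut EvalContext`. It can be handed to a function
    /// expecting `&C` as-is: the compiler reborrows it as `&EvalContext`,
    /// so no `&&mut EvalContext` reference has to be created.
    pub fn step(&mut self) -> u64 {
        self.queries += 1;
        size_of_pair(self, ToyTy::U8, ToyTy::Ptr)
    }
}
```

With the context passed as `&C`, the bound on the generic function stays a plain `C: LayoutOf`, with no bound on a reference type.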
rustc_codegen_llvm: traitification of LLVM-specific CodegenCx and Builder methods This PR is the continuation of #54012 and earlier PRs, in the grand plan of #45274 to allow for multiple codegen backends. High-level summary: interpose a set of traits between Rust's codegen logic and the LLVM APIs, allowing another backend to implement the traits and share most of the codegen logic. These traits are currently somewhat LLVM-specific, but once this refactoring is in place, they can evolve to be more general. See [this README](https://github.com/rust-lang/rust/blob/756f84d7cef90b7364ae88ca707e59670dde4c92/src/librustc_codegen_ssa/README.md) for a writeup on the current trait organization.
This PR is the continuation of #52461 in the grand plan of #45274 to allow for multiple codegen backends. A first attempt at this was #52987 but since @irinagpopa is no longer working on it I'm taking ownership of the PR.
State of the code before the refactoring
All the code related to the compilation of MIR into LLVM IR was contained inside the `rustc_codegen_llvm` crate. Here is the breakdown of the most important elements:
- the `back` folder (7,800 LOC) implements the mechanisms for creating the different object files and archives through LLVM, as well as the communication mechanisms for parallel code generation;
- the `debuginfo` folder (3,200 LOC) contains all the code that passes debug information down to LLVM;
- the `llvm` folder (2,200 LOC) defines the FFI necessary to communicate with LLVM using the C++ API;
- the `mir` folder (4,300 LOC) implements the actual lowering from MIR to LLVM IR;
- the `base.rs` file (1,300 LOC) contains some helper functions but also the high-level code that launches the code generation and distributes the work;
- the `builder.rs` file (1,200 LOC) contains all the functions generating individual LLVM IR instructions inside a basic block;
- the `common.rs` file (450 LOC) contains various helper functions and all the functions generating LLVM static values;
- the `type_.rs` file (300 LOC) defines most of the type translations to LLVM IR.

The goal of this refactoring is to separate, inside this crate, code that is specific to LLVM from code that can be reused for other rustc backends. For instance, the `mir` folder is almost entirely backend-specific but it relies heavily on other parts of the crate. The separation of the code must affect neither the logic of the code nor its performance.

For these reasons, the separation process involves two transformations that have to be done at the same time for the resulting code to compile: making the types and structures generic, and introducing traits that define the backend's interface. Both are described in the sections below.

While the LLVM-specific code will be left in `rustc_codegen_llvm`, all the new interfaces and backend-agnostic code will be moved into `rustc_codegen_ssa` (name suggestion by @eddyb).

Generic types and structures
@irinagpopa started to parametrize the types of `rustc_codegen_llvm` by a generic `Value` type, implemented in LLVM by a reference `&'ll Value`. This work has been extended to all structures inside the `mir` folder and elsewhere, as well as for LLVM's `BasicBlock` and `Type` types.

The two most important structures for the LLVM codegen are `CodegenCx` and `Builder`. They are parametrized by multiple lifetime parameters and the type for `Value`. `CodegenCx` is used to compile one codegen-unit that can contain multiple functions, whereas `Builder` is created to compile one basic block.

The code in `rustc_codegen_llvm` has to deal with multiple explicit lifetime parameters, which correspond to the following:
- `'tcx` is the longest lifetime, corresponding to the original `TyCtxt` containing the program's information;
- `'a` is a short-lived reference of a `CodegenCx` or another object inside a struct;
- `'ll` is the lifetime of references to LLVM objects such as `Value` or `Type`.

Although there are already many lifetime parameters in the code, making it generic uncovered situations where the borrow-checker was passing only due to the special nature of the LLVM objects manipulated (they are extern pointers). For instance, an additional lifetime parameter had to be added to the definition of `LocalAnalyser` in `analyse.rs`.

However, the two most important structures `CodegenCx` and `Builder` are not defined in the backend-agnostic code. Indeed, their content is highly specific to the backend and it makes more sense to leave their definition to the backend implementor than to allow just a narrow spot via a generic field for the backend's context.

Traits and interface
Because they have to be defined by the backend, `CodegenCx` and `Builder` will be the structures implementing all the traits defining the backend's interface. These traits are defined in the folder `rustc_codegen_ssa/interfaces` and all the backend-agnostic code is parametrized by them. For instance, consider how a function in `base.rs` is parametrized.

Its signature has the three lifetime parameters explained earlier and the master type `Bx`, which satisfies the trait `BuilderMethods` corresponding to the interface satisfied by the `Builder` struct. `BuilderMethods` defines an associated type `Bx::CodegenCx` that itself satisfies the `CodegenMethods` trait implemented by the struct `CodegenCx`. The prototype contains a `where` clause because the `LayoutOf` trait is satisfied by a reference (`&'a Bx::CodegenCx`) to the associated type, and this cannot be specified in the trait definition of `BuilderMethods`. Finally, we have to specify that the associated types inside `LayoutOf` are the actual types of Rust, using the `Ty = Ty<'tcx>` syntax.

On the trait side, `BuilderMethods` is defined in `interfaces/builder.rs`.

Finally, a master structure implementing the `ExtraBackendMethods` trait is used for high-level codegen-driving functions like `codegen_crate` in `base.rs`. For LLVM, it is the empty `LlvmCodegenBackend`. `ExtraBackendMethods` should be implemented by the same structure that implements the `CodegenBackend` defined in `rustc_codegen_utils/codegen_backend.rs`.

During the traitification process, certain functions have been converted from methods of a local structure to methods of `CodegenCx` or `Builder` and a corresponding `self` parameter has been added. Indeed, LLVM stores information internally that it can access when called through its API. This information does not show up in a Rust data structure carried around when these methods are called. However, when implementing a Rust backend for `rustc`, these methods will need information from `CodegenCx`, hence the additional parameter (unused in the LLVM implementation of the trait).

State of the code after the refactoring
The traits offer an API which is very similar to the API of LLVM. This is not the best solution since LLVM has a very special way of doing things: when adding another backend, the trait definitions might be changed in order to offer more flexibility.
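As a rough illustration of the overall shape (a builder trait with an associated context type, and backend-agnostic code generic over the builder), here is a heavily simplified sketch. It is not the code of this PR: the two traits are hypothetical miniatures of `CodegenMethods` and `BuilderMethods` that omit the `'tcx` lifetime and the `LayoutOf` bounds, and `ToyCx`, `ToyBuilder`, `with_cx`, `const_i32`, and `codegen_sum` are made-up names. The toy backend folds constants and records the instructions it would emit.

```rust
/// Hypothetical miniature of the context-side trait.
pub trait CodegenMethods {
    type Value: Copy;
    fn const_i32(&self, v: i32) -> Self::Value;
}

/// Hypothetical miniature of the builder-side trait. The associated
/// type ties a builder to the context type of the same backend.
pub trait BuilderMethods<'a>: Sized {
    type CodegenCx: CodegenMethods + 'a;

    fn with_cx(cx: &'a Self::CodegenCx) -> Self;
    fn cx(&self) -> &'a Self::CodegenCx;
    fn add(
        &mut self,
        lhs: <Self::CodegenCx as CodegenMethods>::Value,
        rhs: <Self::CodegenCx as CodegenMethods>::Value,
    ) -> <Self::CodegenCx as CodegenMethods>::Value;
}

/// Backend-agnostic code: it only names `Bx` and its associated types.
pub fn codegen_sum<'a, Bx: BuilderMethods<'a>>(
    cx: &'a Bx::CodegenCx,
    a: i32,
    b: i32,
) -> <Bx::CodegenCx as CodegenMethods>::Value {
    let mut bx = Bx::with_cx(cx);
    let lhs = bx.cx().const_i32(a);
    let rhs = bx.cx().const_i32(b);
    bx.add(lhs, rhs)
}

/// Toy backend context (hypothetical): values are plain integers.
pub struct ToyCx;

impl CodegenMethods for ToyCx {
    type Value = i64;

    fn const_i32(&self, v: i32) -> i64 {
        i64::from(v)
    }
}

/// Toy backend builder (hypothetical): logs each instruction as text.
pub struct ToyBuilder<'a> {
    cx: &'a ToyCx,
    pub emitted: Vec<String>,
}

impl<'a> BuilderMethods<'a> for ToyBuilder<'a> {
    type CodegenCx = ToyCx;

    fn with_cx(cx: &'a ToyCx) -> Self {
        ToyBuilder { cx, emitted: Vec::new() }
    }

    fn cx(&self) -> &'a ToyCx {
        self.cx
    }

    fn add(&mut self, lhs: i64, rhs: i64) -> i64 {
        self.emitted.push(format!("add {} {}", lhs, rhs));
        lhs + rhs
    }
}
```

A caller picks the backend by naming the builder type, for example `codegen_sum::<ToyBuilder<'_>>(&cx, 2, 3)`. Method names such as `add` mirror an LLVM-style instruction builder, which is the LLVM-shaped aspect of the API mentioned above.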
However, the current separation between backend-agnostic and LLVM-specific code has allowed the reuse of a significant part of the old `rustc_codegen_llvm`. Here is the new LOC breakdown between backend-agnostic (BA) and LLVM for the most important elements:
- `back` folder: 3,800 (BA) vs 4,100 (LLVM);
- `mir` folder: 4,400 (BA) vs 0 (LLVM);
- `base.rs`: 1,100 (BA) vs 250 (LLVM);
- `builder.rs`: 1,400 (BA) vs 0 (LLVM);
- `common.rs`: 350 (BA) vs 350 (LLVM).

The `debuginfo` folder has been left almost untouched by the splitting and is specific to LLVM. Only its high-level features have been traitified.

The new `interfaces` folder has 1,500 LOC only for trait definitions. Overall, the 27,000 LOC-sized old `rustc_codegen_llvm` code has been split into the new 18,500 LOC-sized `rustc_codegen_llvm` and the 12,000 LOC-sized `rustc_codegen_ssa`. We can say that this refactoring allowed the reuse of approximately 10,000 LOC that would otherwise have had to be duplicated between the multiple backends of `rustc`.

The refactored version of `rustc`'s backend introduced no regression in the test suite or in the performance benchmarks, which is consistent with the nature of the refactoring: it uses only compile-time parametricity (no trait objects).

One test is failing after the refactoring: `src/test/run-pass/thinlto/thin-lto-inlines.rs`. It tests whether ThinLTO inlining is working via the indirect property that two raw function pointers are equal. This regression happens after the splitting of the `back` folder introduced by the commit "Separating the back folder between backend-agnostic and LLVM-specifc code". I double-checked that I did not alter the logic of the code when doing the splitting and it seems not, so maybe this is a more subtle LLVM ThinLTO bug like #53912.