diff --git a/text/2386-used.md b/text/2386-used.md new file mode 100644 index 00000000000..153be68844c --- /dev/null +++ b/text/2386-used.md @@ -0,0 +1,245 @@ +- Feature Name: `used` +- Start Date: 2018-04-03 +- RFC PR: [rust-lang/rfcs#2386](https://github.com/rust-lang/rfcs/pull/2386) +- Rust Issue: [rust-lang/rust#40289](https://github.com/rust-lang/rust/issues/40289) + +# Summary +[summary]: #summary + +Stabilize the `#[used]` attribute which is used to force the compiler to keep static variables, +even if not referenced by any other part of the program, in the output object file. + +# Motivation +[motivation]: #motivation + +Bare metal applications, like kernels, bootloaders and other firmware, usually need precise control +over the memory layout of the program. These programs usually need to place data structures like +vector (interrupt) tables in certain memory locations for the system to operate properly. + +The final memory layout of the program is decided by the linker; bare metal applications make use of +*linker scripts* to control the placement of (linker) *sections* in memory. But for all this to work +the vector table must be present in the object files passed to the linker. That's where the +`#[used]` attribute comes in: without it the compiler will optimize away the vector table, as it's +not directly used by the program, and it will never reach the linker. + +It's possible to work around the lack of the `#[used]` attribute by declaring the vector table as +public: + +``` rust +// public items are exposed in the object file +#[link_section = ".vector_table.exceptions"] +pub static EXCEPTIONS: [extern "C" fn(); 14] = [/* .. */]; +``` + +But this is brittle because the compiler can still optimize the symbol away when compiling with LTO +enabled -- with LTO the compiler has global knowledge about the program, and will see that +`EXCEPTIONS` is unused by the program and discard it. + +Yet another workaround is to force a volatile load of the vector table in some part of the program, +usually before main. The compiler will always keep the vector table in this case but this +alternative incurs in the cost of a load operation that will never be optimized away by the +compiler. + +``` rust +#[link_section = ".vector_table.exceptions"] +static EXCEPTIONS: [extern "C" fn(); 14] = [/* .. */]; + +// entry point of the firmware +fn reset() -> ! { + extern "C" { + // user entry point + fn main() -> !; + } + + // this operation will never be optimized away + unsafe { ptr::read_volatile(&EXCEPTIONS[0]) }; + + main() +} +``` + +The proper solution to keeping the vector table is to mark the vector table as a *used* variable to +force the compiler to keep in one of the emitted object files. + +``` rust +#[used] // will be present in the object file +#[link_section = ".vector_table.exceptions"] +static EXCEPTIONS: [extern "C" fn(); 14] = [/* .. */]; +``` + +# Guide-level explanation +[guide-level-explanation]: #guide-level-explanation + +We can think of the compilation process performed by `rustc` as a two stage process. First, `rustc` +compiles a crate (source code) into *object files*, then `rustc` invokes the linker on those object +files to produce a single *executable*, or shared library (e.g. `.so`) if the crate type was set to +"cdylib". + +The `#[used]` attribute can be applied to static variables to keep them in the *object files* +produced by `rustc`, even in the presence of LTO. Note that this does **not** mean that the static +variable will make its way into the binary file emitted by the linker as the linker is free to drop +symbols that it deems unused. In other words, the `#[used]` attribute does **not** affect the +behavior of the linker. + +Consider the following program: + +``` rust +#[used] +static FOO: u32 = 0; +static BAR: u32 = 0; + +fn main() {} +``` + +The variable `FOO` marked with the `#[used]` attribute will be kept in the emitted object file +regardless of the optimization level. On the other hand, the unused variable `BAR` is always +optimized away. + +``` console +$ cargo rustc -- --emit=obj # for simplicity incr. comp. has been disabled +$ nm -C $(find target -name '*.o') +(..) +0000000000000000 r foo::FOO +0000000000000000 t foo::main +0000000000000000 T std::rt::lang_start +(..) + +$ cargo clean; cargo rustc --release -- +$ nm -C $(find target -name '*.o') +0000000000000000 T main +0000000000000000 r foo::FOO +0000000000000000 t foo::main +0000000000000000 T std::rt::lang_start +(..) + +$ cargo clean; cargo rustc --release -- --emit=obj -C lto +$ nm -C $(find target -name '*.o') +(..) +0000000000000000 r foo::FOO +0000000000000000 t foo::main +(..) +``` + +`FOO` never makes it to the final executable because the linker sees that the call graph that stems +from the user entry point `main` never makes use of `FOO` and discards it. + +``` console +$ cargo clean; cargo build +$ nm -C target/debug/foo | grep FOO || echo not found +not found +``` + +To keep `FOO` in the final binary assistance from the linker is required; this usually means writing +a linker script. + +Consider the following program: + +``` rust +#[used] +#[link_section = ".init_array"] +static FOO: extern "C" fn() = before_main; + +extern "C" fn before_main() { + println!("Hello") +} + +fn main() { + println!("World") +} +``` + +When dealing with ELF files the `.init_array` section will usually be kept in the final binary by +the default linker script. If the system supports it, all function pointers stored in the +`.init_array` section will be called before entering `main`. Thus, the above program prints "Hello" +and then "World" to the console when run on a *nix system. + +``` console +$ cargo run --release +Hello +World + +$ nm -C target/release/foo | grep FOO +000000000026b620 t foo::FOO +``` + +If the `#[used]` attribute is removed from the source code then only "World" is printed to the +console as the `FOO` variable will get optimized away by the compiler. + +# Reference-level explanation +[reference-level-explanation]: #reference-level-explanation + +The `#[used]` attribute can only be used on static variables. Static variables marked with this +attribute will be appended to the special `@llvm.used` global variable when lowered to LLVM IR. + +``` rust +#[used] +static FOO: u32 = 0; + +fn main() {} +``` + +``` console +$ cargo clean; cargo rustc -- --emit=llvm-ir +$ grep llvm.used $(find -name '*.ll') +@llvm.used = appending global [1 x i8*] [i8* getelementptr inbounds (<{ [4 x i8] }>, <{ [4 x i8] }>* @_ZN3foo3FOO17hf0af6b03a826c578E, i32 0, i32 0, i32 0)], section "llvm.metadata" +``` + +The semantics of this operation are (quoting [LLVM reference][llvm]): + +[llvm]: https://llvm.org/docs/LangRef.html#the-llvm-used-global-variable + +> If a symbol appears in the @llvm.used list, then the compiler, assembler, ~and linker~ are +> required to treat the symbol as if there is a reference to the symbol that it cannot see (which is +> why they have to be named). For example, if a variable has internal linkage and no references +> other than that from the @llvm.used list, it cannot be deleted. This is commonly used to represent +> references from inline asms and other things the compiler cannot “see”, and corresponds to +> “attribute((used))” in GNU C. + +*strikethrough added by the author* + +The part about the linker is not true (\*): from the point of view of the linker static variables +marked with `#[used]` look exactly the same as variables that have not been marked with that +attribute -- those are the implemented LLVM semantics. Also ELF object files have no mechanism to +prevent the linker from dropping its symbols if they are not referenced by other object files. + +(\*) unless "linker" is actually referring to `llvm-link` (?) + +# Drawbacks +[drawbacks]: #drawbacks + +This is yet another low level feature that alternative `rustc` implementations would have to +implement to be 100% compatible with the official LLVM based `rustc` implementation. Also see +`#[repr(align = "*")]`, `#[repr(*)]`, `#[link_section]`, etc. + +# Rationale and alternatives +[alternatives]: #alternatives + +## Chosen design + +This design pretty much matches how C compilers implement this feature. See prior art section below. + +## Not doing this + +Not doing this means that people will continue to use the brittle workarounds presented in the +motivation section. + +# Prior art +[prior-art]: #prior-art + +Most compilers provide a feature with the exact same semantics: usually in the form of a "used" +attribute (e.g. `__attribute(used)__`) that can be applied to static variables. + +The following C code is an example from the [KEIL toolchain documentation][keil]: + +[keil]: http://www.keil.com/support/man/docs/armcc/armcc_chr1359124983230.htm + +``` c +static int lose_this = 1; +static int keep_this __attribute__((used)) = 2; // retained in object file +static int keep_this_too __attribute__((used)) = 3; // retained in object file +``` + +# Unresolved questions +[unresolved]: #unresolved-questions + +None so far.