Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: #[used] static variables #2386

Merged
merged 2 commits into from
May 27, 2018
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
245 changes: 245 additions & 0 deletions text/2386-used.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,245 @@
- Feature Name: `used`
- Start Date: 2018-04-03
- RFC PR: [rust-lang/rfcs#2386](https://github.com/rust-lang/rfcs/pull/2386)
- Rust Issue: [rust-lang/rust#40289](https://github.com/rust-lang/rust/issues/40289)

# Summary
[summary]: #summary

Stabilize the `#[used]` attribute which is used to force the compiler to keep static variables,
even if not referenced by any other part of the program, in the output object file.

# Motivation
[motivation]: #motivation

Bare metal applications, like kernels, bootloaders and other firmware, usually need precise control
over the memory layout of the program. These programs usually need to place data structures like
vector (interrupt) tables in certain memory locations for the system to operate properly.

The final memory layout of the program is decided by the linker; bare metal applications make use of
*linker scripts* to control the placement of (linker) *sections* in memory. But for all this to work
the vector table must be present in the object files passed to the linker. That's where the
`#[used]` attribute comes in: without it the compiler will optimize away the vector table, as it's
not directly used by the program, and it will never reach the linker.

It's possible to work around the lack of the `#[used]` attribute by declaring the vector table as
public:

``` rust
// public items are exposed in the object file
#[link_section = ".vector_table.exceptions"]
pub static EXCEPTIONS: [extern "C" fn(); 14] = [/* .. */];
```

But this is brittle because the compiler can still optimize the symbol away when compiling with LTO
enabled -- with LTO the compiler has global knowledge about the program, and will see that
`EXCEPTIONS` is unused by the program and discard it.

Yet another workaround is to force a volatile load of the vector table in some part of the program,
usually before main. The compiler will always keep the vector table in this case but this
alternative incurs in the cost of a load operation that will never be optimized away by the
compiler.

``` rust
#[link_section = ".vector_table.exceptions"]
static EXCEPTIONS: [extern "C" fn(); 14] = [/* .. */];

// entry point of the firmware
fn reset() -> ! {
extern "C" {
// user entry point
fn main() -> !;
}

// this operation will never be optimized away
unsafe { ptr::read_volatile(&EXCEPTIONS[0]) };

main()
}
```

The proper solution to keeping the vector table is to mark the vector table as a *used* variable to
force the compiler to keep in one of the emitted object files.

``` rust
#[used] // will be present in the object file
#[link_section = ".vector_table.exceptions"]
static EXCEPTIONS: [extern "C" fn(); 14] = [/* .. */];
```

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

We can think of the compilation process performed by `rustc` as a two stage process. First, `rustc`
compiles a crate (source code) into *object files*, then `rustc` invokes the linker on those object
files to produce a single *executable*, or shared library (e.g. `.so`) if the crate type was set to
"cdylib".

The `#[used]` attribute can be applied to static variables to keep them in the *object files*
produced by `rustc`, even in the presence of LTO. Note that this does **not** mean that the static
variable will make its way into the binary file emitted by the linker as the linker is free to drop
symbols that it deems unused. In other words, the `#[used]` attribute does **not** affect the
behavior of the linker.

Consider the following program:

``` rust
#[used]
static FOO: u32 = 0;
static BAR: u32 = 0;

fn main() {}
```

The variable `FOO` marked with the `#[used]` attribute will be kept in the emitted object file
regardless of the optimization level. On the other hand, the unused variable `BAR` is always
optimized away.

``` console
$ cargo rustc -- --emit=obj # for simplicity incr. comp. has been disabled
$ nm -C $(find target -name '*.o')
(..)
0000000000000000 r foo::FOO
0000000000000000 t foo::main
0000000000000000 T std::rt::lang_start
(..)

$ cargo clean; cargo rustc --release --
$ nm -C $(find target -name '*.o')
0000000000000000 T main
0000000000000000 r foo::FOO
0000000000000000 t foo::main
0000000000000000 T std::rt::lang_start
(..)

$ cargo clean; cargo rustc --release -- --emit=obj -C lto
$ nm -C $(find target -name '*.o')
(..)
0000000000000000 r foo::FOO
0000000000000000 t foo::main
(..)
```

`FOO` never makes it to the final executable because the linker sees that the call graph that stems
from the user entry point `main` never makes use of `FOO` and discards it.

``` console
$ cargo clean; cargo build
$ nm -C target/debug/foo | grep FOO || echo not found
not found
```

To keep `FOO` in the final binary assistance from the linker is required; this usually means writing
a linker script.

Consider the following program:

``` rust
#[used]
#[link_section = ".init_array"]
static FOO: extern "C" fn() = before_main;

extern "C" fn before_main() {
println!("Hello")
}

fn main() {
println!("World")
}
```

When dealing with ELF files the `.init_array` section will usually be kept in the final binary by
the default linker script. If the system supports it, all function pointers stored in the
`.init_array` section will be called before entering `main`. Thus, the above program prints "Hello"
and then "World" to the console when run on a *nix system.

``` console
$ cargo run --release
Hello
World

$ nm -C target/release/foo | grep FOO
000000000026b620 t foo::FOO
```

If the `#[used]` attribute is removed from the source code then only "World" is printed to the
console as the `FOO` variable will get optimized away by the compiler.

# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

The `#[used]` attribute can only be used on static variables. Static variables marked with this
attribute will be appended to the special `@llvm.used` global variable when lowered to LLVM IR.

``` rust
#[used]
static FOO: u32 = 0;

fn main() {}
```

``` console
$ cargo clean; cargo rustc -- --emit=llvm-ir
$ grep llvm.used $(find -name '*.ll')
@llvm.used = appending global [1 x i8*] [i8* getelementptr inbounds (<{ [4 x i8] }>, <{ [4 x i8] }>* @_ZN3foo3FOO17hf0af6b03a826c578E, i32 0, i32 0, i32 0)], section "llvm.metadata"
```

The semantics of this operation are (quoting [LLVM reference][llvm]):

[llvm]: https://llvm.org/docs/LangRef.html#the-llvm-used-global-variable

> If a symbol appears in the @llvm.used list, then the compiler, assembler, ~and linker~ are
> required to treat the symbol as if there is a reference to the symbol that it cannot see (which is
> why they have to be named). For example, if a variable has internal linkage and no references
> other than that from the @llvm.used list, it cannot be deleted. This is commonly used to represent
> references from inline asms and other things the compiler cannot “see”, and corresponds to
> “attribute((used))” in GNU C.

*strikethrough added by the author*

The part about the linker is not true (\*): from the point of view of the linker static variables
marked with `#[used]` look exactly the same as variables that have not been marked with that
attribute -- those are the implemented LLVM semantics. Also ELF object files have no mechanism to
prevent the linker from dropping its symbols if they are not referenced by other object files.

(\*) unless "linker" is actually referring to `llvm-link` (?)

# Drawbacks
[drawbacks]: #drawbacks

This is yet another low level feature that alternative `rustc` implementations would have to
implement to be 100% compatible with the official LLVM based `rustc` implementation. Also see
`#[repr(align = "*")]`, `#[repr(*)]`, `#[link_section]`, etc.

# Rationale and alternatives
[alternatives]: #alternatives

## Chosen design

This design pretty much matches how C compilers implement this feature. See prior art section below.

## Not doing this

Not doing this means that people will continue to use the brittle workarounds presented in the
motivation section.

# Prior art
[prior-art]: #prior-art

Most compilers provide a feature with the exact same semantics: usually in the form of a "used"
attribute (e.g. `__attribute(used)__`) that can be applied to static variables.

The following C code is an example from the [KEIL toolchain documentation][keil]:

[keil]: http://www.keil.com/support/man/docs/armcc/armcc_chr1359124983230.htm

``` c
static int lose_this = 1;
static int keep_this __attribute__((used)) = 2; // retained in object file
static int keep_this_too __attribute__((used)) = 3; // retained in object file
```

# Unresolved questions
[unresolved]: #unresolved-questions

None so far.