Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[2025H1] Research: How to achieve safety when linking separately compiled code #155

Merged
merged 1 commit into from
Dec 2, 2024
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
123 changes: 123 additions & 0 deletions src/2025h1/safe-linking.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,123 @@
# Research: How to achieve safety when linking separately compiled code

| Metadata | |
| -------- | ------------------------------------------------------------ |
| Owner(s) | [Mara Bos](https://github.com/m-ou-se) and [Jonathan Dönszelman](https://github.com/jdonszelmann) |
| Teams | [lang], [compiler] |
| Status | Proposed |

## Summary

Research what "safety" and "unsafety" means when dealing with separately compiled code, like when loading dynamically linked libraries.
Specifically, figure out how it'd be possible to provide any kind of safety when linking external code (such as loading a dynamic library or linking a separately compiled static library).

## Motivation

Rust has a very clear definition of "safe" and "unsafe" and (usually) makes it easy to stay in the "safe" world.
`unsafe` blocks usually only have to encapsulate very small blocks of which one can (and should) prove soundness manually.

When using `#[no_mangle]` and/or `extern { … }` to connect separately compiled code, however, any concept safety pretty much disappears.

While it might be reasonable to make some assumptions about (standardized) symbols like `strlen`,
the _unsafe assumption_ that a symbol with the same name will refer to something of the expected signature
is _not_ something that one can _prove at compile time_, but is rather an (hopeful, perhaps reasonable) expectation of
the contents of dynamic libraries available at runtime.

The end result is that for use cases like plugins, we have no option than just `unsafe`ly hoping for the best,
accepting that we cannot perfectly guarantee that undefined behavior is impossible linking/loading a library/a plugin/some external code.

### The status quo

Today, combining separately compiled code (from different Rust projects or different languages)
is done through a combination of `extern "…" fn`, `#[repr(…)]`, `#[no_mangle]`, and `extern {…}`.

Specifically:

1. `extern "…" fn` (which has a bit of a confusing name) is used to specify the _calling convention_ or _ABI_ of a function.

The default one is the `"Rust"` ABI, which (purposely) has no stability guarantees.
The `"C"` ABI is often used for its stability guarantees, but places restrictions on the possible signatures.

2. `#[repr(…)]` is used to control _memory layout_.

The default one is the `Rust` layout, which (purposely) has no stability guarantees.
The `C` layout is often used for its stability guarantees, but places restrictions on the types.

3. `#[no_mangle]` and `extern {…}` are used to control the _symbols_ used for _linking_.

`#[no_mangle]` is used for _exporting_ an item under a known symbol,
and `extern { … }` is used for _importing_ an item with a known symbol.

There have often been requests for a "stable Rust abi" which usually refers to a _calling convention_ and _memory layout_ that is
as unrestrictive as `extern "Rust" fn` and `#[repr(Rust)]`, but as stable as `extern "C" fn` and `#[repr(C)]`.

It seems unlikely that `extern "Rust" fn` and `#[repr(Rust)]` would ever come with stablity guarantees, as allowing for changes when stability is not necessary has its benefits.
It seems most likely that a "stable Rust ABI" will arrive in the form of a _new_ ABI,
by adding some kind of `extern "Rust-stable-v1"` (and `repr`) or similar
(such as `extern "crabi" fn` and `#[repr(crabi)]` [proposed here](https://github.com/rust-lang/rust/pull/105586)),
or by slowly extending `extern "C" fn` and `#[repr(C)]` to support more types (like tuples and slices, etc.).

Such developments would lift restrictions on which types one can use in FFI, but just a stable calling convention and memory layout will do almost nothing for safety,
as linking/loading a symbol (possibly at runtime) with a different signature (or ABI) than expected will still immediately lead to undefined behavior.

### Research question and scope

This research project focusses entirely on point 3 above: symbols and linking.

The main research question is:

_**What is necessary for an alternative for `#[no_mangle]` and `extern { … }` to be safe, with a reasonable and usable definition of "safe"?**_

We believe this question can be answered independently of the specifics of a stable calling convention (point 1) and memory layout (point 2).

[RFC3435 "#[export]" for dynamically linked crates](https://github.com/rust-lang/rfcs/pull/3435) proposes one possible way to provide safety in dynamic linking.
The goal of the research is to explore the entire solution space and understand the requirements and limitations that would apply to any possible solution/alternative.

### The next 6 months

- Assemble a small research team (e.g. an MSc student, a professor, and a researcher/mentor).
- Acquire funding.
- Run this as an academic research project.
- Publish intermediate results as a blog post.
- (After ~9 months) Publish a thesis and possibly a paper that answers the research question.

### The "shiny future" we are working towards

The future we're working towards is one where (dynamically) linking separately compiled code (e.g. plugins, libraries, etc.)
will feel like a first class Rust feature that is both safe and ergonomic.

Depending on the outcomes of the research, this can provide input and design requirements for future (stable) ABIs, and potentially pave the way for
safe cross-language linking.

## Design axioms

- Any design is either fully safe, or makes it possible to encapsulate the unsafety in a way that allows one to prove soundness (to reasonable extend).
- Any design allows for combining code compiled with different versions of the Rust compiler.
- Any design is usable for statically linking separately (pre) compiled static libraries, dynamically linking/loading libraries, and dynamically loading plugins.
- Designs require as little assumptions about the calling convention and memory layout.
Ideally, the only requirement is that they are stable, which means that the design can be used with the existing `extern "C" fn` and `#[repr(C)]`.

## Ownership and team asks

**Owner:** [Mara Bos](https://github.com/m-ou-se) and/or [Jonathan Dönszelman](https://github.com/jdonszelmann).

| Subgoal | Owner(s) or team(s) | Notes |
| ---------------------------------------------- | ----------------------- | ----- |
| Coordination with university | Jonathan | Delft University of Technology |
| Acquire funding | Hexcat (=Mara+Jonathan) | |
| Research | Research team (MSc student, professor, etc.) | |
| Mentoring and interfacing with Rust project | Mara, Jonathan | |
| Input and discussion on concept of safety | ![Team][] [lang] | |
| Blog post (author, review) | MSc student, Jonathan, Mara | |
| Feedback on conclusions | ![Team][] [lang] | also the Rust community (users) |
| Experimental implementation | Msc student | |
| ↳ Mentoring | Jonathan, Mara | |
| ↳ Review/accept lang experiment | ![Team][] [lang] | |
| ↳ Reviews/feedback | ![Team][] [compiler] | |
| Thesis / Paper | Research team (MSc student, professor, etc.) | |

## Frequently asked questions

### Is there a university and professor interested in this?

Yes! We've discussed this with a professor at the Delft University at Technology, who is excited and already looking for interested students.