#[distributed_slice] aka a method to enumerate tests #3

Open
Tracked by #2
epage opened this issue Feb 7, 2024 · 22 comments
Labels
S-needs-team-input Status: Needs input from team on whether/how to proceed.

Comments

@epage

epage commented Feb 7, 2024

We need a method for custom test harnesses to enumerate tests. dtolnay (maintainer of linkme and ctor) was pushing for an in-language #[distributed_slice].

@epage epage added the S-needs-team-input Status: Needs input from team on whether/how to proceed. label Feb 7, 2024
@epage epage added this to the custom test harness milestone Feb 7, 2024
@epage epage mentioned this issue Feb 7, 2024
@epage

epage commented Feb 7, 2024

Summary of Past Work

Past discussions

Use cases

Prior art

Potential RFC collaborators

Design ideas

New language syntax:

partial const ALL_THINGS: &[Thing] = &[..];
// "[..]" means this is a definition of a partial slice constant.
// It's a compilation error to have more than one definition.

partial const ALL_THINGS = &[.., Thing(1), ..];
// "[.., x, ..]" means it's an extension or a partial slice constant.
// It has to be defined elsewhere.

partial const ALL_THINGS = &[.., Thing(2), ..];
// Another extension of the same constant.

Attribute

// crate `foo`

// extendable static should be public and have type `&[T]`
#[distributed_slice]
pub static ERROR_MSGS: &'static [(u32, &'static str)] = &[
    (ERROR_CODE_1, ERROR_MSG_1),
    (ERROR_CODE_2, ERROR_MSG_2),
];
// crate `bar` which depends on crate `foo`
use foo::ERROR_MSGS;

// if extended static has type `&[T]`, then this const should have type `T`
#[distributed_slice(ERROR_MSGS)]
const CUSTOM_ERROR: (u32, &'static str) = (0x123456, "custom err message");
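For illustration, consuming the merged static under this proposed design would look like ordinary slice access (a sketch only; the attribute does not exist today, and lookup is an invented helper):

// Sketch: under the proposed design, foo::ERROR_MSGS would contain foo's base
// entries plus every #[distributed_slice(ERROR_MSGS)] extension, such as
// bar's CUSTOM_ERROR, and would read like an ordinary slice.
fn lookup(code: u32) -> Option<&'static str> {
    foo::ERROR_MSGS
        .iter()
        .find(|(c, _)| *c == code)
        .map(|(_, msg)| *msg)
}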

@jdonszelmann

jdonszelmann commented May 9, 2024

I spent my evening doing a tiny bit of research, expanding on the points you'd already mentioned. Mostly as an overview for myself, but I thought making those thoughts public wouldn't hurt.

Use cases

  1. Custom test harnesses
    • This is essentially required if you want to run tests on embedded systems. Current libtest doesn't support this, but with the experimental custom test harnesses feature you can enumerate all tests and, for example, call them manually after boot.
  2. Language binding registration
    • PyO3 currently uses inventory for its multiple-pymethods feature. Using a macro you can define Python methods in Rust. Normally you can only have a single such #[pymethods] annotated block; with multiple-pymethods you can define several, and inventory collects them into a global registry that the Python bindings then use.
  3. Distributed CLI flags
  4. Plugin systems
    • Crates such as inventory could essentially be replaced by this feature (this is also why PyO3 depends on inventory). One can also imagine using this to:
      • register routes to a webserver
      • find all annotated types, as typetag does with inventory
      • register benchmarks with libraries such as criterion (related to custom test harnesses)

Prior art

Both ctor and linkme use linker magic. It's kind of amazing this works as well as it does.

  • ctor works by adding functions to a specific link section (.init_array), which is run before main. This allows (essentially) arbitrary code to run before main, insofar as running any code before main is supported at all. Such code could register an item in some kind of global which is then picked up in main.
  • linkme's distributed slice works by adding special symbols to the start and end of a linker section. All "registered" symbols are placed in between these start and end markers, and a slice can later be reconstructed by walking the section from the start marker to the end marker (see the sketch after this list).
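
For concreteness, linkme's user-facing API looks like this (modeled on its README examples; the element type here is arbitrary):

use linkme::distributed_slice;

// The declaration: linkme reserves a dedicated linker section with start and
// end markers for this slice.
#[distributed_slice]
pub static TESTS: [fn()];

// A registration: this static is placed between the markers, and can live
// anywhere in this crate or any downstream crate.
#[distributed_slice(TESTS)]
static TEST_FOO: fn() = test_foo;

fn test_foo() {
    println!("running test_foo");
}

fn main() {
    // At runtime the section is reconstructed as an ordinary slice.
    for test in TESTS.iter() {
        test();
    }
}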

Rust's #[test] annotations work completely differently. Instead of (ab)using the linker, the Rust compiler itself collects all the tests during compilation.

Why implement this in the compiler?

Since there are already libraries (like ctor and linkme) that can do this, why would we need to add a new feature to rustc?

  1. It would be an alternative way to implement custom test frameworks, making #[test] less magical. This is something the testing devex team wants.
  2. It would provide a clear, supported way to do things for which rust users might otherwise use global constructors (through ctor, for example). Limitations of those are described in the next section (this was noted by David Tolnay)
  3. We would be able to create custom syntax for this, something libraries could not. However, I'll later note that we might not actually want to do that.

Using the linker or the compiler?

Prior art shows that this feature could be implemented in two ways.
Either through the linker or directly in the compiler.

The linker route means possible limitations in platform support.

  • linkme, at the time of writing, is tested on Linux, macOS, Windows, FreeBSD and illumos, and possibly works on even more targets. The repository also includes some tests for embedded targets, although they require a modified memory.x file. If this became a standard feature of the language it would have to be available everywhere, so everyone on embedded would need such modified memory.x files. I also worry (maybe wrongly?) that this would limit what kinds of linkers we are able to support.
  • ctor can run code before main, and its README.md on GitHub starts with a list of warnings. Code in global constructors can only use libc, as std is not initialized yet. Making this a standard language feature would either require us to properly define code before main (which sounds rather impossible) or make it unsafe and something library authors might use as an implementation detail. Theoretically we could use global constructors under the hood to implement distributed slices, but that would certainly not be zero-cost: code would need to run before main for every element of a distributed slice. Additionally, ctor has limited platform support, much more limited than linkme.

However, with "linker magic", rust could support such distributed slices, possibly through dynamically loaded libraries. Dynamic linking in Rust has always been kind of hard, and usually goes through cdylibs. However, if rust ever wants to support that, adding a language feature that would block this might not be very productive. I say that, because the alternative might do exactly that. Supporting distributed slices through the compiler like #[test] essentially blocks dynamically linked libraries to add to distributed slices. The slices are already built before linktime.

Note that #[test] itself isn't affected by this, as tests are never dynamically linked in.

Ed Page noted during RustNL that this might not be so big of an issue, saying that Rust crates span the entire spectrum from C's single-header libraries to gigantic dynamically linked libraries like OpenSSL. He thinks the kinds of applications you would use distributed slices for are not likely to be these gigantic libraries you might want to dynamically link. (Correct me if I'm misquoting you, @epage.)

Personally, I think going through the compiler rather than the linker is much less hacky and provides a more natural extension of #[test].

Complications

  • As noted by newpavlov, the order of elements in distributed slices would ideally be deterministic. They propose to sort elements in a distributed slice essentially lexicographically by crate name and module path (sketched after this list).
    • libtest sorts tests by their (test function) name, also alphabetically.
  • Incremental compilation might complicate things, since distributed slices need to be rebuilt whenever any code that might add to them is recompiled.
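
A sketch of what that deterministic ordering could look like, emulated in plain Rust (all names here are invented for illustration):

// Illustrative only: emulate newpavlov's proposal by sorting registered
// entries lexicographically by (crate name, module path, item name).
struct Entry {
    krate: &'static str,  // e.g. captured via env!("CARGO_PKG_NAME")
    module: &'static str, // e.g. captured via module_path!()
    name: &'static str,
    run: fn(),
}

fn deterministic_order(entries: &mut Vec<Entry>) {
    entries.sort_by_key(|e| (e.krate, e.module, e.name));
}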

@epage

epage commented May 10, 2024

Regarding dylibs, there are

  • Publicly facing dylibs: this would be similar to when people use dylibs in C/C++ for large subsystems, like gitoxide or bevy. The story here isn't too well developed at this time; it might be enough to do manual registration of the dylibs.
  • Plugin systems: manual registration is probably the best way to go.
  • Fast-rebuild hack: turn everything into dylibs so you don't have slow link times during development. This could be negatively impacted, but is there much of a use case once we have faster linkers? Might be good to check with Bevy, as I think they do this.

@jdonszelmann

One idea that has me more and more convinced is to not implement this feature as a slice at all; "distributed slice" is maybe not even the right name. We cannot easily make the order stable, so keeping indices into the slice is nonsensical. Instead, by implementing it as a type that implements Iterator (a bit like iterators over hashmaps) with a randomized order (seeded by something in the source code, to keep builds reproducible) we can do a lot better. The way this iterator works can then also be changed internally: if a shared library is loaded, the iterators could in theory be Iterator::chain-ed to make the "distributed iterator" work (see the sketch below).
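
A rough sketch of that chaining idea, with invented names (each linking unit contributes its own compiler-built slice):

// Illustrative: the main binary's compiler-built slice, plus one slice per
// loaded dynamic library, exposed to users as a single opaque iterator.
static MAIN_ENTRIES: &[fn()] = &[];

fn all_entries(
    dylib_entries: &[&'static [fn()]],
) -> impl Iterator<Item = &'static fn()> + '_ {
    MAIN_ENTRIES
        .iter()
        .chain(dylib_entries.iter().flat_map(|s| s.iter()))
}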

@jdonszelmann

I also just talked to @m-ou-se, who is working on an RFC for implementing functions in a different crate than the one that declares them, which turns out to be a related issue. Essentially, those are "single-item distributed slices". There, dynamic library loading is even more of an issue, since there can be only one implementation of a function; a dynamic library with a conflicting definition could never be loaded.

@m-ou-se

m-ou-se commented May 10, 2024

Here's the RFC for that: rust-lang/rfcs#3632

@jdonszelmann

jdonszelmann commented May 11, 2024

@epage got an ok from Scott. The plan:

  1. Write down very clearly the problem and list the current possible solutions.
  2. Order them by viability.
  3. Start a T-lang experiment and implement the one that currently feels most viable (I'm guessing that's going to be an implementation in the compiler, not the linker, and an iterator, not a slice).
  4. Test that experiment, for example on dioxus and maybe bevy?
  5. Either go through with that (write an rfc) or implement one of the alternatives.

@jdonszelmann

jdonszelmann commented May 11, 2024

Problem Description

#[test] is magical.
Using the attribute, functions spread all over a crate can be "collected" in such a way that they can be iterated over.
However, the ability to create such a collection system, or global registration system, turns out to be useful elsewhere, not just for #[test].

The following are just a few examples of large crates using alternative collection systems (linkme, or inventory which is based on ctor) for one reason or another:

Additionally, one can imagine webservers registering routes this way, although I found nobody doing that at the time of writing.

In almost all the examples above, global registration is an opt-in feature, behind a cargo feature flag.
Existing solutions are a bit of a hack, and have limited platform support.

Especially `inventory`, based on `ctor`, which most crates mentioned above use, is only regularly tested on Windows, macOS and Linux, and use on embedded targets is complicated: you must manually ensure that all global constructors are called, or a runtime like [`cortex-m-rt`](https://crates.io/crates/cortex-m-rt) must do so.
`linkme`, too, had three cases in 2023 where it broke with linker errors or was missing platform support.

It seems library authors are wary of including registration systems in their libraries.
I conjecture this is because random breakages due to a bug in the underlying registration crate, or limited platform support, are painful and limiting.
Bevy has discussed exactly this, citing limited or no wasm support.

This is specifically relevant for the testing-devex team, working on libtest-next.
Ed Page proposed (also in in-person conversations) that we should make #[test] less magical so Rust can fully support custom test frameworks.
This plan was explicitly endorsed by the libs team.

Custom test frameworks are useful for all kinds of purposes, like test fixtures.
Importantly, they are essential for testing on #![no_std]:
the only way to do that currently is using #![feature(custom_test_frameworks)].
It was discussed (in person) that this is also useful for Rust for Linux.

In summary:

  • #[test] is a magical registration system which cannot be used for any purpose other than tests.
  • Libraries do seem to have a need for registration systems.
  • Crates offering registration systems via the linker are in use, but platform support and fragility seem to be an issue for downstream crates.
  • To advance the state of testing in the language, access to a better-supported registration system is desired.

Existing solutions

Linkme

Because this pattern is so useful,
there are libraries in the ecosystem that try to emulate this behavior.
Primarily, there's linkme's distributed_slice macro, by David Tolnay.
As the crate's name implies, it works by (ab)using the linker.

linkme has had issues because it was broken on various platforms in the past.
Indeed, it has some platform-specific code, though most platforms are now supported.
The crate works by creating a linker section for each distributed slice,
and placing all elements of that slice in this section.
Based on special symbols placed at the start and end of this section,
the program can figure out at runtime how large the slice has become and reconstruct it using some unsafe code.
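
As an illustration of the mechanism (not linkme's actual code), here is a hand-rolled, ELF-with-GNU-ld-only version; GNU ld synthesizes __start_<section>/__stop_<section> symbols for sections whose names are valid C identifiers:

use std::ptr::addr_of;

// Each "registered" element is forced into the same custom section.
#[used]
#[link_section = "my_slice"]
static ENTRY_A: u32 = 1;

#[used]
#[link_section = "my_slice"]
static ENTRY_B: u32 = 2;

extern "C" {
    // Synthesized by GNU ld at the boundaries of the section.
    static __start_my_slice: u32;
    static __stop_my_slice: u32;
}

fn my_slice() -> &'static [u32] {
    // Relies entirely on linker layout; linkme wraps this pattern in
    // portable, safer machinery.
    unsafe {
        let start = addr_of!(__start_my_slice);
        let stop = addr_of!(__stop_my_slice);
        std::slice::from_raw_parts(start, stop.offset_from(start) as usize)
    }
}

fn main() {
    println!("{:?}", my_slice()); // element order is up to the linker
}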

Inventory

An alternative approach, also written by David Tolnay, is inventory, which is based on ctor.
Using ctor you can define "global constructors":
entries in a special linker section that,
on various platforms, are executed before main is called.
The name and semantics of these sections change per platform,
and through them users can execute code before main.

This is wildly unsafe, as std is not yet initialized. ctor's README.md on GitHub starts with a large warning to be very careful not to call std functions, and to use libc functions instead if you must.
In inventory, each of these ctors executes a little bit of code to register an element in a global linked list before main starts (see the sketch below).
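
inventory's user-facing API (modeled on its README) looks like this; each submit! expands to such a global constructor:

pub struct Flag {
    pub short: char,
    pub name: &'static str,
}

// Declare that values of type Flag can be collected.
inventory::collect!(Flag);

// Register one; this expands to a ctor that runs before main.
inventory::submit! {
    Flag { short: 'v', name: "verbose" }
}

fn main() {
    for flag in inventory::iter::<Flag> {
        println!("-{}, --{}", flag.short, flag.name);
    }
}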

#[test]

#[test] is unique in that it does not involve the linker at all.
Instead, the compiler collects all the marked elements and generates a slice containing all elements from throughout the crate.

Note: this is also what Custom Test Frameworks does.

This can be both an advantage and a disadvantage.

Advantages:

  1. It's super stable: it is guaranteed to work on any platform.
  2. If something goes wrong, you don't get a nasty linker error, but a nice compiler error.
  3. Because it works on any platform, it could indeed support custom test frameworks on #![no_std], part of the reason why we'd want a global registration system.
  4. It might be possible to support during const evaluation, though comments on a recent RFC by Mara show that this can also be undesirable, as it means that all crates need to be considered together during const evaluation.

Disadvantages:

  1. Building a slice at compile time simply does not support registering elements loaded through a dynamic library (though there might be some ways around that: TODO). Rust's story for dynamic libraries isn't great anyway, but this would add another major blocker.
  2. Exactly this might also make hot-patching binaries harder. There were some proposals for hot-patching floating around, and this would complicate future implementations of them. Actually, that goes for this entire feature, whether implemented through the linker or the compiler.

Possible alternative solutions

Keep things as-is

That is, leave this feature supported only through ecosystem crates.

Providing linkme's distributed_slice or inventory as part of the compiler

Either of these methods would have limitations in platform support,
but if they were also tested as part of the compiler we might be able to guarantee some sort of stability.
I'm especially wary of the ctor-based approach, but maybe linkme isn't so bad.
It seems to support many platforms,
and even has a test of it running on cortex-m #![no_std].
It does require a modified linker script listing the sections used for the distributed slices.
Theoretically the compiler could automate those additions to the linker script.

However, it's unclear whether linkme supports WASM,
and based on my own testing I don't think it does.
I'm unsure what would be required to start supporting that.

Ignoring dynamic libraries: providing distributed slices like #[test]

This is the approach rust-lang/rfcs#3632 takes.
Their reasoning is that comparable existing systems don't support dylibs either:
global_allocator doesn't work with dylibs, for example.
Indeed, tests also don't work across dylibs.
However, that's never a concern, as tests are usually crate-local and always statically linked into the binary they're testing.

Ed Page also has an opinion about this,
and thinks we shouldn't worry too much about dynamic linking right now, though we should check with Bevy whether it'd be beneficial for them.

Personally, I do think we should keep in mind that dynamic linking exists, and make sure that the design doesn't leave dropping dylib support entirely as its only possible implementation.

A hybrid approach: a proposal to move forward

I think there is a hybrid approach we can take.
One that does not completely rule out dylibs, but might initially not support them while still meeting most people's needs.

The name "distributed slice" might not be very accurate.
With global registration, the ordering of elements is not important, and essentially deterministically random.
It's more like a distributed set of elements actually, where the index of elements in the slice is essentially useless.

Instead, I propose to expose the registration system as an opaque type that implements IntoIterator, just like std::env::Args.
Initially, we can choose to not even expose a len method, as the length might depend on the number of dynamic libraries loaded.
The implementation could then be a slice, or a linked list, or a collection of slices linked together (one per dylib?).
Crucially, the key here is that this way we expose the minimal useful API for global registration,
leaving our options for implementation details completely open, such that we can change the internals at any point in the future.
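
A minimal sketch of that API shape in ordinary Rust (no language support needed for the sketch; all names are illustrative): the opaque type implements IntoIterator but deliberately exposes neither indexing nor len, so the backing storage can change later.

// Illustrative stand-in for the proposed opaque registry type.
pub struct Registry<T: 'static> {
    entries: &'static [T], // today a slice; later possibly chained segments
}

impl<'a, T: 'static> IntoIterator for &'a Registry<T> {
    type Item = &'static T;
    type IntoIter = std::slice::Iter<'static, T>;
    fn into_iter(self) -> Self::IntoIter {
        self.entries.iter()
    }
}

static TESTS: Registry<fn()> = Registry { entries: &[hello] };

fn hello() {
    println!("hello");
}

fn main() {
    // Iteration is the whole public API: no indexing, no len().
    for test in &TESTS {
        test();
    }
}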

The only downside of this approach that I could find so far is that iterators are not const-safe (yet).
Whether we even want to support iteration over globally registered elements in const context is questionable
(as highlighted above: const evaluation would then depend on all the crates being compiled, any of which might register elements),
but it would restrict that feature.
I believe that's acceptable, especially for experimentation now, and linker-based approaches wouldn't support that use case either.

We should also make sure that we only implement traits for this opaque type that stay compatible with slices,
so we're free to expose a slice in the future if we want to.

I'd like to experiment with that approach, to see if it meets enough people's needs.
If not we can consider one of the other approaches highlighted.
I propose calling the feature global_registration, not distributed_slice to be more generic.

@jdonszelmann

jdonszelmann commented May 11, 2024

There we go, that's a kind of proposal. I hope it makes some sense. Any thoughts?

@epage

epage commented May 11, 2024

> This is specifically relevant for the testing-devex team, working on libtest-next.
> Ed Page proposed (in #3 and in in-person conversations) that we should make #[test] less magical so Rust can fully support custom test frameworks.

To add weight, this was a plan agreed in conjunction with libs-api, with an explicit endorsement from dtolnay to implement a language feature and not to use ctor or linkme.

> #[test] is unique in that it does not involve the linker at all.

An important part of prior art in this is
https://rust-lang.github.io/rfcs/2318-custom-test-frameworks.html

iirc the plan was to stabilize the compiler's test collection for use by users.

> The key is that the name "distributed slice" might not be correct.

imo the biggest argument for an opaque type is that we are able to stabilize the least amount of the API and then go in one of several directions in the future, depending on how things evolve.

> Instead, I propose to expose the registration system as an opaque type that implements IntoIterator, just like std::env::Args.

We should call out that traits will be implemented so as to stay compatible with [], in case we decide to expose that as a future possibility.

> I propose calling the feature global_registration, not distributed_slice to be more generic.

This doesn't describe how we would be defining and registering items, only the name we'd use. This is where an opaque type might be difficult and require alternative design work instead.

@jdonszelmann

Thanks Ed! I'll see if I can incorporate some of your comments. I just don't quite get your last one?

> This doesn't describe how we would be defining and registering items, only the name we'd use. This is where an opaque type might be difficult and require alternative design work instead.

@bal-e

bal-e commented May 12, 2024

I've personally implemented this linker magic for embedded code before, and I think that performing that work in the compiler instead is a fantastic idea. However, I'm quite worried about how this will interact with dynamic library loading, and I'd like to see a resilient API that deals with that use case properly.

The Use Cases

The primary concern with loading a library at runtime is that new items get registered, and the various "distributed slices" / registries need to be intelligently updated and notified about this. Based on the use cases @jdonszelmann outlined, these registries need the following actions in the face of dynamic library loading:

  • pyo3 uses this feature for tracking impl blocks for Python classes implemented in Rust (see here). It's probably fair to restrict the addition of new impl blocks from libraries loaded at runtime, so that libraries can only add impls for the classes they define themselves. As such, there is relatively little active interaction with dynamic libraries: this is best modeled with a specialized attribute that marks a "distributed slice" that cannot be added to from outside the current crate / linking unit.

  • cucumber uses this feature for collecting test cases (see here). While it is technically possible for a test harness to load some libraries at runtime before running all or specific tests, this generally seems unlikely: it's reasonable to use the same restricted "distributed slice" concept as above. On the other hand, if support for dynamically loading libraries is needed, then cucumber needs to be notified when these libraries are loaded, so that it can inspect the loaded libraries and add registered tests to its internal data structures. If this loading occurs after some tests have already been run, then cucumber needs to be able to dynamically update its data structures (at the moment they appear to be immutable after construction).

  • typetag uses this feature for collecting trait implementations (see here). The inspected trait implementations are added to a BTreeMap. In the case of dynamic library loading, this map needs to be updated - at the moment it is contained within an &'static Registry and some kind of mutex will likely be needed. While disallowing dynamic library loading is possible, I think support for it is reasonable here - some users of typetag may need this feature.

  • gflags (which appears to be archived as of December 2022) uses this feature for finding command-line flag specifications from all over a binary (see here). In some cases (e.g. as part of a pluggable extension system), dynamically loaded libraries may choose to add new command-line flags. In this case, being able to inspect the entire hierarchy of loaded libraries is the nicest option, and would support edge cases like recursive dynamic loading. While dynamic loading would modify their hierarchy, it should be easy to prevent race conditions (as argument parsing should only be happening on a single thread).

  • leptos/server_fn uses this feature for collecting a map of server-side functions (see here). In the face of dynamic loading, this map needs to be mutated (which is already supported for manual additions of server-side functions).

  • dioxus uses this feature similarly to leptos/server_fn (see here). This time, however, the map appears to be static.

  • apollo-router uses linkme directly to get a list of plugins at compile-time (see here). Dynamically loaded libraries containing plugins should be loaded strictly before the router starts, so that all plugins participate in initialization at the same time. Adding plugins after the router starts should be an error.

  • Similarly to leptos/server_fn and dioxus, rsasl uses linkme to get a list of registered authentication mechanisms (see here). These are stored as a static slice and iterated over on demand to select the appropriate mechanism to use per-request. In this context, dynamic library loading can be allowed fully (in which case the mechanism list is added to when a library is loaded, so that it affects future requests), allowed before initialization, or disallowed entirely.

I see a few patterns here. For use cases which may permit dynamic library loading, some kind of complex data structure needs to be updated every time new items in the "distributed slice" are added. All but one of the above use cases (leptos/server_fn) would need to change to make that data structure mutable at run-time. In some cases, updates are only allowed until a specific point in time (after which they can fail loudly or quietly); in others, updates are disallowed entirely.

The important concept to note here is that the user of each "distributed slice" needs to run custom code every time a library is loaded at runtime. In general, they will take this opportunity to lock and update their internal data structures with the newly available information. If support for this feature were implemented into the compiler, and we want to take the time to properly support dynamically loaded libraries, then we can't just expose the "distributed slice" or an iterator of items in unspecified order. Some kind of hook mechanism is necessary.

If a consumer of this feature does not care about dynamic library loading, then all the elements of their "distributed slices" are known to the compiler, and they should be able to access them in const contexts. Consumers who do care about dynamic library loading will have a mutable data structure of some kind to store the contents of their "distributed slices" in a processed manner. In either case, I do not think that we need to expose the actual "distributed slice" data structure to consumers at all.

An Alternative Design

We should implement two things: distributed_slice and distributed_slice_const (bikeshedding is left as an exercise to the reader). The former is an attribute applied to a hook function that consumes one element at a time. The latter is a function-like macro that can be used in const contexts, whose argument is a hook function in the form of a lambda. The hook function is called for each item in the "distributed slice". In the former case, it is called immediately before main() (for all items available in the executable before any new libraries are loaded) and again whenever a library containing new items is dynamically loaded.

Let's take the example of leptos/server_fn (here):

// The current code, using `inventory`:

// The "distributed slice" object as wrapped in 'inventory'.
impl<...> inventory::Collect for ServerFnTraitObj<...> {
    #[inline]
    fn registry() -> &'static inventory::Registry {
        static REGISTRY: inventory::Registry = inventory::Registry::new();
        &REGISTRY
    }
}

// The actual data structure 'server_fn' stores information in.
static REGISTERED_SERVER_FUNCTIONS: LazyServerFnMap<...> = {
    // Inside the 'initialize_server_fn_map' macro:
    once_cell::sync::Lazy::new(|| {
        inventory::iter::<ServerFnTraitObj<...>>
            .into_iter()
            .map(|obj| (obj.path(), obj.clone()))
            .collect()
    })
};

// A function to explicitly update that structure.
pub fn register_explicit<T>()
where T: ServerFn<...> {
    REGISTERED_SERVER_FUNCTIONS.insert(...);
}

// An item being added to the "distributed slice".
#[linker_constructor_magic]
fn __ctor() {
    <ServerFnTraitObj<...>>::registry()
        .submit(ServerFnTraitObj::new(...));
}

// The code using @jdonszelmann's proposal:

// The "distributed slice" object, bikeshedding aside.
declare_distributed_slice!(REGISTERED_SERVER_FUNCTIONS_INTERNAL: ServerFnTraitObj<...>);

// The actual data structure 'server_fn' stores information in.
static REGISTERED_SERVER_FUNCTIONS: LazyServerFnMap<...> = {
    // Inside the 'initialize_server_fn_map' macro:
    once_cell::sync::Lazy::new(|| {
        REGISTERED_SERVER_FUNCTIONS_INTERNAL
            .into_iter()
            .map(|obj| (obj.path(), obj.clone()))
            .collect()
    })
};

// A function to explicitly update that structure.
pub fn register_explicit<T>()
where T: ServerFn<...> {
    REGISTERED_SERVER_FUNCTIONS.insert(...);
}

// An item being added to the "distributed slice".
add_to_distributed_slice!(ServerFnTraitObj::new(...));

// The code using my proposal:

// The actual data structure 'server_fn' stores information in.
static REGISTERED_SERVER_FUNCTIONS: ServerFnMap<...> = {
    // Eagerly initialize it to empty.
    Default::default()
};

// An alternative definition if dynamic loading was disallowed.
const REGISTERED_SERVER_FUNCTIONS_CONST: LazyServerFnMap<...> = {
    let mut map = Default::default();
    distributed_slice_const!(|obj| {
        map.insert(obj.path(), obj.clone());
    });
    map
};

// A function to explicitly update that structure.
pub fn register_explicit<T>()
where T: ServerFn<...> {
    REGISTERED_SERVER_FUNCTIONS.insert(...);
}

// The hook function called when a new item is detected.
#[distributed_slice_hook]
pub fn server_functions<T>()
where T: ServerFn<...> {
    register_explicit::<T>();
}

// An item being added to the distributed slice.
add_to_distributed_slice!(ServerFnTraitObj::new(...));

A Comparison

@jdonszelmann's proposal lines up nicely with how this feature is currently implemented in linkme or inventory. An opaque registry type manages the actual list of elements and can be queried at will to iterate over them. However, most use cases will only query this list once, process the elements, and store them in a more complex data structure. This proposal doesn't come with that much of an API change for these use cases. I'm concerned that it will not be able to deal with dynamic library loading nicely. While the opaque registry could be updated in-place when a library is loaded, the consumers of that registry will need some notification of this. Requiring manual intervention every time a library is loaded (including on other threads!) would be difficult and error-prone. Automatic intervention would come in the form of a hook function, at which point exposing the registry object becomes unnecessary.

My proposal handles the use cases we've seen just as well. It omits the opaque registry object and instead relies on the consumer to bring their own data structure for storage. It is able to support consumers which do not want dynamic library loading as well as those that do. By only requiring a single definition in terms of a hook function, consumers don't need to worry about how dynamic library loading will work: if their hook function is well-defined, then they have a mutable static data structure they are adding to, and Rust's type system will already ensure that it is used safely.

There are also additional optimizations that could be used with my proposal. Since every item is only processed by the hook function once, the compiler may be able to const-eval the execution of the hook function on items defined in the current crate / linking unit (LLVM already does this for some constructor-attributed functions in C code which modify a global; I don't know how it would work in Rust, but I think it's possible). In the case of distributed_slice_const, this const-eval is guaranteed, which may allow code referencing the data to be inlined and optimized further.

In terms of actually implementing this, we can start with distributed_slice_const. It's exactly as powerful as the existing solutions and can be initially implemented on top of them as a gated feature in std. As support machinery is added to the compiler, the implementation of that feature in std can be gradually moved over to compiler magic, without the API changing at all. distributed_slice is harder to implement as it interacts with dynamic library loading directly, and would require some integration with libloading.

There's still a lot to bikeshed about the API. I don't know how a distributed slice would be publicly declared or how it should appear in documentation. There are still some questions to answer about multi-threading and generics. But I believe that we need to better understand how dynamic library loading is going to look before experimenting with this feature. I'd love to hear others' suggestions on this!

@jdonszelmann

I strongly feel we should not introduce anything related to global constructors for this, and that we should keep the design as simple as possible for now, while being just flexible enough that we can change the underlying implementation any time we want. Distributed slices are also not the only feature that might theoretically want updates on dynamic library loads (panic handler, allocator, Mara's RFC). For now, I propose just experimenting with compiler-generated slices plus an iterator, to see if that design even works and so we can learn something new. As long as the feature is not stabilized, we can always argue about alternative implementations.

@davidbarsky

davidbarsky commented May 15, 2024

There's a lot of discussion here, especially about which libraries could benefit from a #[distributed_slice], so I'd like to throw tracing into the ring: I've wanted something like #[distributed_slice] for ages. There are two reasons:

  1. People tend to write some decently complicated filters, but it's not really possible/feasible to know ahead of time whether a filter is valid or whether a set of spans/events exists in a given module/crate, so most people resort to trial-and-error and wondering why some events/spans aren't showing up. Being able to list out all the events and spans known to the binary in fn main() (or heck, even write some "did you mean..."-esque functionality) would be a tremendous improvement to tracing.
  2. We'd be able to make tracing faster for single-shot binaries like rustc or other CLIs, because we currently need to evaluate each span/event the first time we see it. There are some tricks we employ to make this faster (caching level filters, for one), but being able to make filtering even more static would probably yield additional single-percentage wins for tracing in those cases.

@bjorn3

bjorn3 commented May 15, 2024

For the purposes of rustc, using #[distributed_slice] in tracing would require dylib support, as rustc itself puts almost all of its code in a dylib.

@joshtriplett

@bjorn3 Might still work to have one slice for the whole dylib, and a separate slice for things that aren't in the dylib.

@alice-i-cecile

Chiming in from Bevy: a mechanism like this that worked across platforms would be incredibly useful for making reflection way more user-friendly, and would open up further improvements in the ECS itself. Manually registering types for reflection is currently tedious and error-prone.

@alecmocatta

Sorry if I've missed this approach being mentioned, but I'd consider rust-lang/rust#66113 to be prior art worth considering.

Briefly, the idea is to provide an iterator over vtables in the binary.

e.g. std::ptr::vtables::<dyn Test>() would be an iterator over all vtables for types upcast to dyn Test. Thus:

#[test]
fn my_test() { ... }

can be transformed to roughly:

#[used]
static _: &dyn Test = {
    struct my_test;
    impl Test for my_test {
        fn name(&self) -> &'static str { "my_test" }
        fn run(&self) { ... }
    }
    &my_test
};

and execution is roughly:

use std::ptr;
fn main() {
    for vtable in ptr::vtables::<dyn Test>() {
        // hypothetical APIs; reconstructing the reference needs unsafe
        let test: &dyn Test = unsafe { ptr::from_raw_parts(NonNull::dangling(), vtable).as_ref_unchecked() };
        println!("running {}", test.name());
        test.run();
    }
}

This notably also potentially resolves rust-lang/rfcs#668 and rust-lang/rfcs#1022, which were indeed the initial motivation. I'm not abreast of how this interacts with the pointer metadata APIs (rust-lang/rust#81513).

@bjorn3

bjorn3 commented Nov 6, 2024

I think an intrinsic to get all vtables is a bad idea. Vtables get duplicated into each cgu that uses them and the intrinsic likely would prevent unused vtables from getting optimized away. And if it didn't, it would have pretty unreliable optimization dependent behavior.

@alecmocatta

alecmocatta commented Nov 6, 2024

> Vtables get duplicated into each cgu that uses them and the intrinsic likely would prevent unused vtables from getting optimized away.

Yes, this may be an issue, but note that per my testing at the time it was only a cost if it was used:

> This array increases the size of libraries by a small amount; for example build/x86_64-apple-darwin/stage2 by ~1.3MiB i.e. ~0.2%. Thanks to --gc-sections on Linux and /OPT:REF on Windows, it's removed entirely from binaries that don't use it on those platforms. macOS's -dead_strip works a little differently, and the best solution I've found adds 16 bytes per used vtable to the resulting binary. This increases the size of binaries by a small amount: hello world grows by 76 bytes i.e. ~0.03%. With a bit more work I think this could probably be made zero-cost similar to Linux and Windows. Android and iOS behave the same as Linux and macOS, and on other platforms this PR is a no-op.


> And if it didn't, it would have pretty unreliable optimization dependent behavior.

In that PR I also included TypeIds for that reason, ensuring they can be correctly deduped at runtime with the same strong guarantees that TypeId has.
