Don't pass the state and current process as arguments #617

Closed
yorickpeterse opened this issue Oct 16, 2023 · 6 comments

@yorickpeterse
Collaborator

For every compiled method, the first two arguments are the runtime state and the current process. This means that fn foo(a: Int) essentially translates to fn foo(state: Pointer[UInt8], process: Pointer[UInt8], a: Int).

This approach isn't great, as we're wasting up to two registers to pass this data around, and in many cases the data likely isn't used much.

To optimize this, I'm thinking of the following:

  • The state is the same for all methods and processes, so we can generate a global variable and store it there. The runtime functions would still take an explicit state argument, so the runtime doesn't need to depend on the global variable generated by the compiler.
  • The process could be stored as the last value in the stack (at the end we grow towards), with the stack range adjusted so nothing is allocated into that data. This way we can obtain the process easily. I'm not sure how feasible or cross-platform this is, though.
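
A rough Rust sketch of the first idea, to make the shape concrete. The names State, STATE, set_state, runtime_cores and the cores field are all made up for illustration; the real generated code is LLVM IR, not Rust. The point is only that the compiled method reads the state from a global, while the runtime function keeps its explicit state argument.

```rust
use std::sync::atomic::{AtomicPtr, Ordering};

/// Hypothetical stand-in for the runtime state.
pub struct State {
    pub cores: usize,
}

/// Global slot that compiled code reads instead of receiving a hidden argument.
static STATE: AtomicPtr<State> = AtomicPtr::new(std::ptr::null_mut());

/// Stores the state once at startup, before any compiled method runs.
pub fn set_state(state: &'static State) {
    STATE.store(state as *const State as *mut State, Ordering::Release);
}

/// A runtime function: it still takes the state explicitly.
pub fn runtime_cores(state: &State) -> usize {
    state.cores
}

/// A compiled method: `fn foo(a: Int)` with no hidden state/process arguments.
pub fn foo(a: i64) -> i64 {
    let ptr = STATE.load(Ordering::Acquire);
    assert!(!ptr.is_null(), "the state must be set before running compiled code");

    // Safe to dereference: the pointer came from a `&'static State`.
    let state = unsafe { &*ptr };

    a + runtime_cores(state) as i64
}
```

With a leaked State { cores: 4 } passed to set_state, foo(38) adds the core count to its argument without ever receiving the state as a parameter.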
@yorickpeterse yorickpeterse added performance Changes related to improving performance compiler Changes related to the compiler labels Oct 16, 2023
@yorickpeterse
Collaborator Author

Using external thread-local variables in Rust requires nightly, and probably will continue to require this for a long time: rust-lang/rust#29594

@yorickpeterse
Collaborator Author

A tricky thing about using the stack is that LLVM doesn't seem to provide any intrinsics for obtaining any kind of stack information. This means we'd have to use raw assembly somehow to get the data from the stack.
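
For reference, this is roughly what the raw assembly route looks like when written in Rust rather than emitted by the compiler. It is only a sketch: stack_pointer is a made-up helper, only x86-64 and AArch64 are covered, and other targets fall back to the address of a local, which is merely an approximation of the stack pointer.

```rust
#[cfg(any(target_arch = "x86_64", target_arch = "aarch64"))]
use std::arch::asm;

/// Returns (an approximation of) the current stack pointer.
#[inline(always)]
pub fn stack_pointer() -> usize {
    let sp: usize;

    // x86-64: copy RSP into a general purpose register.
    #[cfg(target_arch = "x86_64")]
    unsafe {
        asm!("mov {}, rsp", out(reg) sp, options(nomem, nostack, preserves_flags));
    }

    // AArch64: copy SP into a general purpose register.
    #[cfg(target_arch = "aarch64")]
    unsafe {
        asm!("mov {}, sp", out(reg) sp, options(nomem, nostack, preserves_flags));
    }

    // Fallback (assumption): the address of a local lives close to the stack pointer.
    #[cfg(not(any(target_arch = "x86_64", target_arch = "aarch64")))]
    {
        let marker = 0u8;
        sp = &marker as *const u8 as usize;
    }

    sp
}
```

The value should land close to the address of any local variable in the calling frame, which is enough to locate data stored at a known position relative to the stack.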

@yorickpeterse
Collaborator Author

yorickpeterse commented Feb 17, 2024

For thread-local code, the following Rust code compiles to the same code as regular/raw thread-locals:

use std::cell::Cell;

thread_local! {
  static PTR1: Cell<*mut ()> = const { Cell::new(std::ptr::null_mut()) };
}

This can be seen at https://rust.godbolt.org/z/v16va86aq.

The problem is that I'm not sure if this is true for every platform. Some additional details are found at https://matklad.github.io/2020/10/03/fast-thread-locals-in-rust.html.
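
To show how such a const-initialised thread-local would be used for the current process, here is a small sketch. CURRENT, set_current and current are hypothetical names, and the pointer is left untyped as in the snippet above. Each OS thread gets its own copy, so a scheduler thread would set it just before resuming a process.

```rust
use std::cell::Cell;
use std::ptr;

thread_local! {
    // `const { ... }` removes the lazy-initialisation check on access.
    static CURRENT: Cell<*mut ()> = const { Cell::new(ptr::null_mut()) };
}

/// Records the process that is about to run on this thread.
pub fn set_current(process: *mut ()) {
    CURRENT.with(|cell| cell.set(process));
}

/// Returns the process running on this thread, or null if none was set.
pub fn current() -> *mut () {
    CURRENT.with(|cell| cell.get())
}
```

A value set on one thread isn't visible on another: a freshly spawned thread still observes a null pointer until it calls set_current itself.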

@yorickpeterse
Collaborator Author

A quick dive through the current code reveals that we don't use the current process value in many places; mostly we just pass it along as an implicit argument to methods. The few runtime routines that require it could instead use a thread-local variable kept entirely on the runtime side of things.

The only instruction that really needs it is the Preempt instruction as it checks the process-local epoch against the global epoch. We could probably make that epoch counter a thread-local variable as well, as we only write to it when resuming the process. This would probably also reduce the process size a little bit.
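
A minimal sketch of what the epoch check could look like with a thread-local counter. All names here (GLOBAL_EPOCH, RUNNING_EPOCH, resume_process, advance_epoch, should_preempt) are invented for illustration, and the u32 width and relaxed ordering are assumptions rather than what the runtime actually uses.

```rust
use std::cell::Cell;
use std::sync::atomic::{AtomicU32, Ordering};

/// Global epoch, bumped periodically (e.g. by a monitor thread).
static GLOBAL_EPOCH: AtomicU32 = AtomicU32::new(0);

thread_local! {
    /// Epoch at which the process running on this thread was resumed.
    static RUNNING_EPOCH: Cell<u32> = const { Cell::new(0) };
}

/// Called when a process is resumed: the only place the local epoch is written.
pub fn resume_process() {
    let now = GLOBAL_EPOCH.load(Ordering::Relaxed);
    RUNNING_EPOCH.with(|epoch| epoch.set(now));
}

/// Advances the global epoch, signalling running processes to yield.
pub fn advance_epoch() {
    GLOBAL_EPOCH.fetch_add(1, Ordering::Relaxed);
}

/// The Preempt check: yield if the global epoch moved on since we were resumed.
pub fn should_preempt() -> bool {
    let started = RUNNING_EPOCH.with(|epoch| epoch.get());
    started != GLOBAL_EPOCH.load(Ordering::Relaxed)
}
```

Right after resume_process the check is false; once advance_epoch runs it turns true until the process is resumed again.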

@yorickpeterse
Collaborator Author

It seems that when one uses #[no_mangle] in the thread_local! macro, mangling is still applied to the constant. This can be seen in https://rust.godbolt.org/z/qWzaq8qze where PTR1 is mangled as example::PTR1::__getit::VAL.0 but PTR2 is just PTR2.

@yorickpeterse
Collaborator Author

Looking at the assembly, it also seems Rust uses LLVM's localdynamic for the thread_local! variable, while using generaldynamic for the #[thread_local] version.

yorickpeterse added a commit that referenced this issue Feb 20, 2024
The compiler generated code no longer passes the runtime state and the
current process as hidden arguments. Instead, the state is stored in a
global variable when the program starts up. For the current process we
change the stack layout to the following:

    ╭───────────────────╮
    │    Private page   │
    ├───────────────────┤
    │     Guard page    │
    ├───────────────────┤
    │     Stack data    │ ↑ Stack grows towards the guard
    ╰───────────────────╯

The private page stores extra data, such as a pointer to the process
that owns the stack and the epoch at which it started running.

This entire chunk of data is then aligned to its size. This makes it
possible to get a pointer to the private data page by applying a bitmask
to the stack pointer. The bitmask depends on the stack size, which is
runtime configurable and depends on the page size, and is loaded into a
global variable at startup.

This entire approach removes the need for more expensive thread-local
operations, which we can't use anyway due to Rust's "thread_local"
attribute not being stable (and likely not becoming stable for another
few years).

This fixes #617.

Changelog: changed
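
The masking trick from the commit message can be sketched in Rust as follows. This is an illustration, not the actual implementation: PrivateData, stack_mask and private_data are hypothetical names, the field layout is guessed from the description, and it assumes the total size of the chunk (private page + guard page + stack data) is a power of two, with the private page at the lowest address.

```rust
/// Hypothetical contents of the private page.
#[repr(C)]
pub struct PrivateData {
    /// Pointer to the process that owns the stack, stored as an address.
    pub process: usize,
    /// Epoch at which the process started running.
    pub epoch: u32,
}

/// Computes the mask once at startup from the total size of the chunk.
/// Because the chunk is aligned to its size, clearing the low bits of any
/// address inside the chunk yields the start of the chunk.
pub fn stack_mask(total_size: usize) -> usize {
    assert!(total_size.is_power_of_two(), "the chunk size must be a power of two");
    !(total_size - 1)
}

/// Maps a stack pointer to the private page of the stack it belongs to.
pub fn private_data(stack_pointer: usize, mask: usize) -> *mut PrivateData {
    (stack_pointer & mask) as *mut PrivateData
}
```

For a 64 KiB chunk, any stack pointer inside the chunk masks down to the chunk's base address, so getting at the owning process is a single AND plus a load, with no thread-local access involved.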