Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Calls and execution model #391

Open
josevalim opened this issue Jan 6, 2023 · 10 comments
Open

Calls and execution model #391

josevalim opened this issue Jan 6, 2023 · 10 comments

Comments

@josevalim
Copy link

Hi @tessi! Great project!

My initial suggestion was around the call_exported_function. At the moment, you are passing from and returning to the caller, but I would consider accepting two arguments instead pid and a opaque term called tag. And you will reply like this:

send(pid, {tag, result})

This way, you could do this inside the GenServer:

call_exported_function(self(), {:exported_return, from})

And still match on {{:exported_return, from}, result} in handle_info. But by breaking it apart, then you could also allow someone to call directly into the instance and get the reply directly, without going to the GenServer.

In fact, you could keep the same API as today, because from is already a {pid, tag} format, and reply from the NIF as:

send(elem(from, 0), {elem(from, 1), result})

And that means you would skip sending the result to the GenServer.


However, as I was thinking about this, I started wondering what happens on concurrent calls in an instance. Because calls are non-blocking, today it is possible to call the same instance more than once. What happens on concurrent calls? What happens if those calls read from stdin or write to stdout? Do they use the same buffer or is it per call?

Thanks! <3

@tessi
Copy link
Owner

tessi commented Jan 6, 2023

Very interesting thought @josevalim ! I like your idea of explicitly passing the answer-to pid down instead of just assuming self() to allow any Erlang process to receive the answer of a WASM function call. 💜 I want to make this happen, let's figure out how :)

Let me think out loud: Currently we send a hardcoded

{:returned_function_call, {:ok, return_values}, from}

tuple back to self() after a successful WASM function call. In fact, we don't touch the from arg at all and just forward it - it could be any term and we probably should rename it to something like forward_term to indicate it being a term forwarded untouched to the receiver. We could even make the :returned_function_call atom configurable.

We could change the signature like this

- call_exported_function(store_or_caller, instance, name, params, from)
+ call_exported_function(store_or_caller, instance, name, params, {answer_to_pid, tag, forward_term})

So, in my Wasmex GenServer I'd call call_exported_function(store_or_caller, instance, name, params, {self(), :returned_function_call, from) but you could replace all the things to your liking -- sending a configurable message (with a fixed format of {tag, wasm_result, forward_term}) to any process.

What do you think?

In fact, you could keep the same API as today, because from is already a {pid, tag} format, and reply from the NIF as:

send(elem(from, 0), {elem(from, 1), result})

True - I think! To be honest, I don't know how Genserver.reply() (or :gen.reply) works internally and if it's doing the same send you outlined. Probably it does - my Erlang reading skills aren't great 🤔
Even if it does, I feel the API outlined above is a little more explicit and discoverable compared to this being a hidden feature by manipulating from to send answers to a different pid. Or am I missing something?


However, as I was thinking about this, I started wondering what happens on concurrent calls in an instance. Because calls are non-blocking, today it is possible to call the same instance more than once. What happens on concurrent calls? What happens if those calls read from stdin or write to stdout? Do they use the same buffer or is it per call?

Good question! Your understanding is correct, you could call an instance more than once while it still processes old calls. Concurrent calls would access the same IO buffers and, maybe worse, the same memory. This is why the Wasmex Genserver blocks and waits for WASM to return. It's a safe API for general use only because we block.

To my understanding WASM operates under the assumption of single threads. It lacks multi-threading features (like atomics, to implement mutexes). I haven't tried, but I imagine one could safely call into WASM functions concurrently if they don't access shared state (e.g. they are purely functional). That's a pretty strict limitation, though, with a single shared memory and shared IO.

There is the WASM threads proposal to improve this situation. With our recent (still unreleased) work to expose wasmtimes compilation flags, it would be easy to provide a flag to enable wasmtimes implementation of the threads proposal. It would be the WASM binaries job to implement their functions thread safe. Unfortunately the thread proposal isn't fully implemented in wasmtime yet - they say:

Wasmtime does not implement everything for the wasm threads spec at this time, so bugs, panics, and possibly segfaults should be expected. This should not be enabled in a production setting right now.

@josevalim
Copy link
Author

it could be any term and we probably should rename it to something like forward_term to indicate it being a term forwarded untouched to the receiver. We could even make the :returned_function_call atom configurable.

That's the trick: we don't need the atom. If we allow self() and tag, then tag can be anything, including that atom. :) So if you want a more explicit API, I would go with:

call_exported_function(store_or_caller, instance, name, params, answer_to_pid, tag)

from in a GenServer is always {pid, tag}.

However, if the instance needs to be wrapped in a GenServer, then we cannot skip the GenServer. I think think it is still worth a change for a cleaner API, but not many benefits beyond that.


This is why the Wasmex Genserver blocks and waits for WASM to return. It's a safe API for general use only because we block.

I think we are blocking the client, but not the server. Therefore the current implementation would still allow multiple invocations. You would need a :queue as part of the GenServer state. Once a request comes in, if it is busy, you put in the queue. Once the call finishes, you dequeue. I can send a PR later (or I will be glad to review one!).


Oh, let me ask another question: I was wondering if it would be possible for someone to call a Phoenix.PubSub function directly from wasm. :) Can we pass strings around?

@tessi
Copy link
Owner

tessi commented Jan 7, 2023

Awesome! I like the proposed call_exported_function API.


I think we are blocking the client, but not the server.

Ha! [insert homer simpson d'oh! sound here]. Thanks, and I would love to review a PR implementing the proposed queue.

Actually, for both changes, I'm also happy to implement them. It really comes down to how much time you want to spend and if it's fun to you :) I don't want to waste your precious time - it's my bug after all.


I was wondering if it would be possible for someone to call a Phoenix.PubSub function directly from wasm. :)

I like where this goes! :) You totally can using imports! Let me try to write a rough outline:

bytes = File.read!("pubsub.wasm")
imports = %{env: %{
  "send_pubsub" => {:fn, [], [], fn _context -> elixir_fn_sending_things() end}
}}
{:ok, pid} = Wasmex.start_link(%{bytes: bytes, imports})
Wasmex.call_function(pid, "_start", [])

See https://hexdocs.pm/wasmex/Wasmex.Instance.html#new/3-example for the docs.

The WASM binary could call that imported function env.send_pubsub - which yields execution to the given elixir fn. That fn could do anything, including reading strings from WASM memory to send it around.

I'm happy to help with further examples or links to wasmex tests, collaborate on your test code, or have a call or whatever helps :)

Let me note that yielding execution from Rust to Elixir isn't the most performant thing on earth. Rust needs to send an Erlang message to the Wasmex GenServer, which looks up the fn, executes it and calls back into Rust so it can continue WASM execution. But there is a much nicer and much more performant future ahead :)

One of my long term goals with Wasmex is to provide helper functions written in Rust for Elixir interop. I want Elixir and WASM to work together as nicely as e.g. JS and WASM works. These helper functions would be provided as rust-based imported functions similar to how we have all WASI function written in rust being provided to WASM with a simple wasi: true when starting the Wasmex GenServer.

These rust elixir helper functions could do things like sending messages to pids (and much more, like creating elixir types, or serializing/deserializing elixir types from/to WASM). I'm currently building the fundamentals to that vision with Wasmex, but could give it an early start if you have a wish for such a helper function :)


Now I have a question :) What's your expectation on implementation speed on these things? Depending on stress at work and life I might not be the fastest implementor. But I'm happy to open the repo up to more collaborators or just do it myself at my own pace.

Don't get me wrong, I'm super hyped. Just want to get and set expectations.

@josevalim
Copy link
Author

Thanks, and I would love to review a PR implementing the proposed queue.

I did this 100x already. So it will be quick to draft something!

Now I have a question :) What's your expectation on implementation speed on these things?

No speed. I am just curious about exposing the Erlang VM and the Phoenix ecosystem to WASM developers, so they can leverage things like Distributed PubSub and Presence.

One of the things that would be nice is to make WASM preemptive, which I believe we can do with fuel. This way we could fully have WASM actors. However, this brings another question about the WASM implementation. I believe each call runs in a separate Rust thread. Is that a OS thread or a green thread? I think we can have three options here:

  1. Separate Rust thread
  2. Inside dirty CPU NIF threads (done by the VM)
  3. Inside the VM process (must be preemptive)

Something else to consider is if you can control how WASM allocates memory. If you can use enif_alloc, then the memory allocation is visible to the BEAM.

Finally, I am wondering if it is worth introducing the concept of ImportMap to Wasmer, so you can define the imports only once and have them pre-allocated in Rust. This way you don't have to allocate on each new instance.

Sorry if this is a lot but there is absolutely no rush. I am just excited about the model.

@tessi
Copy link
Owner

tessi commented Jan 7, 2023

I did this 100x already. So it will be quick to draft something!

💜

One of the things that would be nice is to make WASM preemptive, which I believe we can do with fuel.

correct! let me cite the relevant wasmtime doc

When a [Store] is configured to consume fuel [...] this method will configure what happens when fuel runs out. Specifically executing WebAssembly will be suspended and control will be yielded back to the caller. This is only suitable with use of a store associated with an async config because only then are futures used and yields are possible.

We don't run on an async config yet - to make that happen we must switch to async rust which might be a mid-sized rewrite, but sounds like a good next step.

But fuel consumption isn't the most efficient way to periodically yield to the host, because measuring fuel consumption introduces quite a runtime penalty. Instead, wasmtime offers epoch based interruption (docs here and here).
This would be the more performant approach to preemtive scheduling. But, again, requires an async rust runtime.

I believe each call runs in a separate Rust thread. Is that a OS thread or a green thread?

correct! currently it is a OS thread. Do you prefer another approach? an OS thread is "hidden" from Erlang instrumentation, where an Erlang-managed thread might be easier to debug if things go wrong 🤔
But then, if we go the async-rust way, we might not have a choice but taking OS threads :)

Something else to consider is if you can control how WASM allocates memory. If you can use enif_alloc, then the memory allocation is visible to the BEAM.

I believe I could change how (not how much - that already works using Wasmex.StoreLimits.memory_size) wasmtime creates memory by having a custom MemoryCreator, but I need to do more research here. Good point!

Finally, I am wondering if it is worth introducing the concept of ImportMap to Wasmer, so you can define the imports only once and have them pre-allocated in Rust. This way you don't have to allocate on each new instance.

I think this could be a nice and reasonable optimization. I haven't done much benchmarking and performance optimizations yet, so I bet there are many more such things. We should probably start by having benchmarks first, to be able to see the impact of such change.


I really love all your suggestions, questions, and ideas. I'll go through this discussion in a quiet minute and create tickets out of it.

@josevalim
Copy link
Author

correct! currently it is a OS thread. Do you prefer another approach? an OS thread is "hidden" from Erlang instrumentation, where an Erlang-managed thread might be easier to debug if things go wrong 🤔 But then, if we go the async-rust way, we might not have a choice but taking OS threads :)

Ok, so to recap:

  1. Separate Rust thread - done today. My biggest concern here is resource leakage and thread contention: what happens if we start 10k instances? That's why there are typically thread pools, which we would need to start manage (perhaps going back to async Rust anyway). Also, what happens if the process terminate and the thread/instance is running?

  2. Inside dirty CPU NIF threads (done by the VM) - we could move to this approach as long as we make "call exported function" blocking and it yields only on callbacks. So instead of async, as is, we would do something like: start running the exported function and yield control back to me only when you are done or on an import/callback. How doable is this?

  3. Inside the VM process (must be preemptive) - requires async Rust runtime, which requires more work

I am loving the back and forth too! <3

@tessi
Copy link
Owner

tessi commented Jan 7, 2023

  1. As you outlined, this approach is not scalable to many calls/threads. Async rust (depending on the async runtime) solved that with thread pools.

Also, what happens if the process terminate and the thread/instance is running?

The thread would eventually terminate and (as normal) attempt to send the Erlang message with its return value. I believe enif_send would just return false because the receiving process is not alive. Since we ignore the return value, no harm should be done.

  1. Great idea, I'd need to do more research on the yielding part. This was my very first attempt (years ago), but I remember I switched to option 1. back then because that was somehow difficult with imported functions and yielding. I just can't remember the details anymore 👴

  2. feels like the "correct" approach to me right now

@josevalim
Copy link
Author

Alright! So we should hold with the ideas in this PR (from and using :queue) because they may no longer be relevant depending on the execution model. But let me know and I can submit a proof of concept any time.

@tessi
Copy link
Owner

tessi commented Jan 8, 2023

Alright, I gave approaches 2. and 3. some thinking and both are doable with different tradeoffs. I'm very interested in your experience @josevalim on what is better - do you see pros/cons I missed? Which would you think scales better to ~10k concurrent calls?


Separate Erlang processes for blocking NIF calls

Every potentially blocking call (Module compilation, WASM function calls) run in their own Erlang process. They call into a dirty NIF function, blocking the Erlang process. To be able to call "imported functions" (Elixir implemented functions that WASM can call into) each WasmCall process comes with a "CallHelper" process which waits for :call_imported_function messages send from Rust.

Pros:

  • we can remove any multi-threading/async code from Rust, moving the complexity to Elixir
  • threads are visible from Elixir/Erlang tooling

Cons:

  • Erlang operations that need to communicate with the dirty NIF processes need to wait till the NIF returns, see https://www.erlang.org/doc/man/erl_nif.html#dirty_nifs
  • Without async Rust, we cannot use some wasmtime features (e.g. custom handlers for epoch based interruption or out_of_fuel behaviour)
  • we need to make sure all processes run on the same node, because they depend on c-allocated memory (the wasmtime structures for modules, memory, signals, ...)
  • requires rewriting both, our Elixir and Rust, codebases

wasmex-road-to-10k-calls_elixir

Async Rust

NIF calls never block because we handle any incoming call asynchronously in Rust. Requires our Rust code to "swallow the complexity"

Pros:

  • Elixir code is simple, because complexity is hidden in NIFs
  • we gain wasmtime features like custom epoch-based interruptions or out-of-fuel behaviour
  • no more dirty NIFs
  • the Elixir side is very close to what we have today

Cons:

  • threads are managed in Rust, not visible for Elixir/Erlang tooling
  • asnyc Rust NIFs are off the beaten path for rustler, see: Dealing with Async rust behavior rusterlium/rustler#409
  • when running 10k concurrent calls running "imported functions" on the main Wasmex Genserver might be a bottleneck - we might need to borrow the idea of "Helper processes" from above

wasmex-road-to-10k-calls-rust

@josevalim
Copy link
Author

Thanks for the analysis. Honestly, at the moment, I can't say one approach is better than the other. But I do have one question: is there any documentation of what is needed for the integration between wasmtime and fuel/epochs? Do they fully base themselves on the async Rust API? I am asking because we could perhaps make an async Rust backend based on enif_schedule_nif. I agree with @hansihe's comment on Rustler that enif_schedule_nif is not generally useful as Async Rust (because we really are not going any relevant IO work, for example), but it may be a perfect fit for preemptive scheduling in wasmtime?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants