Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow running the smoldot wasm node from multiple threads #494

Closed
wants to merge 51 commits into from

Conversation

tomaka
Copy link
Contributor

@tomaka tomaka commented May 1, 2023

This substantial PR refactors the execution model of the wasm-node in order to allow executing it from multiple different threads at the same time.

cc #91

In terms of API, it is rather simple: a new Client.createBackgroundRunnable function has been added, which return an opaque object. The user can then send this object to a worker (doing so is entirely their responsibility), then do import { run } from 'smoldot/worker', then call run(object). This runs the CPU-heavy tasks of smoldot from within that worker.
Note that there's no obligation to actually call run from within a worker. run doesn't assume that it is running in a worker, instead it simply doesn't assume that it can access the networking.

Unfortunately, since we're using a SharedArrayBuffer, this comes with some restrictions: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/SharedArrayBuffer#security_requirements
It is unclear to me how this actually interacts with content scripts of browser extensions.

This branch has been tested on both NodeJS and the browser, and overall works surprisingly flawlessly (provided we replace rand::random() with constants, as explained below).

Implementation

It is implemented pretty much as described in #91:

  • There are now two tasks queues (instead of just one): one for the networking tasks, and one for the non-networking tasks.
  • Networking functions can only be called from within a networking task.
  • The plan is for tasks to be "non-networking" by default then switch to "networking" as soon as they call a networking function, but I haven't found a good way to do this, and we might have to explicitly pass a flag from within the Platform trait or something similar.
  • The execution of tasks is now done explicitly by the JS by calling an advance_execution function in a loop, rather than automatically whenever any function is called by the JS. advance_execution takes as input two flags indicating whether to run networking tasks and whether to run non-networking tasks. When in a worker, we run only non-networking tasks. When outside of a worker, we run networking tasks and also maybe non-networking tasks depending on whether a worker exists.
  • json_rpc_responses_peek has been refactored to use Atomics.wait in order to detect when a response arrives. This makes it possible to notify of a response from within a background thread.
  • The "current task" system has been removed.
  • The CPU rate limiting is now done entirely on the JavaScript side.
  • The "periodic yielding" system has been entirely removed.

Overall, this refactoring considerably simplifies the general flow of the code, especially thanks to the use of Atomics.waitAsync which is unfortunately not stable yet on Firefox.

This branch has the following caveats:

  • It requires some modifications to the source code of the Rust compiler to compile (see #91).
  • Because thread-local storage isn't implemented in the Rust compiler for wasi, I had to replace some uses of rand::random() (which uses thread_rng) with hardcoded values in order to make it work. This could however realistically be fixed by manually using a PRNG and seeding manually.
  • I don't know how to handle logging. Tasks running in the worker generate logs, and these logs are currently not displayed. I suppose that in the end these logs should be sent to the main thread rather than directly being passed to a callback.
  • It's not really clear to me how to detect when a task is a networking task and when it isn't, but I think this should be feasible with some thread-local storage magic. Since that magic would remain internal to the wasm-node, it would be acceptable. Since TLS don't work, I can't actually experiment here.
  • wasm-opt fails to parse the generated Wasm. It's unclear whether it's because it doesn't support atomics, or shared memory, or something like.
  • #![feature(stdsimd, simd_wasm64)] needs to be added due to x86/x86_64 intrinsics seem to be marked stable, but require feature flags rust-lang/rust#98253. It's unclear whether this is a Rust bug, or if the notify functions aren't stable.
  • We use Atomics.waitAsync, which is unfortunately not stable yet on Firefox. Using a replacement, while technically possible, is likely extremely hard.
  • I don't know how to handle panics properly, other than assume that they won't happen. We will likely have to switch to panic=unwind.

It is however overall very encouraging.

Even if in the end the two CORS headers end up being too annoying and this feature ends up being unused, I would still be in favor of merging most of the changes in this branch, as it considerably cleans up everything.

@tomaka
Copy link
Contributor Author

tomaka commented May 2, 2023

and overall works surprisingly flawlessly (provided we replace rand::random() with constants, as explained below).

I was actually completely mistaken. The worker immediately panics, after which the main thread takes the relay, which led me to believe that everything worked. But actually no, things are very racy.

This is because none of the per-thread "magic variables" that the libc and Rust compiler use are initialized properly, so the threads write on each other's stacks.

It turns out that their plan is to add a new host function named wasi_thread_spawn. The host must in response create a thread and call wasi_thread_start on the new instance. See https://github.com/WebAssembly/wasi-threads. The wasi+threads libc is assuming that this is done, and since we don't do it things don't work. In principle, I could make this work, but the Wasi version in the Rust compiler isn't aware of pthread_create, and I don't want to continue going down the rabbit hole.

@tomaka
Copy link
Contributor Author

tomaka commented May 5, 2023

Closing due to too many conflicts. See #91 (comment)

@tomaka tomaka closed this May 5, 2023
@tomaka tomaka deleted the wasm-threads branch May 5, 2023 13:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant