-
Notifications
You must be signed in to change notification settings - Fork 248
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add light client platform WASM compatible #1026
Conversation
@@ -0,0 +1,161 @@ | |||
// Copyright 2019-2023 Parity Technologies (UK) Ltd. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've already asked this question to the Cumulus team, and I'm repeating it here: why rewrite this implementation of Platform
?
Smoldot provides an implementation, and #965 uses it. Why decide to copy-paste its code here (without caring for the licensing (which is incompatible with Parity's), but let's forget that) when there's already a fully working and maintained version provided by smoldot itself?
To me you (and Cumulus) are really shooting yourself in the foot by doing this, and I just don't understand it.
We all know that this code will rot over time. You have the choice between the simple option that will work forever, and the complicated option that will likely require effort to maintain, and you go for the complicated one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, I've implemented the low level Platform
here to expose a pure-rust based implementation for wasm. While at it, I've also ported the platform for native (using tokio sockets) to allow us to reuse the same code for both platforms for better coverage. The differences are mainly in the connect
function which uses a custom websocket with minimal dependencies that also implements AsyncRead
/ AsyncWrite
.
Yep, I also felt the frustration with the wasm-bindgen
, fortunately, we are using it for the custom socket. The socket is an oversimplification of gloo-net (this is what we use by default with jsonrpsee for wasm environments).
I'm not quite sure if we could have a pure-rust based implementation without some types of bindings for the websocket 🤔 I would be up for transitioning to something different
I also want to point out that the code in this PR doesn't support WebRTC.
Yep, we'd need to maintain the platform in the meanwhile if we pick this road in the end and look for alternatives of webrtc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not quite sure if we could have a pure-rust based implementation without some types of bindings for the websocket
You can't. All that wasm-bindgen
does is generate these bindings automatically, but they are always bindings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pierre's platform works perfectly for native environments and there's also an alternative for WASM.
To make it fully compatible with our WASM needs the functions from lightclient/src/platform/wasm.rs are added to this PR.
For WASM we need: spawn
, now_from_unix_epoch
, now
, sleep
, and most importantly connect
together with a few types, while for native we (including Cumulus) need just the connect
to use a different socket.
Maybe we could leverage 90% of the smoldot Platform by having an extension trait.
The parts of the Platform
that handles buffers are common between environments (lightclient/src/platform/default.rs
), and the trickiest to implement right. Then we would need to implement just the connect for native and avoid duplicating the code here and in Cumulus.
// Extension trait to reuse the common parts of the Platform that handles buffers internally
trait PlatformRefBuffered {
type Instant;
type Delay;
fn spawn() ..
fn now() ..
}
impl<T: PlatformRefBuffered> Platform for T {
...
}
Smoldot Wasm Node
There's a crate in smoldot that implements this.
The crate requires the javascript realm to expose these bindings.
#[link(wasm_import_module = "smoldot")]
extern "C" {
pub fn buffer_copy(buffer_index: u32, target_pointer: u32);
...
These are expected to be placed under the linking module "smoldot".
Before manually implementing the Platform
here I also tried to create these bindings in rust, then reuse the wasm platform from this crate.
However, I was unable to expose any external functions under the smoldot
module.
The WAT binary would expect the bindings as:
(import "smoldot" "buffer_copy" (func $_ZN18smoldot_light_wasm8bindings11buffer_copy17hf21d3e4eee7463c1E (type 6)))
(import "smoldot" "advance_execution_ready" (func $_ZN18smoldot_light_wasm8bindings23advance_execution_ready17h100f38f424c0220dE (type 0)))
(import "smoldot" "connection_new" (func $_ZN18smoldot_light_wasm8bindings14connection_new17hab2362b6b1ccdf39E (type 28)))
While regardless of where I tried to implement the functions they would be exported from rust as
(export "buffer_copy" (func $buffer_copy))
(export "buffer_size" (func $buffer_size))
(export "json_rpc_responses_non_empty" (func $json_rpc_responses_non_empty))
We could overcome this eventually by modifying smoldot and removing #[link(wasm_import_module = "smoldot")]
.
However, all the binding functions are designed to work within the JavaScript world.
For example, the buffer_copy
function takes as parameter buffer_index: u32
which would mean we'd also need to implement in Rust a way to map a buffer's memory region to an index.
There's an example in timers.rs from the wasm-node, which uses some static variables to handle the mappings.
Just for reference, here is the JavaScript counterpart of the wasm platform.
An alternative to this would be writing the JavaScript code then binding to it in rust. Or trying to bind directly to the JavaScript code from wasm-platform.
That felt a bit more involved than having our slim socket bindings and using that common parts of the platform and I would be up for adjusting/modifying this to make things better for us 🙏
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The parts of the Platform that handles buffers are common between environments (lightclient/src/platform/default.rs), and the trickiest to implement right
What I'm suggesting is to simply de-duplicate the code.
Sure the code is the same as in smoldot, but the choice here is between:
- Copy-pasting part of smoldot's
Platform
implementation into subxt and use it for both native and wasm sockets. - Copy-pasting part of smoldot's
Platform
implementation into subxt and use only for wasm sockets.
I don't really see how solution 1 (which you're going for) is in any way better than solution 2.
Before manually implementing the Platform here I also tried to create these bindings in rust,
That's not possible and I don't think it's a good approach nonetheless.
My personal intuition about the way forward is that I should try improve the Platform
trait to be easier to implement, and/or maybe allow easily reusing smoldot's JavaScript.
But I can't really guarantee in any way that I'll succeed here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That totally makes sense! 🙏
Indeed removing the native Platform from here and keeping only the WASM part will simplify the code for us.
The main advantage here would be that most users will be using a fully tested Platform provided by smoldot that we can easily update.
I was considering the native Platform from this PR to lay the foundation for testing and give us a bit more confidence, but I think we could overcome that if we get manage to get a bit more WASM testing.
And since we'll release this under an experimental/unstable feature flag in the beginning, I think we are fine for this PR.
My personal intuition about the way forward is that I should try improve the Platform trait to be easier to implement, and/or maybe allow easily reusing smoldot's JavaScript.
Thanks! That would help us indeed! Just a crazy idea, maybe initially having a small platform extension for native only would also simplify the work from Cumulus side. I think Cumulus just needs a way to change the socket from fn connect
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we get manage to get a bit more WASM testing
IMO you need WASM-specific testing anyway, because browser APIs have many weird corners. I don't think it's a good idea to assume that things "just work" if you just plug APIs together.
For example, browsers famously throw an exception if you try to connect to a non-secure WebSocket from an HTTPS website. I don't know how you can handle that properly with wasm-bindgen
but it should be handled.
Thanks! That would help us indeed! Just a crazy idea, maybe initially having a small platform extension for native only would also simplify the work from Cumulus side. I think Cumulus just needs a way to change the socket from fn connect
I'm planning to rework the connect
function quite a lot, for example to pass an already-parsed multiaddr, and to let smoldot do the WebSocket implementation on top of TCP (in other words the native implementation would just need to support TCP). But again these are just ideas and there are many more things on my to-do list that have priority.
While it's not my problem, I just want to warn you about Smoldot notably is capable of running itself in a worker thread (which is important when you have CPU-heavy operations), throttling itself to respect a CPU rate limit, compressing the .wasm file for faster loading, and runs perfectly well on all platforms and bundlers, which are things that are complicated (to say the least) with wasm-bindgen. Maybe the situation has improved nowadays, but I highly doubt it. I also want to point out that the code in this PR doesn't support WebRTC. As soon as WebRTC lands in Substrate (which, granted, is more than 6 months late already) and is working reasonably well, we want to deprecate the hell out of WebSockets. |
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
83af93c
to
2e660bb
Compare
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
…support_with_wasm
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
…modularity Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea to have this as a separate crate!
lightclient/src/platform/default.rs
Outdated
'static, | ||
Result<PlatformConnection<Self::Stream, Self::Connection>, ConnectError>, | ||
>; | ||
type StreamUpdateFuture<'a> = future::BoxFuture<'a, ()>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oooh GATs!
lightclient/src/platform/default.rs
Outdated
|
||
fn update_stream<'a>(&self, stream: &'a mut Self::Stream) -> Self::StreamUpdateFuture<'a> { | ||
Box::pin(future::poll_fn(|cx| { | ||
let Some((read_buffer, write_buffer)) = stream.buffers.as_mut() else { return Poll::Pending }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Poll::Pending
here would get stuck forever (nothing would ever wake it up again except by chance); does it make more sense to return an error of some kind? Basically, when would None
be returned here and what does it mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, the Poll::Pending
here will never wake up, since we don't store the context anywhere.
The stream's buffers can only be empty if the connection is in the Reset
state. The fn connect(&self, multiaddr: &str) -> Self::ConnectFuture
method is called first to establish the connection that initializes the buffers here.
Looking a bit at smoldot, my understanding is:
- A task is spawned to handle incoming connection attempts (here)
- The task checks in a loop that progress can be made on the connect; and updates some internal state machine
- The loop is ended when it cannot send a message back to the background task; or when the socket is closed
- The last step of that loop waits for one of the following to happen
- stream update (this future) to complete
- receive a message from the coordinator (some internal detail of smoldot)
- a timeout to happen
I believe that we drop this future after some period of time when we return Poll::Pending
here.
The smoldot's type StreamUpdateFuture<'a>: Future<Output = ()> + Unpin + Send + 'a;
forces us to return ()
and considering the above we might introduce a busy wait loop if we could return Poll::Ready(Err()))
// cc @tomaka would love to hear your thoughts on this 🙏
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need to read smoldot's source code, you just have to read the documentation of the function you're implementing: https://docs.rs/smoldot-light/latest/smoldot_light/platform/trait.PlatformRef.html#tymethod.update_stream
It says:
Returns a future that becomes ready when “something” in the state has changed.
Once a connection is in the reset state, it state never changes, and thus returning an infinite future is correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason why the connections-related functions in the Platform
trait are so different from AsyncRead
and AsyncWrite
is that AsyncRead
and AsyncWrite
are full of weird corner cases.
What errors can actually be returned and what do they represent? For example, can you suddely get a PermissionDenied
error?
Can the reading side return errors and the writing side still work, or vice versa?
Are some errors recoverable? Can you continue reading/writing again despite an error being returned?
If you call poll_close
and it returns Pending
, can you start calling poll_write
again after?
And so on.
So to solve all of this I created my own abstraction which follows what I think are straight forward rules. There are no errors, and just 5 possibles states (open, read open write closed, read closed write open, both closed, and reset).
It's a bit annoying to plug into existing APIs, but I'm going to provide helpers in the future.
lightclient/src/platform/default.rs
Outdated
if update_stream_future_ready { | ||
Poll::Ready(()) | ||
} else { | ||
Poll::Pending |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whenever Poll::Pending
is returned, the Context
(cx) must at some point be used to wake the future up again, so that it continues to make progress. in at least one case, it looks like we return Pending
here when nothing would be planning to ever call wake()
on the context to wake it up again.
I'd have another look at this whole function and see how it can be structured so that we only ever return Pending
if something like .poll_close
, .poll_flush
, .poll_read
etc returns Pending
(because these things each take in the cx
and are all expected to wake the future when progress can possibly be made again). You can never return Pending
"on your own" without it coming from some other thing, unless you also wake the future up appropriately etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, after looking a bit at the code I believe we might have some room for improvement here.
As you mentioned, when the read
, write
, close
or flush
cannot make progress, we simply return Poll:Pending
without waking this up.
And since in smoldot this call is coupled with:
- receive a message from the coordinator (some internal detail of smoldot)
- wait for a timeout to happen
We could potentially wake this future up again before receiving a msg from the coordinator, or waiting for the timeout. I'm not sure how complicated the code would look / if its worth doing.
I would keep it in sync with the platform's behavior for now (code from smoldot).
Have added some comments around it
// cc for reference @tomaka
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thankyou for adding the comments; as a reviewer it's scary to see the Pending
's without context wakers being used/stored/whatever, but I think I understand how it's working a bit better now!
cx: &mut Context<'_>, | ||
buf: &mut [u8], | ||
) -> Poll<Result<usize, io::Error>> { | ||
let mut inner = self.inner.lock().expect("Mutex is poised; qed"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another Pending
/Waker
thing:
Because this WasmSocket
is behind an Arc<Mutex<..>>
, a question I have is; what happens if we make, say, 3 copies of this, and then for each copy, in some different task, we .read().await
?
Each copy would call poll_read
, and then the waker given to each poll_read
would be saved into inner
to be woken up. But the wakers would overwrite eachother; 2 of the wakers would therefore never be used, and 2 of the three tasks would never make progress.
I had to consider this in some personal thing where I implemented a small MPSC style channel (https://github.com/jsdw/asyncified/blob/main/src/channel.rs), and there I had each clone of the sender allocate itself a new ID, and then I stored all wakers in a HashMap<ID, waker>
) so that I would remove them at needed on cleanup, and wake them all up on progress.
That all said, pub struct WasmSocket
is not itself clonable, so, does it need the Arc
inside it at all? (or the mutex for that matter)? Because if you could clone the socket and read from it, different clones would be getting different parts of the output anyway which probably isn't what we want.
So maybe we can just remove the Arc<Mutex>
? Or at least the Arc
? And then the type system enforces that it can only be used in one place, with one waker, at a time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Smoldot requires this object to be Send
, which may be a side-effect of the libp2p to have the Transport layer as send.
By default, the websys WASM socket is not Send
. To accommodate the interface I've used the SendWrapper
in a few places which should give us a bit of a stronger guarantees.
I've tried removing the Arc
but somehow we'd still need to wake up the waker
variable from inside the closures (similar to gloo-net) and have this Send.
There's also a plan to add support to libp2p for a rust based code for WASM that uses something similar to an Arc<Mutex<shared>>
(from transports/websocket-websys/Cargo.toml).
I've changed the Arc<Mutex
with an Rc<RefCell
+ the SendWrapper
. While at it, I have also moved the socket and callbacks out of the RefCell
(prev Mutex
).
I'm not quite sure how we could improve it further, would love to hear your thoughts 🙏
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oohhh sorry, so the reason for the Arc<Mutex<..>>
was because this thing needed sharing across some different closures below; I missed that bit! And when compiled to WASM the mutex will disappear anyway since it's single threaded.
I think given that the outer struct is not cloneable, I'd remove the SendWrapper
thing (and crate) and just put the Arc<Mutex<..>>
back (it feels cleaner than this SendWrapper<Rc<RefCell<..>>>
thing). Just maybe a dev comment on WasmSocket
that the thing cannot impl Clone
would do I think :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally looking good; just a few questions and thoughts mainly around Poll::Pending
uses which could be problematic :)
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
…support_with_wasm Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
…support_with_wasm
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent work @lexnv; thanks for looking into those Pending
things and checking that they are ok (ie smoldot is dropping the future or whatever anyway in those cases).
I'm happy to see this merge now and we can iterate on it from there if we run into things :)
This PR exposes a pure-rust code base for light-client interaction in WASM environments.
smoldot_light::PlatformRef
is implemented for both Native and WASM. Although the Native could use thesmoldot_light::DefaultPlatform
, this is reused with minimal code for better light-client code coverage. The main advantage of this approach is pure-rust code for WASM environments without gluing JavaScript code. A downside is that the common code of the platform needs maintenance.WebSocket
is implemented usingwasm-bindgen
WebSocket
allows us to efficiently implementAsyncRead
andAsyncWrite
by levering aVecDeque
To enable the light-client users need to enable the "unstable-light-client" feature flag.
For wasm support, the "web" feature flag is also expected, which keeps consistency across subxt flags.
Testing Done
Similarly to the native light-client, a WASM compatible client syncs with the
Polkadot
live chain and subscribes to a few finalized chain blocks.Builds upon: #965
Closes #964.