Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Could a byte stream read directly into and out of a SAB (w/o transfer)? #757

Open
lukewagner opened this issue Jul 24, 2017 · 13 comments
Open
Assignees

Comments

@lukewagner
Copy link

It would be really great if wasm could read from a stream directly into its memory. The backing ArrayBuffer of a WebAssembly.Memory cannot be detached (nor would you want to since it holds the general program state needed for execution in the meantime) so what would be great is if the caller of a read or write could pass a typed array view of a buffer that is not detached. For a regular ArrayBuffer, this would mean some additional copying to avoid races. However, if the buffer is a SharedArrayBuffer, then perhaps it would be fine to expose the races (and no worse than anything else with SAB).

As an additional benefit, I think the implementation of "read into a view" from a file-backed stream could use mmap to make a copy-on-write mapping of the file directly into the target memory.

(cc @flagxor)

@lukewagner lukewagner changed the title Could a byte stream be read directly into and out of a SAB (w/o transfer)? Could a byte stream read directly into and out of a SAB (w/o transfer)? Jul 24, 2017
@ricea
Copy link
Collaborator

ricea commented Jul 25, 2017

There was some discussion of SAB with streams previously here: #495 (comment)

It seems likely that we will add a way to use a SAB without transferring. Unfortunately it doesn't look like we will get the nice memory protection semantics that would let us avoid data races. I'm not concerned about exposing data races to wasm, since C++ programmers already live in a world of races. I feel it's unfortunate to expose it to Javascript.

@lukewagner
Copy link
Author

Ah, I see, so should I keep this issue open for this specific request or is it a dup of #495?

@ricea ricea added this to the V2 milestone Jul 26, 2017
@ricea
Copy link
Collaborator

ricea commented Jul 26, 2017

I think this is useful as a separate issue. I've marked it with Milestone V2 to reflect that it's not something we'll be changing immediately.

@domenic
Copy link
Member

domenic commented Jul 26, 2017

I think we should change this sooner rather than later. In Blink we're behind on implementing byte streams, but others have done so, and should be able to get these benefits---especially if we have interested consumers in the wasm space. Plus, I believe right now the spec is incoherent, as it tries to transfer SharedArrayBuffers, which causes a spec-level assert failure.

I do think it's sad that this exposes data races, but I also agree that SharedArrayBuffers in general broke that seal, and we should integrate with them rather than wishing they didn't do that.

I'll try to work on this when I get done with TC39 and some other work items...

@ricea
Copy link
Collaborator

ricea commented Apr 23, 2018

If in future we'd like TextEncoderStream's readable side to be a byte stream then we have a conflict here. It's undesirable for user code to be able to view the encode in progress, because it may be using vector ops or other things that are browser-specific. There are probably going to be other cases of platform streams where we'd like to prevent intermediate values from being visible.

I think all solutions involve adding an extra copy when the output is a SharedArrayBuffer. The question is whether this copy becomes part of the ReadableStream machinery that is specified by a flag, or whether the underlying source has to detect that controller.byobRequest.view is an SAB and do the copy itself.

@lukewagner
Copy link
Author

Talking to @vasilvv today in the context of WebTransport using whatwg streams and how that kind of use case could perform optimally when used with wasm: in addition to the advantages identified above, the new API could potentially use a callback design (instead of promises) where the SAB-view and callback were registered once and then reused throughout the duration of the stream, so that there was, by design, without engine heroics, zero garbage generated per packet/chunk.

@ricea
Copy link
Collaborator

ricea commented Jul 2, 2019

I don't think we're likely to add callback-based APIs. Eliminating the promise from the external API wouldn't eliminate the promise returned from pull(), nor the other objects that are created internally. The objective of "zero garbage" isn't achievable without engine heroics anyway, so I don't see a compelling reason to add APIs that don't align with the rest of the standard.

@lukewagner
Copy link
Author

From what I've heard from @vasilvv, there's a lot of overhead with the current approach, enough to be problematic for APIs considering using them, so perhaps more cross-cutting changes could be justified with the goal of not creating garbage on each chunk, given that performance is the point of the lower-level API.

@vasilvv
Copy link

vasilvv commented Jul 2, 2019

I think the root cause here is not that promises are slow per se, but rather that there are too many of them. If we switch to callbacks, we will probably just run into the problem of the callback being called too often.

I am actually not sure I understand the proposal here. I can see two things we can do with SABs:

  1. Let the application provide the browser with its SAB, and let the browser write into said SAB. This is roughly like BYOB in terms of performance, but can be used without detaching the buffer.
  2. Let the browser return views into some shared buffer as the output of the stream. This would enable potential zero-copy access to stream data, since the views could point to whatever internal structure the browser uses to store that data in first place.

The description in the issue seems to indicate that (1) is proposed, but the mmap idea seems to imply (2), since I don't think that would work with (1).

@wanderview
Copy link
Member

If we switch to callbacks, we will probably just run into the problem of the callback being called too often.

FWIW, I did some prototyping to try to benchmark this before streams were standardized:

https://blog.wanderview.com/blog/2015/06/19/evaluating-streams-api-async-read/

I'm sure there are a lot of problems with what I did, but I think the conclusion was that promises were no worse than callbacks in the end. So I suspect you are correct better chunking is the answer.

@domenic
Copy link
Member

domenic commented Jul 8, 2019

I am actually not sure I understand the proposal here. I can see two things we can do with SABs:

I'm not sure those two are in conflict. (1) is what the proposal in this thread is, in that it has observable effects on the developer experience of using streams. (Object identity gets preserved, which is easier to work with than having to constantly switch what ArrayBuffer you're operating on, even if the backing memory stays the same.) Whereas (2) seems like something an implementation could do behind the scenes with no observable effects, as long as it has the ability to create JavaScript SharedArrayBuffer objects that are backed by existing chunks of memory.

Indeed, I think (2) is maybe doable even with the current spec, with just normal ArrayBuffers?

The only tricky thing I can imagine is that the contents of the AB/SAB need to only visibly change to JavaScript at well-defined points (essentially, after read(view) calls). That still seems doable with a sufficiently-advanced ArrayBuffer <-> backing memory mapping, but we're getting in to territory I'm not well versed in. Maybe (2) is about loosening that invariant?

@ricea
Copy link
Collaborator

ricea commented Mar 27, 2023

I've been thinking about this a bit more, and I have an idea. I propose adding an extra option to getReader(), unsafe. This would only be used with mode: 'byob'. Setting unsafe: true would disable the buffer detach behavior, regardless of whether it is shared or not.

The reason for disabling detaching the buffer even when it is not shared is for the convenience of wasm programs which would like to be able to use part of their memory as a buffer without losing access to the rest of it.

controller.byobRequest would gain an extra attribute, also called unsafe, which reflects which kind of reader this buffer was supplied by. This can be used by the underlying source to add an extra memory copy for safety if it does something more complex than a simple copy into the destination.

JavaScript underlying sources mostly won't need to pay attention to the unsafe attribute because we assume they trust the other JavaScript or wasm running on the same page. But it may be useful in environments where there's only "partial" trust, for example when authoring a third-party library.

For underlying sources implemented as part of the platform it is far more serious. When implemented in C++, we don't even know beyond doubt that we can safely memcpy() into a hostile SharedArrayBuffer, as the language provides no guarantees. We just have to assume it is safe because otherwise we wouldn't be able to do anything at all. What I'm thinking of doing in Chromium is making the C++ API do a copy by default when unsafe is true for the target buffer, and then require the underlying source implementation to explicitly opt-out if all it's doing is a memcpy() anyway.

If we do this we should also rethink the enqueue algorithm called by other standards to be safe-by-default.

@domenic
Copy link
Member

domenic commented Mar 27, 2023

Hmm. I think the idea of SharedArrayBuffer was to use the typesystem to encode this sort of guarantee. I.e., if you have a normal ArrayBuffer, you know you're safe; if you have a SharedArrayBuffer, you're unsafe.

What motivates the desire to let the safety vary independently of the type, for this particular web platform feature? Note that wasm memory can be made shraed using shared: true in the WebAssembly.Memory constructor, and I suspect most wasm thread-using programs already turn that on.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

5 participants