
View versus buffer #35

Closed · annevk opened this issue Apr 13, 2015 · 28 comments

@annevk

annevk commented Apr 13, 2015

In #34 (comment) I raised the question of whether we should expose bytes as a buffer rather than a view. I believe that for the Encoding Standard we concluded we would rather have exposed a buffer than a view, as that would have been more consistent with other APIs, such as XMLHttpRequest and WebSocket.

Now streams are pretty close to the metal. Is there any reason they need a view?

@inexorabletash @wanderview

@wanderview

I assume this was a convenience so that code didn't have to manually wrap the buffer in a view on each call. Just picking Uint8Array does seem a bit opinionated.

It does not seem that limiting, though, given that you can go through the Uint8Array's .buffer to convert to another view.
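
For example (a minimal sketch; `chunk` stands for a Uint8Array handed back by read(), and its offset and length are assumed to be suitably aligned for the target view):

```js
// Reinterpret the same bytes as 32-bit floats without copying.
const floats = new Float32Array(
  chunk.buffer,
  chunk.byteOffset,
  chunk.byteLength / Float32Array.BYTES_PER_ELEMENT
);
```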

I guess I'd be happy to see ArrayBuffer passed, but I haven't felt strongly enough to raise an objection over it.

@yutakahirano
Owner

whatwg/streams#295

@annevk
Author

annevk commented Apr 13, 2015

I think I sort of get the buffer reuse use case (https://gist.github.com/domenic/dab9e8da245cb4fe6027) now. The buffer of the view that's passed in gets detached, reused, and then reattached after the new bytes arrive. That probably makes sense and is why adding one layer of abstraction here helps.
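
A rough sketch of that reuse loop as I read the gist (getByobReader() per the shape discussed in this thread; process() is a hypothetical consumer; assume this runs inside an async function):

```js
// One buffer is recycled across reads: passing a view transfers its buffer to
// the stream (detaching it from our view), and the resolved read() hands back
// a fresh view onto that same memory once the new bytes are in place.
const reader = byteStream.getByobReader();
let buffer = new ArrayBuffer(16 * 1024);

while (true) {
  const { value, done } = await reader.read(new Uint8Array(buffer));
  if (done) break;
  process(value);          // hypothetical consumer of the filled bytes
  buffer = value.buffer;   // reclaim the (re-attached) buffer for the next read
}
```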

@wanderview

It would be nice if the reader could specify what type of view they want to use, though. Is that possible or is it really locked to Uint8Array?

@annevk
Author

annevk commented Apr 13, 2015

It depends on the input view, as I understand it, which seems okay?

@wanderview

> It depends on the input view, as I understand it, which seems okay?

For getByobReader().read() it does, but getReader().read() just always gives you a Uint8Array, I think.
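
Roughly, assuming the API shape discussed in this thread (byteStream1/byteStream2 are placeholder byte streams; a stream can only be locked to one reader at a time, hence two of them; assume an async function):

```js
// Default reader: for the byte streams discussed here, chunks come back as
// Uint8Array regardless of what the consumer would prefer.
const { value: v1 } = await byteStream1.getReader().read();   // v1 is a Uint8Array

// BYOB reader: the resulting view has the same type as the view passed in.
const { value: v2 } = await byteStream2.getByobReader().read(new Uint16Array(512));   // v2 is a Uint16Array
```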

@domenic
Contributor

domenic commented Apr 13, 2015

> For getByobReader().read() it does, but getReader().read() just always gives you a Uint8Array, I think.

Right, this. But I see no problem with adding an option to getReader, at least in theory... details to be worked out...

@domenic
Contributor

domenic commented Apr 13, 2015

I'll update whatwg/streams#300 with this point

@wanderview

> Right, this. But I see no problem with adding an option to getReader, at least in theory... details to be worked out...

It seems simpler to me to have getReader().read() give you an ArrayBuffer and then you get views if you opt into that with the read(view) approach.

@domenic
Contributor

domenic commented Apr 13, 2015

Hmm, well, I don't feel strongly. We discussed this only briefly in whatwg/streams#295 (comment)... I think the consistency is nice on balance.

Note that for BYOB we either have to require a view (i.e., not accept ArrayBuffer) or, if we do accept an ArrayBuffer, pick a default type for the returned view.

@wanderview

Yeah, making BYOB require a view seems reasonable to me.

I don't see how .read() returning a Uint8Array is consistent with .read(view), though. The .read(view) spec shouldn't explicitly mention Uint8Array at all, but rather the view passed in. So we're making up a type to be consistent with.

To me the consistency comes from ".read() returns the most general type possible without losing information".

Kind of a bikeshed, but it does seem like this could be quite an annoyance for anyone not using Uint8Array.

@domenic
Contributor

domenic commented Apr 13, 2015

The only other argument is one I briefly alluded to in whatwg/streams#295 (comment) which is that you could imagine an implementation that returns multiple views onto the same ArrayBuffer with subsequent calls to read(). E.g. it pre-allocates a large buffer, and then does some I/O that fills only a small subset of it, so it gives back a view onto that subset. Then the next I/O fills another subset, and so it gives back a view there. This only really works for single-threaded scenarios, and probably even then causes observable data races. But I thought I'd expand on it a bit.

I hope we can get @tyoshino and @yutakahirano to weigh in. If they agree then we'll need to patch Chrome ASAP.

@wanderview

> The only other argument is one I briefly alluded to in whatwg/streams#295 (comment) which is that you could imagine an implementation that returns multiple views onto the same ArrayBuffer with subsequent calls to read(). E.g. it pre-allocates a large buffer, and then does some I/O that fills only a small subset of it, so it gives back a view onto that subset. Then the next I/O fills another subset, and so it gives back a view there. This only really works for single-threaded scenarios, and probably even then causes observable data races.

Does this really work given that Uint8Array.buffer allows content to access the underlying ArrayBuffer?

Personally, I expect to implement .read() for a binary stream to return a buffer sized to all the bytes currently available. I'm not sure why we would want to slice it up into smaller chunks arbitrarily.

@domenic
Contributor

domenic commented Apr 13, 2015

Here is a contrived example:

  • Underlying source is implemented in JS and on the same thread
  • Underlying source pre-allocates a 10 MiB buffer
  • For each call to read(), the underlying source decides to dole out 1 MiB of the buffer via a view onto it.

Indeed you could see that view1.buffer === view2.buffer. But that's implementable anyway. I could do that with a normal ReadableStream. Heck I don't even have to use array buffers: const x = {}; c.enqueue({ chunkNumber: 1, backingObject: x }); c.enqueue({ chunkNumber: 2, backingObject: x }).
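
A sketch of that contrived setup (fillWithIO() is made up; the controller shape follows the ReadableStream design being discussed):

```js
// One big pre-allocated buffer, doled out 1 MiB at a time as views onto the
// *same* ArrayBuffer, so consecutive reads observe view1.buffer === view2.buffer.
const big = new ArrayBuffer(10 * 1024 * 1024);
const CHUNK = 1024 * 1024;
let offset = 0;

const rs = new ReadableStream({
  pull(c) {
    if (offset >= big.byteLength) return c.close();
    const view = new Uint8Array(big, offset, CHUNK);
    fillWithIO(view);   // hypothetical: some I/O fills just this slice
    c.enqueue(view);
    offset += CHUNK;
  }
});
```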

I can't come up with a non-contrived scenario where this is very sensible, though. You can try to make up stuff, like maybe it requires computation and you only want to do 1 MiB of computation at a time (but you have 10 MiB of memory at a time). But meh.

@wanderview

I think slicing like this only makes sense if it aligns with the consumer of the data. The .read(view) approach seems like the way to do that.

Or we could add a .read(numBytes) that tries to give you a buffer of exactly that size. It could also wait to resolve until that many bytes are available.
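
Purely to illustrate the shape such an API might take (read(numBytes) is hypothetical and not part of any spec):

```js
// Resolves only once 4096 bytes are available (or the stream ends),
// yielding a chunk of exactly that size.
const { value, done } = await reader.read(4096);
```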

@annevk
Author

annevk commented Apr 14, 2015

So the mental model of buffer reuse is that you have an underlying buffer that represents 10 MiB. You keep passing the ArrayBuffer representing the underlying buffer around to indicate you want to reuse that same 10 MiB. And you have a view upon that ArrayBuffer that tells you what part of the ArrayBuffer you can inspect.

Both the ArrayBuffer and view instances are created fresh each read(), right?

Wouldn't another possible model be that read() always reuses the buffer? It keeps a pointer to the ArrayBuffer it returns, which is sized to the maximum it could give. When read() is invoked again, that ArrayBuffer gets detached. The streams implementation could preserve the same underlying buffer and just hand out slices through the ArrayBuffer class. But I guess that's not flexible enough.

@yutakahirano
Owner

I don't have a strong opinion, but I want a strong consensus so that we will not reverse the decision again...

@tyoshino
Contributor

@wanderview #35 (comment): Even without .read(numBytes), it could make sense to have the ability to return views onto the same big buffer if the underlying source generates chunks with large memory regions while the stream wants to return somewhat finer-grained chunks, so that the reader can exert back-pressure more precisely. This is also a kinda contrived example. If we really want to provide this kind of flexibility over back-pressure exertion, we should, as you said, have .read(numBytes) accept the number of bytes to consume from the user explicitly (I also discussed this a little in whatwg/streams#320 (comment)).

So, I think if we add .read(numBytes), it should use ArrayBufferView, but for .read(), which is common to byte and non-byte streams, I think it's OK to use ArrayBuffer.

@annevk

> When read() is invoked again, that ArrayBuffer gets detached

As you said, this is not so flexible. With this requirement, we need to ask users not to touch the ArrayBufferViews returned by previous read() calls once they issue a new read() call.

@annevk
Author

annevk commented Apr 14, 2015

@tyoshino in the alternative model I put forward (which is just exploration, for what it's worth), read() would always return an ArrayBuffer. And whenever you invoke read() again, that would neuter it (to allow for reuse of the underlying buffer the stream controls). Developers would have to transfer bytes and add views.
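
A consumer under that alternative model might look roughly like this (exploratory only, following the description above; assume an async function):

```js
const reader = stream.getReader();

const { value: buf } = await reader.read();   // buf: an ArrayBuffer the stream intends to reuse
const view = new Uint8Array(buf);             // add a view to inspect the bytes

// If the bytes must outlive the next read(), copy them out first: the next
// read() detaches (neuters) buf so the stream can recycle the memory.
const keep = view.slice();

await reader.read();                          // buf is now detached; view is no longer usable
```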

@tyoshino
Contributor

Yeah. OK. I just meant that the following restriction is very different from what we have now.

> And whenever you invoke read() again, that would neuter it

@tyoshino
Contributor

To be clear, I'm fine with either ArrayBuffer or ArrayBufferView.

@wanderview

To be clear, I'm not advocating for .read(numBytes) right now. That was a tangent. Sorry!

If .read() returned an ArrayBufferView, how would the calling code know what type of view it actually is in order to use it? Or would it have to use chunk.buffer, chunk.byteOffset, and chunk.length to create its own view?

@tyoshino
Contributor

Sorry. By #35 (comment), I didn't mean that it should be unknown what kind of ArrayBufferView is read.

So, either of

@domenic
Contributor

domenic commented Apr 14, 2015

> I don't have a strong opinion, but I want a strong consensus so that we will not reverse the decision again...

Yes, this...

> If .read() returned an ArrayBufferView, how would the calling code know what type of view it actually is in order to use it? Or would it have to use chunk.buffer, chunk.byteOffset, and chunk.length to create its own view?

I don't really understand this question. For "how would the calling code know what type of view it actually is to use it" I think just looking at view.constructor works. For "would they have to use chunk.buffer, chunk.byteOffset, and chunk.length to create their own view" I think they can just do new Float32Array(anyOtherView).
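
For illustration (a sketch; note that new Float32Array(otherView) converts element values into a newly allocated buffer, whereas constructing from .buffer reinterprets the same bytes, assuming suitable alignment):

```js
// What kind of view did we get back?
console.log(chunk.constructor.name);   // e.g. "Uint8Array"

// Element-wise conversion: allocates a new buffer and converts each element.
const converted = new Float32Array(chunk);

// Byte reinterpretation: a view over the same underlying bytes, no copy.
const reinterpreted = new Float32Array(
  chunk.buffer,
  chunk.byteOffset,
  chunk.byteLength / Float32Array.BYTES_PER_ELEMENT
);
```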

@wanderview

Ok, I did not realize the typed array constructors could convert another typed array that easily. That's the same as what they would need to do with an ArrayBuffer.

So I think Uint8Array as a default is a bit odd, but it's not really a problem. I personally don't think it's worth triggering a fire drill to try to patch something in the Blink implementation.

@annevk what do you think?

@domenic
Contributor

domenic commented Apr 14, 2015

@annevk that's an interesting model. A couple disadvantages I can think of:

  • Requires fixing ahead of time the maximum buffer size (since the same underlying buffer is reused), so does not allow for the flexibility of saying "my destination needs 10 MiB right now" if the maximum buffer size is 5 MiB. (It might be possible to work around this if implementers are able to back the same ArrayBuffer by multiple backing buffers, but that seems a bit sketchy.)
  • Doesn't allow more complicated patterns where the app might be using more than one chunk at once (trading memory for throughput). For example, the ping-pong pattern in https://gist.github.com/domenic/dab9e8da245cb4fe6027#file-ping-pong-js (or extrapolation of it to pools of sizes above 2); a rough sketch follows below.
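
A rough sketch of that ping-pong pattern, assuming the read(view) shape under discussion (process() is a hypothetical consumer; assume an async function):

```js
// Two buffers in flight: while the consumer works on one chunk, the stream is
// already filling the other (trading memory for throughput).
const reader = byteStream.getByobReader();
let pending = reader.read(new Uint8Array(new ArrayBuffer(1024 * 1024)));   // buffer #1: start filling
let spare = new ArrayBuffer(1024 * 1024);                                  // buffer #2: waiting its turn

while (true) {
  const { value, done } = await pending;
  if (done) break;
  pending = reader.read(new Uint8Array(spare));   // start filling the spare while we work on value
  process(value);                                 // hypothetical consumer
  spare = value.buffer;                           // the just-processed buffer becomes the new spare
}
```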

@annevk
Author

annevk commented Apr 15, 2015

I'm convinced about the existing model and the Uint8Array default, given the arguments provided. Using ArrayBuffer as output elsewhere (Fetch and the Push API come to mind) still seems like a good idea as long as there are no buffer-reuse concerns.

The Encoding Standard using Uint8Array is odd in that respect but does set a precedent for using that as the default view for bytes.

I'll leave it to @wanderview to sign off and close this.

@wanderview

Still looks good with Uint8Array as default to me! I don't have perms to close issues in this repo, though.

@annevk annevk closed this as completed Apr 15, 2015