Define chunk size for ReadableStream created by `blob::stream()`
#144
Comments
Is this with BYOB streams? Could you explain the issue in some more depth perhaps? I would have expected the chunk size to be implementation-dependent and perhaps to also depend on the hardware in use, but maybe that's not ideal. |
So here is an in-depth explanation. I'm making a wasm component that deals with downloading files asynchronously. When the file is downloaded, I then need to process the entire file inside the browser. The wasm module handles the processing; however, wasm has limited memory, so files as large as 5GB must remain in JavaScript's memory space (or wherever downloaded file data is stored, I'm not sure). In order to process the data -- for example, the aforementioned 5GB -- I must stream it through the wasm module.
Now, none of this is a problem (except the fact that I am forced to use recursive-async engineering for a simple read command). However, in step 4, I'm assuming that my wasm module will only ever need a buffer that's 0x10000 bytes in length. That number is not specified anywhere. It would be handy if it was... then I -- as a WASM developer -- would know exactly how much memory I need to allocate for all applications, making my wasm very efficient. |
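The read loop described above can be sketched without assuming any particular chunk size. This is a minimal sketch, not code from the thread; the function name `readAll` is hypothetical, and it buffers everything in JS memory, which is only workable for inputs far smaller than the 5GB case:

```javascript
// Hypothetical helper (not from the thread): drain a blob.stream() reader
// without assuming any particular chunk size.
async function readAll(blob) {
  const reader = blob.stream().getReader();
  const chunks = [];
  let total = 0;
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    chunks.push(value); // value is a Uint8Array of implementation-defined size
    total += value.byteLength;
  }
  // Concatenate; a real 5GB pipeline would instead feed each chunk to wasm.
  const out = new Uint8Array(total);
  let offset = 0;
  for (const c of chunks) {
    out.set(c, offset);
    offset += c.byteLength;
  }
  return out;
}
```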
cc @ricea |
This is under the FileAPI's jurisdiction. It's implementation-defined, and difficult to put tight constraints on without forcing implementations to do inefficient things. I hope Firefox and Chromium arrived at the same size by coincidence rather than reverse-engineering.

An implementation that returned 1-byte chunks would clearly be unreasonably inefficient. An implementation that returned the whole blob as a single chunk would be unreasonably unscalable. So it clearly is possible to define some bounds on what a "reasonable" size is. I would recommend using dynamic allocation to store the chunks in wasm if possible, and assuming that implementations will behave reasonably.

In the standard, it would probably be good to enforce "reasonable" behaviour by saying that no chunk can be >1M in size and no non-terminal chunk can be <512 bytes. Maybe that second constraint can be phrased more carefully to allow for ring-buffer implementations that may occasionally produce small chunks but mostly don't. Alternatively, the standard could be extremely prescriptive and require 65536-byte non-terminal chunks, based on the assumption that any new implementation can be made to comply without too much loss of efficiency. |
What's the chance of us regretting such limits in 10 years? On the other hand, if applications are already depending on existing limits, maybe this will eventually require a toggle of sorts. |
To express my opinion, I think defining a maximum of 0x10000 bytes would be an extremely acceptable idea, and certainly future-proof. For comparison, the max size of a UDP packet is around that size as well (0x10000-1 to be exact), and no one has complained about it since its definition in 1980. However, keep in mind that for my particular application I would only need a maximum defined, as that maximum alone would allow me to optimize heap allocation. Another solution outside of defining a maximum for implementations would be to allow developers like me to pass an argument that would define the maximum at run-time... but at that point, we'd be redefining the BYOB implementation. |
I agree with @ricea that it's better to write your application code to be resilient to larger (or smaller) chunk sizes, e.g. slicing the buffers as appropriate. If you need control over the buffer sizes, then BYOB readers are the way to go, and we should not change the behavior of the default reader just because folks haven't implemented BYOB readers yet. Instead, we should take this as a potential signal to up the priority of BYOB. |
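A BYOB read loop, as suggested above, lets the caller rather than the engine cap the chunk size. The sketch below is illustrative (the name `readWithByob` is hypothetical); whether a given engine exposes `blob.stream()` as a BYOB-capable byte stream is implementation-dependent, so the demo constructs its own byte stream:

```javascript
// Sketch (hypothetical name): a BYOB read loop where the caller supplies the
// buffer, so the application, not the engine, decides the maximum chunk size.
async function readWithByob(stream, bufferSize) {
  const reader = stream.getReader({ mode: "byob" });
  let buffer = new ArrayBuffer(bufferSize);
  const received = [];
  for (;;) {
    // The engine fills at most bufferSize bytes of the buffer we hand it.
    const { done, value } = await reader.read(new Uint8Array(buffer));
    if (done) break;
    received.push(...value);
    buffer = value.buffer; // the buffer is transferred back; reuse it
  }
  return received;
}

// Demo byte stream standing in for blob.stream().
const demo = new ReadableStream({
  type: "bytes",
  start(c) {
    c.enqueue(new Uint8Array([1, 2, 3, 4, 5, 6, 7, 8]));
    c.close();
  },
});
```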
Theoretically I agree, but practically, if people are going to write code assuming limits and don't do the due diligence of checking whether that is future-proof, we might well be stuck and have to define such a limit in the future. |
Is there a …? Cannot the chunks be split into the desired size using …? |
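Splitting chunks into a desired size can indeed be done in userland. A sketch of such a rechunking helper (`fixedSizeChunks` is a hypothetical name, not an API from the thread): buffer incoming bytes and re-emit fixed-size pieces, with only the terminal chunk allowed to be short.

```javascript
// Hypothetical helper: re-emit incoming Uint8Array chunks as `size`-byte
// chunks, regardless of the sizes the source produced. Only the final
// (terminal) chunk may be shorter than `size`.
function fixedSizeChunks(size) {
  let pending = new Uint8Array(0);
  return new TransformStream({
    transform(chunk, controller) {
      // Prepend any leftover bytes from the previous call.
      const merged = new Uint8Array(pending.length + chunk.length);
      merged.set(pending);
      merged.set(chunk, pending.length);
      let offset = 0;
      while (merged.length - offset >= size) {
        controller.enqueue(merged.slice(offset, offset + size));
        offset += size;
      }
      pending = merged.slice(offset); // carry the remainder forward
    },
    flush(controller) {
      if (pending.length) controller.enqueue(pending); // terminal short chunk
    },
  });
}
```

A `blob.stream().pipeThrough(fixedSizeChunks(0x10000))` pipeline would then hand the wasm side predictable chunk sizes whatever the engine delivers, at the cost of one extra copy per chunk.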
The WASM module will need to handle the last bytes of the file anyway, therefore defining a chunk size for … |
particularly where the input is not always the same file. Encountered a similar case where the value (…) …

There is also an edge case where, if Disable cache is checked at the Network tab in DevTools at Chromium, operations that slice and splice the input into specific … …

FWIW, the solution that I am currently using https://github.com/guest271314/AudioWorkletStream/blob/master/audioWorklet.js#L10 to handle input to … when … is thrown, which happens to occur when Disable cache is checked at the Network tab in DevTools. |
Another option to handle arbitrary … |
Isn't this enough?

```js
new Response(
  Uint8Array.from(new Array(1024)).buffer
).body.pipeThrough(
  new TransformStream({ type: "byte", transform: (chunk, c) => chunk.map((byte) => c.enqueue(byte || "1")) })
).pipeThrough(
  new TransformStream({ type: "byte" }, new ByteLengthQueuingStrategy({ highWaterMark: 512, size: (c) => console.log(512, c) || 512 }))
).pipeTo(new WritableStream({ write(c) { console.log(c.length) } }))
```

I just ask for a friend :) Why isn't the default behavior that TransformStream, which is a passthrough stream, converts it to bytes? |
Currently regarded as 'Issue 1'

I have recently come across a project that forces me to assume the chunk size for the stream. Testing with Chromium and Firefox, the chunk size appears to be `0x10000`, or `65536`. I cannot find a reason why it's this particular number.

As I'm making a wasm module, I must allocate memory space when going through files 5GB+ in size. I will allocate `0x10000` bytes for now, but only out of assumption... if there's a browser out there that does not follow this assumption, then there will be fatal bugs.

I'm not sure if this is w3c/FileAPI/'s or streams.spec.whatwg.org's jurisdiction. But neither of them has an exact number.
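To illustrate the fragility described above, here is a sketch (all names hypothetical) of the kind of fixed-buffer copy that breaks if an engine ever delivers a chunk larger than the assumed 0x10000 bytes:

```javascript
// Illustration of the assumption in the issue: a fixed 0x10000-byte staging
// buffer, sized from observed behavior rather than any spec guarantee.
const ASSUMED_MAX_CHUNK = 0x10000; // 65536 -- observed in Chromium/Firefox, not specified
const staging = new Uint8Array(ASSUMED_MAX_CHUNK);

function copyIntoStaging(chunk) {
  if (chunk.byteLength > ASSUMED_MAX_CHUNK) {
    // This is the "fatal bug" path: nothing in the spec prevents it.
    throw new RangeError(`chunk of ${chunk.byteLength} bytes exceeds assumed maximum`);
  }
  staging.set(chunk, 0); // in the real app this would be wasm linear memory
  return chunk.byteLength;
}
```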