Skip to content

Latest commit

 

History

History
211 lines (186 loc) · 8.35 KB

README.md

File metadata and controls

211 lines (186 loc) · 8.35 KB

remote-web-streams

Web streams that work across web workers and <iframe>s.

Problem

Suppose you want to process some data that you've downloaded somewhere. The processing is quite CPU-intensive, so you want to do it inside a worker. No problem, the web has you covered with postMessage!

// main.js
(async () => {
  const response = await fetch('./some-data.txt');
  const data = await response.text();
  const worker = new Worker('./worker.js');
  worker.onmessage = (event) => {
    const output = event.data;
    const results = document.getElementById('results');
    results.appendChild(document.createTextNode(output)); // tadaa!
  };
  worker.postMessage(data);
})();

// worker.js
self.onmessage = (event) => {
  const input = event.data;
  const output = process(input); // do the actual work
  self.postMessage(output);
}

All is good: your processing does not block the main thread, so your web page remains responsive. However, it takes quite a long time before the results show up: first all of the data needs to be downloaded, then all that data needs to be processed, and finally everything is shown on the page. Wouldn't it be nice if we could already show something as soon as some of the data has been downloaded and processed?

Normally, you'd tackle this with by reading the input as a stream, piping it through one or more transform streams and finally displaying the results as they come in.

// main.js
(async () => {
  const response = await fetch('./some-data.txt');
  await response.body
    .pipeThrough(new TransformStream({
      transform(chunk, controller) {
        controller.enqueue(process(chunk)); // do the actual work
      }
    }))
    .pipeTo(new WritableStream({
      write(chunk) {
        const results = document.getElementById('results');
        results.appendChild(document.createTextNode(chunk)); // tadaa!
      }
    }));
})();

Now you can see the first results as they come in, but your processing is blocking the main thread again! Can we get the best of both worlds: process data as it comes in, but off the main thread?

Solution

Enter: remote-web-streams. With this libray, you can create pairs of readable and writable streams where you can write chunks to a writable stream inside one context, and read those chunks from a readable stream inside a different context. Functionally, such a pair behaves just like an identity transform stream, and you can use and compose them just like any other stream.

Basic setup

RemoteReadableStream

The basic steps for setting up a pair of linked streams are:

  1. Construct a RemoteReadableStream. This returns two objects:
    • a MessagePort which must be used to construct the linked WritableStream inside the other context
    • a ReadableStream which will read chunks written by the linked WritableStream
// main.js
import { RemoteReadableStream } from 'remote-web-streams';
const { readable, writablePort } = new RemoteReadableStream();
  1. Transfer the writablePort to the other context, and instantiate the linked WritableStream in that context using fromWritablePort.
// main.js
const worker = new Worker('./worker.js', { type: 'module' });
worker.postMessage({ writablePort }, [writablePort]);

// worker.js
import { fromWritablePort } from 'remote-web-streams';
self.onmessage = (event) => {
  const { writablePort } = event.data;
  const writable = RemoteWebStreams.fromWritablePort(writablePort);
}
  1. Use the streams as usual! Whenever you write something to the writable inside one context, the readable in the other context will receive it.
// worker.js
const writer = writable.getWriter();
writer.write('hello');
writer.write('world');
writer.close();

// main.js
(async () => {
  const reader = readable.getReader();
  console.log(await reader.read()); // { done: false, value: 'hello' }
  console.log(await reader.read()); // { done: false, value: 'world' }
  console.log(await reader.read()); // { done: true, value: undefined }
})();

RemoteWritableStream

You can also create a RemoteWritableStream. This is the complement to RemoteReadableStream:

  • The constructor (in the original context) returns a WritableStream (instead of a readable one).
  • You transfer the readablePort to the other context, and instantiate the linked ReadableStream with fromReadablePort inside that context.
// main.js
import { RemoteWritableStream } from 'remote-web-streams';
worker.postMessage({ readablePort }, [readablePort]);
const writer = writable.getWriter();
// ...

// worker.js
import { fromReadablePort } from 'remote-web-streams';
self.onmessage = (event) => {
  const { readablePort } = event.data;
  const reader = readable.getReader();
  // ...
}

Examples

Remote transform stream

In the basic setup, we create one pair of streams and transfer one end to the worker. However, it's also possible to set up multiple pairs and transfer them all to a worker.

This opens up interesting possibilities. We can use a RemoteWritableStream to write chunks to a worker, let the worker transform them using one or more TransformStreams, and then read those transformed chunks back on the main thread using a RemoteReadableStream. This allows us to move one or more CPU-intensive TransformStreams off the main thread, and turn them into a "remote transform stream".

To demonstrate these "remote transform streams", we set one up to solve the original problem statement:

  1. Create a RemoteReadableStream and a RemoteWritableStream on the main thread.
  2. Transfer both streams to the worker. Inside the worker, connect the readable to the writable by piping it through one or more TransformStreams.
  3. On the main thread, write data to be transformed into the writable and read transformed data from the readable. Pro-tip: we can use .pipeThrough({ readable, writable }) for this!
// main.js
import { RemoteReadableStream, RemoteWritableStream } from 'remote-web-streams';
(async () => {
  const worker = new Worker('./worker.js', { type: 'module' });
  // create a stream to send the input to the worker
  const { writable, readablePort } = new RemoteWritableStream();
  // create a stream to receive the output from the worker
  const { readable, writablePort } = new RemoteReadableStream();
  // transfer the other ends to the worker
  worker.postMessage({ readablePort, writablePort }, [readablePort, writablePort]);

  const response = await fetch('./some-data.txt');
  await response.body
    // send the downloaded data to the worker
    // and receive the results back
    .pipeThrough({ readable, writable })
    // show the results as they come in
    .pipeTo(new WritableStream({
      write(chunk) {
        const results = document.getElementById('results');
        results.appendChild(document.createTextNode(chunk)); // tadaa!
      }
    }));
})();

// worker.js
import { fromReadablePort, fromWritablePort } from 'remote-web-streams';
self.onmessage = async (event) => {
  // create the input and output streams from the transferred ports
  const { readablePort, writablePort } = event.data;
  const readable = fromReadablePort(readablePort);
  const writable = fromWritablePort(writablePort);

  // process data
  await readable
    .pipeThrough(new TransformStream({
      transform(chunk, controller) {
        controller.enqueue(process(chunk)); // do the actual work
      }
    }))
    .pipeTo(writable); // send the results back to main thread
};

With this set up, we achieve the desired goals:

  • Data is transformed as soon as it arrives on the main thread.
  • Transformed data is displayed on the web page as soon as it is transformed by the worker.
  • All of the data processing happens inside the worker, so it never blocks the main thread.

The results are shown as fast as possible, and your web page stays snappy. Great success! 🎉

Behind the scenes

The library works its magic by creating a MessageChannel between the WritableStream and the ReadableStream. The writable end sends a message to the readable end whenever a new chunk is written, so the readable end can enqueue it for reading. Similarly, the readable end sends a message to the writable end whenever it needs more data, so the writable end can release any backpressure.