Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OffscreenCanvas new commit() and DedicatedWorker.requestAnimationFrame #288

Closed
fserb opened this issue Jun 1, 2018 · 19 comments
Closed
Assignees
Labels
Progress: pending external feedback The TAG is waiting on response to comments/questions asked by the TAG during the review

Comments

@fserb
Copy link

fserb commented Jun 1, 2018

Hello TAG!

This is a follow up from: #141

I'm requesting a TAG review of:

This change addresses concerns pointed out at #141, namely:

  • the previous Promise commit() solution doesn't exist anymore, instead requestAnimationFrame has been added to DedicatedWorkerGlobalScope, and a render model has been described on it. This is semantic similar to what happens on Window (although not attached to a Document animation frame).
  • There's still a need for a commit() function, to support the new WebAssembly cases of a busy loop execution (that doesn't finish the context task) but that still wants to commit a render frame. This new commit() is a blocking function that returns as soon as the UA is ready to receive a new rendering of the OffscreenCanvas.
  • Apart from the request above, and since this is relevant to OffscreenCanvas, there's still a minor follow up PR on whatwg that enables (https://github.com/whatwg/html/pull/3708)[text rendering on OffscreenCanvas].

We'd prefer the TAG provide feedback as leave review feedback as a comment in this issue and notifying myself (@fserb) if needed.

@travisleithead travisleithead self-assigned this Jun 19, 2018
@plinss plinss added this to the 2018-06-26-telcon milestone Jun 19, 2018
@annevk
Copy link
Member

annevk commented Jun 20, 2018

The one thing of note here is that requestAnimationFrame() and cancelAnimationFrame() will only function in dedicated worker agents that are part of an agent cluster which includes a similar-origin window agent. Otherwise they'll throw. At least, as currently proposed.

An alternative could be that we don't expose them in agent clusters that contain a service worker agent or shared worker agent, but that would require more IDL infrastructure.

Noting this here as it sets some architectural precedent. (At least, I'm not aware of other APIs needing this.)

@RByers
Copy link

RByers commented Jun 28, 2018

Note that there's now a blink intent-to-ship thread for this feature

@greggman
Copy link

greggman commented Jul 6, 2018

So where is the up to date spec? The spec linked above says

When its canvas context mode is placeholder , a canvas element has no rendering context. It serves as a placeholder for an OffscreenCanvas object, and the content of the canvas element is updated by calling the commit() method of the OffscreenCanvas object's rendering context.

and

canvas . transferControlToOffscreen ()

Returns a newly created OffscreenCanvas object that uses the canvas element as a placeholder. Once the canvas element has become a placeholder for an OffscreenCanvas object, its intrinsic size can no longer be changed, and it cannot have a rendering context. The content of the placeholder canvas is updated by calling the commit() method of the OffscreenCanvas object's rendering context.

But talking to others (and in testing) these are no longer the case. For an OffscreenCanvas the contents of the placeholder is updated automatically after the current event exits just like a normal canvas. In other words, just like a normal canvas if you call gl.clear or gl.drawXXX then a task is queued to copy/swap the drawing buffer into the canvas. commit is only needed for spin loops

Example: This renders to the canvas. no call to commit needed

in worker

self.onmessage = function(evt) {
  const canvas = evt.data.canvas;
  const gl = canvas.getContext("webgl");
  gl.clearColor(0,1,0,1);
  gl.clear(gl.COLOR_BUFFER_BIT);
};

In main thread

  const canvas = document.querySelector("canvas");
  const offscreen = canvas.transferControlToOffscreen();
  worker.postMessage({canvas: offscreen}, [offscreen]);

https://jsfiddle.net/greggman/w3t0gmvz/

The same is true for Canvas2DRenderingContext. Changing the code above to use canvas 2d instead of webgl also renders which the spec linked requires it NOT to render.

in worker

self.onmessage = function(evt) {
  const canvas = evt.data.canvas;
  const ctx = canvas.getContext("2d");
  ctx.fillStyle = 'red';
  ctx.fillRect(50, 100, 70, 50);
};

https://jsfiddle.net/greggman/qmn2e78h/

Also the spec linked says

and

If an OffscreenCanvas object whose dimensions were changed has a placeholder canvas element , then the placeholder canvas element 's intrinsic size will only be updated via the commit() method of the OffscreenCanvas object's rendering context.

But that is also false.

Change the code above to

self.onmessage = function(evt) {
  const canvas = evt.data.canvas;
  const ctx = canvas.getContext("2d");
  ctx.canvas.width = 2;
  ctx.canvas.height = 2;
  ctx.fillStyle = 'red';
  ctx.fillRect(0, 0, 1, 1);
};

and we see the intrinsic size is updated without calling commit

https://jsfiddle.net/greggman/j3pgLs0o/

Where is the correct spec and has it been reviewed?

@travisleithead
Copy link
Contributor

@greggman part of this review process is to help align implementations with the spec and the spec to implementations. As you point out, clearly there are some discrepancies between the two. It also looks like this is not shipping [yet] in any release version of a browser (e.g., not behind a flag), so there's still an opportunity to get alignment. :)

@torgo torgo added the Progress: pending external feedback The TAG is waiting on response to comments/questions asked by the TAG during the review label Jul 10, 2018
@travisleithead
Copy link
Contributor

@fserb some initial feedback from TAG review today:

This looks like great progress, and is addressing the high-level feedback about having a consistent requestAnimationFrame from our prior feedback. In general, we'd like to see more code examples in the describing document, especially code examples that highlight (at a high-level) the various use-cases outlined at the top (it is an "explainer" document, right?).

With our privacy hat on, we were disappointed to not see a Privacy considerations section :-( It would be great to add one, even if you believe there isn't much to put in it--it helps us know you've given it some thought.

@annevk left some feedback above about availability in agent clusters... the behavior of which is a side-effect of integration into the spec via the PR. My personal view is that the updated AnimationFrameProvider be limited to DedicatedWorker and Window (for now, until details about how to bind its timing for a SharedWorker or ServiceWorker). I'd rather not expose an API that will simply throw in most cases. I thought this can be done with [Exposed=(Window,DedicatedWorker)], but I'm not really up-to-speed on agent clusters or how to differentiate that in spec-ese.

Also as-noted, the PR has nothing for the commit() API. We expect to see that in the future.

A nit: the WebIDL description of Offscreen canvas' commit cannot have multiple inheritance :-)

Use case question: is composition a desired scenario? E.g., Worker+Offscreen canvas used to render the background of a 3D scene, while foreground rendering done on Window's browsing context and the two scenes are composited together? (Or foreground paints a player HP/MP/Score overlay, while offscreen canvas renders the game world?)

In the description of the processing model: "atomic update" text is used, but there's a lack of clarity on what it means. Does it mean a blocking synchronization point for the window's browsing context? There may be a definition of this in the 'Previously Considered Solutions' section, but I don't have confidence that that definition wasn't thrown out with the previous solution. :) Also, processing of the Offscreen Canvases is done after other animation callbacks are executed in window's browsing context. Is there a specific reason for that (other than arbitrary)? Is it to avoid blocking more of the event loop?) Also, I assume this processing is somehow recursive over the tree of potentially nested workers (as workers can spawn workers).

Relationship of commit() to ImageBitmap... are these duplicating effective behavior? ImageBitmap can allow multiple frames to accumulate in memory before being rendered (these pinning GPU memory until collected). Is ImageBitmap still a good pattern to use when commit() is available and can seemingly do the same transfer-and-display logic without using intermediate objects and without allowing for the possibility of unbounded GPU memory backpressure?

commit() + rAF behavior: Is the expectation that commit() must be used at the end of each worker's rAF? (in order to get the timing synchronized and send the frame at the appropriate time to the GPU?). If not, is there some assumption that all offscreen canvases implicitly commit at the end of the worker's rAF call? This was put in question by @greggman above and does need to be clarified.

From the PR:

Note: this PR seems incomplete to me... maybe because the explainer document is incomplete? Based on what I've seen, I certainly don't think this is ready to ship...

line: 89072 - seems to loose to just say "user agent decides" for the spec--doesn't it need to match the framerate of the associated window's browsing context? Must have missed where that was set-up...

@annevk
Copy link
Member

annevk commented Jul 25, 2018

@travisleithead I don't think you understood my point. It's not about exposing it outside of dedicated workers, it's about which dedicated workers. E.g., dedicated workers spawned from service workers. Are they exposed there or do they throw?

@greggman
Copy link

I don't know who's reading what where but on top of the issues mentioned above with commit are a host of others. Sorry to repeat this stuff but I have no idea if there is one official place to discuss this stuff or if I should just keep repeating myself in every place it's bought up.

The biggest issue is that commit (AFAIK) effectively creates a new kind of environment in the browser. It's use case is supposedly for native ports using spin loops

 while(true) { 
    render();
    gl.commit();
 }

Once you do that all access to all other APIs stops working in that worker because all other APIs are event based (onmessage, fetch, websockets, rAF, setTimeout, setInterval, etc...)

Is that fact well under by the people reviewing this proposal?

The people pushing for commit know that implication and are suggesting that a worker using a commit spin loop will be communicated with via shared arraybuffers since other communication becomes impossible.

It seems like there are a bunch of issues with that direction. For example it seems to make throttling hard. You can block the commit indefinitely when the page using it is not the front tab BUT, if you block then any other workers or the main thread will suddenly stop having their shared arraybuffer exchanges respond. While it might be possible to correctly program for that case I have a feeling it will lead to tons of subtle race conditions for most apps trying to use it. It was also suggested in that case that instead of blocking the browser might just throttle (eg. only process commit once a second) but that also seems like a bad idea as it will likely make using the browser extremely unresponsive until the tab using the spin loop is closed.

@plinss plinss added extra time and removed Progress: pending external feedback The TAG is waiting on response to comments/questions asked by the TAG during the review labels Jul 25, 2018
@kenrussell
Copy link

@greggman, @fserb and I are considering your and others' feedback regarding commit(). We have put it behind a flag in Chromium's implementation and will do more experimentation with multithreaded WebAssembly applications before aiming to both specify and ship it.

@juj
Copy link

juj commented Aug 12, 2018

The last I understood was that .commit() and while(true){} custom main loops are orthogonal concepts. A Worker that is using OffscreenCanvas and .commit() to present frames, may be rendering asynchronously event-based in the Worker, or synchronously via while(true) {} custom main loop. The wording in https://github.com/junov/OffscreenCanvasAnimation/blob/master/OffscreenCanvasAnimation.md#the-commit-processing-model suggests that would still be the case?

Browsers should not block .commit() to wait until page resumes from background, since that would indeed deadlock Workers and possibly cause event queues to build up, and a barrage of flooded computation when the tab finally resumes. If browsers want to throttle background tabs, they should do that at a global Worker/tab level, after all nothing prevents pages from wasting CPU via a for(;;) {} in a Worker today, already without OffscreenCanvas.

The issue with multithreaded applications is that only one Worker in such an application might be doing OffscreenCanvas WebGL rendering, and many other Workers in an app may be doing some other continuously running computation, that can produce and wait for events from the OffscreenCanvas WebGL rendering Worker. If just the OffscreenCanvas Worker paused in .commit() as a response to the tab going to background, but other Workers did not, they might keep piling up events to the OffscreenCanvas Worker queue, or they might also pause waiting for something from the OffscreenCanvas Worker. That is why throttling should be considered a) globally or b) cooperatively instead, and not on .commit(). If a browser globally throttles the whole tab, then the event queue build-up should be more moderate, since essentially the whole app would suspend, and not just parts of it.

If we think that a computation consists of two separate logical parts: game/app logic simulation+animation rendering, and would want to reduce the processing of an application to only doing the simulation portion when on the background since the animation bit is redundant, then we should provide APIs that make it possible for application developers to easily opt in to. Giving requestAnimationFrame() to Workers makes that bit easier. Emscripten provides uniform API for either main thread or pthread to register to get visibilitychange events into the calling thread, which allows applications to cooperatively stop rendering, whether they are in main thread or pthread using OffscreenCanvas, or direct WebGL.

If we wanted something more automatic on the cooperative front, it could be a property bool suspendMeIfMyTabGoesToBackground and/or bool graduallyStartThrottlingMeIfMyTabGoesToBackground or similar on WorkerGlobalScope, which developers would be able to set for Workers they know do not need to do anything if the page is on the background. That would allow developers to eagerly opt in to power saving.

For slightly easier rendering throttling management, it might be nice for .commit() to give back an enum return value 0: visible, 1: hidden (while at it, 2: contextlost?) that caller would be able to investigate right there in order to do cooperative throttling in a non-event-based manner, without having to set up visibilitychange listeners. And/or perhaps we should make Page Visibility API with is document.visibilityState readily available in Web Workers?

But apart from that, I don't think browsers have any other option than to globally throttle all Workers/the whole tab, independent of what they are doing. It is the same problem as how should browser react to for(;;){} background Workers already today, the rest is about providing developers with APIs how to easily manage cooperative throttling to ensure that the computation they are generating will be the most reasonable/optimized one for the particular situation at hand?

@greggman
Copy link

greggman commented Sep 5, 2018

I'm glad alternative solutions are being considered.

I'd like to take a step back and suggest that maybe the goal of allowing unmodified native apps to run um ... unmodified is maybe not such a great goal for the web?

Native apps have all kinds of issues when ported to the web. Some issues off the top of my head

  • The kinds of apps we're talking about (non-event driven, ie, games) are usually resource intensive.

    The web doesn't have a good solution here. In order for users to have a good experience those apps need to be redesigned to stream resources so they are usable quickly. Refactoring so say the first level can be played while the other 29 levels download is likely not a smaller amount of work. They may also need extra work to cache their resources so on second visit they don't take another 5-30 minutes to start as they redownload assets.

  • Most native apps (mobile - there are 10x more mobile apps than PC) are not mouse friendly as they are designed for touch only. That means in order to be useful on the web they need their interfaces redesigned.

  • Most native apps except to have different versions generated at compile time per platform whereas most web apps are expected to run everywhere.

    This has issues like for example a cut/copy/paste where a web app will leave it to the browser to handle cut/copy/paste. A native app will hard code checking for Ctrl-C on PC and Cmd-C on Mac. This doesn't work on the web. Worse, a native app expects to just be able to read the clipboard on demand but the web requires the user to paste and the app to wait for a paste event. Even games benefit from being able to paste names, paste into chat, paste passwords into login screens, etc...

What these and other issues mean is that bringing native apps to the web and having the experience be good for the user requires lots of work. If that work is not spent then all users get is really poor experiences turning them off to the web. The amount of work to refactor these apps to be event driven is tiny compared to the amount of work required to make them good experiences on the web.

In other words, this goal of making spin loops magically work is basically saving native apps only a few percent of the work they really need while at the same time encouraging bad experiences for users since native app people can then just barf out their apps, no thought given.

  • oh hey, it runs but only on windows! Sucks to be web

  • oh hey, it runs but takes several minutes to start every few days. Web sucks doesn't it

That doesn't seem like a win for anyone.


Just for fun I went looked up the code for Lumberyard and Unreal

main calls RunMainLoop

https://github.com/aws/lumberyard/blob/0b34452ef270f6b27896858dc7899c9796efb124/dev/Code/Launcher/WindowsLauncher/Main.cpp

RunMainLoop

https://github.com/aws/lumberyard/blob/e881f3023cc1840650eb7b133e605881d1d4330d/dev/Code/Launcher/LumberyardLauncher.h

Checking Unreal

WinMain calls GuardedMain

https://github.com/EpicGames/UnrealEngine/blob/release/Engine/Source/Runtime/Launch/Private/Windows/LaunchWindows.cpp

GuardedMain

https://github.com/EpicGames/UnrealEngine/blob/f509bb2d6c62806882d9a10476f3654cf1ee0634/Engine/Source/Runtime/Launch/Private/Launch.cpp

In either case I doubt it would take more than 1-4 hours to throw in a few statics and/or singletons to take things off the stack and expose the main loop to be event driven. That matches my own experience of making a few native apps event driven. In my case it took 20-40 minutes.

It seems arguable that if native devs aren't willing to put in that small amount of time to make their apps event driven then they aren't likely to do any of the other work required to bring their app to the web and so this path of supporting spin loops is only really enabling bad experiences from thoughtless quick and dirty recompiles instead of thoughtful ports.

Anyway, just a thought.

@kenrussell
Copy link

There are many scenarios where native applications drive their threads using internal synchronization (mutexes / futexes, lockless algorithms, work stealing) where it is either impossible or impractical to change the threading model to be event driven. Being able to bring applications like these to the sandboxed and portable web environment is a benefit to users because it makes them easier to run, and Emscripten. asm.js, and WebAssembly have successfully brought many apps to the web's ecosystem already. We plan to continue prototyping the ability to process pending tasks as a compatibility and interoperability mechanism with native codebases.

@travisleithead
Copy link
Contributor

CC issue #141

Hey all,

Took this up again at today's F2F in Paris.

Thanks for patiently explaining a complex space to us.

While we recognize the scenarios for having an explicit blocking commit() (e.g., porting native code to the web with near-zero change), we don't think that the case is yet sufficiently compelling to offset the architectural concerns raised by blocking commit.

Shared Array Buffers and atomics now allow developers to guard critical sections, and that may seem in conflict with a rAF-based solution, but the extent to which that is true isn't clear from discussion or the examples we've been able to read.

One of the most complex scenarios we've been pondering has been the need to run multiple displays at different rates, driven from the same GL context. This arises in VR and XR scenarios and we're very happy to see the progress made in providing requestAnimationFrame fused to the XRSession object, rather than an implicit higher frame rate (which is what commit appeared to enable in a world where the worker itself might only get a 60fps rAF).

This progress in XR takes a lot of the pressure off of the necessity of commit() (in our view) and we'd like to see the community move forward with this model if possible. If not, we'd like to discuss further, perhaps on a future call, to help clarify the (perhaps glaring!) gaps in our understanding.

Cheers -- Alex & Travis

@slightlyoff
Copy link
Member

Per the above, to clarify, we want to close this out, noting that we're happy that DedicatedWorker::requestAnimationFrame has moved forward. Hopefully our feedback about commit() is coherent. If there are questions, we'd love to have @fserb and interested parties join a call to discuss.

Looking forward to hearing back about what you'd like to do next.

@slightlyoff slightlyoff added the Progress: pending external feedback The TAG is waiting on response to comments/questions asked by the TAG during the review label Oct 31, 2018
@kenrussell
Copy link

We're actively working on a port of a large multithreaded game engine to the WebAssembly environment. @kainino0x and @nickshinpho are driving this effort.

Once we have this working as expected, we will compare the approaches of commit() vs. an API which provides broader compatibility of web APIs to WebAssembly's threads (http://crbug.com/869569). One of these two approaches will definitely be needed, and the determining factor will be performance of the overall system. We'll come back to this thread after we've done these measurements and indicate which direction we would like to pursue in collaboration with the TAG.

@torgo
Copy link
Member

torgo commented Dec 19, 2018

Bumped this into the new year to come back and check it then.

@travisleithead travisleithead removed their assignment Feb 5, 2019
@torgo
Copy link
Member

torgo commented Feb 5, 2019

We discussed briefly at the f2f and agreed to close this for now. Can be re-opened if required.

@torgo torgo closed this as completed Feb 5, 2019
jdarpinian added a commit to jdarpinian/WebGL that referenced this issue Apr 5, 2019
The commit() method of OffscreenCanvas contexts was controversial and
has not been shipped by any browser. Rewrite the OffscreenCanvas tests
to not require it.

For background, see:
w3ctag/design-reviews#288
https://groups.google.com/a/chromium.org/d/topic/blink-dev/hRZ_P2o-aEk/discussion
kenrussell pushed a commit to KhronosGroup/WebGL that referenced this issue Apr 5, 2019
The commit() method of OffscreenCanvas contexts was controversial and
has not been shipped by any browser. Rewrite the OffscreenCanvas tests
to not require it.

For background, see:
w3ctag/design-reviews#288
https://groups.google.com/a/chromium.org/d/topic/blink-dev/hRZ_P2o-aEk/discussion
@juj
Copy link

juj commented Dec 17, 2019

Searching for word "commit" on this page https://html.spec.whatwg.org/multipage/canvas.html#the-offscreencanvas-interface still finds a ton of hits, so I wonder, should this bug be reopened, or a new one created, to remind to update the OffscreenCanvas spec to remove the commit() text?

The spec being out of date on commit() makes me wonder if there are other parts in the OffscreenCanvas spec that are out of date?

According to https://developer.mozilla.org/en-US/docs/Web/API/OffscreenCanvas/OffscreenCanvas Chrome already says they would be shipping OffscreenCanvas, but if the spec is still in flux, what exactly is Chrome shipping?

@chrishtr
Copy link

According to https://developer.mozilla.org/en-US/docs/Web/API/OffscreenCanvas/OffscreenCanvas Chrome already says they would be shipping OffscreenCanvas, but if the spec is still in flux, what exactly is Chrome shipping?

OffscreenCanvas was shipped in Chromium, without the commit() method, because incompatibilities with the event loop of workers was not resolved.

I agree it should be removed from the spec, will follow up with @fserb .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Progress: pending external feedback The TAG is waiting on response to comments/questions asked by the TAG during the review
Projects
None yet
Development

No branches or pull requests