Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Allow addEventListener/removeEventListener for 'fetch' to called at anytime #1195

Closed
bgirard opened this issue Sep 13, 2017 · 23 comments

Comments

@bgirard
Copy link

bgirard commented Sep 13, 2017

As both Google and Facebook has found, there’s a high measurable overhead to routing fetches through the service worker.
 
#718 implemented a great partial solution and I ran an A/B control experiment at Facebook and confirmed it showed a large startup benefit for our push-only service worker. This change requires the Service Worker to decide on first evaluation to register a fetch handler or not. This is saved on the SW registration and this value is retained. After this it’s impossible to change your mind so the work around is to force installing a new service worker to get this new first run.
 
However, this leaves out a lot of practical use case where a service worker may change its mind. For instance, a service worker may choose to disable ‘fetch’ caching if a user is logged out if the caches contain sensitive information. Or the caches may simply expire and be incompatible with the current server version. I don’t think it’s reasonable to lock-in a service worker to listen to fetch event for its full lifetime and pay the performance overhead.
 
Right now, the work around is to force install a new service worker, bypassing HTTP caches, use skipWaiting to force the new service worker to activate. This is very complicated to get right compared to simply maintaining addEventListener and removeEventListener for your fetch event.

This current approach passes on a lot of complexity to the service worker developer and establishes a bad precedent to addEventListener. See whatwg/dom#371 for discussion.
 
As I see it, the service worker shutting down is, for the most part, done as an optimization. If the service worker is shutdown while there’s no ‘fetch’ event then there’s no point in waking up the service worker since it wouldn’t be able to observe the fetch had it not been shutdown. If there’s a current ‘fetch’ listener or there was one at the time of the shutdown then the fetch should be handled by the service worker. I see no point in disallowing that to change after first run.
 
The service worker should continue to block on the ‘activate’ event completion before deciding if the navigate event should be handled or not. That gives an appropriate chance for a service worker to activate and decide if it wishes to handle all requests without missing the first navigate. It also lets the service worker a chance to change its mind and removeEventListener(‘fetch’, handler) because it decides that caching is no longer viable and is unlikely to overcome the performance overhead of having the fetch handler.
 
From Facebook’s experience the ‘fetch’ handler adds significant overhead. When caches and the server are in the right state then the ‘fetch’ handler is able to overcome the overhead of listening to all web request and provide an overall win. However, in some cases we may deactivate the cache in which case it would be preferable to remove the ‘fetch’ listener and only listen to push events. Updating addEventListener and removeEventListener would allow this naturally and it would resolve whatwg/dom#371 by not requiring any change there.

@jakearchibald
Copy link
Contributor

Once you remove the fetch event listener, how do you get it back? Are you seeing a delay on all fetches, or mainly navigations?

@bgirard
Copy link
Author

bgirard commented Sep 22, 2017

I'm suggesting that addEventListener and removeEventListener for fetch would update the internal service worker registration information to track has_fetch_handler. During this in-flight fetches may or may not go through the fetch handler so it's a bit racy.

Right now, at least as implemented in Chrome, fetch requests block on the completion of the activate handler. If a service workers wants to guarantee that navigate/all fetches are handled it would only need to call addEventListener before the completion of activate. This would provide strictly better behavior that what is offered from #718 and would resolve the concerns in whatwg/dom#371 by not requiring that change.

Facebook is seeing that requests for non-navigation resources that are served from the HTTP cache are significantly slower when there's an activate fetch handler. We've also seen a large performance improvement by replacing navPreload fetch handler for push-only SW with the install time has_fetch_handler tracking introduced by #718. This issue would let us fix this at run-time and wouldn't require this weird error throw behavior for addEventListener.

@jakearchibald
Copy link
Contributor

Taking a step back from the proposed solution for a moment, when is Facebook looking to enable/disable the fetch? What are the conditions? Is it always before activate completes?

@bgirard
Copy link
Author

bgirard commented Sep 22, 2017

I've mentioned some of this in the 3rd paragraph but let me elaborate. The key point is that there's valid use cases to disable the fetch listener significantly after the activate event. The user can log out, the SW cache may expire, cache can be explicitly busted by some signal. Then the fetch event could be re-registered after something occurs like user login or cache update/re-validation.

In general fetch is bad for performance so we only want it enabled if we're able to benefit from it to overcome the ~5% startup overhead we see which is when we have valid cache that isn't disabled for the above reasons. Recent improvements to the implementation, particularly background fetch, will reduce the overhead but I believe it will remain significant.

I think this situation is similar to wheel and non-passive listener. Improvements in the implementation allowed better scrolling but there was always going to be significant overhead to having listener that blocked the scroll or fetch in this case. For wheel events, authors can opt to having a non-passive listener when they need to act on the event or they can un-register the listener when they don't which sites will do to minimize the time scroll blocking listeners are active and slowing down scroll latency. (There's no use case for a passive fetch listener at this point.)

Having finer control over the fetch handler allows us to easily turn off fetch without any coupling to the state of push functionality of the service worker.

@jakearchibald
Copy link
Contributor

the SW cache may expire

That isn't something SW caches can do. I guess you mean you just delete them manually?

We have a couple of options. We could provide an API to enable/disable fetch interception, eg swRegistration.fetch.disable(), kinda like navigation preload. Or we could extend navigation preload to preload subresources too. I went for fetchEvent.preloadResponse figuring that we'll probably end up preloading other kinds of requests too at some point.

@wanderview
Copy link
Member

If we are going to go down this path I'd like to see us come up with a system that will work for all event types that might wake up the service worker. FetchEvent is a bit of a unique problem because it has special treatment other events don't get. It would be nice to unify things so we don't have to keep adding exceptions in the future.

@jakearchibald
Copy link
Contributor

jakearchibald commented Sep 25, 2017

I'd say the "message" event is currently the same as "fetch". You don't have to opt into either.

@wanderview
Copy link
Member

Well, I just filed issue #1200 about message event. Is the browser allowed to disable it if there is no event handler at install time?

@bgirard
Copy link
Author

bgirard commented Sep 25, 2017

That isn't something SW caches can do. I guess you mean you just delete them manually?

I meant the stored resources are now all stale since the server revision updated and thus any stored resources will never be used. If the FB SW does a load a week later then the SW knows everything in the SW cache is stale and nothing will be used. Ideally for this a SW would want to start negotiating an update with the server which can take time to run in the background. During that period the fetch handler should be disabled since it is not providing any benefit and is just extra overhead.

Before we discuss adding a new API I'd really like to have an explicit public discussion ruling out this proposal. I haven't seen a good argument so far for why we can't just allow add/remove at anytime and let the browser optimize based on the current state?

How is swRegistration.fetch.disable() superior to removeEventListener('fetch', handler)?

@jakearchibald
Copy link
Contributor

jakearchibald commented Sep 25, 2017

@bgirard

Before we discuss adding a new API I'd really like to have an explicit public discussion ruling out this proposal.

I'd rather we sorted out the requirements before we started bikeshedding an API, but ok. Consider:

function fetchHandler() {
  // …
}

addEventListener('fetch', fetchHandler);

How are you going to remove that event listener? I guess it's not too hard if you want to remove it from the activate event:

addEventListener('activate', async event => {
  event.waitUntil(async function() {
    if (await shouldRemoveFetchListener()) {
      removeEventListener('fetch', fetchHandler);
    }
  }());
});

But what if you want to remove it from the page? I guess you'd have to use postMessage:

// In the page:
navigator.serviceWorker.controller.postMessage('remove-fetch-event');
// In the service worker:
addEventListener('message', ({ data }) => {
  if (data == 'add-fetch-event') {
    addEventListener('fetch', fetchHandler);
  }
  else if (data == 'remove-fetch-event') {
    removeEventListener('fetch', fetchHandler);
  }
});

If you want to know when that action is complete, I guess you'd have to post a message back. What if you wanted to find out the current state of the event listening? I guess you'll have to store that data in IDB and hope you keep it in sync.

But what if you have one of these?

addEventListener('push', event => );

Or what if you do this?

// In the page:
navigator.serviceWorker.controller.postMessage('do-something');

If the service worker isn't running, either of the above will cause it to run. And as it runs it'll come to this line again:

addEventListener('fetch', fetchHandler);

And you've added your fetch listener again. Although if the service worker was already running, it wouldn't re-execute right now, so the outcome is time dependent.

Something like serviceWorker.fetch.disable() or swRegistration.fetch.disable() doesn't have any of these problems. The state is entirely independent of service worker execution, and the API is available from the page so you don't need to hack around with postMessage to change state.

Putting it on serviceWorker rather than swRegistration means it would reset when the service worker updates. Depending on use-cases, that may be a good or bad thing.

Changing navigation preload into a general preload may solve this particular problem, but it also may solve some other issues, such as speeding up stale-while-revalidate type of behaviours.

@jakearchibald
Copy link
Contributor

I want to stress that the above isn't a dismissal of the feature you're proposing. Like I said, I'd rather we concentrated on figuring out the requirements before we blocked discussion on a particular API.

@bgirard
Copy link
Author

bgirard commented Oct 3, 2017

I didn't mean to dismiss discussion on requirements, I was just afraid the proposal could get side tracked. I'm happy to discuss requirements more. I wanted to think about this a little bit so sorry for the delayed response.

Changing navigation preload into a general preload may solve this particular problem, but it also may solve some other issues, such as speeding up stale-while-revalidate type of behaviors.

A general preload is interesting and potentially useful to us but I think it's add a lot of complexity and doesn't address this particularly problem well:

  1. It doesn't allow a running SW to stop handling fetch events and avoiding its overhead.
  2. It doesn't solve the SW overhead of running the respondWith. The SW could be handling other fetch events in the queue, running GC, fetch response has to be routed through another process leading to extra context switches and scheduling overhead which are costly particularly if the user is CPU bound. The SW overhead will still be felt event even with the general preload response.
  3. It will cause preload fetch to go out to network for resources that will be clearly cached consuming valuable bandwidth. This is something that HTTP2 push has not managed to solve yet i.e. avoiding pushing HTTP2 resources that are cached. Pages that opt into navigation/document preload will know that their document is not entirely cacheable but for general resources there will be a mix of fully/partially/not cached resources. So you run the risk of causing a net regression by enabling it for all resources. Canceling the preload after the fact would minimize but wouldn't bring that cost to zero.

General preload is useful and something I'd like to explore but I don't think it solves this particular problem well.

I think the root of the problem with my proposal, like you mention, is how to deal with the SW restarting and correctly deciding if a fetch handler should be registered or not. Ideally there could be something like this:

// At global scope
if (self.fastSyncStorage.get('does_client_want_fetch')) {
   addEventListener('fetch', fetchHandler);
}

Now the problem is there's no sync storage API AFAIK and even if there was then this pattern would add a sync dependent disk read which would slow down SW startup. Another option is to have the client always register the fetch handler when the SW is restarted and as soon as async storage returns a read that indicates that the fetch handler is not required then it can be unregistered. This means that any network fetch around the time a SW is restarted will be slower. That's not a good design in my opinion.

serviceWorker.fetch.disable() isn't great either even if we also added enable(). I think it's bad practice to not deliver a fetch event if there's any remaining fetch listener(s) that have not been unregistered. It's also a bad API if we ask users to call serviceWorker.fetch.disable() rather then removeEventListener. I guess we could require that all fetch liseners have been unregistered or otherwise disable() would warn/fail/throw but that's a bit cumbersome. It also means that all the users of fetch have to coordinate to call disable() unlike other event listeners where you only have to worry about register/unregistering yourself.

How about if we, in addition to my proposal, added an API that returns undefined on the first run and then true/false on subsequent restarts of the same version, if there was no registered fetch handler serviceWorker.fetch.wasRegistered (better naming?) when the SW was shutdown. Then we could suggest the following:

// At global scope
if (!serviceWorker.fetch.wasUnregistered) {
   addEventListener('fetch', fetchHandler);
}

Implementors would be explicitly allowed to not restart the SW on a fetch event when the value of wasUnregistered is true meanings that the SW was shutdown with no fetch handler active meaning that fetch are not observable until some other events re-registers a fetch handler.

This would solve the following:

  1. It would let add/removeEventListener work naturally and not require any special throwing logic in addEventListener, solving Nerfing all events on all event targets in a serviceworker after initial script evaluation seems odd whatwg/dom#371.
  2. It wouldn't temporarily re-register the fetch handler on startup until async storage is queried.
  3. It would let implementors only pay the overhead of SW startup and fetch handler when required.
  4. It would let the SW user decide when to pay the fetch overhead at any point in the SW life cycle.
  5. Doesn't require independent fetch listeners to coordinate when they have all un-registered to call disable(). (I'm not aware of why you would have multiple independentfetch listeners and if there's any other issues around that so that point might be moot point.)

@jakearchibald
Copy link
Contributor

jakearchibald commented Oct 3, 2017

It doesn't allow a running SW to stop handling fetch events and avoiding its overhead.

So you're saying the overhead of the fetch is greater than the network request itself? What kind of delay are you seeing? I'm asking this because I'd like to try and recreate is issues you're seeing, so we can figure out if this is a Chrome problem, or a fundamental spec issue that needs addressing. We have https://github.com/jakearchibald/service-worker-benchmark – does this show the kind of issue you're seeing?

It will cause preload fetch to go out to network for resources that will be clearly cached consuming valuable bandwidth

I don't see how this is different in your proposal.

serviceWorker.fetch.disable() isn't great either even if we also added enable(). I think it's bad practice to not deliver a fetch event if there's any remaining fetch listener(s) that have not been unregistered.

I'm not really sure what this means. Can you provide some more details?

It's also a bad API if we ask users to call serviceWorker.fetch.disable() rather then removeEventListener

In terms of discussing the merits of APIs, can we do better than "it is bad"? I think in #1195 (comment) I used code examples to show disadvantages in the API you proposed rather than just declaring it "bad". Could you extend the same courtesy to me? In fact, I held off on pointing out flaws in your API at all until you demanded it – instead I wanted to bypass that and discuss the problem.

In terms of wasUnregistered and fastSyncStorage… before we start proposing new APIs to solve problems that are only created by other proposed APIs, can we stick to defining the source and extent of the issue?

whatwg/dom#371 should be solved by limiting the error throwing to events on the service worker global. This rule is in place to alert the developer to the footgun of adding listeners async, which will result them missing events. Your proposal doesn't change this, so it's a bit of a red herring.

@vdjeric
Copy link

vdjeric commented Oct 5, 2017

Hey all, I'm Vlad, I work on Facebook's WebSpeed team with Benoit.

The fundamental motivation for this request for new APIs is that a do-nothing Fetch handler is pretty slow today.

It's slow because:

  • Responding to a Fetch request can require launching a ServiceWorker. And launching a ServiceWorker often takes 200ms+ on my MacBook Pro. I double-checked the numbers using @jakearchibald 's ServiceWorker benchmark above and it's consistent with numbers from our ServiceWorker. In Chrome, some of that time goes toward spawning a new process, a new thread, setting up the execution environment etc. And all this happens before any ServiceWorker JavaScript gets to execute.
  • Even if the needed ServiceWorker is already running, no-op Fetch events still take tens of milliseconds from Fetch request to Fetch response on my machine. I haven't looked closely enough at why this is, but I would assume IPC overhead, overhead of switching contexts, etc. The numbers from the field are even worse.

So when a ServiceWorker's Fetch handler is no longer needed, we end up with non-negligible overheads on every Fetch request without getting any benefit in return.

Additionally, I am assuming that just optimizing the browser's implementation can't significantly reduce these overheads (e.g. make them consistently < 10ms) because spawning processes and threads and JS execution contexts just seems like a non-trivial amount of work for a CPU so it would be hard to make it consistently fast (especially with lots of CPU contention during Facebook.com load).

Benoit wrote a pretty exhaustive list of situations when a ServiceWorker's Fetch handler would no longer be needed, but I think the main use-case (for us) is when main-thread Fetch requests can no longer be responded to from the ServiceWorker's cache because the cache contents are out of date, e.g. JS code files were added to the SW cache on Oct 1st, but it's currently Oct 4th and we don't want the ServiceWorker serving cached JS code files from anytime before October 2nd because of severe bugs in the old cached client code or because of incompatibilities between the old cached client code and newer server code.

So the requirement really is to make "pass-through" Fetches very cheap somehow.

@vdjeric
Copy link

vdjeric commented Oct 5, 2017

Just to clarify some answers to your questions:

Are you seeing a delay on all fetches, or mainly navigations?

We're seeing slow fetches of subresources (dozens of .js and .css files) through the do-nothing Fetcher during the initial main document load. I can look into whether do-nothing Fetches for resources loaded after initial page load are slow as well if you like.

When is Facebook looking to enable/disable the fetch? What are the conditions? Is it always before activate completes?

We would disable fetches after client-side code notices the SW cache has become too old or when getting a signal from the server-side to invalidate the cache.
This could happen in the activation event or the Fetch handler could detect the issue at some arbitrary point in time well after initial page load.

@wanderview
Copy link
Member

Would #1026 be a solution for the problems here?

For example, if you could do the following:

  • Call fetch(url, { serviceworker: 'none' })
  • Add an attribute to elements like <img src="foo.jpg" serviceworker="none">.
  • Perhaps set a default policy via a <meta> tag, etc.

I guess I'm wondering if we expose the bypass primitive, perhaps then the page can manage the state about whether requests should get a FetchEvent or not.

@bgirard
Copy link
Author

bgirard commented Nov 1, 2017

No it's orthogonal. But the problem in #1026 is a lot more urgent and impactful to fix. This issue mentioned here can be worked around by forcing a new service worker to install with skipWaiting which allows us to re-validate the cache or avoid registering the fetch handler to use the optimization from #718. I had assumed this issue here would be a quick fix but devoting resources to #1026 is much more important.

The reason they are orthogonal is imagine that the SW is being restarted and a preload request goes out. The server is generating markup but doesn't know the state that the SW is in so it cannot annotate tags with serviceworker="none" since the SW has not been started yet and hasn't communicated with the server yet by the time the page is being generated (assuming a slow SW startup and a high latency network with is typical). The server doesn't know the SW needs to be bypassed or not when generating the markup since the SW isn't running and hasn't established the state of the its cache yet.

I don't think I did the best job at explaining the motivation for this issue. But this issue is sufficiently complex that I'm not sure what is miscommunicated here. A live discussion might be more effective so I'm happy to get on a call if anyone would like to discuss and get clarification on the issue.

@wanderview
Copy link
Member

I had assumed this issue here would be a quick fix but devoting resources to #1026 is much more important.

In some ways implementing #1026 is much easier to implement. At least in gecko we have a flag we can add to network requests to bypass service worker logic completely. I imagine other browsers must have something similar given current features. Implementing #1026 would simply be exposing an API to add that existing flag.

This issue mentioned here can be worked around by forcing a new service worker to install with skipWaiting which allows us to re-validate the cache or avoid registering the fetch handler to use the optimization from #718.

Honestly this doesn't seem like a bad way to solve the problem to me. I like that it uses our stateful versioning based on SW script identity.

A live discussion might be more effective so I'm happy to get on a call if anyone would like to discuss and get clarification on the issue.

TPAC is next week.

@jakearchibald
Copy link
Contributor

F2F: This is lower priority than #1026, but we could do this with some kind of async toggle on the registration.

@jakearchibald
Copy link
Contributor

AI: @jakearchibald to spec this and find out if this breaks the web.

@annevk
Copy link
Member

annevk commented Oct 26, 2018

So I'm not particularly happy with

I'm suggesting that addEventListener and removeEventListener for fetch would update the internal service worker registration information to track has_fetch_handler.

as that makes those methods even more magical. Or is a different approach the plan now?

@bgirard
Copy link
Author

bgirard commented Oct 26, 2018

I had originally proposed this thinking it would be a much easier solution than #1026 but since it's not based on this discussion I think we should focus on #1026 since it will likely be sufficient for performance.

@mfalken
Copy link
Member

mfalken commented Oct 26, 2018

Yes I think #1195 (comment) was meant for another issue as we didn't discuss this particular proposal at the F2F (though we did talk about #1026 and static routes, etc).

@jakearchibald I think you meant your comment for #913 ?

I think we can go ahead and close this issue as #1026 or static routes are more likely solutions, and given bgirard's comment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants