-
Notifications
You must be signed in to change notification settings - Fork 299
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
High resolution timing for events #23
Comments
@birtles, what is the timeline for this? |
I expect to finish converting our Linux UI event timestamps over on non-release channels this quarter. Still need to find someone to do the Mac work so we can switch this on for release channels. |
We (@majido) are working on this in blink too. There are two completely different things coupled together here:
It's 1 that's most important IMHO, giving us two key properties:
WebKit has done 1 without 2 for a couple years now, but it's a kludge. Given the lack of use cases for an epoch-based event timestamp and the existing lack of interoperability here, I agree we should also try to change the semantics of |
This might not be web-compatible: https://bugzilla.mozilla.org/show_bug.cgi?id=1231619#c3. But perhaps browsers are willing to push it through anyway... |
Yep, @majido says the compat impact seems manageable, so we're still pushing ahead with shipping in Chrome 49 (currently in beta, set to go stable in early March). If we have non-trivial compat issues we'll revert and revisit adding a new property instead. |
Some fallout of the change: https://code.google.com/p/chromium/issues/detail?id=574514 |
I find it quite surprising that this is considered an acceptable web-compatible change, for the reasons that @chancancode discussed on the Bugzilla thread. The jQuery API docs and MDN both define Of course, That said, I am shocked that decoupling the meaning of A more conservative solution to the problem would be to introduce a new attribute ( Given the kinds of things that have blocked shipping recently for web-compat reasons, and the number of affected properties, I just don't understand how we're full steam ahead here. (In the linked case, the problematic feature had already shipped in IE11 and Edge years earlier, which made the number of affected properties quite small.) To quote @chancancode:
I would quote him in full, but I'd recommend just reading the linked Bugzilla thread. |
"Lots and lots of sites" has not been our experience. But as with anything compat, data trumps anecdote here. @majido, can you share your analysis from httparchive data? Anyone else have more examples of specific sites impacted? If we find a pattern that, say, we find used on >0.01% of the top sites in httparchive then we should definitely reconsider. FWIW I argued ages ago for a new Event So attempting to ship a breaking change in Chrome was our compromise. So far it's hit Chrome beta with millions of users and very little fallout, so we have no reason to pull the plug yet. We're still willing for new data to change our mind, but the time for arguing-absent-hard-data (after more than 3 years of such arguing blocking solving the use cases here) is done IMHO. |
The problem with timing bugs is that they don't necessarily manifest as hard errors, by definition. Given how flaky some websites can be already, it may take some time before people report the bug, even to the original author, and before that bug gets reported upstream. In this case, we rapidly discovered a (rather subtle) bug in a very popular animation library (Angular) that is easy to reproduce. How is that not sufficient? (the fact that Angular has addressed the issue in 1.5 doesn't really do much -- some existing content will upgrade, but plenty won't) |
Is there a way we can test uses of:
I'd be very very surprised if those patterns did not impact a non-trivial number of the top sites. |
Yeah I agree that's cause for extra concern here. Also there may be non-visual cases where the breakage is hidden from us, eg. where the client sends the timestamp to the server for some analytics.
We believe the impact in practice is small (due to low usage of the particular affected feature from our httparchive digging and as a result of being cosmetic). @majido can provide the concrete data from his analysis.
Yes, the first two are, I believe, the analysis @majido has done on httparchive data for the top 470k websites. I'll let him provide details. Such code has always been broken on Firefox (except for the specific event that Angular was using, due to a bug in Firefox where it's timestamp was always 0), so it does make sense that developers haven't been relying on this. I should add that unfortunately I'm out on vacation without internet for the next 1.5 weeks. Please continue discussing the tradeoff with Majid - he owns this feature on blink. If you like you're welcome to follow-up on the Intent to ship thread, it's possible other blink API owners with disagree with our judgement here and request the change be un-shipped. |
BTW, some devrel outreach around this here |
Here's another example of a kind of pattern that could be affected: https://github.com/cubiq/iscroll/blob/v4/src/iscroll-lite.js#L207 that.startTime = e.timeStamp || Date.now(); This is from v4 of Cubiq iscroll, which was both a pretty popular solution for a little while, and used in a lot of websites that are no longer maintained. I'm not sure if this is actually a problem (I can't tell if the associated method is ever invoked synthetically, perhaps for initialization), but neither can simple static analysis. Related: "it's often broken in Firefox" doesn't affect mobile websites much, which heavily use |
I think I'm convinced we should do this via a new property instead given the reported subtle breakage. However, #23 (comment) mentions that |
Hi, I think @RByers summarized our thinking and the background pretty well in #23 (comment) It is difficult (impossible?) to actually measure the impact before hand in this case and our hope has been that once we hit beta we receive better feedback which we are getting. However, I also did an analysis using very recent (Jan 15, 2016 dataset) httparchive data which shows the impact is likely to be < 0.02%. I looked for the following patterns in top 477k top sites (both mobile and desktop) and tried to categorize and understand if they get broken and I have assumed breakage if the usage was not obvious.
Clearly this is not the complete picture but gives a sense that the usage of this pattern is quite limited. In cases where I have seen the issue in a library and have outreached (eg, 1 and 2) the fix has been trivial and has address an error or a workaround in the logic for Firefox. |
Given how timeStamp cannot really be used compatibly across browsers anyway, and the fact that this has hit Chrome 49 without needing to be reverted, I think we should stay the course and fix timeStamp instead of inventing a new property and also incurring all the risk involved in fixing timeStamp to be cross-browser compatible. |
At the moment I tend to agree with @domenic. Based on current data and our beta experience until now the impact appears to be limited in scope (although subtle) and within our expected range. Not sure if we should change our strategy yet. To be fair, most of the code that tries to compare Event.timeStamp with Date.now() is already broken in subtle and not so subtle ways:
|
I want to reiterate a few things (excuse me for being brief as I am on mobile):
That appears to be higher than @RByers's proposed threshold of 0.01% (which is 1 in 10,000 sites):
Also, the analysis:
That does not cover any of the reported cases, including Angular's use case.
None of the reported cases hard errors and only break in ways that cannot be easily detected.
Unfortunately, fixing a library going forward doesn't affect existing deployments. It by definition is irrelevant to web compat. So, I don't agree that we have data to prove that this doesn't matter, even if it has shipped in Chrome beta. |
It's worth reiterating that the breakage in Angular (which was in code in the main angular.js), would not have been detected by the already-done static analysis. It also would not have detected the bug that @chancancode and I found in our demo code (we stored e.timestamp in an object). It also seems that this change breaks at least some part of YouTube, which is still not fixed as of 3 days ago, per comment 24 on chrome bug 574514:
Even thought it didn't detect these cases, the number of cases detected by simplistic static analysis still exceeds the speculative threshold of 0.01. While 0.01% sounds like a small number, it means that it affects 1 / 10,000 sites. The static analysis also breaks out "bugsnag". Bugsnag is an error reporting tool that the static analysis suggests accounts for another 0.01% on its own. Again, this is just a case that happened to be detected using the static analysis. Timing-related bugs are subtle and hard to track down, and this particular change will affect old sites which aren't even maintained anymore. Given that a number of high-profile breakages have already been reported, and that it doesn't manifest as a hard error, I'm having a hard time understanding why the experiment has not conclusively failed. |
Part of the reason is because those usages already failed in Firefox (in many cases) and Safari (in fewer cases, but more subtly). They're clearly survivable. |
I also wanted to add that this is simply not correct. While The fact that some events do not fit in this subset does not mean we can break the events that do. I would like someone to tell me what rules I should apply in the future when dealing with web compat issues, because frankly, this thread has made me question whether I understand the rulebook we're supposed to be following. |
"clearly survivable" is overstating the case. The fact that some cases of e.timeStamp didn't work in Firefox didn't stop Angular animations or YouTube from working reliably on Firefox. |
This seems somewhat self-prepatuating, especially since issues have been (and continue to be raised). So by ignoring issues, yes chrome has not needed to revert...
A path of success was possible, but that changed with Chrome 49. It seems transitioning to high-precision timer since posix epoch may address the concerns being raised, with what seems like little cost? Maybe I'm missing something... |
In that case the timeStamp value will look similar to Date.now() but only on surface. The actual value is coming from a different monotonic clock which unlike Date.now() is not impacted by NTP time skew. This means that the developers are more likely to compare it with Date.now() which is going to break in subtle ways in random machine which makes the bug hard to detect. To quote dbaron from here
If we decide not to change Event.timeStamp then a better solution is to actually use a different attribute. |
Sorry, if I was not clear. In this case I meant that if I was not fairly sure about "o.timeStamp - value" usage I assumed it gets broken (i.e., o.timeStamp is an event value and values is a date). To be honest based on the codes I check during the process most of the time this pattern end up being safe, e.g., either timeStamp is coming from Date.now() or value is another timeStamp but I erred on the side of over estimating the breakage in this case.
In fact, I think 0.01% of the issue being attributed to bugsnag is a promising sign (not to discount other issues). Their usage is essentially what I would categorize under "clearly survivable" category. They usage pattern is to compute a
I think you meant would NOT have been detected.
Correct. So the simplistic analysis over counts in some cases and under counts in others. Which is why I don't like to rely too much on it. Just to get a sense of things in combination with other evidence. |
One point of record: @dbaron's quote above wrt to a more-breaking change was a response to a suggestion that we define |
Ok let's take the rhetoric down a notch, we're all trying to do the best thing for the web platform. There was general agreement that (in order to resolve the multi-year debate about whether a new API is necessary or not) Chrome should attempt to ship this change. Once we started down that path and were getting concrete data, our judgement (vocal opposition notwithstanding) was that the compat issues we were seeing were minor and the short-term cost caused was well below the long-term cost to the platform of introducing a confusing and largely redundant API. So we did not see a need to reverse course. But I recognize this is entirely a judgement call where opinions of reasonable engineers will differ. If Mozilla and Apple now feel that, despite not having concrete examples of real sites that are broken today in Chrome, there still needs to be a new API for this, then by all means please spec such an API, ship it in at least one of your engines, and we'll be happy to follow suit. We really do want to see the problem of reliably measuring input event latency solved once and for all in an interoperable way. |
It seems that from #23 (comment) the current requirement in DOM for It's not clear if we want to store two different times for each event, forever (different in both origin and resolution). (Though Apple seems okay with this given @rniwa's comment.) It's also not clear what the compatibility impact is of changing It seems unlikely other vendors will ship something fast, given #23 (comment) and #23 (comment), so maybe the best approach here is to just wait it out a bit and see what actually happens. |
Actually @rniwa didn't say anything about the priority they'd give to landing a new API in WebKit here. @rniwa thoughts? Note that with passive event listeners now shipping, we're starting to actively encourage the measurement of scroll latency using this API. But all our examples will show a pattern that works for both the WebKit and blink implementations (the only two implementations today exposing the original driver time), so it'll probably still be possible for us to switch back to the WebKit behavior in the future without causing too much disruption (it'll just introduce the NTP skew problem in scenarios that were previously more stable in Chrome, but we don't know how bad that tends to be in practice). So if collecting more data on the impact in Chrome over a few months would be valuable to this debate, then that's fine with me. But if no amount of such additional data would change other implementor's minds then there's no benefit to waiting. |
I apologize for the rhetoric, and I agree fully that everyone here is trying to do what they believe is best for the platform and the browsers they work on.
I think more data would indeed be useful, if we measure the right things. Here are some things that (dynamic) telemetry might be helpful if unearthing, if you're willing to run it:
I'd be happy to discuss the exact telemetry that would be most helpful here, if you find this line of reasoning useful. I'd also add that finding "hits" with this kind of telemetry does not prove that a site is broken, but it will give us a corpus of sites to evaluate manually. Personally, I'm especially concerned about the vast number of "iOS lookalike" sites that were hand-crafted early in the days of iOS, because those sites were forced to reimplement scrolling, implement high-level touch events, implement animations and more, without the benefit of Also, because those sites are largely "webkit only" in practice, they may well not have noticed the Firefox breakage at all, but these sites are on the Internet and most browsers have been attempting to support them. |
* Add more legacy event types for createEvent() Approximately as requested at https://bugzilla.mozilla.org/show_bug.cgi?id=1251198#c7. This is the list of events supported in createEvent() by at least two of Firefox, Chrome, and IE 11. The one exception is I omitted MutationEvent, which all three support, because apparently spec-land has decided to deny its existence in the hope that it will go away. Bikeshed does not know about all of the added interface names, hopefully that will sort itself out over time. * Meta: improve pull request instructions for DOM See whatwg/fetch#276 for context. * Enable an event listener to be invoked just once * Editorial: web compatibility typically remains relevant Fixes whatwg#210. * Shadow: define attachShadow() for custom elements * Meta: make it easier to reference participate in a tree PR: whatwg#216 * Define node document for new Text nodes Fixes whatwg#224 and part of whatwg#212. Also fix part of whatwg#209 by stating these algorithms can rethrow exceptions. * Use a single concept for attribute changes This setup is still a little sketchy I think, but not more so than the insertion and removing steps. * SVGEvent is only Gecko, Blink also has SVGEvents As pointed out by zcorpan in whatwg#227. * Set createDocument()'s content type based on namespace Fixes whatwg#217. PR: whatwg#218 * Make document.createEvent("touchevent") sometimes throw Browsers typically disable touch events on non-touch devices, and there exists web content that detects this difference using document.createEvent(). Fixes whatwg#227. * Shadow: define slotchange event Shadow: define slotchange event Fixes WICG/webcomponents#288. This defines the slotchange event in response to remove/insert operations, changes to a slot’s name attribute, and changes to an element’s slot attribute. This also introduces the assigned nodes concept for slots, and assigned slot concept for slotables. Slotables now also have a name, rather than a “get name”. The slotchange event dispatches just after mutation observers have their callbacks invoked. The slotchange event is not yet marked scoped as scoped events are not yet defined in the DOM Standard. Note: although the original issue did not mention it, this also suppresses signaling when slots are removed from a tree. Not just when they are inserted. * Editorial: update custom element cross-spec references * Editorial: fix a few cross-linking missteps * Editorial: make "is" and "prefix" optional in "create an element" * Use "create an element" in createHTMLDocument Takes care of part of whatwg#212. * Editorial: align exception language with IDL * Editorial: introduce more shadow-including terms for CSS Fixes whatwg#225. * Editorial: distributed -> flattened * Meta: export more terms Fixes whatwg#233. * Editorial: add "shadow host" and "assigned" as terms This makes a couple of non-null checks read better and enshrines a term we had already been using. * Editorial: flip non-null/otherwise conditions PR: whatwg#234 * Shadow: <slot> is now defined in HTML * Remove passive as event listener key This changes makes passive no longer contribute to the uniqueness of an event listener. It therefore also no longer needs to be supported as part of removeEventListener(). Fixes WICG/EventListenerOptions#27. PR: whatwg#236 * Meta: point out event's timeStamp is likely to change See whatwg#23 for details. * Add [CEReactions] annotations to mutating methods Part of WICG/webcomponents#186, and furthering whatwg/html@27aa7bc. Linking [CEREactions] will happen once speced/bikeshed#677 is fixed. * Editorial: check stop propagation flag at start of invoke * Editorial: deduplicate Veli Şenol * Editorial: define defaults for EventListenerOptions Although this is also done in prose, this nonetheless simplifies the prose a bit and makes it clearer to those skimming the standard what is going on (although skimming is not recommended). Fixes whatwg#239. * Meta: link to Japanese translation See triple-underscore/triple-underscore.github.io#1 for more details.
Just some data, jQuery used to try patching this problem by filling in a |
I had a good chat with @majido off-thread and we came up with some good telemetry that we agree would identify the impact of this change. I'm looking forward to the results! |
It was a good chat indeed. :) I am trying to see if I can gather more data using telemetry to better help make progress on the debate about the potential impact. This should help address some cases where our earlier static analysis of httparchive would have missed. I can only do this mainly thanks to @RByers recent efforts to enable cluster telemetry for Blink that allows such measurement for a set of top sites (e.g, top 10K). Following @wycats suggestions I think we should measure the following cases:
Where timestamp is a value that came from Event.timeStamp ** As pointed out earlier, this will over count because there is no guarantee that any of the operations above will actually lead to breakage but we can followup with a manual check on this smaller subset. I am discussing with our V8 experts to see if this is actually possible without a huge amount of work. I will update once I have an answer. In any case, before spending too much effort on this I like to have some agreement that this is an acceptable experiments and an acceptable threshold (e.g., 0.02% of pages?) for seeing this type of usage leading to breakage. Also particularly interested on what @rniwa think of the effectiveness of this measurements given his strong concerns on web-compat. ** I have not yet figured exactly how to tag timestamp and Date values as I believe they get converted into a numbers before being operated on. Perhaps I will use a special value (may impact program flow) or specific fractions (A lot less intrusive) |
Generally speaking, I have concerns about treating any specific percentage of pages as evidence that breakage is ok. Because the web is so huge, "real" applications are a very small percentage. Maybe we could categorize breakage by things like presence of certain globals (Ember, Backbone, React), usage of other "tracer" features like If you're interested, I would love to help beef up Blink's usage analysis in this way ("application" is one useful category, but there are probably others, like "mobile site", etc.). It would make me feel a lot more confident that the identified features aren't heavily used in some pocket of the web. |
But I would also like to add that I think we can come up with a measurement that I could agree ahead of time would be enough (just not "total usage over the entire web"). |
If I use cluster telemetry, the corpus will be pretty well defined (e.g. top 10K sites) rather than the entire web and I can better control for mobile vs desktop. Though it also means I can only test a pre-defined set of interactions but has the benefit of being able to get the URL of the page. If I end up being able to get some type of safe measurement in Canary then it will be website used by Chrome Canary users. This is more like the entire web but page URL will not be available. |
Yeah this is a problem the blink API owners (especially @foolip and I) have spent some time worrying about, but don't yet have any super-promising practical ideas around. We should think about it further and we'd love your help! We're ramping up our investments in our compat tooling, deprecation process, interop efforts, etc. I filed this chromium bug to track. |
@dmethvin Thanks for the jQuery background. Current Event.timeStamp is PITA indeed. Hopefully we get it fixed one way or another soon. Even today despite my experience using Event.timeStamp I got surprised seeing negative values (up to -300ms) for |
There was some discussion of this at TPAC:
|
FYI, bug 1026804 got fixed two days ago. Sebastian |
After almost three years of testing with Firefox Nightly and Dev Edition users, Mozilla plans to ship high-resolution Event.timeStamp in Firefox 54 (2017-06-13). See @birtles' Intent to Ship announcement. |
Okay, so with Chrome and Firefox shipping, these things still need to be done:
I'm happy to help folks if they want to take on one or more of these tasks. |
|
This is in anticipation of DOM spec patch landing [1]. Upstreaming two tests: - verify constructed events get a high-resolution time from perfomance.now() - verify the minimum resolution is larger that 5µs. [1] whatwg/dom#23 [2] whatwg/dom#420 BUG=538600 Review-Url: https://codereview.chromium.org/2744553007 Cr-Commit-Position: refs/heads/master@{#485966}
This is in anticipation of DOM spec patch landing [1]. Upstreaming two tests: - verify constructed events get a high-resolution time from perfomance.now() - verify the minimum resolution is larger that 5µs. [1] whatwg/dom#23 [2] whatwg/dom#420 BUG=538600 Review-Url: https://codereview.chromium.org/2744553007 Cr-Commit-Position: refs/heads/master@{#485966}
This is in anticipation of DOM spec patch landing [1]. Upstreaming two tests: - verify constructed events get a high-resolution time from perfomance.now() - verify the minimum resolution is larger that 5µs. [1] whatwg/dom#23 [2] whatwg/dom#420 BUG=538600 Review-Url: https://codereview.chromium.org/2744553007 Cr-Commit-Position: refs/heads/master@{#485966}
https://lists.w3.org/Archives/Public/public-web-perf/2014Jun/0013.html
https://bugzilla.mozilla.org/show_bug.cgi?id=1026804
The text was updated successfully, but these errors were encountered: