Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[cssom-view] {element,elements,nodes}FromPoint but without restricting to the viewport clip? #4122

Open
emilio opened this issue Jul 17, 2019 · 14 comments

Comments

@emilio
Copy link
Collaborator

emilio commented Jul 17, 2019

Current spec is: https://drafts.csswg.org/cssom-view/#extensions-to-the-document-interface

There was this very old request from an Apple engineer here: https://lists.w3.org/Archives/Public/www-style/2012Oct/0683.html

This week I was asked for some help with performance of an add-on, which was doing something particularly expensive to see if an element was visible. I suggested that, given a lot of the bottleneck was walking the DOM since it's an extension and it has security wrappers, using elementFromPoint-like APIs should speed up stuff, and indeed it did quite a lot. See mozilla/fathom#91. But that misses the element if it's outside of the viewport.

We have similar non-web-exposed APIs that avoid this clip for various Firefox features, and apparently the best work-around this problem if you're an author resides in using window.scrollTo and such. See for example: https://stackoverflow.com/questions/19715620/javascript-get-element-at-point-outside-viewport

The original thread suggests new APIs altogether. I think it'd be better to add some dictionary argument to the existing APIs, something like:

dictionary FromPointOptions {
  boolean ignoreViewportClip = false;
}

Or such?

While working on this, I could also try to do a bit of spec maintenance, since there's multiple stuff we've resolved on that hasn't been incorporated to the spec. If only, moving these to DocumentOrShadowRoot, the node/nodesFromPoint APIs discussed in here.

This also fits nicely for the "include display: contents parents" option that people discussed in #556.

Does an API like this make sense to y'all? :)

/cc @tabatkins @zcorpan @smfr

@emilio
Copy link
Collaborator Author

emilio commented Jul 17, 2019

Note that there are further issues with these APIs that we should also put in a spec, like #1378.

@emilio emilio added the Agenda+ label Jul 17, 2019
@tabatkins
Copy link
Member

Without further context, I'm fine with the idea. It does mean you have to run full layout outside of the viewport, which is something we're trying to allow skipping in some circumstances (a sufficiently contain'd element, for example).

Can you explain how this helps the cited use-case, tho? I don't quite understand how eFP() working outside the viewport would help them determine whether an element is visible.

@emilio
Copy link
Collaborator Author

emilio commented Jul 17, 2019

Basically, what that page is doing (AIUI) is scanning the DOM for potentially interesting elements, and needs to determine whether they're "visible". getElementsFromPoint is a nice way to ensure that the element is displayed and potentially-visible.

Another use-case (which I read yesterday but forgot the source) is checking whether two things intersect at an arbitrary point. elementsFromPoint() at that point works, but only if inside the viewport.

@tabatkins
Copy link
Member

For the first, presumably returning null if something's not in the viewport is actually a good behavior to tell if something is displayed? It's not ideal, as part of the element might be in the viewport, just not the point you specified, but that's what IntersectionObserver is for, right?

For your second, that's a v good use-case for this.

biancadanforth added a commit to mozilla/fathom that referenced this issue Jul 25, 2019
Compared to a baseline profile[1] of Price Tracker's current 'isVisible' implementation (on which Fathom's 'isVisible' method is based), this clickability approach offers a 53% (146 ms) reduction in Fathom-related jank[2] for the same locally hosted sample page.

This is largely due to removing the use of the 'ancestors' Fathom method in 'isVisible'[3], which was performing a lot of redundant layout accesses (and triggering a lot of layout flushes) for the same elements. Also, at least in an extension application, DOM accesses (e.g. repeatedly getting the next 'parentNode' in 'ancestors') are very expensive due to X-Rays[4].

Notes:
* If the proposal to the W3C CSS Working Group[5] is implemented, this clickability approach could forgo the workaround and see as much as 81% (374 ms) reduction in Fathom-related jank[3].
* This implementation can still benefit from memoization, as the same element (e.g. 'div') could be considered for multiple different 'type's[6].

[1]: https://perfht.ml/30wkWT7
[2]: https://perfht.ml/2Y5FCQ1
[3]: mozilla/price-tracker#319
[4]: https://developer.mozilla.org/en-US/docs/Mozilla/Tech/Xray_vision"
[5]: w3c/csswg-drafts#4122
[6]: https://mozilla.github.io/fathom/glossary.html
biancadanforth added a commit to mozilla/fathom that referenced this issue Jul 29, 2019
…rmance

Compared to a baseline profile[1] of Price Tracker's current 'isVisible' implementation (on which Fathom's 'isVisible' method is based), this clickability approach offers a 53% (146 ms) reduction in Fathom-related jank[2] for the same locally hosted sample page.

This is largely due to removing the use of the 'ancestors' Fathom method in 'isVisible'[3], which was performing a lot of redundant layout accesses (and triggering a lot of layout flushes) for the same elements. Also, at least in an extension application, DOM accesses (e.g. repeatedly getting the next 'parentNode' in 'ancestors') are very expensive due to X-Rays[4].

Notes:
* If the proposal to the W3C CSS Working Group[5] is implemented, this clickability approach could forgo the workaround and see as much as 81% (374 ms) reduction in Fathom-related jank[3].
* This implementation can still benefit from memoization, as the same element (e.g. 'div') could be considered for multiple different 'type's[6].

[1]: https://perfht.ml/30wkWT7
[2]: https://perfht.ml/2Y5FCQ1
[3]: mozilla/price-tracker#319
[4]: https://developer.mozilla.org/en-US/docs/Mozilla/Tech/Xray_vision"
[5]: w3c/csswg-drafts#4122
[6]: https://mozilla.github.io/fathom/glossary.html
@chrishtr
Copy link
Contributor

Just like Firefox, Chromium has non-web-exposed use-cases for skipping the clip. Our use cases are:

  • Visually positioning carets and selection ranges
  • Detecting colors underneath elements in DevTools
  • IntersectionObserver occlusion testing

I'd like to understand better the required web-exposed use cases before we agree to this API, because as Tab mentioned there are some performance downsides, and this API could end up being a new footgun in practice.

In particular, for the use case you mentioned @emilio of measuring on-screen-ness: is IntersectionObserver awkward for that use-case?

Regarding detecting whether two elements intersect at an arbitrary point, or overlap that point: I'm not sure what that's for?

@emilio
Copy link
Collaborator Author

emilio commented Jul 29, 2019

Here's the intersection code that I thought of: https://stackoverflow.com/questions/9011668/get-element-at-point-in-entire-page-even-if-its-not-visible

Upon re-reading it I think what that code is trying to do would be better done by something like flex or grid, but I haven't thought out too much how, it seems it's sort of like the masonry layout stuff.

Anyhow... The usecase that prompted me to propose it was some investigation in mozilla/fathom#91. But I've seen similar code elsewhere in sites like Facebook and such (see https://bugzilla.mozilla.org/show_bug.cgi?id=1381071#c1 for example).

The facebook code is:

function isVisible(element: HTMLElement): boolean {
  // Look upward in the DOM until we either reach the document or
  // find something hidden.
  let elem = element;
  while (
    elem &&
    elem !== document &&
    Style.get(elem, 'visibility') != 'hidden' &&
    Style.get(elem, 'display') != 'none') {
    elem = elem.parentNode;
  }

  // If we reached the document, the element is visible.
  return elem === document;
}

The code in the Mozilla add-on was similar too:

export function isVisible(fnodeOrElement) {
    const element = toDomElement(fnodeOrElement);
    for (const ancestor of ancestors(element)) {
        const style = getComputedStyle(ancestor);
        if (style.visibility === 'hidden' ||
            style.display === 'none' ||
            style.opacity === '0' ||
            style.width === '0' ||
            style.height === '0') {
            return false;
        }
    }
    return true;
}

(I skipped some branch which was doing some viewport stuff because it's not clear what it's trying to do, I think it's trying to check elements scrolled out of view towards the top, or something).

The question that code is trying to answer is whether a given node is "visible", for some definition of visible. Both functions are bogus in multiple ways:

  • They don't work with anything that looks like shadow DOM in any way.
  • They assume visibility works differently than how it works (visibility: visible inside visibility: hidden is definitely visible).
  • The style.width and style.height checks in the add-on code is wrong (should be 0px instead), they're meaningless is overflow is visible, etc...
  • They don't handle other ways to make an element "hidden".

The add-on use-case is even worse because walking the DOM from an add-on is slower due to security wrappers / proxies.

In the add-on use-case at least, the complaint against elementsFromPoint() was that it returned false for out-of-viewport elements. But using it actually improved performance compared to all the DOM walking and layout querying (mostly because of the reduced number of DOM calls, actually).

It seems to me like a good improvement over those two functions but, that being said, chances are that a more thought-out proposal may want a bit more flexibility. For example, people may want to ignore all scroller clips, or maybe only those with user-scrollable overflow (that is, don't ignore overflow: hidden clips but ignore overflow: scroll clips)... In any case ignoring the viewport clip is a pretty good improvement over the situation now.

The Facebook code doesn't need to deal with all of the web, but libraries / frameworks and extensions do, this could allow them to write actually correct code :)

@chrishtr
Copy link
Contributor

I 100% agree that developers should not write their own code to detect visibility, as they will inevitably get it wrong, and the performance will be much worse.

Why can't they use IntersectionObserver for those though? (The "v1" spec doesn't check for visibility, but the "v2" one does.)

@emilio
Copy link
Collaborator Author

emilio commented Jul 29, 2019

Ah, I was not familiar at all with IOv2. It seems to me that it'd miss a bunch of hard cases. It seems that IOv2 has a very concrete use-case that may not be the use-case everybody wants or needs.

Also it's unclear to me if the "completely unoccluded" part of the visibility algorithm includes the viewport or not. I assume per your description it does not include the viewport clip, right?

If so I guess it does make sense to use IOv2 for some of these cases. Probably there are use-cases that still want or need a fast and sync answer for this, maybe...

Also, IntersectionObserver only allows to observe elements, so this probably won't fly for the Shadow DOM / display: contents use cases that prompted the introduction of nodesFromPoint in #556

@biancadanforth
Copy link

Why can't they use IntersectionObserver for those though? (The "v1" spec doesn't check for visibility, but the "v2" one does.)

I wanted to respond here as one of the motivating use cases for Emilio bringing this issue up for discussion. I did try using the Intersection Observer API (I believe v2 based on your description). I saw about a 20% performance improvement compared to the 80% improvement using elementsFromPoint described in the same thread.

My use case differs in a couple of key ways from what seems to be the optimal use case for IntersectionObserver:

  1. We want to check visibility for every single element on the page, not just a handful of say, ads. My sample page had upwards of 6000 elements, which means 6000 observers.
  2. We only want to check once, not monitor changes over time. It seems like overkill to walk the DOM and attach an observer to each element only to remove it after the first event.

@chrishtr chrishtr reopened this Jul 30, 2019
@chrishtr
Copy link
Contributor

I wanted to respond here as one of the motivating use cases for Emilio bringing this issue up for discussion. I did try using the Intersection Observer API (I believe v2 based on your description). I saw about a 20% performance improvement compared to the 80% improvement using
elementsFromPoint described in the same thread.

Interesting. Do you know what the difference is that led to the worse performance? (I'm not necessarily saying it must have been the same, just trying to understand your use-case better.)

  1. We want to check visibility for every single element on the page, not just a handful of say, ads. My sample page had upwards of 6000 elements, which means 6000 observers.
  2. We only want to check once, not monitor changes over time. It seems like overkill to walk the DOM and attach an observer to each element only to remove it after the first event.

Could you explain in more detail what you're doing with this information? Again, just trying to learn and understand the use-case.

@biancadanforth
Copy link

Interesting. Do you know what the difference is that led to the worse performance? (I'm not necessarily saying it must have been the same, just trying to understand your use-case better.)

For reference:

I have some ideas:

  • While IntersectionObserver is async, the framework that calls isVisible is sync. Therefore I had to do the observer work as pre-processing outside of the isVisible method. One thing the framework does is filter out some elements that we don’t care about. In the preprocessing step with IntersectionObserver, I am looking at all elements rather than a subset.
  • Also, this preprocessing work occurs as soon as the script is loaded, rather than being called as part of the framework. This means it gets executed almost a full second earlier. Based on the profile, this is before the page has finished loading. This means thousands of observers are being added during page load.
    • I acknowledge that one option here is to do this work during idle times. While this is an option for some applications, it is not for others.
  • With the IntersectionObserver approach, I am caching information in the global scope (using a Set) in an unbounded way (e.g. for the sample page, my set has over 6000 HTML elements in it).

While it’s possible I could eventually make IntersectionObserver work for my use case, it doesn’t seem intended for this purpose, whereas elementsFromPoint seems much more closely aligned.

Could you explain in more detail what you're doing with this information? Again, just trying to learn and understand the use-case.

Sure. I am working on an open-source framework called Fathom that tries to extract semantic information from web pages.

@astearns astearns removed the Agenda+ label Aug 6, 2019
@chrishtr
Copy link
Contributor

chrishtr commented Aug 9, 2019

For the Fathom use-case, it does seem to make sense. However, can't Fathom be done on the server? In that setting, the page could be rendered in a "viewport" that has the entire document.

@biancadanforth
Copy link

For the Fathom use-case, it does seem to make sense. However, can't Fathom be done on the server? In that setting, the page could be rendered in a "viewport" that has the entire document.

There are certainly applications that could run on a server, but most of the applications we have encountered thus far would need Fathom to run in the client. For example, we could use Fathom to help password managers better identify login forms, or perhaps we could build a more complete accessibility tree by using Fathom to identify non-semantic HTML elements. We recently built a prototype shopping assistant that uses Fathom to extract product information from e-commerce pages for users to track price changes over time.

Also from a privacy perspective, it's much easier to protect user data if we perform all of the work in the client and have little to no server component.

@chrishtr
Copy link
Contributor

Makes sense. These are good examples.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants